Centralizing Logs in an Isolated AWS Account

As a member of the information security team here at Blend, I recently teamed up with the business analytics team to re-architect our log pipeline to increase the security and availability of both the log delivery system and access to the logs themselves. Since the logs provide crucial insight into the production environment for many teams at Blend, we found ourselves with a list of requirements to accommodate each of them. Although these requirements seem fairly standard throughout the industry, we realized that an end-to-end solution addressing all of them had not yet been documented. In this post, I outline the requirements of our log pipeline (with respect to each of the teams that need the logs) along with the strategy we implemented to address each requirement.

To follow along with the CloudFormation templates we used to deploy the log pipeline, see this project on GitHub.

Requirements

1. Historical data analysis (last 30 days)

The information security, infrastructure, and business teams need to perform historical data analysis on the logs over the last 30 days to monitor the system, alert on certain events, and make decisions by examining patterns in past data. We chose the AWS Elasticsearch service because it is schema-flexible (NoSQL), so it can index logs with different fields (useful since we're collecting logs from many different services and systems), and because it makes it easy to search for events within a specified time period (e.g. all failed logins between midnight and 4am last night).

2. Stream data analysis

In addition to making decisions or examining patterns in past data, the information security, infrastructure, and business teams need to analyze log stream data to monitor the system, alert on certain events, and make decisions continuously as new logs are processed. We chose AWS Kinesis (either Stream or Firehose) because it’s able to handle large streams of data and deliver them to AWS Kinesis Analytics or AWS Lambda Functions.

3. Tail/Grep logs

The development and infrastructure teams need to be able to tail/grep the logs for debugging. We chose AWS CloudWatch Logs because it’s access-controlled by AWS IAM and accessible through the AWS console or command line tool.

4. Archive all logs indefinitely

For forensics, the information security team needs to archive all logs indefinitely so they can go back and reference them if needed. We chose AWS S3 and AWS Glacier because they offer inexpensive storage for large datasets.

Implementation

1. Deploy an AWS Elasticsearch Domain (information security account)

First, we deploy an AWS-managed Elasticsearch Domain. We allow AWS to maintain the underlying EC2 instances and operating systems (including security updates and patches) while we handle tuning the capacity of the cluster for the incoming logs.
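As a rough sketch, the domain can be declared in CloudFormation along these lines (the domain name, instance type, node count, and volume size below are illustrative placeholders, not the values from our templates):

```yaml
Resources:
  LogsElasticsearchDomain:
    Type: AWS::Elasticsearch::Domain
    Properties:
      DomainName: central-logs               # placeholder name
      ElasticsearchVersion: "5.5"            # whichever version you standardize on
      ElasticsearchClusterConfig:
        InstanceType: m4.large.elasticsearch # size for your log volume
        InstanceCount: 2
      EBSOptions:
        EBSEnabled: true
        VolumeType: gp2
        VolumeSize: 100                      # GiB per data node
      SnapshotOptions:
        AutomatedSnapshotStartHour: 0        # daily automated snapshot
```

An access policy restricting the domain to roles in the information security account would normally be attached via the domain's AccessPolicies property.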

2. Deploy a Kinesis Firehose (information security account)

Then, we need a way to ingest the incoming logs into Elasticsearch. AWS provides a service called Kinesis Firehose that does just that. We deploy an AWS Kinesis Firehose that delivers the stream of incoming logs to both our Elasticsearch Domain and an S3 bucket (GZIP compressed and KMS encrypted), where a lifecycle rule transitions the S3 data to Glacier after 180 days.
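A trimmed-down sketch of the Firehose and the archive bucket might look like the following (the IAM role, KMS key, and buffering values are placeholders, and the Elasticsearch domain is the one defined in step 1):

```yaml
Resources:
  LogArchiveBucket:
    Type: AWS::S3::Bucket
    Properties:
      LifecycleConfiguration:
        Rules:
          - Id: ArchiveToGlacier
            Status: Enabled
            Transitions:
              - StorageClass: GLACIER
                TransitionInDays: 180        # move objects to Glacier after 180 days

  LogsFirehose:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      DeliveryStreamName: central-logs-firehose
      DeliveryStreamType: DirectPut          # the Lambda in step 3 writes records to it
      ElasticsearchDestinationConfiguration:
        DomainARN: !GetAtt LogsElasticsearchDomain.DomainArn
        IndexName: logs
        TypeName: log
        IndexRotationPeriod: OneDay
        RoleARN: !GetAtt FirehoseRole.Arn    # placeholder IAM role with es/s3/kms access (not shown)
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 5
        RetryOptions:
          DurationInSeconds: 300
        S3BackupMode: AllDocuments           # archive every record in S3, not just failed ones
        S3Configuration:
          BucketARN: !GetAtt LogArchiveBucket.Arn
          RoleARN: !GetAtt FirehoseRole.Arn
          CompressionFormat: GZIP
          EncryptionConfiguration:
            KMSEncryptionConfig:
              AWSKMSKeyARN: !Ref LogArchiveKmsKeyArn   # placeholder parameter (not shown)
          BufferingHints:
            IntervalInSeconds: 300
            SizeInMBs: 50
```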

3. Deploy a Kinesis Stream with a subscribed data transformation Lambda function (information security account)

Next, we need a Kinesis Stream to accept the incoming log streams from all of the other accounts. The shard count of the Kinesis Stream can be increased for more throughput, but we will use only one shard for now. We also include an AWS Lambda function, subscribed to the stream, that decompresses the gzip-compressed records (the default format for CloudWatch Logs subscription data) and forwards them to the Kinesis Firehose.
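Here is a sketch of the stream and the event source mapping that ties it to the transformation Lambda; the function itself (which gunzips the CloudWatch Logs payloads and batches them into the Firehose) is omitted, and the names are placeholders:

```yaml
Resources:
  LogsKinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: central-logs-stream
      ShardCount: 1                  # raise for more throughput
      RetentionPeriodHours: 24

  StreamToFirehoseMapping:
    Type: AWS::Lambda::EventSourceMapping
    Properties:
      EventSourceArn: !GetAtt LogsKinesisStream.Arn
      FunctionName: !Ref StreamToFirehoseFunction   # placeholder Lambda that decompresses and forwards to the Firehose
      StartingPosition: TRIM_HORIZON
      BatchSize: 100
```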

4. Create a CloudWatch logs destination (information security account)

We also need to accept CloudWatch logs from other AWS accounts and ingest them into the Kinesis Stream. For this, we create a CloudWatch Logs Destination that forwards incoming logs directly to our Kinesis Stream.
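A sketch of the destination, assuming the stream from step 3 and a placeholder IAM role that CloudWatch Logs assumes to write into it; the account ID 111111111111 stands in for a sender account that is allowed to subscribe:

```yaml
Resources:
  CentralLogsDestination:
    Type: AWS::Logs::Destination
    Properties:
      DestinationName: central-logs-destination
      TargetArn: !GetAtt LogsKinesisStream.Arn
      RoleArn: !GetAtt CloudWatchToKinesisRole.Arn   # placeholder role with kinesis:PutRecord on the stream (not shown)
      DestinationPolicy: !Sub |
        {
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "AWS": "111111111111" },
            "Action": "logs:PutSubscriptionFilter",
            "Resource": "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:destination:central-logs-destination"
          }]
        }
```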

5. Create a CloudWatch logs subscription filter (all accounts)

Next, we need to forward the logs from each account's CloudWatch Logs log groups to the information security account. The easiest way to do this is to set up a subscription filter on each log group. A subscription filter forwards any log event that matches a specified pattern to a destination; in our case, we use an empty filter pattern since we want to forward all logs that are collected in that log group.
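In each sender account, the subscription filter is a small resource. The log group name and the destination ARN below are placeholders (222222222222 stands in for the information security account that owns the Logs Destination from step 4):

```yaml
Resources:
  ForwardAllLogs:
    Type: AWS::Logs::SubscriptionFilter
    Properties:
      LogGroupName: application-logs     # placeholder log group in the sender account
      FilterPattern: ""                  # empty pattern matches every log event
      DestinationArn: arn:aws:logs:us-east-1:222222222222:destination:central-logs-destination  # placeholder ARN
```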

6. Install a log agent on each EC2 instance (all accounts)

The last step is to install an agent on each EC2 instance that forwards the locally collected logs to a CloudWatch Logs log group. We use fluentd as our logging agent.

So there you have it: a full end-to-end solution for centralizing logs in an isolated AWS account that provides historical data analysis, stream analysis, and indefinite archiving of the logs, while retaining the ability to tail/grep the logs in each account, access-controlled through AWS IAM. For CloudFormation templates to launch this solution, see our log-pipeline project on GitHub.


Interested in working on problems like this? Visit our careers page!