Data Firehose

Get started with Data Firehose on LocalStack

Introduction

Data Firehose is an AWS service that allows you to extract, transform, and load streaming data into various destinations, such as Amazon S3, Amazon Redshift, and Elasticsearch. With Data Firehose, you can ingest and deliver real-time data from different sources: it automates data delivery, handles buffering and compression, and scales with the data volume.

LocalStack allows you to use the Data Firehose APIs in your local environment to load and transform real-time data. The supported APIs are available on our API coverage page, which provides information on the extent of Data Firehose’s integration with LocalStack.

Getting started

This guide is designed for users new to Data Firehose and assumes basic knowledge of the AWS CLI and our awslocal wrapper script.

Start your LocalStack container using your preferred method. We will demonstrate how to use Firehose to load Kinesis data into Elasticsearch, with an S3 backup, using the AWS CLI.

Create an Elasticsearch domain

You can create an Elasticsearch domain using the create-elasticsearch-domain command. Execute the following command to create a domain named es-local:

$ awslocal es create-elasticsearch-domain --domain-name es-local

Save the value of the Endpoint field from the response, as you will need it later to confirm the setup.
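If you need to look up the endpoint again later, one way is to query the domain and filter the response with jq (assuming jq is installed):

$ awslocal es describe-elasticsearch-domain \
  --domain-name es-local | jq -r ".DomainStatus.Endpoint"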

Create the source Kinesis stream

Now, let us create the target S3 bucket and the source Kinesis stream.

Before creating the stream, we need an S3 bucket to store the backup data. You can create it using the mb command:

$ awslocal s3 mb s3://kinesis-activity-backup-local

You can now use the CreateStream API to create a Kinesis stream named kinesis-es-local-stream with two shards:

$ awslocal kinesis create-stream \
  --stream-name kinesis-es-local-stream \
  --shard-count 2
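Optionally, you can verify that the stream is active before connecting it to Firehose, for example by checking its status:

$ awslocal kinesis describe-stream-summary \
  --stream-name kinesis-es-local-stream | jq ".StreamDescriptionSummary.StreamStatus"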

Create a Firehose delivery stream

You can now create the Firehose delivery stream. In this configuration, Elasticsearch serves as the destination, while S3 serves as the repository for the AllDocuments backup. Within the kinesis-stream-source-configuration, you must specify the ARN of the Kinesis stream and the role that grants access to the stream.

The elasticsearch-destination-configuration sets the vital parameters, which include the access role, the DomainARN of the Elasticsearch domain you wish to publish to, and settings such as the IndexName and TypeName for the Elasticsearch setup. Additionally, to back up all documents to S3, the S3BackupMode parameter is set to AllDocuments and is accompanied by an S3Configuration.

You can use the CreateDeliveryStream API to create a Firehose delivery stream named activity-to-elasticsearch-local:

$ awslocal firehose create-delivery-stream \
  --delivery-stream-name activity-to-elasticsearch-local \
  --delivery-stream-type KinesisStreamAsSource \
  --kinesis-stream-source-configuration "KinesisStreamARN=arn:aws:kinesis:us-east-1:000000000000:stream/kinesis-es-local-stream,RoleARN=arn:aws:iam::000000000000:role/Firehose-Reader-Role" \
  --elasticsearch-destination-configuration "RoleARN=arn:aws:iam::000000000000:role/Firehose-Reader-Role,DomainARN=arn:aws:es:us-east-1:000000000000:domain/es-local,IndexName=activity,TypeName=activity,S3BackupMode=AllDocuments,S3Configuration={RoleARN=arn:aws:iam::000000000000:role/Firehose-Reader-Role,BucketARN=arn:aws:s3:::kinesis-activity-backup-local}"

On successful execution, the command will return the DeliveryStreamARN of the created delivery stream:

{
    "DeliveryStreamARN": "arn:aws:firehose:us-east-1:000000000000:deliverystream/activity-to-elasticsearch-local"
}
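You can also check the status of the delivery stream at any time with the DescribeDeliveryStream API:

$ awslocal firehose describe-delivery-stream \
  --delivery-stream-name activity-to-elasticsearch-local | jq ".DeliveryStreamDescription.DeliveryStreamStatus"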

Testing the setup

Before testing the integration, it’s necessary to confirm that the local Elasticsearch cluster is up. You can use the describe-elasticsearch-domain command to check the status of the Elasticsearch cluster. Run the following command:

$ awslocal es describe-elasticsearch-domain \
  --domain-name es-local | jq ".DomainStatus.Processing"
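If you prefer to script the wait, a minimal sketch (assuming a Bash shell and jq) polls the same value until the domain is ready:

$ until awslocal es describe-elasticsearch-domain --domain-name es-local \
    | jq -e ".DomainStatus.Processing == false" > /dev/null; do
    echo "Waiting for the es-local domain to become active..."
    sleep 5
  done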

Once Processing is reported as false, you can move forward with data ingestion. Data can be added either to the source Kinesis stream or directly to the Firehose delivery stream.

You can add data to the Kinesis stream using the PutRecord API. The following command adds a record to the stream:

$ awslocal kinesis put-record \
  --stream-name kinesis-es-local-stream \
  --data '{ "target": "barry" }' \
  --partition-key partition

Alternatively, you can add data directly to the Firehose delivery stream using its PutRecord API. Note that the Data field must be Base64-encoded. The following command adds a record to the stream:

$ awslocal firehose put-record \
  --delivery-stream-name activity-to-elasticsearch-local \
  --record '{ "Data": "eyJ0YXJnZXQiOiAiSGVsbG8gd29ybGQifQ==" }'

To review the entries in Elasticsearch, you can use curl. Remember to replace the URL with the Endpoint field returned by the initial create-elasticsearch-domain operation.

$ curl -s http://es-local.us-east-1.es.localhost.localstack.cloud:443/activity/_search | jq '.hits.hits'

You will get an output similar to the following:

[
  {
    "_index": "activity",
    "_type": "activity",
    "_id": "f38e2c49-d101-46aa-9ce2-0d2ea8fcd133",
    "_score": 1,
    "_source": {
      "target": "Hello world"
    }
  },
  {
    "_index": "activity",
    "_type": "activity",
    "_id": "d2f1c125-b3b0-4c7c-ba90-8acf4075a682",
    "_score": 1,
    "_source": {
      "target": "barry"
    }
  }
]

If you receive similar output, your Firehose delivery stream is set up correctly! Additionally, take a look at the designated S3 bucket to confirm that the backup process is functioning, as shown below.
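To inspect the backup, you can list the contents of the bucket; the exact object key prefixes may vary:

$ awslocal s3 ls s3://kinesis-activity-backup-local --recursive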

Examples

The following code snippets and sample applications provide practical examples of how to use Data Firehose in LocalStack for various use cases: