Elastic MapReduce (EMR)

LocalStack Pro allows running data analytics workloads locally via the EMR API. EMR utilizes various tools in the Hadoop and Spark ecosystem, and your EMR instance is automatically configured to connect seamlessly to the LocalStack S3 API.
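Because the cluster talks to the LocalStack S3 API, job scripts and input data can be staged in a local bucket first and then referenced by `s3://` URI. The helper below is a minimal sketch under two assumptions: `awslocal` is on your `PATH`, and LocalStack is running. The function name `stage_to_s3` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a local file into a LocalStack S3 bucket
# and print the resulting s3:// URI so it can be used in EMR steps.
# $1 = bucket name, $2 = local file path
stage_to_s3() {
  key=$(basename "$2")
  # Create the bucket; ignore a failure (e.g. the bucket already exists).
  awslocal s3 mb "s3://$1" >/dev/null 2>&1 || true
  # Upload the file and, on success, print its S3 URI.
  awslocal s3 cp "$2" "s3://$1/$key" >/dev/null && echo "s3://$1/$key"
}
```

For example, `stage_to_s3 my-bucket ./job.py` would upload `job.py` and print `s3://my-bucket/job.py` (bucket and file names are placeholders).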

To create a virtual EMR cluster locally from the command line (assuming you have awslocal installed):

$ awslocal emr create-cluster --release-label emr-5.9.0 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large
{
    "ClusterId": "j-A2KF3EKLAOWRI"
}
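The returned `ClusterId` can be passed to the standard EMR CLI commands to inspect or clean up the cluster. The sketch below assumes `awslocal` forwards `describe-cluster` and `terminate-clusters` to LocalStack like any other EMR call; the helper names `cluster_state` and `cluster_terminate` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a local EMR cluster; $1 = cluster ID.

# Print only the cluster state (e.g. STARTING, WAITING, TERMINATED)
# using the AWS CLI's global --query and --output options.
cluster_state() {
  awslocal emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.Status.State' --output text
}

# Terminate the cluster once you are done with it.
cluster_terminate() {
  awslocal emr terminate-clusters --cluster-ids "$1"
}
```

Usage: `cluster_state j-A2KF3EKLAOWRI`, then `cluster_terminate j-A2KF3EKLAOWRI` when the workload has finished.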

The command above will spin up one or more Docker containers on your local machine that can be used to run analytics workloads using Spark, Hadoop, Pig, and other tools.

Note that you can also specify startup commands using the --steps=... command-line argument to the create-cluster command. A simple demo project with more details can be found in this GitHub repository.
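The `--steps` value uses the AWS CLI shorthand syntax. As a minimal sketch, assuming LocalStack accepts the `Type=Spark` step shorthand that the AWS CLI documents for `create-cluster` (not confirmed here), the helpers below build a Spark step and pass it to the same `create-cluster` call shown earlier. The function names, bucket, and script path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a Spark step spec in AWS CLI shorthand.
# $1 = step name (no spaces or commas), $2 = application, e.g. an s3:// script
spark_step() {
  printf 'Type=Spark,Name=%s,ActionOnFailure=CONTINUE,Args=[%s]' "$1" "$2"
}

# Hypothetical helper: create a cluster that runs the given step ($1) at startup.
create_cluster_with_step() {
  awslocal emr create-cluster --release-label emr-5.9.0 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
                      InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
    --steps "$1"
}

# Example usage (placeholder bucket and script names):
#   awslocal s3 mb s3://my-bucket
#   awslocal s3 cp job.py s3://my-bucket/job.py
#   create_cluster_with_step "$(spark_step MyJob s3://my-bucket/job.py)"
```

Building the step string in a separate function keeps the shorthand syntax in one place, so additional steps can be added by passing more `--steps` values in the same format.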

Last modified July 26, 2022: fix some typos (#214) (6ab8502d)