Elastic MapReduce (EMR)
Introduction
Amazon Elastic MapReduce (EMR) is a fully managed big data processing service that allows developers to effortlessly create, deploy, and manage big data applications. EMR supports various big data processing frameworks, including Hadoop MapReduce, Apache Spark, Apache Hive, and Apache Pig. Developers can leverage these frameworks and their rich ecosystem of tools and libraries to perform complex data transformations, machine learning tasks, and real-time data processing.
LocalStack supports EMR and allows developers to run data analytics workloads locally. EMR uses various tools from the Hadoop and Spark ecosystem, and your EMR instance is automatically configured to connect to LocalStack’s S3 API. LocalStack also supports EMR Serverless, which lets you create applications and job runs to execute your Spark/PySpark jobs locally.
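As a rough illustration of the EMR Serverless flow mentioned above, the following sketch wraps the two relevant AWS CLI calls in small shell functions. The application name, script location, role ARN, and the `emr-6.9.0` release label are placeholder assumptions, not values taken from LocalStack's documentation; whether a given release label is accepted depends on your LocalStack version.

```shell
# Sketch only: assumes LocalStack is running and awslocal is on the PATH.

create_spark_app() {
  # $1: application name (hypothetical)
  # Prints only the new application's ID via the standard --query/--output options.
  awslocal emr-serverless create-application \
    --type SPARK \
    --name "$1" \
    --release-label emr-6.9.0 \
    --query 'applicationId' --output text
}

start_spark_job() {
  # $1: application ID
  # $2: S3 URI of the PySpark script (hypothetical)
  # $3: execution role ARN (hypothetical)
  awslocal emr-serverless start-job-run \
    --application-id "$1" \
    --execution-role-arn "$3" \
    --job-driver "{\"sparkSubmit\": {\"entryPoint\": \"$2\"}}"
}

# Usage:
#   APP_ID=$(create_spark_app my-spark-app)
#   start_spark_job "$APP_ID" s3://my-bucket/job.py arn:aws:iam::000000000000:role/emr-role
```

Wrapping the calls in functions keeps the application ID in a variable, so the job run can reference it without copying it from the JSON output by hand.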
The supported APIs are listed in the API coverage sections for EMR and EMR Serverless below, which show how much of each API LocalStack implements.
Getting started
This guide is designed for users new to EMR and assumes basic knowledge of the AWS CLI and our awslocal wrapper script.
Start your LocalStack container using your preferred method. We will create a virtual EMR cluster using the AWS CLI. To create an EMR cluster, run the following command:
awslocal emr create-cluster \
  --release-label emr-5.9.0 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large
You will see a response similar to the following:
{
  "ClusterId": "j-A2KF3EKLAOWRI"
}
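If you want to keep working with the cluster, it can help to capture the returned ID instead of copying it from the JSON. The sketch below is one way to do that, using the standard AWS CLI `--query` and `--output` options; the function names are illustrative, and the `Cluster.Status.State` path follows the AWS `DescribeCluster` response shape, which is assumed to match LocalStack's output.

```shell
# Sketch only: assumes LocalStack is running and awslocal is on the PATH.

create_cluster() {
  # Same cluster definition as above; prints only the ClusterId as plain text.
  awslocal emr create-cluster \
    --release-label emr-5.9.0 \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
      InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
    --query 'ClusterId' --output text
}

cluster_state() {
  # $1: cluster ID, as returned by create_cluster
  awslocal emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.Status.State' --output text
}

# Usage:
#   CLUSTER_ID=$(create_cluster)
#   cluster_state "$CLUSTER_ID"
```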
You can also specify startup commands using the --steps=... command line argument to the CreateCluster API.
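For illustration, here is a minimal sketch of passing a step at cluster creation, using the AWS CLI shorthand syntax for `--steps`. The step name, the JAR location, and the arguments are hypothetical placeholders, and whether LocalStack actually executes a given step type may depend on your LocalStack version and configuration.

```shell
# Sketch only: assumes LocalStack is running and awslocal is on the PATH.

create_cluster_with_step() {
  # $1: S3 URI of the JAR to run (hypothetical, e.g. s3://my-bucket/my-job.jar)
  awslocal emr create-cluster \
    --release-label emr-5.9.0 \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large \
      InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \
    --steps "Type=CUSTOM_JAR,Name=ExampleStep,ActionOnFailure=CONTINUE,Jar=$1,Args=[arg1,arg2]"
}

list_steps() {
  # $1: cluster ID; lists the steps registered on that cluster
  awslocal emr list-steps --cluster-id "$1"
}

# Usage:
#   create_cluster_with_step s3://my-bucket/my-job.jar
#   list_steps j-A2KF3EKLAOWRI
```

Quoting the whole `--steps` value keeps the shell from interpreting the square brackets in `Args=[...]`.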
Examples
The following code snippets and sample applications provide practical examples of how to use EMR in LocalStack for various use cases:
API Coverage
| Operation | Implemented | Image |
|---|---|---|
API Coverage (EMR Serverless)
| Operation | Implemented | Image |
|---|---|---|