Bedrock

Use the Amazon Bedrock APIs to develop and test AI-powered applications locally with LocalStack!

Introduction

Bedrock is a fully managed service provided by Amazon Web Services (AWS) that makes foundation models from various LLM providers accessible via an API. LocalStack allows you to use the Bedrock APIs to test and develop AI-powered applications in your local environment. The supported APIs are available on our API Coverage Page, which provides information on the extent of Bedrock’s integration with LocalStack.
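
If you are building against Bedrock from application code rather than the CLI, you can point an AWS SDK client at LocalStack. Below is a minimal boto3 sketch; the endpoint URL http://localhost:4566 (LocalStack's default edge port), the us-east-1 region, and the dummy test credentials are assumptions for a default local setup.

import boto3

# Control-plane client for management calls such as ListFoundationModels.
# Endpoint, region, and credentials are placeholders for a default
# LocalStack setup; adjust them to match your environment.
bedrock = boto3.client(
    "bedrock",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

# Runtime client for inference calls such as InvokeModel and Converse.
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    endpoint_url="http://localhost:4566",
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)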

Getting started

This guide is designed for users new to AWS Bedrock and assumes basic knowledge of the AWS CLI and our awslocal wrapper script.

Start your LocalStack container using your preferred method, with the LOCALSTACK_ENABLE_BEDROCK=1 configuration variable set. We will demonstrate how to use Bedrock by following these steps:

  1. Listing available foundation models
  2. Invoking a model for inference
  3. Using the conversation API

List available foundation models

You can view all available foundation models using the ListFoundationModels API. This will show you which models are available for use in your local environment.

Run the following command:

$ awslocal bedrock list-foundation-models
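
The same call can be made from boto3. A short sketch, reusing the assumed LocalStack endpoint and dummy credentials from the setup above:

import boto3

bedrock = boto3.client(
    "bedrock",
    endpoint_url="http://localhost:4566",  # assumed LocalStack edge endpoint
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

# ListFoundationModels returns a list of model summaries; print each model ID.
response = bedrock.list_foundation_models()
for summary in response["modelSummaries"]:
    print(summary["modelId"])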

Invoke a model

You can use the InvokeModel API to send requests to a specific model. In this example, we’ll use the Llama 3 model to process a simple prompt.

Run the following command:

$ awslocal bedrock-runtime invoke-model \
    --model-id "meta.llama3-8b-instruct-v1:0" \
    --body '{
        "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nSay Hello!\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>",
        "max_gen_len": 2,
        "temperature": 0.9
    }' --cli-binary-format raw-in-base64-out outfile.txt

The model's output will be written to outfile.txt.
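
The equivalent InvokeModel call in boto3, again assuming the default LocalStack endpoint and dummy credentials:

import json

import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    endpoint_url="http://localhost:4566",  # assumed LocalStack edge endpoint
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

# The body mirrors the CLI example: Llama 3 expects a formatted prompt
# string plus generation parameters.
body = {
    "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nSay Hello!\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>",
    "max_gen_len": 2,
    "temperature": 0.9,
}

response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",
    body=json.dumps(body),
)

# The response body is a streaming object; read and decode it.
print(json.loads(response["body"].read()))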

Use the conversation API

Bedrock provides a higher-level Converse API that makes it easier to maintain context in chat-like interactions. You can specify both system prompts and user messages.

Run the following command:

$ awslocal bedrock-runtime converse \
    --model-id "meta.llama3-8b-instruct-v1:0" \
    --messages '[{
        "role": "user",
        "content": [{
            "text": "Say Hello!"
        }]
    }]' \
    --system '[{
        "text": "You'\''re a chatbot that can only say '\''Hello!'\''"
    }]'
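
The same conversation via boto3's converse method, under the same endpoint and credential assumptions as above:

import boto3

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    endpoint_url="http://localhost:4566",  # assumed LocalStack edge endpoint
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)

# Converse accepts structured messages and an optional system prompt,
# so no model-specific prompt formatting is required.
response = bedrock_runtime.converse(
    modelId="meta.llama3-8b-instruct-v1:0",
    messages=[{"role": "user", "content": [{"text": "Say Hello!"}]}],
    system=[{"text": "You're a chatbot that can only say 'Hello!'"}],
)

print(response["output"]["message"]["content"][0]["text"])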

Limitations

  • The LocalStack Bedrock implementation is mock-only and does not run any LLM models locally.
  • Currently, GPU models are not supported by the LocalStack Bedrock implementation.