Bedrock
Introduction
Bedrock is a fully managed service provided by Amazon Web Services (AWS) that makes foundation models from various LLM providers accessible via an API. LocalStack allows you to use the Bedrock APIs to test and develop AI-powered applications in your local environment. The supported APIs are available on our API Coverage Page, which provides information on the extent of Bedrock’s integration with LocalStack.
Getting started
This guide is designed for users new to AWS Bedrock and assumes basic knowledge of the AWS CLI and our awslocal wrapper script.
Start your LocalStack container using your preferred method with or without pre-warming the Bedrock engine. We will demonstrate how to use Bedrock by following these steps:
- Listing available foundation models
- Invoking a model for inference
- Using the conversation API
- Using batch processing
Pre-warming the Bedrock engine
The startup of the Bedrock engine can take some time.
By default, it is only started once you send a request to one of the bedrock-runtime APIs.
However, if you want to start the engine when LocalStack starts, to avoid long wait times on your first request, you can set the flag BEDROCK_PREWARM.
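As a sketch, assuming you start LocalStack via the localstack CLI and that the flag accepts a truthy value such as 1, pre-warming could look like this:
$ BEDROCK_PREWARM=1 localstack start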
List available foundation models
You can view all available foundation models using the ListFoundationModels API.
This will show you which models are available on AWS Bedrock.
Note
The actual model that will be used for emulation will differ from the ones defined in this list. You can define the model to use with the DEFAULT_BEDROCK_MODEL environment variable.
Run the following command:
$ awslocal bedrock list-foundation-models
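If you only need the model IDs, you can filter the response client-side with a JMESPath query; the modelSummaries and modelId fields below follow the shape of the AWS ListFoundationModels response and are assumed to be mirrored by LocalStack:
$ awslocal bedrock list-foundation-models --query 'modelSummaries[*].modelId'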
Invoke a model
You can use the InvokeModel API to send requests to a specific model.
In this example, we selected the Llama 3 model to process a simple prompt.
However, the actual model used for emulation is defined by the DEFAULT_BEDROCK_MODEL environment variable.
Run the following command:
$ awslocal bedrock-runtime invoke-model \
--model-id "meta.llama3-8b-instruct-v1:0" \
--body '{
"prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nSay Hello!\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>",
"max_gen_len": 2,
"temperature": 0.9
}' --cli-binary-format raw-in-base64-out outfile.txt
The output will be available in outfile.txt.
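To inspect the response, read the file back; assuming the emulated model returns a JSON body like the real Llama 3 models, you can also pretty-print it with jq:
$ cat outfile.txt | jq .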
Use the conversation API
Bedrock provides a higher-level conversation API that makes it easier to maintain context in a chat-like interaction using the Converse
API.
You can specify both system prompts and user messages.
Run the following command:
$ awslocal bedrock-runtime converse \
--model-id "meta.llama3-8b-instruct-v1:0" \
--messages '[{
"role": "user",
"content": [{
"text": "Say Hello!"
}]
}]' \
--system '[{
"text": "You'\''re a chatbot that can only say '\''Hello!'\''"
}]'
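To extract only the assistant's reply, you can add a client-side JMESPath query; the output.message.content[0].text path follows the shape of the Converse API response and is assumed to be mirrored by LocalStack:
$ awslocal bedrock-runtime converse \
  --model-id "meta.llama3-8b-instruct-v1:0" \
  --messages '[{"role": "user", "content": [{"text": "Say Hello!"}]}]' \
  --query 'output.message.content[0].text'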
Model Invocation Batch Processing
Bedrock can handle large batches of model invocation requests defined in S3 buckets using the CreateModelInvocationJob API.
First, you need to create a JSONL file that contains all your prompts:
$ cat batch_input.jsonl
{"prompt": "Tell me a quick fact about Vienna.", "max_tokens": 50, "temperature": 0.5}
{"prompt": "Tell me a quick fact about Zurich.", "max_tokens": 50, "temperature": 0.5}
{"prompt": "Tell me a quick fact about Las Vegas.", "max_tokens": 50, "temperature": 0.5}
Then, you need to create buckets for the input as well as the output and upload the file to the input bucket:
$ awslocal s3 mb s3://in-bucket
make_bucket: in-bucket
$ awslocal s3 cp batch_input.jsonl s3://in-bucket
upload: ./batch_input.jsonl to s3://in-bucket/batch_input.jsonl
$ awslocal s3 mb s3://out-bucket
make_bucket: out-bucket
Afterwards, you can run the invocation job like this:
$ awslocal bedrock create-model-invocation-job \
--job-name "my-batch-job" \
--model-id "mistral.mistral-small-2402-v1:0" \
--role-arn "arn:aws:iam::123456789012:role/MyBatchInferenceRole" \
--input-data-config '{"s3InputDataConfig": {"s3Uri": "s3://in-bucket"}}' \
--output-data-config '{"s3OutputDataConfig": {"s3Uri": "s3://out-bucket"}}'
{
"jobArn": "arn:aws:bedrock:us-east-1:000000000000:model-invocation-job/12345678"
}
The results will be at the S3 URL s3://out-bucket/12345678/batch_input.jsonl.out
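Once the job has finished, you can poll its status and fetch the results; the GetModelInvocationJob call and the S3 path below assume the job ARN and output location from the example above:
$ awslocal bedrock get-model-invocation-job \
  --job-identifier "arn:aws:bedrock:us-east-1:000000000000:model-invocation-job/12345678"
$ awslocal s3 cp s3://out-bucket/12345678/batch_input.jsonl.out -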
Limitations
- At this point, we have only tested text-based models in LocalStack. Other models available with Ollama might also work, but are not officially supported by the Bedrock implementation.
- Currently, GPU models are not supported by the LocalStack Bedrock implementation.