# Bedrock (boto3) SDK
Pass-through endpoints for Bedrock - call the provider-specific endpoint in its native format (no translation).
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | For `/invoke` and `/converse` endpoints |
| Load Balancing | ✅ | You can load balance `/invoke` and `/converse` routes across multiple deployments |
| End-user Tracking | ❌ | Tell us if you need this |
| Streaming | ✅ | |
Just replace `https://bedrock-runtime.{aws_region_name}.amazonaws.com` with `LITELLM_PROXY_BASE_URL/bedrock`.
## Overview
LiteLLM supports two ways to call Bedrock endpoints:
### 1. Using config.yaml (Recommended for model endpoints)
Define your Bedrock models in `config.yaml` and reference them by name. The proxy handles authentication and routing.

Use for: `/converse`, `/converse-stream`, `/invoke`, `/invoke-with-response-stream`
```yaml
model_list:
  - model_name: my-bedrock-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      custom_llm_provider: bedrock
```
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-bedrock-model/converse' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": [{"text": "Hello"}]}]}'
```
### 2. Direct passthrough (For non-model endpoints)
Set AWS credentials via environment variables and call Bedrock endpoints directly.
Use for: Guardrails, Knowledge Bases, Agents, and other non-model endpoints
```bash
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export AWS_REGION_NAME="us-west-2"
```
curl "http://0.0.0.0:4000/bedrock/guardrail/my-guardrail-id/version/1/apply" \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"contents": [{"text": {"text": "Hello"}}], "source": "INPUT"}'
Supports ALL Bedrock Endpoints (including streaming).
## Quick Start
Let's call the Bedrock `/converse` endpoint.

1. Create a `config.yaml` file with your Bedrock model:
```yaml
model_list:
  - model_name: my-bedrock-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      custom_llm_provider: bedrock
```
Set your AWS credentials:

```bash
export AWS_ACCESS_KEY_ID=""        # Access key
export AWS_SECRET_ACCESS_KEY=""    # Secret access key
```
2. Start LiteLLM Proxy:

```bash
litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000
```
3. Test it!

Let's call the Bedrock converse endpoint using the model name from the config:
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-bedrock-model/converse' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello, how are you?"}]
      }
    ],
    "inferenceConfig": {
      "maxTokens": 100
    }
  }'
```
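If you prefer Python over curl, the same request can be sent with the `requests` library. This is a minimal sketch assuming the proxy from step 2 is running at `http://0.0.0.0:4000` with `sk-1234` as the key:

```python
import requests

# Same /converse call as the curl above, sent from Python
response = requests.post(
    "http://0.0.0.0:4000/bedrock/model/my-bedrock-model/converse",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "messages": [
            {"role": "user", "content": [{"text": "Hello, how are you?"}]}
        ],
        "inferenceConfig": {"maxTokens": 100},
    },
)
print(response.json())
```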
## Setup with config.yaml
Use `config.yaml` to define Bedrock models and call them through the passthrough endpoints.
### 1. Define models in config.yaml
```yaml
model_list:
  - model_name: my-claude-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      custom_llm_provider: bedrock

  - model_name: my-cohere-model
    litellm_params:
      model: bedrock/cohere.command-r-v1:0
      aws_region_name: us-east-1
      custom_llm_provider: bedrock
```
### 2. Start proxy with config
```bash
litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000
```
### 3. Call Bedrock Converse endpoint
Use the `model_name` from your config in the URL path:
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-claude-model/converse' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello, how are you?"}]
      }
    ],
    "inferenceConfig": {
      "temperature": 0.5,
      "maxTokens": 100
    }
  }'
```
### 4. Call Bedrock Converse Stream endpoint
For streaming responses, use the `/converse-stream` endpoint:
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-claude-model/converse-stream' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Tell me a short story"}]
      }
    ],
    "inferenceConfig": {
      "temperature": 0.7,
      "maxTokens": 200
    }
  }'
```
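The stream can also be consumed from Python. Below is a minimal sketch using boto3's `converse_stream`, reusing the dummy-credential pattern from the boto3 example later on this page; the model name and proxy key are the ones assumed in the examples above:

```python
import os
import boto3

# Dummy AWS credentials (required by boto3, but not used by the LiteLLM proxy)
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = "sk-1234"  # your litellm proxy api key

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="http://0.0.0.0:4000/bedrock",
)

response = bedrock_runtime.converse_stream(
    modelId="my-claude-model",  # model_name from config.yaml
    messages=[{"role": "user", "content": [{"text": "Tell me a short story"}]}],
    inferenceConfig={"temperature": 0.7, "maxTokens": 200},
)

# Print text deltas as they arrive
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
```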
## Supported Bedrock Endpoints with config.yaml
When using models from config.yaml, you can call any Bedrock endpoint:
| Endpoint | Description | Example |
|---|---|---|
| `/model/{model_name}/converse` | Converse API | `http://0.0.0.0:4000/bedrock/model/my-claude-model/converse` |
| `/model/{model_name}/converse-stream` | Streaming Converse | `http://0.0.0.0:4000/bedrock/model/my-claude-model/converse-stream` |
| `/model/{model_name}/invoke` | Legacy Invoke API | `http://0.0.0.0:4000/bedrock/model/my-claude-model/invoke` |
| `/model/{model_name}/invoke-with-response-stream` | Legacy Streaming | `http://0.0.0.0:4000/bedrock/model/my-claude-model/invoke-with-response-stream` |
The proxy automatically resolves the `model_name` to the actual Bedrock model ID and region configured in your `config.yaml`.
## Load Balancing Across Multiple Deployments
Define multiple Bedrock deployments with the same `model_name` to enable automatic load balancing.
### 1. Define multiple deployments in config.yaml
```yaml
model_list:
  # First deployment - us-west-2
  - model_name: my-claude-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-west-2
      custom_llm_provider: bedrock

  # Second deployment - us-east-1 (load balanced)
  - model_name: my-claude-model
    litellm_params:
      model: bedrock/us.anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-east-1
      custom_llm_provider: bedrock
```
### 2. Start proxy with config
```bash
litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000
```
### 3. Call the endpoint - requests are automatically load balanced
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/my-claude-model/invoke' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_tokens": 100,
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "anthropic_version": "bedrock-2023-05-31"
  }'
```
The proxy will automatically distribute requests across both the `us-west-2` and `us-east-1` deployments. This works for all Bedrock endpoints: `/invoke`, `/invoke-with-response-stream`, `/converse`, and `/converse-stream`.
### Using boto3 SDK with load balancing
You can also call the load-balanced endpoint using the boto3 SDK:
```python
import boto3
import json
import os

# Set dummy AWS credentials (required by boto3, but not used by LiteLLM proxy)
os.environ['AWS_ACCESS_KEY_ID'] = 'dummy'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'dummy'
os.environ['AWS_BEARER_TOKEN_BEDROCK'] = "sk-1234"  # your litellm proxy api key

# Point boto3 to the LiteLLM proxy
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2',
    endpoint_url='http://0.0.0.0:4000/bedrock'
)

# Call the load-balanced model
response = bedrock_runtime.invoke_model(
    modelId='my-claude-model',  # Your model_name from config.yaml
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "max_tokens": 100,
        "messages": [
            {
                "role": "user",
                "content": "Hello, how are you?"
            }
        ],
        "anthropic_version": "bedrock-2023-05-31"
    })
)

# Parse response
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```
The proxy will automatically load balance your boto3 requests across all configured deployments.
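The Converse API works the same way through the SDK. Here is a minimal sketch using boto3's `converse` method against the same load-balanced `model_name`; it assumes the proxy setup from this section:

```python
import os
import boto3

# Dummy AWS credentials (required by boto3, but not used by the LiteLLM proxy)
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = "sk-1234"  # your litellm proxy api key

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="http://0.0.0.0:4000/bedrock",
)

# Converse API call to the load-balanced model_name from config.yaml
response = bedrock_runtime.converse(
    modelId="my-claude-model",
    messages=[{"role": "user", "content": [{"text": "Hello, how are you?"}]}],
    inferenceConfig={"maxTokens": 100},
)
print(response["output"]["message"]["content"][0]["text"])
```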
## Examples
Anything after `http://0.0.0.0:4000/bedrock` is treated as a provider-specific route, and handled accordingly.
Key Changes:

| Original Endpoint | Replace With |
|---|---|
| `https://bedrock-runtime.{aws_region_name}.amazonaws.com` | `http://0.0.0.0:4000/bedrock` (`LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000"`) |
| `AWS4-HMAC-SHA256..` | `Bearer anything` (use `Bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy) |
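In code, the substitution boils down to swapping the endpoint URL and replacing the SigV4 signature with a bearer token. A minimal boto3 sketch; the `LITELLM_PROXY_BASE_URL` and `LITELLM_VIRTUAL_KEY` environment variables here are illustrative assumptions:

```python
import os
import boto3

# Bearer token replaces the AWS4-HMAC-SHA256 signature
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"       # required by boto3, unused by the proxy
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = os.environ["LITELLM_VIRTUAL_KEY"]

# LITELLM_PROXY_BASE_URL/bedrock replaces bedrock-runtime.{aws_region_name}.amazonaws.com
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url=os.environ["LITELLM_PROXY_BASE_URL"] + "/bedrock",
)
```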
### Example 1: Converse API
#### LiteLLM Proxy Call
```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/cohere.command-r-v1:0/converse' \
  -H 'Authorization: Bearer sk-anything' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello"}]
      }
    ]
  }'
```
#### Direct Bedrock API Call
```bash
curl -X POST 'https://bedrock-runtime.us-west-2.amazonaws.com/model/cohere.command-r-v1:0/converse' \
  -H 'Authorization: AWS4-HMAC-SHA256..' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello"}]
      }
    ]
  }'
```
### Example 2: Apply Guardrail
Setup: Set AWS credentials for direct passthrough:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-west-2"
```

Start proxy:

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
#### LiteLLM Proxy Call
curl "http://0.0.0.0:4000/bedrock/guardrail/guardrailIdentifier/version/guardrailVersion/apply" \
-H 'Authorization: Bearer sk-anything' \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{"text": {"text": "Hello world"}}],
"source": "INPUT"
}'
#### Direct Bedrock API Call
curl "https://bedrock-runtime.us-west-2.amazonaws.com/guardrail/guardrailIdentifier/version/guardrailVersion/apply" \
-H 'Authorization: AWS4-HMAC-SHA256..' \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{"text": {"text": "Hello world"}}],
"source": "INPUT"
}'
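The same guardrail check can be issued from Python. Below is a minimal sketch using boto3's `apply_guardrail` pointed at the proxy; the guardrail identifier and version are placeholders, as in the curl above:

```python
import os
import boto3

# Dummy AWS credentials (required by boto3, but not used by the LiteLLM proxy)
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = "sk-anything"  # your litellm proxy api key

bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-west-2",
    endpoint_url="http://0.0.0.0:4000/bedrock",
)

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="guardrailIdentifier",  # your guardrail ID
    guardrailVersion="guardrailVersion",        # your guardrail version
    source="INPUT",
    content=[{"text": {"text": "Hello world"}}],
)
print(response["action"])  # e.g. "NONE" or "GUARDRAIL_INTERVENED"
```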
### Example 3: Query Knowledge Base
Setup: Set AWS credentials for direct passthrough:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-west-2"
```

Start proxy:

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
#### LiteLLM Proxy Call
```bash
curl -X POST "http://0.0.0.0:4000/bedrock/knowledgebases/{knowledgeBaseId}/retrieve" \
  -H 'Authorization: Bearer sk-anything' \
  -H 'Content-Type: application/json' \
  -d '{
    "nextToken": "string",
    "retrievalConfiguration": {
      "vectorSearchConfiguration": {
        "filter": { ... },
        "numberOfResults": number,
        "overrideSearchType": "string"
      }
    },
    "retrievalQuery": {
      "text": "string"
    }
  }'
```
#### Direct Bedrock API Call
```bash
curl -X POST "https://bedrock-agent-runtime.us-west-2.amazonaws.com/knowledgebases/{knowledgeBaseId}/retrieve" \
  -H 'Authorization: AWS4-HMAC-SHA256..' \
  -H 'Content-Type: application/json' \
  -d '{
    "nextToken": "string",
    "retrievalConfiguration": {
      "vectorSearchConfiguration": {
        "filter": { ... },
        "numberOfResults": number,
        "overrideSearchType": "string"
      }
    },
    "retrievalQuery": {
      "text": "string"
    }
  }'
```
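From Python, Knowledge Base queries go through the `bedrock-agent-runtime` client. A minimal sketch using boto3's `retrieve`; the knowledge base ID and query text are placeholders:

```python
import os
import boto3

# Dummy AWS credentials (required by boto3, but not used by the LiteLLM proxy)
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = "sk-anything"  # your litellm proxy api key

# Knowledge Bases are served by bedrock-agent-runtime, not bedrock-runtime
agent_runtime = boto3.client(
    service_name="bedrock-agent-runtime",
    region_name="us-west-2",
    endpoint_url="http://0.0.0.0:4000/bedrock",
)

response = agent_runtime.retrieve(
    knowledgeBaseId="knowledgeBaseId",  # your knowledge base ID
    retrievalQuery={"text": "What is LiteLLM?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 3}
    },
)
for result in response["retrievalResults"]:
    print(result["content"]["text"])
```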
## Advanced - Use with Virtual Keys
Pre-requisites: a LiteLLM proxy with a database configured (virtual keys require `DATABASE_URL`).

Use this to avoid giving developers raw AWS keys while still letting them use AWS Bedrock endpoints.
### Usage
1. Setup environment:

```bash
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""
export AWS_ACCESS_KEY_ID=""        # Access key
export AWS_SECRET_ACCESS_KEY=""    # Secret access key
export AWS_REGION_NAME=""          # us-east-1, us-east-2, us-west-1, us-west-2
```

Start proxy:

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
2. Generate a virtual key:

```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{}'
```
Expected Response:

```json
{
  ...
  "key": "sk-1234ewknldferwedojwojw"
}
```
3. Test it!

```bash
curl -X POST 'http://0.0.0.0:4000/bedrock/model/cohere.command-r-v1:0/converse' \
  -H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [{"text": "Hello"}]
      }
    ]
  }'
```
## Advanced - Bedrock Agents
Call Bedrock Agents via the LiteLLM proxy.

Setup: Set AWS credentials on your LiteLLM proxy server:

```bash
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-west-2"
```

Start proxy:

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
Usage from Python:
import os
import boto3
# Set dummy AWS credentials (required by boto3, but not used by LiteLLM proxy)
os.environ["AWS_ACCESS_KEY_ID"] = "dummy"
os.environ["AWS_SECRET_ACCESS_KEY"] = "dummy"
os.environ["AWS_BEARER_TOKEN_BEDROCK"] = "sk-1234" # your litellm proxy api key
# Create the client
runtime_client = boto3.client(
service_name="bedrock-agent-runtime",
region_name="us-west-2",
endpoint_url="http://0.0.0.0:4000/bedrock"
)
response = runtime_client.invoke_agent(
agentId="L1RT58GYRW",
agentAliasId="MFPSBCXYTW",
sessionId="12345",
inputText="Who do you know?"
)
completion = ""
for event in response.get("completion"):
chunk = event["chunk"]
completion += chunk["bytes"].decode()
print(completion)