AWS AI Services
Amazon Bedrock: Getting Started with Foundation Models on AWS
Complete guide to getting started with Amazon Bedrock for Australian businesses, covering model selection, API integration, security best practices, and real-world implementation strategies.
CloudPoint Team
Amazon Bedrock makes foundation models from leading AI providers accessible through a fully managed AWS service. For Australian businesses looking to leverage generative AI without managing infrastructure, Bedrock offers a secure, scalable path to building AI-powered applications.
What is Amazon Bedrock?
Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from top AI providers through a single API. It eliminates the need to manage infrastructure while offering enterprise-grade security, privacy, and compliance.
Key Capabilities
Model Access:
- Claude (Anthropic) - Advanced reasoning and coding
- Titan (Amazon) - Text and embeddings
- Llama 2 (Meta) - Open-source models
- Jurassic-2 (AI21 Labs) - Text generation
- Command (Cohere) - Conversational AI
- Stable Diffusion (Stability AI) - Image generation
Core Features:
- Single API for multiple models
- No infrastructure management
- Pay-per-use pricing
- Built-in security and compliance
- Private model customisation
- Knowledge bases and agents
Why Choose Bedrock?
For Australian Businesses
Data Sovereignty: Keep data within Australian AWS regions (ap-southeast-2 Sydney).
Compliance Ready:
- IRAP assessed
- Industry regulations aligned
- Privacy Act compliant
- SOC 2 certified
Cost Efficient: Pay only for what you use, no upfront commitments.
Quick Implementation: Go from idea to production in weeks, not months.
Use Cases
Customer Service: Intelligent chatbots and support automation
Content Generation: Marketing copy, product descriptions, documentation
Document Analysis: Contract review, data extraction, summarisation
Code Generation: Development assistance, code review, documentation
Data Analysis: Insights extraction, report generation, trend analysis
Getting Started
1. Enable Bedrock Access
Request model access through the AWS Console:
# List available foundation models
aws bedrock list-foundation-models \
  --region ap-southeast-2
# Request model access (via Console)
# Navigate to Bedrock → Model access → Manage model access
Model Access Requirements (a quick CLI check follows the list):
- Some models available immediately
- Others require access request
- Approval typically within hours
- Region-specific availability
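Once access is granted, a quick way to confirm it from the CLI is to attempt a minimal invocation; an AccessDeniedException means the model has not been enabled for your account yet. A sketch (the model ID and prompt are illustrative):

# Attempt a minimal invocation to verify model access
aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-v2 \
  --region ap-southeast-2 \
  --cli-binary-format raw-in-base64-out \
  --body '{"prompt": "\n\nHuman: Hello\n\nAssistant:", "max_tokens_to_sample": 50}' \
  output.json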
2. Set Up IAM Permissions
Create IAM policy for Bedrock access:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:ap-southeast-2::foundation-model/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    }
  ]
}
3. First API Call
Using AWS SDK for Python (boto3):
import boto3
import json

# Initialize Bedrock client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'
)

# Prepare request for Claude
prompt = "Explain Amazon Bedrock in simple terms"

body = json.dumps({
    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
    "max_tokens_to_sample": 500,
    "temperature": 0.7,
    "top_p": 0.9,
})

# Invoke model
response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=body
)

# Parse response
response_body = json.loads(response['body'].read())
print(response_body['completion'])
4. Using TypeScript/JavaScript
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({
  region: "ap-southeast-2"
});

async function invokeClaude(prompt: string) {
  const input = {
    modelId: "anthropic.claude-v2",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
      max_tokens_to_sample: 500,
      temperature: 0.7,
    }),
  };

  const command = new InvokeModelCommand(input);
  const response = await client.send(command);

  const responseBody = JSON.parse(
    new TextDecoder().decode(response.body)
  );

  return responseBody.completion;
}

// Usage
const result = await invokeClaude("What are the benefits of cloud computing?");
console.log(result);
Choosing the Right Model
Model Comparison
Claude (Anthropic):
- Best for: Complex reasoning, coding, analysis
- Strengths: Safety, instruction following, long context
- Context window: Up to 200K tokens
- Use when: Quality and safety are critical
Titan (Amazon):
- Best for: Text generation, embeddings
- Strengths: Cost-effective, reliable, AWS-optimised
- Context window: Up to 8K tokens
- Use when: Building embeddings, cost-sensitive workloads
Llama 2 (Meta):
- Best for: Open-source requirements, fine-tuning
- Strengths: Transparent, customisable
- Context window: 4K tokens
- Use when: Need full control and customisation
Command (Cohere):
- Best for: Conversational AI, search
- Strengths: Multilingual, RAG-optimised
- Context window: Up to 128K tokens
- Use when: Building search or chat applications
Selection Criteria
Consider the following; a short selection sketch follows the list:
- Task complexity
- Response quality requirements
- Context length needed
- Cost constraints
- Latency requirements
- Compliance needs
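One lightweight way to encode these criteria is a small routing table keyed by task profile. The task names and model choices below are illustrative assumptions, not AWS guidance:

# Map task profiles to model IDs; extend as your needs grow
MODEL_BY_TASK = {
    'complex_reasoning': 'anthropic.claude-v2',      # quality-critical work
    'simple_text': 'amazon.titan-text-express-v1',   # cost-sensitive tasks
    'embeddings': 'amazon.titan-embed-text-v1',      # vector search
}

def select_model(task: str, cost_sensitive: bool = False) -> str:
    """Pick a model ID for a task, preferring the cheaper option when cost matters."""
    if cost_sensitive and task != 'embeddings':
        return MODEL_BY_TASK['simple_text']
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK['simple_text'])

Keeping model choice in one place like this also helps with the hardcoding pitfall discussed later.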
Streaming Responses
For better user experience with long responses:
import boto3
import json

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'
)

body = json.dumps({
    "prompt": "\n\nHuman: Write a detailed explanation of quantum computing\n\nAssistant:",
    "max_tokens_to_sample": 2000,
    "temperature": 0.7,
})

response = bedrock.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# Process streaming response
stream = response['body']
for event in stream:
    chunk = event.get('chunk')
    if chunk:
        chunk_data = json.loads(chunk['bytes'])
        if 'completion' in chunk_data:
            print(chunk_data['completion'], end='', flush=True)
Error Handling
Implement robust error handling:
import boto3
import json
import time
from botocore.exceptions import ClientError

def invoke_bedrock_with_retry(prompt, max_retries=3):
    bedrock = boto3.client(
        service_name='bedrock-runtime',
        region_name='ap-southeast-2'
    )

    for attempt in range(max_retries):
        try:
            response = bedrock.invoke_model(
                modelId='anthropic.claude-v2',
                body=json.dumps({
                    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                    "max_tokens_to_sample": 500,
                })
            )
            return json.loads(response['body'].read())
        except ClientError as e:
            error_code = e.response['Error']['Code']
            if error_code == 'ThrottlingException':
                # Rate limit hit: back off exponentially, then retry
                wait_time = 2 ** attempt
                time.sleep(wait_time)
                continue
            elif error_code == 'ModelTimeoutException':
                # Model timeout: retry (consider shortening the input)
                print("Model timeout, retrying...")
                continue
            elif error_code == 'ValidationException':
                # Invalid input: don't retry
                raise ValueError(f"Invalid input: {e}") from e
            else:
                # Unknown error: re-raise
                raise

    raise Exception("Max retries exceeded")
Cost Optimisation
Pricing Model
Bedrock charges per token for both input and output. The rates below are indicative; check the AWS pricing page for current figures. A quick cost estimator follows.

Claude:
- Input: ~$0.01 per 1K tokens
- Output: ~$0.03 per 1K tokens

Titan:
- Input: ~$0.0003 per 1K tokens
- Output: ~$0.0004 per 1K tokens
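These per-token rates make back-of-envelope estimates straightforward. A minimal sketch, using the indicative rates above (treat them as placeholders, not current pricing):

# Indicative per-1K-token USD rates; verify against current AWS pricing
RATES_PER_1K = {
    'anthropic.claude-v2': {'input': 0.01, 'output': 0.03},
    'amazon.titan-text-express-v1': {'input': 0.0003, 'output': 0.0004},
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of a single invocation."""
    rates = RATES_PER_1K[model_id]
    return (input_tokens / 1000) * rates['input'] + (output_tokens / 1000) * rates['output']

# Example: 2,000 input tokens and 500 output tokens on Claude
print(f"${estimate_cost('anthropic.claude-v2', 2000, 500):.4f}")  # $0.0350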
Cost Reduction Strategies
1. Optimise Prompts:
# Inefficient - verbose prompt
prompt = """
Please analyse the following text and provide a comprehensive summary.
The summary should be detailed and include all key points.

Text: [long text here]
"""

# Efficient - concise prompt
prompt = "Summarise: [long text here]"
2. Cache System Prompts: For repeated interactions, define shared system instructions once and vary only the user portion of each prompt:

# Shared instructions defined once and reused on every call
SYSTEM_PROMPT = "You are a helpful AWS expert..."

user_query = "How do I secure S3 buckets?"  # Changes per request
prompt = f"\n\nHuman: {SYSTEM_PROMPT}\n\n{user_query}\n\nAssistant:"
3. Right-size Responses:
# Request only what you need
body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 200,  # Not 2000 if you only need a brief response
    "temperature": 0.7,
})
4. Use Appropriate Models:
- Titan for simple tasks
- Claude for complex reasoning
- Don’t use premium models for basic tasks
Security Best Practices
1. Data Privacy
Keep Data in Australia:
# Always specify Sydney region
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'  # Sydney
)
Encryption:
- Data encrypted in transit (TLS)
- Data encrypted at rest
- No data retention by model providers
- Your data not used for model training
2. Access Control
Use least privilege IAM policies:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": [
        "arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "ap-southeast-2"
        }
      }
    }
  ]
}
3. Input Validation
Sanitise inputs before sending to models:
import re

def sanitise_input(user_input: str) -> str:
    # Remove control characters that could hide injection attempts
    cleaned = re.sub(r'[\x00-\x1f\x7f-\x9f]', '', user_input)

    # Limit length
    max_length = 10000
    if len(cleaned) > max_length:
        cleaned = cleaned[:max_length]

    return cleaned

# Use sanitised input
safe_input = sanitise_input(user_input)
prompt = f"\n\nHuman: {safe_input}\n\nAssistant:"
4. Content Filtering
Implement guardrails for content:
import re

def check_content_safety(text: str) -> bool:
    """Check if content meets safety requirements"""
    # Check for sensitive data patterns
    sensitive_patterns = [
        r'\b\d{3}-\d{2}-\d{4}\b',        # US SSN-style pattern
        r'\b\d{16}\b',                    # Credit card pattern
        r'password|secret|api[_-]?key',   # Credentials
    ]

    for pattern in sensitive_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return False
    return True

# Validate before sending
if check_content_safety(user_input):
    response = invoke_bedrock(user_input)
else:
    raise ValueError("Input contains sensitive data")
Monitoring and Logging
CloudWatch Integration
import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch', region_name='ap-southeast-2')

def log_bedrock_invocation(model_id, tokens_used, latency_ms, success):
    """Log Bedrock usage metrics to CloudWatch"""
    cloudwatch.put_metric_data(
        Namespace='BedrockApp',
        MetricData=[
            {
                'MetricName': 'TokensUsed',
                'Value': tokens_used,
                'Unit': 'Count',
                'Timestamp': datetime.utcnow(),
                'Dimensions': [
                    {'Name': 'ModelId', 'Value': model_id}
                ]
            },
            {
                'MetricName': 'Latency',
                'Value': latency_ms,
                'Unit': 'Milliseconds',
                'Dimensions': [
                    {'Name': 'ModelId', 'Value': model_id}
                ]
            },
            {
                'MetricName': 'Invocations',
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [
                    {'Name': 'Success', 'Value': str(success)}
                ]
            }
        ]
    )
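In practice, you would wrap each invocation with timing and feed the results into this helper. A minimal sketch, assuming the invoke_bedrock_with_retry function from the error-handling section; the characters-per-token estimate is a rough heuristic, and exact counts are available from the x-amzn-bedrock-*-token-count response headers:

import time

def timed_invoke(prompt: str, model_id: str = 'anthropic.claude-v2'):
    """Invoke the model, then record latency and an approximate token count."""
    start = time.monotonic()
    try:
        result = invoke_bedrock_with_retry(prompt)
        latency_ms = (time.monotonic() - start) * 1000
        # Crude estimate (~4 characters per token); use the response
        # headers for exact figures
        tokens = (len(prompt) + len(result.get('completion', ''))) // 4
        log_bedrock_invocation(model_id, tokens, latency_ms, success=True)
        return result
    except Exception:
        log_bedrock_invocation(model_id, 0, (time.monotonic() - start) * 1000, success=False)
        raise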
Next Steps
Production Readiness
1. Implement Rate Limiting: Protect against runaway costs:
from datetime import datetime, timedelta
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests_per_minute=60):
        self.max_requests = max_requests_per_minute
        self.requests = defaultdict(list)

    def allow_request(self, user_id: str) -> bool:
        now = datetime.utcnow()
        minute_ago = now - timedelta(minutes=1)

        # Remove old requests
        self.requests[user_id] = [
            ts for ts in self.requests[user_id]
            if ts > minute_ago
        ]

        # Check limit
        if len(self.requests[user_id]) >= self.max_requests:
            return False

        self.requests[user_id].append(now)
        return True
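A typical integration checks the limiter before every call, so a rejected request costs nothing. A short sketch (the user_id handling and error type are illustrative):

limiter = RateLimiter(max_requests_per_minute=30)

def handle_request(user_id: str, prompt: str):
    # Reject up front rather than paying for the invocation
    if not limiter.allow_request(user_id):
        raise RuntimeError("Rate limit exceeded; try again shortly")
    return invoke_bedrock_with_retry(prompt)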
2. Build Prompt Templates: Standardise common interactions:
PROMPT_TEMPLATES = {
    'summarise': """Summarise the following text concisely:

{text}

Summary:""",
    'extract': """Extract {entity_type} from the following text:

{text}

{entity_type}:""",
    'analyse': """Analyse the following {content_type} and provide insights:

{content}

Analysis:"""
}

def build_prompt(template_name: str, **kwargs) -> str:
    template = PROMPT_TEMPLATES[template_name]
    return template.format(**kwargs)
3. Add Caching Layer: Cache common requests:
import hashlib

# Simple in-memory cache keyed by prompt hash; swap in Redis or
# DynamoDB if responses must be shared across processes
_response_cache = {}

def invoke_with_cache(prompt: str):
    # Create hash of prompt
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()

    # Check cache
    cached = _response_cache.get(prompt_hash)
    if cached is not None:
        return cached

    # Invoke model and cache the result
    response = invoke_bedrock(prompt)
    _response_cache[prompt_hash] = response
    return response
Common Pitfalls
1. Not Handling Rate Limits
Always implement exponential backoff and retry logic.
2. Ignoring Token Costs
Monitor token usage closely, especially with large context windows.
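Bedrock reports exact token counts for each invoke_model call via response headers, which makes per-request tracking straightforward. A sketch (header names as documented for the InvokeModel API; worth verifying for your SDK version):

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='ap-southeast-2')

response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({
        "prompt": "\n\nHuman: Hello\n\nAssistant:",
        "max_tokens_to_sample": 100,
    })
)

# Token counts are surfaced as HTTP headers on the response
headers = response['ResponseMetadata']['HTTPHeaders']
print("Input tokens:", headers.get('x-amzn-bedrock-input-token-count'))
print("Output tokens:", headers.get('x-amzn-bedrock-output-token-count'))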
3. Hardcoding Model IDs
Use configuration for model selection to enable easy switching:
import os

MODEL_CONFIG = {
    'default': os.getenv('BEDROCK_MODEL_ID', 'anthropic.claude-v2'),
    'fast': 'amazon.titan-text-express-v1',
    'quality': 'anthropic.claude-v2'
}
4. Insufficient Error Handling
Handle all error types appropriately, especially throttling.
5. Missing Input Validation
Always validate and sanitise user inputs.
Conclusion
Amazon Bedrock provides Australian businesses with secure, compliant access to leading foundation models. By following these best practices for security, cost optimisation, and error handling, you can build production-ready generative AI applications quickly.
CloudPoint specialises in implementing Amazon Bedrock solutions for Australian businesses and regulated industries. We can help you select the right models, implement security controls, and build cost-effective AI applications that meet local compliance requirements.
Contact us for a Bedrock implementation consultation and accelerate your AI journey.
Ready to Get Started with Amazon Bedrock?
CloudPoint helps Australian businesses implement Amazon Bedrock solutions that solve real business problems—securely and practically. Get in touch to explore AI opportunities.