AWS AI Services

Amazon Bedrock: Getting Started with Foundation Models on AWS

Complete guide to getting started with Amazon Bedrock for Australian businesses, covering model selection, API integration, security best practices, and real-world implementation strategies.

CloudPoint Team

Amazon Bedrock makes foundation models from leading AI providers accessible through a fully managed AWS service. For Australian businesses looking to leverage generative AI without managing infrastructure, Bedrock offers a secure, scalable path to building AI-powered applications.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from top AI providers through a single API. It eliminates the need to manage infrastructure while offering enterprise-grade security, privacy, and compliance.

Key Capabilities

Model Access:

  • Claude (Anthropic) - Advanced reasoning and coding
  • Titan (Amazon) - Text and embeddings
  • Llama 2 (Meta) - Open-source models
  • Jurassic-2 (AI21 Labs) - Text generation
  • Command (Cohere) - Conversational AI
  • Stable Diffusion (Stability AI) - Image generation

Core Features:

  • Single API for multiple models
  • No infrastructure management
  • Pay-per-use pricing
  • Built-in security and compliance
  • Private model customisation
  • Knowledge bases and agents

Why Choose Bedrock?

For Australian Businesses

Data Sovereignty: Keep data within Australian AWS regions such as ap-southeast-2 (Sydney).

Compliance Ready:

  • IRAP assessed
  • Aligned with industry regulations
  • Privacy Act compliant
  • SOC 2 certified

Cost Efficient: Pay only for what you use, with no upfront commitments.

Quick Implementation: Go from idea to production in weeks, not months.

Use Cases

Customer Service: Intelligent chatbots and support automation

Content Generation: Marketing copy, product descriptions, documentation

Document Analysis: Contract review, data extraction, summarisation

Code Generation: Development assistance, code review, documentation

Data Analysis: Insights extraction, report generation, trend analysis

Getting Started

1. Enable Bedrock Access

Request model access through the AWS Console:

# List available foundation models
aws bedrock list-foundation-models \
  --region ap-southeast-2

# Request model access (via Console)
# Navigate to Bedrock → Model access → Manage model access

Model Access Requirements:

  • Some models available immediately
  • Others require access request
  • Approval typically within hours
  • Region-specific availability
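
You can enumerate what's offered in your region with boto3 (a quick sketch; note this lists the models Bedrock provides, while your per-model access status is managed in the console):

import boto3

# Control-plane client ('bedrock', not 'bedrock-runtime')
bedrock = boto3.client('bedrock', region_name='ap-southeast-2')

# Print every foundation model available in the Sydney region
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(f"{model['providerName']}: {model['modelId']}")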

2. Set Up IAM Permissions

Create IAM policy for Bedrock access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:ap-southeast-2::foundation-model/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    }
  ]
}
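
To apply this, one hedged approach with boto3 (the file name, policy name, and role name below are illustrative, not from the original guide):

import boto3

iam = boto3.client('iam')

# Load the policy document shown above (saved as bedrock-policy.json)
with open('bedrock-policy.json') as f:
    policy_document = f.read()

# Create a managed policy and attach it to the application's role
policy = iam.create_policy(
    PolicyName='BedrockInvokeAccess',
    PolicyDocument=policy_document
)
iam.attach_role_policy(
    RoleName='bedrock-app-role',  # illustrative role name
    PolicyArn=policy['Policy']['Arn']
)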

3. First API Call

Using AWS SDK for Python (boto3):

import boto3
import json

# Initialize Bedrock client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'
)

# Prepare request for Claude
prompt = "Explain Amazon Bedrock in simple terms"

body = json.dumps({
    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
    "max_tokens_to_sample": 500,
    "temperature": 0.7,
    "top_p": 0.9,
})

# Invoke model
response = bedrock.invoke_model(
    modelId='anthropic.claude-v2',
    body=body
)

# Parse response
response_body = json.loads(response['body'].read())
print(response_body['completion'])

4. Using TypeScript/JavaScript

import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({
  region: "ap-southeast-2"
});

async function invokeClaude(prompt: string) {
  const input = {
    modelId: "anthropic.claude-v2",
    contentType: "application/json",
    accept: "application/json",
    body: JSON.stringify({
      prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
      max_tokens_to_sample: 500,
      temperature: 0.7,
    }),
  };

  const command = new InvokeModelCommand(input);
  const response = await client.send(command);

  const responseBody = JSON.parse(
    new TextDecoder().decode(response.body)
  );

  return responseBody.completion;
}

// Usage
const result = await invokeClaude("What are the benefits of cloud computing?");
console.log(result);

Choosing the Right Model

Model Comparison

Claude (Anthropic):

  • Best for: Complex reasoning, coding, analysis
  • Strengths: Safety, instruction following, long context
  • Context window: Up to 200K tokens
  • Use when: Quality and safety are critical

Titan (Amazon):

  • Best for: Text generation, embeddings
  • Strengths: Cost-effective, reliable, AWS-optimised
  • Context window: Up to 8K tokens
  • Use when: Building embeddings, cost-sensitive workloads

Llama 2 (Meta):

  • Best for: Open-source requirements, fine-tuning
  • Strengths: Transparent, customisable
  • Context window: 4K tokens
  • Use when: Need full control and customisation

Command (Cohere):

  • Best for: Conversational AI, search
  • Strengths: Multilingual, RAG-optimised
  • Context window: Up to 128K tokens
  • Use when: Building search or chat applications
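
As a concrete example of a non-chat workload, here is a minimal sketch of generating embeddings with Titan (assuming access to the amazon.titan-embed-text-v1 model and its documented inputText/embedding schema):

import boto3
import json

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'
)

# Titan embeddings take a single inputText field
body = json.dumps({"inputText": "Amazon Bedrock overview"})

response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v1',
    body=body
)

# The response contains a dense vector under 'embedding'
result = json.loads(response['body'].read())
print(len(result['embedding']))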

Selection Criteria

Consider:

  1. Task complexity
  2. Response quality requirements
  3. Context length needed
  4. Cost constraints
  5. Latency requirements
  6. Compliance needs
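
In practice, these criteria often collapse into a simple routing table. A hedged sketch (task names and model choices here are illustrative, not recommendations):

# Map task profiles to model IDs (illustrative defaults)
MODEL_ROUTES = {
    'complex_reasoning': 'anthropic.claude-v2',            # quality-critical
    'simple_generation': 'amazon.titan-text-express-v1',   # cost-sensitive
    'embeddings': 'amazon.titan-embed-text-v1',
}

def select_model(task: str) -> str:
    # Fall back to the quality model for unknown tasks
    return MODEL_ROUTES.get(task, 'anthropic.claude-v2')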

Streaming Responses

For better user experience with long responses:

import boto3
import json

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'
)

body = json.dumps({
    "prompt": "\n\nHuman: Write a detailed explanation of quantum computing\n\nAssistant:",
    "max_tokens_to_sample": 2000,
    "temperature": 0.7,
})

response = bedrock.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# Process streaming response
stream = response['body']
for event in stream:
    chunk = event.get('chunk')
    if chunk:
        chunk_data = json.loads(chunk['bytes'])
        if 'completion' in chunk_data:
            print(chunk_data['completion'], end='', flush=True)

Error Handling

Implement robust error handling:

import boto3
import json
from botocore.exceptions import ClientError
import time

def invoke_bedrock_with_retry(prompt, max_retries=3):
    bedrock = boto3.client(
        service_name='bedrock-runtime',
        region_name='ap-southeast-2'
    )

    for attempt in range(max_retries):
        try:
            response = bedrock.invoke_model(
                modelId='anthropic.claude-v2',
                body=json.dumps({
                    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                    "max_tokens_to_sample": 500,
                })
            )
            return json.loads(response['body'].read())

        except ClientError as e:
            error_code = e.response['Error']['Code']

            if error_code == 'ThrottlingException':
                # Rate limit hit, wait and retry
                wait_time = 2 ** attempt
                time.sleep(wait_time)
                continue

            elif error_code == 'ModelTimeoutException':
                # Model timed out; retry the request
                print("Model timeout, retrying...")
                continue

            elif error_code == 'ValidationException':
                # Invalid input, don't retry
                raise ValueError(f"Invalid input: {e}")

            else:
                # Unknown error
                raise

    raise Exception("Max retries exceeded")

Cost Optimisation

Pricing Model

Bedrock charges per token (input and output):

Claude:

  • Input: ~$0.01 per 1K tokens
  • Output: ~$0.03 per 1K tokens

Titan:

  • Input: ~$0.0003 per 1K tokens
  • Output: ~$0.0004 per 1K tokens
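
Using the indicative rates above, a rough per-call estimate looks like this (a sketch only; confirm current rates on the Bedrock pricing page before relying on them):

# Indicative per-1K-token rates from the comparison above
RATES = {
    'anthropic.claude-v2': {'input': 0.01, 'output': 0.03},
    'amazon.titan-text-express-v1': {'input': 0.0003, 'output': 0.0004},
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    rate = RATES[model_id]
    return (input_tokens / 1000) * rate['input'] \
         + (output_tokens / 1000) * rate['output']

# Example: 2,000 input tokens and 500 output tokens on Claude
print(f"${estimate_cost('anthropic.claude-v2', 2000, 500):.4f}")  # ~$0.0350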

Cost Reduction Strategies

1. Optimise Prompts:

# Inefficient - verbose prompt
prompt = """
Please analyse the following text and provide a comprehensive summary.
The summary should be detailed and include all key points.
Text: [long text here]
"""

# Efficient - concise prompt
prompt = "Summarise: [long text here]"

2. Cache System Prompts: For repeated interactions, keep the system instructions constant so they can be reused across calls rather than rebuilt each time:

# Reuse the same system instructions for every request
system_prompt = "You are a helpful AWS expert..."  # Constant across requests
user_query = "How do I secure S3 buckets?"  # Changes each request

3. Right-size Responses:

# Request only what you need
body = json.dumps({
    "prompt": prompt,
    "max_tokens_to_sample": 200,  # Not 2000 if you only need brief response
    "temperature": 0.7,
})

4. Use Appropriate Models:

  • Titan for simple tasks
  • Claude for complex reasoning
  • Don’t use premium models for basic tasks

Security Best Practices

1. Data Privacy

Keep Data in Australia:

# Always specify Sydney region
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='ap-southeast-2'  # Sydney
)

Encryption:

  • Data encrypted in transit (TLS)
  • Data encrypted at rest
  • No data retention by model providers
  • Your data not used for model training

2. Access Control

Use least privilege IAM policies:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": [
        "arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "ap-southeast-2"
        }
      }
    }
  ]
}

3. Input Validation

Sanitise inputs before sending to models:

import re

def sanitise_input(user_input: str) -> str:
    # Remove potential injection attempts
    cleaned = re.sub(r'[\x00-\x1f\x7f-\x9f]', '', user_input)

    # Limit length
    max_length = 10000
    if len(cleaned) > max_length:
        cleaned = cleaned[:max_length]

    return cleaned

# Use sanitised input
safe_input = sanitise_input(user_input)
prompt = f"\n\nHuman: {safe_input}\n\nAssistant:"

4. Content Filtering

Implement guardrails for content:

import re

def check_content_safety(text: str) -> bool:
    """Check if content meets safety requirements"""

    # Check for sensitive data patterns
    sensitive_patterns = [
        r'\b\d{3}-\d{2}-\d{4}\b',  # SSN pattern
        r'\b\d{16}\b',  # Credit card pattern
        r'password|secret|api[_-]?key',  # Credentials
    ]

    for pattern in sensitive_patterns:
        if re.search(pattern, text, re.IGNORECASE):
            return False

    return True

# Validate before sending
if check_content_safety(user_input):
    response = invoke_bedrock(user_input)
else:
    raise ValueError("Input contains sensitive data")

Monitoring and Logging

CloudWatch Integration

import boto3
from datetime import datetime

cloudwatch = boto3.client('cloudwatch', region_name='ap-southeast-2')

def log_bedrock_invocation(model_id, tokens_used, latency_ms, success):
    """Log Bedrock usage metrics to CloudWatch"""

    cloudwatch.put_metric_data(
        Namespace='BedrockApp',
        MetricData=[
            {
                'MetricName': 'TokensUsed',
                'Value': tokens_used,
                'Unit': 'Count',
                'Timestamp': datetime.utcnow(),
                'Dimensions': [
                    {'Name': 'ModelId', 'Value': model_id}
                ]
            },
            {
                'MetricName': 'Latency',
                'Value': latency_ms,
                'Unit': 'Milliseconds',
                'Dimensions': [
                    {'Name': 'ModelId', 'Value': model_id}
                ]
            },
            {
                'MetricName': 'Invocations',
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [
                    {'Name': 'Success', 'Value': str(success)}
                ]
            }
        ]
    )

Next Steps

Production Readiness

1. Implement Rate Limiting: Protect against runaway costs:

from datetime import datetime, timedelta
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests_per_minute=60):
        self.max_requests = max_requests_per_minute
        self.requests = defaultdict(list)

    def allow_request(self, user_id: str) -> bool:
        now = datetime.utcnow()
        minute_ago = now - timedelta(minutes=1)

        # Remove old requests
        self.requests[user_id] = [
            ts for ts in self.requests[user_id]
            if ts > minute_ago
        ]

        # Check limit
        if len(self.requests[user_id]) >= self.max_requests:
            return False

        self.requests[user_id].append(now)
        return True
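
A minimal usage sketch, wiring the limiter in front of the retry helper from the Error Handling section:

limiter = RateLimiter(max_requests_per_minute=60)

def handle_request(user_id: str, prompt: str):
    # Reject the call before it incurs any token cost
    if not limiter.allow_request(user_id):
        raise RuntimeError("Rate limit exceeded; try again shortly")
    return invoke_bedrock_with_retry(prompt)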

2. Build Prompt Templates: Standardise common interactions:

PROMPT_TEMPLATES = {
    'summarise': """Summarise the following text concisely:

{text}

Summary:""",

    'extract': """Extract {entity_type} from the following text:

{text}

{entity_type}:""",

    'analyse': """Analyse the following {content_type} and provide insights:

{content}

Analysis:"""
}

def build_prompt(template_name: str, **kwargs) -> str:
    template = PROMPT_TEMPLATES[template_name]
    return template.format(**kwargs)
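
For example (document_text is assumed to hold the source text):

prompt = build_prompt('extract', entity_type='company names', text=document_text)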

3. Add Caching Layer: Cache common requests:

import hashlib

# Simple in-memory cache keyed by a hash of the prompt
_response_cache = {}

def invoke_with_cache(prompt: str):
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()

    # Return the cached response if we've seen this prompt before
    if prompt_hash in _response_cache:
        return _response_cache[prompt_hash]

    # Otherwise invoke the model and store the result
    response = invoke_bedrock(prompt)
    _response_cache[prompt_hash] = response
    return response

Common Pitfalls

1. Not Handling Rate Limits

Always implement exponential backoff and retry logic.

2. Ignoring Token Costs

Monitor token usage closely, especially with large context windows.
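
Bedrock surfaces per-call token counts as HTTP headers on the InvokeModel response, which you can log (a sketch reusing the bedrock client and body from the earlier examples; header names as documented for the runtime API):

response = bedrock.invoke_model(modelId='anthropic.claude-v2', body=body)

# Token counts come back as response headers
headers = response['ResponseMetadata']['HTTPHeaders']
input_tokens = int(headers.get('x-amzn-bedrock-input-token-count', 0))
output_tokens = int(headers.get('x-amzn-bedrock-output-token-count', 0))
print(f"Tokens used: {input_tokens} in, {output_tokens} out")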

3. Hardcoding Model IDs

Use configuration for model selection to enable easy switching:

import os

MODEL_CONFIG = {
    'default': os.getenv('BEDROCK_MODEL_ID', 'anthropic.claude-v2'),
    'fast': 'amazon.titan-text-express-v1',
    'quality': 'anthropic.claude-v2'
}

4. Insufficient Error Handling

Handle all error types appropriately, especially throttling.

5. Missing Input Validation

Always validate and sanitise user inputs.

Conclusion

Amazon Bedrock provides Australian businesses with secure, compliant access to leading foundation models. By following these best practices for security, cost optimisation, and error handling, you can build production-ready generative AI applications quickly.

CloudPoint specialises in implementing Amazon Bedrock solutions for Australian businesses and regulated industries. We can help you select the right models, implement security controls, and build cost-effective AI applications that meet local compliance requirements.

Contact us for a Bedrock implementation consultation and accelerate your AI journey.


Ready to Get Started with Amazon Bedrock?

CloudPoint helps Australian businesses implement Amazon Bedrock solutions that solve real business problems—securely and practically. Get in touch to explore AI opportunities.

Learn more about our AI Services →