AWS AI Services
Building RAG Applications with Amazon Bedrock and Knowledge Bases
Complete guide to building Retrieval-Augmented Generation applications using Amazon Bedrock Knowledge Bases for Australian businesses, covering architecture, implementation, and optimisation strategies.
CloudPoint Team
Retrieval-Augmented Generation (RAG) combines the power of foundation models with your organisation’s proprietary data. Amazon Bedrock Knowledge Bases makes implementing RAG straightforward, enabling Australian businesses to build AI applications that answer questions using their own documents, policies, and knowledge.
What is RAG?
RAG enhances foundation model responses by retrieving relevant information from your data sources before generating answers. Instead of relying solely on the model’s training data, RAG applications can access current, organisation-specific information.
Why RAG Matters
- Accuracy: Responses grounded in your actual data, not hallucinations
- Currency: Access to the latest information, not just the model's training cutoff
- Relevance: Answers specific to your business context
- Trust: Responses include source citations
- Privacy: Sensitive data stays within your AWS environment
Architecture Overview
Components
Data Sources:
- S3 buckets with documents
- Confluence spaces
- SharePoint sites
- Salesforce knowledge bases
- Web pages
Vector Database:
- Amazon OpenSearch Serverless
- Pinecone
- Redis Enterprise Cloud
Embedding Model:
- Amazon Titan Embeddings
- Cohere Embed
Foundation Model:
- Claude (Anthropic)
- Titan (Amazon)
- Other Bedrock models
Data Flow
1. Ingestion: Documents are chunked and embedded
2. Storage: Vectors are stored in the vector database
3. Query: The user's question is embedded
4. Retrieval: Similar vectors (and their source chunks) are retrieved
5. Augmentation: Retrieved content is added to the prompt
6. Generation: The foundation model generates a grounded response
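To make the flow concrete, here is a minimal sketch of steps 3–6 using the lower-level `retrieve` API and a direct model invocation; the knowledge base ID and prompt wording are illustrative. The `retrieve_and_generate` API, used throughout the rest of this guide, wraps this loop in a single call.

```python
import json
import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='ap-southeast-2')
bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')

question = "What is our remote work policy?"  # example question

# Steps 3-4: the service embeds the query and retrieves similar chunks
retrieval = agent_runtime.retrieve(
    knowledgeBaseId='KB_ID_HERE',  # placeholder
    retrievalQuery={'text': question},
    retrievalConfiguration={'vectorSearchConfiguration': {'numberOfResults': 5}}
)
context = "\n\n".join(r['content']['text'] for r in retrieval['retrievalResults'])

# Steps 5-6: augment the prompt with retrieved content, then generate
prompt = f"\n\nHuman: Using only this context:\n{context}\n\nAnswer: {question}\n\nAssistant:"
response = bedrock_runtime.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({'prompt': prompt, 'max_tokens_to_sample': 500})
)
print(json.loads(response['body'].read())['completion'])
```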
Setting Up Knowledge Bases
1. Prepare Data Source
Create an S3 bucket and upload your documents:

```bash
# Create bucket
aws s3 mb s3://my-knowledge-base-docs-ap-southeast-2 \
  --region ap-southeast-2

# Upload documents
aws s3 sync ./documents/ s3://my-knowledge-base-docs-ap-southeast-2/

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-knowledge-base-docs-ap-southeast-2 \
  --versioning-configuration Status=Enabled
```
Supported Formats:
- TXT
- MD
- HTML
- PDF
- DOC/DOCX
- CSV
- XLS/XLSX
2. Create OpenSearch Serverless Collection
```python
import boto3

aoss = boto3.client('opensearchserverless', region_name='ap-southeast-2')

# Create collection
response = aoss.create_collection(
    name='knowledge-base-vectors',
    type='VECTORSEARCH',
    description='Vector storage for Bedrock Knowledge Base'
)

collection_id = response['createCollectionDetail']['id']
print(f"Collection created: {collection_id}")
```
3. Create Knowledge Base
Using AWS Console or Infrastructure as Code:
```yaml
# CloudFormation template (excerpt)
Resources:
  KnowledgeBase:
    Type: AWS::Bedrock::KnowledgeBase
    Properties:
      Name: CompanyKnowledgeBase
      Description: Internal company documentation and policies
      RoleArn: !GetAtt KnowledgeBaseRole.Arn
      KnowledgeBaseConfiguration:
        Type: VECTOR
        VectorKnowledgeBaseConfiguration:
          EmbeddingModelArn: !Sub 'arn:aws:bedrock:${AWS::Region}::foundation-model/amazon.titan-embed-text-v1'
      StorageConfiguration:
        Type: OPENSEARCH_SERVERLESS
        OpensearchServerlessConfiguration:
          CollectionArn: !GetAtt VectorCollection.Arn
          VectorIndexName: bedrock-knowledge-base-index
          FieldMapping:
            VectorField: bedrock-knowledge-base-vector
            TextField: AMAZON_BEDROCK_TEXT_CHUNK
            MetadataField: AMAZON_BEDROCK_METADATA

  DataSource:
    Type: AWS::Bedrock::DataSource
    Properties:
      Name: S3DocumentsSource
      KnowledgeBaseId: !Ref KnowledgeBase
      DataSourceConfiguration:
        Type: S3
        S3Configuration:
          BucketArn: !GetAtt DocumentsBucket.Arn
      VectorIngestionConfiguration:
        ChunkingConfiguration:
          ChunkingStrategy: FIXED_SIZE
          FixedSizeChunkingConfiguration:
            MaxTokens: 300
            OverlapPercentage: 20
```

Note that `KnowledgeBaseRole`, `VectorCollection`, and `DocumentsBucket` are referenced here but must be defined elsewhere in the same template.
4. Sync Data
```python
import time

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='ap-southeast-2')

# Start ingestion job
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId='KB_ID_HERE',
    dataSourceId='DS_ID_HERE'
)

job_id = response['ingestionJob']['ingestionJobId']
print(f"Ingestion job started: {job_id}")

# Monitor progress
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId='KB_ID_HERE',
        dataSourceId='DS_ID_HERE',
        ingestionJobId=job_id
    )
    state = status['ingestionJob']['status']
    print(f"Status: {state}")
    if state in ['COMPLETE', 'FAILED']:
        break
    time.sleep(10)
```
Querying Knowledge Bases
Basic Query
```python
import boto3

bedrock_agent_runtime = boto3.client(
    'bedrock-agent-runtime',
    region_name='ap-southeast-2'
)

def query_knowledge_base(query: str, knowledge_base_id: str):
    """Query knowledge base and return response"""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2'
            }
        }
    )
    return {
        'answer': response['output']['text'],
        'citations': response.get('citations', [])
    }

# Usage
result = query_knowledge_base(
    query="What is our remote work policy?",
    knowledge_base_id="KB123456"
)

print("Answer:", result['answer'])
print("\nSources:")
for citation in result['citations']:
    for reference in citation.get('retrievedReferences', []):
        print(f"- {reference['location']['s3Location']['uri']}")
```
Advanced Query with Filters
```python
def query_with_filters(query: str, knowledge_base_id: str, metadata_filters: dict):
    """Query with metadata filtering"""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5,
                        'overrideSearchType': 'HYBRID',  # Vector + keyword search
                        'filter': metadata_filters
                    }
                }
            }
        }
    )
    return response['output']['text']

# Example: Filter by department
filters = {
    'equals': {
        'key': 'department',
        'value': 'engineering'
    }
}

answer = query_with_filters(
    query="What are our coding standards?",
    knowledge_base_id="KB123456",
    metadata_filters=filters
)
```
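Filters can also be combined: the Bedrock retrieval filter syntax supports boolean combinators such as `andAll` and `orAll`. A sketch, where the `last_reviewed` attribute is illustrative and assumed to be stored as an ISO-formatted string so lexicographic comparison matches date order:

```python
# Illustrative compound filter: engineering documents reviewed on or after
# 1 January 2025 (assumes a 'last_reviewed' metadata attribute exists).
compound_filters = {
    'andAll': [
        {'equals': {'key': 'department', 'value': 'engineering'}},
        {'greaterThanOrEquals': {'key': 'last_reviewed', 'value': '2025-01-01'}}
    ]
}

answer = query_with_filters(
    query="What are our coding standards?",
    knowledge_base_id="KB123456",
    metadata_filters=compound_filters
)
```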
Optimising Chunking Strategy
Fixed-Size Chunking
Best for consistent document types:
```python
chunking_config = {
    'chunkingStrategy': 'FIXED_SIZE',
    'fixedSizeChunkingConfiguration': {
        'maxTokens': 300,        # Chunk size
        'overlapPercentage': 20  # Overlap between chunks
    }
}
```
Parameters:
- maxTokens: 200-500 suits most use cases
- overlapPercentage: 10-20% preserves context across chunk boundaries; at 300 tokens and 20%, consecutive chunks share roughly 60 tokens
Semantic Chunking
Better for varied content:
```python
chunking_config = {
    'chunkingStrategy': 'SEMANTIC',
    'semanticChunkingConfiguration': {
        'maxTokens': 300,
        'bufferSize': 0,  # Surrounding sentences considered when evaluating breakpoints
        'breakpointPercentileThreshold': 95
    }
}
```
Hierarchical Chunking
For structured documents:
```python
chunking_config = {
    'chunkingStrategy': 'HIERARCHICAL',
    'hierarchicalChunkingConfiguration': {
        'levelConfigurations': [
            {'maxTokens': 1500},  # Parent chunks (returned as context)
            {'maxTokens': 300}    # Child chunks (matched during search)
        ],
        'overlapTokens': 60
    }
}
```
Custom Chunking
For specialised needs:

```python
import boto3
import tiktoken

s3 = boto3.client('s3')

def custom_chunk_document(text: str, max_tokens: int = 300, overlap: int = 50):
    """Custom chunking with exact token counting.

    Note: tiktoken's cl100k_base is an approximation; Titan uses its own tokeniser.
    """
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + max_tokens
        chunk_tokens = tokens[start:end]
        chunks.append(encoding.decode(chunk_tokens))
        start = end - overlap  # Step back so chunks overlap
    return chunks

# Process a document (document_text loaded elsewhere)
chunks = custom_chunk_document(document_text)

# Upload chunks to S3 with metadata
for i, chunk in enumerate(chunks):
    s3.put_object(
        Bucket='knowledge-base-bucket',
        Key=f'documents/doc1/chunk_{i}.txt',
        Body=chunk,
        Metadata={
            'chunk_index': str(i),
            'source_document': 'policy.pdf'
        }
    )
```
Adding Metadata
Metadata improves retrieval accuracy:
```python
import json

import boto3

def upload_with_metadata(file_path: str, bucket: str, metadata: dict):
    """Upload document with metadata"""
    s3 = boto3.client('s3')

    # Upload document
    s3.upload_file(
        file_path,
        bucket,
        file_path,
        ExtraArgs={'Metadata': metadata}
    )

    # Create the .metadata.json sidecar file Bedrock looks for
    metadata_file = f"{file_path}.metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump({
            'metadataAttributes': metadata
        }, f)

    # Upload the metadata file alongside the document
    s3.upload_file(
        metadata_file,
        bucket,
        metadata_file
    )

# Usage
upload_with_metadata(
    'policies/remote-work.pdf',
    'knowledge-base-bucket',
    {
        'department': 'HR',
        'category': 'policy',
        'last_reviewed': '2025-01-15',
        'applies_to': 'all_employees'
    }
)
```
Building a RAG Application
Complete Python Application
```python
import boto3
import streamlit as st
from typing import Dict, List, Optional

class RAGApplication:
    def __init__(self, knowledge_base_id: str, region: str = 'ap-southeast-2'):
        self.kb_id = knowledge_base_id
        self.runtime = boto3.client(
            'bedrock-agent-runtime',
            region_name=region
        )

    def query(
        self,
        question: str,
        num_results: int = 5,
        filters: Optional[Dict] = None
    ) -> Dict:
        """Query knowledge base with optional filters"""
        config = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': self.kb_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': num_results,
                        'overrideSearchType': 'HYBRID'
                    }
                }
            }
        }

        if filters:
            config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = filters

        response = self.runtime.retrieve_and_generate(
            input={'text': question},
            retrieveAndGenerateConfiguration=config
        )

        return {
            'answer': response['output']['text'],
            'sources': self._extract_sources(response)
        }

    def _extract_sources(self, response: Dict) -> List[Dict]:
        """Extract source documents from response"""
        sources = []
        for citation in response.get('citations', []):
            for ref in citation.get('retrievedReferences', []):
                location = ref.get('location', {})
                sources.append({
                    'uri': location.get('s3Location', {}).get('uri', ''),
                    'text': ref.get('content', {}).get('text', '')[:200]
                })
        return sources

# Streamlit UI
st.title("Company Knowledge Base")

app = RAGApplication(knowledge_base_id=st.secrets['KB_ID'])

question = st.text_input("Ask a question:")

if question:
    with st.spinner("Searching knowledge base..."):
        result = app.query(question)

    st.subheader("Answer")
    st.write(result['answer'])

    st.subheader("Sources")
    for source in result['sources']:
        with st.expander(source['uri']):
            st.write(source['text'])
```
TypeScript Implementation
```typescript
import { BedrockAgentRuntimeClient, RetrieveAndGenerateCommand } from "@aws-sdk/client-bedrock-agent-runtime";

interface RAGQuery {
  question: string;
  knowledgeBaseId: string;
  numResults?: number;
}

interface RAGResponse {
  answer: string;
  sources: Array<{
    uri: string;
    excerpt: string;
  }>;
}

class RAGService {
  private client: BedrockAgentRuntimeClient;

  constructor(region: string = 'ap-southeast-2') {
    this.client = new BedrockAgentRuntimeClient({ region });
  }

  async query({ question, knowledgeBaseId, numResults = 5 }: RAGQuery): Promise<RAGResponse> {
    const command = new RetrieveAndGenerateCommand({
      input: { text: question },
      retrieveAndGenerateConfiguration: {
        type: 'KNOWLEDGE_BASE',
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn: 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
          retrievalConfiguration: {
            vectorSearchConfiguration: {
              numberOfResults: numResults,
              overrideSearchType: 'HYBRID'
            }
          }
        }
      }
    });

    const response = await this.client.send(command);

    return {
      answer: response.output?.text || '',
      sources: this.extractSources(response)
    };
  }

  private extractSources(response: any): Array<{ uri: string; excerpt: string }> {
    const sources: Array<{ uri: string; excerpt: string }> = [];
    for (const citation of response.citations || []) {
      for (const ref of citation.retrievedReferences || []) {
        sources.push({
          uri: ref.location?.s3Location?.uri || '',
          excerpt: ref.content?.text?.substring(0, 200) || ''
        });
      }
    }
    return sources;
  }
}

// Usage in Express API
import express from 'express';

const app = express();
app.use(express.json()); // Required so req.body is populated

const ragService = new RAGService();

app.post('/api/query', async (req, res) => {
  const { question } = req.body;

  try {
    const result = await ragService.query({
      question,
      knowledgeBaseId: process.env.KB_ID!
    });
    res.json(result);
  } catch (error) {
    console.error('RAG query error:', error);
    res.status(500).json({ error: 'Query failed' });
  }
});
```
Cost Optimisation
Embedding Costs
Titan Embeddings: ~$0.0001 per 1K tokens
Strategies:
- Deduplicate documents before embedding (see the sketch below)
- Use appropriate chunk sizes
- Implement caching for common queries
- Archive old or unused documents
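As a minimal sketch of the first point, a content-hash pass can drop byte-identical duplicates before they are synced to S3 and embedded; the `./documents` directory is illustrative:

```python
import hashlib
from pathlib import Path

# Illustrative de-duplication pass: keep only the first file seen for each
# content hash, so identical documents are embedded (and billed) once.
seen_hashes: set[str] = set()
unique_files: list[Path] = []

for path in sorted(Path('./documents').rglob('*')):
    if not path.is_file():
        continue
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest not in seen_hashes:
        seen_hashes.add(digest)
        unique_files.append(path)

print(f"{len(unique_files)} unique documents to upload")
```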
Query Costs
Each query incurs:
- Embedding cost (query)
- Vector search cost
- Model inference cost
Optimisation:
```python
import hashlib

# Simple in-memory cache keyed on a normalised query hash. A bare
# functools.lru_cache will not work here because the question text must be
# normalised before lookup.
_query_cache: dict = {}

def query_with_cache(question: str, kb_id: str):
    # Normalise and hash the question to form the cache key
    query_hash = hashlib.md5(question.lower().strip().encode()).hexdigest()

    # Check cache
    if query_hash in _query_cache:
        return _query_cache[query_hash]

    # Execute query and cache the result
    result = query_knowledge_base(question, kb_id)
    _query_cache[query_hash] = result
    return result
```
Monitoring and Evaluation
Track Performance
```python
import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client('cloudwatch', region_name='ap-southeast-2')

def log_rag_metrics(latency_ms: int, num_sources: int):
    """Log RAG query metrics to CloudWatch"""
    cloudwatch.put_metric_data(
        Namespace='RAGApplication',
        MetricData=[
            {
                'MetricName': 'QueryLatency',
                'Value': latency_ms,
                'Unit': 'Milliseconds',
                'Timestamp': datetime.now(timezone.utc)
            },
            {
                'MetricName': 'SourcesRetrieved',
                'Value': num_sources,
                'Unit': 'Count'
            }
        ]
    )
```
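A usage sketch: time each query with `time.perf_counter` and emit the metrics above. The question and knowledge base ID are placeholders, and `query_knowledge_base` is the function defined earlier in this guide.

```python
import time

# Illustrative: measure end-to-end latency of one query and record it.
start = time.perf_counter()
result = query_knowledge_base("What is our leave policy?", "KB123456")
latency_ms = int((time.perf_counter() - start) * 1000)

log_rag_metrics(latency_ms, num_sources=len(result['citations']))
```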
Evaluate Answer Quality
```python
from typing import List

def evaluate_answer_relevance(question: str, answer: str, sources: List[str]):
    """Use an LLM to evaluate answer quality"""
    prompt = f"""Evaluate the following answer on a scale of 1-5:

Question: {question}
Answer: {answer}
Sources used: {len(sources)}

Rate the answer on:
1. Relevance to question
2. Accuracy based on sources
3. Completeness

Provide scores and brief explanation."""

    # Use Bedrock to evaluate (invoke_bedrock is sketched below)
    evaluation = invoke_bedrock(prompt)
    return evaluation
```
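`invoke_bedrock` is not part of any SDK; here is a minimal sketch using the `bedrock-runtime` client and the Claude v2 prompt format used elsewhere in this guide:

```python
import json

import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')

def invoke_bedrock(prompt: str, max_tokens: int = 500) -> str:
    """Minimal helper: send a prompt to Claude v2 and return its completion."""
    response = bedrock_runtime.invoke_model(
        modelId='anthropic.claude-v2',
        body=json.dumps({
            'prompt': f"\n\nHuman: {prompt}\n\nAssistant:",
            'max_tokens_to_sample': max_tokens
        })
    )
    return json.loads(response['body'].read())['completion']
```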
Best Practices
Document Preparation
- Clean PDFs: Remove repeated headers/footers and fix OCR errors (a helper sketch follows this list)
- Structure: Use clear headings and sections
- Format: Convert to searchable text where possible
- Update: Keep documents current and remove outdated content
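As a minimal sketch of the first point: once text has been extracted page by page (with whatever PDF tooling you use), lines that repeat on most pages are almost always headers or footers and can be dropped before ingestion.

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.8) -> str:
    """Drop lines that appear on at least `threshold` of pages.

    Illustrative helper: assumes `pages` holds already-extracted text,
    one string per page.
    """
    # Count how many pages each distinct line appears on
    line_counts = Counter(
        line.strip()
        for page in pages
        for line in set(page.splitlines())
        if line.strip()
    )
    cutoff = len(pages) * threshold

    cleaned = [
        "\n".join(
            line for line in page.splitlines()
            if line_counts[line.strip()] < cutoff
        )
        for page in pages
    ]
    return "\n\n".join(cleaned)
```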
Metadata Strategy
```python
# Comprehensive metadata
metadata = {
    'document_type': 'policy',
    'department': 'engineering',
    'version': '2.1',
    'effective_date': '2025-01-01',
    'review_date': '2026-01-01',
    'author': 'john.smith',
    'tags': 'security,compliance,procedures'
}
```
Query Optimisation
```python
import re

def optimise_query(raw_query: str) -> str:
    """Enhance a query for better retrieval by expanding abbreviations"""
    expansions = {
        'AWS': 'Amazon Web Services',
        'EC2': 'Elastic Compute Cloud'
    }

    enhanced = raw_query
    for abbr, full in expansions.items():
        # Match whole words only so abbreviations inside other words are untouched
        enhanced = re.sub(rf'\b{abbr}\b', f"{abbr} {full}", enhanced)
    return enhanced
```
Conclusion
Amazon Bedrock Knowledge Bases simplifies building RAG applications, enabling Australian businesses to leverage their proprietary data with foundation models. By following these implementation patterns and optimisation strategies, you can build accurate, cost-effective knowledge applications.
CloudPoint specialises in implementing RAG solutions for Australian businesses and regulated industries. We can help you prepare your data, optimise chunking strategies, and build production-ready knowledge base applications.
Contact us for a RAG implementation consultation.
Ready to Build RAG Applications?
CloudPoint implements RAG solutions using Amazon Bedrock Knowledge Bases—connecting AI to your business data securely. Get in touch to discuss your requirements.