AWS AI Services
Building RAG Applications with Amazon Bedrock and Knowledge Bases
Complete guide to building Retrieval-Augmented Generation applications using Amazon Bedrock Knowledge Bases for Australian businesses, covering architecture, implementation, and optimisation strategies.
CloudPoint Team
Retrieval-Augmented Generation (RAG) combines the power of foundation models with your organisation’s proprietary data. Amazon Bedrock Knowledge Bases makes implementing RAG straightforward, enabling Australian businesses to build AI applications that answer questions using their own documents, policies, and knowledge.
What is RAG?
RAG enhances foundation model responses by retrieving relevant information from your data sources before generating answers. Instead of relying solely on the model’s training data, RAG applications can access current, organisation-specific information.
Why RAG Matters
- Accuracy: Responses grounded in your actual data, not hallucinations
- Currency: Access to the latest information, not just the model's training cutoff
- Relevance: Answers specific to your business context
- Trust: Responses include source citations
- Privacy: Sensitive data stays within your AWS environment
Architecture Overview
Components
Data Sources:
- S3 buckets with documents
- Confluence spaces
- SharePoint sites
- Salesforce knowledge bases
- Web pages
Vector Database:
- Amazon OpenSearch Serverless
- Pinecone
- Redis Enterprise Cloud
Embedding Model:
- Amazon Titan Embeddings
- Cohere Embed
Foundation Model:
- Claude (Anthropic)
- Titan (Amazon)
- Other Bedrock models
Data Flow
1. Ingestion: Documents are chunked and embedded
2. Storage: Vectors are stored in the vector database
3. Query: The user's question is embedded
4. Retrieval: Similar vectors (and their source chunks) are retrieved
5. Augmentation: Retrieved content is added to the prompt
6. Generation: The foundation model generates a grounded response
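To make the flow concrete, here is a minimal sketch of steps 3–6 using the lower-level `retrieve` API and a direct model invocation; the knowledge base ID and prompt wording are illustrative. The `retrieve_and_generate` API, used throughout the rest of this guide, wraps this loop in a single call.

```python
import json
import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='ap-southeast-2')
bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')

question = "What is our remote work policy?"  # example question

# Steps 3-4: the service embeds the query and retrieves similar chunks
retrieval = agent_runtime.retrieve(
    knowledgeBaseId='KB_ID_HERE',  # placeholder
    retrievalQuery={'text': question},
    retrievalConfiguration={'vectorSearchConfiguration': {'numberOfResults': 5}}
)
context = "\n\n".join(r['content']['text'] for r in retrieval['retrievalResults'])

# Steps 5-6: augment the prompt with retrieved content, then generate
prompt = f"\n\nHuman: Using only this context:\n{context}\n\nAnswer: {question}\n\nAssistant:"
response = bedrock_runtime.invoke_model(
    modelId='anthropic.claude-v2',
    body=json.dumps({'prompt': prompt, 'max_tokens_to_sample': 500})
)
print(json.loads(response['body'].read())['completion'])
```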
Setting Up Knowledge Bases
1. Prepare Data Source
Create an S3 bucket and upload your documents:

```bash
# Create bucket
aws s3 mb s3://my-knowledge-base-docs-ap-southeast-2 \
  --region ap-southeast-2

# Upload documents
aws s3 sync ./documents/ s3://my-knowledge-base-docs-ap-southeast-2/

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-knowledge-base-docs-ap-southeast-2 \
  --versioning-configuration Status=Enabled
```
Supported Formats:
- TXT
- MD
- HTML
- PDF
- DOC/DOCX
- CSV
- XLS/XLSX
2. Create OpenSearch Serverless Collection
```python
import boto3

aoss = boto3.client('opensearchserverless', region_name='ap-southeast-2')

# Create collection
response = aoss.create_collection(
    name='knowledge-base-vectors',
    type='VECTORSEARCH',
    description='Vector storage for Bedrock Knowledge Base'
)

collection_id = response['createCollectionDetail']['id']
print(f"Collection created: {collection_id}")
```
3. Create Knowledge Base
Using AWS Console or Infrastructure as Code:
```yaml
# CloudFormation template (excerpt)
Resources:
  KnowledgeBase:
    Type: AWS::Bedrock::KnowledgeBase
    Properties:
      Name: CompanyKnowledgeBase
      Description: Internal company documentation and policies
      RoleArn: !GetAtt KnowledgeBaseRole.Arn
      KnowledgeBaseConfiguration:
        Type: VECTOR
        VectorKnowledgeBaseConfiguration:
          EmbeddingModelArn: !Sub 'arn:aws:bedrock:${AWS::Region}::foundation-model/amazon.titan-embed-text-v1'
      StorageConfiguration:
        Type: OPENSEARCH_SERVERLESS
        OpensearchServerlessConfiguration:
          CollectionArn: !GetAtt VectorCollection.Arn
          VectorIndexName: bedrock-knowledge-base-index
          FieldMapping:
            VectorField: bedrock-knowledge-base-vector
            TextField: AMAZON_BEDROCK_TEXT_CHUNK
            MetadataField: AMAZON_BEDROCK_METADATA

  DataSource:
    Type: AWS::Bedrock::DataSource
    Properties:
      Name: S3DocumentsSource
      KnowledgeBaseId: !Ref KnowledgeBase
      DataSourceConfiguration:
        Type: S3
        S3Configuration:
          BucketArn: !GetAtt DocumentsBucket.Arn
      VectorIngestionConfiguration:
        ChunkingConfiguration:
          ChunkingStrategy: FIXED_SIZE
          FixedSizeChunkingConfiguration:
            MaxTokens: 300
            OverlapPercentage: 20
```

Note that `KnowledgeBaseRole`, `VectorCollection`, and `DocumentsBucket` are referenced here but must be defined elsewhere in the same template.
4. Sync Data
```python
import time

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='ap-southeast-2')

# Start ingestion job
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId='KB_ID_HERE',
    dataSourceId='DS_ID_HERE'
)

job_id = response['ingestionJob']['ingestionJobId']
print(f"Ingestion job started: {job_id}")

# Monitor progress
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId='KB_ID_HERE',
        dataSourceId='DS_ID_HERE',
        ingestionJobId=job_id
    )
    state = status['ingestionJob']['status']
    print(f"Status: {state}")
    if state in ['COMPLETE', 'FAILED']:
        break
    time.sleep(10)
```
Querying Knowledge Bases
Basic Query
```python
import boto3

bedrock_agent_runtime = boto3.client(
    'bedrock-agent-runtime',
    region_name='ap-southeast-2'
)

def query_knowledge_base(query: str, knowledge_base_id: str):
    """Query knowledge base and return response"""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2'
            }
        }
    )
    return {
        'answer': response['output']['text'],
        'citations': response.get('citations', [])
    }

# Usage
result = query_knowledge_base(
    query="What is our remote work policy?",
    knowledge_base_id="KB123456"
)

print("Answer:", result['answer'])
print("\nSources:")
for citation in result['citations']:
    for reference in citation.get('retrievedReferences', []):
        print(f"- {reference['location']['s3Location']['uri']}")
```
Advanced Query with Filters
```python
def query_with_filters(query: str, knowledge_base_id: str, metadata_filters: dict):
    """Query with metadata filtering"""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': knowledge_base_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5,
                        'overrideSearchType': 'HYBRID',  # Vector + keyword search
                        'filter': metadata_filters
                    }
                }
            }
        }
    )
    return response['output']['text']

# Example: Filter by department
filters = {
    'equals': {
        'key': 'department',
        'value': 'engineering'
    }
}

answer = query_with_filters(
    query="What are our coding standards?",
    knowledge_base_id="KB123456",
    metadata_filters=filters
)
```
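Filters can also be combined: the Bedrock retrieval filter syntax supports boolean combinators such as `andAll` and `orAll`. A sketch, where the `last_reviewed` attribute is illustrative and assumed to be stored as an ISO-formatted string so lexicographic comparison matches date order:

```python
# Illustrative compound filter: engineering documents reviewed on or after
# 1 January 2025 (assumes a 'last_reviewed' metadata attribute exists).
compound_filters = {
    'andAll': [
        {'equals': {'key': 'department', 'value': 'engineering'}},
        {'greaterThanOrEquals': {'key': 'last_reviewed', 'value': '2025-01-01'}}
    ]
}

answer = query_with_filters(
    query="What are our coding standards?",
    knowledge_base_id="KB123456",
    metadata_filters=compound_filters
)
```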
Optimising Chunking Strategy
Fixed-Size Chunking
Best for consistent document types:
```python
chunking_config = {
    'chunkingStrategy': 'FIXED_SIZE',
    'fixedSizeChunkingConfiguration': {
        'maxTokens': 300,        # Chunk size
        'overlapPercentage': 20  # Overlap between chunks
    }
}
```
Parameters:
- maxTokens: 200-500 suits most use cases
- overlapPercentage: 10-20% preserves context across chunk boundaries; at 300 tokens and 20%, consecutive chunks share roughly 60 tokens
Semantic Chunking
Better for varied content:
```python
chunking_config = {
    'chunkingStrategy': 'SEMANTIC',
    'semanticChunkingConfiguration': {
        'maxTokens': 300,
        'bufferSize': 0,  # Surrounding sentences considered when evaluating breakpoints
        'breakpointPercentileThreshold': 95
    }
}
```
Hierarchical Chunking
For structured documents:
```python
chunking_config = {
    'chunkingStrategy': 'HIERARCHICAL',
    'hierarchicalChunkingConfiguration': {
        'levelConfigurations': [
            {'maxTokens': 1500},  # Parent chunks (returned as context)
            {'maxTokens': 300}    # Child chunks (matched during search)
        ],
        'overlapTokens': 60
    }
}
```
Custom Chunking
For specialised needs:

```python
import boto3
import tiktoken

s3 = boto3.client('s3')

def custom_chunk_document(text: str, max_tokens: int = 300, overlap: int = 50):
    """Custom chunking with exact token counting.

    Note: tiktoken's cl100k_base is an approximation; Titan uses its own tokeniser.
    """
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = start + max_tokens
        chunk_tokens = tokens[start:end]
        chunks.append(encoding.decode(chunk_tokens))
        start = end - overlap  # Step back so chunks overlap
    return chunks

# Process a document (document_text loaded elsewhere)
chunks = custom_chunk_document(document_text)

# Upload chunks to S3 with metadata
for i, chunk in enumerate(chunks):
    s3.put_object(
        Bucket='knowledge-base-bucket',
        Key=f'documents/doc1/chunk_{i}.txt',
        Body=chunk,
        Metadata={
            'chunk_index': str(i),
            'source_document': 'policy.pdf'
        }
    )
```
Adding Metadata
Metadata improves retrieval accuracy:
```python
import json

import boto3

def upload_with_metadata(file_path: str, bucket: str, metadata: dict):
    """Upload document with metadata"""
    s3 = boto3.client('s3')

    # Upload document
    s3.upload_file(
        file_path,
        bucket,
        file_path,
        ExtraArgs={'Metadata': metadata}
    )

    # Create the .metadata.json sidecar file Bedrock looks for
    metadata_file = f"{file_path}.metadata.json"
    with open(metadata_file, 'w') as f:
        json.dump({
            'metadataAttributes': metadata
        }, f)

    # Upload the metadata file alongside the document
    s3.upload_file(
        metadata_file,
        bucket,
        metadata_file
    )

# Usage
upload_with_metadata(
    'policies/remote-work.pdf',
    'knowledge-base-bucket',
    {
        'department': 'HR',
        'category': 'policy',
        'last_reviewed': '2025-01-15',
        'applies_to': 'all_employees'
    }
)
```
Building a RAG Application
Complete Python Application
```python
import boto3
import streamlit as st
from typing import Dict, List, Optional

class RAGApplication:
    def __init__(self, knowledge_base_id: str, region: str = 'ap-southeast-2'):
        self.kb_id = knowledge_base_id
        self.runtime = boto3.client(
            'bedrock-agent-runtime',
            region_name=region
        )

    def query(
        self,
        question: str,
        num_results: int = 5,
        filters: Optional[Dict] = None
    ) -> Dict:
        """Query knowledge base with optional filters"""
        config = {
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': self.kb_id,
                'modelArn': 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': num_results,
                        'overrideSearchType': 'HYBRID'
                    }
                }
            }
        }

        if filters:
            config['knowledgeBaseConfiguration']['retrievalConfiguration']['vectorSearchConfiguration']['filter'] = filters

        response = self.runtime.retrieve_and_generate(
            input={'text': question},
            retrieveAndGenerateConfiguration=config
        )

        return {
            'answer': response['output']['text'],
            'sources': self._extract_sources(response)
        }

    def _extract_sources(self, response: Dict) -> List[Dict]:
        """Extract source documents from response"""
        sources = []
        for citation in response.get('citations', []):
            for ref in citation.get('retrievedReferences', []):
                location = ref.get('location', {})
                sources.append({
                    'uri': location.get('s3Location', {}).get('uri', ''),
                    'text': ref.get('content', {}).get('text', '')[:200]
                })
        return sources

# Streamlit UI
st.title("Company Knowledge Base")

app = RAGApplication(knowledge_base_id=st.secrets['KB_ID'])

question = st.text_input("Ask a question:")

if question:
    with st.spinner("Searching knowledge base..."):
        result = app.query(question)

    st.subheader("Answer")
    st.write(result['answer'])

    st.subheader("Sources")
    for source in result['sources']:
        with st.expander(source['uri']):
            st.write(source['text'])
```
TypeScript Implementation
```typescript
import { BedrockAgentRuntimeClient, RetrieveAndGenerateCommand } from "@aws-sdk/client-bedrock-agent-runtime";

interface RAGQuery {
  question: string;
  knowledgeBaseId: string;
  numResults?: number;
}

interface RAGResponse {
  answer: string;
  sources: Array<{
    uri: string;
    excerpt: string;
  }>;
}

class RAGService {
  private client: BedrockAgentRuntimeClient;

  constructor(region: string = 'ap-southeast-2') {
    this.client = new BedrockAgentRuntimeClient({ region });
  }

  async query({ question, knowledgeBaseId, numResults = 5 }: RAGQuery): Promise<RAGResponse> {
    const command = new RetrieveAndGenerateCommand({
      input: { text: question },
      retrieveAndGenerateConfiguration: {
        type: 'KNOWLEDGE_BASE',
        knowledgeBaseConfiguration: {
          knowledgeBaseId,
          modelArn: 'arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-v2',
          retrievalConfiguration: {
            vectorSearchConfiguration: {
              numberOfResults: numResults,
              overrideSearchType: 'HYBRID'
            }
          }
        }
      }
    });

    const response = await this.client.send(command);

    return {
      answer: response.output?.text || '',
      sources: this.extractSources(response)
    };
  }

  private extractSources(response: any): Array<{ uri: string; excerpt: string }> {
    const sources: Array<{ uri: string; excerpt: string }> = [];
    for (const citation of response.citations || []) {
      for (const ref of citation.retrievedReferences || []) {
        sources.push({
          uri: ref.location?.s3Location?.uri || '',
          excerpt: ref.content?.text?.substring(0, 200) || ''
        });
      }
    }
    return sources;
  }
}

// Usage in Express API
import express from 'express';

const app = express();
app.use(express.json()); // Required so req.body is populated

const ragService = new RAGService();

app.post('/api/query', async (req, res) => {
  const { question } = req.body;

  try {
    const result = await ragService.query({
      question,
      knowledgeBaseId: process.env.KB_ID!
    });
    res.json(result);
  } catch (error) {
    console.error('RAG query error:', error);
    res.status(500).json({ error: 'Query failed' });
  }
});
```
Cost Optimisation
Embedding Costs
Titan Embeddings: ~$0.0001 per 1K tokens
Strategies:
- Deduplicate documents before embedding (see the sketch below)
- Use appropriate chunk sizes
- Implement caching for common queries
- Archive old or unused documents
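As a minimal sketch of the first point, a content-hash pass can drop byte-identical duplicates before they are synced to S3 and embedded; the `./documents` directory is illustrative:

```python
import hashlib
from pathlib import Path

# Illustrative de-duplication pass: keep only the first file seen for each
# content hash, so identical documents are embedded (and billed) once.
seen_hashes: set[str] = set()
unique_files: list[Path] = []

for path in sorted(Path('./documents').rglob('*')):
    if not path.is_file():
        continue
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest not in seen_hashes:
        seen_hashes.add(digest)
        unique_files.append(path)

print(f"{len(unique_files)} unique documents to upload")
```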
Query Costs
Each query incurs:
- Embedding cost (query)
- Vector search cost
- Model inference cost
Optimisation:
```python
import hashlib

# Simple in-memory cache keyed on a normalised query hash. A bare
# functools.lru_cache will not work here because the question text must be
# normalised before lookup.
_query_cache: dict = {}

def query_with_cache(question: str, kb_id: str):
    # Normalise and hash the question to form the cache key
    query_hash = hashlib.md5(question.lower().strip().encode()).hexdigest()

    # Check cache
    if query_hash in _query_cache:
        return _query_cache[query_hash]

    # Execute query and cache the result
    result = query_knowledge_base(question, kb_id)
    _query_cache[query_hash] = result
    return result
```
Monitoring and Evaluation
Track Performance
```python
import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client('cloudwatch', region_name='ap-southeast-2')

def log_rag_metrics(latency_ms: int, num_sources: int):
    """Log RAG query metrics to CloudWatch"""
    cloudwatch.put_metric_data(
        Namespace='RAGApplication',
        MetricData=[
            {
                'MetricName': 'QueryLatency',
                'Value': latency_ms,
                'Unit': 'Milliseconds',
                'Timestamp': datetime.now(timezone.utc)
            },
            {
                'MetricName': 'SourcesRetrieved',
                'Value': num_sources,
                'Unit': 'Count'
            }
        ]
    )
```
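A usage sketch: time each query with `time.perf_counter` and emit the metrics above. The question and knowledge base ID are placeholders, and `query_knowledge_base` is the function defined earlier in this guide.

```python
import time

# Illustrative: measure end-to-end latency of one query and record it.
start = time.perf_counter()
result = query_knowledge_base("What is our leave policy?", "KB123456")
latency_ms = int((time.perf_counter() - start) * 1000)

log_rag_metrics(latency_ms, num_sources=len(result['citations']))
```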
Evaluate Answer Quality
```python
from typing import List

def evaluate_answer_relevance(question: str, answer: str, sources: List[str]):
    """Use an LLM to evaluate answer quality"""
    prompt = f"""Evaluate the following answer on a scale of 1-5:

Question: {question}
Answer: {answer}
Sources used: {len(sources)}

Rate the answer on:
1. Relevance to question
2. Accuracy based on sources
3. Completeness

Provide scores and brief explanation."""

    # Use Bedrock to evaluate (invoke_bedrock is sketched below)
    evaluation = invoke_bedrock(prompt)
    return evaluation
```
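`invoke_bedrock` is not part of any SDK; here is a minimal sketch using the `bedrock-runtime` client and the Claude v2 prompt format used elsewhere in this guide:

```python
import json

import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')

def invoke_bedrock(prompt: str, max_tokens: int = 500) -> str:
    """Minimal helper: send a prompt to Claude v2 and return its completion."""
    response = bedrock_runtime.invoke_model(
        modelId='anthropic.claude-v2',
        body=json.dumps({
            'prompt': f"\n\nHuman: {prompt}\n\nAssistant:",
            'max_tokens_to_sample': max_tokens
        })
    )
    return json.loads(response['body'].read())['completion']
```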
Best Practices
Document Preparation
- Clean PDFs: Remove repeated headers/footers and fix OCR errors (a helper sketch follows this list)
- Structure: Use clear headings and sections
- Format: Convert to searchable text where possible
- Update: Keep documents current and remove outdated content
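As a minimal sketch of the first point: once text has been extracted page by page (with whatever PDF tooling you use), lines that repeat on most pages are almost always headers or footers and can be dropped before ingestion.

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.8) -> str:
    """Drop lines that appear on at least `threshold` of pages.

    Illustrative helper: assumes `pages` holds already-extracted text,
    one string per page.
    """
    # Count how many pages each distinct line appears on
    line_counts = Counter(
        line.strip()
        for page in pages
        for line in set(page.splitlines())
        if line.strip()
    )
    cutoff = len(pages) * threshold

    cleaned = [
        "\n".join(
            line for line in page.splitlines()
            if line_counts[line.strip()] < cutoff
        )
        for page in pages
    ]
    return "\n\n".join(cleaned)
```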
Metadata Strategy
```python
# Comprehensive metadata
metadata = {
    'document_type': 'policy',
    'department': 'engineering',
    'version': '2.1',
    'effective_date': '2025-01-01',
    'review_date': '2026-01-01',
    'author': 'john.smith',
    'tags': 'security,compliance,procedures'
}
```
Query Optimisation
```python
import re

def optimise_query(raw_query: str) -> str:
    """Enhance a query for better retrieval by expanding abbreviations"""
    expansions = {
        'AWS': 'Amazon Web Services',
        'EC2': 'Elastic Compute Cloud'
    }

    enhanced = raw_query
    for abbr, full in expansions.items():
        # Match whole words only so abbreviations inside other words are untouched
        enhanced = re.sub(rf'\b{abbr}\b', f"{abbr} {full}", enhanced)
    return enhanced
```
Conclusion
Amazon Bedrock Knowledge Bases simplifies building RAG applications, enabling Australian businesses to leverage their proprietary data with foundation models. By following these implementation patterns and optimisation strategies, you can build accurate, cost-effective knowledge applications.
CloudPoint specialises in implementing RAG solutions for Australian businesses and regulated industries. We can help you prepare your data, optimise chunking strategies, and build production-ready knowledge base applications.
Contact us for a RAG implementation consultation.
Ready to Build RAG Applications?
CloudPoint implements RAG solutions using Amazon Bedrock Knowledge Bases—connecting AI to your business data securely. Get in touch to discuss your requirements.