AWS AI Services
Fine-Tuning Foundation Models on AWS: Complete Implementation Guide
Practical guide to fine-tuning foundation models on AWS using Amazon Bedrock and SageMaker for Australian businesses, covering data preparation, training, evaluation, and deployment.
CloudPoint Team
Fine-tuning foundation models allows you to customise pre-trained AI models for your specific use case, domain, or industry. For Australian businesses, fine-tuning enables AI systems that understand local context, terminology, and compliance requirements while maintaining the power of foundation models. This guide covers practical fine-tuning strategies using AWS services.
What is Fine-Tuning?
Fine-tuning is the process of taking a pre-trained foundation model and training it further on your specific data. Unlike training from scratch, fine-tuning preserves the model’s general knowledge while adapting it to your domain.
Fine-Tuning vs Training from Scratch
Training from Scratch:
- Requires massive datasets (billions of tokens)
- Costs millions of dollars
- Takes months to complete
- Needs extensive ML expertise
- High computational requirements
Fine-Tuning:
- Requires modest datasets (hundreds to thousands of examples)
- Costs hundreds to thousands of dollars
- Takes hours to days
- Moderate ML expertise needed
- Reasonable computational requirements
Fine-Tuning vs RAG
RAG (Retrieval Augmented Generation):
- Adds external knowledge at inference time
- No model training required
- Quick to implement
- Good for factual information
- Higher latency per request
- Ongoing retrieval costs
Fine-Tuning:
- Embeds knowledge in model weights
- Requires training process
- Longer implementation time
- Good for style, format, reasoning patterns
- Lower inference latency
- One-time training cost
When to Fine-Tune
Ideal Use Cases
1. Domain-Specific Terminology:
# Example: Australian legal terminology
training_examples = [
{
"prompt": "What is a deed of company arrangement?",
"completion": "A Deed of Company Arrangement (DOCA) is a binding agreement between a company and its creditors governing how the company's affairs will be dealt with. It's an alternative to liquidation under Australian insolvency law, allowing a company to continue trading while addressing its debt obligations."
},
{
"prompt": "Explain fair work entitlements",
"completion": "Under the Fair Work Act 2009, Australian employees are entitled to minimum conditions including: National Employment Standards (NES), minimum wage, annual leave, personal/carer's leave, parental leave, notice of termination, and protection from unfair dismissal. Award and enterprise agreement conditions may provide additional entitlements."
}
]
2. Specific Output Formats:
# Example: Structured data extraction
training_examples = [
{
"prompt": "Extract invoice details: Invoice #INV-2025-001, Date: 15/01/2025, Total: $1,234.56 AUD",
"completion": '{"invoice_number": "INV-2025-001", "date": "2025-01-15", "total_aud": 1234.56, "currency": "AUD"}'
}
]
3. Brand Voice and Style:
# Example: Company-specific communication style
training_examples = [
{
"prompt": "Draft welcome email for new Australian customer",
"completion": "G'day and welcome to [Company]!\n\nWe're thrilled to have you on board. As a proudly Australian company, we're committed to delivering exceptional service and supporting local businesses...\n\nCheers,\nThe [Company] Team"
}
]
4. Industry-Specific Reasoning:
- Healthcare diagnostics
- Financial risk assessment
- Legal contract analysis
- Technical support
- Regulatory compliance
When NOT to Fine-Tune
Use RAG instead when:
- Need to update information frequently
- Working with factual knowledge bases
- Information changes regularly
- Don’t have training data
- Need explainable sources
Use prompt engineering when:
- Base model already performs well
- Need quick iteration
- Limited training data
- Behaviour can be specified in prompts
Fine-Tuning on Amazon Bedrock
Supported Models
Amazon Bedrock supports fine-tuning for:
- Amazon Titan Text models
- Cohere Command models
- Meta Llama 2 models
Data Preparation
Training data format (JSONL):
import json
def prepare_bedrock_training_data(examples: list) -> str:
"""
Prepare training data in Bedrock format.
Each line is a JSON object with prompt and completion.
"""
training_data = []
for example in examples:
# Bedrock format
training_record = {
"prompt": example["prompt"],
"completion": example["completion"]
}
training_data.append(json.dumps(training_record))
# Write to JSONL file
output_file = "bedrock_training_data.jsonl"
with open(output_file, 'w') as f:
f.write('\n'.join(training_data))
return output_file
# Example: Australian property listing generation
examples = [
{
"prompt": "Create listing for: 3 bed, 2 bath house, Bondi, Sydney, $2.5M",
"completion": """Stunning Bondi Beach Residence
This beautifully appointed 3-bedroom, 2-bathroom home epitomises coastal living in one of Sydney's most sought-after locations. Just moments from Bondi Beach, this property offers the perfect blend of lifestyle and luxury.
Features:
• 3 spacious bedrooms with built-in wardrobes
• 2 modern bathrooms with quality fixtures
• Open-plan living and dining
• Designer kitchen with premium appliances
• Private outdoor entertaining area
• Walking distance to beach, cafes, and transport
Price: $2,500,000
Contact: [Agent details]"""
},
{
"prompt": "Create listing for: 2 bed apartment, Melbourne CBD, $850K",
"completion": """Contemporary Melbourne CBD Apartment
Embrace urban sophistication in this modern 2-bedroom apartment located in the heart of Melbourne's CBD. Perfect for professionals or investors seeking premium city living.
Features:
• 2 generous bedrooms with mirrored built-ins
• Designer bathroom with floor-to-ceiling tiles
• Open-plan living with city views
• European laundry facilities
• Secure car space
• Building amenities: gym, concierge, rooftop terrace
Price: $850,000
Contact: [Agent details]"""
}
]
# Prepare training data
training_file = prepare_bedrock_training_data(examples)
Data Quality Guidelines
Minimum requirements:
- 32 examples minimum (more is better)
- 500+ examples recommended for good results
- Balanced representation of use cases
- High-quality, human-written completions
- Consistent formatting
Data validation:
import json
from collections import Counter
def validate_training_data(jsonl_file: str) -> dict:
"""Validate Bedrock training data quality."""
issues = []
stats = {
'total_examples': 0,
'avg_prompt_length': 0,
'avg_completion_length': 0,
'prompt_lengths': [],
'completion_lengths': []
}
with open(jsonl_file, 'r') as f:
for line_num, line in enumerate(f, 1):
try:
record = json.loads(line)
# Check required fields
if 'prompt' not in record or 'completion' not in record:
issues.append(f"Line {line_num}: Missing prompt or completion")
continue
prompt_len = len(record['prompt'])
completion_len = len(record['completion'])
# Check lengths
if prompt_len < 10:
issues.append(f"Line {line_num}: Prompt too short ({prompt_len} chars)")
if completion_len < 20:
issues.append(f"Line {line_num}: Completion too short ({completion_len} chars)")
if completion_len > 4000:
issues.append(f"Line {line_num}: Completion too long ({completion_len} chars)")
stats['prompt_lengths'].append(prompt_len)
stats['completion_lengths'].append(completion_len)
stats['total_examples'] += 1
except json.JSONDecodeError:
issues.append(f"Line {line_num}: Invalid JSON")
# Calculate stats
if stats['total_examples'] > 0:
stats['avg_prompt_length'] = sum(stats['prompt_lengths']) / stats['total_examples']
stats['avg_completion_length'] = sum(stats['completion_lengths']) / stats['total_examples']
return {
'valid': len(issues) == 0,
'issues': issues,
'stats': stats
}
# Validate data
validation_result = validate_training_data('bedrock_training_data.jsonl')
if validation_result['valid']:
print(f"✓ Data valid: {validation_result['stats']['total_examples']} examples")
else:
print(f"✗ Found {len(validation_result['issues'])} issues:")
for issue in validation_result['issues']:
print(f" - {issue}")
Creating Fine-Tuning Job
import boto3
import json
from datetime import datetime
class BedrockFineTuner:
def __init__(self, region='ap-southeast-2'):
self.bedrock = boto3.client('bedrock', region_name=region)
self.s3 = boto3.client('s3', region_name=region)
self.region = region
def upload_training_data(self, local_file: str, bucket: str) -> str:
"""Upload training data to S3."""
s3_key = f"bedrock-training/{datetime.now().strftime('%Y%m%d')}/{local_file}"
self.s3.upload_file(
local_file,
bucket,
s3_key,
ExtraArgs={'ServerSideEncryption': 'AES256'}
)
return f"s3://{bucket}/{s3_key}"
def create_fine_tuning_job(
self,
job_name: str,
base_model_id: str,
training_data_s3: str,
output_s3: str,
hyperparameters: dict = None
) -> str:
"""Create Bedrock fine-tuning job."""
if hyperparameters is None:
hyperparameters = {
'epochCount': '3',
'batchSize': '4',
'learningRate': '0.00001'
}
response = self.bedrock.create_model_customization_job(
jobName=job_name,
customModelName=f"{job_name}-model",
roleArn=self._get_bedrock_role_arn(),  # IAM service role Bedrock assumes to read training data and write outputs (helper sketched below)
baseModelIdentifier=base_model_id,
trainingDataConfig={
's3Uri': training_data_s3
},
outputDataConfig={
's3Uri': output_s3
},
hyperParameters=hyperparameters
)
return response['jobArn']
def monitor_training_job(self, job_arn: str) -> dict:
"""Monitor fine-tuning job progress."""
response = self.bedrock.get_model_customization_job(
jobIdentifier=job_arn
)
return {
'status': response['status'],
'failure_message': response.get('failureMessage'),
'output_model_arn': response.get('outputModelArn')
}
# Usage
tuner = BedrockFineTuner()
# Upload training data
training_s3 = tuner.upload_training_data(
'bedrock_training_data.jsonl',
'my-bedrock-training-bucket'
)
# Create fine-tuning job
job_arn = tuner.create_fine_tuning_job(
job_name='australian-property-listings',
base_model_id='amazon.titan-text-express-v1',
training_data_s3=training_s3,
output_s3='s3://my-bedrock-training-bucket/outputs/',
hyperparameters={
'epochCount': '5',
'batchSize': '8',
'learningRate': '0.00001'
}
)
print(f"Fine-tuning job created: {job_arn}")
# Monitor progress
status = tuner.monitor_training_job(job_arn)
print(f"Job status: {status['status']}")
Using Fine-Tuned Model
def invoke_fine_tuned_model(model_arn: str, prompt: str) -> str:
"""Use fine-tuned Bedrock model for inference."""
bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')
response = bedrock_runtime.invoke_model(
modelId=model_arn, # Use fine-tuned model ARN
body=json.dumps({
"inputText": prompt,
"textGenerationConfig": {
"maxTokenCount": 1000,
"temperature": 0.7,
"topP": 0.9
}
})
)
result = json.loads(response['body'].read())
return result['results'][0]['outputText']
# Usage
fine_tuned_model_arn = "arn:aws:bedrock:ap-southeast-2:123456789012:custom-model/..."
listing = invoke_fine_tuned_model(
fine_tuned_model_arn,
"Create listing for: 4 bed house, Brisbane, pool, $1.2M"
)
print(listing)
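Note that, at the time of writing, custom Bedrock models generally require Provisioned Throughput before they can be invoked: you purchase capacity for the custom model and call it via the provisioned model ARN. A minimal sketch, with illustrative model units and naming:

# Purchase Provisioned Throughput for the custom model, then invoke using the
# provisioned model ARN rather than the raw custom-model ARN.
bedrock = boto3.client('bedrock', region_name='ap-southeast-2')

provisioned = bedrock.create_provisioned_model_throughput(
    modelUnits=1,  # smallest commitment; size to your expected traffic
    provisionedModelName='australian-property-listings-pt',
    modelId=fine_tuned_model_arn
)

listing = invoke_fine_tuned_model(
    provisioned['provisionedModelArn'],
    "Create listing for: 4 bed house, Brisbane, pool, $1.2M"
)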
Fine-Tuning on SageMaker
Advantages of SageMaker Fine-Tuning
- Support for more model types
- Fine-grained control over training
- Custom loss functions and metrics
- Advanced hyperparameter tuning
- Distributed training support
- Integration with MLOps pipelines
Preparing Training Data
Example: Fine-tuning for Australian customer support:
import pandas as pd
import boto3
from sagemaker.huggingface import HuggingFace
from sagemaker import Session
class SageMakerFineTuner:
def __init__(self):
self.session = Session()
self.role = self._get_sagemaker_role()
self.bucket = self.session.default_bucket()
def prepare_dataset(self, examples: list) -> tuple:
"""Prepare dataset for Hugging Face fine-tuning."""
# Convert to DataFrame
df = pd.DataFrame(examples)
# Split train/validation
train_df = df.sample(frac=0.9, random_state=42)
val_df = df.drop(train_df.index)
# Save as CSV
train_df.to_csv('train.csv', index=False)
val_df.to_csv('validation.csv', index=False)
# Upload to S3
train_s3 = self._upload_to_s3('train.csv', 'training/train.csv')
val_s3 = self._upload_to_s3('validation.csv', 'training/validation.csv')
return train_s3, val_s3
def fine_tune_model(
self,
base_model: str,
train_s3: str,
val_s3: str,
hyperparameters: dict = None
) -> HuggingFace:
"""Fine-tune model using SageMaker."""
if hyperparameters is None:
hyperparameters = {
'epochs': 3,
'train_batch_size': 4,
'eval_batch_size': 4,
'learning_rate': 2e-5,
'model_name': base_model,
'fp16': True
}
# Configure training job
huggingface_estimator = HuggingFace(
entry_point='train.py',
source_dir='./scripts',
instance_type='ml.p3.2xlarge',  # single-GPU example; full fine-tuning of a 7B model typically needs a larger instance or a parameter-efficient method such as LoRA
instance_count=1,
role=self.role,
transformers_version='4.26',
pytorch_version='1.13',
py_version='py39',
hyperparameters=hyperparameters,
output_path=f's3://{self.bucket}/outputs',
base_job_name='australian-support-finetuning'
)
# Start training
huggingface_estimator.fit({
'train': train_s3,
'validation': val_s3
})
return huggingface_estimator
# Example training data for Australian customer support
support_examples = [
{
"input": "Customer: How do I return a product?",
"output": "To return a product, please follow these steps:\n\n1. Log into your account at [website]\n2. Navigate to 'My Orders'\n3. Select the order containing the item to return\n4. Click 'Return Item' and select a reason\n5. Print the prepaid Australia Post return label\n6. Drop off at any Australia Post location\n\nRefunds are processed within 5-7 business days of receiving your return. For more information, visit our Returns Policy or contact our Australian support team on 1300 XXX XXX."
},
{
"input": "Customer: What are your delivery times to regional Australia?",
"output": "Delivery times to regional Australia vary by location:\n\n• Metro areas (Sydney, Melbourne, Brisbane): 2-3 business days\n• Regional cities: 3-5 business days\n• Remote areas: 5-10 business days\n\nWe use Australia Post and StarTrack for regional deliveries. You'll receive tracking information via email once your order ships. For urgent orders to regional areas, express shipping is available at checkout.\n\nNote: Delivery times exclude weekends and public holidays."
},
{
"input": "Customer: Do you charge GST?",
"output": "Yes, all prices on our website include GST (Goods and Services Tax) at the current rate of 10%, as required by Australian law.\n\nYour tax invoice will clearly show:\n• Base price (excluding GST)\n• GST amount\n• Total price (including GST)\n\nFor business customers, the tax invoice serves as proof of GST paid for your tax returns. If you need a copy of your tax invoice, you can download it from your account or contact our support team."
}
]
# Prepare and fine-tune
tuner = SageMakerFineTuner()
train_s3, val_s3 = tuner.prepare_dataset(support_examples)
estimator = tuner.fine_tune_model(
base_model='meta-llama/Llama-2-7b-hf',
train_s3=train_s3,
val_s3=val_s3
)
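SageMakerFineTuner above assumes two helpers, `_get_sagemaker_role` and `_upload_to_s3`. Minimal sketches follow, assuming the code runs somewhere `sagemaker.get_execution_role()` resolves (SageMaker Studio or a notebook instance; otherwise pass your role ARN explicitly):

# Helper methods assumed by SageMakerFineTuner (add them to the class).
import sagemaker

    def _get_sagemaker_role(self) -> str:
        """Return the SageMaker execution role used for training and deployment."""
        return sagemaker.get_execution_role()

    def _upload_to_s3(self, local_path: str, s3_key: str) -> str:
        """Upload a local file to the session's default bucket and return its S3 URI."""
        return self.session.upload_data(
            path=local_path,
            bucket=self.bucket,
            key_prefix=s3_key.rsplit('/', 1)[0]
        )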
Training Script
Create scripts/train.py:
import argparse
import os
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
TrainingArguments,
Trainer,
DataCollatorForLanguageModeling
)
from datasets import load_dataset
import torch
def parse_args():
parser = argparse.ArgumentParser()
# Hyperparameters
parser.add_argument('--epochs', type=int, default=3)
parser.add_argument('--train_batch_size', type=int, default=4)
parser.add_argument('--eval_batch_size', type=int, default=4)
parser.add_argument('--learning_rate', type=float, default=2e-5)
parser.add_argument('--model_name', type=str, default='gpt2')
parser.add_argument('--fp16', type=lambda v: str(v).lower() == 'true', default=True)  # type=bool would treat any non-empty string (including 'False') as True
# SageMaker parameters
parser.add_argument('--output_data_dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_VALIDATION'])
return parser.parse_args()
def prepare_data(tokenizer, train_path, val_path):
"""Load and tokenize training data."""
# Load datasets
dataset = load_dataset('csv', data_files={
'train': f'{train_path}/train.csv',
'validation': f'{val_path}/validation.csv'
})
def tokenize_function(examples):
# Combine input and output
texts = [
f"Input: {inp}\nOutput: {out}"
for inp, out in zip(examples['input'], examples['output'])
]
return tokenizer(
texts,
truncation=True,
padding='max_length',
max_length=512
)
tokenized_datasets = dataset.map(
tokenize_function,
batched=True,
remove_columns=dataset['train'].column_names
)
return tokenized_datasets
def train(args):
"""Main training function."""
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(args.model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(args.model_name)
# Prepare data
datasets = prepare_data(tokenizer, args.train, args.validation)
# Data collator
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False
)
# Training arguments
training_args = TrainingArguments(
output_dir=args.output_data_dir,
num_train_epochs=args.epochs,
per_device_train_batch_size=args.train_batch_size,
per_device_eval_batch_size=args.eval_batch_size,
learning_rate=args.learning_rate,
fp16=args.fp16,
evaluation_strategy='epoch',
save_strategy='epoch',
load_best_model_at_end=True,
logging_dir=f'{args.output_data_dir}/logs',
logging_steps=10,
report_to=['tensorboard']
)
# Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=datasets['train'],
eval_dataset=datasets['validation'],
data_collator=data_collator
)
# Train
trainer.train()
# Save model
trainer.save_model(args.model_dir)
tokenizer.save_pretrained(args.model_dir)
if __name__ == '__main__':
args = parse_args()
train(args)
Deploying Fine-Tuned Model
from sagemaker.huggingface import HuggingFaceModel
def deploy_fine_tuned_model(estimator) -> str:
"""Deploy fine-tuned model to SageMaker endpoint."""
# Create HuggingFace model from training job
huggingface_model = HuggingFaceModel(
model_data=estimator.model_data,
role=estimator.role,
transformers_version='4.26',
pytorch_version='1.13',
py_version='py39',
entry_point='inference.py',
source_dir='./scripts'
)
# Deploy to endpoint
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type='ml.g4dn.xlarge', # GPU instance for inference
endpoint_name='australian-support-endpoint'
)
return predictor.endpoint_name
# Deploy
endpoint_name = deploy_fine_tuned_model(estimator)
print(f"Model deployed to: {endpoint_name}")
Using Deployed Model
import boto3
import json
def get_support_response(query: str, endpoint_name: str) -> str:
"""Get response from fine-tuned support model."""
runtime = boto3.client('sagemaker-runtime', region_name='ap-southeast-2')
response = runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType='application/json',
Body=json.dumps({
'inputs': f"Input: {query}\nOutput:",
'parameters': {
'max_length': 300,
'temperature': 0.7,
'top_p': 0.9
}
})
)
result = json.loads(response['Body'].read())
return result[0]['generated_text']
# Usage
response = get_support_response(
"What is your shipping policy for Tasmania?",
"australian-support-endpoint"
)
print(response)
Evaluation and Testing
Evaluation Metrics
import json
import boto3
import numpy as np
import pandas as pd
class ModelEvaluator:
def __init__(self, model_endpoint: str):
self.endpoint = model_endpoint
self.runtime = boto3.client('sagemaker-runtime', region_name='ap-southeast-2')
def evaluate_model(self, test_examples: list) -> dict:
"""Evaluate fine-tuned model on test set."""
predictions = []
ground_truth = []
for example in test_examples:
# Get prediction
prediction = self._get_prediction(example['input'])
predictions.append(prediction)
ground_truth.append(example['output'])
# Calculate metrics
metrics = {
'exact_match': self._calculate_exact_match(predictions, ground_truth),
'bleu_score': self._calculate_bleu(predictions, ground_truth),
'rouge_scores': self._calculate_rouge(predictions, ground_truth),
'semantic_similarity': self._calculate_semantic_similarity(predictions, ground_truth)
}
return metrics
def _calculate_exact_match(self, predictions: list, ground_truth: list) -> float:
"""Calculate exact match accuracy."""
matches = sum(p.strip() == gt.strip() for p, gt in zip(predictions, ground_truth))
return matches / len(predictions)
def _calculate_bleu(self, predictions: list, ground_truth: list) -> float:
"""Calculate BLEU score for text generation quality."""
from nltk.translate.bleu_score import sentence_bleu
scores = []
for pred, ref in zip(predictions, ground_truth):
score = sentence_bleu([ref.split()], pred.split())
scores.append(score)
return np.mean(scores)
def human_evaluation_template(self, test_examples: list, predictions: list, ground_truth: list) -> pd.DataFrame:
"""Create template for human evaluation."""
eval_df = pd.DataFrame({
'input': [ex['input'] for ex in test_examples],
'model_output': predictions,
'expected_output': ground_truth,
'relevance_score': '', # 1-5
'accuracy_score': '', # 1-5
'australian_context': '', # 1-5
'tone_appropriateness': '', # 1-5
'comments': ''
})
eval_df.to_csv('human_evaluation_template.csv', index=False)
return eval_df
# Evaluate
evaluator = ModelEvaluator('australian-support-endpoint')
test_examples = [
{
"input": "How long does shipping take to Perth?",
"output": "Shipping to Perth typically takes 3-4 business days from our Sydney warehouse using Australia Post Express. For standard shipping, please allow 5-7 business days. Tracking information will be emailed once your order ships."
}
]
metrics = evaluator.evaluate_model(test_examples)
print(f"Model performance: {metrics}")
A/B Testing
class ABTestManager:
def __init__(self, base_model: str, fine_tuned_model: str):
self.base_model = base_model
self.fine_tuned_model = fine_tuned_model
self.runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')
def route_request(self, user_id: str, query: str) -> dict:
"""Route request between base and fine-tuned model for A/B testing."""
# Route 50% to each model based on user_id hash
use_fine_tuned = hash(user_id) % 2 == 0
if use_fine_tuned:
response = self._invoke_model(self.fine_tuned_model, query)
variant = 'fine_tuned'
else:
response = self._invoke_model(self.base_model, query)
variant = 'base'
# Log for analysis
self._log_experiment(user_id, query, response, variant)
return {
'response': response,
'variant': variant
}
def analyse_results(self) -> dict:
"""Analyse A/B test results."""
# Query logs from CloudWatch or database
results = self._get_experiment_logs()
analysis = {
'base_model': {
'avg_response_time': self._calculate_avg_latency(results, 'base'),
'user_satisfaction': self._calculate_satisfaction(results, 'base'),
'task_completion': self._calculate_completion_rate(results, 'base')
},
'fine_tuned_model': {
'avg_response_time': self._calculate_avg_latency(results, 'fine_tuned'),
'user_satisfaction': self._calculate_satisfaction(results, 'fine_tuned'),
'task_completion': self._calculate_completion_rate(results, 'fine_tuned')
}
}
return analysis
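ABTestManager likewise assumes private helpers for invocation and logging. Sketches of `_invoke_model` (Titan-style request body; adjust for other model families) and `_log_experiment` (a structured log line; in production you would write to CloudWatch Logs or DynamoDB) might look like this, while the analysis helpers depend on where the logs are stored:

# Sketches of two ABTestManager helpers (add them to the class).
import logging

logger = logging.getLogger('ab_test')

    def _invoke_model(self, model_id: str, query: str) -> str:
        """Invoke a model variant through the Bedrock runtime (Titan-style body)."""
        response = self.runtime.invoke_model(
            modelId=model_id,
            body=json.dumps({
                'inputText': query,
                'textGenerationConfig': {'maxTokenCount': 500, 'temperature': 0.7}
            })
        )
        return json.loads(response['body'].read())['results'][0]['outputText']

    def _log_experiment(self, user_id: str, query: str, response: str, variant: str) -> None:
        """Record the experiment event for later analysis."""
        logger.info(json.dumps({
            'user_id': user_id,
            'variant': variant,
            'query': query,
            'response_length': len(response)
        }))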
Cost Optimisation
Bedrock Fine-Tuning Costs
def calculate_bedrock_finetuning_cost(
num_tokens: int,
num_epochs: int,
model_storage_months: int = 12
) -> dict:
"""
Calculate Bedrock fine-tuning costs (example prices in AUD).
Prices vary by model and region. Check AWS pricing for current rates.
"""
# Example pricing (convert USD to AUD, ~1.5x)
training_cost_per_token = 0.000012 * 1.5 # Per token
storage_cost_per_month = 2.25 * 1.5 # Per model per month
training_cost = (num_tokens * num_epochs * training_cost_per_token)
storage_cost = storage_cost_per_month * model_storage_months
return {
'training_cost_aud': round(training_cost, 2),
'storage_cost_aud': round(storage_cost, 2),
'total_cost_aud': round(training_cost + storage_cost, 2)
}
# Example: 1M tokens, 3 epochs, 12 months storage
cost = calculate_bedrock_finetuning_cost(1_000_000, 3, 12)
print(f"Total cost: ${cost['total_cost_aud']} AUD")
SageMaker Fine-Tuning Costs
def calculate_sagemaker_finetuning_cost(
instance_type: str,
training_hours: float,
endpoint_instance_type: str,
endpoint_hours_per_month: int = 730
) -> dict:
"""Calculate SageMaker fine-tuning and deployment costs."""
# Prices in AUD per hour (ap-southeast-2)
training_prices = {
'ml.p3.2xlarge': 4.862, # GPU for training
'ml.p3.8xlarge': 19.448,
'ml.g4dn.xlarge': 0.877
}
inference_prices = {
'ml.t2.medium': 0.065,
'ml.m5.large': 0.134,
'ml.g4dn.xlarge': 0.877
}
training_cost = training_prices[instance_type] * training_hours
inference_cost = inference_prices[endpoint_instance_type] * endpoint_hours_per_month
return {
'training_cost_aud': round(training_cost, 2),
'monthly_inference_cost_aud': round(inference_cost, 2),
'annual_cost_aud': round(training_cost + (inference_cost * 12), 2)
}
# Example: 4 hours training on p3.2xlarge, deploy on g4dn.xlarge
cost = calculate_sagemaker_finetuning_cost(
'ml.p3.2xlarge',
4,
'ml.g4dn.xlarge'
)
print(f"Training: ${cost['training_cost_aud']} AUD")
print(f"Monthly inference: ${cost['monthly_inference_cost_aud']} AUD")
print(f"Annual total: ${cost['annual_cost_aud']} AUD")
Cost Reduction Strategies
1. Data Efficiency:
# Use fewer, higher-quality examples
# 500 great examples > 5000 mediocre examples
2. Hyperparameter Optimisation:
# Reduce epochs if validation loss plateaus early
# Start with 3 epochs, increase only if needed
3. Instance Selection:
# Training: Use managed Spot instances for cost savings (see the sketch after this list)
# Inference: Right-size based on traffic patterns
4. Model Pruning:
# Use smaller base models when possible
# Fine-tune 7B model instead of 70B if performance is acceptable
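To illustrate point 3, a managed Spot training sketch for the HuggingFace estimator used earlier. The role ARN, bucket, and hyperparameters are placeholders; because Spot capacity can be interrupted, a checkpoint location and a wait window are set:

from sagemaker.huggingface import HuggingFace

role = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # placeholder role ARN
hyperparameters = {'epochs': 3, 'train_batch_size': 4, 'learning_rate': 2e-5,
                   'model_name': 'meta-llama/Llama-2-7b-hf', 'fp16': True}

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26',
    pytorch_version='1.13',
    py_version='py39',
    hyperparameters=hyperparameters,
    use_spot_instances=True,   # request Spot capacity (often substantially cheaper)
    max_run=4 * 60 * 60,       # maximum training time in seconds
    max_wait=8 * 60 * 60,      # total time allowed, including waiting for Spot capacity
    checkpoint_s3_uri='s3://my-training-bucket/checkpoints/'  # placeholder bucket
)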
Australian Compliance
Data Privacy
Privacy Act Considerations:
def anonymise_training_data(examples: list) -> list:
"""Remove PII from training data."""
import re
def redact_pii(text: str) -> str:
# Redact email addresses
text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', text)
# Redact phone numbers
text = re.sub(r'\b(?:\+?61|0)[2-478](?:[ -]?[0-9]){8}\b', '[PHONE]', text)
# Redact names (basic approach, consider NER for better results)
text = re.sub(r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', '[NAME]', text)
return text
anonymised = []
for example in examples:
anonymised.append({
'prompt': redact_pii(example['prompt']),
'completion': redact_pii(example['completion'])
})
return anonymised
Model Governance
Documentation template:
model_card = {
"model_details": {
"name": "Australian Customer Support Assistant",
"version": "1.0",
"base_model": "anthropic.claude-v2",
"fine_tuned_on": "2025-11-30",
"owner": "Customer Support Team"
},
"intended_use": {
"primary_use": "Automated customer support responses",
"out_of_scope": "Medical advice, legal advice, financial recommendations"
},
"training_data": {
"source": "Historical support tickets (anonymised)",
"size": "5,000 examples",
"date_range": "2024-01-01 to 2025-10-31",
"pii_handling": "All PII removed before training"
},
"evaluation": {
"test_accuracy": "92%",
"human_evaluation_score": "4.2/5",
"last_evaluated": "2025-11-30"
},
"australian_compliance": {
"privacy_act": "Compliant - No PII in training data",
"data_sovereignty": "Model trained and hosted in ap-southeast-2",
"retention_policy": "Training data deleted after 90 days"
}
}
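To keep the documentation with the model artefacts, the card can be stored next to the training outputs; a minimal sketch (bucket and key names are placeholders):

import json
import boto3

# Store the model card alongside the model artefacts.
s3 = boto3.client('s3', region_name='ap-southeast-2')
s3.put_object(
    Bucket='my-bedrock-training-bucket',
    Key='model-cards/australian-support-assistant-v1.json',
    Body=json.dumps(model_card, indent=2),
    ServerSideEncryption='AES256'
)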
Best Practices
1. Start Small
# Begin with small dataset
initial_examples = 100
# Evaluate results
# If performance good, deploy
# If not, add more examples incrementally
2. Quality Over Quantity
Focus on high-quality, diverse examples that represent real use cases.
3. Continuous Evaluation
# Monitor model performance in production
# Collect examples where model fails
# Retrain periodically with new data
4. Version Control
# Track model versions
# Document changes between versions
# Maintain rollback capability
5. Cost Monitoring
# Set CloudWatch alarms for training costs
# Monitor inference costs
# Review and optimise quarterly
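For point 5, one option is an AWS Budgets alert scoped to SageMaker spend; a sketch follows, in which the account ID, budget amount, and email address are placeholders and the cost filter follows the AWS Budgets API conventions:

import boto3

# Monthly AWS Budget for SageMaker spend with an alert at 80% of the limit.
budgets = boto3.client('budgets', region_name='us-east-1')  # Budgets is served from us-east-1
budgets.create_budget(
    AccountId='123456789012',
    Budget={
        'BudgetName': 'sagemaker-finetuning-monthly',
        'BudgetLimit': {'Amount': '500', 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
        'CostFilters': {'Service': ['Amazon SageMaker']}
    },
    NotificationsWithSubscribers=[{
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 80.0,
            'ThresholdType': 'PERCENTAGE'
        },
        'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': 'finops@example.com'}]
    }]
)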
Conclusion
Fine-tuning foundation models on AWS enables Australian businesses to create AI systems tailored to their specific needs, industry terminology, and local context. Whether using Bedrock for simplicity or SageMaker for advanced control, fine-tuning provides a middle ground between generic models and building from scratch.
Key takeaways:
- Fine-tune when you need domain expertise or specific output formats
- Use RAG when you need factual, updatable information
- Start with high-quality data, even if small
- Evaluate thoroughly before production deployment
- Monitor costs and performance continuously
CloudPoint helps Australian businesses implement fine-tuning strategies, from data preparation to production deployment. We ensure your fine-tuned models meet performance requirements while maintaining compliance with Australian regulations.
Contact us for a fine-tuning implementation consultation and build AI that understands your business.
Need Help Fine-Tuning Models?
CloudPoint helps Australian businesses fine-tune and deploy custom AI models on AWS. Get in touch to discuss your requirements.