AWS AI Services
Fine-Tuning Foundation Models on AWS: Complete Implementation Guide
Practical guide to fine-tuning foundation models on AWS using Amazon Bedrock and SageMaker for Australian businesses, covering data preparation, training, evaluation, and deployment.
CloudPoint Team
Fine-tuning foundation models allows you to customise pre-trained AI models for your specific use case, domain, or industry. For Australian businesses, fine-tuning enables AI systems that understand local context, terminology, and compliance requirements while maintaining the power of foundation models. This guide covers practical fine-tuning strategies using AWS services.
What is Fine-Tuning?
Fine-tuning is the process of taking a pre-trained foundation model and training it further on your specific data. Unlike training from scratch, fine-tuning preserves the model’s general knowledge while adapting it to your domain.
Fine-Tuning vs Training from Scratch
Training from Scratch:
- Requires massive datasets (billions of tokens)
- Costs millions of dollars
- Takes months to complete
- Needs extensive ML expertise
- High computational requirements
Fine-Tuning:
- Requires modest datasets (hundreds to thousands of examples)
- Costs hundreds to thousands of dollars
- Takes hours to days
- Moderate ML expertise needed
- Reasonable computational requirements
Fine-Tuning vs RAG
RAG (Retrieval Augmented Generation):
- Adds external knowledge at inference time
- No model training required
- Quick to implement
- Good for factual information
- Higher latency per request
- Ongoing retrieval costs
Fine-Tuning:
- Embeds knowledge in model weights
- Requires training process
- Longer implementation time
- Good for style, format, reasoning patterns
- Lower inference latency
- One-time training cost
When to Fine-Tune
Ideal Use Cases
1. Domain-Specific Terminology:
# Example: Australian legal terminology
training_examples = [
{
"prompt": "What is a deed of company arrangement?",
"completion": "A Deed of Company Arrangement (DOCA) is a binding agreement between a company and its creditors governing how the company's affairs will be dealt with. It's an alternative to liquidation under Australian insolvency law, allowing a company to continue trading while addressing its debt obligations."
},
{
"prompt": "Explain fair work entitlements",
"completion": "Under the Fair Work Act 2009, Australian employees are entitled to minimum conditions including: National Employment Standards (NES), minimum wage, annual leave, personal/carer's leave, parental leave, notice of termination, and protection from unfair dismissal. Award and enterprise agreement conditions may provide additional entitlements."
}
]
2. Specific Output Formats:
# Example: Structured data extraction
training_examples = [
{
"prompt": "Extract invoice details: Invoice #INV-2025-001, Date: 15/01/2025, Total: $1,234.56 AUD",
"completion": '{"invoice_number": "INV-2025-001", "date": "2025-01-15", "total_aud": 1234.56, "currency": "AUD"}'
}
]
3. Brand Voice and Style:
# Example: Company-specific communication style
training_examples = [
{
"prompt": "Draft welcome email for new Australian customer",
"completion": "G'day and welcome to [Company]!\n\nWe're thrilled to have you on board. As a proudly Australian company, we're committed to delivering exceptional service and supporting local businesses...\n\nCheers,\nThe [Company] Team"
}
]
4. Industry-Specific Reasoning:
- Healthcare diagnostics
- Financial risk assessment
- Legal contract analysis
- Technical support
- Regulatory compliance
When NOT to Fine-Tune
Use RAG instead when:
- Need to update information frequently
- Working with factual knowledge bases
- Information changes regularly
- Don’t have training data
- Need explainable sources
Use prompt engineering when:
- Base model already performs well
- Need quick iteration
- Limited training data
- Behaviour can be specified in prompts
Fine-Tuning on Amazon Bedrock
Supported Models
Amazon Bedrock supports fine-tuning for:
- Amazon Titan Text models
- Cohere Command models
- Meta Llama 2 models
Data Preparation
Training data format (JSONL):
import json
def prepare_bedrock_training_data(examples: list) -> str:
"""
Prepare training data in Bedrock format.
Each line is a JSON object with prompt and completion.
"""
training_data = []
for example in examples:
# Bedrock format
training_record = {
"prompt": example["prompt"],
"completion": example["completion"]
}
training_data.append(json.dumps(training_record))
# Write to JSONL file
output_file = "bedrock_training_data.jsonl"
with open(output_file, 'w') as f:
f.write('\n'.join(training_data))
return output_file
# Example: Australian property listing generation
examples = [
{
"prompt": "Create listing for: 3 bed, 2 bath house, Bondi, Sydney, $2.5M",
"completion": """Stunning Bondi Beach Residence
This beautifully appointed 3-bedroom, 2-bathroom home epitomises coastal living in one of Sydney's most sought-after locations. Just moments from Bondi Beach, this property offers the perfect blend of lifestyle and luxury.
Features:
• 3 spacious bedrooms with built-in wardrobes
• 2 modern bathrooms with quality fixtures
• Open-plan living and dining
• Designer kitchen with premium appliances
• Private outdoor entertaining area
• Walking distance to beach, cafes, and transport
Price: $2,500,000
Contact: [Agent details]"""
},
{
"prompt": "Create listing for: 2 bed apartment, Melbourne CBD, $850K",
"completion": """Contemporary Melbourne CBD Apartment
Embrace urban sophistication in this modern 2-bedroom apartment located in the heart of Melbourne's CBD. Perfect for professionals or investors seeking premium city living.
Features:
• 2 generous bedrooms with mirrored built-ins
• Designer bathroom with floor-to-ceiling tiles
• Open-plan living with city views
• European laundry facilities
• Secure car space
• Building amenities: gym, concierge, rooftop terrace
Price: $850,000
Contact: [Agent details]"""
}
]
# Prepare training data
training_file = prepare_bedrock_training_data(examples)
Data Quality Guidelines
Minimum requirements:
- 32 examples minimum (more is better)
- 500+ examples recommended for good results
- Balanced representation of use cases
- High-quality, human-written completions
- Consistent formatting
Data validation:
import json
from collections import Counter
def validate_training_data(jsonl_file: str) -> dict:
"""Validate Bedrock training data quality."""
issues = []
stats = {
'total_examples': 0,
'avg_prompt_length': 0,
'avg_completion_length': 0,
'prompt_lengths': [],
'completion_lengths': []
}
with open(jsonl_file, 'r') as f:
for line_num, line in enumerate(f, 1):
try:
record = json.loads(line)
# Check required fields
if 'prompt' not in record or 'completion' not in record:
issues.append(f"Line {line_num}: Missing prompt or completion")
continue
prompt_len = len(record['prompt'])
completion_len = len(record['completion'])
# Check lengths
if prompt_len < 10:
issues.append(f"Line {line_num}: Prompt too short ({prompt_len} chars)")
if completion_len < 20:
issues.append(f"Line {line_num}: Completion too short ({completion_len} chars)")
if completion_len > 4000:
issues.append(f"Line {line_num}: Completion too long ({completion_len} chars)")
stats['prompt_lengths'].append(prompt_len)
stats['completion_lengths'].append(completion_len)
stats['total_examples'] += 1
except json.JSONDecodeError:
issues.append(f"Line {line_num}: Invalid JSON")
# Calculate stats
if stats['total_examples'] > 0:
stats['avg_prompt_length'] = sum(stats['prompt_lengths']) / stats['total_examples']
stats['avg_completion_length'] = sum(stats['completion_lengths']) / stats['total_examples']
return {
'valid': len(issues) == 0,
'issues': issues,
'stats': stats
}
# Validate data
validation_result = validate_training_data('bedrock_training_data.jsonl')
if validation_result['valid']:
print(f"✓ Data valid: {validation_result['stats']['total_examples']} examples")
else:
print(f"✗ Found {len(validation_result['issues'])} issues:")
for issue in validation_result['issues']:
print(f" - {issue}")
Creating Fine-Tuning Job
import boto3
import json
from datetime import datetime
class BedrockFineTuner:
def __init__(self, region='ap-southeast-2'):
self.bedrock = boto3.client('bedrock', region_name=region)
self.s3 = boto3.client('s3', region_name=region)
self.region = region
def upload_training_data(self, local_file: str, bucket: str) -> str:
"""Upload training data to S3."""
s3_key = f"bedrock-training/{datetime.now().strftime('%Y%m%d')}/{local_file}"
self.s3.upload_file(
local_file,
bucket,
s3_key,
ExtraArgs={'ServerSideEncryption': 'AES256'}
)
return f"s3://{bucket}/{s3_key}"
def create_fine_tuning_job(
self,
job_name: str,
base_model_id: str,
training_data_s3: str,
output_s3: str,
hyperparameters: dict = None
) -> str:
"""Create Bedrock fine-tuning job."""
if hyperparameters is None:
hyperparameters = {
'epochCount': '3',
'batchSize': '4',
'learningRate': '0.00001'
}
response = self.bedrock.create_model_customization_job(
jobName=job_name,
customModelName=f"{job_name}-model",
roleArn=self._get_bedrock_role_arn(),  # IAM service role Bedrock assumes to read training data and write outputs (helper sketched below)
baseModelIdentifier=base_model_id,
trainingDataConfig={
's3Uri': training_data_s3
},
outputDataConfig={
's3Uri': output_s3
},
hyperParameters=hyperparameters
)
return response['jobArn']
def monitor_training_job(self, job_arn: str) -> dict:
"""Monitor fine-tuning job progress."""
response = self.bedrock.get_model_customization_job(
jobIdentifier=job_arn
)
return {
'status': response['status'],
'failure_message': response.get('failureMessage'),
'output_model_arn': response.get('outputModelArn')
}
# Usage
tuner = BedrockFineTuner()
# Upload training data
training_s3 = tuner.upload_training_data(
'bedrock_training_data.jsonl',
'my-bedrock-training-bucket'
)
# Create fine-tuning job
job_arn = tuner.create_fine_tuning_job(
job_name='australian-property-listings',
base_model_id='amazon.titan-text-express-v1',
training_data_s3=training_s3,
output_s3='s3://my-bedrock-training-bucket/outputs/',
hyperparameters={
'epochCount': '5',
'batchSize': '8',
'learningRate': '0.00001'
}
)
print(f"Fine-tuning job created: {job_arn}")
# Monitor progress
status = tuner.monitor_training_job(job_arn)
print(f"Job status: {status['status']}")
Using Fine-Tuned Model
def invoke_fine_tuned_model(model_arn: str, prompt: str) -> str:
"""Use fine-tuned Bedrock model for inference."""
bedrock_runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')
response = bedrock_runtime.invoke_model(
modelId=model_arn, # Use fine-tuned model ARN
body=json.dumps({
"inputText": prompt,
"textGenerationConfig": {
"maxTokenCount": 1000,
"temperature": 0.7,
"topP": 0.9
}
})
)
result = json.loads(response['body'].read())
return result['results'][0]['outputText']
# Usage
fine_tuned_model_arn = "arn:aws:bedrock:ap-southeast-2:123456789012:custom-model/..."
listing = invoke_fine_tuned_model(
fine_tuned_model_arn,
"Create listing for: 4 bed house, Brisbane, pool, $1.2M"
)
print(listing)
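Note that, at the time of writing, custom Bedrock models generally require Provisioned Throughput before they can be invoked: you purchase capacity for the custom model and call it via the provisioned model ARN. A minimal sketch, with illustrative model units and naming:

# Purchase Provisioned Throughput for the custom model, then invoke using the
# provisioned model ARN rather than the raw custom-model ARN.
bedrock = boto3.client('bedrock', region_name='ap-southeast-2')

provisioned = bedrock.create_provisioned_model_throughput(
    modelUnits=1,  # smallest commitment; size to your expected traffic
    provisionedModelName='australian-property-listings-pt',
    modelId=fine_tuned_model_arn
)

listing = invoke_fine_tuned_model(
    provisioned['provisionedModelArn'],
    "Create listing for: 4 bed house, Brisbane, pool, $1.2M"
)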
Fine-Tuning on SageMaker
Advantages of SageMaker Fine-Tuning
- Support for more model types
- Fine-grained control over training
- Custom loss functions and metrics
- Advanced hyperparameter tuning
- Distributed training support
- Integration with MLOps pipelines
Preparing Training Data
Example: Fine-tuning for Australian customer support:
import pandas as pd
import boto3
from sagemaker.huggingface import HuggingFace
from sagemaker import Session
class SageMakerFineTuner:
def __init__(self):
self.session = Session()
self.role = self._get_sagemaker_role()
self.bucket = self.session.default_bucket()
def prepare_dataset(self, examples: list) -> tuple:
"""Prepare dataset for Hugging Face fine-tuning."""
# Convert to DataFrame
df = pd.DataFrame(examples)
# Split train/validation
train_df = df.sample(frac=0.9, random_state=42)
val_df = df.drop(train_df.index)
# Save as CSV
train_df.to_csv('train.csv', index=False)
val_df.to_csv('validation.csv', index=False)
# Upload to S3
train_s3 = self._upload_to_s3('train.csv', 'training/train.csv')
val_s3 = self._upload_to_s3('validation.csv', 'training/validation.csv')
return train_s3, val_s3
def fine_tune_model(
self,
base_model: str,
train_s3: str,
val_s3: str,
hyperparameters: dict = None
) -> HuggingFace:
"""Fine-tune model using SageMaker."""
if hyperparameters is None:
hyperparameters = {
'epochs': 3,
'train_batch_size': 4,
'eval_batch_size': 4,
'learning_rate': 2e-5,
'model_name': base_model,
'fp16': True
}
# Configure training job
huggingface_estimator = HuggingFace(
entry_point='train.py',
source_dir='./scripts',
instance_type='ml.p3.2xlarge',  # single-GPU example; full fine-tuning of a 7B model typically needs a larger instance or a parameter-efficient method such as LoRA
instance_count=1,
role=self.role,
transformers_version='4.26',
pytorch_version='1.13',
py_version='py39',
hyperparameters=hyperparameters,
output_path=f's3://{self.bucket}/outputs',
base_job_name='australian-support-finetuning'
)
# Start training
huggingface_estimator.fit({
'train': train_s3,
'validation': val_s3
})
return huggingface_estimator
# Example training data for Australian customer support
support_examples = [
{
"input": "Customer: How do I return a product?",
"output": "To return a product, please follow these steps:\n\n1. Log into your account at [website]\n2. Navigate to 'My Orders'\n3. Select the order containing the item to return\n4. Click 'Return Item' and select a reason\n5. Print the prepaid Australia Post return label\n6. Drop off at any Australia Post location\n\nRefunds are processed within 5-7 business days of receiving your return. For more information, visit our Returns Policy or contact our Australian support team on 1300 XXX XXX."
},
{
"input": "Customer: What are your delivery times to regional Australia?",
"output": "Delivery times to regional Australia vary by location:\n\n• Metro areas (Sydney, Melbourne, Brisbane): 2-3 business days\n• Regional cities: 3-5 business days\n• Remote areas: 5-10 business days\n\nWe use Australia Post and StarTrack for regional deliveries. You'll receive tracking information via email once your order ships. For urgent orders to regional areas, express shipping is available at checkout.\n\nNote: Delivery times exclude weekends and public holidays."
},
{
"input": "Customer: Do you charge GST?",
"output": "Yes, all prices on our website include GST (Goods and Services Tax) at the current rate of 10%, as required by Australian law.\n\nYour tax invoice will clearly show:\n• Base price (excluding GST)\n• GST amount\n• Total price (including GST)\n\nFor business customers, the tax invoice serves as proof of GST paid for your tax returns. If you need a copy of your tax invoice, you can download it from your account or contact our support team."
}
]
# Prepare and fine-tune
tuner = SageMakerFineTuner()
train_s3, val_s3 = tuner.prepare_dataset(support_examples)
estimator = tuner.fine_tune_model(
base_model='meta-llama/Llama-2-7b-hf',
train_s3=train_s3,
val_s3=val_s3
)
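SageMakerFineTuner above assumes two helpers, `_get_sagemaker_role` and `_upload_to_s3`. Minimal sketches follow, assuming the code runs somewhere `sagemaker.get_execution_role()` resolves (SageMaker Studio or a notebook instance; otherwise pass your role ARN explicitly):

# Helper methods assumed by SageMakerFineTuner (add them to the class).
import sagemaker

    def _get_sagemaker_role(self) -> str:
        """Return the SageMaker execution role used for training and deployment."""
        return sagemaker.get_execution_role()

    def _upload_to_s3(self, local_path: str, s3_key: str) -> str:
        """Upload a local file to the session's default bucket and return its S3 URI."""
        return self.session.upload_data(
            path=local_path,
            bucket=self.bucket,
            key_prefix=s3_key.rsplit('/', 1)[0]
        )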
Training Script
Create scripts/train.py:
import argparse
import os
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
TrainingArguments,
Trainer,
DataCollatorForLanguageModeling
)
from datasets import load_dataset
import torch
def parse_args():
parser = argparse.ArgumentParser()
# Hyperparameters
parser.add_argument('--epochs', type=int, default=3)
parser.add_argument('--train_batch_size', type=int, default=4)
parser.add_argument('--eval_batch_size', type=int, default=4)
parser.add_argument('--learning_rate', type=float, default=2e-5)
parser.add_argument('--model_name', type=str, default='gpt2')
parser.add_argument('--fp16', type=lambda v: str(v).lower() == 'true', default=True)  # type=bool would treat any non-empty string (including 'False') as True
# SageMaker parameters
parser.add_argument('--output_data_dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_VALIDATION'])
return parser.parse_args()
def prepare_data(tokenizer, train_path, val_path):
"""Load and tokenize training data."""
# Load datasets
dataset = load_dataset('csv', data_files={
'train': f'{train_path}/train.csv',
'validation': f'{val_path}/validation.csv'
})
def tokenize_function(examples):
# Combine input and output
texts = [
f"Input: {inp}\nOutput: {out}"
for inp, out in zip(examples['input'], examples['output'])
]
return tokenizer(
texts,
truncation=True,
padding='max_length',
max_length=512
)
tokenized_datasets = dataset.map(
tokenize_function,
batched=True,
remove_columns=dataset['train'].column_names
)
return tokenized_datasets
def train(args):
"""Main training function."""
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(args.model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(args.model_name)
# Prepare data
datasets = prepare_data(tokenizer, args.train, args.validation)
# Data collator
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False
)
# Training arguments
training_args = TrainingArguments(
output_dir=args.output_data_dir,
num_train_epochs=args.epochs,
per_device_train_batch_size=args.train_batch_size,
per_device_eval_batch_size=args.eval_batch_size,
learning_rate=args.learning_rate,
fp16=args.fp16,
evaluation_strategy='epoch',
save_strategy='epoch',
load_best_model_at_end=True,
logging_dir=f'{args.output_data_dir}/logs',
logging_steps=10,
report_to=['tensorboard']
)
# Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=datasets['train'],
eval_dataset=datasets['validation'],
data_collator=data_collator
)
# Train
trainer.train()
# Save model
trainer.save_model(args.model_dir)
tokenizer.save_pretrained(args.model_dir)
if __name__ == '__main__':
args = parse_args()
train(args)
Deploying Fine-Tuned Model
from sagemaker.huggingface import HuggingFaceModel
def deploy_fine_tuned_model(estimator) -> str:
"""Deploy fine-tuned model to SageMaker endpoint."""
# Create HuggingFace model from training job
huggingface_model = HuggingFaceModel(
model_data=estimator.model_data,
role=estimator.role,
transformers_version='4.26',
pytorch_version='1.13',
py_version='py39',
entry_point='inference.py',
source_dir='./scripts'
)
# Deploy to endpoint
predictor = huggingface_model.deploy(
initial_instance_count=1,
instance_type='ml.g4dn.xlarge', # GPU instance for inference
endpoint_name='australian-support-endpoint'
)
return predictor.endpoint_name
# Deploy
endpoint_name = deploy_fine_tuned_model(estimator)
print(f"Model deployed to: {endpoint_name}")
Using Deployed Model
import boto3
import json
def get_support_response(query: str, endpoint_name: str) -> str:
"""Get response from fine-tuned support model."""
runtime = boto3.client('sagemaker-runtime', region_name='ap-southeast-2')
response = runtime.invoke_endpoint(
EndpointName=endpoint_name,
ContentType='application/json',
Body=json.dumps({
'inputs': f"Input: {query}\nOutput:",
'parameters': {
'max_length': 300,
'temperature': 0.7,
'top_p': 0.9
}
})
)
result = json.loads(response['Body'].read())
return result[0]['generated_text']
# Usage
response = get_support_response(
"What is your shipping policy for Tasmania?",
"australian-support-endpoint"
)
print(response)
Evaluation and Testing
Evaluation Metrics
import json
import boto3
import numpy as np
import pandas as pd
class ModelEvaluator:
def __init__(self, model_endpoint: str):
self.endpoint = model_endpoint
self.runtime = boto3.client('sagemaker-runtime', region_name='ap-southeast-2')
def evaluate_model(self, test_examples: list) -> dict:
"""Evaluate fine-tuned model on test set."""
predictions = []
ground_truth = []
for example in test_examples:
# Get prediction
prediction = self._get_prediction(example['input'])
predictions.append(prediction)
ground_truth.append(example['output'])
# Calculate metrics
metrics = {
'exact_match': self._calculate_exact_match(predictions, ground_truth),
'bleu_score': self._calculate_bleu(predictions, ground_truth),
'rouge_scores': self._calculate_rouge(predictions, ground_truth),
'semantic_similarity': self._calculate_semantic_similarity(predictions, ground_truth)
}
return metrics
def _calculate_exact_match(self, predictions: list, ground_truth: list) -> float:
"""Calculate exact match accuracy."""
matches = sum(p.strip() == gt.strip() for p, gt in zip(predictions, ground_truth))
return matches / len(predictions)
def _calculate_bleu(self, predictions: list, ground_truth: list) -> float:
"""Calculate BLEU score for text generation quality."""
from nltk.translate.bleu_score import sentence_bleu
scores = []
for pred, ref in zip(predictions, ground_truth):
score = sentence_bleu([ref.split()], pred.split())
scores.append(score)
return np.mean(scores)
def human_evaluation_template(self, test_examples: list, predictions: list, ground_truth: list) -> pd.DataFrame:
"""Create template for human evaluation."""
eval_df = pd.DataFrame({
'input': [ex['input'] for ex in test_examples],
'model_output': predictions,
'expected_output': ground_truth,
'relevance_score': '', # 1-5
'accuracy_score': '', # 1-5
'australian_context': '', # 1-5
'tone_appropriateness': '', # 1-5
'comments': ''
})
eval_df.to_csv('human_evaluation_template.csv', index=False)
return eval_df
# Evaluate
evaluator = ModelEvaluator('australian-support-endpoint')
test_examples = [
{
"input": "How long does shipping take to Perth?",
"output": "Shipping to Perth typically takes 3-4 business days from our Sydney warehouse using Australia Post Express. For standard shipping, please allow 5-7 business days. Tracking information will be emailed once your order ships."
}
]
metrics = evaluator.evaluate_model(test_examples)
print(f"Model performance: {metrics}")
A/B Testing
class ABTestManager:
def __init__(self, base_model: str, fine_tuned_model: str):
self.base_model = base_model
self.fine_tuned_model = fine_tuned_model
self.runtime = boto3.client('bedrock-runtime', region_name='ap-southeast-2')
def route_request(self, user_id: str, query: str) -> dict:
"""Route request between base and fine-tuned model for A/B testing."""
# Route 50% to each model based on user_id hash
use_fine_tuned = hash(user_id) % 2 == 0
if use_fine_tuned:
response = self._invoke_model(self.fine_tuned_model, query)
variant = 'fine_tuned'
else:
response = self._invoke_model(self.base_model, query)
variant = 'base'
# Log for analysis
self._log_experiment(user_id, query, response, variant)
return {
'response': response,
'variant': variant
}
def analyse_results(self) -> dict:
"""Analyse A/B test results."""
# Query logs from CloudWatch or database
results = self._get_experiment_logs()
analysis = {
'base_model': {
'avg_response_time': self._calculate_avg_latency(results, 'base'),
'user_satisfaction': self._calculate_satisfaction(results, 'base'),
'task_completion': self._calculate_completion_rate(results, 'base')
},
'fine_tuned_model': {
'avg_response_time': self._calculate_avg_latency(results, 'fine_tuned'),
'user_satisfaction': self._calculate_satisfaction(results, 'fine_tuned'),
'task_completion': self._calculate_completion_rate(results, 'fine_tuned')
}
}
return analysis
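ABTestManager likewise assumes private helpers for invocation and logging. Sketches of `_invoke_model` (Titan-style request body; adjust for other model families) and `_log_experiment` (a structured log line; in production you would write to CloudWatch Logs or DynamoDB) might look like this, while the analysis helpers depend on where the logs are stored:

# Sketches of two ABTestManager helpers (add them to the class).
import logging

logger = logging.getLogger('ab_test')

    def _invoke_model(self, model_id: str, query: str) -> str:
        """Invoke a model variant through the Bedrock runtime (Titan-style body)."""
        response = self.runtime.invoke_model(
            modelId=model_id,
            body=json.dumps({
                'inputText': query,
                'textGenerationConfig': {'maxTokenCount': 500, 'temperature': 0.7}
            })
        )
        return json.loads(response['body'].read())['results'][0]['outputText']

    def _log_experiment(self, user_id: str, query: str, response: str, variant: str) -> None:
        """Record the experiment event for later analysis."""
        logger.info(json.dumps({
            'user_id': user_id,
            'variant': variant,
            'query': query,
            'response_length': len(response)
        }))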
Cost Optimisation
Bedrock Fine-Tuning Costs
def calculate_bedrock_finetuning_cost(
num_tokens: int,
num_epochs: int,
model_storage_months: int = 12
) -> dict:
"""
Calculate Bedrock fine-tuning costs (example prices in AUD).
Prices vary by model and region. Check AWS pricing for current rates.
"""
# Example pricing (convert USD to AUD, ~1.5x)
training_cost_per_token = 0.000012 * 1.5 # Per token
storage_cost_per_month = 2.25 * 1.5 # Per model per month
training_cost = (num_tokens * num_epochs * training_cost_per_token)
storage_cost = storage_cost_per_month * model_storage_months
return {
'training_cost_aud': round(training_cost, 2),
'storage_cost_aud': round(storage_cost, 2),
'total_cost_aud': round(training_cost + storage_cost, 2)
}
# Example: 1M tokens, 3 epochs, 12 months storage
cost = calculate_bedrock_finetuning_cost(1_000_000, 3, 12)
print(f"Total cost: ${cost['total_cost_aud']} AUD")
SageMaker Fine-Tuning Costs
def calculate_sagemaker_finetuning_cost(
instance_type: str,
training_hours: float,
endpoint_instance_type: str,
endpoint_hours_per_month: int = 730
) -> dict:
"""Calculate SageMaker fine-tuning and deployment costs."""
# Prices in AUD per hour (ap-southeast-2)
training_prices = {
'ml.p3.2xlarge': 4.862, # GPU for training
'ml.p3.8xlarge': 19.448,
'ml.g4dn.xlarge': 0.877
}
inference_prices = {
'ml.t2.medium': 0.065,
'ml.m5.large': 0.134,
'ml.g4dn.xlarge': 0.877
}
training_cost = training_prices[instance_type] * training_hours
inference_cost = inference_prices[endpoint_instance_type] * endpoint_hours_per_month
return {
'training_cost_aud': round(training_cost, 2),
'monthly_inference_cost_aud': round(inference_cost, 2),
'annual_cost_aud': round(training_cost + (inference_cost * 12), 2)
}
# Example: 4 hours training on p3.2xlarge, deploy on g4dn.xlarge
cost = calculate_sagemaker_finetuning_cost(
'ml.p3.2xlarge',
4,
'ml.g4dn.xlarge'
)
print(f"Training: ${cost['training_cost_aud']} AUD")
print(f"Monthly inference: ${cost['monthly_inference_cost_aud']} AUD")
print(f"Annual total: ${cost['annual_cost_aud']} AUD")
Cost Reduction Strategies
1. Data Efficiency:
# Use fewer, higher-quality examples
# 500 great examples > 5000 mediocre examples
2. Hyperparameter Optimisation:
# Reduce epochs if validation loss plateaus early
# Start with 3 epochs, increase only if needed
3. Instance Selection:
# Training: Use managed Spot instances for cost savings (see the sketch after this list)
# Inference: Right-size based on traffic patterns
4. Model Pruning:
# Use smaller base models when possible
# Fine-tune 7B model instead of 70B if performance is acceptable
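To illustrate point 3, a managed Spot training sketch for the HuggingFace estimator used earlier. The role ARN, bucket, and hyperparameters are placeholders; because Spot capacity can be interrupted, a checkpoint location and a wait window are set:

from sagemaker.huggingface import HuggingFace

role = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # placeholder role ARN
hyperparameters = {'epochs': 3, 'train_batch_size': 4, 'learning_rate': 2e-5,
                   'model_name': 'meta-llama/Llama-2-7b-hf', 'fp16': True}

huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.26',
    pytorch_version='1.13',
    py_version='py39',
    hyperparameters=hyperparameters,
    use_spot_instances=True,   # request Spot capacity (often substantially cheaper)
    max_run=4 * 60 * 60,       # maximum training time in seconds
    max_wait=8 * 60 * 60,      # total time allowed, including waiting for Spot capacity
    checkpoint_s3_uri='s3://my-training-bucket/checkpoints/'  # placeholder bucket
)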
Australian Compliance
Data Privacy
Privacy Act Considerations:
def anonymise_training_data(examples: list) -> list:
"""Remove PII from training data."""
import re
def redact_pii(text: str) -> str:
# Redact email addresses
text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]', text)
# Redact phone numbers
text = re.sub(r'\b(?:\+?61|0)[2-478](?:[ -]?[0-9]){8}\b', '[PHONE]', text)
# Redact names (basic approach, consider NER for better results)
text = re.sub(r'\b[A-Z][a-z]+ [A-Z][a-z]+\b', '[NAME]', text)
return text
anonymised = []
for example in examples:
anonymised.append({
'prompt': redact_pii(example['prompt']),
'completion': redact_pii(example['completion'])
})
return anonymised
Model Governance
Documentation template:
model_card = {
"model_details": {
"name": "Australian Customer Support Assistant",
"version": "1.0",
"base_model": "anthropic.claude-v2",
"fine_tuned_on": "2025-11-30",
"owner": "Customer Support Team"
},
"intended_use": {
"primary_use": "Automated customer support responses",
"out_of_scope": "Medical advice, legal advice, financial recommendations"
},
"training_data": {
"source": "Historical support tickets (anonymised)",
"size": "5,000 examples",
"date_range": "2024-01-01 to 2025-10-31",
"pii_handling": "All PII removed before training"
},
"evaluation": {
"test_accuracy": "92%",
"human_evaluation_score": "4.2/5",
"last_evaluated": "2025-11-30"
},
"australian_compliance": {
"privacy_act": "Compliant - No PII in training data",
"data_sovereignty": "Model trained and hosted in ap-southeast-2",
"retention_policy": "Training data deleted after 90 days"
}
}
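To keep the documentation with the model artefacts, the card can be stored next to the training outputs; a minimal sketch (bucket and key names are placeholders):

import json
import boto3

# Store the model card alongside the model artefacts.
s3 = boto3.client('s3', region_name='ap-southeast-2')
s3.put_object(
    Bucket='my-bedrock-training-bucket',
    Key='model-cards/australian-support-assistant-v1.json',
    Body=json.dumps(model_card, indent=2),
    ServerSideEncryption='AES256'
)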
Best Practices
1. Start Small
# Begin with small dataset
initial_examples = 100
# Evaluate results
# If performance good, deploy
# If not, add more examples incrementally
2. Quality Over Quantity
Focus on high-quality, diverse examples that represent real use cases.
3. Continuous Evaluation
# Monitor model performance in production
# Collect examples where model fails
# Retrain periodically with new data
4. Version Control
# Track model versions
# Document changes between versions
# Maintain rollback capability
5. Cost Monitoring
# Set CloudWatch alarms for training costs
# Monitor inference costs
# Review and optimise quarterly
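For point 5, one option is an AWS Budgets alert scoped to SageMaker spend; a sketch follows, in which the account ID, budget amount, and email address are placeholders and the cost filter follows the AWS Budgets API conventions:

import boto3

# Monthly AWS Budget for SageMaker spend with an alert at 80% of the limit.
budgets = boto3.client('budgets', region_name='us-east-1')  # Budgets is served from us-east-1
budgets.create_budget(
    AccountId='123456789012',
    Budget={
        'BudgetName': 'sagemaker-finetuning-monthly',
        'BudgetLimit': {'Amount': '500', 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
        'CostFilters': {'Service': ['Amazon SageMaker']}
    },
    NotificationsWithSubscribers=[{
        'Notification': {
            'NotificationType': 'ACTUAL',
            'ComparisonOperator': 'GREATER_THAN',
            'Threshold': 80.0,
            'ThresholdType': 'PERCENTAGE'
        },
        'Subscribers': [{'SubscriptionType': 'EMAIL', 'Address': 'finops@example.com'}]
    }]
)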
Conclusion
Fine-tuning foundation models on AWS enables Australian businesses to create AI systems tailored to their specific needs, industry terminology, and local context. Whether using Bedrock for simplicity or SageMaker for advanced control, fine-tuning provides a middle ground between generic models and building from scratch.
Key takeaways:
- Fine-tune when you need domain expertise or specific output formats
- Use RAG when you need factual, updatable information
- Start with high-quality data, even if small
- Evaluate thoroughly before production deployment
- Monitor costs and performance continuously
CloudPoint helps Australian businesses implement fine-tuning strategies, from data preparation to production deployment. We ensure your fine-tuned models meet performance requirements while maintaining compliance with Australian regulations.
Contact us for a fine-tuning implementation consultation and build AI that understands your business.
Need Help Fine-Tuning Models?
CloudPoint helps Australian businesses fine-tune and deploy custom AI models on AWS. Get in touch to discuss your requirements.