Object Storage Integration Guide

Overview

This guide covers the integration of object storage (S3/MinIO compatible) into the backend document controller. With this integration, documents are stored in cloud object storage instead of on the local filesystem, which improves scalability, reliability, and cost efficiency.

Architecture

Components

  1. ObjectStorageService - High-level document storage abstraction
  2. StorageProvider - Abstract base class for storage implementations
  3. S3StorageProvider - AWS S3/MinIO compatible implementation
  4. LocalStorageProvider - Local filesystem implementation (fallback)
  5. ObjectStorageFactory - Factory for creating and managing storage providers
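
The relationship between these components can be sketched as follows. This is a minimal illustration, not the project's actual code: the `download_file` method, the `create` factory method, the registry, and the in-memory provider are all assumptions made for the example. Only the class names `StorageProvider` and `ObjectStorageFactory` and the `upload_file` method come from this guide.

```python
import asyncio
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Abstract interface every storage backend implements (assumed shape)."""

    @abstractmethod
    async def upload_file(self, path: str, data: bytes) -> str:
        """Store data under path and return the stored path."""

    @abstractmethod
    async def download_file(self, path: str) -> bytes:
        """Return the bytes stored under path."""


class InMemoryProvider(StorageProvider):
    """Hypothetical stand-in for S3StorageProvider / LocalStorageProvider."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    async def upload_file(self, path: str, data: bytes) -> str:
        self._objects[path] = data
        return path

    async def download_file(self, path: str) -> bytes:
        return self._objects[path]


class ObjectStorageFactory:
    """Maps a storage type string to a provider class (hypothetical registry)."""

    _registry: dict[str, type[StorageProvider]] = {"memory": InMemoryProvider}

    @classmethod
    def create(cls, storage_type: str) -> StorageProvider:
        try:
            return cls._registry[storage_type]()
        except KeyError:
            raise ValueError(f"Unknown storage type: {storage_type!r}") from None


async def demo() -> bytes:
    # Callers only see the StorageProvider interface, never the backend.
    provider = ObjectStorageFactory.create("memory")
    path = await provider.upload_file("documents/test.txt", b"test content")
    return await provider.download_file(path)
```

Because callers depend only on the abstract interface, switching between S3 and the local filesystem becomes a configuration change rather than a code change.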

Key Features

  • ✅ Abstract storage layer supporting multiple backends
  • ✅ Hybrid encryption: AES for documents + RSA for keys
  • ✅ Presigned URLs for secure file access (S3 only)
  • ✅ Hierarchical storage organization
  • ✅ Backward compatibility with local filesystem
  • ✅ Graceful fallback mechanisms
  • ✅ Migration utilities for existing documents

Configuration

Environment Variables

# Storage type: "s3" or "local"
STORAGE_TYPE=s3

# S3 Configuration
S3_ACCESS_KEY=your_access_key
S3_SECRET_KEY=your_secret_key
S3_BUCKET=bank-documents
S3_REGION=us-east-1
# Optional; set a custom endpoint when using MinIO
S3_ENDPOINT_URL=https://s3.amazonaws.com

# Local Storage Configuration (fallback)
LOCAL_STORAGE_PATH=encrypted_files/
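
One way to read and validate these variables at startup is sketched below. The `StorageSettings` class and `load_storage_settings` function are hypothetical names, and the defaults (falling back to "local" when STORAGE_TYPE is unset, "us-east-1" for the region) are assumptions for illustration, not documented behavior of the application.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    storage_type: str
    s3_access_key: Optional[str] = None
    s3_secret_key: Optional[str] = None
    s3_bucket: Optional[str] = None
    s3_region: str = "us-east-1"
    s3_endpoint_url: Optional[str] = None
    local_storage_path: str = "encrypted_files/"


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Build settings from environment variables and fail fast on gaps."""
    storage_type = env.get("STORAGE_TYPE", "local").strip().lower()
    if storage_type not in ("s3", "local"):
        raise ValueError(f"STORAGE_TYPE must be 's3' or 'local', got {storage_type!r}")

    settings = StorageSettings(
        storage_type=storage_type,
        s3_access_key=env.get("S3_ACCESS_KEY"),
        s3_secret_key=env.get("S3_SECRET_KEY"),
        s3_bucket=env.get("S3_BUCKET"),
        s3_region=env.get("S3_REGION", "us-east-1"),
        # An empty endpoint is treated the same as an unset one.
        s3_endpoint_url=env.get("S3_ENDPOINT_URL") or None,
        local_storage_path=env.get("LOCAL_STORAGE_PATH", "encrypted_files/"),
    )

    if storage_type == "s3":
        required = {
            "S3_ACCESS_KEY": settings.s3_access_key,
            "S3_SECRET_KEY": settings.s3_secret_key,
            "S3_BUCKET": settings.s3_bucket,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"Missing S3 settings: {', '.join(missing)}")
    return settings
```

Validating at startup surfaces a missing bucket or key immediately instead of on the first upload.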

.env Example

# For AWS S3
STORAGE_TYPE=s3
S3_ACCESS_KEY=AKIA1234567890ABCDEF
S3_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_BUCKET=bank-documents
S3_REGION=us-east-1

# For MinIO
STORAGE_TYPE=s3
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=bank-documents
S3_ENDPOINT_URL=http://minio:9000

# Fallback to local storage
STORAGE_TYPE=local
LOCAL_STORAGE_PATH=encrypted_files/

Database Schema Changes

New Columns on document.attachment

ALTER TABLE document.attachment ADD COLUMN storage_path VARCHAR(500);
ALTER TABLE document.attachment ADD COLUMN storage_type VARCHAR(50) DEFAULT 'local';

New Columns on document.attachment_staging

ALTER TABLE document.attachment_staging ADD COLUMN storage_path VARCHAR(500);
ALTER TABLE document.attachment_staging ADD COLUMN storage_type VARCHAR(50) DEFAULT 'local';
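
The effect of these statements on rows that already exist can be illustrated with an in-memory SQLite database. This is only a sketch: the production database is presumably not SQLite, the table is reduced to a hypothetical two-column version, and the schema prefix is dropped because SQLite has no `document` schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal attachment table holding one pre-migration row.
conn.execute("CREATE TABLE attachment (id INTEGER PRIMARY KEY, filepath TEXT)")
conn.execute("INSERT INTO attachment (filepath) VALUES ('encrypted_files/1.json')")

# Same column definitions as in the ALTER statements above.
conn.execute("ALTER TABLE attachment ADD COLUMN storage_path VARCHAR(500)")
conn.execute("ALTER TABLE attachment ADD COLUMN storage_type VARCHAR(50) DEFAULT 'local'")

# The existing row has no storage_path yet and reports the 'local' default.
row = conn.execute(
    "SELECT storage_path, storage_type FROM attachment WHERE id = 1"
).fetchone()
```

Existing rows therefore keep pointing at the local filesystem until a migration fills in `storage_path` and updates `storage_type`.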

Usage

Initialization in Application

from core.config import settings
from services.object_storage_service import init_storage_service

# Initialize storage service during application startup
config = {
    "STORAGE_TYPE": settings.STORAGE_TYPE,
    "S3_ACCESS_KEY": settings.S3_ACCESS_KEY,
    "S3_SECRET_KEY": settings.S3_SECRET_KEY,
    "S3_BUCKET": settings.S3_BUCKET,
    "S3_REGION": settings.S3_REGION,
    "S3_ENDPOINT_URL": settings.S3_ENDPOINT_URL,
    "LOCAL_STORAGE_PATH": settings.LOCAL_STORAGE_PATH,
}

storage_service = init_storage_service(config)

In Document Controller

Files are automatically stored in object storage when uploaded:

# Upload automatically uses object storage
@r.post("/file/add")
async def upload_file(request: CustomRequest, payload: FileUploadRequest):
    return await upload_staged_file(
        request.state.db_session, 
        request.state.current_user, 
        payload
    )

Direct Usage

from services.object_storage_service import get_storage_service

storage_service = get_storage_service()

# Store a document
storage_path = await storage_service.store_encrypted_document(
    file_id=123,
    filename="document.pdf",
    encrypted_data={"ciphertext": "...", "tag": "..."},  # remaining fields elided
    user_id=456
)

# Retrieve a document
encrypted_data = await storage_service.retrieve_encrypted_document(storage_path)

# Delete a document
await storage_service.delete_document(storage_path)

# Get presigned URL (S3 only)
url = storage_service.get_download_url(storage_path, expiration_hours=24)

File Organization

Storage Path Hierarchy

documents/
├── user_123/
│   ├── 2024/01/15/
│   │   ├── 1_passport.pdf_pdf.json
│   │   └── 2_license.pdf_pdf.json
│   └── 2024/01/16/
│       └── 3_visa.pdf_pdf.json
├── user_456/
│   └── 2024/01/15/
│       └── 4_certificate.pdf_pdf.json
└── system/
    └── 2024/01/15/
        └── 5_template.pdf_pdf.json
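
A path in this layout could be derived as in the sketch below. The function name `build_storage_path` is hypothetical, and two behaviors are assumptions read off the tree above rather than confirmed rules: documents without a user go under `system/`, and a file with no extension gets the placeholder `bin`.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def build_storage_path(
    file_id: int, filename: str, user_id: Optional[int], day: date
) -> str:
    """Return documents/<owner>/<YYYY>/<MM>/<DD>/<id>_<filename>_<ext>.json."""
    # Assumption: documents without an owning user are grouped under system/.
    owner = f"user_{user_id}" if user_id is not None else "system"
    # Assumption: files without an extension use the placeholder "bin".
    extension = PurePosixPath(filename).suffix.lstrip(".") or "bin"
    object_name = f"{file_id}_{filename}_{extension}.json"
    return f"documents/{owner}/{day:%Y/%m/%d}/{object_name}"
```

Grouping by user and date keeps listings for a single user or a single day limited to one key prefix.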

Migration from Local to S3

Prerequisites

  • ✅ boto3 installed (included in requirements.txt)
  • ✅ AWS S3 bucket created and credentials configured
  • ✅ S3 IAM permissions: s3:PutObject, s3:GetObject, s3:DeleteObject, s3:ListBucket (see IAM Permissions under Security Best Practices)

Running Migration

# Migrate to S3
python -m scripts.migrate_documents_to_storage s3

# Migrate to local storage (fallback)
python -m scripts.migrate_documents_to_storage local

Migration Script Features

  • 📊 Batch processing to avoid memory issues
  • ✅ Progress reporting
  • 🔍 Verification of migration integrity
  • ❌ Error handling and logging
  • ⏭️ Skips already migrated files
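
The features listed above can be sketched as a small batch loop. This is an illustration under stated assumptions, not the actual migration script: attachments are plain dicts, `upload` is any callable that returns the new storage path, and the names `MigrationReport`, `batched`, and `migrate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class MigrationReport:
    total: int = 0
    successful: int = 0
    failed: int = 0
    skipped: int = 0
    errors: list[str] = field(default_factory=list)


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most size items so only one batch is held at a time."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(
    attachments: Iterable[dict],
    upload: Callable[[dict], str],
    target_type: str,
    batch_size: int = 100,
) -> MigrationReport:
    report = MigrationReport()
    for batch in batched(attachments, batch_size):
        for attachment in batch:
            report.total += 1
            # Skip rows that already live in the target storage.
            if attachment.get("storage_type") == target_type and attachment.get("storage_path"):
                report.skipped += 1
                continue
            try:
                attachment["storage_path"] = upload(attachment)
                attachment["storage_type"] = target_type
                report.successful += 1
            except Exception as exc:
                # Record the failure and continue with the remaining files.
                report.failed += 1
                report.errors.append(f"Attachment {attachment['id']}: {exc}")
        print(f"Processed {report.total} files so far")  # progress reporting
    return report
```

Because already migrated rows are skipped, the loop can be re-run after a partial failure without uploading the same file twice.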

Migration Report Example

==================================================
📋 MIGRATION REPORT
==================================================
Total files: 1523
✅ Successful: 1521
❌ Failed: 2
⏭️ Skipped: 0

❌ Errors:
  - File not found: /path/to/missing_file.json
  - Attachment 456: Connection timeout
==================================================

Error Handling

Graceful Fallback

The system includes automatic fallback mechanisms:

import os

try:
    # Try to retrieve from object storage
    encrypted_data = await storage_service.retrieve_encrypted_document(storage_path)
except Exception as e:
    logger.warning(f"Failed to retrieve from storage: {e}")
    # Fall back to the local filesystem copy, if one exists
    if not os.path.exists(attachment.filepath):
        # No local copy either: re-raise the original storage error
        raise
    with open(attachment.filepath, "rb") as f:
        encrypted_data = bytes_to_dict(f.read())
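
The same pattern can be factored into a reusable helper, which also makes the fallback path easy to exercise in tests. The sketch below is an assumption about how that might look: `retrieve_with_fallback` is a hypothetical name, and the two callables stand in for the object storage read and the local filesystem read.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def retrieve_with_fallback(
    primary: Callable[[], Awaitable[dict]],
    fallback: Callable[[], dict],
) -> dict:
    """Try the primary store first, then the fallback; raise if both fail."""
    try:
        return await primary()
    except Exception as primary_error:
        logger.warning("Failed to retrieve from storage: %s", primary_error)
        try:
            return fallback()
        except Exception as fallback_error:
            # Keep the original storage error attached as the cause.
            raise fallback_error from primary_error


async def failing_primary() -> dict:
    # Simulates an unreachable object store.
    raise ConnectionError("Connection refused")


def local_copy() -> dict:
    # Simulates reading the encrypted payload from local disk.
    return {"ciphertext": "...", "tag": "..."}
```

Chaining the exceptions preserves both failures in the traceback, so an operator can tell whether the object store, the local copy, or both were unavailable.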

Performance Considerations

Upload Performance

  • Local Storage: ~10-50ms per file
  • S3 (same region): ~50-200ms per file
  • S3 (cross-region): ~200-500ms per file

Optimization Tips

  1. Use multipart upload for large files
  2. Batch uploads when possible
  3. Enable S3 Transfer Acceleration when clients upload from locations far from the bucket's region
  4. Use CloudFront CDN for frequently accessed documents
  5. Consider S3 Intelligent-Tiering for cost optimization
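
Batching (tip 2) can be sketched with bounded concurrency, so many uploads overlap without opening an unbounded number of connections. The `upload_batch` function is a hypothetical helper, and `upload` stands for any async callable with the `(path, data)` shape; neither is part of the documented service API.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def upload_batch(
    items: Sequence[tuple[str, bytes]],
    upload: Callable[[str, bytes], Awaitable[str]],
    max_concurrency: int = 8,
) -> list[str]:
    """Upload all items concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def upload_one(path: str, data: bytes) -> str:
        # The semaphore caps how many uploads are in flight at once.
        async with semaphore:
            return await upload(path, data)

    # gather returns results in the same order as the input items.
    return list(await asyncio.gather(*(upload_one(p, d) for p, d in items)))
```

With per-file latencies in the ranges listed above, overlapping requests mainly helps when many small files are uploaded together; the concurrency limit is a tuning knob, not a recommended value.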

Security Best Practices

IAM Permissions (AWS S3)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bank-documents",
        "arn:aws:s3:::bank-documents/*"
      ]
    }
  ]
}

Encryption

  • All documents are encrypted at rest using AES-256 with GCM mode
  • AES keys are protected using RSA-2048 encryption
  • Each document has a unique encryption key and nonce

Access Control

  • Use presigned URLs with time-limited expiration (default 24 hours)
  • Store credentials in environment variables or AWS Secrets Manager
  • Enable S3 bucket versioning to keep a history of object changes (versioning is also a prerequisite for MFA Delete)
  • Enable S3 MFA Delete for critical buckets
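
Time-limited presigned URLs can be generated as in the sketch below. The helper name `presign_download` is hypothetical, and how the service's own `get_download_url` is implemented is not shown in this guide. The `client` argument is assumed to be a boto3 S3 client, for example one created with `boto3.client("s3")`; `generate_presigned_url` is the boto3 call used. The cap reflects the 7-day maximum for Signature Version 4 presigned URLs.

```python
from typing import Any

# Signature Version 4 presigned URLs are valid for at most 7 days.
MAX_PRESIGN_SECONDS = 7 * 24 * 3600


def presign_download(
    client: Any, bucket: str, key: str, expiration_hours: int = 24
) -> str:
    """Return a time-limited GET URL for one object."""
    if expiration_hours <= 0:
        raise ValueError("expiration_hours must be positive")
    # Clamp long lifetimes to the service maximum instead of failing later.
    expires_in = min(expiration_hours * 3600, MAX_PRESIGN_SECONDS)
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

Shorter lifetimes reduce the window in which a leaked URL is usable, so the 24-hour default is best treated as an upper bound for sensitive documents.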

Monitoring and Logging

Key Metrics to Monitor

  • Upload/download success rate
  • Average response time per operation
  • S3 API call volume
  • Storage costs
  • Failed migration attempts
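
Success rate and average response time can be collected in process with a small wrapper like the one below. `StorageMetrics` is a hypothetical class for illustration; a real deployment would more likely export these numbers to an existing monitoring system.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StorageMetrics:
    """Counts calls, failures, and elapsed time per storage operation."""

    def __init__(self) -> None:
        self.calls = defaultdict(int)
        self.failures = defaultdict(int)
        self.total_seconds = defaultdict(float)

    @contextmanager
    def track(self, operation: str):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            # Count the failure, then let the caller handle the error.
            self.failures[operation] += 1
            raise
        finally:
            self.calls[operation] += 1
            self.total_seconds[operation] += time.perf_counter() - start

    def success_rate(self, operation: str) -> float:
        calls = self.calls.get(operation, 0)
        if calls == 0:
            return 1.0  # no calls recorded yet, so nothing has failed
        return (calls - self.failures.get(operation, 0)) / calls

    def average_seconds(self, operation: str) -> float:
        calls = self.calls.get(operation, 0)
        return self.total_seconds.get(operation, 0.0) / calls if calls else 0.0
```

A call site would wrap each storage operation, for example `with metrics.track("upload"): ...`, so failures and latency are recorded in one place.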

Log Examples

✅ S3 storage initialized: bucket=bank-documents, endpoint=None
✅ File uploaded to S3: s3://bank-documents/documents/user_123/2024/01/15/1_passport.pdf_pdf.json
✅ File downloaded from S3: documents/user_123/2024/01/15/1_passport.pdf_pdf.json
✅ Presigned URL generated: documents/user_123/2024/01/15/1_passport.pdf_pdf.json
❌ Failed to connect to S3: Connection refused
⚠️ Failed to retrieve from object storage, falling back to local

Troubleshooting

S3 Connection Issues

# Error: "Unable to locate credentials"
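# Solution: Set S3_ACCESS_KEY and S3_SECRET_KEY as described under Configuration; if credentials are left to boto3's default chain, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY instead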

# Error: "NoSuchBucket"
# Solution: Ensure bucket exists and is accessible with provided credentials

# Error: "AccessDenied"
# Solution: Check IAM permissions for the user/role

Migration Issues

# Error: "File not found on server"
# Solution: Verify LOCAL_STORAGE_PATH and check if files exist

# Error: "Connection timeout"
# Solution: Check S3 endpoint URL, network connectivity, and credentials

# Error: "Botocore parsing failed"
# Solution: Ensure S3_REGION is valid (e.g., us-east-1, eu-west-1)

Testing

Unit Tests

import pytest
from services.object_storage_service import (
    LocalStorageProvider,
    S3StorageProvider,
    ObjectStorageFactory,
)

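@pytest.mark.asyncio
async def test_local_storage_upload(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory,
    # so the test leaves no files behind in the working tree
    provider = LocalStorageProvider(str(tmp_path))
    path = await provider.upload_file(
        "test.txt",
        b"test content"
    )
    assert path.endswith("test.txt")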

@pytest.mark.asyncio
async def test_s3_storage_fallback():
    # Test graceful fallback when S3 is unavailable
    ...

Cost Estimation

AWS S3 Pricing Example (us-east-1)

  • Upload: $0.005 per 1,000 requests
  • Download: $0.0004 per 1,000 requests
  • Storage: $0.023 per GB/month
  • Transfer out: $0.09 per GB

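For 1,000 documents (100 MB total, about 0.1 GB), with 100 uploads and 100 downloads per month:

  • Monthly storage: 0.1 GB × $0.023 ≈ $0.0023
  • Monthly API calls: 100 uploads (~$0.0005) + 100 downloads (~$0.00004) ≈ $0.0005
  • Total: well under $0.01/month

Storage dominates as data grows: 100 GB costs about $2.30/month at the same rate.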
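
The arithmetic behind such an estimate can be written out as a short function using the per-unit prices listed above. `estimate_monthly_cost` is a hypothetical helper; the prices are the example figures from this section and should be checked against current AWS pricing before use.

```python
# Example prices from the list above (USD, us-east-1).
PUT_PRICE_PER_1000 = 0.005
GET_PRICE_PER_1000 = 0.0004
STORAGE_PRICE_PER_GB_MONTH = 0.023
TRANSFER_OUT_PRICE_PER_GB = 0.09


def estimate_monthly_cost(
    stored_gb: float,
    uploads: int,
    downloads: int,
    transfer_out_gb: float = 0.0,
) -> dict[str, float]:
    """Return a per-category monthly cost breakdown in USD."""
    storage = stored_gb * STORAGE_PRICE_PER_GB_MONTH
    requests = (
        uploads / 1000 * PUT_PRICE_PER_1000
        + downloads / 1000 * GET_PRICE_PER_1000
    )
    transfer = transfer_out_gb * TRANSFER_OUT_PRICE_PER_GB
    return {
        "storage": storage,
        "requests": requests,
        "transfer": transfer,
        "total": storage + requests + transfer,
    }
```

Note that `stored_gb` is in gigabytes, so a 100 MB data set is passed as 0.1.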

Future Enhancements

  • [ ] Support for Google Cloud Storage (GCS)
  • [ ] Support for Azure Blob Storage
  • [ ] Multi-cloud failover
  • [ ] Automatic document expiration and cleanup
  • [ ] Compression before upload
  • [ ] Parallel upload/download
  • [ ] Progress tracking for large files
  • [ ] Bulk operations API
  • [ ] Document version control
  • [ ] Access audit logging

Support & Questions

For issues or questions about object storage integration:

  1. Check the logs: tail -f logs/*.log
  2. Review migration report: python -m scripts.migrate_documents_to_storage --verify
  3. Test connectivity from a Python session where config is built as in "Initialization in Application": from services.object_storage_service import ObjectStorageFactory; ObjectStorageFactory.initialize(config)