File Storage & Uploads
Enterprise file storage with S3-compatible backends, direct-to-bucket uploads, content validation, and multi-tenant isolation.
The storage system handles secure file uploads using direct-to-bucket architecture. Rather than uploading files through the API server (which creates bottlenecks), files upload directly to S3-compatible storage (AWS S3, MinIO, SeaweedFS) via short-lived presigned URLs.
This approach enables large file support (up to 50 MB by default), real-time upload progress tracking, and better scalability under load.
How It Works
User selects file
↓
Browser requests presigned URL from API
↓
API returns temporary upload URL (expires in 5 min)
↓
Browser uploads file directly to S3
↓
Browser notifies API: "upload complete"
↓
Background validation job scans the file
↓
File transitions to READY status

The entire flow happens in three API calls: initiate, upload (direct to S3), and complete. The actual bytes never touch the API server.
Building Blocks
1. Object Classes
Defines what types of files are allowed where. Each class has validation rules and retention policies.
| Class | Allowed Types | Max Size | Retention |
|---|---|---|---|
| USER_PROFILE_PHOTO | JPEG, PNG, WebP | 5 MB | Permanent |
| COMPANY_LOGO | JPEG, PNG, WebP, SVG | 5 MB | Permanent |
| CUSTOMER_SIGNATURE | JPEG, PNG, WebP | 2 MB | Permanent |
| KYC_IDENTITY_DOCUMENT | JPEG, PNG, PDF | 10 MB | Compliance Archive |
| KYC_SUPPORTING_DOCUMENT | JPEG, PNG, PDF | 10 MB | Compliance Archive |
| LOAN_DOCUMENT | JPEG, PNG, PDF | 20 MB | Compliance Archive |
| IMPORT_FILE | CSV, XLS, XLSX | 50 MB | 90 days |
| STATEMENT | PDF | 20 MB | Compliance Archive |
| REPORT | PDF, CSV | 20 MB | Compliance Archive |
| GENERAL_DOCUMENT | JPEG, PNG, PDF | 20 MB | Permanent |
The system validates files against these rules even if the extension says "jpg".
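A minimal sketch of how such a policy table could be represented in code. The `ObjectClassPolicy` shape, field names, and constant below are illustrative, not the actual schema:

```ts
// Illustrative shape only; field and type names are assumptions, not the real schema.
interface ObjectClassPolicy {
  allowedMimeTypes: string[];
  maxSizeBytes: number;
  retention: 'PERMANENT' | 'COMPLIANCE_ARCHIVE' | 'IMPORT_ARTIFACT_90D';
}

const OBJECT_CLASS_POLICIES: Record<string, ObjectClassPolicy> = {
  USER_PROFILE_PHOTO: {
    allowedMimeTypes: ['image/jpeg', 'image/png', 'image/webp'],
    maxSizeBytes: 5 * 1024 * 1024,
    retention: 'PERMANENT',
  },
  IMPORT_FILE: {
    allowedMimeTypes: [
      'text/csv',
      'application/vnd.ms-excel',
      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    ],
    maxSizeBytes: 50 * 1024 * 1024,
    retention: 'IMPORT_ARTIFACT_90D',
  },
};
```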
2. Upload Sessions
Every upload attempt creates an upload session that tracks the lifecycle from presign generation through completion. Sessions capture:
| Field | Purpose |
|---|---|
| initiatedBy | Who started the upload |
| presignExpiresAt | When the upload URL expires |
| expectedContentType | What content type was declared |
| expectedFileSize | What size was declared |
| policySnapshot | Copy of validation rules at initiation time |
This creates an audit trail independent of the stored object itself.
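Conceptually, a session record might look like the sketch below; field names beyond the table above are assumptions:

```ts
// Illustrative sketch of an upload session record; the exact schema may differ.
interface UploadSession {
  id: string;
  initiatedBy: string;               // user who requested the presign
  presignExpiresAt: Date;            // when the upload URL stops working
  expectedContentType: string;       // content type declared at initiation
  expectedFileSize: number;          // file size declared at initiation
  policySnapshot: Record<string, unknown>; // object-class rules frozen at initiation time
}
```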
3. Validation Pipeline
After upload completes, a background job verifies file integrity:
| Check | Purpose |
|---|---|
| Magic bytes | File signature matches declared content-type (prevents extension spoofing) |
| Image dimensions | Width and height within bounds (max 8192x8192, prevents decompression bombs) |
| SVG safety | Detects embedded scripts and event handlers (prevents XSS) |
| PDF structure | Validates PDF header format |
| CSV structure | Checks for valid delimiters and line breaks |
If validation fails, the file is rejected and scheduled for cleanup.
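A minimal sketch of the magic-byte check, assuming the validator receives the leading bytes of the uploaded object as a Buffer (the function name and signature table are illustrative):

```ts
// Sketch only: verify that the file's leading bytes match the declared content type.
const MAGIC_BYTES: Record<string, number[][]> = {
  'image/jpeg': [[0xff, 0xd8, 0xff]],
  'image/png': [[0x89, 0x50, 0x4e, 0x47]],
  'application/pdf': [[0x25, 0x50, 0x44, 0x46, 0x2d]], // "%PDF-"
};

function matchesDeclaredType(head: Buffer, declaredContentType: string): boolean {
  const signatures = MAGIC_BYTES[declaredContentType];
  if (!signatures) return true; // types without a known signature fall through to other checks
  return signatures.some((sig) => sig.every((byte, i) => head[i] === byte));
}
```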
Security Layers
Presigned URL Expiry
Upload and download URLs are temporary and cryptographically signed:
| URL Type | Expiry | Purpose |
|---|---|---|
| Upload URL | 5 minutes | One-time upload to specific key |
| Download URL | 15 minutes | Authorized access to file |
URLs are bound to specific keys and cannot be used to access other objects.
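For reference, generating such URLs with the AWS SDK v3 looks roughly like this. This is a sketch assuming `@aws-sdk/client-s3` and `@aws-sdk/s3-request-presigner`; the bucket, key, and expiry values mirror the defaults described on this page:

```ts
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

// Endpoint, region, and credentials come from configuration (see Configuration below).
const s3 = new S3Client({});
const key = 'tenants/acme-corp/User/usr_123/obj_abc123';

// Upload URL: bound to a single key, valid for 5 minutes.
const uploadUrl = await getSignedUrl(
  s3,
  new PutObjectCommand({ Bucket: 'my-bucket', Key: key }),
  { expiresIn: 300 },
);

// Download URL: valid for 15 minutes.
const downloadUrl = await getSignedUrl(
  s3,
  new GetObjectCommand({ Bucket: 'my-bucket', Key: key }),
  { expiresIn: 900 },
);
```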
Content Verification
The system inspects actual file contents, not just extensions:
| File Type | Validation Method |
|---|---|
| JPEG | Magic bytes FF D8 FF + dimension extraction from EXIF |
| PNG | Magic bytes 89 50 4E 47 + IHDR chunk dimensions |
| WebP | RIFF + WEBP signature |
| PDF | %PDF- header validation |
| SVG | Script element and event handler detection |
| Excel | OLE2 or ZIP (OOXML) signature |
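As one concrete example, PNG dimensions can be read from the IHDR chunk without decoding the image. The sketch below is illustrative; the byte offsets follow the PNG specification and the 8192 bound mirrors the limit stated above:

```ts
// Sketch: extract PNG width/height from the IHDR chunk and enforce dimension bounds.
const MAX_DIMENSION = 8192;

function pngDimensionsWithinBounds(buf: Buffer): boolean {
  // Layout: 8-byte PNG signature, 4-byte chunk length, "IHDR",
  // then 4-byte big-endian width and 4-byte big-endian height.
  if (buf.length < 24 || buf.toString('ascii', 12, 16) !== 'IHDR') return false;
  const width = buf.readUInt32BE(16);
  const height = buf.readUInt32BE(20);
  return width > 0 && height > 0 && width <= MAX_DIMENSION && height <= MAX_DIMENSION;
}
```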
Encryption
All files stored with server-side encryption (AES256). Copied objects inherit the same encryption.
Tenant Isolation
Files are organized hierarchically in the bucket by tenant:
tenants/\{tenantId\}/\{ownerEntityType\}/\{ownerEntityId\}/\{objectId\}

Example:
tenants/acme-corp/User/usr-123/obj-abc123.jpg
tenants/globex-inc/Customer/cust-456/obj-def456.pdf

Database queries always filter by tenantId. The unique constraint @@unique([tenantId, storageKey]) ensures no key collisions within a tenant while allowing identical paths across tenants.
Cross-tenant access is impossible because:
- Repository queries include WHERE tenantId = X (see the lookup sketch below)
- Storage keys contain the tenant prefix at the root level
- Object IDs from one tenant resolve to null when queried from another tenant's scope
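A minimal sketch of key construction and a tenant-scoped lookup. The `storedObject` model name and the Prisma-style call are assumptions based on the constraint quoted above:

```ts
// Sketch: build the tenant-prefixed key and always scope lookups by tenantId.
function buildStorageKey(
  tenantId: string,
  ownerEntityType: string,
  ownerEntityId: string,
  objectId: string,
): string {
  return `tenants/${tenantId}/${ownerEntityType}/${ownerEntityId}/${objectId}`;
}

// Hypothetical repository lookup: an object ID from another tenant resolves to null.
async function findObjectForTenant(prisma: any, tenantId: string, objectId: string) {
  return prisma.storedObject.findFirst({ where: { id: objectId, tenantId } });
}
```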
Checksum Verification
If the client provides an MD5 checksum, it is verified against the S3 ETag after upload. Checksum mismatches trigger automatic deletion of the corrupted file.
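A sketch of that comparison, assuming a single-part upload so the S3 ETag equals the MD5 hex digest (this does not hold for multipart or KMS-encrypted objects); the helper names are illustrative:

```ts
import { createHash } from 'node:crypto';

// Sketch: compare a client-supplied MD5 against the ETag returned by S3.
function checksumMatchesEtag(clientMd5Hex: string, s3ETag: string): boolean {
  // ETags are quoted, e.g. "\"d41d8cd98f00b204e9800998ecf8427e\"".
  return s3ETag.replace(/"/g, '').toLowerCase() === clientMd5Hex.toLowerCase();
}

// Computing the MD5 before upload:
function md5Hex(bytes: Buffer): string {
  return createHash('md5').update(bytes).digest('hex');
}
```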
Retention Policies
Files are automatically cleaned up based on their retention policy:
| Policy | Duration | Auto-Cleanup | Use Case |
|---|---|---|---|
| PERMANENT | Forever | No | Profile photos, logos |
| COMPLIANCE_ARCHIVE | Forever | No | KYC docs, loan agreements |
| SUPERSEDED_30D | 30 days | Yes | Old versions of replaceable files |
| TEMP_UPLOAD_24H | 24 hours | Yes | Abandoned uploads |
| FAILED_UPLOAD_7D | 7 days | Yes | Rejected files (validation failed) |
| IMPORT_ARTIFACT_90D | 90 days | Yes | Temporary import files |
| LEGAL_HOLD | Forever | Blocked | Litigation/compliance holds |
A background job runs hourly to clean up expired objects. Objects under legal hold are never deleted regardless of retention policy.
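The expiry timestamp that the cleanup job checks can be derived from the policy at write time. A sketch, where the policy names match the table and the helper itself is illustrative:

```ts
// Sketch: derive retentionExpiresAt from the retention policy when the object is written.
const RETENTION_MS: Record<string, number | null> = {
  PERMANENT: null,
  COMPLIANCE_ARCHIVE: null,
  LEGAL_HOLD: null,
  SUPERSEDED_30D: 30 * 24 * 60 * 60 * 1000,
  TEMP_UPLOAD_24H: 24 * 60 * 60 * 1000,
  FAILED_UPLOAD_7D: 7 * 24 * 60 * 60 * 1000,
  IMPORT_ARTIFACT_90D: 90 * 24 * 60 * 60 * 1000,
};

function retentionExpiresAt(policy: string, from: Date = new Date()): Date | null {
  const ms = RETENTION_MS[policy];
  return ms == null ? null : new Date(from.getTime() + ms);
}
```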
Replacement and Supersession
Certain object classes support replacement (profile photos, logos). When a new file replaces an existing one:
New upload initiated
↓
Previous file marked as superseded
↓
Superseded file gets SUPERSEDED_30D policy
↓
New file becomes current (isCurrent = true)
↓
After 30 days, superseded file is cleaned up

Document classes (KYC, loan docs) do not support replacement; multiple files can be current simultaneously.
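A sketch of the supersession step, assuming a Prisma-style transaction and the field names mentioned above (isCurrent, SUPERSEDED_30D); the model and column names may differ in the real schema:

```ts
// Sketch: mark the previous current file as superseded and promote the new one.
async function supersede(
  prisma: any,
  tenantId: string,
  ownerEntityId: string,
  objectClass: string,
  newObjectId: string,
) {
  const thirtyDays = 30 * 24 * 60 * 60 * 1000;
  await prisma.$transaction([
    prisma.storedObject.updateMany({
      where: { tenantId, ownerEntityId, objectClass, isCurrent: true },
      data: {
        isCurrent: false,
        retentionPolicy: 'SUPERSEDED_30D',
        retentionExpiresAt: new Date(Date.now() + thirtyDays),
      },
    }),
    prisma.storedObject.update({
      where: { id: newObjectId },
      data: { isCurrent: true },
    }),
  ]);
}
```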
Legal Hold
Mark sensitive files with legal hold to block deletion regardless of retention policy. This is useful for:
- Litigation holds
- Compliance investigations
- Audit requirements
Once placed, a hold can only be released by an authorized user. The hold reason and timestamp are recorded in the audit trail.
API Usage
Uploading a File
Step 1: Initiate
POST /api/v1/uploads/initiate
{
"objectClass": "USER_PROFILE_PHOTO",
"ownerEntityType": "User",
"ownerEntityId": "usr_123",
"originalFilename": "photo.jpg",
"contentType": "image/jpeg",
"fileSize": 1024000
}

Response:
{
"storedObject": {
"id": "obj_abc123",
"storageKey": "tenants/acme-corp/User/usr_123/obj_abc123",
"status": "PENDING_UPLOAD"
},
"uploadUrl": "https://s3.amazonaws.com/...",
"uploadUrlExpiresAt": "2026-01-15T10:35:00Z",
"sessionId": "sess_xyz789"
}

Step 2: Upload
PUT the file bytes directly to uploadUrl.
Step 3: Complete
POST /api/v1/uploads/obj_abc123/complete
{
"checksum": "d41d8cd98f00b204e9800998ecf8427e"
}

The object transitions to SCANNING status while validation runs asynchronously. Poll the object status or wait for webhook notification (if configured).
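Put together, a bare-bones client (without the frontend hook described later) might drive the three steps like this. The endpoints mirror the examples above; the helper name and error handling are illustrative:

```ts
// Sketch: initiate → upload directly to S3 → complete, using plain fetch (no progress reporting).
async function uploadProfilePhoto(file: File, userId: string): Promise<string> {
  // Step 1: initiate — ask the API for a presigned upload URL.
  const initiateRes = await fetch('/api/v1/uploads/initiate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      objectClass: 'USER_PROFILE_PHOTO',
      ownerEntityType: 'User',
      ownerEntityId: userId,
      originalFilename: file.name,
      contentType: file.type,
      fileSize: file.size,
    }),
  });
  const { storedObject, uploadUrl } = await initiateRes.json();

  // Step 2: upload — PUT the bytes straight to the bucket.
  await fetch(uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file,
  });

  // Step 3: complete — tell the API the bytes are in place; validation runs asynchronously.
  await fetch(`/api/v1/uploads/${storedObject.id}/complete`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({}),
  });

  return storedObject.id; // poll this object until it reaches READY or REJECTED
}
```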
Downloading a File
Request a presigned download URL:
POST /api/v1/uploads/obj_abc123/download-url?disposition=attachment

Response:
{
"downloadUrl": "https://s3.amazonaws.com/...",
"expiresAt": "2026-01-15T10:50:00Z",
"contentType": "image/jpeg",
"originalFilename": "photo.jpg",
"fileSize": "1024000"
}

The URL expires in 15 minutes. Disposition can be attachment (forces download) or inline (browser display).
Listing Files
GET /api/v1/uploads?ownerEntityType=User&ownerEntityId=usr_123&isCurrent=true

Returns current files for the specified owner. Supports filtering by object class, status, and pagination.
Deleting a File
DELETE /api/v1/uploads/obj_abc123

Soft-deletes the database record and hard-deletes the bucket object. Blocked if legal hold is active.
Frontend Integration
The useUploadFile hook manages the full lifecycle:
const { upload, cancel, state, progress } = useUploadFile();
// Initiates upload, uploads via XHR with progress, completes automatically
await upload(file, {
objectClass: 'CUSTOMER_PHOTO',
ownerEntityType: 'Customer',
ownerEntityId: customerId
});
// State: 'idle' | 'initiating' | 'uploading' | 'completing' | 'done' | 'error'
// Progress: { loaded, total, percentage }

Uploads can be cancelled mid-flight. The hook handles cleanup of stale requests.
Real-Time Progress Updates
The upload system uses a two-phase progress model optimized for direct-to-bucket uploads:
Phase 1: Upload Progress (Client-Side)
Since files upload directly to S3 (not through the API server), progress tracking happens entirely in the browser using XMLHttpRequest (XHR) progress events:
File selected
↓
XHR opened to presigned S3 URL
↓
xhr.upload.onprogress fires repeatedly
↓
UI updates: "Uploading... 45% (1.2 MB / 2.5 MB)"
↓
Upload completes

Why XHR instead of Fetch API? The Fetch API does not expose upload progress events. Only XHR provides xhr.upload.addEventListener("progress", ...) for real-time byte counters.
Progress data structure:
| Field | Description |
|---|---|
| loaded | Bytes uploaded so far |
| total | Total file size in bytes |
| percentage | Calculated as (loaded / total) × 100, from 0 to 100 |
No WebSocket or Server-Sent Events: The system intentionally avoids WebSocket/SSE connections for uploads. Since the data flows browser→S3 directly, there's no server involvement during the transfer that could publish progress updates.
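A minimal sketch of the XHR upload with progress reporting; the callback shape mirrors the table above, and everything else is illustrative:

```ts
// Sketch: PUT a file to the presigned URL via XHR so upload progress can be reported.
function uploadWithProgress(
  uploadUrl: string,
  file: File,
  onProgress: (p: { loaded: number; total: number; percentage: number }) => void,
): Promise<void> {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('PUT', uploadUrl);
    xhr.setRequestHeader('Content-Type', file.type);

    xhr.upload.onprogress = (event) => {
      if (event.lengthComputable) {
        onProgress({
          loaded: event.loaded,
          total: event.total,
          percentage: Math.round((event.loaded / event.total) * 100),
        });
      }
    };

    xhr.onload = () =>
      xhr.status >= 200 && xhr.status < 300
        ? resolve()
        : reject(new Error(`Upload failed with status ${xhr.status}`));
    xhr.onerror = () => reject(new Error('Network error during upload'));
    xhr.onabort = () => reject(new Error('Upload cancelled'));

    xhr.send(file); // call xhr.abort() to cancel mid-flight
  });
}
```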
Phase 2: Validation Status (Polling)
After the upload completes, a background validation job scans the file. The client polls for status changes:
| Status | Meaning | Polling |
|---|---|---|
| SCANNING | Validation in progress | Poll every 2 seconds |
| READY | Validation passed | Stop polling |
| REJECTED | Validation failed | Stop polling |
Polling implementation:
POST /complete returns { status: "SCANNING" }
↓
GET /uploads/:id every 2 seconds
↓
Status change detected
↓
UI updates: "Processing..." → "Complete" or "Failed"Why polling instead of WebSocket?
- Validation is typically 1-3 seconds (magic byte checks, not heavy processing)
- Polling is simpler for intermittent/mobile connectivity
- No persistent connection overhead for short-lived operations
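A sketch of the polling loop; the endpoint and status values follow the tables above, while the helper itself is illustrative:

```ts
// Sketch: poll the object every 2 seconds until validation finishes.
async function waitForValidation(
  objectId: string,
  intervalMs = 2000,
): Promise<'READY' | 'REJECTED'> {
  for (;;) {
    const res = await fetch(`/api/v1/uploads/${objectId}`);
    const { status } = await res.json();
    if (status === 'READY' || status === 'REJECTED') return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```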
Progress State Machine
idle
↓ (file selected)
initiating
↓ (presign URL obtained)
uploading ← [XHR progress events]
↓ (S3 confirms upload)
completing
↓ (API notified, validation queued)
SCANNING ← [poll every 2s]
↓
READY / REJECTED / ERROR

Cancellation Support
Uploads can be cancelled at any point before completion:
| Phase | Cancellation Method |
|---|---|
| initiating | AbortController signal |
| uploading | xhr.abort() |
| completing | AbortController signal |
| SCANNING | Cannot cancel (already in S3) |
Cancelled uploads leave orphaned S3 objects that the cleanup job removes automatically after 24 hours.
Offline Support (PWA)
For field agents working without connectivity:
| Capability | Implementation |
|---|---|
| Local drafts | IndexedDB (Dexie) stores attachment metadata |
| Upload queue | Files queued for upload when connection restored |
| Progress tracking | Upload status tracked across sessions |
| Retry logic | Failed uploads retried with exponential backoff |
When connectivity returns, the sync process uploads queued files and updates local records with server-generated IDs.
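A sketch of the offline queue using Dexie; the database, table, and field names are assumptions, and only the Dexie calls themselves are real library API:

```ts
import Dexie from 'dexie';

// Sketch: persist pending uploads locally and flush them when connectivity returns.
const db = new Dexie('offline-uploads');
db.version(1).stores({
  pendingUploads: '++id, status', // auto-increment key, indexed status
});

async function queueUpload(
  file: File,
  meta: { objectClass: string; ownerEntityType: string; ownerEntityId: string },
) {
  await db.table('pendingUploads').add({ file, ...meta, status: 'queued', attempts: 0 });
}

window.addEventListener('online', async () => {
  const pending = await db.table('pendingUploads').where('status').equals('queued').toArray();
  for (const item of pending) {
    await db.table('pendingUploads').update(item.id, { status: 'uploading' });
    // Hand the item to the normal upload flow (initiate → PUT → complete), then
    // replace the local draft's temporary ID with the server-generated object ID.
  }
});
```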
Configuration
Environment variables control storage behavior:
| Variable | Default | Description |
|---|---|---|
| STORAGE_ENDPOINT | Required | S3-compatible endpoint URL |
| STORAGE_REGION | us-east-1 | AWS region or default |
| STORAGE_BUCKET | Required | Bucket name |
| STORAGE_ACCESS_KEY | Required | S3 access key |
| STORAGE_SECRET_KEY | Required | S3 secret key |
| STORAGE_PRESIGN_UPLOAD_EXPIRY_SECONDS | 300 | Upload URL validity (5 min) |
| STORAGE_PRESIGN_DOWNLOAD_EXPIRY_SECONDS | 900 | Download URL validity (15 min) |
| STORAGE_MAX_FILE_SIZE_BYTES | 52428800 | Maximum file size (50 MB) |
The system supports AWS S3, MinIO, and SeaweedFS as backends.
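A sketch of wiring these variables into the AWS SDK v3 client. The variable names match the table above; forcePathStyle is typically required for MinIO- and SeaweedFS-style endpoints:

```ts
import { S3Client } from '@aws-sdk/client-s3';

// Sketch: construct the S3 client from the environment variables listed above.
const s3 = new S3Client({
  endpoint: process.env.STORAGE_ENDPOINT,
  region: process.env.STORAGE_REGION ?? 'us-east-1',
  credentials: {
    accessKeyId: process.env.STORAGE_ACCESS_KEY!,
    secretAccessKey: process.env.STORAGE_SECRET_KEY!,
  },
  forcePathStyle: true, // path-style addressing for non-AWS, S3-compatible backends
});
```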
Audit Trail
Every storage action is recorded in the ObjectEvent table:
| Action | When Recorded |
|---|---|
| UPLOAD_INITIATED | Presign URL generated |
| UPLOAD_COMPLETED | Client called complete endpoint |
| UPLOAD_FAILED | Upload verification failed |
| SCAN_PASSED | Background validation succeeded |
| SCAN_FAILED | Background validation failed |
| DOWNLOAD_URL_ISSUED | Presign URL generated for download |
| OBJECT_DELETED | File deleted by user |
| OBJECT_EXPIRED | Cleanup job removed expired file |
| HOLD_PLACED | Legal hold applied |
| HOLD_RELEASED | Legal hold removed |
Each entry includes actor ID, timestamp, request ID for correlation, and result status (success/failure).
Error Handling
Common error scenarios and their handling:
| Scenario | Response | Recovery |
|---|---|---|
| Presign expired | 400 Bad Request | Re-initiate upload |
| File too large | 400 Bad Request | Compress or select smaller file |
| Invalid content type | 400 Bad Request | Convert to allowed format |
| Checksum mismatch | 400 Bad Request | Re-upload (file corrupted) |
| Validation failed | 200 + REJECTED status | File removed automatically |
| Legal hold active | 403 Forbidden | Remove hold before deleting |
| Object not found | 404 Not Found | Verify object ID and permissions |
Validation failures set the object status to REJECTED and trigger automatic cleanup after 7 days.
Storage Key Structure
Keys follow a predictable hierarchy for organization and potential IAM policy restrictions:
tenants/\{tenantId\}/\{ownerEntityType\}/\{ownerEntityId\}/\{objectId\}

| Component | Description |
|---|---|
| tenants/ | Static prefix for all tenant-scoped objects |
| {tenantId} | Tenant identifier (e.g., "acme-corp") |
| {ownerEntityType} | Entity category: User, Customer, Account, LoanActivity, etc. |
| {ownerEntityId} | Specific entity instance ID |
| {objectId} | Unique object identifier (UUID) |
This structure enables:
- Prefix-based listing if needed
- IAM policies scoped to tenant prefixes
- Logical organization in S3 consoles
- Efficient cleanup by tenant or entity
Cleanup and Maintenance
The storage cleanup job runs on a schedule to remove expired objects:
Every hour:
1. Find objects where retentionExpiresAt <= now()
2. Skip if isLegalHold = true
3. Skip if policy is PERMANENT or COMPLIANCE_ARCHIVE
4. Soft-delete database record
5. Hard-delete bucket object
6. Record OBJECT_EXPIRED event
7. Continue in batches of 100

Orphaned bucket objects (where the DB delete succeeded but the S3 delete failed) are tolerated as a transient inconsistency; a separate reconciliation process can identify and remove them.
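A sketch of one batch of that hourly pass, assuming a Prisma-style repository and the field names used earlier (retentionExpiresAt, isLegalHold); the real job may be structured differently:

```ts
// Sketch: process one batch of expired objects, skipping permanent policies and legal holds.
async function cleanupExpiredObjects(
  prisma: any,
  s3Delete: (storageKey: string) => Promise<void>,
) {
  const expired = await prisma.storedObject.findMany({
    where: {
      retentionExpiresAt: { lte: new Date() },
      isLegalHold: false,
      retentionPolicy: { notIn: ['PERMANENT', 'COMPLIANCE_ARCHIVE'] },
      deletedAt: null,
    },
    take: 100, // batches of 100
  });

  for (const obj of expired) {
    await prisma.storedObject.update({
      where: { id: obj.id },
      data: { deletedAt: new Date() }, // soft-delete the database record
    });
    await s3Delete(obj.storageKey); // hard-delete the bucket object
    // Record an OBJECT_EXPIRED event in the audit trail here.
  }
}
```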