Message Storage Architecture
Comprehensive exploration of message storage solutions considering database growth, retention policies, Postfix queues, and compliance requirements.
Last Updated: February 2025
The Problem
Current Issue: SQL Storage Growth
Storage Growth:
- Email content (HTML + text + attachments) can be 10KB - 10MB+ per message
- At 100K messages/month: roughly 1 GB (at a 10 KB average) to 100 GB (at a 1 MB average) per month
- With multiple organizations: Database becomes massive quickly
- Expensive backups, slow queries, high costs
Retention Complexity:
- Different tiers need different retention (7 days vs 90 days vs 1 year)
- Deleting millions of rows from SQL is expensive and slow
- Can't use simple SQL `DELETE` at scale
The Complete Email Journey
Problems with Postfix Storage
Problem 1: Messages Sit in Postfix Queue
# Messages can stay in queue for days
/var/spool/postfix/active/ # Currently being delivered
/var/spool/postfix/deferred/ # Failed delivery, will retry
/var/spool/postfix/hold/ # Held messages
- Default retry: Up to 5 days
- Full message content stored as files
- Privacy/compliance risk (GDPR, HIPAA)
Problem 2: Logs Contain Sensitive Data
/var/log/mail.log
# Contains: email addresses, subjects, delivery status, IPs
# Can grow massive (100MB+ per day at scale)
# Log retention = compliance issue
Problem 3: No Automatic Cleanup
- Postfix doesn't auto-delete old queue files
- Logs rotate but aren't purged
- Manual cleanup required
Problem 4: Status Tracking is Complex
- Must parse logs to know delivery status
- Need real-time log monitoring
- Bounce parsing is error-prone
Recommended Solution: Hybrid Storage with Minimal Postfix Queue
Architecture Overview
Storage Layers
1. PostgreSQL (Metadata Only)
Store in messages table:
- `message_id`, `org_id`, `status`
- `from`, `to`, `subject`
- `created_at`, `sent_at`, `delivered_at`
- `template_id`, `domain_id`
- `content_s3_key` (reference to S3 object)
- `content_available` (boolean flag)
Benefits:
- Fast queries for dashboards and filtering
- Small database size (metadata is ~1KB vs 10KB+ for content)
- Easy to keep metadata longer than content (for analytics)
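A minimal Prisma sketch of this metadata-only table could look like the following (field names follow the list above; the exact types, status values, and index are assumptions):

```prisma
model Message {
  id                String    @id @default(cuid())
  org_id            String
  status            String    // e.g. "queued" | "sent" | "delivered" | "bounced"
  from              String
  to                String
  subject           String
  created_at        DateTime  @default(now())
  sent_at           DateTime?
  delivered_at      DateTime?
  template_id       String?
  domain_id         String?
  content_s3_key    String?   // S3 object key for the full content
  content_available Boolean   @default(true)

  @@index([org_id, created_at]) // supports dashboard filtering and retention scans
  @@map("messages")
}
```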
2. Object Storage (S3/MinIO) - Message Content
Store full message content:
{
  "message_id": "msg_123",
  "html": "<html>...</html>",
  "text": "Plain text version",
  "attachments": [...],
  "headers": {...},
  "variables": {...}
}
Path structure: messages/{org_id}/{year}/{month}/{day}/{message_id}.json.gz
Benefits:
- Cheap storage ($0.023/GB/month on S3 vs $0.10+/GB for RDS)
- Compression reduces size by 70-90%
- Easy retention with S3 lifecycle policies
- No database performance impact
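A small helper can produce this key layout; `buildContentKey` is an illustrative name, and the UTC zero-padded date parts are an assumption to keep prefixes lexicographically sortable:

```typescript
// Build the S3 object key for a message's content, e.g.
// messages/org_1/2025/02/07/msg_123.json.gz
function buildContentKey(orgId: string, messageId: string, date: Date = new Date()): string {
  const year = date.getUTCFullYear()
  const month = String(date.getUTCMonth() + 1).padStart(2, '0') // zero-padded for sortable prefixes
  const day = String(date.getUTCDate()).padStart(2, '0')
  return `messages/${orgId}/${year}/${month}/${day}/${messageId}.json.gz`
}

console.log(buildContentKey('org_1', 'msg_123', new Date(Date.UTC(2025, 1, 7))))
// → messages/org_1/2025/02/07/msg_123.json.gz
```

Keeping `org_id` as the first path segment is what makes per-organization lifecycle rules possible later.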
3. Tier-Based Retention Policy
| Tier | Content Retention | Metadata Retention |
|---|---|---|
| Free | 7 days | 30 days |
| Basic | 30 days | 90 days |
| Pro | 90 days | 1 year |
| Enterprise | 1 year+ | Unlimited |
Implementation:
- Set S3 lifecycle rules per organization folder
- Background job marks `content_available = false` when content expires
- UI shows "Content expired per retention policy" for old messages
- Keep metadata for analytics (delivery rates, trends) even after content deletion
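A sketch of the per-organization lifecycle rule, assuming the key layout above (the rule would be applied with the AWS SDK's `PutBucketLifecycleConfiguration`, or MinIO's `mc ilm`; the tier names and day counts mirror the table):

```typescript
// Content retention in days per tier (from the table above)
const CONTENT_RETENTION_DAYS: Record<string, number> = {
  free: 7,
  basic: 30,
  pro: 90,
  enterprise: 365,
}

// Build an S3 lifecycle rule that expires one org's content under its prefix
function lifecycleRuleForOrg(orgId: string, tier: string) {
  return {
    ID: `retention-${orgId}`,
    Status: 'Enabled',
    Filter: { Prefix: `messages/${orgId}/` },
    Expiration: { Days: CONTENT_RETENTION_DAYS[tier] },
  }
}

console.log(lifecycleRuleForOrg('org_1', 'free').Expiration.Days) // 7
```

Note that S3 caps lifecycle configurations at 1,000 rules per bucket, so with many organizations a per-tier prefix (one rule per tier) scales better than one rule per org.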
Message Access Flow
Writing Messages
import { gzipSync } from 'zlib'

// 1. Save metadata to SQL
const message = await prisma.message.create({
  data: { ...metadata, content_available: true }
})

// 2. Upload content to S3 (async, non-blocking)
await s3.putObject({
  Bucket: 'postchi-messages',
  Key: `messages/${orgId}/${year}/${month}/${day}/${messageId}.json.gz`,
  Body: gzipSync(JSON.stringify(content))
})
Reading Messages
import { gunzipSync } from 'zlib'

// 1. Get metadata from SQL (fast)
const message = await prisma.message.findUnique({ where: { id } })

// 2. If content_available, fetch from S3
if (message.content_available) {
  const object = await s3.getObject({
    Bucket: 'postchi-messages',
    Key: message.content_s3_key
  })
  // AWS SDK v3 returns Body as a stream; collect it before decompressing
  const body = await object.Body.transformToByteArray()
  return { ...message, ...JSON.parse(gunzipSync(body).toString()) }
}
return { ...message, content: null }
Postfix Configuration for Minimal Storage
Aggressive Queue Settings
# /etc/postfix/main.cf
# Queue Lifetime (aggressive cleanup)
maximal_queue_lifetime = 1h
bounce_queue_lifetime = 1h
# Retry Settings (fail fast)
minimal_backoff_time = 5m
maximal_backoff_time = 15m
queue_run_delay = 5m
# Size Limits (prevent abuse)
message_size_limit = 10485760 # 10MB
mailbox_size_limit = 0
Log Cleanup
# /etc/logrotate.d/postfix
/var/log/mail.log {
  daily
  rotate 7
  compress
  delaycompress
  missingok
  notifempty
  sharedscripts
  postrotate
    # mail.log is written by rsyslog, so signal rsyslog (not Postfix) to reopen it
    systemctl kill -s HUP rsyslog.service > /dev/null
  endscript
}
Automated Queue Cleanup
# Crontab - Clean Postfix queue every 15 minutes
*/15 * * * * /usr/sbin/postsuper -d ALL deferred
0 * * * * /usr/sbin/postsuper -d ALL hold
Delivery Status Tracking
Option A: SMTP Response (Recommended)
Get delivery status directly from SMTP response:
import nodemailer from 'nodemailer'
const transporter = nodemailer.createTransport({
  host: 'localhost',
  port: 25,
  logger: true,
  debug: true
})
const info = await transporter.sendMail(message)
// info.response = "250 2.0.0 OK: queued as ABC123"
// info.accepted = ['user@example.com']
// info.rejected = []
// info.messageId is the Message-ID header, not the Postfix queue ID,
// so extract the queue ID from the SMTP response instead
const queueId = info.response.match(/queued as (\S+)/)?.[1]

await prisma.message.update({
  where: { id: message.id },
  data: {
    status: info.accepted.length > 0 ? 'sent' : 'bounced',
    postfix_queue_id: queueId,
    smtp_response: info.response
  }
})
Option B: Log Parsing (Fallback)
import { Tail } from 'tail'
const tail = new Tail('/var/log/mail.log')
tail.on('line', async (line) => {
  const match = line.match(/to=<(.+?)>.*status=(\w+)/)
  if (!match) return
  const [, email, status] = match
  // 'deferred' means Postfix will retry; leave the message in 'sending'
  if (status === 'deferred') return
  await prisma.message.updateMany({
    where: { to: email, status: 'sending' },
    data: {
      status: status === 'sent' ? 'delivered' : 'bounced',
      delivered_at: status === 'sent' ? new Date() : null
    }
  })
})
Privacy & Compliance Solution
Data Retention by Layer
| Layer | Data Type | Retention | Cleanup Method |
|---|---|---|---|
| PostgreSQL | Metadata (from, to, subject) | 30-365 days | Daily cron job |
| S3 | Full content (HTML, attachments) | 7-90 days | S3 lifecycle policy |
| Postfix Queue | Full message | 1 hour max | Aggressive queue settings |
| Postfix Logs | Delivery events | 7 days | Logrotate + rsyslog |
| Redis | Job queue | 24 hours | Redis TTL |
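The 24-hour Redis retention in the table above maps naturally onto BullMQ job options, whose `removeOnComplete`/`removeOnFail` accept an age in seconds (the queue and job names here are assumptions):

```typescript
// Keep completed/failed jobs in Redis for at most 24 hours
const jobOptions = {
  removeOnComplete: { age: 24 * 60 * 60 }, // seconds
  removeOnFail: { age: 24 * 60 * 60 },
}

// Used as: queue.add('send-email', payload, jobOptions)
console.log(jobOptions.removeOnComplete.age) // 86400
```

This way queue cleanup is declarative rather than another cron job to maintain.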
Background Retention Enforcement
import { subDays } from 'date-fns'

// Background job (runs daily)
async function enforceRetention() {
  const orgs = await prisma.organization.findMany()
  for (const org of orgs) {
    const cutoffDate = subDays(new Date(), org.retention_policy.content_days)
    // Mark content as expired
    await prisma.message.updateMany({
      where: {
        org_id: org.id,
        created_at: { lt: cutoffDate },
        content_available: true
      },
      data: { content_available: false }
    })
    // S3 lifecycle policy handles actual deletion
  }
}
Implementation Checklist
- Add `content_s3_key` and `content_available` to messages table
- Configure Postfix aggressive queue settings (1h max)
- Implement S3 content storage in worker
- Update worker to get SMTP response and update status immediately
- Create background job for retention enforcement
- Set up Postfix queue cleanup cron jobs
- Configure logrotate for mail.log (7 day retention)
- Implement S3 lifecycle policies per organization tier
- Add retry logic in application (BullMQ) for deferred messages
Key Principles
- Minimize Postfix Storage - Max 1 hour in queue, aggressive cleanup
- Application Controls Retention - Not Postfix
- Status from SMTP Response - Not log parsing
- S3 for Content - Cheap, scalable, easy retention policies
- SQL for Metadata - Fast queries, keep longer than content for analytics
- Tier-Based Policies - Different retention per organization tier