Message Storage Architecture

Comprehensive exploration of message storage solutions considering database growth, retention policies, Postfix queues, and compliance requirements.

Last Updated: February 2025


The Problem

Current Issue: SQL Storage Growth

Storage Growth:

  • Email content (HTML + text + attachments) can be 10KB - 10MB+ per message
  • At 100K messages/month: ~1-100GB/month
  • With multiple organizations: Database becomes massive quickly
  • Expensive backups, slow queries, high costs

Retention Complexity:

  • Different tiers need different retention (7 days vs 90 days vs 1 year)
  • Deleting millions of rows from SQL is expensive and slow
  • Can't use simple SQL DELETE at scale

The Complete Email Journey


Problems with Postfix Storage

Problem 1: Messages Sit in Postfix Queue

# Messages can stay in queue for days
/var/spool/postfix/active/ # Currently being delivered
/var/spool/postfix/deferred/ # Failed delivery, will retry
/var/spool/postfix/hold/ # Held messages
  • Default retry: Up to 5 days
  • Full message content stored as files
  • Privacy/compliance risk (GDPR, HIPAA)

Problem 2: Logs Contain Sensitive Data

/var/log/mail.log
# Contains: email addresses, subjects, delivery status, IPs
# Can grow massive (100MB+ per day at scale)
# Log retention = compliance issue

Problem 3: No Automatic Cleanup

  • Postfix doesn't auto-delete old queue files
  • Logs rotate but aren't purged
  • Manual cleanup required

Problem 4: Status Tracking is Complex

  • Must parse logs to know delivery status
  • Need real-time log monitoring
  • Bounce parsing is error-prone

Architecture Overview


Storage Layers

1. PostgreSQL (Metadata Only)

Store in messages table:

  • message_id, org_id, status, from, to, subject
  • created_at, sent_at, delivered_at
  • template_id, domain_id
  • content_s3_key (reference to S3)
  • content_available (boolean flag)

Benefits:

  • Fast queries for dashboards and filtering
  • Small database size (metadata is ~1KB vs 10KB+ for content)
  • Easy to keep metadata longer than content (for analytics)
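As a sketch, the metadata columns above might look like the following Prisma model (the index and `@@map` are illustrative additions, not part of the original schema):

```prisma
model Message {
  id                String    @id @default(cuid())
  org_id            String
  status            String
  from              String
  to                String
  subject           String
  template_id       String?
  domain_id         String?
  content_s3_key    String?
  content_available Boolean   @default(true)
  created_at        DateTime  @default(now())
  sent_at           DateTime?
  delivered_at      DateTime?

  @@index([org_id, created_at])
  @@map("messages")
}
```

The `(org_id, created_at)` index supports both dashboard queries and the retention job's cutoff-date scan.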

2. Object Storage (S3/MinIO) - Message Content

Store full message content:

{
  "message_id": "msg_123",
  "html": "<html>...</html>",
  "text": "Plain text version",
  "attachments": [...],
  "headers": {...},
  "variables": {...}
}

Path structure: messages/{org_id}/{year}/{month}/{day}/{message_id}.json.gz

Benefits:

  • Cheap storage ($0.023/GB/month on S3 vs $0.10+/GB for RDS)
  • Compression reduces size by 70-90%
  • Easy retention with S3 lifecycle policies
  • No database performance impact
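The 70-90% figure is realistic because HTML email is highly repetitive (tables, inline styles). A quick sketch with Node's built-in zlib shows the effect; exact ratios will vary with real content:

```typescript
import { gzipSync, gunzipSync } from 'node:zlib'

// Repetitive table-based HTML, typical of transactional email
const html = '<tr><td style="padding:8px">Hello</td></tr>\n'.repeat(500)
const payload = JSON.stringify({ message_id: 'msg_123', html })

const compressed = gzipSync(payload)
const ratio = 1 - compressed.length / Buffer.byteLength(payload)

console.log(`gzip saved ${(ratio * 100).toFixed(0)}% of ${Buffer.byteLength(payload)} bytes`)

// Decompression round-trips losslessly
const restored = JSON.parse(gunzipSync(compressed).toString())
```

Compression happens in the worker before upload, so S3 only ever stores the `.json.gz` form.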

3. Tier-Based Retention Policy

Tier         Content Retention   Metadata Retention
Free         7 days              30 days
Basic        30 days             90 days
Pro          90 days             1 year
Enterprise   1 year+             Unlimited

Implementation:

  • Set S3 lifecycle rules per organization folder
  • Background job marks content_available = false when content expires
  • UI shows "Content expired per retention policy" for old messages
  • Keep metadata for analytics (delivery rates, trends) even after content deletion

Message Access Flow

Writing Messages

import { gzipSync } from 'node:zlib'

// 1. Save metadata to SQL
const message = await prisma.message.create({
  data: { ...metadata, content_available: true }
})

// 2. Upload compressed content to S3 (can run off the hot path)
await s3.putObject({
  Bucket: 'postchi-messages',
  Key: `messages/${orgId}/${year}/${month}/${day}/${messageId}.json.gz`,
  Body: gzipSync(JSON.stringify(content))
})

Reading Messages

import { gunzipSync } from 'node:zlib'

// 1. Get metadata from SQL (fast)
const message = await prisma.message.findUnique({ where: { id } })

// 2. If content is still retained, fetch it from S3
if (message?.content_available) {
  const content = await s3.getObject({
    Bucket: 'postchi-messages',
    Key: message.content_s3_key
  })
  // Assumes the S3 client returns the object body as a Buffer
  return { ...message, ...JSON.parse(gunzipSync(content.Body as Buffer).toString()) }
}

return { ...message, content: null }

Postfix Configuration for Minimal Storage

Aggressive Queue Settings

# /etc/postfix/main.cf

# Queue Lifetime (aggressive cleanup)
maximal_queue_lifetime = 1h
bounce_queue_lifetime = 1h

# Retry Settings (fail fast)
minimal_backoff_time = 5m
maximal_backoff_time = 15m
queue_run_delay = 5m

# Size Limits (prevent abuse): 10MB max message
# Note: main.cf does not allow inline comments after values
message_size_limit = 10485760
mailbox_size_limit = 0

Log Cleanup

# /etc/logrotate.d/postfix
/var/log/mail.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # mail.log is written by rsyslog, not Postfix, so signal rsyslog to reopen it
        systemctl kill -s HUP rsyslog.service > /dev/null 2>&1 || true
    endscript
}

Automated Queue Cleanup

# Crontab: delete deferred mail every 15 minutes, held mail hourly
*/15 * * * * /usr/sbin/postsuper -d ALL deferred
0 * * * * /usr/sbin/postsuper -d ALL hold

Delivery Status Tracking

Option A: SMTP Response (Primary)

Get the delivery status directly from the SMTP response:

import nodemailer from 'nodemailer'

const transporter = nodemailer.createTransport({
  host: 'localhost',
  port: 25,
  logger: true,
  debug: true
})

const info = await transporter.sendMail(message)

// info.response = "250 2.0.0 OK: queued as ABC123"
// info.accepted = ['user@example.com']
// info.rejected = []

// The Postfix queue ID is embedded in the SMTP response; info.messageId
// is the Message-ID header, not the queue ID
const queueId = info.response.match(/queued as (\S+)/)?.[1]

await prisma.message.update({
  where: { id: message.id },
  data: {
    status: info.accepted.length > 0 ? 'sent' : 'bounced',
    postfix_queue_id: queueId,
    smtp_response: info.response
  }
})
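When a send is rejected, the SMTP reply code also tells you whether retrying is worthwhile: per RFC 5321, 4xx replies are transient and 5xx are permanent. A small helper along these lines (the names are illustrative) keeps that decision out of the send path:

```typescript
type SmtpOutcome = 'delivered' | 'retry' | 'bounced'

// Classify an SMTP reply code: 2xx success, 4xx transient failure, 5xx permanent failure
function classifySmtpReply(code: number): SmtpOutcome {
  if (code >= 200 && code < 300) return 'delivered'
  if (code >= 400 && code < 500) return 'retry'
  return 'bounced'
}
```

`retry` outcomes go back into the application-level job queue; `bounced` outcomes are final and update the message status immediately.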

Option B: Log Parsing (Fallback)

import { Tail } from 'tail'

const tail = new Tail('/var/log/mail.log')

tail.on('line', async (line) => {
  const match = line.match(/to=<(.+?)>.*status=(\w+)/)
  if (match) {
    const [, email, status] = match
    await prisma.message.updateMany({
      where: { to: email, status: 'sending' },
      data: {
        status: status === 'sent' ? 'delivered' : 'bounced',
        // Only stamp delivered_at on successful delivery
        delivered_at: status === 'sent' ? new Date() : null
      }
    })
  }
})
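Because log parsing is error-prone, it helps to pull the regex into a pure function that can be tested against sample lines. A sketch (the log line below is representative of Postfix smtp output, not taken from a real system):

```typescript
// Extract recipient and delivery status from a Postfix mail.log line
function parseDeliveryLine(line: string): { email: string; status: string } | null {
  const match = line.match(/to=<(.+?)>.*status=(\w+)/)
  return match ? { email: match[1], status: match[2] } : null
}

const sample =
  'Dec 10 12:00:01 mail postfix/smtp[4321]: 4ABC123: to=<user@example.com>, ' +
  'relay=mx.example.com[93.184.216.34]:25, delay=0.4, status=sent (250 2.0.0 OK)'
```

Keeping the parser separate from the tail handler also makes it easy to extend for edge cases (multi-recipient lines, `status=deferred`) without touching the I/O code.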

Privacy & Compliance Solution

Data Retention by Layer

Layer           Data Type                          Retention     Cleanup Method
PostgreSQL      Metadata (from, to, subject)       30-365 days   Daily cron job
S3              Full content (HTML, attachments)   7-90 days     S3 lifecycle policy
Postfix Queue   Full message                       1 hour max    Aggressive queue settings
Postfix Logs    Delivery events                    7 days        Logrotate + rsyslog
Redis           Job queue                          24 hours      Redis TTL
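The 24-hour Redis row can be enforced through the job queue itself rather than raw Redis TTLs. A sketch, assuming BullMQ's `removeOnComplete`/`removeOnFail` age-based options:

```typescript
// Prune finished job records from Redis after 24 hours (age is in seconds)
const defaultJobOptions = {
  removeOnComplete: { age: 24 * 60 * 60 },
  removeOnFail: { age: 24 * 60 * 60 }
}

// Passed to the queue at construction time:
// new Queue('send-email', { connection, defaultJobOptions })
```

This keeps Redis bounded without a separate cleanup job: BullMQ deletes expired job data as part of normal queue operation.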

Background Retention Enforcement

import { subDays } from 'date-fns'

// Background job (runs daily)
async function enforceRetention() {
  const orgs = await prisma.organization.findMany()

  for (const org of orgs) {
    const cutoffDate = subDays(new Date(), org.retention_policy.content_days)

    // Mark content as expired
    await prisma.message.updateMany({
      where: {
        org_id: org.id,
        created_at: { lt: cutoffDate },
        content_available: true
      },
      data: { content_available: false }
    })

    // S3 lifecycle policy handles actual deletion
  }
}

Implementation Checklist

  • Add content_s3_key and content_available to messages table
  • Configure Postfix aggressive queue settings (1h max)
  • Implement S3 content storage in worker
  • Update worker to get SMTP response and update status immediately
  • Create background job for retention enforcement
  • Set up Postfix queue cleanup cron jobs
  • Configure logrotate for mail.log (7 day retention)
  • Implement S3 lifecycle policies per organization tier
  • Add retry logic in application (BullMQ) for deferred messages
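The last checklist item, application-level retries, maps to BullMQ's per-job `attempts` and `backoff` options. A sketch (queue name and delays are illustrative, chosen to replace Postfix's multi-day retry window):

```typescript
// Retry a deferred message up to 5 times with exponential backoff:
// waits of 1, 2, 4 and 8 minutes between the five attempts
const retryOptions = {
  attempts: 5,
  backoff: { type: 'exponential', delay: 60_000 }  // base delay in ms
}

// Total retry window before the job is marked failed
const totalWaitMs = Array.from(
  { length: retryOptions.attempts - 1 },
  (_, i) => retryOptions.backoff.delay * 2 ** i
).reduce((a, b) => a + b, 0)

// queue.add('send-email', { messageId }, retryOptions)
```

With these numbers a message gives up after roughly 15 minutes instead of sitting in the Postfix deferred queue for days, which is what makes the aggressive `maximal_queue_lifetime = 1h` setting safe.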

Key Principles

  1. Minimize Postfix Storage - Max 1 hour in queue, aggressive cleanup
  2. Application Controls Retention - Not Postfix
  3. Status from SMTP Response - Not log parsing
  4. S3 for Content - Cheap, scalable, easy retention policies
  5. SQL for Metadata - Fast queries, keep longer than content for analytics
  6. Tier-Based Policies - Different retention per organization tier