Logging Infrastructure

Overview

Postchi's logging infrastructure is designed to handle millions of requests per day across multiple products (Email Sending and Email Validation/Stamp) with minimal performance impact. The system uses a tiered storage approach optimized for cost, speed, and scalability.

Architecture

Storage Tiers

┌─────────────┐
│   Request   │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│  Application (API/Worker/Stamp)     │
│  - Logs event immediately           │
│  - Non-blocking Promise.all()       │
└──────┬──────────────────────────────┘
       │
       ├────────────────┬─────────────────┐
       ▼                ▼                 ▼
┌─────────────┐   ┌────────────┐   ┌──────────────┐
│    Redis    │   │   Redis    │   │    Redis     │
│ (Hot Logs)  │   │  (Queue)   │   │   (Usage)    │
│  Last 1000  │   │  Archive   │   │   Counters   │
│   per org   │   │   Queue    │   │ Daily/Month  │
│ 7 days TTL  │   │    FIFO    │   │ Auto-expire  │
└─────────────┘   └─────┬──────┘   └──────────────┘
                        │
                        ▼
                 ┌─────────────┐
                 │   Worker    │
                 │ (Every min) │
                 │ Batch: 500  │
                 └──────┬──────┘
                        │
                        ▼
                 ┌─────────────┐
                 │   S3/R2     │
                 │   Archive   │
                 │   NDJSON    │
                 │  Infinite   │
                 └─────────────┘

Storage Strategy

Storage          Purpose                         Retention                        Access Speed   Cost
Redis (Hot)      Dashboard display, recent logs  Last 1000 logs per org, 7 days   <10ms          High
Redis (Queue)    Archive queue before S3 write   Until processed                  <10ms          High
Redis (Usage)    Daily/monthly counters          35 days auto-expire              <10ms          High
S3/R2 (Archive)  Long-term storage, compliance   Infinite (configurable)          100-500ms      Very Low

Phase 1 vs Phase 2

Phase 1 (Current Implementation):

  • Redis for hot logs and queues
  • S3/R2 for cold storage
  • SQL only for aggregated usage (hourly/daily rollups)

Phase 2 (Future):

  • Add ClickHouse for efficient log queries
  • Better analytics and time-range queries
  • Still use S3 for archive/compliance

Components

1. Shared Logging Module (@postchi/shared)

Located in packages/shared/src/logging/

Files:

  • types.ts - TypeScript types for all log entries
  • redis-client.ts - Singleton Redis client (db 2)
  • s3-client.ts - S3/R2 client configuration
  • usage-meter.ts - Usage counting and quota enforcement
  • log-writer.ts - Store logs in Redis and queue for S3
  • index.ts - Public API exports

Key Functions:

// Initialize clients (call once on startup)
Logging.initRedisClient({ host, port, password, db: 2 });
Logging.initS3Client({ region, endpoint, bucket, credentials });

// Store logs (non-blocking)
await Logging.storeHotLog(logEntry);
await Logging.queueLogForArchive(logEntry);
await Logging.incrementUsage(orgId, ProductType.EMAIL_VALIDATION);

// Retrieve logs
const logs = await Logging.getHotLogs(orgId, product, limit, offset);
const count = await Logging.getHotLogsCount(orgId, product);
const usage = await Logging.getUsage(orgId, product);
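
To make the flow concrete, here is a minimal sketch of how storeHotLog could be implemented with a Redis sorted set scored by timestamp. The key shape logs:{product}:{orgId} is inferred from the health-check commands later in this document; the RedisLike interface stands in for an ioredis client, and all names are illustrative rather than the actual exports of log-writer.ts.

```typescript
const HOT_LOGS_LIMIT = 1000;         // keep only the most recent 1000 logs per org
const HOT_LOGS_TTL = 7 * 24 * 3600;  // 7 days, in seconds

// Key shape inferred from the health-check commands in this document.
export function hotLogsKey(product: string, orgId: string): string {
  return `logs:${product}:${orgId}`;
}

// Minimal chainable pipeline surface, mirroring ioredis's multi() API.
interface Pipeline {
  zadd(key: string, score: number, member: string): Pipeline;
  zremrangebyrank(key: string, start: number, stop: number): Pipeline;
  expire(key: string, seconds: number): Pipeline;
  exec(): Promise<unknown>;
}
interface RedisLike {
  multi(): Pipeline;
}

// Store one entry in a sorted set scored by timestamp, trim to the last
// HOT_LOGS_LIMIT entries, and refresh the 7-day TTL, all in one round trip.
export async function storeHotLog(
  redis: RedisLike,
  entry: { id: string; timestamp: Date; organizationId: string; product: string },
): Promise<void> {
  const key = hotLogsKey(entry.product, entry.organizationId);
  await redis
    .multi()
    .zadd(key, entry.timestamp.getTime(), JSON.stringify(entry))
    .zremrangebyrank(key, 0, -(HOT_LOGS_LIMIT + 1)) // drop oldest beyond limit
    .expire(key, HOT_LOGS_TTL)
    .exec();
}
```

Trimming by rank keeps the newest 1000 members because the sorted set is scored by timestamp, and refreshing the TTL on every write means the key only expires after seven idle days.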

2. Product Types

enum ProductType {
  EMAIL_SENDING = "EMAIL_SENDING",
  EMAIL_VALIDATION = "EMAIL_VALIDATION",
}

3. Log Entry Types

Email Validation Log

interface ValidationLogEntry {
  id: string; // UUID
  timestamp: Date;
  organizationId: string;
  apiKeyId: string;
  product: ProductType.EMAIL_VALIDATION;

  // Request data
  email: string;
  options?: {
    checkFormat?: boolean;
    checkMx?: boolean;
    checkSmtp?: boolean;
    checkDisposable?: boolean;
    checkCatchAll?: boolean;
    timeout?: number;
  };

  // Result data
  valid: boolean;
  reason: string;
  details: {
    formatValid?: boolean;
    mxExists?: boolean;
    smtpValid?: boolean;
    disposable?: boolean;
    catchAll?: boolean;
    smtpCode?: number;
    smtpMessage?: string;
  };

  // Performance
  duration: number; // milliseconds

  // Storage
  s3Key?: string; // Set after archiving
}

Email Sending Log

interface EmailSendingLogEntry {
  id: string;
  timestamp: Date;
  organizationId: string;
  apiKeyId: string;
  product: ProductType.EMAIL_SENDING;

  // Message data
  messageId: string; // Postchi message ID
  from: string;
  to: string[];
  cc?: string[];
  bcc?: string[];
  subject: string;
  templateId?: string;
  tags?: string[];
  metadata?: Record<string, any>;

  // Status
  status: 'queued' | 'processing' | 'sent' | 'failed' | 'bounced';

  // Performance
  duration: number;

  // Storage
  s3Key?: string;
}

Implementation Locations

Stamp (Email Validator)

Location: postchi-email-validator/ (separate repository)

Files Modified:

  • src/config.ts - Added Redis/R2 configuration
  • src/logging.ts - Initialize logging clients
  • src/server.ts - Call initializeLogging() on startup
  • src/routes/validate.ts - Log validation requests

Logging Flow:

// In validation endpoint (src/routes/validate.ts:89-112)
const logEntry = {
id: randomUUID(),
timestamp: new Date(),
organizationId: 'demo-org', // TODO: Get from auth
apiKeyId: 'demo-key', // TODO: Get from auth
product: ProductType.EMAIL_VALIDATION,
email,
options,
valid: result.valid,
reason: result.reason,
details: result.details,
duration,
};

// Non-blocking logging
Promise.all([
storeHotLog(logEntry),
queueLogForArchive(logEntry),
incrementUsage(organizationId, ProductType.EMAIL_VALIDATION),
]).catch((error) => {
request.log.error({ error, logEntry }, 'Failed to store validation log');
// Don't fail the request if logging fails
});

Postchi API

Location: packages/api/src/

Files Modified:

  • src/config/env.ts - Added REDIS_DB_LOGGING
  • src/index.ts - Initialize logging on startup
  • src/services/logs.service.ts - Business logic for fetching logs
  • src/api/controllers/logs.controller.ts - HTTP handlers
  • src/api/routes/logs.routes.ts - API routes
  • src/api/routes/index.ts - Registered /logs routes

Postchi Worker

Location: packages/worker/src/

Files Modified:

  • src/config/env.ts - Added Redis/R2 configuration
  • src/index.ts - Initialize logging on startup
  • src/workers/email.worker.ts - Log email sends (success + failure)
  • src/workers/log-archiver.worker.ts - NEW - Archives logs to S3

Email Worker Logging:

// After successful SMTP send (email.worker.ts:358-390)
const logEntry: Logging.EmailSendingLogEntry = {
  id: randomUUID(),
  timestamp: new Date(),
  organizationId: data.organizationId,
  apiKeyId: 'worker-send',
  product: Logging.ProductType.EMAIL_SENDING,
  messageId: data.messageId,
  from: data.from.email,
  to: data.to,
  subject,
  status: 'sent',
  duration: sendDuration,
  // ... metadata
};

// Non-blocking
Promise.all([
  Logging.storeHotLog(logEntry),
  Logging.queueLogForArchive(logEntry),
  Logging.incrementUsage(data.organizationId, Logging.ProductType.EMAIL_SENDING),
]).catch((error) => {
  logger.error({ error, messageId }, 'Failed to store email sending log');
});

Log Archiver Worker:

  • Runs every minute via BullMQ scheduled job
  • Fetches 500 logs at a time from Redis queue
  • Groups by organization and date
  • Writes to S3/R2 as newline-delimited JSON
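
The loop above can be sketched roughly as follows. The queue key archive:queue:{product} and the batch size of 500 come from this document; QueueLike and ObjectStore are stand-ins for the ioredis and S3/R2 clients, and the object-key format is illustrative rather than the worker's exact naming.

```typescript
interface LogEntry {
  id: string;
  timestamp: string | number | Date;
  organizationId: string;
  [key: string]: unknown;
}

// Group parsed entries by organization and UTC date, mirroring the
// logs/{product}/{orgId}/{yyyy}/{mm}/{dd}/ layout described below.
export function groupByOrgAndDate(entries: LogEntry[]): Map<string, LogEntry[]> {
  const groups = new Map<string, LogEntry[]>();
  for (const entry of entries) {
    const d = new Date(entry.timestamp as string | number | Date);
    const yyyy = d.getUTCFullYear();
    const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
    const dd = String(d.getUTCDate()).padStart(2, "0");
    const key = `${entry.organizationId}/${yyyy}/${mm}/${dd}`;
    let bucket = groups.get(key);
    if (!bucket) {
      bucket = [];
      groups.set(key, bucket);
    }
    bucket.push(entry);
  }
  return groups;
}

interface QueueLike {
  lpop(key: string, count: number): Promise<string[] | null>;
}
interface ObjectStore {
  put(key: string, ndjson: string): Promise<void>;
}

const BATCH_SIZE = 500;

// One archiver run: drain up to 500 raw JSON strings from the queue,
// group them, and write one NDJSON object per org/date group.
export async function drainArchiveQueue(
  queue: QueueLike,
  store: ObjectStore,
  product: string,
): Promise<number> {
  const raw = await queue.lpop(`archive:queue:${product}`, BATCH_SIZE);
  if (!raw || raw.length === 0) return 0;

  const entries = raw.map((line) => JSON.parse(line) as LogEntry);
  for (const [prefix, group] of groupByOrgAndDate(entries)) {
    const ndjson = group.map((e) => JSON.stringify(e)).join("\n");
    const objectKey = `logs/${product}/${prefix}/${Date.now()}-${group[0].id.slice(0, 8)}.json`;
    await store.put(objectKey, ndjson);
  }
  return entries.length;
}
```

Grouping before writing keeps each S3 object scoped to one organization and one day, which is what makes the per-org deletion and date-range listing described below cheap.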

Scheduler:

  • src/services/log-archiver-scheduler.service.ts - Creates repeatable job
  • Registered in src/index.ts on API startup

S3/R2 Storage Structure

logs/
  EMAIL_VALIDATION/
    {organizationId}/
      2026/
        02/
          21/
            1708531200000-a1b2c3d4.json
            1708531260000-e5f6g7h8.json
  EMAIL_SENDING/
    {organizationId}/
      2026/
        02/
          21/
            1708531200000-i9j0k1l2.json

File Format: Newline-delimited JSON (NDJSON)

{"id":"log-1","timestamp":"2026-02-21T00:00:00.000Z","email":"user@example.com",...}
{"id":"log-2","timestamp":"2026-02-21T00:00:05.000Z","email":"test@example.com",...}
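
NDJSON is trivial to produce and consume line by line; a minimal sketch (helper names are illustrative, not the shared module's actual API):

```typescript
// Serialize entries to NDJSON: one JSON object per line, no trailing newline.
export function toNdjson(entries: object[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n");
}

// Parse NDJSON back into objects, skipping blank lines.
export function fromNdjson<T = unknown>(ndjson: string): T[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}
```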

Benefits:

  • Organization isolation (easy to delete org data for GDPR)
  • Time-based partitioning (efficient date range queries)
  • Small files (better performance than giant files)
  • NDJSON format (easy to stream and process line-by-line)
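
For example, the GDPR case reduces to deleting everything under one prefix. A hedged sketch, where ObjectLister abstracts the S3/R2 ListObjectsV2 and DeleteObjects calls and the helper names are hypothetical:

```typescript
// Prefix under which all of one org's logs for one product live
// (mirrors the layout above).
export function orgLogPrefix(product: string, orgId: string): string {
  return `logs/${product}/${orgId}/`;
}

// Minimal surface over the object store; a real implementation would wrap
// the AWS SDK's ListObjectsV2 (with pagination) and DeleteObjects commands.
interface ObjectLister {
  listKeys(prefix: string): Promise<string[]>;
  deleteKeys(keys: string[]): Promise<void>;
}

// Delete every archived log object belonging to one organization.
export async function deleteOrgLogs(
  store: ObjectLister,
  product: string,
  orgId: string,
): Promise<number> {
  const keys = await store.listKeys(orgLogPrefix(product, orgId));
  // S3's DeleteObjects accepts at most 1000 keys per request.
  for (let i = 0; i < keys.length; i += 1000) {
    await store.deleteKeys(keys.slice(i, i + 1000));
  }
  return keys.length;
}
```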

API Endpoints

Get Logs

Generic Endpoint:

GET /v1/logs?product=EMAIL_VALIDATION&limit=100&offset=0
Authorization: Bearer <token>

Convenience Endpoints:

GET /v1/logs/validation?limit=100&offset=0
GET /v1/logs/sending?limit=100&offset=0

Response:

{
  "success": true,
  "data": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "timestamp": 1708531200000,
      "email": "user@example.com",
      "valid": true,
      "reason": "valid",
      "duration": 150
    }
  ],
  "pagination": {
    "total": 1000,
    "limit": 100,
    "offset": 0,
    "hasMore": true
  },
  "meta": {
    "product": "EMAIL_VALIDATION"
  }
}

Get Usage Statistics

Single Product:

GET /v1/logs/usage?product=EMAIL_VALIDATION

Response:

{
  "success": true,
  "data": {
    "dailyUsage": 1234,
    "monthlyUsage": 45678,
    "product": "EMAIL_VALIDATION"
  }
}

All Products:

GET /v1/logs/usage/all

Response:

{
  "success": true,
  "data": {
    "validation": {
      "dailyUsage": 1234,
      "monthlyUsage": 45678,
      "product": "EMAIL_VALIDATION"
    },
    "sending": {
      "dailyUsage": 5678,
      "monthlyUsage": 123456,
      "product": "EMAIL_SENDING"
    }
  }
}

Environment Variables

Required for All Services

# Redis (Logging)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=optional
REDIS_DB_LOGGING=2 # Use db 2 for logging

# Cloudflare R2 (Archive Storage)
R2_ACCOUNT_ID=your-account-id
R2_ACCESS_KEY_ID=your-access-key
R2_SECRET_ACCESS_KEY=your-secret
R2_BUCKET_NAME=postchi-logs
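
A sketch of how these variables might be consumed, with the defaults shown above. The shape is illustrative; the actual env.ts files in the api and worker packages may validate differently.

```typescript
// Hypothetical config shape for the logging subsystem.
export interface LoggingEnv {
  redisHost: string;
  redisPort: number;
  redisPassword?: string;
  redisDbLogging: number;
  r2Bucket: string;
}

// Read logging config from an env map, falling back to the documented defaults.
export function loadLoggingEnv(env: Record<string, string | undefined>): LoggingEnv {
  return {
    redisHost: env.REDIS_HOST ?? "localhost",
    redisPort: Number(env.REDIS_PORT ?? 6379),
    redisPassword: env.REDIS_PASSWORD || undefined, // empty string means "no auth"
    redisDbLogging: Number(env.REDIS_DB_LOGGING ?? 2),
    r2Bucket: env.R2_BUCKET_NAME ?? "postchi-logs",
  };
}
```

Passing the env map in (rather than reading process.env inside) keeps the loader easy to unit test.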

Performance Characteristics

Write Performance

  • Redis hot log write: <1ms
  • Redis queue push: <1ms
  • Redis usage counter increment: <1ms
  • Total logging overhead: <5ms (all async, non-blocking)

Read Performance

  • Dashboard (last 100 logs): <10ms (Redis)
  • Archive query (S3): 100-500ms (depends on file size)

Cost Optimization

  • Redis only stores last 1000 logs per org (compact format)
  • Auto-expire keys after 7 days
  • S3 storage: ~$0.015/GB/month (Cloudflare R2)
  • Estimated: $50-100/month for 10M logs/month

Scaling Considerations

Current Capacity

  • Redis: Can handle 100K+ writes/sec
  • S3: No practical limit
  • Worker: Processes up to 30K logs/hour (batches of 500, one run per minute)

When to Scale

  • If queue backup > 5 minutes: Increase worker concurrency
  • If Redis memory > 80%: Reduce hot log limit or TTL
  • If S3 costs high: Implement lifecycle policies (move to Glacier)
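
The first rule can be checked numerically: at one run per minute with batches of 500, queue depth divided by 500 gives the backlog in minutes. A sketch (helper names illustrative):

```typescript
// Estimated minutes needed to drain the archive queue at the current rate.
export function queueBacklogMinutes(queueDepth: number, drainRatePerMinute = 500): number {
  if (drainRatePerMinute <= 0) throw new Error("drain rate must be positive");
  return queueDepth / drainRatePerMinute;
}

// Scale-up signal per the rule above: more than 5 minutes of backlog.
export function shouldScaleArchiver(queueDepth: number): boolean {
  return queueBacklogMinutes(queueDepth) > 5;
}
```

The queue depth itself comes from LLEN on the archive queue key, as shown in the Monitoring section.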

Phase 2 Migration (ClickHouse)

When query performance becomes important:

  1. Keep current Redis + S3 setup
  2. Add ClickHouse for analytics
  3. Worker writes to both S3 and ClickHouse
  4. Use ClickHouse for dashboard queries
  5. Keep S3 for compliance/backup

Monitoring

Key Metrics to Track

  • Redis queue depth: LLEN archive:queue:EMAIL_VALIDATION
  • Redis memory usage: INFO memory
  • S3 write failures: Check worker logs
  • Logging errors: Count of caught errors in Promise.catch()

Health Checks

# Check Redis connection
redis-cli -h localhost -p 6379 -n 2 PING

# Check queue size
redis-cli -h localhost -p 6379 -n 2 LLEN archive:queue:EMAIL_VALIDATION

# Check hot logs count for org
redis-cli -h localhost -p 6379 -n 2 ZCARD logs:EMAIL_VALIDATION:{orgId}

Testing

Manual Testing

1. Send validation request (Stamp):

curl -X POST http://localhost:3000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com"}'

2. Check Redis (hot logs):

redis-cli -n 2 ZRANGE logs:EMAIL_VALIDATION:demo-org 0 -1

3. Check Redis (queue):

redis-cli -n 2 LLEN archive:queue:EMAIL_VALIDATION

4. Check API (get logs):

curl http://localhost:3000/v1/logs/validation \
-H "Authorization: Bearer <token>"

5. Wait 1 minute for archiver, then check S3:

aws s3 ls s3://postchi-logs/logs/EMAIL_VALIDATION/demo-org/2026/02/21/ \
--endpoint-url=https://<account>.r2.cloudflarestorage.com

Troubleshooting

Logs not appearing in Redis

  • Check Redis connection: redis-cli -n 2 PING
  • Verify logging initialized: Look for "✅ Logging infrastructure initialized" in logs
  • Check for errors in application logs

Logs not archiving to S3

  • Check worker is running: ps aux | grep worker
  • Check queue has items: redis-cli -n 2 LLEN archive:queue:EMAIL_VALIDATION
  • Check worker logs for S3 errors
  • Verify R2 credentials are correct

High Redis memory usage

  • Check total keys: redis-cli -n 2 DBSIZE
  • Check largest keys: redis-cli -n 2 --bigkeys
  • Reduce HOT_LOGS_LIMIT in log-writer.ts (currently 1000)
  • Reduce TTL (currently 7 days)

Usage counters incorrect

  • Counters auto-expire (daily: 2 days, monthly: 35 days)
  • Check if date changed during testing
  • Manually check Redis: redis-cli -n 2 GET usage:EMAIL_VALIDATION:{orgId}:daily:{YYYYMMDD}
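
The counter keys follow the pattern in the command above. A sketch of composing them (the daily shape is taken from that command, the monthly shape is extrapolated; the real usage-meter.ts may differ):

```typescript
// Format a Date as YYYYMMDD in UTC.
function yyyymmdd(d: Date): string {
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return `${d.getUTCFullYear()}${mm}${dd}`;
}

// usage:{product}:{orgId}:daily:{YYYYMMDD}
export function dailyUsageKey(product: string, orgId: string, d: Date): string {
  return `usage:${product}:${orgId}:daily:${yyyymmdd(d)}`;
}

// usage:{product}:{orgId}:monthly:{YYYYMM}
export function monthlyUsageKey(product: string, orgId: string, d: Date): string {
  return `usage:${product}:${orgId}:monthly:${yyyymmdd(d).slice(0, 6)}`;
}
```

Using UTC consistently matters here: a counter keyed by local date would "change days" at a different moment than the archive layout, which is one source of the incorrect-looking numbers described above.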

Future Enhancements

Phase 2 (ClickHouse)

  • Set up ClickHouse cluster
  • Create tables with proper partitioning
  • Modify worker to write to both S3 and ClickHouse
  • Update API to query ClickHouse for analytics
  • Keep S3 as backup/archive

Features

  • Log retention policies (auto-delete after X days)
  • Advanced filtering (by date range, status, email domain)
  • Export logs to CSV/JSON
  • Real-time log streaming (WebSockets)
  • Alerting on unusual patterns
  • Aggregated analytics dashboard
  • Cost attribution per organization

Optimizations

  • Compress logs before S3 write (gzip)
  • Use Parquet format instead of NDJSON for better compression
  • Implement log sampling for high-volume orgs
  • Add rate limiting per organization
  • Implement quota enforcement in logging layer
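
The first optimization is straightforward with Node's built-in zlib; a sketch (the archiver would also need to name objects .json.gz and set the Content-Encoding metadata, which is not shown):

```typescript
import { gzipSync, gunzipSync } from "zlib";

// Compress an NDJSON batch before upload. Repetitive text logs typically
// compress several-fold with gzip; the exact ratio depends on the data.
export function compressBatch(ndjson: string): Buffer {
  return gzipSync(Buffer.from(ndjson, "utf8"));
}

// Decompress a batch read back from the archive.
export function decompressBatch(gz: Buffer): string {
  return gunzipSync(gz).toString("utf8");
}
```

Synchronous zlib calls are fine here because the archiver already processes batches off the request path; a hot path would use the async variants instead.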

Summary

The logging infrastructure is designed to:

  1. Scale to millions of requests/day without performance impact
  2. Keep costs low using tiered storage
  3. Provide fast dashboard access with Redis hot logs
  4. Enable compliance with S3 long-term storage
  5. Allow easy analytics in Phase 2 with ClickHouse

All logging is non-blocking and fault-tolerant: if logging fails, the core application continues to work.