Logging Infrastructure
Overview
Postchi's logging infrastructure is designed to handle millions of requests per day across multiple products (Email Sending and Email Validation/Stamp) with minimal performance impact. The system uses a tiered storage approach optimized for cost, speed, and scalability.
Architecture
Storage Tiers
┌─────────────┐
│ Request │
└──────┬──────┘
│
▼
┌─────────────────────────────────────┐
│ Application (API/Worker/Stamp) │
│ - Logs event immediately │
│ - Non-blocking Promise.all() │
└──────┬──────────────────────────────┘
│
├─────────────────┬──────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌────────────┐ ┌──────────────┐
│ Redis │ │ Redis │ │ Redis │
│ (Hot Logs) │ │ (Queue) │ │ (Usage) │
│ Last 1000 │ │ Archive │ │ Counters │
│ per org │ │ Queue │ │ Daily/Month │
│ 7 days TTL │ │ FIFO │ │ Auto-expire │
└─────────────┘ └─────┬──────┘ └──────────────┘
│
▼
┌─────────────┐
│ Worker │
│ (Every min) │
│ Batch: 500 │
└──────┬──────┘
│
▼
┌─────────────┐
│ S3/R2 │
│ Archive │
│ NDJSON │
│ Infinite │
└─────────────┘
Storage Strategy
| Storage | Purpose | Retention | Access Speed | Cost |
|---|---|---|---|---|
| Redis (Hot) | Dashboard display, recent logs | Last 1000 logs per org, 7 days | <10ms | High |
| Redis (Queue) | Archive queue before S3 write | Until processed | <10ms | High |
| Redis (Usage) | Daily/monthly counters | 35 days auto-expire | <10ms | High |
| S3/R2 (Archive) | Long-term storage, compliance | Infinite (configurable) | 100-500ms | Very Low |
Phase 1 vs Phase 2
Phase 1 (Current Implementation):
- Redis for hot logs and queues
- S3/R2 for cold storage
- SQL only for aggregated usage (hourly/daily rollups)
Phase 2 (Future):
- Add ClickHouse for efficient log queries
- Better analytics and time-range queries
- Still use S3 for archive/compliance
Components
1. Shared Logging Module (@postchi/shared)
Located in packages/shared/src/logging/
Files:
- types.ts - TypeScript types for all log entries
- redis-client.ts - Singleton Redis client (db 2)
- s3-client.ts - S3/R2 client configuration
- usage-meter.ts - Usage counting and quota enforcement
- log-writer.ts - Store logs in Redis and queue for S3
- index.ts - Public API exports
Key Functions:
// Initialize clients (call once on startup)
Logging.initRedisClient({ host, port, password, db: 2 });
Logging.initS3Client({ region, endpoint, bucket, credentials });
// Store logs (non-blocking)
await Logging.storeHotLog(logEntry);
await Logging.queueLogForArchive(logEntry);
await Logging.incrementUsage(orgId, ProductType.EMAIL_VALIDATION);
// Retrieve logs
const logs = await Logging.getHotLogs(orgId, product, limit, offset);
const count = await Logging.getHotLogsCount(orgId, product);
const usage = await Logging.getUsage(orgId, product);
2. Product Types
enum ProductType {
EMAIL_SENDING = "EMAIL_SENDING",
EMAIL_VALIDATION = "EMAIL_VALIDATION",
}
3. Log Entry Types
Email Validation Log
interface ValidationLogEntry {
id: string; // UUID
timestamp: Date;
organizationId: string;
apiKeyId: string;
product: ProductType.EMAIL_VALIDATION;
// Request data
email: string;
options?: {
checkFormat?: boolean;
checkMx?: boolean;
checkSmtp?: boolean;
checkDisposable?: boolean;
checkCatchAll?: boolean;
timeout?: number;
};
// Result data
valid: boolean;
reason: string;
details: {
formatValid?: boolean;
mxExists?: boolean;
smtpValid?: boolean;
disposable?: boolean;
catchAll?: boolean;
smtpCode?: number;
smtpMessage?: string;
};
// Performance
duration: number; // milliseconds
// Storage
s3Key?: string; // Set after archiving
}
Email Sending Log
interface EmailSendingLogEntry {
id: string;
timestamp: Date;
organizationId: string;
apiKeyId: string;
product: ProductType.EMAIL_SENDING;
// Message data
messageId: string; // Postchi message ID
from: string;
to: string[];
cc?: string[];
bcc?: string[];
subject: string;
templateId?: string;
tags?: string[];
metadata?: Record<string, any>;
// Status
status: 'queued' | 'processing' | 'sent' | 'failed' | 'bounced';
// Performance
duration: number;
// Storage
s3Key?: string;
}
Implementation Locations
Stamp (Email Validator)
Location: postchi-email-validator/ (separate repository, not part of the monorepo)
Files Modified:
- src/config.ts - Added Redis/R2 configuration
- src/logging.ts - Initialize logging clients
- src/server.ts - Call initializeLogging() on startup
- src/routes/validate.ts - Log validation requests
Logging Flow:
// In validation endpoint (src/routes/validate.ts:89-112)
const logEntry = {
id: randomUUID(),
timestamp: new Date(),
organizationId: 'demo-org', // TODO: Get from auth
apiKeyId: 'demo-key', // TODO: Get from auth
product: ProductType.EMAIL_VALIDATION,
email,
options,
valid: result.valid,
reason: result.reason,
details: result.details,
duration,
};
// Non-blocking logging
Promise.all([
storeHotLog(logEntry),
queueLogForArchive(logEntry),
incrementUsage(organizationId, ProductType.EMAIL_VALIDATION),
]).catch((error) => {
request.log.error({ error, logEntry }, 'Failed to store validation log');
// Don't fail the request if logging fails
});
Postchi API
Location: packages/api/src/
Files Modified:
- src/config/env.ts - Added REDIS_DB_LOGGING
- src/index.ts - Initialize logging on startup
- src/services/logs.service.ts - Business logic for fetching logs
- src/api/controllers/logs.controller.ts - HTTP handlers
- src/api/routes/logs.routes.ts - API routes
- src/api/routes/index.ts - Registered /logs routes
Postchi Worker
Location: packages/worker/src/
Files Modified:
- src/config/env.ts - Added Redis/R2 configuration
- src/index.ts - Initialize logging on startup
- src/workers/email.worker.ts - Log email sends (success + failure)
- src/workers/log-archiver.worker.ts - NEW - Archives logs to S3
Email Worker Logging:
// After successful SMTP send (email.worker.ts:358-390)
const logEntry: Logging.EmailSendingLogEntry = {
id: randomUUID(),
timestamp: new Date(),
organizationId: data.organizationId,
apiKeyId: 'worker-send',
product: Logging.ProductType.EMAIL_SENDING,
messageId: data.messageId,
from: data.from.email,
to: data.to,
subject,
status: 'sent',
duration: sendDuration,
// ... metadata
};
// Non-blocking
Promise.all([
Logging.storeHotLog(logEntry),
Logging.queueLogForArchive(logEntry),
Logging.incrementUsage(organizationId, ProductType.EMAIL_SENDING),
]).catch((error) => {
logger.error({ error, messageId }, 'Failed to store email sending log');
});
Log Archiver Worker:
- Runs every minute via BullMQ scheduled job
- Fetches 500 logs at a time from Redis queue
- Groups by organization and date
- Writes to S3/R2 as newline-delimited JSON
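The batching step above can be sketched as a pair of pure helpers. This is an illustrative sketch, not the real worker code: the `QueuedLog` shape is a subset of the real log types, and `groupByOrgAndDate`/`toNdjson` are assumed names, though the resulting keys mirror the S3 structure documented below.

```typescript
// Minimal shape of a queued log entry (subset of the real types;
// timestamp is an ISO string after the JSON round-trip through Redis).
interface QueuedLog {
  organizationId: string;
  product: string;
  timestamp: string;
}

// Group a batch of logs by "logs/product/org/yyyy/mm/dd" so that each
// group becomes one NDJSON object in S3/R2.
function groupByOrgAndDate(logs: QueuedLog[]): Map<string, QueuedLog[]> {
  const groups = new Map<string, QueuedLog[]>();
  for (const log of logs) {
    const d = new Date(log.timestamp);
    const yyyy = d.getUTCFullYear();
    const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
    const dd = String(d.getUTCDate()).padStart(2, "0");
    const key = `logs/${log.product}/${log.organizationId}/${yyyy}/${mm}/${dd}`;
    const group = groups.get(key);
    if (group) group.push(log);
    else groups.set(key, [log]);
  }
  return groups;
}

// Serialize one group as newline-delimited JSON, one object per line.
function toNdjson(logs: QueuedLog[]): string {
  return logs.map((l) => JSON.stringify(l)).join("\n") + "\n";
}
```

In the worker itself, each run would pop up to 500 entries from the `archive:queue:<product>` list, group them with these helpers, and write each group to an object under its prefix.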
Scheduler:
- src/services/log-archiver-scheduler.service.ts - Creates repeatable job
- Registered in src/index.ts on API startup
S3/R2 Storage Structure
logs/
  EMAIL_VALIDATION/
    {organizationId}/
      2026/
        02/
          21/
            1771632000000-a1b2c3d4.json
            1771632060000-e5f6g7h8.json
  EMAIL_SENDING/
    {organizationId}/
      2026/
        02/
          21/
            1771632000000-i9j0k1l2.json
File Format: Newline-delimited JSON (NDJSON)
{"id":"log-1","timestamp":"2026-02-21T00:00:00.000Z","email":"user@example.com",...}
{"id":"log-2","timestamp":"2026-02-21T00:00:05.000Z","email":"test@example.com",...}
Benefits:
- Organization isolation (easy to delete org data for GDPR)
- Time-based partitioning (efficient date range queries)
- Small files (better performance than giant files)
- NDJSON format (easy to stream and process line-by-line)
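Because every line is a complete JSON object, an archived file can be processed without parsing the whole payload as one document. A minimal parser (function name illustrative; a streaming variant would feed file chunks through readline instead):

```typescript
// Parse an NDJSON payload into typed objects, skipping blank lines
// (e.g. the trailing newline at end of file).
function parseNdjson<T>(payload: string): T[] {
  return payload
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}
```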
API Endpoints
Get Logs
Generic Endpoint:
GET /v1/logs?product=EMAIL_VALIDATION&limit=100&offset=0
Authorization: Bearer <token>
Convenience Endpoints:
GET /v1/logs/validation?limit=100&offset=0
GET /v1/logs/sending?limit=100&offset=0
Response:
{
"success": true,
"data": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": 1771632000000,
"email": "user@example.com",
"valid": true,
"reason": "valid",
"duration": 150
}
],
"pagination": {
"total": 1000,
"limit": 100,
"offset": 0,
"hasMore": true
},
"meta": {
"product": "EMAIL_VALIDATION"
}
}
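A client call against the generic endpoint might look like the following sketch; the base URL and token are placeholders, and the query-string helper is illustrative rather than part of the shipped SDK.

```typescript
// Build the query string for GET /v1/logs.
function buildLogsQuery(product: string, limit = 100, offset = 0): string {
  const params = new URLSearchParams({
    product,
    limit: String(limit),
    offset: String(offset),
  });
  return `/v1/logs?${params.toString()}`;
}

// Usage (assumes a running API and a valid token):
// const res = await fetch(`https://api.example.com${buildLogsQuery("EMAIL_VALIDATION")}`, {
//   headers: { Authorization: `Bearer ${token}` },
// });
// const { data, pagination } = await res.json();
```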
Get Usage Statistics
Single Product:
GET /v1/logs/usage?product=EMAIL_VALIDATION
Response:
{
"success": true,
"data": {
"dailyUsage": 1234,
"monthlyUsage": 45678,
"product": "EMAIL_VALIDATION"
}
}
All Products:
GET /v1/logs/usage/all
Response:
{
"success": true,
"data": {
"validation": {
"dailyUsage": 1234,
"monthlyUsage": 45678,
"product": "EMAIL_VALIDATION"
},
"sending": {
"dailyUsage": 5678,
"monthlyUsage": 123456,
"product": "EMAIL_SENDING"
}
}
}
Environment Variables
Required for All Services
# Redis (Logging)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=optional
REDIS_DB_LOGGING=2 # Use db 2 for logging
# Cloudflare R2 (Archive Storage)
R2_ACCOUNT_ID=your-account-id
R2_ACCESS_KEY_ID=your-access-key
R2_SECRET_ACCESS_KEY=your-secret
R2_BUCKET_NAME=postchi-logs
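On startup, each service maps these variables into the init calls from @postchi/shared. The helper below is a hedged sketch of that mapping: `loggingConfigFromEnv` is an assumed name, and the endpoint follows Cloudflare R2's standard `https://<account>.r2.cloudflarestorage.com` format.

```typescript
interface LoggingEnv {
  REDIS_HOST: string;
  REDIS_PORT: string;
  REDIS_PASSWORD?: string;
  REDIS_DB_LOGGING: string;
  R2_ACCOUNT_ID: string;
  R2_ACCESS_KEY_ID: string;
  R2_SECRET_ACCESS_KEY: string;
  R2_BUCKET_NAME: string;
}

// Translate environment variables into the config shapes expected by
// Logging.initRedisClient() and Logging.initS3Client().
function loggingConfigFromEnv(env: LoggingEnv) {
  return {
    redis: {
      host: env.REDIS_HOST,
      port: Number(env.REDIS_PORT),
      password: env.REDIS_PASSWORD,
      db: Number(env.REDIS_DB_LOGGING),
    },
    s3: {
      region: "auto", // R2 uses the "auto" region
      endpoint: `https://${env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
      bucket: env.R2_BUCKET_NAME,
      credentials: {
        accessKeyId: env.R2_ACCESS_KEY_ID,
        secretAccessKey: env.R2_SECRET_ACCESS_KEY,
      },
    },
  };
}
```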
Performance Characteristics
Write Performance
- Redis hot log write: <1ms
- Redis queue push: <1ms
- Redis usage counter increment: <1ms
- Total logging overhead: <5ms (all async, non-blocking)
Read Performance
- Dashboard (last 100 logs): <10ms (Redis)
- Archive query (S3): 100-500ms (depends on file size)
Cost Optimization
- Redis only stores last 1000 logs per org (compact format)
- Auto-expire keys after 7 days
- S3 storage: ~$0.015/GB/month (Cloudflare R2)
- Estimated: $50-100/month for 10M logs/month
Scaling Considerations
Current Capacity
- Redis: Can handle 100K+ writes/sec
- S3: No practical limit
- Worker: Processes up to 30K logs/hour (500 per batch, one batch per minute)
When to Scale
- If queue backup > 5 minutes: Increase worker concurrency
- If Redis memory > 80%: Reduce hot log limit or TTL
- If S3 costs high: Implement lifecycle policies (move to Glacier)
Phase 2 Migration (ClickHouse)
When query performance becomes important:
- Keep current Redis + S3 setup
- Add ClickHouse for analytics
- Worker writes to both S3 and ClickHouse
- Use ClickHouse for dashboard queries
- Keep S3 for compliance/backup
Monitoring
Key Metrics to Track
- Redis queue depth: LLEN archive:queue:EMAIL_VALIDATION
- Redis memory usage: INFO memory
- S3 write failures: Check worker logs
- Logging errors: Count of caught errors in Promise.catch()
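The queue-depth metric maps directly onto the "queue backup > 5 minutes" scale-up rule from the scaling section. A sketch of that conversion (helper name and threshold wiring are illustrative):

```typescript
// Convert archive queue depth into minutes of backlog. With one
// 500-log batch drained per minute, a depth of 2500 means roughly
// five minutes of backlog, the scale-up threshold noted above.
function queueBacklogMinutes(queueDepth: number, batchSize = 500): number {
  return queueDepth / batchSize;
}

// Usage with an ioredis client (not runnable here without Redis):
// const depth = await redis.llen("archive:queue:EMAIL_VALIDATION");
// if (queueBacklogMinutes(depth) > 5) alertOps("archive queue backlog");
```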
Health Checks
# Check Redis connection
redis-cli -h localhost -p 6379 -n 2 PING
# Check queue size
redis-cli -h localhost -p 6379 -n 2 LLEN archive:queue:EMAIL_VALIDATION
# Check hot logs count for org
redis-cli -h localhost -p 6379 -n 2 ZCARD logs:EMAIL_VALIDATION:{orgId}
Testing
Manual Testing
1. Send validation request (Stamp):
curl -X POST http://localhost:3000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"email": "test@example.com"}'
2. Check Redis (hot logs):
redis-cli -n 2 ZRANGE logs:EMAIL_VALIDATION:demo-org 0 -1
3. Check Redis (queue):
redis-cli -n 2 LLEN archive:queue:EMAIL_VALIDATION
4. Check API (get logs):
curl http://localhost:3000/v1/logs/validation \
-H "Authorization: Bearer <token>"
5. Wait 1 minute for archiver, then check S3:
aws s3 ls s3://postchi-logs/logs/EMAIL_VALIDATION/demo-org/2026/02/21/ \
--endpoint-url=https://<account>.r2.cloudflarestorage.com
Troubleshooting
Logs not appearing in Redis
- Check Redis connection: redis-cli -n 2 PING
- Verify logging initialized: Look for "✅ Logging infrastructure initialized" in logs
- Check for errors in application logs
Logs not archiving to S3
- Check worker is running: ps aux | grep worker
- Check queue has items: redis-cli -n 2 LLEN archive:queue:EMAIL_VALIDATION
- Check worker logs for S3 errors
- Verify R2 credentials are correct
High Redis memory usage
- Check total keys: redis-cli -n 2 DBSIZE
- Check largest keys: redis-cli -n 2 --bigkeys
- Reduce HOT_LOGS_LIMIT in log-writer.ts (currently 1000)
- Reduce TTL (currently 7 days)
Usage counters incorrect
- Counters auto-expire (daily: 2 days, monthly: 35 days)
- Check if date changed during testing
- Manually check Redis: redis-cli -n 2 GET usage:EMAIL_VALIDATION:{orgId}:daily:{YYYYMMDD}
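The counter behavior described here (increment plus auto-expiring keys) can be sketched as follows. The key format matches the troubleshooting command above, but `dailyUsageKey` is an illustrative name, not the real usage-meter.ts code:

```typescript
// Build the daily usage key, e.g. usage:EMAIL_VALIDATION:org-1:daily:20260221
function dailyUsageKey(product: string, orgId: string, date: Date): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `usage:${product}:${orgId}:daily:${yyyy}${mm}${dd}`;
}

// With ioredis, incrementing pairs INCR with EXPIRE so stale counters
// disappear on their own (daily: 2 days, monthly: 35 days):
// const key = dailyUsageKey(product, orgId, new Date());
// await redis.multi().incr(key).expire(key, 2 * 86400).exec();
```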
Future Enhancements
Phase 2 (ClickHouse)
- Set up ClickHouse cluster
- Create tables with proper partitioning
- Modify worker to write to both S3 and ClickHouse
- Update API to query ClickHouse for analytics
- Keep S3 as backup/archive
Features
- Log retention policies (auto-delete after X days)
- Advanced filtering (by date range, status, email domain)
- Export logs to CSV/JSON
- Real-time log streaming (WebSockets)
- Alerting on unusual patterns
- Aggregated analytics dashboard
- Cost attribution per organization
Optimizations
- Compress logs before S3 write (gzip)
- Use Parquet format instead of NDJSON for better compression
- Implement log sampling for high-volume orgs
- Add rate limiting per organization
- Implement quota enforcement in logging layer
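The first optimization above, gzip before the S3 write, is a small change in the archiver. A sketch using Node's built-in zlib (the `.json.gz` key suffix and helper names are assumptions):

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Compress an NDJSON payload before uploading. NDJSON compresses well
// because the same field names repeat on every line.
function compressNdjson(ndjson: string): Buffer {
  return gzipSync(Buffer.from(ndjson, "utf8"));
}

// Reading an archived object back: the object would be stored under a
// .json.gz key, ideally with ContentEncoding: "gzip" set on upload so
// HTTP clients can decompress transparently.
function decompressNdjson(buf: Buffer): string {
  return gunzipSync(buf).toString("utf8");
}
```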
Summary
The logging infrastructure is designed to:
- Scale to millions of requests/day without performance impact
- Keep costs low using tiered storage
- Provide fast dashboard access with Redis hot logs
- Enable compliance with S3 long-term storage
- Allow easy analytics in Phase 2 with ClickHouse
All logging is non-blocking and fault-tolerant - if logging fails, the core application continues to work.