# Performance Improvement Recommendations for Echoes of the Ashes

## Current Performance Baseline

- **Throughput**: 212 req/s (with 8 workers)
- **Success Rate**: 94% (6% failures under load)
- **Bottleneck**: Database connection pool and complex queries

## Quick Wins (Immediate Implementation)

### 1. Increase Database Connection Pool ⚡ **HIGH IMPACT**

**Current**: SQLAlchemy's default pool (5 connections plus 10 overflow) per worker

**Problem**: 8 workers competing for limited connections

```python
# In api/database.py, update engine creation:
engine = create_async_engine(
    DATABASE_URL,
    echo=False,
    pool_size=20,        # Increased from default 5
    max_overflow=30,     # Allow bursts up to 50 total connections
    pool_timeout=30,     # Wait up to 30s for a connection
    pool_recycle=3600,   # Recycle connections every hour
    pool_pre_ping=True,  # Verify connections before use
)
```

**Expected Impact**: +30-50% throughput, failures reduced to <2%

### 2. Add Database Indexes 🚀 **HIGH IMPACT**

**Current**: Missing indexes on frequently queried columns

```sql
-- Run these in PostgreSQL:

-- Player lookups (auth)
CREATE INDEX IF NOT EXISTS idx_players_username ON players(username);
CREATE INDEX IF NOT EXISTS idx_players_telegram_id ON players(telegram_id);

-- Location queries (most expensive operation)
CREATE INDEX IF NOT EXISTS idx_players_location_id ON players(location_id);
CREATE INDEX IF NOT EXISTS idx_dropped_items_location ON dropped_items(location_id);
CREATE INDEX IF NOT EXISTS idx_wandering_enemies_location ON wandering_enemies(location_id);

-- Combat queries
CREATE INDEX IF NOT EXISTS idx_active_combats_player_id ON active_combats(player_id);

-- Inventory queries
CREATE INDEX IF NOT EXISTS idx_inventory_player_id ON inventory(player_id);
CREATE INDEX IF NOT EXISTS idx_inventory_item_id ON inventory(item_id);

-- Compound index for item pickups
CREATE INDEX IF NOT EXISTS idx_inventory_player_item ON inventory(player_id, item_id);
```

**Expected Impact**: 50-70% faster location queries (730ms → 200-300ms)

### 3. Implement Redis Caching Layer 💾 **MEDIUM IMPACT**

Cache frequently accessed, rarely changing data:

```python
# Install: pip install redis
# (the standalone aioredis package is deprecated; its API now lives in redis.asyncio)
import json

import redis.asyncio as aioredis

redis = aioredis.from_url("redis://localhost")

# Cache item definitions (they never change)
async def get_item_cached(item_id: str):
    cached = await redis.get(f"item:{item_id}")
    if cached:
        return json.loads(cached)
    item = ITEMS_MANAGER.get_item(item_id)
    await redis.setex(f"item:{item_id}", 3600, json.dumps(item))
    return item

# Cache location data (5-second TTL for NPCs/items)
async def get_location_cached(location_id: str):
    cached = await redis.get(f"location:{location_id}")
    if cached:
        return json.loads(cached)
    location = await get_location_from_db(location_id)
    await redis.setex(f"location:{location_id}", 5, json.dumps(location))
    return location
```

**Expected Impact**: +40-60% throughput for read-heavy operations

### 4. Optimize Location Query 📊 **HIGH IMPACT**

**Current Issue**: The location endpoint makes 5+ separate DB queries

**Solution**: Combine them into a single round trip. Note that selecting from three unrelated tables with an `OR` filter produces a Cartesian product; use `UNION ALL` with a discriminator column (or batch the statements) instead:

```python
from sqlalchemy import literal, select, union_all

async def get_location_data(location_id: str, player_id: int):
    """Optimized single-round-trip location data fetch."""
    async with DatabaseSession() as session:
        # One UNION ALL query instead of 5 separate queries.
        # Each branch must select the same number of columns;
        # the column lists here are illustrative.
        stmt = union_all(
            select(literal("item").label("kind"), dropped_items.c.id)
            .where(dropped_items.c.location_id == location_id),
            select(literal("enemy").label("kind"), wandering_enemies.c.id)
            .where(wandering_enemies.c.location_id == location_id),
            select(literal("player").label("kind"), players.c.id)
            .where(players.c.location_id == location_id),
        )
        result = await session.execute(stmt)
        # Partition the rows by "kind" and build the response in one pass
```

**Expected Impact**: 60-70% faster location queries

## Medium-Term Improvements

### 5. Database Read Replicas 🔄

Set up a PostgreSQL read replica for location queries (read-heavy):

```yaml
# docker-compose.yml
# (the stock postgres image has no replication env vars;
#  the bitnami/postgresql image shown here does)
echoes_db_replica:
  image: bitnami/postgresql:15
  environment:
    POSTGRESQL_REPLICATION_MODE: slave
    POSTGRESQL_MASTER_HOST: echoes_of_the_ashes_db
```

Route read-only queries to replicas and writes to the primary.
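The routing itself can live in a tiny helper. A minimal sketch follows; the DSNs, the `app` user, and the `pick_dsn` helper are placeholders for illustration, not the project's real configuration:

```python
# Minimal read/write routing sketch.
# PRIMARY_URL / REPLICA_URLS are placeholder DSNs -- substitute your
# real service names and credentials.
PRIMARY_URL = "postgresql://app@echoes_of_the_ashes_db:5432/echoes"
REPLICA_URLS = [
    "postgresql://app@echoes_db_replica:5432/echoes",
]

_rr = 0  # round-robin cursor over replicas


def pick_dsn(readonly: bool = False) -> str:
    """Reads round-robin over the replicas; writes always hit the primary."""
    global _rr
    if not readonly or not REPLICA_URLS:
        return PRIMARY_URL
    dsn = REPLICA_URLS[_rr % len(REPLICA_URLS)]
    _rr += 1
    return dsn
```

Each request handler would then open its engine/session against `pick_dsn(readonly=...)`; anything inside a write transaction must stay on the primary to avoid replication-lag surprises.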
**Expected Impact**: 2x throughput for read operations

### 6. Batch Processing for Item Operations

Instead of individual item pickup/drop operations:

```python
# Current: N queries for N items
for item in items:
    await db.add_to_inventory(player_id, item)

# Optimized: one multi-row INSERT for N items
# (batch_add_to_inventory is a helper to add, e.g. via insert(...).values([...]))
await db.batch_add_to_inventory(player_id, items)
```

### 7. Optimize Status Effects Query

Player status effects may currently be queried inefficiently (one query per access). Eager-load them together with the player:

```python
from sqlalchemy import select
from sqlalchemy.orm import selectinload

# Use eager loading. selectinload() needs ORM-mapped attributes
# (an ORM model `Player` with a `status_effects` relationship),
# not Core `players.c` columns.
stmt = (
    select(Player)
    .options(selectinload(Player.status_effects))
    .where(Player.id == player_id)
)
```

### 8. Connection Pooling at Application Level

Use PgBouncer in transaction mode:

```yaml
pgbouncer:
  image: pgbouncer/pgbouncer
  environment:
    DATABASES: echoes_db=host=echoes_of_the_ashes_db port=5432 dbname=echoes
    POOL_MODE: transaction
    MAX_CLIENT_CONN: 1000
    DEFAULT_POOL_SIZE: 25
```

**Expected Impact**: Better connection management, +20-30% throughput

## Long-Term / Infrastructure Improvements

### 9. Horizontal Scaling

- Load balancer in front of multiple API containers
- Shared Redis session store
- Database connection pooler (PgBouncer)

### 10. Database Query Optimization

Monitor slow queries:

```sql
-- Log every statement slower than 100 ms
ALTER DATABASE echoes SET log_min_duration_statement = 100;

-- Find slow queries (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

### 11. Asynchronous Task Queue

Offload heavy operations to background workers (Celery or RQ):

- Combat damage calculations
- Loot generation
- Statistics updates
- Email notifications

### 12. CDN for Static Assets

Move images to a CDN (Cloudflare, AWS CloudFront).

## Implementation Priority

### Phase 1 (Today - 1 hour of work)

1. ✅ **Add database indexes** (30 min)
2. ✅ **Increase connection pool** (5 min)
3. ⚠️ Test and verify improvements

**Expected Result**: 300-400 req/s, <2% failures

### Phase 2 (This Week)

1. Implement Redis caching for items/NPCs
2. Optimize the location query to a single round trip
3. Add the PgBouncer connection pooler

**Expected Result**: 500-700 req/s

### Phase 3 (Next Sprint)

1. Add database read replicas
2. Implement batch operations
3. Set up monitoring (Prometheus/Grafana)

**Expected Result**: 1000+ req/s

## Monitoring Recommendations

Add performance monitoring:

```python
# Add to api/main.py
import time

from prometheus_client import Counter, Histogram

request_duration = Histogram('http_request_duration_seconds', 'HTTP request latency')
request_count = Counter('http_requests_total', 'Total HTTP requests')

@app.middleware("http")
async def monitor_requests(request, call_next):
    start = time.time()
    response = await call_next(request)
    request_duration.observe(time.time() - start)
    request_count.inc()
    return response
```

## Quick Performance Test Commands

```bash
# Test current performance
cd /opt/dockers/echoes_of_the_ashes
timeout 300 .venv/bin/python load_test.py

# Monitor database connections
docker exec echoes_of_the_ashes_db psql -U your_user -d echoes -c \
  "SELECT count(*) AS connections FROM pg_stat_activity;"

# Check slow queries
docker exec echoes_of_the_ashes_db psql -U your_user -d echoes -c \
  "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;"

# Monitor API CPU/memory
docker stats echoes_of_the_ashes_api
```

## Cost vs Benefit Analysis

| Improvement | Time to Implement | Performance Gain | Complexity |
|-------------|-------------------|------------------|------------|
| Database Indexes | 30 minutes | +50-70% | Low |
| Connection Pool | 5 minutes | +30-50% | Low |
| Optimize Location Query | 2 hours | +60-70% | Medium |
| Redis Caching | 4 hours | +40-60% | Medium |
| PgBouncer | 1 hour | +20-30% | Low |
| Read Replicas | 1 day | +100% reads | High |
| Batch Operations | 4 hours | +30-40% | Medium |

## Conclusion

**Most Impact for Least Effort**:

1. Add database indexes (30 min) → +50-70% faster queries
2. Increase connection pool (5 min) → +30-50% throughput
3. Add PgBouncer (1 hour) → +20-30% throughput

Combined, these changes **could reach 400-500 req/s with just a few hours of work**.

The current bottleneck is the **database**, no longer the API workers. Focus optimization there first.