# Scalability Analysis - Background Tasks

**Date:** October 21, 2025
**Scope:** Performance analysis for 10,000+ concurrent players

## Executive Summary

⚠️ **Current implementation has SEVERE scalability issues** at 10,000 players:

| Function | Current | 10K Players Impact | Risk Level |
|----------|---------|-------------------|------------|
| `regenerate_stamina()` | **O(n)** fetch-all + loop | ~10K DB queries every 5min | 🔴 **CRITICAL** |
| `check_combat_timers()` | **O(n)** fetch-all + loop | Fetch all combats every 30s | 🟡 **HIGH** |
| `decay_dropped_items()` | **O(1)** single DELETE | ~1 query every 5min | 🟢 **LOW** |

## Detailed Analysis

---

### 1. `regenerate_stamina()` - 🔴 CRITICAL ISSUE

**Current Implementation:**

```python
async def regenerate_all_players_stamina() -> int:
    # 1. SELECT ALL players below max stamina
    result = await conn.execute(
        players.select().where(
            (players.c.is_dead == False) &
            (players.c.stamina < players.c.max_stamina)
        )
    )
    players_to_update = result.fetchall()  # Load ALL into memory

    # 2. Loop through EACH player (O(n))
    for player in players_to_update:
        # Calculate recovery per player
        base_recovery = 1
        endurance_bonus = player.endurance // 10
        total_recovery = base_recovery + endurance_bonus
        new_stamina = min(player.stamina + total_recovery, player.max_stamina)

        # 3. Individual UPDATE query per player (O(n) queries!)
        await conn.execute(
            players.update()
            .where(players.c.telegram_id == player.telegram_id)
            .values(stamina=new_stamina)
        )
```

**Performance at Scale:**

- **10,000 active players** with stamina < max
- Runs every **5 minutes** (288 times per day)
- **Operations per cycle:**
  - 1 SELECT query → 10K rows loaded into memory
  - 10K individual UPDATE queries
  - **Total: 10,001 queries per cycle**
- **Daily load:** 2,880,000+ queries just for stamina regeneration!
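The arithmetic behind these figures can be sanity-checked with a few lines of standalone Python (no project code involved):

```python
# Back-of-envelope load estimate for the current O(n) implementation.
players_below_max = 10_000
cycles_per_day = 24 * 60 // 5              # one run every 5 minutes
queries_per_cycle = 1 + players_below_max  # 1 SELECT + one UPDATE per player
daily_queries = cycles_per_day * queries_per_cycle

print(cycles_per_day)     # 288
print(queries_per_cycle)  # 10001
print(daily_queries)      # 2880288
```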
**Memory Impact:**

- Loading 10K player objects into Python: ~5-10 MB per cycle
- Holding them during the UPDATE loop: memory spike every 5 minutes

**Database Impact:**

- 10K sequential UPDATE queries = **MASSIVE lock contention**
- Each UPDATE acquires row locks
- Other queries (player actions) get blocked
- **Potential cascading failures** under load

**Network Latency:**

- If the DB has 5ms latency: 10K × 5ms = **50 seconds** per cycle
- Blocks the async loop for 50+ seconds
- Other background tasks starve

---

### 2. `check_combat_timers()` - 🟡 HIGH RISK

**Current Implementation:**

```python
async def check_combat_timers():
    # Every 30 seconds:
    idle_combats = await database.get_all_idle_combats(idle_threshold)

    # In database.py:
    stmt = active_combats.select().where(
        active_combats.c.turn_started_at < idle_threshold
    )
    result = await conn.execute(stmt)
    return [row._asdict() for row in result.fetchall()]  # Load ALL

    # Loop through each combat
    for combat in idle_combats:
        await combat_logic.npc_attack(combat['player_id'])
```

**Performance at Scale:**

- Assume 5% of players in combat at any time: **500 combats**
- Runs every **30 seconds** (2,880 times per day)
- **Operations per cycle:**
  - 1 SELECT query → 500 rows
  - 500 × `npc_attack()` calls (each does multiple DB queries)
  - **Estimate: 500-1000 queries per cycle**

**Problems:**

- If the combat rate increases (10% in combat): **1,000 combats**
- `npc_attack()` itself does multiple DB operations:
  - Update combat state
  - Update player HP
  - Check for death
  - Potential inventory operations
- **Cascading load** during peak hours

**Edge Case Risk:**

- If many players go AFK simultaneously (server maintenance, network issue)
- Could have 1,000+ idle combats to process at once
- The 30-second cycle time becomes 5+ minutes
- Combats pile up, system collapses

---

### 3. `decay_dropped_items()` - 🟢 LOW RISK (Optimal)

**Current Implementation:**

```python
async def remove_expired_dropped_items(timestamp_limit: float) -> int:
    stmt = dropped_items.delete().where(
        dropped_items.c.drop_timestamp < timestamp_limit
    )
    result = await conn.execute(stmt)
    await conn.commit()
    return result.rowcount
```

**Performance at Scale:**

- **Single DELETE query** with WHERE clause
- Database handles filtering efficiently (indexed timestamp)
- **O(1) in terms of queries** (regardless of player count)
- Only the cleanup work itself scales with the number of expired items, which is roughly constant per time window

**Why This Works:**

- ✅ Single query, database-side filtering
- ✅ Indexed timestamp column
- ✅ No data loaded into Python memory
- ✅ Scales to millions of items

---

## Scalability Comparison Table

| Metric | `regenerate_stamina()` | `check_combat_timers()` | `decay_dropped_items()` |
|--------|------------------------|-------------------------|------------------------|
| **Queries/cycle** | 10,001 (10K players) | 500-1000 (500 combats) | 1 |
| **Memory usage** | 5-10 MB | 1-2 MB | <1 KB |
| **Cycle time** | 50+ seconds | 5-10 seconds | <100ms |
| **Lock contention** | **SEVERE** | Moderate | Minimal |
| **Network overhead** | **MASSIVE** | High | Low |
| **Scalability** | **O(n) queries** (n = players) | O(m) queries (m = combats) | **O(1) queries** |
| **10K players** | 🔴 Breaks | 🟡 Struggles | 🟢 Fine |
| **100K players** | 💀 Dead | 💀 Dead | 🟢 Fine |

---

## Recommended Solutions

### 🔴 CRITICAL: Fix `regenerate_stamina()`

**Option 1: Single UPDATE Query (Best)**

```sql
-- PostgreSQL supports calculated updates
UPDATE players
SET stamina = LEAST(
    stamina + 1 + (endurance / 10),  -- base + endurance bonus
    max_stamina
)
WHERE is_dead = FALSE
  AND stamina < max_stamina
RETURNING telegram_id;  -- optional: only needed if you want the affected IDs
```

**Benefits:**

- **1 query instead of 10,001**
- Database calculates per-row (no Python loop)
- Atomic operation (no race conditions)
- **~1000x faster**

**Implementation:**

```python
from sqlalchemy import text

async def regenerate_all_players_stamina() -> int:
    async with engine.connect() as conn:
        stmt = text("""
            UPDATE players
            SET stamina = LEAST(
                stamina + 1 + (endurance / 10),
                max_stamina
            )
            WHERE is_dead = FALSE
              AND stamina < max_stamina
        """)
        result = await conn.execute(stmt)
        await conn.commit()
        return result.rowcount
```

**Performance Gain:**

- 10K queries → **1 query**
- 50 seconds → **<1 second**
- No memory bloat
- No lock contention

---

**Option 2: Batch Updates (Good)**

If you need custom Python logic per player:

```python
from sqlalchemy import bindparam

async def regenerate_all_players_stamina() -> int:
    async with engine.connect() as conn:
        # Still fetch all (1 query)
        result = await conn.execute(
            players.select().where(
                (players.c.is_dead == False) &
                (players.c.stamina < players.c.max_stamina)
            )
        )
        players_to_update = result.fetchall()

        # Build batch update parameters
        updates = []
        for player in players_to_update:
            base_recovery = 1
            endurance_bonus = player.endurance // 10
            total_recovery = base_recovery + endurance_bonus
            new_stamina = min(player.stamina + total_recovery, player.max_stamina)
            if new_stamina > player.stamina:
                updates.append({
                    'b_telegram_id': player.telegram_id,
                    'b_stamina': new_stamina
                })

        # Single executemany-style bulk update. Explicit bindparam()s are
        # required so each parameter set matches its own row; a bare
        # players.update() without a WHERE would apply every set to ALL rows.
        if updates:
            stmt = (
                players.update()
                .where(players.c.telegram_id == bindparam('b_telegram_id'))
                .values(stamina=bindparam('b_stamina'))
            )
            await conn.execute(stmt, updates)
        await conn.commit()
        return len(updates)
```

**Performance Gain:**

- 10K queries → **2 queries** (1 SELECT + 1 bulk UPDATE)
- 50 seconds → **1-2 seconds**
- Still loads data into memory (not ideal)

---

### 🟡 HIGH: Optimize `check_combat_timers()`

**Option 1: Limit + Batching**

```python
async def check_combat_timers():
    BATCH_SIZE = 100
    while not shutdown_event.is_set():
        try:
            await asyncio.wait_for(shutdown_event.wait(), timeout=30)
        except asyncio.TimeoutError:
            idle_threshold = time.time() - 300
            while True:
                # Process in batches. Handled combats get a fresh
                # turn_started_at and drop out of the filtered set, so
                # re-querying with LIMIT alone is enough; an OFFSET here
                # would skip unprocessed rows as the set shrinks.
                idle_combats = await database.get_idle_combats_paginated(
                    idle_threshold, limit=BATCH_SIZE
                )
                if not idle_combats:
                    break
                for combat in idle_combats:
                    try:
                        from bot import combat as combat_logic
                        if combat['turn'] == 'player':
                            await database.update_combat(combat['player_id'], {
                                'turn': 'npc',
                                'turn_started_at': time.time()
                            })
                        else:
                            # Refresh the timer so stalled npc-turn combats
                            # also leave the idle set and aren't re-selected
                            # forever.
                            await database.update_combat(combat['player_id'], {
                                'turn_started_at': time.time()
                            })
                        await combat_logic.npc_attack(combat['player_id'])
                    except Exception as e:
                        logger.error(f"Error processing idle combat: {e}")
```

**Benefits:**

- Processes 100 at a time instead of all
- Prevents memory spikes
- Other tasks can interleave

---

**Option 2: Database-Side Auto-Timeout**

```sql
-- Add trigger function to auto-switch turns
CREATE OR REPLACE FUNCTION auto_timeout_combat() RETURNS trigger AS $$
BEGIN
    IF NEW.turn_started_at < (EXTRACT(EPOCH FROM NOW()) - 300) THEN
        NEW.turn := CASE WHEN NEW.turn = 'player' THEN 'npc' ELSE 'player' END;
        NEW.turn_started_at := EXTRACT(EPOCH FROM NOW());
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to the combats table
CREATE TRIGGER combat_auto_timeout
    BEFORE UPDATE ON active_combats
    FOR EACH ROW
    EXECUTE FUNCTION auto_timeout_combat();
```

**Benefits:**

- No Python loop needed
- Stale turns are corrected inside the database
- Near-zero application load

**Caveat:** a `BEFORE UPDATE` trigger only fires when a row is written, so combats that nobody touches still need something to nudge them — e.g. a scheduled bulk `UPDATE` (via `pg_cron`) or a timeout check on the read path.

---

### 🟢 `decay_dropped_items()` - Already Optimal

No changes needed. This is the **gold standard** for background tasks.

---

## Performance Projections

### Current System (Before Optimization)

| Players | Stamina Regen Time | Combat Check Time | Total Background Load |
|---------|-------------------|-------------------|---------------------|
| 100 | 0.5s | 0.1s | Negligible |
| 1,000 | 5s | 1s | Manageable |
| 10,000 | **50s+** | **10s+** | 🔴 **Breaking** |
| 100,000 | **500s+** | **100s+** | 💀 **Dead** |

### After Optimization (Single-Query Approach)

| Players | Stamina Regen Time | Combat Check Time | Total Background Load |
|---------|-------------------|-------------------|---------------------|
| 100 | 0.1s | 0.1s | Negligible |
| 1,000 | 0.2s | 0.5s | Low |
| 10,000 | **0.5s** | **2s** | 🟢 **Good** |
| 100,000 | **2s** | **10s** | 🟡 **Acceptable** |

---

## Additional Recommendations

### 1. Add Database Indexes

```sql
-- Speed up the stamina regeneration query
-- (partial index matching its WHERE clause)
CREATE INDEX idx_players_stamina_regen
ON players(is_dead, stamina)
WHERE is_dead = FALSE AND stamina < max_stamina;

-- Speed up the idle combat check
CREATE INDEX idx_combat_turn_time
ON active_combats(turn_started_at);

-- Already optimal for dropped items
CREATE INDEX idx_dropped_items_timestamp
ON dropped_items(drop_timestamp);
```

### 2. Add Monitoring

```python
import time

async def regenerate_stamina():
    while not shutdown_event.is_set():
        try:
            await asyncio.wait_for(shutdown_event.wait(), timeout=300)
        except asyncio.TimeoutError:
            start_time = time.time()
            logger.info("Running stamina regeneration...")

            players_updated = await database.regenerate_all_players_stamina()

            elapsed = time.time() - start_time
            logger.info(
                f"Regenerated stamina for {players_updated} players "
                f"in {elapsed:.2f}s"
            )

            # Alert if slow
            if elapsed > 5.0:
                logger.warning(
                    f"⚠️ Stamina regeneration took {elapsed:.2f}s "
                    f"(threshold: 5s)"
                )
```

### 3. Add Connection Pooling

```python
# In database.py
from sqlalchemy.ext.asyncio import create_async_engine

# Note: create_async_engine already uses AsyncAdaptedQueuePool by default;
# passing the synchronous QueuePool here would raise an error.
engine = create_async_engine(
    DATABASE_URL,
    pool_size=20,        # max 20 persistent connections
    max_overflow=10,     # allow 10 extra under burst load
    pool_pre_ping=True,  # test connections before use
)
```

### 4. Consider Redis for Hot Data

For frequently accessed data (player stats, combat state):

```python
import json
import redis.asyncio as redis

redis_client = redis.from_url("redis://localhost")

# Cache player data in Redis
# (assumes get_player returns a JSON-serialisable dict)
async def get_player_cached(player_id: int):
    cached = await redis_client.get(f"player:{player_id}")
    if cached:
        return json.loads(cached)

    # Fetch from DB, cache for 1 minute
    player = await database.get_player(player_id)
    await redis_client.setex(
        f"player:{player_id}",
        60,
        json.dumps(player)
    )
    return player
```

---

## Implementation Priority

1. **🔴 IMMEDIATE:** Fix `regenerate_stamina()` with the single-query approach
2. **🟡 HIGH:** Add batching to `check_combat_timers()`
3. **🟢 MEDIUM:** Add database indexes
4. **🟢 MEDIUM:** Add performance monitoring
5. **🔵 LOW:** Consider Redis caching (only if needed)

---

## Conclusion

**Current state at 10,000 players:**

- ❌ `regenerate_stamina()`: **WILL BREAK** (50+ seconds per cycle, 10K queries)
- ⚠️ `check_combat_timers()`: **WILL STRUGGLE** (500-1000 queries per cycle)
- ✅ `decay_dropped_items()`: **WORKS PERFECTLY** (1 query, optimal design)

**After optimization:**

- ✅ All tasks complete in **<5 seconds** total
- ✅ Scales to **100,000+ players**
- ✅ Minimal database load
- ✅ No memory bloat

**Bottom line:** The single-query approach for `regenerate_stamina()` is **CRITICAL** for any production deployment beyond 1,000 players.
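As a closing sanity check, the set-based rewrite and the original per-player loop can be shown to agree on a toy dataset. A minimal sketch using in-memory SQLite as a stand-in for PostgreSQL (SQLite's two-argument `MIN()` plays the role of `LEAST()`; the table contents are made up for illustration):

```python
# Verify that one set-based UPDATE reproduces the per-player Python logic.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE players (
        telegram_id INTEGER PRIMARY KEY,
        is_dead INTEGER NOT NULL,
        stamina INTEGER NOT NULL,
        max_stamina INTEGER NOT NULL,
        endurance INTEGER NOT NULL
    )
""")
rows = [
    (1, 0, 50, 100, 25),   # regains 1 + 25 // 10 = 3 → 53
    (2, 0, 99, 100, 40),   # would regain 5, capped at 100
    (3, 1, 10, 100, 10),   # dead: untouched
    (4, 0, 100, 100, 10),  # already full: untouched
]
conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?)", rows)

# Expected result, computed with the original per-player loop logic.
expected = {
    tid: (min(st + 1 + end // 10, mx) if not dead and st < mx else st)
    for tid, dead, st, mx, end in rows
}

# The single set-based UPDATE: the database computes every row at once.
cur = conn.execute("""
    UPDATE players
    SET stamina = MIN(stamina + 1 + (endurance / 10), max_stamina)
    WHERE is_dead = 0 AND stamina < max_stamina
""")
print(cur.rowcount)  # 2

actual = dict(conn.execute("SELECT telegram_id, stamina FROM players"))
print(actual == expected)  # True
```

The same statement scales from 4 rows to 10K rows without any change on the application side, which is the whole point of the single-query approach.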