This commit is contained in:
Joan
2025-11-27 16:27:01 +01:00
parent 33cc9586c2
commit 81f8912059
304 changed files with 56149 additions and 10122 deletions

old/REDIS_INTEGRATION_QA.md
# Redis Integration: Questions & Answers
## Q1: Why cache locations/items if they're already in memory?
**Short Answer**: You're absolutely right - we should **NOT** cache static data that's already loaded in memory!
**Revised Approach**:
### What to Cache in Redis:
1. **Player sessions** (dynamic, needs cross-worker sharing)
2. **Location player registry** (who's where, changes constantly)
3. **Player inventory** (reduce DB queries for frequently accessed data)
4. **Active combat states** (for cross-worker coordination)
5. **Dropped items per location** (dynamic world state)
### What NOT to Cache:
1. **Locations** - Already in the `LOCATIONS` dict from `world_loader.py`
2. **Items** - Already in `ITEMS_MANAGER.items` from `items.py`
3. **NPCs** - Already in the `NPCS` dict from `npcs.py`
4. **Interactables** - Already in each `Location.interactables` list
**Why This Matters**:
- Each worker loads `load_world()` on startup → all static data in memory
- No point duplicating in Redis (wastes memory, adds latency)
- Redis should only store **dynamic, cross-worker state**
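As a quick reference, the dynamic state listed above might map onto key patterns like these (all names here are illustrative assumptions for this sketch, not a fixed schema):

```python
# Illustrative Redis key patterns for the dynamic, cross-worker state only.
# Static data (locations, items, NPCs) gets no keys at all.
KEY_PATTERNS = {
    "player_session": "player:{character_id}:session",      # hash
    "location_players": "location:{location_id}:players",   # set of character IDs
    "player_inventory": "player:{character_id}:inventory",  # cached JSON blob
    "combat_state": "player:{character_id}:combat",         # hash, deleted when combat ends
    "dropped_items": "location:{location_id}:dropped",      # JSON list with TTL
}

def key_for(kind: str, **ids) -> str:
    """Format a concrete key from a pattern, e.g. key_for('combat_state', character_id=7)."""
    return KEY_PATTERNS[kind].format(**ids)
```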
---
## Q2: How do unique items work?
**Database Structure**:
```python
# unique_items table (single source of truth)
unique_items = Table(
    "unique_items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("item_id", String),      # Template reference (e.g., "iron_sword")
    Column("durability", Integer),
    Column("max_durability", Integer),
    Column("tier", Integer, default=1),
    Column("unique_stats", JSON),   # Custom stats
    Column("created_at", Float),
)

# inventory table (references unique_items)
inventory = Table(
    "inventory",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("character_id", Integer),
    Column("item_id", String),      # Template ID
    Column("quantity", Integer),    # Always 1 for unique items
    Column("unique_item_id", Integer, ForeignKey("unique_items.id")),  # Link
    Column("is_equipped", Boolean),
)
```
**Flow**:
1. **Creation**: NPC drops weapon → `create_unique_item()` → insert into `unique_items`
2. **Pickup**: Player picks up → insert into `inventory` with `unique_item_id` reference
3. **Equip**: Player equips → queries join `inventory ⋈ unique_items` to get stats
4. **Drop**: Player drops → move to `dropped_items` (keeping `unique_item_id` link)
5. **Deletion**: Item despawns → CASCADE delete removes from `inventory`/`dropped_items`
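The flow above can be exercised end-to-end with an in-memory SQLite sketch; raw SQL stands in for the project's SQLAlchemy tables, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for CASCADE in SQLite
conn.executescript("""
CREATE TABLE unique_items (
    id INTEGER PRIMARY KEY,
    item_id TEXT,
    durability INTEGER,
    max_durability INTEGER,
    tier INTEGER DEFAULT 1
);
CREATE TABLE inventory (
    id INTEGER PRIMARY KEY,
    character_id INTEGER,
    item_id TEXT,
    quantity INTEGER,
    unique_item_id INTEGER REFERENCES unique_items(id) ON DELETE CASCADE,
    is_equipped INTEGER DEFAULT 0
);
""")

# 1. Creation: NPC drops a weapon -> row in unique_items
cur = conn.execute(
    "INSERT INTO unique_items (item_id, durability, max_durability, tier) VALUES (?, ?, ?, ?)",
    ("iron_sword", 85, 100, 2),
)
unique_id = cur.lastrowid

# 2. Pickup: inventory row references the unique item instance
conn.execute(
    "INSERT INTO inventory (character_id, item_id, quantity, unique_item_id) VALUES (?, ?, 1, ?)",
    (1, "iron_sword", unique_id),
)

# 3. Equip: join inventory with unique_items to read per-instance stats
row = conn.execute("""
    SELECT i.item_id, u.durability, u.tier
    FROM inventory i JOIN unique_items u ON i.unique_item_id = u.id
    WHERE i.character_id = ?
""", (1,)).fetchone()
print(row)  # ('iron_sword', 85, 2)
```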
**Redis Caching Strategy**:
```python
# Cache unique item data when equipped/viewed (JSON string, 5 min TTL)
key = f"unique_item:{unique_item_id}"
value = {
    "item_id": "iron_sword",
    "durability": 85,
    "max_durability": 100,
    "tier": 2,
    "unique_stats": {"damage_bonus": 5},
}
await redis_manager.redis_client.set(key, json.dumps(value), ex=300)
# Invalidate (delete the key) on any durability change
```
---
## Q3: How do enemies work with custom stats?
**Combat Initialization**:
When combat starts, NPC gets **randomized HP**:
```python
# NPCDefinition in npcs.py
@dataclass
class NPCDefinition:
    hp_min: int   # e.g., 80
    hp_max: int   # e.g., 120
    damage_min: int
    damage_max: int
    defense: int
    # ... other stats

# When combat starts (in game_logic.py or main.py)
import random

npc_def = NPCS.get("raider")                             # Load from memory
npc_hp = random.randint(npc_def.hp_min, npc_def.hp_max)  # Random HP

# Store in database
await db.create_combat(
    player_id=player_id,
    npc_id="raider",
    npc_hp=npc_hp,  # Randomized
    npc_max_hp=npc_hp,
    location_id=location_id,
)
```
**Redis Caching for Active Combat**:
```python
# Cache active combat state (avoid repeated DB queries)
key = f"player:{character_id}:combat"
value = {
    "npc_id": "raider",
    "npc_hp": 95,
    "npc_max_hp": 115,
    "turn": "player",
    "npc_damage_min": 8,
    "npc_damage_max": 15,
    "npc_defense": 3,
}
# TTL: none (the entry is deleted explicitly when combat ends)
```
**Combat Flow**:
1. Player attacks → Check Redis cache for combat state
2. If miss → Query DB → Cache in Redis
3. Calculate damage, update NPC HP
4. Update Redis cache + Publish `combat_update` to player channel
5. NPC turn → Repeat
6. Combat ends → Delete Redis cache + Publish `combat_over`
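Steps 1-2 of this flow are a classic cache-aside read. A minimal sketch, in which a plain dict stands in for the Redis client and `load_combat_from_db` is a hypothetical DB helper:

```python
import json

async def get_combat_state(cache: dict, character_id: int, load_combat_from_db):
    """Cache-aside read: try the cache first, fall back to the DB and populate it."""
    key = f"player:{character_id}:combat"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no DB query
    state = await load_combat_from_db(character_id)  # cache miss: hit the DB
    if state is not None:
        cache[key] = json.dumps(state)       # with real Redis: await redis.set(key, ...)
    return state
```

With the real client, the dict operations become `await redis_client.get(key)` / `await redis_client.set(key, ...)`, and combat end deletes the key.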
---
## Q4: How is everything loaded on server startup?
**Current Flow** (per worker):
```python
# api/main.py - Lifespan startup
@asynccontextmanager
async def lifespan(app: FastAPI):
    # 1. Database
    await db.init_db()  # Connect to PostgreSQL

    # 2. Load static data into memory (THIS PART)
    WORLD: World = load_world()  # Load locations from gamedata/locations.json
    LOCATIONS: Dict[str, Location] = WORLD.locations
    ITEMS_MANAGER = ItemsManager()  # Load items from gamedata/items.json
    # NPCs loaded in data/npcs.py module (imported on demand)

    # 3. Start background tasks (single worker via file lock)
    tasks = await background_tasks.start_background_tasks(manager, LOCATIONS)
    yield
```
**With Redis Integration**:
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    # 1. Database
    await db.init_db()

    # 2. Redis connection
    await redis_manager.connect()

    # 3. Load static data (STAYS IN MEMORY - NO REDIS CACHING)
    WORLD: World = load_world()
    LOCATIONS: Dict[str, Location] = WORLD.locations
    ITEMS_MANAGER = ItemsManager()

    # 4. Subscribe to Redis Pub/Sub channels
    location_channels = [f"location:{loc_id}" for loc_id in LOCATIONS.keys()]
    await redis_manager.subscribe_to_channels(location_channels + ["game:broadcast"])

    # 5. Start Redis message listener (background task)
    asyncio.create_task(redis_manager.listen_for_messages(manager.handle_redis_message))

    # 6. Register this worker in Redis
    await redis_manager.redis_client.sadd("active_workers", redis_manager.worker_id)

    # 7. Start background tasks (distributed via Redis locks)
    tasks = await background_tasks.start_background_tasks(manager, LOCATIONS)
    yield

    # Cleanup
    await redis_manager.redis_client.srem("active_workers", redis_manager.worker_id)
    await redis_manager.disconnect()
```
---
## Q5: How many channels can exist?
**Redis Pub/Sub Channels**:
### Fixed Channels (Always Active):
1. `game:broadcast` - Global announcements (1 channel)
2. `game:workers` - Worker coordination (1 channel)
### Dynamic Channels (Created on Demand):
**Location Channels** (14 currently):
- `location:start_point`
- `location:overpass`
- `location:gas_station`
- ... (one per location in `locations.json`)
**Player Channels** (one per connected player):
- `player:1` (character_id=1)
- `player:2`
- `player:5`
- ... (created on WebSocket connect, destroyed on disconnect)
**Total Active Channels**:
- **Minimum**: 16 (2 fixed + 14 locations)
- **With 100 players**: 116 (2 + 14 + 100)
- **With 1000 players**: 1016 (2 + 14 + 1000)
**Redis Limits**:
- Redis supports **millions** of channels simultaneously
- Each channel has minimal memory overhead (~100 bytes)
- 1000 channels = ~100 KB memory (negligible)
**Subscription Strategy**:
- All workers subscribe to: `game:broadcast` + all location channels
- Each worker subscribes to: only its connected players' channels
- When player connects → Worker subscribes to `player:{id}`
- When player disconnects → Worker unsubscribes from `player:{id}`
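The subscription bookkeeping above can be modeled in-process. A minimal sketch (the class name is hypothetical; the real implementation would call `subscribe`/`unsubscribe` on a redis.asyncio PubSub object at these same points):

```python
class WorkerSubscriptions:
    """Tracks which channels this worker should be subscribed to."""

    def __init__(self, location_ids):
        # Fixed channels + all location channels are always present
        self.channels = {"game:broadcast", "game:workers"}
        self.channels |= {f"location:{loc}" for loc in location_ids}

    def on_player_connect(self, character_id: int):
        # Subscribe to this player's personal channel
        self.channels.add(f"player:{character_id}")

    def on_player_disconnect(self, character_id: int):
        # Drop the personal channel; fixed/location channels stay
        self.channels.discard(f"player:{character_id}")
```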
---
## Q6: How does client update data in the UI?
**Current Flow** (without Redis):
```
1. User clicks "Attack" button
2. Client: POST /api/game/combat/action {"action": "attack"}
3. Server: Process attack, update DB
4. Server: Send WebSocket message to player
5. Server: Query DB for other players in location
6. Server: Send WebSocket messages to location
7. Client: Receives WebSocket "combat_update"
8. Client: Updates UI (HP bar, combat log)
9. Client: GET /api/game/state (refresh full state)
10. Server: Query DB for player, inventory, combat, etc.
11. Client: Re-render entire game UI
```
**With Redis** (optimized):
```
1. User clicks "Attack" button
2. Client: POST /api/game/combat/action {"action": "attack"}
3. Server: Process attack, update DB + Redis cache
4. Server: Publish to Redis channel "player:{id}" (personal message)
5. Worker handling that player: Receives Redis message
6. Worker: Send WebSocket to local connection
7. Client: Receives WebSocket "combat_update" with ALL needed data
8. Client: Updates UI directly from WebSocket payload (NO API CALL)
9. Server: Publish to Redis channel "location:{id}" (broadcast)
10. All workers: Receive location broadcast
11. Workers: Send WebSocket to their local connections in that location
12. Other players: UI updates with "Jocaru is in combat"
```
**Key Changes**:
- **No more `GET /api/game/state` after actions** - the WebSocket payload contains everything
- **Cross-worker broadcasts** - Redis pub/sub ensures all workers relay messages
- **Reduced DB queries** - combat state is cached in Redis
- **Faster UI updates** - Redis pub/sub typically adds only a few milliseconds of latency
**WebSocket Message Format** (enhanced):
```json
{
"type": "combat_update",
"data": {
"message": "You dealt 12 damage!",
"log_entry": "You dealt 12 damage!",
"combat_over": false,
"combat": {
"npc_id": "raider",
"npc_hp": 85,
"npc_max_hp": 115,
"turn": "npc"
},
"player": {
"hp": 78,
"stamina": 42,
"xp": 1250,
"level": 5
}
},
"timestamp": "2025-11-09T18:00:00Z"
}
```
Client receives this → Updates HP bar, combat log, turn indicator **WITHOUT** calling `/api/game/state`.
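On the server side, a small helper can assemble this enhanced payload once before publishing it. An illustrative sketch (the helper name is an assumption; the field set follows the example above):

```python
import json
from datetime import datetime, timezone

def build_combat_update(message: str, combat: dict, player: dict,
                        combat_over: bool = False) -> str:
    """Assemble the enhanced combat_update payload so the client never
    needs a follow-up GET /api/game/state."""
    payload = {
        "type": "combat_update",
        "data": {
            "message": message,
            "log_entry": message,
            "combat_over": combat_over,
            "combat": combat,   # npc_id, npc_hp, npc_max_hp, turn, ...
            "player": player,   # hp, stamina, xp, level, ...
        },
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)
```

The resulting string is what gets published to `player:{id}` and relayed to the WebSocket.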
---
## Q7: Disconnected players staying in location?
**Excellent Gameplay Mechanic!** This adds risk/consequence to disconnecting in dangerous areas.
### Implementation:
**When Player Disconnects**:
```python
# ConnectionManager.disconnect()
async def disconnect(self, player_id: int):
    # 1. Remove local WebSocket connection
    if player_id in self.active_connections:
        del self.active_connections[player_id]

    # 2. Update Redis session (mark as disconnected)
    session = await redis_manager.get_player_session(player_id)
    if not session:
        return  # no session to update or broadcast about

    session["websocket_connected"] = "false"
    session["disconnect_time"] = str(time.time())
    await redis_manager.set_player_session(player_id, session, ttl=3600)  # Keep for 1 hour

    # 3. KEEP player in location registry (don't remove)
    # await redis_manager.remove_player_from_location(...)  # DON'T DO THIS

    # 4. Broadcast to location
    await redis_manager.publish_to_location(
        session["location_id"],
        {
            "type": "player_status_change",
            "data": {
                "player_id": player_id,
                "username": session["username"],
                "status": "disconnected",
                "message": f"{session['username']} has disconnected (vulnerable)",
            },
        },
    )
```
**When Other Players Query Location**:
```python
# GET /api/game/location endpoint
@app.get("/api/game/location")
async def get_current_location(current_user: dict = Depends(get_current_user)):
    location_id = current_user["location_id"]

    # Get players in location from Redis
    player_ids = await redis_manager.get_players_in_location(location_id)
    other_players = []
    for pid in player_ids:
        if pid == current_user["id"]:
            continue
        # Get player session
        session = await redis_manager.get_player_session(pid)
        if session:
            other_players.append({
                "id": pid,
                "username": session["username"],
                "level": int(session["level"]),
                "hp": int(session["hp"]),
                "is_connected": session["websocket_connected"] == "true",
                "can_attack": True,  # Always true, even if disconnected!
            })
    return {
        "id": location_id,
        "other_players": other_players,  # Includes disconnected players
    }
```
**Combat with Disconnected Player**:
```python
# POST /api/game/pvp/initiate
@app.post("/api/game/pvp/initiate")
async def initiate_pvp(target_id: int, current_user: dict = Depends(get_current_user)):
    # Check target session
    target_session = await redis_manager.get_player_session(target_id)
    if not target_session:
        raise HTTPException(400, detail="Target player not found")

    # Allow combat even if disconnected
    is_connected = target_session["websocket_connected"] == "true"

    # Create PvP combat
    pvp_combat = await db.create_pvp_combat(
        attacker_id=current_user["id"],
        defender_id=target_id,
        location_id=current_user["location_id"],
    )

    if is_connected:
        # Target is online -> send WebSocket notification
        await redis_manager.publish_to_player(target_id, {
            "type": "pvp_challenge",
            "data": {
                "attacker": current_user["name"],
                "attacker_level": current_user["level"],
            },
        })
        return {"message": "PvP challenge sent", "pvp_combat": pvp_combat}

    # Target is offline -> auto-acknowledge, they can't respond
    await db.acknowledge_pvp_combat(pvp_combat["id"], target_id)
    # Attacker gets a free first-strike advantage
    return {
        "message": f"{target_session['username']} is disconnected - you get first strike!",
        "pvp_combat": pvp_combat,
        "target_vulnerable": True,
    }
```
**Cleanup Policy** (optional):
```python
# Background task: remove disconnected players after 1 hour
async def cleanup_disconnected_players():
    while True:
        await asyncio.sleep(300)  # Every 5 minutes
        # Scan player sessions (SCAN avoids blocking Redis the way KEYS can)
        async for key in redis_manager.redis_client.scan_iter("player:*:session"):
            session = await redis_manager.redis_client.hgetall(key)
            if session.get("websocket_connected") == "false":
                disconnect_time = float(session["disconnect_time"])
                # If disconnected for > 1 hour
                if time.time() - disconnect_time > 3600:
                    character_id = int(key.split(":")[1])
                    location_id = session["location_id"]
                    # Remove from location registry
                    await redis_manager.remove_player_from_location(character_id, location_id)
                    # Delete session
                    await redis_manager.delete_player_session(character_id)
                    print(f"🧹 Cleaned up disconnected player {character_id}")
```
**UI Display**:
```tsx
// Frontend: show disconnected status
{otherPlayers.map(player => (
  <div key={player.id} className={`player-card ${!player.is_connected ? 'disconnected' : ''}`}>
    <span className="player-name">{player.username}</span>
    <span className="player-level">Lv. {player.level}</span>
    {!player.is_connected && (
      <span className="player-status">Disconnected (Vulnerable)</span>
    )}
    {player.can_attack && (
      <button onClick={() => attackPlayer(player.id)}>
        Attack {!player.is_connected ? '(Easy Target)' : ''}
      </button>
    )}
  </div>
))}
```
---
## Q8: RDB vs AOF - Code changes needed?
**Short Answer**: No code changes required, only Redis configuration.
### Redis Persistence Options:
**RDB (Snapshotting)**:
- Periodic snapshots to disk
- Fast restarts, smaller files
- May lose last few seconds of data
**AOF (Append-Only File)**:
- Logs every write operation
- More durable, no data loss
- Slower restarts, larger files
**Recommended Configuration** (for your use case):
```yaml
# docker-compose.yml
# Inline comments can't live inside the command string (they'd be passed to
# redis-server as arguments), so the meaning of each flag is listed here:
#   --appendonly yes                  Enable AOF
#   --appendfsync everysec            Sync every second (good balance)
#   --save 900 1                      RDB snapshot every 15 min if 1+ key changed
#   --save 300 10                     RDB snapshot every 5 min if 10+ keys changed
#   --save 60 10000                   RDB snapshot every 1 min if 10k+ keys changed
#   --maxmemory 512mb                 Max memory usage
#   --maxmemory-policy allkeys-lru    Evict least recently used keys
echoes_redis:
  command: >
    redis-server
    --appendonly yes
    --appendfsync everysec
    --save 900 1
    --save 300 10
    --save 60 10000
    --maxmemory 512mb
    --maxmemory-policy allkeys-lru
```
**What This Gives You**:
- **AOF for durability**: every write logged (max 1 second of data loss)
- **RDB for fast recovery**: snapshots for quick restarts
- **Memory protection**: won't crash when memory is full (evicts old caches)
**Application Code**: No changes needed! Redis handles persistence transparently.
**Testing Persistence**:
```bash
# 1. Add some data
docker exec echoes_of_the_ashes_redis redis-cli SET test:key "hello"
# 2. Restart Redis
docker restart echoes_of_the_ashes_redis
# 3. Check if data persisted
docker exec echoes_of_the_ashes_redis redis-cli GET test:key
# Should return: "hello"
```
---
## Q9: What if cache invalidation isn't aggressive enough?
**Potential Problems**:
### 1. Stale Player Stats
**Scenario**: Player levels up, but Redis cache shows old level
```
1. Player gains XP → DB updated (level 6)
2. Redis cache still shows level 5
3. Other players see "Lv. 5" instead of "Lv. 6"
```
**Solution**: Invalidate on every stat change
```python
async def update_character_stats(character_id: int, **kwargs):
    # Update DB first
    await db.update_character(character_id, **kwargs)

    # Option A: invalidate the Redis cache (reload on next access)
    await redis_manager.delete_player_session(character_id)

    # Option B (instead of A): update the cache in place
    session = await redis_manager.get_player_session(character_id)
    if session:
        session.update(kwargs)
        await redis_manager.set_player_session(character_id, session)
```
### 2. Ghost Items in Inventory
**Scenario**: Player drops item, but cache shows they still have it
```
1. Player drops "Iron Sword"
2. DB updated (inventory row deleted)
3. Redis cache still shows sword in inventory
4. Player sees sword in UI, tries to equip → Error!
```
**Solution**: Invalidate inventory cache on add/remove/use
```python
async def remove_item_from_inventory(character_id: int, item_id: str):
    # Update DB
    await db.delete_inventory_item(character_id, item_id)
    # Invalidate cache (force reload next time)
    await redis_manager.invalidate_inventory(character_id)
```
### 3. Wrong Player Count in Location
**Scenario**: Player moves, but old location still shows them
```
1. Player moves overpass → gas_station
2. Redis location registry not updated
3. Other players in overpass still see them
4. Broadcasts sent to wrong location
```
**Solution**: Atomic location updates
```python
async def move_player(character_id: int, from_loc: str, to_loc: str):
    # Use a Redis transaction (MULTI/EXEC) so both sets change atomically
    async with redis_manager.redis_client.pipeline(transaction=True) as pipe:
        pipe.srem(f"location:{from_loc}:players", character_id)
        pipe.sadd(f"location:{to_loc}:players", character_id)
        await pipe.execute()
```
### 4. Combat State Desync
**Scenario**: Combat ends, but cache shows still in combat
```
1. Player defeats enemy
2. DB: active_combats row deleted
3. Redis: combat cache still exists
4. Player sees combat UI, can't move
```
**Solution**: Explicit cache deletion on combat end
```python
async def end_combat(character_id: int):
    # Delete from DB
    await db.end_combat(character_id)

    # Delete Redis cache
    await redis_manager.redis_client.delete(f"player:{character_id}:combat")

    # Update player session
    session = await redis_manager.get_player_session(character_id)
    if session:
        session["in_combat"] = "false"
        await redis_manager.set_player_session(character_id, session)
```
**General Strategy**:
```python
# PATTERN 1: Write-through cache (recommended for critical data)
async def update_data_write_through(key, value):
    await db.update(key, value)            # Write to DB first
    await redis_manager.cache(key, value)  # Update cache immediately

# PATTERN 2: Cache invalidation (simpler, slight delay)
async def update_data_invalidate(key, value):
    await db.update(key, value)            # Write to DB
    await redis_manager.delete_cache(key)  # Delete cache (reload on next access)

# PATTERN 3: TTL fallback (for non-critical data)
# Short TTLs (e.g., 30 seconds) let the cache self-expire if not invalidated
await redis_manager.cache(key, value, ttl=30)
```
**For Your Game**:
- **Aggressive invalidation** for: inventory, combat state, player stats
- **Write-through cache** for: player sessions, location registry
- **TTL fallback** for: dropped items list, interactable cooldowns
---
## Q10: No feature flags needed (dev only)
**Agreed!** Since you're the only tester, we can implement directly without feature flags.
### Simplified Rollout:
**Phase 1: Redis Infrastructure (Week 1)**
- Add Redis to docker-compose
- Create redis_manager.py
- Test connection/pub-sub
**Phase 2: Pub/Sub Only (Week 2)**
- Update ConnectionManager to use Redis pub/sub
- Keep all other logic same (no caching yet)
- Test cross-worker broadcasts
**Phase 3: Add Caching (Week 3)**
- Add player session cache
- Add inventory cache
- Add combat state cache
- Test performance improvements
**Phase 4: Multi-Worker (Week 4)**
- Increase workers to 2
- Test load balancing
- Monitor for race conditions
**Simplified Implementation** (no toggles):
```python
# Just implement Redis directly
async def lifespan(app: FastAPI):
await db.init_db()
await redis_manager.connect() # No if/else, just do it
# ... rest of startup
```
---
## Updated Implementation Priority
Based on your feedback, here's what we'll actually implement:
### Phase 1: Redis Pub/Sub (Core Multi-Worker Support)
**Goal**: Enable cross-worker broadcasts
**Changes**:
1. Add Redis container
2. Create `redis_manager.py` with pub/sub only
3. Update ConnectionManager:
- Keep local WebSocket storage
- Change `send_personal_message()` → publish to Redis
- Change `send_to_location()` → publish to Redis
- Add `handle_redis_message()` → send to local WebSockets
4. Subscribe to location channels on startup
**What We DON'T Cache**:
- ❌ Locations (already in memory)
- ❌ Items (already in memory)
- ❌ NPCs (already in memory)
### Phase 2: Dynamic State Caching (Performance)
**Goal**: Reduce database queries for frequently accessed data
**What We DO Cache**:
1. ✅ Player sessions (location, HP, level, stats)
2. ✅ Location player registry (Set of character IDs per location)
3. ✅ Player inventory (with aggressive invalidation)
4. ✅ Active combat state (with explicit deletion)
5. ✅ Dropped items per location (with TTL)
### Phase 3: Multi-Worker Deployment
**Goal**: Horizontal scaling
**Changes**:
1. Update docker-compose for 4 workers
2. Test load distribution
3. Implement distributed background task locks
4. Monitor performance
---
## Next Steps
Ready to implement? Here's what I'll do:
1. **Create `redis_manager.py`** - Simplified version (no static data caching)
2. **Update `docker-compose.yml`** - Add Redis container
3. **Update `ConnectionManager`** - Integrate pub/sub
4. **Update endpoints** - Add cache invalidation where needed
5. **Implement disconnected player** - Keep in location, mark as vulnerable
6. **Test suite** - Verify cross-worker communication
Do you want me to proceed with implementation?