This commit is contained in:
Joan
2025-11-27 16:27:01 +01:00
parent 33cc9586c2
commit 81f8912059
304 changed files with 56149 additions and 10122 deletions

old/REDIS_INTEGRATION_QA.md
# Redis Integration: Questions & Answers
## Q1: Why cache locations/items if they're already in memory?
**Short Answer**: You're absolutely right - we should **NOT** cache static data that's already loaded in memory!
**Revised Approach**:
### What to Cache in Redis:
1. **Player sessions** (dynamic, needs cross-worker sharing)
2. **Location player registry** (who's where, changes constantly)
3. **Player inventory** (reduce DB queries for frequently accessed data)
4. **Active combat states** (for cross-worker coordination)
5. **Dropped items per location** (dynamic world state)
### What NOT to Cache:
1. **Locations** - Already in the `LOCATIONS` dict from `world_loader.py`
2. **Items** - Already in `ITEMS_MANAGER.items` from `items.py`
3. **NPCs** - Already in the `NPCS` dict from `npcs.py`
4. **Interactables** - Already in each `Location.interactables` list
**Why This Matters**:
- Each worker loads `load_world()` on startup → all static data in memory
- No point duplicating in Redis (wastes memory, adds latency)
- Redis should only store **dynamic, cross-worker state**
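As a quick reference, the dynamic state listed above might map onto key patterns like these (all names here are illustrative assumptions for this sketch, not a fixed schema):

```python
# Illustrative Redis key patterns for the dynamic, cross-worker state only.
# Static data (locations, items, NPCs) gets no keys at all.
KEY_PATTERNS = {
    "player_session": "player:{character_id}:session",      # hash
    "location_players": "location:{location_id}:players",   # set of character IDs
    "player_inventory": "player:{character_id}:inventory",  # cached JSON blob
    "combat_state": "player:{character_id}:combat",         # hash, deleted when combat ends
    "dropped_items": "location:{location_id}:dropped",      # JSON list with TTL
}

def key_for(kind: str, **ids) -> str:
    """Format a concrete key from a pattern, e.g. key_for('combat_state', character_id=7)."""
    return KEY_PATTERNS[kind].format(**ids)
```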
---
## Q2: How do unique items work?
**Database Structure**:
```python
# unique_items table (single source of truth)
unique_items = Table(
    "unique_items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("item_id", String),      # Template reference (e.g., "iron_sword")
    Column("durability", Integer),
    Column("max_durability", Integer),
    Column("tier", Integer, default=1),
    Column("unique_stats", JSON),   # Custom stats
    Column("created_at", Float),
)

# inventory table (references unique_items)
inventory = Table(
    "inventory",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("character_id", Integer),
    Column("item_id", String),      # Template ID
    Column("quantity", Integer),    # Always 1 for unique items
    Column("unique_item_id", Integer, ForeignKey("unique_items.id")),  # Link
    Column("is_equipped", Boolean),
)
```
**Flow**:
1. **Creation**: NPC drops weapon → `create_unique_item()` → insert into `unique_items`
2. **Pickup**: Player picks up → insert into `inventory` with `unique_item_id` reference
3. **Equip**: Player equips → queries join `inventory ⋈ unique_items` to get stats
4. **Drop**: Player drops → move to `dropped_items` (keeping `unique_item_id` link)
5. **Deletion**: Item despawns → CASCADE delete removes from `inventory`/`dropped_items`
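The flow above can be exercised end-to-end with an in-memory SQLite sketch; raw SQL stands in for the project's SQLAlchemy tables, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for CASCADE in SQLite
conn.executescript("""
CREATE TABLE unique_items (
    id INTEGER PRIMARY KEY,
    item_id TEXT,
    durability INTEGER,
    max_durability INTEGER,
    tier INTEGER DEFAULT 1
);
CREATE TABLE inventory (
    id INTEGER PRIMARY KEY,
    character_id INTEGER,
    item_id TEXT,
    quantity INTEGER,
    unique_item_id INTEGER REFERENCES unique_items(id) ON DELETE CASCADE,
    is_equipped INTEGER DEFAULT 0
);
""")

# 1. Creation: NPC drops a weapon -> row in unique_items
cur = conn.execute(
    "INSERT INTO unique_items (item_id, durability, max_durability, tier) VALUES (?, ?, ?, ?)",
    ("iron_sword", 85, 100, 2),
)
unique_id = cur.lastrowid

# 2. Pickup: inventory row references the unique item instance
conn.execute(
    "INSERT INTO inventory (character_id, item_id, quantity, unique_item_id) VALUES (?, ?, 1, ?)",
    (1, "iron_sword", unique_id),
)

# 3. Equip: join inventory with unique_items to read per-instance stats
row = conn.execute("""
    SELECT i.item_id, u.durability, u.tier
    FROM inventory i JOIN unique_items u ON i.unique_item_id = u.id
    WHERE i.character_id = ?
""", (1,)).fetchone()
print(row)  # ('iron_sword', 85, 2)
```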
**Redis Caching Strategy**:
```python
# Cache unique item data when equipped/viewed (JSON string, 5 min TTL)
key = f"unique_item:{unique_item_id}"
value = {
    "item_id": "iron_sword",
    "durability": 85,
    "max_durability": 100,
    "tier": 2,
    "unique_stats": {"damage_bonus": 5},
}
await redis_manager.redis_client.set(key, json.dumps(value), ex=300)
# Invalidate (delete the key) on any durability change
```
---
## Q3: How do enemies work with custom stats?
**Combat Initialization**:
When combat starts, NPC gets **randomized HP**:
```python
# NPCDefinition in npcs.py
@dataclass
class NPCDefinition:
    hp_min: int   # e.g., 80
    hp_max: int   # e.g., 120
    damage_min: int
    damage_max: int
    defense: int
    # ... other stats

# When combat starts (in game_logic.py or main.py)
import random

npc_def = NPCS.get("raider")                             # Load from memory
npc_hp = random.randint(npc_def.hp_min, npc_def.hp_max)  # Random HP

# Store in database
await db.create_combat(
    player_id=player_id,
    npc_id="raider",
    npc_hp=npc_hp,  # Randomized
    npc_max_hp=npc_hp,
    location_id=location_id,
)
```
**Redis Caching for Active Combat**:
```python
# Cache active combat state (avoid repeated DB queries)
key = f"player:{character_id}:combat"
value = {
    "npc_id": "raider",
    "npc_hp": 95,
    "npc_max_hp": 115,
    "turn": "player",
    "npc_damage_min": 8,
    "npc_damage_max": 15,
    "npc_defense": 3,
}
# TTL: none (the entry is deleted explicitly when combat ends)
```
**Combat Flow**:
1. Player attacks → Check Redis cache for combat state
2. If miss → Query DB → Cache in Redis
3. Calculate damage, update NPC HP
4. Update Redis cache + Publish `combat_update` to player channel
5. NPC turn → Repeat
6. Combat ends → Delete Redis cache + Publish `combat_over`
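Steps 1-2 of this flow are a classic cache-aside read. A minimal sketch, in which a plain dict stands in for the Redis client and `load_combat_from_db` is a hypothetical DB helper:

```python
import json

async def get_combat_state(cache: dict, character_id: int, load_combat_from_db):
    """Cache-aside read: try the cache first, fall back to the DB and populate it."""
    key = f"player:{character_id}:combat"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no DB query
    state = await load_combat_from_db(character_id)  # cache miss: hit the DB
    if state is not None:
        cache[key] = json.dumps(state)       # with real Redis: await redis.set(key, ...)
    return state
```

With the real client, the dict operations become `await redis_client.get(key)` / `await redis_client.set(key, ...)`, and combat end deletes the key.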
---
## Q4: How is everything loaded on server startup?
**Current Flow** (per worker):
```python
# api/main.py - Lifespan startup
@asynccontextmanager
async def lifespan(app: FastAPI):
    # 1. Database
    await db.init_db()  # Connect to PostgreSQL

    # 2. Load static data into memory (THIS PART)
    WORLD: World = load_world()  # Load locations from gamedata/locations.json
    LOCATIONS: Dict[str, Location] = WORLD.locations
    ITEMS_MANAGER = ItemsManager()  # Load items from gamedata/items.json
    # NPCs loaded in data/npcs.py module (imported on demand)

    # 3. Start background tasks (single worker via file lock)
    tasks = await background_tasks.start_background_tasks(manager, LOCATIONS)
    yield
```
**With Redis Integration**:
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    # 1. Database
    await db.init_db()

    # 2. Redis connection
    await redis_manager.connect()

    # 3. Load static data (STAYS IN MEMORY - NO REDIS CACHING)
    WORLD: World = load_world()
    LOCATIONS: Dict[str, Location] = WORLD.locations
    ITEMS_MANAGER = ItemsManager()

    # 4. Subscribe to Redis Pub/Sub channels
    location_channels = [f"location:{loc_id}" for loc_id in LOCATIONS.keys()]
    await redis_manager.subscribe_to_channels(location_channels + ["game:broadcast"])

    # 5. Start Redis message listener (background task)
    asyncio.create_task(redis_manager.listen_for_messages(manager.handle_redis_message))

    # 6. Register this worker in Redis
    await redis_manager.redis_client.sadd("active_workers", redis_manager.worker_id)

    # 7. Start background tasks (distributed via Redis locks)
    tasks = await background_tasks.start_background_tasks(manager, LOCATIONS)
    yield

    # Cleanup
    await redis_manager.redis_client.srem("active_workers", redis_manager.worker_id)
    await redis_manager.disconnect()
```
---
## Q5: How many channels can exist?
**Redis Pub/Sub Channels**:
### Fixed Channels (Always Active):
1. `game:broadcast` - Global announcements (1 channel)
2. `game:workers` - Worker coordination (1 channel)
### Dynamic Channels (Created on Demand):
**Location Channels** (14 currently):
- `location:start_point`
- `location:overpass`
- `location:gas_station`
- ... (one per location in `locations.json`)
**Player Channels** (one per connected player):
- `player:1` (character_id=1)
- `player:2`
- `player:5`
- ... (created on WebSocket connect, destroyed on disconnect)
**Total Active Channels**:
- **Minimum**: 16 (2 fixed + 14 locations)
- **With 100 players**: 116 (2 + 14 + 100)
- **With 1000 players**: 1016 (2 + 14 + 1000)
**Redis Limits**:
- Redis supports **millions** of channels simultaneously
- Each channel has minimal memory overhead (~100 bytes)
- 1000 channels = ~100 KB memory (negligible)
**Subscription Strategy**:
- All workers subscribe to: `game:broadcast` + all location channels
- Each worker subscribes to: only its connected players' channels
- When player connects → Worker subscribes to `player:{id}`
- When player disconnects → Worker unsubscribes from `player:{id}`
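The subscription bookkeeping above can be modeled in-process. A minimal sketch (the class name is hypothetical; the real implementation would call `subscribe`/`unsubscribe` on a redis.asyncio PubSub object at these same points):

```python
class WorkerSubscriptions:
    """Tracks which channels this worker should be subscribed to."""

    def __init__(self, location_ids):
        # Fixed channels + all location channels are always present
        self.channels = {"game:broadcast", "game:workers"}
        self.channels |= {f"location:{loc}" for loc in location_ids}

    def on_player_connect(self, character_id: int):
        # Subscribe to this player's personal channel
        self.channels.add(f"player:{character_id}")

    def on_player_disconnect(self, character_id: int):
        # Drop the personal channel; fixed/location channels stay
        self.channels.discard(f"player:{character_id}")
```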
---
## Q6: How does client update data in the UI?
**Current Flow** (without Redis):
```
1. User clicks "Attack" button
2. Client: POST /api/game/combat/action {"action": "attack"}
3. Server: Process attack, update DB
4. Server: Send WebSocket message to player
5. Server: Query DB for other players in location
6. Server: Send WebSocket messages to location
7. Client: Receives WebSocket "combat_update"
8. Client: Updates UI (HP bar, combat log)
9. Client: GET /api/game/state (refresh full state)
10. Server: Query DB for player, inventory, combat, etc.
11. Client: Re-render entire game UI
```
**With Redis** (optimized):
```
1. User clicks "Attack" button
2. Client: POST /api/game/combat/action {"action": "attack"}
3. Server: Process attack, update DB + Redis cache
4. Server: Publish to Redis channel "player:{id}" (personal message)
5. Worker handling that player: Receives Redis message
6. Worker: Send WebSocket to local connection
7. Client: Receives WebSocket "combat_update" with ALL needed data
8. Client: Updates UI directly from WebSocket payload (NO API CALL)
9. Server: Publish to Redis channel "location:{id}" (broadcast)
10. All workers: Receive location broadcast
11. Workers: Send WebSocket to their local connections in that location
12. Other players: UI updates with "Jocaru is in combat"
```
**Key Changes**:
- **No more `GET /api/game/state` after actions** - the WebSocket payload contains everything
- **Cross-worker broadcasts** - Redis pub/sub ensures all workers relay messages
- **Reduced DB queries** - combat state is cached in Redis
- **Faster UI updates** - Redis pub/sub typically adds only a few milliseconds of latency
**WebSocket Message Format** (enhanced):
```json
{
"type": "combat_update",
"data": {
"message": "You dealt 12 damage!",
"log_entry": "You dealt 12 damage!",
"combat_over": false,
"combat": {
"npc_id": "raider",
"npc_hp": 85,
"npc_max_hp": 115,
"turn": "npc"
},
"player": {
"hp": 78,
"stamina": 42,
"xp": 1250,
"level": 5
}
},
"timestamp": "2025-11-09T18:00:00Z"
}
```
Client receives this → Updates HP bar, combat log, turn indicator **WITHOUT** calling `/api/game/state`.
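On the server side, a small helper can assemble this enhanced payload once before publishing it. An illustrative sketch (the helper name is an assumption; the field set follows the example above):

```python
import json
from datetime import datetime, timezone

def build_combat_update(message: str, combat: dict, player: dict,
                        combat_over: bool = False) -> str:
    """Assemble the enhanced combat_update payload so the client never
    needs a follow-up GET /api/game/state."""
    payload = {
        "type": "combat_update",
        "data": {
            "message": message,
            "log_entry": message,
            "combat_over": combat_over,
            "combat": combat,   # npc_id, npc_hp, npc_max_hp, turn, ...
            "player": player,   # hp, stamina, xp, level, ...
        },
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)
```

The resulting string is what gets published to `player:{id}` and relayed to the WebSocket.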
---
## Q7: Disconnected players staying in location?
**Excellent Gameplay Mechanic!** This adds risk/consequence to disconnecting in dangerous areas.
### Implementation:
**When Player Disconnects**:
```python
# ConnectionManager.disconnect()
async def disconnect(self, player_id: int):
    # 1. Remove local WebSocket connection
    if player_id in self.active_connections:
        del self.active_connections[player_id]

    # 2. Update Redis session (mark as disconnected)
    session = await redis_manager.get_player_session(player_id)
    if not session:
        return  # no session to update or broadcast about

    session["websocket_connected"] = "false"
    session["disconnect_time"] = str(time.time())
    await redis_manager.set_player_session(player_id, session, ttl=3600)  # Keep for 1 hour

    # 3. KEEP player in location registry (don't remove)
    # await redis_manager.remove_player_from_location(...)  # DON'T DO THIS

    # 4. Broadcast to location
    await redis_manager.publish_to_location(
        session["location_id"],
        {
            "type": "player_status_change",
            "data": {
                "player_id": player_id,
                "username": session["username"],
                "status": "disconnected",
                "message": f"{session['username']} has disconnected (vulnerable)",
            },
        },
    )
```
**When Other Players Query Location**:
```python
# GET /api/game/location endpoint
@app.get("/api/game/location")
async def get_current_location(current_user: dict = Depends(get_current_user)):
    location_id = current_user["location_id"]

    # Get players in location from Redis
    player_ids = await redis_manager.get_players_in_location(location_id)
    other_players = []
    for pid in player_ids:
        if pid == current_user["id"]:
            continue
        # Get player session
        session = await redis_manager.get_player_session(pid)
        if session:
            other_players.append({
                "id": pid,
                "username": session["username"],
                "level": int(session["level"]),
                "hp": int(session["hp"]),
                "is_connected": session["websocket_connected"] == "true",
                "can_attack": True,  # Always true, even if disconnected!
            })
    return {
        "id": location_id,
        "other_players": other_players,  # Includes disconnected players
    }
```
**Combat with Disconnected Player**:
```python
# POST /api/game/pvp/initiate
@app.post("/api/game/pvp/initiate")
async def initiate_pvp(target_id: int, current_user: dict = Depends(get_current_user)):
    # Check target session
    target_session = await redis_manager.get_player_session(target_id)
    if not target_session:
        raise HTTPException(400, detail="Target player not found")

    # Allow combat even if disconnected
    is_connected = target_session["websocket_connected"] == "true"

    # Create PvP combat
    pvp_combat = await db.create_pvp_combat(
        attacker_id=current_user["id"],
        defender_id=target_id,
        location_id=current_user["location_id"],
    )

    if is_connected:
        # Target is online -> send WebSocket notification
        await redis_manager.publish_to_player(target_id, {
            "type": "pvp_challenge",
            "data": {
                "attacker": current_user["name"],
                "attacker_level": current_user["level"],
            },
        })
        return {"message": "PvP challenge sent", "pvp_combat": pvp_combat}

    # Target is offline -> auto-acknowledge, they can't respond
    await db.acknowledge_pvp_combat(pvp_combat["id"], target_id)
    # Attacker gets a free first-strike advantage
    return {
        "message": f"{target_session['username']} is disconnected - you get first strike!",
        "pvp_combat": pvp_combat,
        "target_vulnerable": True,
    }
```
**Cleanup Policy** (optional):
```python
# Background task: remove disconnected players after 1 hour
async def cleanup_disconnected_players():
    while True:
        await asyncio.sleep(300)  # Every 5 minutes
        # Scan player sessions (SCAN avoids blocking Redis the way KEYS can)
        async for key in redis_manager.redis_client.scan_iter("player:*:session"):
            session = await redis_manager.redis_client.hgetall(key)
            if session.get("websocket_connected") == "false":
                disconnect_time = float(session["disconnect_time"])
                # If disconnected for > 1 hour
                if time.time() - disconnect_time > 3600:
                    character_id = int(key.split(":")[1])
                    location_id = session["location_id"]
                    # Remove from location registry
                    await redis_manager.remove_player_from_location(character_id, location_id)
                    # Delete session
                    await redis_manager.delete_player_session(character_id)
                    print(f"🧹 Cleaned up disconnected player {character_id}")
```
**UI Display**:
```tsx
// Frontend: show disconnected status
{otherPlayers.map(player => (
  <div key={player.id} className={`player-card ${!player.is_connected ? 'disconnected' : ''}`}>
    <span className="player-name">{player.username}</span>
    <span className="player-level">Lv. {player.level}</span>
    {!player.is_connected && (
      <span className="player-status">Disconnected (Vulnerable)</span>
    )}
    {player.can_attack && (
      <button onClick={() => attackPlayer(player.id)}>
        Attack {!player.is_connected ? '(Easy Target)' : ''}
      </button>
    )}
  </div>
))}
```
---
## Q8: RDB vs AOF - Code changes needed?
**Short Answer**: No code changes required, only Redis configuration.
### Redis Persistence Options:
**RDB (Snapshotting)**:
- Periodic snapshots to disk
- Fast restarts, smaller files
- May lose last few seconds of data
**AOF (Append-Only File)**:
- Logs every write operation
- More durable, no data loss
- Slower restarts, larger files
**Recommended Configuration** (for your use case):
```yaml
# docker-compose.yml
# Inline comments can't live inside the command string (they'd be passed to
# redis-server as arguments), so the meaning of each flag is listed here:
#   --appendonly yes                  Enable AOF
#   --appendfsync everysec            Sync every second (good balance)
#   --save 900 1                      RDB snapshot every 15 min if 1+ key changed
#   --save 300 10                     RDB snapshot every 5 min if 10+ keys changed
#   --save 60 10000                   RDB snapshot every 1 min if 10k+ keys changed
#   --maxmemory 512mb                 Max memory usage
#   --maxmemory-policy allkeys-lru    Evict least recently used keys
echoes_redis:
  command: >
    redis-server
    --appendonly yes
    --appendfsync everysec
    --save 900 1
    --save 300 10
    --save 60 10000
    --maxmemory 512mb
    --maxmemory-policy allkeys-lru
```
**What This Gives You**:
- **AOF for durability**: every write logged (max 1 second of data loss)
- **RDB for fast recovery**: snapshots for quick restarts
- **Memory protection**: won't crash when memory is full (evicts old caches)
**Application Code**: No changes needed! Redis handles persistence transparently.
**Testing Persistence**:
```bash
# 1. Add some data
docker exec echoes_of_the_ashes_redis redis-cli SET test:key "hello"
# 2. Restart Redis
docker restart echoes_of_the_ashes_redis
# 3. Check if data persisted
docker exec echoes_of_the_ashes_redis redis-cli GET test:key
# Should return: "hello"
```
---
## Q9: What if cache invalidation isn't aggressive enough?
**Potential Problems**:
### 1. Stale Player Stats
**Scenario**: Player levels up, but Redis cache shows old level
```
1. Player gains XP → DB updated (level 6)
2. Redis cache still shows level 5
3. Other players see "Lv. 5" instead of "Lv. 6"
```
**Solution**: Invalidate on every stat change
```python
async def update_character_stats(character_id: int, **kwargs):
    # Update DB first
    await db.update_character(character_id, **kwargs)

    # Option A: invalidate the Redis cache (reload on next access)
    await redis_manager.delete_player_session(character_id)

    # Option B (instead of A): update the cache in place
    session = await redis_manager.get_player_session(character_id)
    if session:
        session.update(kwargs)
        await redis_manager.set_player_session(character_id, session)
```
### 2. Ghost Items in Inventory
**Scenario**: Player drops item, but cache shows they still have it
```
1. Player drops "Iron Sword"
2. DB updated (inventory row deleted)
3. Redis cache still shows sword in inventory
4. Player sees sword in UI, tries to equip → Error!
```
**Solution**: Invalidate inventory cache on add/remove/use
```python
async def remove_item_from_inventory(character_id: int, item_id: str):
    # Update DB
    await db.delete_inventory_item(character_id, item_id)
    # Invalidate cache (force reload next time)
    await redis_manager.invalidate_inventory(character_id)
```
### 3. Wrong Player Count in Location
**Scenario**: Player moves, but old location still shows them
```
1. Player moves overpass → gas_station
2. Redis location registry not updated
3. Other players in overpass still see them
4. Broadcasts sent to wrong location
```
**Solution**: Atomic location updates
```python
async def move_player(character_id: int, from_loc: str, to_loc: str):
    # Use a Redis transaction (MULTI/EXEC) so both sets change atomically
    async with redis_manager.redis_client.pipeline(transaction=True) as pipe:
        pipe.srem(f"location:{from_loc}:players", character_id)
        pipe.sadd(f"location:{to_loc}:players", character_id)
        await pipe.execute()
```
### 4. Combat State Desync
**Scenario**: Combat ends, but cache shows still in combat
```
1. Player defeats enemy
2. DB: active_combats row deleted
3. Redis: combat cache still exists
4. Player sees combat UI, can't move
```
**Solution**: Explicit cache deletion on combat end
```python
async def end_combat(character_id: int):
    # Delete from DB
    await db.end_combat(character_id)

    # Delete Redis cache
    await redis_manager.redis_client.delete(f"player:{character_id}:combat")

    # Update player session
    session = await redis_manager.get_player_session(character_id)
    if session:
        session["in_combat"] = "false"
        await redis_manager.set_player_session(character_id, session)
```
**General Strategy**:
```python
# PATTERN 1: Write-through cache (recommended for critical data)
async def update_data_write_through(key, value):
    await db.update(key, value)            # Write to DB first
    await redis_manager.cache(key, value)  # Update cache immediately

# PATTERN 2: Cache invalidation (simpler, slight delay)
async def update_data_invalidate(key, value):
    await db.update(key, value)            # Write to DB
    await redis_manager.delete_cache(key)  # Delete cache (reload on next access)

# PATTERN 3: TTL fallback (for non-critical data)
# Short TTLs (e.g., 30 seconds) let the cache self-expire if not invalidated
await redis_manager.cache(key, value, ttl=30)
```
**For Your Game**:
- **Aggressive invalidation** for: inventory, combat state, player stats
- **Write-through cache** for: player sessions, location registry
- **TTL fallback** for: dropped items list, interactable cooldowns
---
## Q10: No feature flags needed (dev only)
**Agreed!** Since you're the only tester, we can implement directly without feature flags.
### Simplified Rollout:
**Phase 1: Redis Infrastructure (Week 1)**
- Add Redis to docker-compose
- Create redis_manager.py
- Test connection/pub-sub
**Phase 2: Pub/Sub Only (Week 2)**
- Update ConnectionManager to use Redis pub/sub
- Keep all other logic same (no caching yet)
- Test cross-worker broadcasts
**Phase 3: Add Caching (Week 3)**
- Add player session cache
- Add inventory cache
- Add combat state cache
- Test performance improvements
**Phase 4: Multi-Worker (Week 4)**
- Increase workers to 2
- Test load balancing
- Monitor for race conditions
**Simplified Implementation** (no toggles):
```python
# Just implement Redis directly
async def lifespan(app: FastAPI):
await db.init_db()
await redis_manager.connect() # No if/else, just do it
# ... rest of startup
```
---
## Updated Implementation Priority
Based on your feedback, here's what we'll actually implement:
### Phase 1: Redis Pub/Sub (Core Multi-Worker Support)
**Goal**: Enable cross-worker broadcasts
**Changes**:
1. Add Redis container
2. Create `redis_manager.py` with pub/sub only
3. Update ConnectionManager:
- Keep local WebSocket storage
- Change `send_personal_message()` → publish to Redis
- Change `send_to_location()` → publish to Redis
- Add `handle_redis_message()` → send to local WebSockets
4. Subscribe to location channels on startup
**What We DON'T Cache**:
- ❌ Locations (already in memory)
- ❌ Items (already in memory)
- ❌ NPCs (already in memory)
### Phase 2: Dynamic State Caching (Performance)
**Goal**: Reduce database queries for frequently accessed data
**What We DO Cache**:
1. ✅ Player sessions (location, HP, level, stats)
2. ✅ Location player registry (Set of character IDs per location)
3. ✅ Player inventory (with aggressive invalidation)
4. ✅ Active combat state (with explicit deletion)
5. ✅ Dropped items per location (with TTL)
### Phase 3: Multi-Worker Deployment
**Goal**: Horizontal scaling
**Changes**:
1. Update docker-compose for 4 workers
2. Test load distribution
3. Implement distributed background task locks
4. Monitor performance
---
## Next Steps
Ready to implement? Here's what I'll do:
1. **Create `redis_manager.py`** - Simplified version (no static data caching)
2. **Update `docker-compose.yml`** - Add Redis container
3. **Update `ConnectionManager`** - Integrate pub/sub
4. **Update endpoints** - Add cache invalidation where needed
5. **Implement disconnected player** - Keep in location, mark as vulnerable
6. **Test suite** - Verify cross-worker communication
Do you want me to proceed with implementation?