# Troubleshooting Guide
Comprehensive guide for diagnosing and resolving common issues with Apollo RAG.
## Troubleshooting Overview
Apollo is a complex system with multiple layers (Frontend, Tauri, Backend). Issues can arise at any layer. This guide helps you:
- Identify which layer is causing problems
- Use diagnostic commands to gather information
- Apply fixes systematically
- Know when to escalate issues
## Quick Diagnostics
```bash
# Check backend health
curl http://localhost:8000/api/health

# Check Docker containers
docker ps
docker logs atlas-backend --tail 50

# Check GPU availability
nvidia-smi

# Monitor resource usage
docker stats
```

## Common Issues by Category
### Startup Issues
#### Docker Container Won’t Start
Symptoms: `docker-compose up` fails or containers exit immediately

Quick Fix: Check that the Docker daemon is running and that you have sufficient resources (48GB RAM, 16GB VRAM minimum)
Diagnostic Steps:
```bash
# Check container status
docker-compose -f backend/docker-compose.atlas.yml ps

# View detailed logs
docker-compose -f backend/docker-compose.atlas.yml logs atlas-backend

# Check resource limits
docker stats --no-stream
```

Common Causes:

- Insufficient VRAM: Model loading requires 6-14GB depending on the model
  - Error: `CUDA out of memory`
  - Fix: Reduce `GPU_LAYERS` or switch to a smaller model
- Port conflicts: Another service is using port 8000, 6333, or 6379
  - Error: `bind: address already in use`
  - Fix: `lsof -i :8000` to find the conflicting process (see the pre-flight sketch after this list)
- Missing models: GGUF files not in `/models/`
  - Error: `FileNotFoundError: models/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf`
  - Fix: Download the models from HuggingFace
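The port-conflict and missing-model causes can be ruled out in one pass before restarting the stack. A minimal pre-flight sketch (a hypothetical helper, not part of Apollo; run it before `docker-compose up`, when all three ports should still be free):

```python
# preflight.py -- hypothetical helper, not part of Apollo; checks the
# port-conflict and missing-model causes listed above in one pass.
import socket
from pathlib import Path

PORTS = {8000: "backend", 6333: "Qdrant", 6379: "Redis"}
# Model path taken from the error message above; adjust to your setup.
MODEL = Path("models/Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf")

def port_in_use(port: int) -> bool:
    """Return True if something is already listening on localhost:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex(("127.0.0.1", port)) == 0

for port, service in PORTS.items():
    status = "IN USE" if port_in_use(port) else "free"
    print(f"port {port} ({service}): {status}")

print(f"model file present: {MODEL.exists()} ({MODEL})")
```

A port reported IN USE before the containers start means another process owns it; find it with `lsof -i :8000` and stop it.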
#### CUDA Not Detected
Symptoms: Backend starts but uses CPU, very slow inference (8-12 tok/s)
```bash
# Check CUDA installation
docker exec atlas-backend nvcc --version

# Check GPU visibility
docker exec atlas-backend nvidia-smi

# Verify CUDA libraries
docker exec atlas-backend ldconfig -p | grep cuda
```

Common Causes:

- NVIDIA runtime not configured:

  ```bash
  # Fix: Install nvidia-docker2
  sudo apt-get install nvidia-docker2
  sudo systemctl restart docker
  ```

- WSL2 driver path missing (Windows):
  - Error: `libcuda.so.1: cannot open shared object file`
  - Fix: Add to `docker-compose.yml`:

    ```yaml
    environment:
      LD_LIBRARY_PATH: /usr/lib/wsl/drivers:$LD_LIBRARY_PATH
    ```
#### Model Loading Timeout
Symptoms: Container health check fails, backend stuck at “Loading model…”
Expected: Model loading takes 10-30 seconds. Health check has 45s start period.
```bash
# Monitor loading progress
docker logs atlas-backend -f | grep -i "loading\|model\|initialized"

# Check VRAM during load
watch -n 1 nvidia-smi
```

Fix: Increase the health check `start_period` in `docker-compose.atlas.yml`:

```yaml
healthcheck:
  start_period: 90s  # Increase from 45s
```
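While a slow load is in progress, you can poll readiness from a script rather than tailing logs. A minimal sketch against the `/api/health` endpoint shown elsewhere in this guide (stdlib only; the 90-second budget mirrors the increased `start_period` above):

```python
# wait_healthy.py -- a sketch, assuming only the /api/health endpoint
# documented in this guide; exits 0 once the backend reports healthy.
import json
import sys
import time
import urllib.request

DEADLINE = time.monotonic() + 90  # generous budget for a slow model load

while time.monotonic() < DEADLINE:
    try:
        with urllib.request.urlopen("http://localhost:8000/api/health",
                                    timeout=5) as resp:
            body = json.load(resp)
        if body.get("status") == "healthy":
            print("backend healthy")
            sys.exit(0)
        print(f"status: {body.get('status')}, retrying...")
    except OSError:
        print("backend not reachable yet, retrying...")
    time.sleep(2)

print("timed out waiting for backend health", file=sys.stderr)
sys.exit(1)
```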
### Performance Issues

#### Slow Query Response (more than 30s)
Expected latency: 8-15s (simple mode), 10-25s (adaptive mode)
Diagnostic Steps:
```bash
# Check current mode and settings
curl http://localhost:8000/api/settings

# Monitor the query timing breakdown
# Look for "timing" in the query response
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{"question":"test","mode":"simple"}'
```

Common Causes:

- Cache disabled: Redis not running or misconfigured
  - Check: `docker ps | grep redis`
  - Fix: Ensure the Redis container is healthy
- CPU embeddings bottleneck: RTX 5080 incompatibility forces CPU mode
  - Expected: 50-100ms embedding time
  - If more than 500ms: check that `FORCE_TORCH_CPU=1` is set
- Reranking overhead: Using the `quality` preset with many documents
  - Fix: Switch to the `quick` preset or reduce `top_k` values
Performance Tuning: A cache hit rate of 60-80% is normal. Check `/api/cache/stats` for the current hit rate.
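To check the hit rate programmatically, a sketch against `/api/cache/stats`; the `hits`/`misses` field names here are an assumption, not the documented schema, so adjust them to whatever the endpoint actually returns:

```python
# cache_hit_rate.py -- a sketch; the "hits"/"misses" field names are an
# assumption, not the documented schema of /api/cache/stats.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/api/cache/stats",
                            timeout=5) as resp:
    stats = json.load(resp)

hits = stats.get("hits", 0)
misses = stats.get("misses", 0)
total = hits + misses
rate = 100 * hits / total if total else 0.0
print(f"cache hit rate: {rate:.1f}% ({hits}/{total})")

# Per this guide, 60-80% is the normal range.
if total and rate < 60:
    print("hit rate below the expected 60-80% band; check Redis health")
```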
#### Memory Leaks
Symptoms: Gradual RAM/VRAM increase over time, eventual OOM
```bash
# Monitor memory over time
watch -n 5 'docker stats --no-stream | grep atlas-backend'

# Check the backend's Python memory usage
# (PID 1 is the container entrypoint; a bare psutil.Process() would
#  measure the throwaway `python -c` process instead)
docker exec atlas-backend python -c "
import psutil
process = psutil.Process(1)
print(f'RAM: {process.memory_info().rss / 1024**3:.2f} GB')
"
```

Common Causes (a sampling script to confirm a leak follows this list):
- Conversation memory not clearing: Ring buffer grows unbounded
  - Fix: Call `/api/conversation/clear` periodically
  - Automatic fix: Set a TTL in Redis
- Embedding cache unbounded growth:
  - Check: `redis-cli INFO memory`
  - Fix: Verify the Redis `maxmemory-policy: allkeys-lru` is configured correctly
- Model hotswap residual VRAM:
  - Symptom: VRAM not released after a model switch
  - Fix: Explicit cleanup in `model_manager.py` (already implemented)
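To confirm a leak rather than eyeballing `docker stats`, a sampling sketch that builds on the psutil check above (run inside the container, e.g. `docker exec atlas-backend python leak_probe.py`; a steady rise across samples points at the causes listed here):

```python
# leak_probe.py -- a sketch building on the psutil check above; samples the
# backend's RSS (PID 1 in the container, an assumption about the entrypoint)
# and reports the overall trend.
import time
import psutil

SAMPLES, INTERVAL = 12, 10   # ~2 minutes of observation
backend = psutil.Process(1)  # PID 1 = container entrypoint (assumption)

readings = []
for i in range(SAMPLES):
    rss_gb = backend.memory_info().rss / 1024**3
    readings.append(rss_gb)
    print(f"sample {i + 1}/{SAMPLES}: RAM {rss_gb:.2f} GB")
    time.sleep(INTERVAL)

growth = readings[-1] - readings[0]
print(f"net growth over run: {growth:+.2f} GB")
if growth > 0.5:  # arbitrary threshold for a short observation window
    print("sustained growth; suspect the causes listed above")
```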
### GPU Issues
#### Out of Memory (OOM)

Symptoms: `CUDA out of memory` error during a query or reindexing
```bash
# Check VRAM usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv

# Monitor during operation
watch -n 1 nvidia-smi
```

Immediate Fixes:

- Reduce GPU layers:

  ```yaml
  environment:
    GPU_LAYERS: 25  # Reduce from 33
  ```

- Switch to a smaller model:
  - Llama 3.1 8B Q5: 5.4GB VRAM
  - Llama 3.2 1B Q4: 771MB VRAM (draft model)
- Clear the GPU cache:

  ```bash
  docker exec atlas-backend python -c "
  import torch
  torch.cuda.empty_cache()
  print('GPU cache cleared')
  "
  ```
Critical: Reindexing uses embedding model + reranker simultaneously (10GB VRAM). Ensure 16GB VRAM available.
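Before kicking off a reindex, you can verify that headroom exists. A sketch using PyTorch's `torch.cuda.mem_get_info()` (run inside the backend container; the 10GB figure comes from the note above):

```python
# vram_headroom.py -- a sketch; checks free VRAM against the ~10GB that
# reindexing needs per the note above. Run inside the backend container.
import torch

REQUIRED_GB = 10  # embedding model + reranker, per this guide

if not torch.cuda.is_available():
    raise SystemExit("CUDA not available; see 'CUDA Not Detected' above")

free_b, total_b = torch.cuda.mem_get_info()
free_gb, total_gb = free_b / 1024**3, total_b / 1024**3
print(f"VRAM free: {free_gb:.1f} / {total_gb:.1f} GB")

if free_gb < REQUIRED_GB:
    print(f"under the ~{REQUIRED_GB} GB reindexing needs; free VRAM first")
    torch.cuda.empty_cache()  # sometimes reclaims residual allocations
```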
#### GPU Not Utilized (0% Usage)
Symptoms: GPU idle while backend running, slow inference
```bash
# Check the GPU process
nvidia-smi pmon -c 1

# Verify the llama.cpp CUDA build
# (llama_supports_gpu_offload() is the low-level binding that reports
#  whether the build can offload to GPU)
docker exec atlas-backend python -c "
import llama_cpp
print('GPU offload supported:', llama_cpp.llama_supports_gpu_offload())
"
```

Fix: Rebuild llama-cpp-python with CUDA:

```bash
docker-compose down
docker-compose build --no-cache atlas-backend
docker-compose up -d
```

### Cache Issues
#### Stale Cache Responses
Symptoms: Reindexed documents but queries return old results
Automatic Behavior: The cache is cleared on reindex. If stale results persist, clear it manually:
```bash
# Clear conversation state
curl -X POST http://localhost:8000/api/conversation/clear

# Flush Redis entirely
docker exec redis redis-cli FLUSHALL

# Check cache stats
curl http://localhost:8000/api/cache/stats
```

#### Redis Connection Failed
Symptoms: `ConnectionError: Error 111 connecting to redis:6379. Connection refused.`
```bash
# Check the Redis container
docker ps | grep redis

# Test Redis connectivity from the backend
docker exec atlas-backend redis-cli -h redis ping

# Check Redis logs
docker logs redis --tail 50
```

Fix: Restart the Redis container:

```bash
docker-compose restart redis
```

### Network Issues
#### Backend Not Reachable from Tauri
Symptoms: Frontend shows “Offline” indicator, health checks fail
```bash
# Test from the host machine
curl http://localhost:8000/api/health

# Check the Docker network
docker network inspect atlas-network

# Verify the backend is listening
docker exec atlas-backend netstat -tuln | grep 8000
```

Common Causes (a connectivity probe is sketched after this list):

- Firewall blocking localhost: Windows Firewall or antivirus
  - Fix: Add an exception for port 8000
- Docker network isolation: Backend in a custom network
  - Fix: Use host network mode (not recommended) or verify the port mapping
- Backend crash loop: Check the logs for Python errors

  ```bash
  docker logs atlas-backend --tail 100 -f
  ```
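To separate a firewall or port-mapping problem from an application-level failure, a small probe sketch to run on the host (TCP connect first, then the HTTP health route):

```python
# reach_probe.py -- a sketch; distinguishes "port closed" (crash loop or bad
# port mapping) from "port open but HTTP failing" (application-level problem).
import socket
import urllib.error
import urllib.request

try:
    with socket.create_connection(("127.0.0.1", 8000), timeout=3):
        print("TCP: port 8000 is open")
except OSError as e:
    raise SystemExit(
        f"TCP: cannot reach port 8000 ({e}); check port mapping or crash loop")

try:
    with urllib.request.urlopen("http://localhost:8000/api/health",
                                timeout=5) as resp:
        print(f"HTTP: /api/health -> {resp.status}")
except urllib.error.URLError as e:
    print(f"HTTP: port open but health check failed ({e}); check backend logs")
```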
#### Timeout on Long Queries
Symptoms: Query aborts after 60 seconds with timeout error
Expected: Long queries (adaptive mode, cold start) can take 25+ seconds
Fix: Increase timeout in frontend:
```typescript
// src/services/api.ts
const TIMEOUTS = {
  QUERY: 120000, // Increase from 60000 (values in milliseconds)
};
```

## Debug Mode & Logging
### Enable Debug Logging
```yaml
# docker-compose.atlas.yml
environment:
  LOG_LEVEL: DEBUG  # Change from INFO
```

Restart the backend:

```bash
docker-compose restart atlas-backend
```

### Log Locations
```bash
# Backend application logs
docker logs atlas-backend

# Persistent logs (if a volume is mounted)
cat backend/logs/backend.log

# Redis logs
docker logs redis

# Qdrant logs
docker logs qdrant
```

### Structured Logging
Backend uses structured JSON logging:
```json
{
  "timestamp": "2025-10-28T10:30:45Z",
  "level": "INFO",
  "component": "rag_engine",
  "message": "Query processed",
  "metadata": {
    "query_time_ms": 8234,
    "cache_hit": false,
    "strategy": "simple"
  }
}
```

Search logs efficiently:

```bash
docker logs atlas-backend | jq 'select(.level == "ERROR")'
docker logs atlas-backend | jq 'select(.component == "llm_engine")'
```
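If `jq` is not installed, the same filtering takes a few lines of Python (a sketch; it assumes one JSON object per log line, as in the structured format above, and skips anything that isn't JSON):

```python
# log_filter.py -- a sketch; pipe logs through it, e.g.:
#   docker logs atlas-backend | python log_filter.py ERROR
import json
import sys

wanted_level = sys.argv[1] if len(sys.argv) > 1 else "ERROR"

for line in sys.stdin:
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip non-JSON lines (startup banners, tracebacks)
    if entry.get("level") == wanted_level:
        print(json.dumps(entry, indent=2))
```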
## Log Analysis Patterns

### Common Error Patterns
Model loading failure:

```
ERROR:llm_engine_llamacpp:Failed to load model: CUDA out of memory
```

Fix: Reduce `GPU_LAYERS` or switch to a smaller model

Vector DB connection error:

```
ERROR:vector_store_qdrant:Failed to connect to Qdrant at qdrant:6333
```

Fix: Check the Qdrant container's health

Prompt injection detected:

```
WARNING:api.query:Prompt injection attempt detected: ignore previous instructions...
```

Info: Logged but not blocked. Monitor for abuse patterns.
### Performance Indicators

Look for the timing breakdown in logs:

```
INFO:rag_engine:Query timing: cache_lookup=0.9ms, retrieval=145ms, generation=8234ms
```

Good: cache hit under 1ms, retrieval under 200ms, generation 8-15s
Bad: retrieval over 500ms (indexing issue), generation over 30s (GPU not being used)
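The good/bad thresholds above can be applied mechanically. A parser sketch for the timing line format shown (pipe logs through it, e.g. `docker logs atlas-backend | python timing_check.py`):

```python
# timing_check.py -- a sketch; flags timing lines against the "bad"
# thresholds stated above (retrieval > 500ms, generation > 30s).
import re
import sys

# Matches e.g. "cache_lookup=0.9ms, retrieval=145ms, generation=8234ms"
PAIR = re.compile(r"(\w+)=([\d.]+)ms")
LIMITS_MS = {"retrieval": 500, "generation": 30000}

for line in sys.stdin:
    if "Query timing:" not in line:
        continue
    timings = {k: float(v) for k, v in PAIR.findall(line)}
    for stage, limit in LIMITS_MS.items():
        if timings.get(stage, 0) > limit:
            print(f"BAD  {stage}={timings[stage]:.0f}ms (limit {limit}ms)")
        elif stage in timings:
            print(f"ok   {stage}={timings[stage]:.0f}ms")
```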
## Health Check Diagnostics
### Interpreting Health Response
```bash
curl http://localhost:8000/api/health | jq
```

```json
{
  "status": "healthy",
  "components": {
    "vectorstore": "ready",
    "llm": "ready",
    "bm25_retriever": "ready",
    "cache": "ready",
    "conversation_memory": "ready"
  }
}
```

Component Status:

- `ready`: Component operational
- `error`: Component failed initialization
- `null`: Component not initialized yet

Degraded Mode: If `status` is `"degraded"`, queries may still work but some features are disabled (e.g., caching, BM25)
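To turn the health payload into a quick pass/fail summary, a sketch over the response shape documented above:

```python
# health_summary.py -- a sketch over the /api/health shape shown above;
# prints the overall status and flags any component that is not "ready".
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/api/health",
                            timeout=5) as resp:
    health = json.load(resp)

print(f"overall: {health.get('status')}")
for name, state in health.get("components", {}).items():
    marker = "ok " if state == "ready" else "!! "
    print(f"  {marker}{name}: {state}")

# Per the guide, "degraded" means queries may still work with features disabled.
if health.get("status") != "healthy":
    bad = [n for n, s in health.get("components", {}).items() if s != "ready"]
    print("investigate:", ", ".join(bad) or "(no component flagged)")
```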
## Getting Help
### Information to Collect
When reporting issues, include:
- System info:

  ```bash
  docker version
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
  ```

- Backend logs:

  ```bash
  docker logs atlas-backend --tail 200 > backend.log
  ```

- Health check:

  ```bash
  curl http://localhost:8000/api/health > health.json
  ```

- Settings:

  ```bash
  curl http://localhost:8000/api/settings > settings.json
  ```

- Docker stats:

  ```bash
  docker stats --no-stream > stats.txt
  ```
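The five items above can be collected in one step. A sketch of a hypothetical helper script that uses only the commands and endpoints listed in this section:

```python
# collect_report.py -- hypothetical helper, a sketch automating the
# checklist above; writes each artifact next to the script.
import subprocess
import urllib.request

COMMANDS = {
    "sysinfo.txt": ["docker", "version"],
    "gpu.csv": ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
                "--format=csv"],
    "backend.log": ["docker", "logs", "atlas-backend", "--tail", "200"],
    "stats.txt": ["docker", "stats", "--no-stream"],
}
ENDPOINTS = {
    "health.json": "http://localhost:8000/api/health",
    "settings.json": "http://localhost:8000/api/settings",
}

for filename, cmd in COMMANDS.items():
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(filename, "w") as f:
        f.write(result.stdout or result.stderr)
    print(f"wrote {filename}")

for filename, url in ENDPOINTS.items():
    with urllib.request.urlopen(url, timeout=5) as resp:
        with open(filename, "wb") as f:
            f.write(resp.read())
    print(f"wrote {filename}")
```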
### GitHub Issues
Open issues at: [GitHub Repository]
Template:
```markdown
## Environment
- OS: Windows 11 / Ubuntu 22.04 / macOS
- Docker version:
- GPU: RTX 5080 / RTX 4090 / etc.
- VRAM: 16GB

## Problem Description
[Clear description of the issue]

## Steps to Reproduce
1.
2.
3.

## Expected Behavior
[What should happen]

## Actual Behavior
[What actually happens]

## Logs
[Attach backend.log, health.json, stats.txt]
```

### Community Support
- Discord: [Link]
- Documentation: https://docs.apollo-rag.com
- Email: z@onyxlab.ai
Need more help? Check the Configuration Guide or API Reference.