mirror of https://github.com/mblanke/ThreatHunt.git
synced 2026-03-01 14:00:20 -05:00

Compare commits

7 commits on `Claude-Ite...` (head: bb562a91ca)

| SHA1 |
|---|
| bb562a91ca |
| 04a9946891 |
| 9b98ab9614 |
| d0c9f88268 |
| dc2dcd02c1 |
| 73a2efcde3 |
| 77509b08f5 |
.env (deleted, 3 lines)

@@ -1,3 +0,0 @@
DATABASE_URL=postgresql://admin:secure_password_123@database:5432/threat_hunter
SECRET_KEY=your-very-secret-key-change-in-production
FLASK_ENV=production
.env.example (new file, 53 lines)

@@ -0,0 +1,53 @@
# ── ThreatHunt Configuration ──────────────────────────────────────────
# All backend env vars are prefixed with TH_ and match AppConfig field names.
# Copy this file to .env and adjust values.

# ── General ───────────────────────────────────────────────────────────
TH_DEBUG=false

# ── Database ──────────────────────────────────────────────────────────
# SQLite for local dev (zero-config):
TH_DATABASE_URL=sqlite+aiosqlite:///./threathunt.db
# PostgreSQL for production:
# TH_DATABASE_URL=postgresql+asyncpg://threathunt:password@localhost:5432/threathunt

# ── CORS ──────────────────────────────────────────────────────────────
TH_ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8000

# ── File uploads ──────────────────────────────────────────────────────
TH_MAX_UPLOAD_SIZE_MB=500

# ── LLM Cluster (Wile & Roadrunner) ──────────────────────────────────
TH_OPENWEBUI_URL=https://ai.guapo613.beer
TH_OPENWEBUI_API_KEY=
TH_WILE_HOST=100.110.190.12
TH_WILE_OLLAMA_PORT=11434
TH_ROADRUNNER_HOST=100.110.190.11
TH_ROADRUNNER_OLLAMA_PORT=11434

# ── Default models (auto-selected by TaskRouter) ─────────────────────
TH_DEFAULT_FAST_MODEL=llama3.1:latest
TH_DEFAULT_HEAVY_MODEL=llama3.1:70b-instruct-q4_K_M
TH_DEFAULT_CODE_MODEL=qwen2.5-coder:32b
TH_DEFAULT_VISION_MODEL=llama3.2-vision:11b
TH_DEFAULT_EMBEDDING_MODEL=bge-m3:latest

# ── Agent behaviour ──────────────────────────────────────────────────
TH_AGENT_MAX_TOKENS=2048
TH_AGENT_TEMPERATURE=0.3
TH_AGENT_HISTORY_LENGTH=10
TH_FILTER_SENSITIVE_DATA=true

# ── Enrichment API keys (optional) ───────────────────────────────────
TH_VIRUSTOTAL_API_KEY=
TH_ABUSEIPDB_API_KEY=
TH_SHODAN_API_KEY=

# ── Auth ─────────────────────────────────────────────────────────────
TH_JWT_SECRET=CHANGE-ME-IN-PRODUCTION-USE-A-REAL-SECRET
TH_JWT_ACCESS_TOKEN_MINUTES=60
TH_JWT_REFRESH_TOKEN_DAYS=7

# ── Frontend ─────────────────────────────────────────────────────────
REACT_APP_API_URL=http://localhost:8000
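The header comment says every backend variable is prefixed `TH_` and matches an `AppConfig` field name. A minimal sketch of what such a settings class could look like with pydantic-settings; the field names below mirror this file, but the real `AppConfig` in the backend may differ.

```python
# Hypothetical sketch of AppConfig using pydantic-settings (the real class may differ).
from pydantic_settings import BaseSettings, SettingsConfigDict


class AppConfig(BaseSettings):
    # TH_DEBUG, TH_DATABASE_URL, ... are matched to these fields via env_prefix.
    model_config = SettingsConfigDict(env_prefix="TH_", env_file=".env", extra="ignore")

    debug: bool = False
    database_url: str = "sqlite+aiosqlite:///./threathunt.db"
    allowed_origins: str = "http://localhost:3000,http://localhost:8000"
    max_upload_size_mb: int = 500
    agent_max_tokens: int = 2048
    agent_temperature: float = 0.3
    jwt_secret: str = "CHANGE-ME-IN-PRODUCTION-USE-A-REAL-SECRET"


config = AppConfig()  # reads process env first, then falls back to .env
```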
.gitignore (vendored, 56 lines)

@@ -0,0 +1,56 @@
# ── Python ────────────────────────────────────
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
dist/
build/
*.egg
.eggs/

# ── Virtual environments ─────────────────────
venv/
.venv/
env/

# ── IDE / Editor ─────────────────────────────
.vscode/
.idea/
*.swp
*.swo
*~

# ── OS ────────────────────────────────────────
.DS_Store
Thumbs.db

# ── Environment / Secrets ────────────────────
.env
*.env.local

# ── Database ─────────────────────────────────
*.db
*.sqlite3

# ── Uploads ──────────────────────────────────
uploads/

# ── Node / Frontend ──────────────────────────
node_modules/
frontend/build/
frontend/.env.local
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# ── Docker ───────────────────────────────────
docker-compose.override.yml

# ── Test / Coverage ──────────────────────────
.coverage
htmlcov/
.pytest_cache/
.mypy_cache/

# ── Alembic ──────────────────────────────────
alembic/versions/*.pyc
.vscode/settings.json (deleted, vendored, 5 lines)

@@ -1,5 +0,0 @@
{
  "cSpell.words": [
    "jsonify"
  ]
}
Dockerfile.backend (new file, 32 lines)

@@ -0,0 +1,32 @@
# ThreatHunt Backend API - Python 3.13
FROM python:3.13-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY backend/requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy backend code
COPY backend/ .

# Create non-root user & data directory
RUN useradd -m -u 1000 appuser && mkdir -p /app/data && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/ || exit 1

# Run Alembic migrations then start Uvicorn
CMD ["sh", "-c", "python -m alembic upgrade head && python run.py"]
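The final `CMD` applies migrations and then calls `python run.py`. The actual `run.py` is not shown in this diff; a minimal sketch of such an entry point for a FastAPI app served by Uvicorn, where the module path, host, and port are assumptions based on the README and the `EXPOSE 8000` line above:

```python
# Hypothetical run.py sketch; the real entry point may read host/port from AppConfig.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",   # FastAPI instance created in backend/app/main.py
        host="0.0.0.0",   # listen on all interfaces inside the container
        port=8000,        # matches EXPOSE 8000 and the health check above
        log_level="info",
    )
```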
Dockerfile.frontend (new file, 36 lines)

@@ -0,0 +1,36 @@
# ThreatHunt Frontend - Node.js React
FROM node:20-alpine AS builder

WORKDIR /app

# Copy package files
COPY frontend/package.json frontend/package-lock.json* ./

# Install dependencies
RUN npm ci

# Copy source
COPY frontend/public ./public
COPY frontend/src ./src
COPY frontend/tsconfig.json ./

# Build application
RUN npm run build

# Production stage — nginx reverse-proxy + static files
FROM nginx:alpine

# Copy built React app
COPY --from=builder /app/build /usr/share/nginx/html

# Copy custom nginx config (proxies /api to backend)
COPY frontend/nginx.conf /etc/nginx/conf.d/default.conf

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost:3000/ || exit 1

CMD ["nginx", "-g", "daemon off;"]
README.md (modified, 509 lines)

@@ -1,69 +1,496 @@
# ThreatHunt - Analyst-Assist Threat Hunting Platform
|
||||
|
||||
A modern web application for threat hunting and security analysis, built with React frontend and Flask backend.
|
||||
A modern threat hunting platform with integrated analyst-assist agent guidance. Analyze CSV artifact data exported from Velociraptor with AI-powered suggestions for investigation directions, analytical pivots, and hypothesis formation.
|
||||
|
||||
## Features
|
||||
## Overview
|
||||
|
||||
- **Security Tools Detection**: Identify running security tools (AV, EDR, VPN)
|
||||
- **CSV Processing**: Upload and analyze security logs
|
||||
- **Baseline Analysis**: System baseline comparison
|
||||
- **Network Analysis**: Network traffic and connection analysis
|
||||
- **VirusTotal Integration**: File and URL reputation checking
|
||||
ThreatHunt is a web application designed to help security analysts efficiently hunt for threats by:
|
||||
- Importing CSV artifacts from Velociraptor or other sources
|
||||
- Displaying data in an organized, queryable interface
|
||||
- Providing AI-powered guidance through an analyst-assist agent
|
||||
- Suggesting analytical directions, filters, and pivots
|
||||
- Highlighting anomalies and patterns of interest
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
ThreatHunt/
|
||||
├── frontend/ # React application
|
||||
├── backend/ # Flask API server
|
||||
├── uploaded/ # File upload storage
|
||||
└── output/ # Analysis results
|
||||
```
|
||||
> **Agent Policy**: The analyst-assist agent provides read-only guidance only. It does not execute actions, escalate alerts, or modify data. All decisions remain with the analyst.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Backend Setup
|
||||
### Docker (Recommended)
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
chmod +x setup_backend.sh
|
||||
./setup_backend.sh
|
||||
source venv/bin/activate
|
||||
python app.py
|
||||
# Clone and navigate
|
||||
git clone https://github.com/mblanke/ThreatHunt.git
|
||||
cd ThreatHunt
|
||||
|
||||
# Configure provider (choose one)
|
||||
cp .env.example .env
|
||||
# Edit .env and set your LLM provider:
|
||||
# Option 1: Online (OpenAI, etc.)
|
||||
# THREAT_HUNT_AGENT_PROVIDER=online
|
||||
# THREAT_HUNT_ONLINE_API_KEY=sk-your-key
|
||||
# Option 2: Local (Ollama, GGML, etc.)
|
||||
# THREAT_HUNT_AGENT_PROVIDER=local
|
||||
# THREAT_HUNT_LOCAL_MODEL_PATH=/path/to/model
|
||||
# Option 3: Networked (Internal inference service)
|
||||
# THREAT_HUNT_AGENT_PROVIDER=networked
|
||||
# THREAT_HUNT_NETWORKED_ENDPOINT=http://service:5000
|
||||
|
||||
# Start services
|
||||
docker-compose up -d
|
||||
|
||||
# Verify
|
||||
curl http://localhost:8000/api/agent/health
|
||||
curl http://localhost:3000
|
||||
```
|
||||
|
||||
### Frontend Setup
|
||||
Access at http://localhost:3000
|
||||
|
||||
### Local Development
|
||||
|
||||
**Backend**:
|
||||
```bash
|
||||
cd backend
|
||||
python -m venv venv
|
||||
source venv/bin/activate # Windows: venv\Scripts\activate
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Configure provider
|
||||
export THREAT_HUNT_ONLINE_API_KEY=sk-your-key
|
||||
# OR set another provider env var
|
||||
|
||||
# Run
|
||||
python run.py
|
||||
# API at http://localhost:8000/docs
|
||||
```
|
||||
|
||||
**Frontend** (new terminal):
|
||||
```bash
|
||||
cd frontend
|
||||
npm install
|
||||
npm run dev
|
||||
npm start
|
||||
# App at http://localhost:3000
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
### Analyst-Assist Agent 🤖
|
||||
- **Read-only guidance**: Explains data patterns and suggests investigation directions
|
||||
- **Context-aware**: Understands current dataset, host, and artifact type
|
||||
- **Pluggable providers**: Local, networked, or online LLM backends
|
||||
- **Transparent reasoning**: Explains logic with caveats and confidence scores
|
||||
- **Governance-compliant**: Strictly adheres to agent policy (no execution, no escalation)
|
||||
|
||||
### Chat Interface
|
||||
- Analyst asks questions about artifact data
|
||||
- Agent provides guidance with suggested pivots and filters
|
||||
- Conversation history for context continuity
|
||||
- Real-time typing and response indicators
|
||||
|
||||
### Data Management
|
||||
- Import CSV artifacts from Velociraptor
|
||||
- Browse and filter findings by severity, host, artifact type
|
||||
- Annotate findings with analyst notes
|
||||
- Track investigation progress
|
||||
|
||||
## Architecture
|
||||
|
||||
### Backend
|
||||
- **Framework**: FastAPI (Python 3.11)
|
||||
- **Agent Module**: Pluggable LLM provider interface
|
||||
- **API**: RESTful endpoints with OpenAPI documentation
|
||||
- **Structure**: Modular design with clear separation of concerns
|
||||
|
||||
### Frontend
|
||||
- **Framework**: React 18 with TypeScript
|
||||
- **Components**: Agent chat panel + analysis dashboard
|
||||
- **Styling**: CSS with responsive design
|
||||
- **State Management**: React hooks + Context API
|
||||
|
||||
### LLM Providers
|
||||
Supports three provider architectures:
|
||||
|
||||
1. **Local**: On-device or on-prem models (GGML, Ollama, vLLM)
|
||||
2. **Networked**: Shared internal inference services
|
||||
3. **Online**: External hosted APIs (OpenAI, Anthropic, Google)
|
||||
|
||||
Auto-detection: Automatically uses the first available provider.
|
||||
|
||||
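A minimal sketch, assuming a simple provider interface, of how the local → networked → online fallback could be implemented. The class and function names are illustrative, not the actual `providers.py` API:

```python
# Illustrative provider-selection sketch (names are hypothetical, not the real providers.py API).
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


def select_provider(candidates: list[LLMProvider]) -> LLMProvider:
    """Auto-detection: return the first reachable provider (local -> networked -> online)."""
    for provider in candidates:
        if provider.is_available():
            return provider
    raise RuntimeError("No LLM provider is configured; the agent reports unavailable (HTTP 503).")
```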
## Project Structure
|
||||
|
||||
```
|
||||
ThreatHunt/
|
||||
├── backend/
|
||||
│ ├── app/
|
||||
│ │ ├── agents/ # Analyst-assist agent
|
||||
│ │ │ ├── core.py # ThreatHuntAgent class
|
||||
│ │ │ ├── providers.py # LLM provider interface
|
||||
│ │ │ ├── config.py # Configuration
|
||||
│ │ │ └── __init__.py
|
||||
│ │ ├── api/routes/ # API endpoints
|
||||
│ │ │ ├── agent.py # /api/agent/* routes
|
||||
│ │ │ ├── __init__.py
|
||||
│ │ ├── main.py # FastAPI app
|
||||
│ │ └── __init__.py
|
||||
│ ├── requirements.txt
|
||||
│ ├── run.py
|
||||
│ └── Dockerfile
|
||||
├── frontend/
|
||||
│ ├── src/
|
||||
│ │ ├── components/
|
||||
│ │ │ ├── AgentPanel.tsx # Chat interface
|
||||
│ │ │ └── AgentPanel.css
|
||||
│ │ ├── utils/
|
||||
│ │ │ └── agentApi.ts # API communication
|
||||
│ │ ├── App.tsx
|
||||
│ │ ├── App.css
|
||||
│ │ ├── index.tsx
|
||||
│ │ └── index.css
|
||||
│ ├── public/index.html
|
||||
│ ├── package.json
|
||||
│ ├── tsconfig.json
|
||||
│ └── Dockerfile
|
||||
├── docker-compose.yml
|
||||
├── .env.example
|
||||
├── .gitignore
|
||||
├── AGENT_IMPLEMENTATION.md # Technical guide
|
||||
├── INTEGRATION_GUIDE.md # Deployment guide
|
||||
├── IMPLEMENTATION_SUMMARY.md # Overview
|
||||
├── README.md # This file
|
||||
├── ROADMAP.md
|
||||
└── THREATHUNT_INTENT.md
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
- `GET /` - Serve React app
|
||||
- `GET /api/health` - Health check
|
||||
- `POST /api/upload` - File upload
|
||||
- `GET /api/analysis/<id>` - Get analysis results
|
||||
### Agent Assistance
|
||||
- **POST /api/agent/assist** - Request guidance on artifact data
|
||||
- **GET /api/agent/health** - Check agent availability
|
||||
|
||||
## Security Considerations
|
||||
See full API documentation at http://localhost:8000/docs
|
||||
|
||||
- File upload validation
|
||||
- Input sanitization
|
||||
- Rate limiting
|
||||
- CORS configuration
|
||||
## Configuration
|
||||
|
||||
### LLM Provider Selection
|
||||
|
||||
Set via `THREAT_HUNT_AGENT_PROVIDER` environment variable:
|
||||
|
||||
```bash
|
||||
# Auto-detect (tries local → networked → online)
|
||||
THREAT_HUNT_AGENT_PROVIDER=auto
|
||||
|
||||
# Local (on-device/on-prem)
|
||||
THREAT_HUNT_AGENT_PROVIDER=local
|
||||
THREAT_HUNT_LOCAL_MODEL_PATH=/models/model.gguf
|
||||
|
||||
# Networked (internal service)
|
||||
THREAT_HUNT_AGENT_PROVIDER=networked
|
||||
THREAT_HUNT_NETWORKED_ENDPOINT=http://inference:5000
|
||||
THREAT_HUNT_NETWORKED_KEY=api-key
|
||||
|
||||
# Online (hosted API)
|
||||
THREAT_HUNT_AGENT_PROVIDER=online
|
||||
THREAT_HUNT_ONLINE_API_KEY=sk-your-key
|
||||
THREAT_HUNT_ONLINE_PROVIDER=openai
|
||||
THREAT_HUNT_ONLINE_MODEL=gpt-3.5-turbo
|
||||
```
|
||||
|
||||
### Agent Behavior
|
||||
|
||||
```bash
|
||||
THREAT_HUNT_AGENT_MAX_TOKENS=1024
|
||||
THREAT_HUNT_AGENT_REASONING=true
|
||||
THREAT_HUNT_AGENT_HISTORY_LENGTH=10
|
||||
THREAT_HUNT_AGENT_FILTER_SENSITIVE=true
|
||||
```
|
||||
|
||||
See `.env.example` for all configuration options.
|
||||
|
||||
## Governance & Compliance
|
||||
|
||||
This implementation strictly follows governance principles:
|
||||
|
||||
- ✅ **Agents assist analysts** - No autonomous execution
|
||||
- ✅ **No tool execution** - Agent provides guidance only
|
||||
- ✅ **No alert escalation** - Analyst controls alerts
|
||||
- ✅ **No data modification** - Read-only analysis
|
||||
- ✅ **Transparent reasoning** - Explains guidance with caveats
|
||||
- ✅ **Analyst authority** - All decisions remain with analyst
|
||||
|
||||
**References**:
|
||||
- `goose-core/governance/AGENT_POLICY.md`
|
||||
- `goose-core/governance/AI_RULES.md`
|
||||
- `THREATHUNT_INTENT.md`
|
||||
|
||||
## Documentation
|
||||
|
||||
- **[AGENT_IMPLEMENTATION.md](AGENT_IMPLEMENTATION.md)** - Detailed technical architecture
|
||||
- **[INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md)** - Deployment and configuration
|
||||
- **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** - Feature overview
|
||||
|
||||
## Testing the Agent
|
||||
|
||||
### Check Health
|
||||
```bash
|
||||
curl http://localhost:8000/api/agent/health
|
||||
```
|
||||
|
||||
### Test API
|
||||
```bash
|
||||
curl -X POST http://localhost:8000/api/agent/assist \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"query": "What patterns suggest suspicious activity?",
|
||||
"dataset_name": "FileList",
|
||||
"artifact_type": "FileList",
|
||||
"host_identifier": "DESKTOP-ABC123"
|
||||
}'
|
||||
```
|
||||
|
||||
### Use UI
|
||||
1. Open http://localhost:3000
|
||||
2. Enter a question in the agent panel
|
||||
3. View guidance with suggested pivots and filters
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Agent Unavailable (503)
|
||||
- Check environment variables for provider configuration
|
||||
- Verify LLM provider is accessible
|
||||
- See logs: `docker-compose logs backend`
|
||||
|
||||
### No Frontend Response
|
||||
- Verify backend health: `curl http://localhost:8000/api/agent/health`
|
||||
- Check browser console for errors
|
||||
- See logs: `docker-compose logs frontend`
|
||||
|
||||
See [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md) for detailed troubleshooting.
|
||||
|
||||
## Development
|
||||
|
||||
### Running Tests
|
||||
```bash
|
||||
cd backend
|
||||
pytest
|
||||
|
||||
cd ../frontend
|
||||
npm test
|
||||
```
|
||||
|
||||
### Building Images
|
||||
```bash
|
||||
docker-compose build
|
||||
```
|
||||
|
||||
### Logs
|
||||
```bash
|
||||
docker-compose logs -f backend
|
||||
docker-compose logs -f frontend
|
||||
```
|
||||
|
||||
## Security Notes
|
||||
|
||||
For production deployment:
|
||||
1. Add authentication to API endpoints
|
||||
2. Enable HTTPS/TLS
|
||||
3. Implement rate limiting
|
||||
4. Filter sensitive data before LLM
|
||||
5. Add audit logging
|
||||
6. Use secrets management for API keys
|
||||
|
||||
See [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md#security-notes) for details.
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
- [ ] Integration with actual CVE databases
|
||||
- [ ] Fine-tuned models for cybersecurity domain
|
||||
- [ ] Structured output from LLMs (JSON mode)
|
||||
- [ ] Feedback loop on guidance quality
|
||||
- [ ] Multi-modal support (images, documents)
|
||||
- [ ] Compliance reporting and audit trails
|
||||
- [ ] Performance optimization and caching
|
||||
|
||||
## Contributing
|
||||
|
||||
Follow the architecture and governance principles in `goose-core`. All changes must:
|
||||
- Adhere to agent policy (read-only, advisory only)
|
||||
- Conform to shared terminology in goose-core
|
||||
- Include appropriate documentation
|
||||
- Pass tests and lint checks
|
||||
|
||||
## License
|
||||
|
||||
See LICENSE file
|
||||
|
||||
## Support
|
||||
|
||||
For issues or questions:
|
||||
1. Check [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md)
|
||||
2. Review [AGENT_IMPLEMENTATION.md](AGENT_IMPLEMENTATION.md)
|
||||
3. See API docs at http://localhost:8000/docs
|
||||
4. Check backend logs for errors
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Docker and Docker Compose
|
||||
- Python 3.11+ (for local development)
|
||||
- Node.js 18+ (for local development)
|
||||
|
||||
### Quick Start with Docker
|
||||
|
||||
1. Clone the repository:
|
||||
```bash
|
||||
git clone https://github.com/mblanke/ThreatHunt.git
|
||||
cd ThreatHunt
|
||||
```
|
||||
|
||||
2. Start all services:
|
||||
```bash
|
||||
docker-compose up -d
|
||||
```
|
||||
|
||||
3. Access the application:
|
||||
- Frontend: http://localhost:3000
|
||||
- Backend API: http://localhost:8000
|
||||
- API Documentation: http://localhost:8000/docs
|
||||
|
||||
### Local Development
|
||||
|
||||
#### Backend
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
python -m venv venv
|
||||
source venv/bin/activate # On Windows: venv\Scripts\activate
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Set up environment variables
|
||||
cp .env.example .env
|
||||
# Edit .env with your settings
|
||||
|
||||
# Run migrations
|
||||
alembic upgrade head
|
||||
|
||||
# Start development server
|
||||
uvicorn app.main:app --reload
|
||||
```
|
||||
|
||||
#### Frontend
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
npm install
|
||||
npm start
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Authentication
|
||||
- `POST /api/auth/register` - Register a new user
|
||||
- `POST /api/auth/login` - Login and receive JWT token
|
||||
- `GET /api/auth/me` - Get current user profile
|
||||
- `PUT /api/auth/me` - Update current user profile
|
||||
|
||||
### User Management (Admin only)
|
||||
- `GET /api/users` - List all users in tenant
|
||||
- `GET /api/users/{user_id}` - Get user by ID
|
||||
- `PUT /api/users/{user_id}` - Update user
|
||||
- `DELETE /api/users/{user_id}` - Deactivate user
|
||||
|
||||
### Tenants
|
||||
- `GET /api/tenants` - List tenants
|
||||
- `POST /api/tenants` - Create tenant (admin)
|
||||
- `GET /api/tenants/{tenant_id}` - Get tenant by ID
|
||||
|
||||
### Hosts
|
||||
- `GET /api/hosts` - List hosts (scoped to tenant)
|
||||
- `POST /api/hosts` - Create host
|
||||
- `GET /api/hosts/{host_id}` - Get host by ID
|
||||
|
||||
### Ingestion
- `POST /api/ingestion/ingest` - Upload and parse CSV files exported from Velociraptor (example client call below)

### VirusTotal
- `POST /api/vt/lookup` - Look up a hash in VirusTotal
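A hedged example of calling the ingestion endpoint from Python; the multipart field name (`file`) and the response shape are assumptions, so check the OpenAPI docs at `/docs`:

```python
# Hypothetical client call for POST /api/ingestion/ingest; form field names are assumed.
import requests

API = "http://localhost:8000"
token = "<JWT from /api/auth/login>"

with open("FileList.csv", "rb") as fh:
    resp = requests.post(
        f"{API}/api/ingestion/ingest",
        headers={"Authorization": f"Bearer {token}"},
        files={"file": ("FileList.csv", fh, "text/csv")},
        timeout=60,
    )
resp.raise_for_status()
print(resp.json())  # parsed dataset metadata; exact shape depends on the backend schema
```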
## Authentication Flow

1. User registers or logs in via `/api/auth/login`
2. Backend returns a JWT containing user_id, tenant_id, and role
3. Frontend stores the token in localStorage
4. All subsequent API requests include the token in the Authorization header
5. Backend validates the token and enforces tenant scoping (sketched below)
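Step 5 is typically implemented as a dependency that decodes the JWT and hands the tenant claim to route handlers. A minimal sketch assuming HS256 tokens with `user_id`, `tenant_id`, and `role` claims; this is illustrative, not the repository's actual dependency:

```python
# Illustrative sketch of token validation + tenant scoping (not the repo's actual dependency).
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
import jwt  # PyJWT

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/auth/login")
SECRET_KEY = "read-from-settings-not-hardcoded"  # e.g. TH_JWT_SECRET


def current_claims(token: str = Depends(oauth2_scheme)) -> dict:
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")


def tenant_id(claims: dict = Depends(current_claims)) -> str:
    # Every query should filter on this value so data stays scoped to the caller's tenant.
    return claims["tenant_id"]
```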
## Multi-Tenancy
|
||||
|
||||
- All data is scoped to tenant_id
|
||||
- Users can only access data within their tenant
|
||||
- Admin users have elevated permissions within their tenant
|
||||
- Cross-tenant access requires explicit permissions
|
||||
|
||||
## Database Migrations
|
||||
|
||||
Create a new migration:
|
||||
```bash
|
||||
cd backend
|
||||
alembic revision --autogenerate -m "Description of changes"
|
||||
```
|
||||
|
||||
Apply migrations:
|
||||
```bash
|
||||
alembic upgrade head
|
||||
```
|
||||
|
||||
Rollback migrations:
|
||||
```bash
|
||||
alembic downgrade -1
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
### Backend
|
||||
- `DATABASE_URL` - PostgreSQL connection string
|
||||
- `SECRET_KEY` - Secret key for JWT signing (min 32 characters)
|
||||
- `ACCESS_TOKEN_EXPIRE_MINUTES` - JWT token expiration time (default: 30)
|
||||
- `VT_API_KEY` - VirusTotal API key for hash lookups
|
||||
|
||||
### Frontend
|
||||
- `REACT_APP_API_URL` - Backend API URL (default: http://localhost:8000)
|
||||
|
||||
## Security
|
||||
|
||||
- Passwords are hashed using bcrypt
|
||||
- JWT tokens include expiration time
|
||||
- All API endpoints (except login/register) require authentication
|
||||
- Role-based access control for admin operations
|
||||
- Data isolation through tenant scoping
|
||||
|
||||
## Testing
|
||||
|
||||
### Backend
|
||||
```bash
|
||||
cd backend
|
||||
pytest
|
||||
```
|
||||
|
||||
### Frontend
|
||||
```bash
|
||||
cd frontend
|
||||
npm test
|
||||
```
|
||||
|
||||
## Contributing
|
||||
|
||||
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## License

MIT License
|
||||
|
||||
## Support
|
||||
|
||||
For issues and questions, please open an issue on GitHub.
|
||||
|
||||
21
SKILLS/00-operating-model.md
Normal file
21
SKILLS/00-operating-model.md
Normal file
@@ -0,0 +1,21 @@
|
||||
|
||||
# Operating Model
|
||||
|
||||
## Default cadence
|
||||
- Prefer iterative progress over big bangs.
|
||||
- Keep diffs small: target ≤ 300 changed lines per PR unless justified.
|
||||
- Update tests/docs as part of the same change when possible.
|
||||
|
||||
## Working agreement
|
||||
- Start with a PLAN for non-trivial tasks.
|
||||
- Implement the smallest slice that satisfies acceptance criteria.
|
||||
- Verify via DoD.
|
||||
- Write a crisp PR summary: what changed, why, and how verified.
|
||||
|
||||
## Stop conditions (plan first)
|
||||
Stop and produce a PLAN (do not code yet) if:
|
||||
- scope is unclear
|
||||
- more than 3 files will change
|
||||
- data model changes
|
||||
- auth/security boundaries
|
||||
- performance-critical paths
|
||||
36
SKILLS/05-agent-taxonomy.md
Normal file
36
SKILLS/05-agent-taxonomy.md
Normal file
@@ -0,0 +1,36 @@
|
||||
# Agent Types & Roles (Practical Taxonomy)
|
||||
|
||||
Use this skill to choose the *right* kind of agent workflow for the job.
|
||||
|
||||
## Common agent "types" (in practice)
|
||||
|
||||
### 1) Chat assistant (no tools)
|
||||
Best for: explanations, brainstorming, small edits.
|
||||
Risk: can hallucinate; no grounding in repo state.
|
||||
|
||||
### 2) Tool-using single agent
|
||||
Best for: well-scoped tasks where the agent can read/write files and run commands.
|
||||
Key control: strict DoD gates + minimal permissions.
|
||||
|
||||
### 3) Planner + Executor (2-role pattern)
|
||||
Best for: medium complexity work (multi-file changes, feature work).
|
||||
Flow: Planner writes plan + acceptance criteria → Executor implements → Reviewer checks.
|
||||
|
||||
### 4) Multi-agent (specialists)
|
||||
Best for: bigger features with separable workstreams (UI, backend, docs, tests).
|
||||
Rule: isolate context per role; use separate branches/worktrees.
|
||||
|
||||
### 5) Supervisor / orchestrator
|
||||
Best for: long-running workflows with checkpoints (pipelines, report generation, PAD docs).
|
||||
Rule: supervisor delegates, enforces gates, and composes final output.
|
||||
|
||||
## Decision rules (fast)
|
||||
- If you can describe it in ≤ 5 steps → single tool-using agent.
|
||||
- If you need tradeoffs/design → Planner + Executor.
|
||||
- If UI + backend + docs/tests all move → multi-agent specialists.
|
||||
- If it's a pipeline that runs repeatedly → orchestrator.
|
||||
|
||||
## Guardrails (always)
|
||||
- DoD is the truth gate.
|
||||
- Separate branches/worktrees for parallel work.
|
||||
- Log decisions + commands in AGENT_LOG.md.
|
||||
24
SKILLS/10-definition-of-done.md
Normal file
24
SKILLS/10-definition-of-done.md
Normal file
@@ -0,0 +1,24 @@
|
||||
|
||||
# Definition of Done (DoD)
|
||||
|
||||
A change is "done" only when:
|
||||
|
||||
## Code correctness
|
||||
- Builds successfully (if applicable)
|
||||
- Tests pass
|
||||
- Linting/formatting passes
|
||||
- Types/checks pass (if applicable)
|
||||
|
||||
## Quality
|
||||
- No new warnings introduced
|
||||
- Edge cases handled (inputs validated, errors meaningful)
|
||||
- Hot paths not regressed (if applicable)
|
||||
|
||||
## Hygiene
|
||||
- No secrets committed
|
||||
- Docs updated if behavior or usage changed
|
||||
- PR summary includes verification steps
|
||||
|
||||
## Commands
|
||||
- macOS/Linux: `./scripts/dod.sh`
|
||||
- Windows: `.\scripts\dod.ps1`
|
||||
16
SKILLS/20-repo-map.md
Normal file
16
SKILLS/20-repo-map.md
Normal file
@@ -0,0 +1,16 @@
|
||||
|
||||
# Repo Mapping Skill
|
||||
|
||||
When entering a repo:
|
||||
1) Read README.md
|
||||
2) Identify entrypoints (app main / server startup / CLI)
|
||||
3) Identify config (env vars, .env.example, config files)
|
||||
4) Identify test/lint scripts (package.json, pyproject.toml, Makefile, etc.)
|
||||
5) Write a 10-line "repo map" in the PLAN before changing code
|
||||
|
||||
Output format:
|
||||
- Purpose:
|
||||
- Key modules:
|
||||
- Data flow:
|
||||
- Commands:
|
||||
- Risks:
|
||||
SKILLS/25-algorithms-performance.md (new file, 20 lines)

@@ -0,0 +1,20 @@
# Algorithms & Performance

Use this skill when performance matters (large inputs, hot paths, or repeated calls).

## Checklist
- Identify the **state** you're recomputing.
- Add **memoization / caching** when the same subproblem repeats (see the sketch after this list).
- Prefer **linear scans** + caches over nested loops when possible.
- If you can write it as a **recurrence**, you can test it.
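A minimal example of the memoization point: the recurrence below repeats subproblems when written naively, and a small cache makes it linear.

```python
# Minimal memoization example: cache repeated subproblems of a recurrence.
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_climb(n: int) -> int:
    """Number of ways to climb n steps taking 1 or 2 at a time (classic recurrence)."""
    if n <= 1:
        return 1
    return ways_to_climb(n - 1) + ways_to_climb(n - 2)


# Without the cache this is O(2^n); with it each n is computed once, so it is O(n).
assert ways_to_climb(30) == 1_346_269
```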
## Practical heuristics
- Measure first when possible (timing + input sizes).
- Optimize the biggest wins: avoid repeated I/O, repeated parsing, repeated network calls.
- Keep caches bounded (size/TTL) and invalidate safely.
- Choose data structures intentionally: dict/set for membership, heap for top-k, deque for queues.

## Review notes (for PRs)
- Call out accidental O(n²) patterns.
- Suggest table/DP or memoization when repeated work is obvious.
- Add tests that cover base cases + typical cases + worst-case size.
31
SKILLS/26-vibe-coding-fundamentals.md
Normal file
31
SKILLS/26-vibe-coding-fundamentals.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Vibe Coding With Fundamentals (Safety Rails)
|
||||
|
||||
Use this skill when you're using "vibe coding" (fast, conversational building) but want production-grade outcomes.
|
||||
|
||||
## The good
|
||||
- Rapid scaffolding and iteration
|
||||
- Fast UI prototypes
|
||||
- Quick exploration of architectures and options
|
||||
|
||||
## The failure mode
|
||||
- "It works on my machine" code with weak tests
|
||||
- Security foot-guns (auth, input validation, secrets)
|
||||
- Performance cliffs (accidental O(n²), repeated I/O)
|
||||
- Unmaintainable abstractions
|
||||
|
||||
## Safety rails (apply every time)
|
||||
- Always start with acceptance criteria (what "done" means).
|
||||
- Prefer small PRs; never dump a huge AI diff.
|
||||
- Require DoD gates (lint/test/build) before merge.
|
||||
- Write tests for behavior changes.
|
||||
- For anything security/data related: do a Reviewer pass.
|
||||
|
||||
## When to slow down
|
||||
- Auth/session/token work
|
||||
- Anything touching payments, PII, secrets
|
||||
- Data migrations/schema changes
|
||||
- Performance-critical paths
|
||||
- "It's flaky" or "it only fails in CI"
|
||||
|
||||
## Practical prompt pattern (use in PLAN)
|
||||
- "State assumptions, list files to touch, propose tests, and include rollback steps."
|
||||
31
SKILLS/27-performance-profiling.md
Normal file
31
SKILLS/27-performance-profiling.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Performance Profiling (Bun/Node)
|
||||
|
||||
Use this skill when:
|
||||
- a hot path feels slow
|
||||
- CPU usage is high
|
||||
- you suspect accidental O(n²) or repeated work
|
||||
- you need evidence before optimizing
|
||||
|
||||
## Bun CPU profiling
|
||||
Bun supports CPU profiling via `--cpu-prof` (generates a `.cpuprofile` you can open in Chrome DevTools).
|
||||
|
||||
Upcoming: `bun --cpu-prof-md <script>` outputs a CPU profile as **Markdown** so LLMs can read/grep it easily.
|
||||
|
||||
### Workflow (Bun)
|
||||
1) Run the workload with profiling enabled
|
||||
- Today: `bun --cpu-prof ./path/to/script.ts`
|
||||
- Upcoming: `bun --cpu-prof-md ./path/to/script.ts`
|
||||
2) Save the output (or `.cpuprofile`) into `./profiles/` with a timestamp.
|
||||
3) Ask the Reviewer agent to:
|
||||
- identify the top 5 hottest functions
|
||||
- propose the smallest fix
|
||||
- add a regression test or benchmark
|
||||
|
||||
## Node CPU profiling (fallback)
|
||||
- `node --cpu-prof ./script.js` writes a `.cpuprofile` file.
|
||||
- Open in Chrome DevTools → Performance → Load profile.
|
||||
|
||||
## Rules
|
||||
- Optimize based on measured hotspots, not vibes.
|
||||
- Prefer algorithmic wins (remove repeated work) over micro-optimizations.
|
||||
- Keep profiling artifacts out of git unless explicitly needed (use `.gitignore`).
|
||||
16
SKILLS/30-implementation-rules.md
Normal file
16
SKILLS/30-implementation-rules.md
Normal file
@@ -0,0 +1,16 @@
|
||||
|
||||
# Implementation Rules
|
||||
|
||||
## Change policy
|
||||
- Prefer edits over rewrites.
|
||||
- Keep changes localized.
|
||||
- One change = one purpose.
|
||||
- Avoid unnecessary abstraction.
|
||||
|
||||
## Dependency policy
|
||||
- Default: do not add dependencies.
|
||||
- If adding: explain why, alternatives considered, and impact.
|
||||
|
||||
## Error handling
|
||||
- Validate inputs at boundaries.
|
||||
- Error messages must be actionable: what failed + what to do next.
|
||||
14
SKILLS/40-testing-quality.md
Normal file
14
SKILLS/40-testing-quality.md
Normal file
@@ -0,0 +1,14 @@
|
||||
|
||||
# Testing & Quality
|
||||
|
||||
## Strategy
|
||||
- If behavior changes: add/update tests.
|
||||
- Unit tests for logic; integration tests for boundaries; E2E only where needed.
|
||||
|
||||
## Minimum for every PR
|
||||
- A test plan in the PR summary (even if "existing tests cover this").
|
||||
- Run DoD.
|
||||
|
||||
## Flaky tests
|
||||
- Capture repro steps.
|
||||
- Quarantine only with justification + follow-up issue.
|
||||
16
SKILLS/50-pr-review.md
Normal file
16
SKILLS/50-pr-review.md
Normal file
@@ -0,0 +1,16 @@
|
||||
|
||||
# PR Review Skill
|
||||
|
||||
Reviewer must check:
|
||||
- Correctness: does it do what it claims?
|
||||
- Safety: secrets, injection, auth boundaries
|
||||
- Maintainability: readability, naming, duplication
|
||||
- Tests: added/updated appropriately
|
||||
- DoD: did it pass?
|
||||
|
||||
Reviewer output format:
|
||||
1) Summary
|
||||
2) Must-fix
|
||||
3) Nice-to-have
|
||||
4) Risks
|
||||
5) Verification suggestions
|
||||
41
SKILLS/56-ui-material-ui.md
Normal file
41
SKILLS/56-ui-material-ui.md
Normal file
@@ -0,0 +1,41 @@
|
||||
# Material UI (MUI) Design System
|
||||
|
||||
Use this skill for any React/Next "portal/admin/dashboard" UI so you stay consistent and avoid random component soup.
|
||||
|
||||
## Standard choice
|
||||
- Preferred UI library: **MUI (Material UI)**.
|
||||
- Prefer MUI components over ad-hoc HTML/CSS unless there's a good reason.
|
||||
- One design system per repo (do not mix Chakra/Ant/Bootstrap/etc.).
|
||||
|
||||
## Setup (Next.js/React)
|
||||
- Install: `@mui/material @emotion/react @emotion/styled`
|
||||
- If using icons: `@mui/icons-material`
|
||||
- If using data grid: `@mui/x-data-grid` (or pro if licensed)
|
||||
|
||||
## Theming rules
|
||||
- Define a single theme (typography, spacing, palette) and reuse everywhere.
|
||||
- Use semantic colors (primary/secondary/error/warning/success/info), not hard-coded hex everywhere.
|
||||
- Prefer MUI's `sx` for small styling; use `styled()` for reusable components.
|
||||
|
||||
## "Portal" patterns (modals, popovers, menus)
|
||||
- Use MUI Dialog/Modal/Popover/Menu components instead of DIY portals.
|
||||
- Accessibility requirements:
|
||||
- Focus is trapped in Dialog/Modal.
|
||||
- Escape closes modal unless explicitly prevented.
|
||||
- All inputs have labels; buttons have clear text/aria-labels.
|
||||
- Keyboard navigation works end-to-end.
|
||||
|
||||
## Layout conventions (for portals)
|
||||
- Use: AppBar + Drawer (or NavigationRail equivalent) + main content.
|
||||
- Keep pages as composition of small components: Page → Sections → Widgets.
|
||||
- Keep forms consistent: FormControl + helper text + validation messages.
|
||||
|
||||
## Performance hygiene
|
||||
- Avoid re-render storms: memoize heavy lists; use virtualization for large tables (DataGrid).
|
||||
- Prefer server pagination for huge datasets.
|
||||
|
||||
## PR review checklist
|
||||
- Theme is used (no random styling).
|
||||
- Components are MUI where reasonable.
|
||||
- Modal/popover accessibility is correct.
|
||||
- No mixed UI libraries.
|
||||
15
SKILLS/60-security-safety.md
Normal file
15
SKILLS/60-security-safety.md
Normal file
@@ -0,0 +1,15 @@
|
||||
|
||||
# Security & Safety
|
||||
|
||||
## Secrets
|
||||
- Never output secrets or tokens.
|
||||
- Never log sensitive inputs.
|
||||
- Never commit credentials.
|
||||
|
||||
## Inputs
|
||||
- Validate external inputs at boundaries.
|
||||
- Fail closed for auth/security decisions.
|
||||
|
||||
## Tooling
|
||||
- No destructive commands unless requested and scoped.
|
||||
- Prefer read-only operations first.
|
||||
13
SKILLS/70-docs-artifacts.md
Normal file
13
SKILLS/70-docs-artifacts.md
Normal file
@@ -0,0 +1,13 @@
|
||||
|
||||
# Docs & Artifacts
|
||||
|
||||
Update documentation when:
|
||||
- setup steps change
|
||||
- env vars change
|
||||
- endpoints/CLI behavior changes
|
||||
- data formats change
|
||||
|
||||
Docs standards:
|
||||
- Provide copy/paste commands
|
||||
- Provide expected outputs where helpful
|
||||
- Keep it short and accurate
|
||||
SKILLS/80-mcp-tools.md (new file, 11 lines)

@@ -0,0 +1,11 @@
# MCP Tools Skill (Optional)

If this repo defines MCP servers/tools:

Rules:
- Tool calls must be explicit and logged.
- Maintain an allowlist of tools; deny by default (see the sketch after this list).
- Every tool must have: purpose, inputs/outputs schema, examples, and tests.
- Prefer idempotent tool operations.
- Never add tools that can exfiltrate secrets without strict guards.
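A tiny sketch of the allowlist rule; the tool names and dispatch function are placeholders, not part of this repo:

```python
# Deny-by-default tool dispatch: anything not explicitly allowlisted is rejected and logged.
import logging

ALLOWED_TOOLS = {"velociraptor_run_hunt", "github_list_prs"}  # placeholder names

log = logging.getLogger("mcp.tools")


def dispatch(tool_name: str, handler_registry: dict, **kwargs):
    if tool_name not in ALLOWED_TOOLS:
        log.warning("blocked tool call: %s", tool_name)
        raise PermissionError(f"Tool '{tool_name}' is not on the allowlist")
    log.info("tool call: %s args=%s", tool_name, sorted(kwargs))
    return handler_registry[tool_name](**kwargs)
```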
SKILLS/82-mcp-server-design.md (new file, 51 lines)

@@ -0,0 +1,51 @@
# MCP Server Design (Agent-First)

Build MCP servers like you're designing a UI for a non-human user.

This skill distills Phil Schmid's MCP server best practices into concrete repo rules.
Source: "MCP is Not the Problem, It's your Server" (Jan 21, 2026).

## 1) Outcomes, not operations
- Do **not** wrap REST endpoints 1:1 as tools.
- Expose high-level, outcome-oriented tools.
  - Bad: `get_user`, `list_orders`, `get_order_status`
  - Good: `track_latest_order(email)` (server orchestrates internally)

## 2) Flatten arguments
- Prefer top-level primitives + constrained enums.
- Avoid nested `dict`/config objects (agents hallucinate keys).
- Defaults reduce decision load.

## 3) Instructions are context
- Tool docstrings are *instructions*:
  - when to use the tool
  - argument formatting rules
  - what the return means
- Error strings are also context:
  - return actionable, self-correcting messages (not raw stack traces)

## 4) Curate ruthlessly
- Aim for **5–15 tools** per server.
- One server, one job. Split by persona if needed.
- Delete unused tools. Don't dump raw data into context.

## 5) Name tools for discovery
- Avoid generic names (`create_issue`).
- Prefer `{service}_{action}_{resource}`:
  - `velociraptor_run_hunt`
  - `github_list_prs`
  - `slack_send_message`

## 6) Paginate large results
- Always support `limit` (default ~20–50).
- Return metadata: `has_more`, `next_offset`, `total_count`.
- Never return hundreds of rows unbounded. (A paginated-tool sketch follows this list.)
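A minimal sketch of a tool result that follows rule 6; plain Python, not tied to a particular MCP SDK, with an assumed tool name:

```python
# Sketch of a bounded, paginated tool result (plain Python; adapt to your MCP framework).
def threathunt_list_findings(rows: list[dict], limit: int = 25, offset: int = 0) -> dict:
    """Return at most `limit` findings plus pagination metadata."""
    limit = max(1, min(limit, 50))          # keep results bounded (default ~20-50)
    page = rows[offset : offset + limit]
    return {
        "items": page,
        "total_count": len(rows),
        "has_more": offset + limit < len(rows),
        "next_offset": offset + limit if offset + limit < len(rows) else None,
    }
```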
## Repo conventions
- Put MCP tool specs in `mcp/` (schemas, examples, fixtures).
- Provide at least 1 "golden path" example call per tool.
- Add an eval that checks:
  - tool names follow discovery convention
  - args are flat + typed
  - responses are concise + stable
  - pagination works
40
SKILLS/83-fastmcp-3-patterns.md
Normal file
40
SKILLS/83-fastmcp-3-patterns.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# FastMCP 3 Patterns (Providers + Transforms)
|
||||
|
||||
Use this skill when you are building MCP servers in Python and want:
|
||||
- composable tool sets
|
||||
- per-user/per-session behavior
|
||||
- auth, versioning, observability, and long-running tasks
|
||||
|
||||
## Mental model (FastMCP 3)
|
||||
FastMCP 3 treats everything as three composable primitives:
|
||||
- **Components**: what you expose (tools, resources, prompts)
|
||||
- **Providers**: where components come from (decorators, files, OpenAPI, remote MCP, etc.)
|
||||
- **Transforms**: how you reshape what clients see (namespace, filters, auth, versioning, visibility)
|
||||
|
||||
## Recommended architecture for Marc's platform
|
||||
Build a **single "Cyber MCP Gateway"** that composes providers:
|
||||
- LocalProvider: core cyber tools (run hunt, parse triage, generate report)
|
||||
- OpenAPIProvider: wrap stable internal APIs (ticketing, asset DB) without 1:1 endpoint exposure
|
||||
- ProxyProvider/FastMCPProvider: mount sub-servers (e.g., Velociraptor tools, Intel feeds)
|
||||
|
||||
Then apply transforms:
|
||||
- Namespace per domain: `hunt.*`, `intel.*`, `pad.*`
|
||||
- Visibility per session: hide dangerous tools unless user/role allows
|
||||
- VersionFilter: keep old clients working while you evolve tools
|
||||
|
||||
## Production must-haves
|
||||
- **Tool timeouts**: never let a tool hang forever
|
||||
- **Pagination**: all list tools must be bounded
|
||||
- **Background tasks**: use for long hunts / ingest jobs
|
||||
- **Tracing**: emit OpenTelemetry traces so you can debug agent/tool behavior
|
||||
|
||||
## Auth rules
|
||||
- Prefer component-level auth for "dangerous" tools.
|
||||
- Default stance: read-only tools visible; write/execute tools gated.
|
||||
|
||||
## Versioning rules
|
||||
- Version your components when you change schemas or semantics.
|
||||
- Keep 1 previous version callable during migrations.
|
||||
|
||||
## Upgrade guidance
|
||||
FastMCP 3 is in beta; pin to v2 for stability in production until you've tested.
|
||||
@@ -1,9 +0,0 @@
|
||||
FLASK_ENV=development
|
||||
FLASK_DEBUG=True
|
||||
SECRET_KEY=development-secret-key-change-in-production
|
||||
MAX_CONTENT_LENGTH=104857600
|
||||
UPLOAD_FOLDER=uploaded
|
||||
OUTPUT_FOLDER=output
|
||||
VIRUSTOTAL_API_KEY=
|
||||
DATABASE_URL=sqlite:///threat_hunter.db
|
||||
REDIS_URL=redis://localhost:6379/0
|
||||
@@ -1,23 +0,0 @@
|
||||
FROM python:3.11-slim
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# Install system dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
gcc \
|
||||
postgresql-client \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Copy requirements first for better caching
|
||||
COPY requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
# Copy application code
|
||||
COPY . .
|
||||
|
||||
# Create directories
|
||||
RUN mkdir -p uploads output
|
||||
|
||||
EXPOSE 5000
|
||||
|
||||
CMD ["python", "app.py"]
|
||||
@@ -1,24 +0,0 @@
|
||||
FROM python:3.11-slim
|
||||
|
||||
WORKDIR /app
|
||||
|
||||
# Install system dependencies
|
||||
RUN apt-get update && apt-get install -y \
|
||||
gcc \
|
||||
postgresql-client \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Create non-root user
|
||||
RUN useradd --create-home --shell /bin/bash app
|
||||
|
||||
COPY requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
COPY . .
|
||||
RUN chown -R app:app /app
|
||||
|
||||
USER app
|
||||
|
||||
EXPOSE 5000
|
||||
|
||||
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
|
||||
149
backend/alembic.ini
Normal file
149
backend/alembic.ini
Normal file
@@ -0,0 +1,149 @@
|
||||
# A generic, single database configuration.
|
||||
|
||||
[alembic]
|
||||
# path to migration scripts.
|
||||
# this is typically a path given in POSIX (e.g. forward slashes)
|
||||
# format, relative to the token %(here)s which refers to the location of this
|
||||
# ini file
|
||||
script_location = %(here)s/alembic
|
||||
|
||||
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
|
||||
# Uncomment the line below if you want the files to be prepended with date and time
|
||||
# see https://alembic.sqlalchemy.org/en/latest/tutorial.html#editing-the-ini-file
|
||||
# for all available tokens
|
||||
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
|
||||
# Or organize into date-based subdirectories (requires recursive_version_locations = true)
|
||||
# file_template = %%(year)d/%%(month).2d/%%(day).2d_%%(hour).2d%%(minute).2d_%%(second).2d_%%(rev)s_%%(slug)s
|
||||
|
||||
# sys.path path, will be prepended to sys.path if present.
|
||||
# defaults to the current working directory. for multiple paths, the path separator
|
||||
# is defined by "path_separator" below.
|
||||
prepend_sys_path = .
|
||||
|
||||
|
||||
# timezone to use when rendering the date within the migration file
|
||||
# as well as the filename.
|
||||
# If specified, requires the tzdata library which can be installed by adding
|
||||
# `alembic[tz]` to the pip requirements.
|
||||
# string value is passed to ZoneInfo()
|
||||
# leave blank for localtime
|
||||
# timezone =
|
||||
|
||||
# max length of characters to apply to the "slug" field
|
||||
# truncate_slug_length = 40
|
||||
|
||||
# set to 'true' to run the environment during
|
||||
# the 'revision' command, regardless of autogenerate
|
||||
# revision_environment = false
|
||||
|
||||
# set to 'true' to allow .pyc and .pyo files without
|
||||
# a source .py file to be detected as revisions in the
|
||||
# versions/ directory
|
||||
# sourceless = false
|
||||
|
||||
# version location specification; This defaults
|
||||
# to <script_location>/versions. When using multiple version
|
||||
# directories, initial revisions must be specified with --version-path.
|
||||
# The path separator used here should be the separator specified by "path_separator"
|
||||
# below.
|
||||
# version_locations = %(here)s/bar:%(here)s/bat:%(here)s/alembic/versions
|
||||
|
||||
# path_separator; This indicates what character is used to split lists of file
|
||||
# paths, including version_locations and prepend_sys_path within configparser
|
||||
# files such as alembic.ini.
|
||||
# The default rendered in new alembic.ini files is "os", which uses os.pathsep
|
||||
# to provide os-dependent path splitting.
|
||||
#
|
||||
# Note that in order to support legacy alembic.ini files, this default does NOT
|
||||
# take place if path_separator is not present in alembic.ini. If this
|
||||
# option is omitted entirely, fallback logic is as follows:
|
||||
#
|
||||
# 1. Parsing of the version_locations option falls back to using the legacy
|
||||
# "version_path_separator" key, which if absent then falls back to the legacy
|
||||
# behavior of splitting on spaces and/or commas.
|
||||
# 2. Parsing of the prepend_sys_path option falls back to the legacy
|
||||
# behavior of splitting on spaces, commas, or colons.
|
||||
#
|
||||
# Valid values for path_separator are:
|
||||
#
|
||||
# path_separator = :
|
||||
# path_separator = ;
|
||||
# path_separator = space
|
||||
# path_separator = newline
|
||||
#
|
||||
# Use os.pathsep. Default configuration used for new projects.
|
||||
path_separator = os
|
||||
|
||||
# set to 'true' to search source files recursively
|
||||
# in each "version_locations" directory
|
||||
# new in Alembic version 1.10
|
||||
# recursive_version_locations = false
|
||||
|
||||
# the output encoding used when revision files
|
||||
# are written from script.py.mako
|
||||
# output_encoding = utf-8
|
||||
|
||||
# database URL. This is consumed by the user-maintained env.py script only.
|
||||
# other means of configuring database URLs may be customized within the env.py
|
||||
# file.
|
||||
sqlalchemy.url = sqlite+aiosqlite:///./threathunt.db
|
||||
|
||||
|
||||
[post_write_hooks]
|
||||
# post_write_hooks defines scripts or Python functions that are run
|
||||
# on newly generated revision scripts. See the documentation for further
|
||||
# detail and examples
|
||||
|
||||
# format using "black" - use the console_scripts runner, against the "black" entrypoint
|
||||
# hooks = black
|
||||
# black.type = console_scripts
|
||||
# black.entrypoint = black
|
||||
# black.options = -l 79 REVISION_SCRIPT_FILENAME
|
||||
|
||||
# lint with attempts to fix using "ruff" - use the module runner, against the "ruff" module
|
||||
# hooks = ruff
|
||||
# ruff.type = module
|
||||
# ruff.module = ruff
|
||||
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
|
||||
|
||||
# Alternatively, use the exec runner to execute a binary found on your PATH
|
||||
# hooks = ruff
|
||||
# ruff.type = exec
|
||||
# ruff.executable = ruff
|
||||
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
|
||||
|
||||
# Logging configuration. This is also consumed by the user-maintained
|
||||
# env.py script only.
|
||||
[loggers]
|
||||
keys = root,sqlalchemy,alembic
|
||||
|
||||
[handlers]
|
||||
keys = console
|
||||
|
||||
[formatters]
|
||||
keys = generic
|
||||
|
||||
[logger_root]
|
||||
level = WARNING
|
||||
handlers = console
|
||||
qualname =
|
||||
|
||||
[logger_sqlalchemy]
|
||||
level = WARNING
|
||||
handlers =
|
||||
qualname = sqlalchemy.engine
|
||||
|
||||
[logger_alembic]
|
||||
level = INFO
|
||||
handlers =
|
||||
qualname = alembic
|
||||
|
||||
[handler_console]
|
||||
class = StreamHandler
|
||||
args = (sys.stderr,)
|
||||
level = NOTSET
|
||||
formatter = generic
|
||||
|
||||
[formatter_generic]
|
||||
format = %(levelname)-5.5s [%(name)s] %(message)s
|
||||
datefmt = %H:%M:%S
|
||||
1
backend/alembic/README
Normal file
1
backend/alembic/README
Normal file
@@ -0,0 +1 @@
|
||||
Generic single-database configuration.
|
||||
67
backend/alembic/env.py
Normal file
67
backend/alembic/env.py
Normal file
@@ -0,0 +1,67 @@
|
||||
"""Alembic async env — autogenerate from app.db.models."""
|
||||
|
||||
import asyncio
|
||||
from logging.config import fileConfig
|
||||
|
||||
from sqlalchemy import pool
|
||||
from sqlalchemy.ext.asyncio import async_engine_from_config
|
||||
|
||||
from alembic import context
|
||||
|
||||
# Alembic Config
|
||||
config = context.config
|
||||
|
||||
if config.config_file_name is not None:
|
||||
fileConfig(config.config_file_name)
|
||||
|
||||
# Import all models so autogenerate sees them
|
||||
from app.db.engine import Base # noqa: E402
|
||||
from app.db import models as _models # noqa: E402, F401
|
||||
|
||||
target_metadata = Base.metadata
|
||||
|
||||
|
||||
def run_migrations_offline() -> None:
|
||||
"""Run migrations in 'offline' mode."""
|
||||
url = config.get_main_option("sqlalchemy.url")
|
||||
context.configure(
|
||||
url=url,
|
||||
target_metadata=target_metadata,
|
||||
literal_binds=True,
|
||||
dialect_opts={"paramstyle": "named"},
|
||||
render_as_batch=True, # required for SQLite ALTER TABLE
|
||||
)
|
||||
with context.begin_transaction():
|
||||
context.run_migrations()
|
||||
|
||||
|
||||
def do_run_migrations(connection):
|
||||
context.configure(
|
||||
connection=connection,
|
||||
target_metadata=target_metadata,
|
||||
render_as_batch=True,
|
||||
)
|
||||
with context.begin_transaction():
|
||||
context.run_migrations()
|
||||
|
||||
|
||||
async def run_async_migrations() -> None:
|
||||
"""Run migrations in 'online' mode with an async engine."""
|
||||
connectable = async_engine_from_config(
|
||||
config.get_section(config.config_ini_section, {}),
|
||||
prefix="sqlalchemy.",
|
||||
poolclass=pool.NullPool,
|
||||
)
|
||||
async with connectable.connect() as connection:
|
||||
await connection.run_sync(do_run_migrations)
|
||||
await connectable.dispose()
|
||||
|
||||
|
||||
def run_migrations_online() -> None:
|
||||
asyncio.run(run_async_migrations())
|
||||
|
||||
|
||||
if context.is_offline_mode():
|
||||
run_migrations_offline()
|
||||
else:
|
||||
run_migrations_online()
|
||||
28
backend/alembic/script.py.mako
Normal file
28
backend/alembic/script.py.mako
Normal file
@@ -0,0 +1,28 @@
|
||||
"""${message}
|
||||
|
||||
Revision ID: ${up_revision}
|
||||
Revises: ${down_revision | comma,n}
|
||||
Create Date: ${create_date}
|
||||
|
||||
"""
|
||||
from typing import Sequence, Union
|
||||
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
${imports if imports else ""}
|
||||
|
||||
# revision identifiers, used by Alembic.
|
||||
revision: str = ${repr(up_revision)}
|
||||
down_revision: Union[str, Sequence[str], None] = ${repr(down_revision)}
|
||||
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
|
||||
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
"""Upgrade schema."""
|
||||
${upgrades if upgrades else "pass"}
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
"""Downgrade schema."""
|
||||
${downgrades if downgrades else "pass"}
|
||||
210
backend/alembic/versions/9790f482da06_initial_schema.py
Normal file
210
backend/alembic/versions/9790f482da06_initial_schema.py
Normal file
@@ -0,0 +1,210 @@
|
||||
"""initial schema
|
||||
|
||||
Revision ID: 9790f482da06
|
||||
Revises:
|
||||
Create Date: 2026-02-19 11:40:02.108830
|
||||
|
||||
"""
|
||||
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '9790f482da06'
down_revision: Union[str, Sequence[str], None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('users',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('username', sa.String(length=64), nullable=False),
        sa.Column('email', sa.String(length=256), nullable=False),
        sa.Column('hashed_password', sa.String(length=256), nullable=False),
        sa.Column('role', sa.String(length=16), nullable=False),
        sa.Column('is_active', sa.Boolean(), nullable=False),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('email')
    )
    with op.batch_alter_table('users', schema=None) as batch_op:
        batch_op.create_index(batch_op.f('ix_users_username'), ['username'], unique=True)

    op.create_table('hunts',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('name', sa.String(length=256), nullable=False),
        sa.Column('description', sa.Text(), nullable=True),
        sa.Column('status', sa.String(length=32), nullable=False),
        sa.Column('owner_id', sa.String(length=32), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['owner_id'], ['users.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    op.create_table('datasets',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('name', sa.String(length=256), nullable=False),
        sa.Column('filename', sa.String(length=512), nullable=False),
        sa.Column('source_tool', sa.String(length=64), nullable=True),
        sa.Column('row_count', sa.Integer(), nullable=False),
        sa.Column('column_schema', sa.JSON(), nullable=True),
        sa.Column('normalized_columns', sa.JSON(), nullable=True),
        sa.Column('ioc_columns', sa.JSON(), nullable=True),
        sa.Column('file_size_bytes', sa.Integer(), nullable=False),
        sa.Column('encoding', sa.String(length=32), nullable=True),
        sa.Column('delimiter', sa.String(length=4), nullable=True),
        sa.Column('time_range_start', sa.DateTime(timezone=True), nullable=True),
        sa.Column('time_range_end', sa.DateTime(timezone=True), nullable=True),
        sa.Column('hunt_id', sa.String(length=32), nullable=True),
        sa.Column('uploaded_by', sa.String(length=32), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['hunt_id'], ['hunts.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('datasets', schema=None) as batch_op:
        batch_op.create_index('ix_datasets_hunt', ['hunt_id'], unique=False)
        batch_op.create_index(batch_op.f('ix_datasets_name'), ['name'], unique=False)

    op.create_table('hypotheses',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('hunt_id', sa.String(length=32), nullable=True),
        sa.Column('title', sa.String(length=256), nullable=False),
        sa.Column('description', sa.Text(), nullable=True),
        sa.Column('mitre_technique', sa.String(length=32), nullable=True),
        sa.Column('status', sa.String(length=16), nullable=False),
        sa.Column('evidence_row_ids', sa.JSON(), nullable=True),
        sa.Column('evidence_notes', sa.Text(), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['hunt_id'], ['hunts.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('hypotheses', schema=None) as batch_op:
        batch_op.create_index('ix_hypotheses_hunt', ['hunt_id'], unique=False)

    op.create_table('conversations',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('title', sa.String(length=256), nullable=True),
        sa.Column('hunt_id', sa.String(length=32), nullable=True),
        sa.Column('dataset_id', sa.String(length=32), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['dataset_id'], ['datasets.id'], ),
        sa.ForeignKeyConstraint(['hunt_id'], ['hunts.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    op.create_table('dataset_rows',
        sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
        sa.Column('dataset_id', sa.String(length=32), nullable=False),
        sa.Column('row_index', sa.Integer(), nullable=False),
        sa.Column('data', sa.JSON(), nullable=False),
        sa.Column('normalized_data', sa.JSON(), nullable=True),
        sa.ForeignKeyConstraint(['dataset_id'], ['datasets.id'], ondelete='CASCADE'),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('dataset_rows', schema=None) as batch_op:
        batch_op.create_index('ix_dataset_rows_dataset', ['dataset_id'], unique=False)
        batch_op.create_index('ix_dataset_rows_dataset_idx', ['dataset_id', 'row_index'], unique=False)

    op.create_table('enrichment_results',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('ioc_value', sa.String(length=512), nullable=False),
        sa.Column('ioc_type', sa.String(length=32), nullable=False),
        sa.Column('source', sa.String(length=32), nullable=False),
        sa.Column('verdict', sa.String(length=16), nullable=True),
        sa.Column('confidence', sa.Float(), nullable=True),
        sa.Column('raw_result', sa.JSON(), nullable=True),
        sa.Column('summary', sa.Text(), nullable=True),
        sa.Column('dataset_id', sa.String(length=32), nullable=True),
        sa.Column('cached_at', sa.DateTime(timezone=True), nullable=False),
        sa.Column('expires_at', sa.DateTime(timezone=True), nullable=True),
        sa.ForeignKeyConstraint(['dataset_id'], ['datasets.id'], ),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('enrichment_results', schema=None) as batch_op:
        batch_op.create_index('ix_enrichment_ioc_source', ['ioc_value', 'source'], unique=False)
        batch_op.create_index(batch_op.f('ix_enrichment_results_ioc_value'), ['ioc_value'], unique=False)

    op.create_table('annotations',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('row_id', sa.Integer(), nullable=True),
        sa.Column('dataset_id', sa.String(length=32), nullable=True),
        sa.Column('author_id', sa.String(length=32), nullable=True),
        sa.Column('text', sa.Text(), nullable=False),
        sa.Column('severity', sa.String(length=16), nullable=False),
        sa.Column('tag', sa.String(length=32), nullable=True),
        sa.Column('highlight_color', sa.String(length=16), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.Column('updated_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['author_id'], ['users.id'], ),
        sa.ForeignKeyConstraint(['dataset_id'], ['datasets.id'], ),
        sa.ForeignKeyConstraint(['row_id'], ['dataset_rows.id'], ondelete='SET NULL'),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('annotations', schema=None) as batch_op:
        batch_op.create_index('ix_annotations_dataset', ['dataset_id'], unique=False)
        batch_op.create_index('ix_annotations_row', ['row_id'], unique=False)

    op.create_table('messages',
        sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
        sa.Column('conversation_id', sa.String(length=32), nullable=False),
        sa.Column('role', sa.String(length=16), nullable=False),
        sa.Column('content', sa.Text(), nullable=False),
        sa.Column('model_used', sa.String(length=128), nullable=True),
        sa.Column('node_used', sa.String(length=64), nullable=True),
        sa.Column('token_count', sa.Integer(), nullable=True),
        sa.Column('latency_ms', sa.Integer(), nullable=True),
        sa.Column('response_meta', sa.JSON(), nullable=True),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['conversation_id'], ['conversations.id'], ondelete='CASCADE'),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('messages', schema=None) as batch_op:
        batch_op.create_index('ix_messages_conversation', ['conversation_id'], unique=False)

    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('messages', schema=None) as batch_op:
        batch_op.drop_index('ix_messages_conversation')

    op.drop_table('messages')
    with op.batch_alter_table('annotations', schema=None) as batch_op:
        batch_op.drop_index('ix_annotations_row')
        batch_op.drop_index('ix_annotations_dataset')

    op.drop_table('annotations')
    with op.batch_alter_table('enrichment_results', schema=None) as batch_op:
        batch_op.drop_index(batch_op.f('ix_enrichment_results_ioc_value'))
        batch_op.drop_index('ix_enrichment_ioc_source')

    op.drop_table('enrichment_results')
    with op.batch_alter_table('dataset_rows', schema=None) as batch_op:
        batch_op.drop_index('ix_dataset_rows_dataset_idx')
        batch_op.drop_index('ix_dataset_rows_dataset')

    op.drop_table('dataset_rows')
    op.drop_table('conversations')
    with op.batch_alter_table('hypotheses', schema=None) as batch_op:
        batch_op.drop_index('ix_hypotheses_hunt')

    op.drop_table('hypotheses')
    with op.batch_alter_table('datasets', schema=None) as batch_op:
        batch_op.drop_index(batch_op.f('ix_datasets_name'))
        batch_op.drop_index('ix_datasets_hunt')

    op.drop_table('datasets')
    op.drop_table('hunts')
    with op.batch_alter_table('users', schema=None) as batch_op:
        batch_op.drop_index(batch_op.f('ix_users_username'))

    op.drop_table('users')
    # ### end Alembic commands ###
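The revision above is normally applied with the Alembic CLI (`alembic upgrade head`, rolled back with `alembic downgrade -1`). A minimal sketch of the same thing through the Python API, assuming the usual alembic.ini sits at the backend root:

from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")           # assumed config location
command.upgrade(cfg, "9790f482da06")  # runs upgrade() above
command.downgrade(cfg, "base")        # runs downgrade(), dropping every table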
@@ -0,0 +1,64 @@
"""add_keyword_themes_and_keywords_tables

Revision ID: 98ab619418bc
Revises: 9790f482da06
Create Date: 2026-02-19 12:01:38.174653

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision: str = '98ab619418bc'
down_revision: Union[str, Sequence[str], None] = '9790f482da06'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    """Upgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('keyword_themes',
        sa.Column('id', sa.String(length=32), nullable=False),
        sa.Column('name', sa.String(length=128), nullable=False),
        sa.Column('color', sa.String(length=16), nullable=False),
        sa.Column('enabled', sa.Boolean(), nullable=False),
        sa.Column('is_builtin', sa.Boolean(), nullable=False),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('keyword_themes', schema=None) as batch_op:
        batch_op.create_index(batch_op.f('ix_keyword_themes_name'), ['name'], unique=True)

    op.create_table('keywords',
        sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
        sa.Column('theme_id', sa.String(length=32), nullable=False),
        sa.Column('value', sa.String(length=256), nullable=False),
        sa.Column('is_regex', sa.Boolean(), nullable=False),
        sa.Column('created_at', sa.DateTime(timezone=True), nullable=False),
        sa.ForeignKeyConstraint(['theme_id'], ['keyword_themes.id'], ondelete='CASCADE'),
        sa.PrimaryKeyConstraint('id')
    )
    with op.batch_alter_table('keywords', schema=None) as batch_op:
        batch_op.create_index('ix_keywords_theme', ['theme_id'], unique=False)
        batch_op.create_index('ix_keywords_value', ['value'], unique=False)

    # ### end Alembic commands ###


def downgrade() -> None:
    """Downgrade schema."""
    # ### commands auto generated by Alembic - please adjust! ###
    with op.batch_alter_table('keywords', schema=None) as batch_op:
        batch_op.drop_index('ix_keywords_value')
        batch_op.drop_index('ix_keywords_theme')

    op.drop_table('keywords')
    with op.batch_alter_table('keyword_themes', schema=None) as batch_op:
        batch_op.drop_index(batch_op.f('ix_keyword_themes_name'))

    op.drop_table('keyword_themes')
    # ### end Alembic commands ###
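The ondelete='CASCADE' on keywords.theme_id means deleting a theme removes its keywords in the same statement. A self-contained sketch of that behaviour with plain sqlite3 and abbreviated table shapes (illustration only, not repo code):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE keyword_themes (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE keywords (id INTEGER PRIMARY KEY, theme_id TEXT "
    "REFERENCES keyword_themes(id) ON DELETE CASCADE, value TEXT)"
)
conn.execute("INSERT INTO keyword_themes VALUES ('t1', 'ransomware')")
conn.execute("INSERT INTO keywords (theme_id, value) VALUES ('t1', 'lockbit')")
conn.execute("DELETE FROM keyword_themes WHERE id = 't1'")
# The dependent keyword row is gone as well:
assert conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0] == 0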
@@ -0,0 +1,112 @@
"""add processing_status and AI analysis tables

Revision ID: a1b2c3d4e5f6
Revises: 98ab619418bc
Create Date: 2026-02-19 18:00:00.000000

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa


revision: str = "a1b2c3d4e5f6"
down_revision: Union[str, Sequence[str], None] = "98ab619418bc"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # Add columns to datasets table
    with op.batch_alter_table("datasets") as batch_op:
        batch_op.add_column(sa.Column("processing_status", sa.String(20), server_default="ready"))
        batch_op.add_column(sa.Column("artifact_type", sa.String(128), nullable=True))
        batch_op.add_column(sa.Column("error_message", sa.Text(), nullable=True))
        batch_op.add_column(sa.Column("file_path", sa.String(512), nullable=True))
        batch_op.create_index("ix_datasets_status", ["processing_status"])

    # Create triage_results table
    op.create_table(
        "triage_results",
        sa.Column("id", sa.String(32), primary_key=True),
        sa.Column("dataset_id", sa.String(32), sa.ForeignKey("datasets.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("row_start", sa.Integer(), nullable=False),
        sa.Column("row_end", sa.Integer(), nullable=False),
        sa.Column("risk_score", sa.Float(), nullable=False, server_default="0.0"),
        sa.Column("verdict", sa.String(20), nullable=False, server_default="pending"),
        sa.Column("findings", sa.JSON(), nullable=True),
        sa.Column("suspicious_indicators", sa.JSON(), nullable=True),
        sa.Column("mitre_techniques", sa.JSON(), nullable=True),
        sa.Column("model_used", sa.String(128), nullable=True),
        sa.Column("node_used", sa.String(64), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    # Create host_profiles table
    op.create_table(
        "host_profiles",
        sa.Column("id", sa.String(32), primary_key=True),
        sa.Column("hunt_id", sa.String(32), sa.ForeignKey("hunts.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("hostname", sa.String(256), nullable=False),
        sa.Column("fqdn", sa.String(512), nullable=True),
        sa.Column("client_id", sa.String(64), nullable=True),
        sa.Column("risk_score", sa.Float(), nullable=False, server_default="0.0"),
        sa.Column("risk_level", sa.String(20), nullable=False, server_default="unknown"),
        sa.Column("artifact_summary", sa.JSON(), nullable=True),
        sa.Column("timeline_summary", sa.Text(), nullable=True),
        sa.Column("suspicious_findings", sa.JSON(), nullable=True),
        sa.Column("mitre_techniques", sa.JSON(), nullable=True),
        sa.Column("llm_analysis", sa.Text(), nullable=True),
        sa.Column("model_used", sa.String(128), nullable=True),
        sa.Column("node_used", sa.String(64), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    # Create hunt_reports table
    op.create_table(
        "hunt_reports",
        sa.Column("id", sa.String(32), primary_key=True),
        sa.Column("hunt_id", sa.String(32), sa.ForeignKey("hunts.id", ondelete="CASCADE"), nullable=False, index=True),
        sa.Column("status", sa.String(20), nullable=False, server_default="pending"),
        sa.Column("exec_summary", sa.Text(), nullable=True),
        sa.Column("full_report", sa.Text(), nullable=True),
        sa.Column("findings", sa.JSON(), nullable=True),
        sa.Column("recommendations", sa.JSON(), nullable=True),
        sa.Column("mitre_mapping", sa.JSON(), nullable=True),
        sa.Column("ioc_table", sa.JSON(), nullable=True),
        sa.Column("host_risk_summary", sa.JSON(), nullable=True),
        sa.Column("models_used", sa.JSON(), nullable=True),
        sa.Column("generation_time_ms", sa.Integer(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
        sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )

    # Create anomaly_results table
    op.create_table(
        "anomaly_results",
        sa.Column("id", sa.String(32), primary_key=True),
        sa.Column("dataset_id", sa.String(32), sa.ForeignKey("datasets.id", ondelete="CASCADE"), nullable=False, index=True),
        # dataset_rows.id is an Integer primary key (see the initial migration),
        # so the FK column type must be Integer as well
        sa.Column("row_id", sa.Integer(), sa.ForeignKey("dataset_rows.id", ondelete="CASCADE"), nullable=True),
        sa.Column("anomaly_score", sa.Float(), nullable=False, server_default="0.0"),
        sa.Column("distance_from_centroid", sa.Float(), nullable=True),
        sa.Column("cluster_id", sa.Integer(), nullable=True),
        sa.Column("is_outlier", sa.Boolean(), nullable=False, server_default="0"),
        sa.Column("explanation", sa.Text(), nullable=True),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now()),
    )


def downgrade() -> None:
    op.drop_table("anomaly_results")
    op.drop_table("hunt_reports")
    op.drop_table("host_profiles")
    op.drop_table("triage_results")

    with op.batch_alter_table("datasets") as batch_op:
        batch_op.drop_index("ix_datasets_status")
        batch_op.drop_column("file_path")
        batch_op.drop_column("error_message")
        batch_op.drop_column("artifact_type")
        batch_op.drop_column("processing_status")
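Worth noting: because processing_status is added with a server default, rows that existed before this migration report 'ready' rather than NULL. A quick sqlite3 sketch of that backfill behaviour (abbreviated schema, illustration only):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO datasets VALUES ('d1')")  # pre-migration row
conn.execute("ALTER TABLE datasets ADD COLUMN processing_status TEXT DEFAULT 'ready'")
# The existing row reads back the default instead of NULL:
assert conn.execute("SELECT processing_status FROM datasets").fetchone()[0] == 'ready'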
323
backend/app.py
323
backend/app.py
@@ -1,323 +0,0 @@
import os
import uuid  # used by the model primary-key defaults below
import logging
from datetime import datetime, timedelta
from flask import Flask, request, jsonify, send_from_directory
from flask_sqlalchemy import SQLAlchemy
from flask_jwt_extended import JWTManager, jwt_required, create_access_token, get_jwt_identity
from werkzeug.utils import secure_filename
from werkzeug.exceptions import RequestEntityTooLarge  # needed by the 413 handler below
import bcrypt

# Try to import flask-cors
try:
    from flask_cors import CORS
    CORS_AVAILABLE = True
except ImportError:
    CORS_AVAILABLE = False
    print("Warning: flask-cors not available")

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = Flask(__name__, static_folder="../frontend/dist")

# Configuration
app.config['SQLALCHEMY_DATABASE_URI'] = os.getenv('DATABASE_URL', 'postgresql://admin:secure_password_123@localhost:5432/threat_hunter')
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['JWT_SECRET_KEY'] = os.getenv('SECRET_KEY', 'change-this-in-production')
app.config['JWT_ACCESS_TOKEN_EXPIRES'] = timedelta(hours=24)
app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024
app.config['UPLOAD_FOLDER'] = 'uploaded'
app.config['OUTPUT_FOLDER'] = 'output'
app.config['ALLOWED_EXTENSIONS'] = {'csv', 'json', 'txt', 'log'}

# Ensure upload directories exist
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
os.makedirs(app.config['OUTPUT_FOLDER'], exist_ok=True)

# Initialize extensions
db = SQLAlchemy(app)
jwt = JWTManager(app)

# Enable CORS
if CORS_AVAILABLE:
    CORS(app)
else:
    @app.after_request
    def after_request(response):
        response.headers.add('Access-Control-Allow-Origin', '*')
        response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
        response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE')
        return response


def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


@app.errorhandler(RequestEntityTooLarge)
def handle_file_too_large(e):
    return jsonify({'error': 'File too large. Maximum size is 100MB.'}), 413


@app.errorhandler(Exception)
def handle_exception(e):
    logger.error(f"Unhandled exception: {e}")
    return jsonify({'error': 'Internal server error'}), 500


# Database Models
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.String, primary_key=True, default=lambda: str(uuid.uuid4()))
    username = db.Column(db.String(50), unique=True, nullable=False)
    email = db.Column(db.String(100), unique=True, nullable=False)
    password_hash = db.Column(db.String(255), nullable=False)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    last_login = db.Column(db.DateTime)


class Hunt(db.Model):
    __tablename__ = 'hunts'
    id = db.Column(db.String, primary_key=True, default=lambda: str(uuid.uuid4()))
    name = db.Column(db.String(100), nullable=False)
    description = db.Column(db.Text)
    created_by = db.Column(db.String, db.ForeignKey('users.id'))
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    updated_at = db.Column(db.DateTime, default=datetime.utcnow)
    status = db.Column(db.String(20), default='active')


# Authentication Routes
@app.route('/api/auth/login', methods=['POST'])
def login():
    try:
        data = request.get_json()
        username = data.get('username')
        password = data.get('password')

        user = User.query.filter_by(username=username).first()

        if user and bcrypt.checkpw(password.encode('utf-8'), user.password_hash.encode('utf-8')):
            access_token = create_access_token(identity=user.id)
            user.last_login = datetime.utcnow()
            db.session.commit()

            return jsonify({
                'access_token': access_token,
                'user': {
                    'id': user.id,
                    'username': user.username,
                    'email': user.email
                }
            })
        else:
            return jsonify({'error': 'Invalid credentials'}), 401

    except Exception as e:
        logger.error(f"Login error: {e}")
        return jsonify({'error': 'Login failed'}), 500


@app.route('/api/auth/register', methods=['POST'])
def register():
    try:
        data = request.get_json()
        username = data.get('username')
        email = data.get('email')
        password = data.get('password')

        # Check if user exists
        if User.query.filter_by(username=username).first():
            return jsonify({'error': 'Username already exists'}), 400

        if User.query.filter_by(email=email).first():
            return jsonify({'error': 'Email already exists'}), 400

        # Hash password
        password_hash = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt()).decode('utf-8')

        # Create user
        user = User(username=username, email=email, password_hash=password_hash)
        db.session.add(user)
        db.session.commit()

        access_token = create_access_token(identity=user.id)

        return jsonify({
            'access_token': access_token,
            'user': {
                'id': user.id,
                'username': user.username,
                'email': user.email
            }
        })

    except Exception as e:
        logger.error(f"Registration error: {e}")
        return jsonify({'error': 'Registration failed'}), 500


# Hunt Management Routes
@app.route('/api/hunts', methods=['GET'])
@jwt_required()
def get_hunts():
    try:
        user_id = get_jwt_identity()
        hunts = Hunt.query.filter_by(created_by=user_id).all()

        return jsonify({
            'hunts': [{
                'id': hunt.id,
                'name': hunt.name,
                'description': hunt.description,
                'created_at': hunt.created_at.isoformat(),
                'status': hunt.status
            } for hunt in hunts]
        })
    except Exception as e:
        logger.error(f"Get hunts error: {e}")
        return jsonify({'error': 'Failed to fetch hunts'}), 500


@app.route('/api/hunts', methods=['POST'])
@jwt_required()
def create_hunt():
    try:
        user_id = get_jwt_identity()
        data = request.get_json()

        hunt = Hunt(
            name=data.get('name'),
            description=data.get('description'),
            created_by=user_id
        )

        db.session.add(hunt)
        db.session.commit()

        return jsonify({
            'hunt': {
                'id': hunt.id,
                'name': hunt.name,
                'description': hunt.description,
                'created_at': hunt.created_at.isoformat(),
                'status': hunt.status
            }
        })
    except Exception as e:
        logger.error(f"Create hunt error: {e}")
        return jsonify({'error': 'Failed to create hunt'}), 500


# API Routes
@app.route('/api/health')
def health_check():
    return jsonify({
        'status': 'healthy',
        'timestamp': datetime.utcnow().isoformat(),
        'version': '1.0.0',
        'service': 'Cyber Threat Hunter API'
    })


@app.route('/api/upload', methods=['POST'])
def upload_file():
    try:
        if 'file' not in request.files:
            return jsonify({'error': 'No file provided'}), 400

        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No file selected'}), 400

        if not allowed_file(file.filename):
            return jsonify({'error': 'File type not allowed'}), 400

        filename = secure_filename(file.filename)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f"{timestamp}_{filename}"
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)

        file.save(filepath)

        logger.info(f"File uploaded successfully: {filename}")
        return jsonify({
            'message': 'File uploaded successfully',
            'filename': filename,
            'size': os.path.getsize(filepath)
        })

    except Exception as e:
        logger.error(f"Upload error: {e}")
        return jsonify({'error': 'Upload failed'}), 500


@app.route('/api/files')
def list_files():
    try:
        files = []
        upload_dir = app.config['UPLOAD_FOLDER']

        if os.path.exists(upload_dir):
            for filename in os.listdir(upload_dir):
                filepath = os.path.join(upload_dir, filename)
                if os.path.isfile(filepath):
                    stat = os.stat(filepath)
                    files.append({
                        'name': filename,
                        'size': stat.st_size,
                        'modified': datetime.fromtimestamp(stat.st_mtime).isoformat()
                    })

        return jsonify({'files': files})

    except Exception as e:
        logger.error(f"List files error: {e}")
        return jsonify({'error': 'Failed to list files'}), 500


@app.route('/api/stats')
def get_stats():
    try:
        upload_dir = app.config['UPLOAD_FOLDER']
        files_count = 0
        if os.path.exists(upload_dir):
            files_count = len([f for f in os.listdir(upload_dir) if os.path.isfile(os.path.join(upload_dir, f))])

        return jsonify({
            'filesUploaded': files_count,
            'analysesCompleted': files_count,
            'threatsDetected': 0
        })
    except Exception as e:
        logger.error(f"Stats error: {e}")
        return jsonify({'error': 'Failed to get stats'}), 500


# Static file serving for React app
@app.route("/assets/<path:path>")
def send_assets(path):
    return send_from_directory(os.path.join(app.static_folder, "assets"), path)


@app.route("/")
def index():
    if os.path.exists(os.path.join(app.static_folder, "index.html")):
        return send_from_directory(app.static_folder, "index.html")
    else:
        return jsonify({
            'message': 'Cyber Threat Hunter API',
            'status': 'running',
            'endpoints': [
                'GET /api/health',
                'POST /api/upload',
                'GET /api/files',
                'GET /api/stats'
            ]
        })


# Catch-all route for React Router
@app.route("/<path:path>")
def catch_all(path):
    if os.path.exists(os.path.join(app.static_folder, "index.html")):
        return send_from_directory(app.static_folder, "index.html")
    else:
        return jsonify({'error': 'Frontend not built yet'})


if __name__ == "__main__":
    with app.app_context():
        db.create_all()

    print("=" * 50)
    print("Starting Cyber Threat Hunter Backend...")
    print("API available at: http://localhost:5000")
    print("Database: Connected to PostgreSQL")
    print("=" * 50)
    app.run(host="0.0.0.0", port=5000, debug=True)
1
backend/app/__init__.py
Normal file
1
backend/app/__init__.py
Normal file
@@ -0,0 +1 @@
"""Backend initialization."""
67
backend/app/agent/debate.py
Normal file
67
backend/app/agent/debate.py
Normal file
@@ -0,0 +1,67 @@
import asyncio


async def debated_generate(provider, prompt: str) -> str:
    """
    Minimal behind-the-scenes debate.
    Same logic for all apps.
    Advisory only. No execution.
    """

    planner = f"""
You are the Planner.
Give structured advisory guidance only.
No execution. No tools.

Request:
{prompt}
"""

    critic = f"""
You are the Critic.
Identify risks, missing steps, and assumptions.
No execution. No tools.

Request:
{prompt}
"""

    pragmatist = f"""
You are the Pragmatist.
Suggest the safest and simplest approach.
No execution. No tools.

Request:
{prompt}
"""

    planner_task = provider.generate(planner)
    critic_task = provider.generate(critic)
    prag_task = provider.generate(pragmatist)

    planner_resp, critic_resp, prag_resp = await asyncio.gather(
        planner_task, critic_task, prag_task
    )

    judge = f"""
You are the Judge.

Merge the three responses into ONE final advisory answer.

Rules:
- Advisory only
- No execution
- Clearly list risks and assumptions
- Be concise

Planner:
{planner_resp}

Critic:
{critic_resp}

Pragmatist:
{prag_resp}
"""

    final = await provider.generate(judge)
    return final
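A sketch of driving debated_generate with a stub provider; any object exposing an async generate(prompt) -> str method works (the import path assumes the backend package root is on sys.path):

import asyncio

from app.agent.debate import debated_generate

class EchoProvider:
    async def generate(self, prompt: str) -> str:
        # A real provider would call an LLM; the stub echoes the role line.
        return f"[stub] {prompt.strip().splitlines()[0]}"

async def main() -> None:
    answer = await debated_generate(EchoProvider(), "Triage this PowerShell log")
    print(answer)

asyncio.run(main())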
16
backend/app/agents/__init__.py
Normal file
16
backend/app/agents/__init__.py
Normal file
@@ -0,0 +1,16 @@
"""Analyst-assist agent module for ThreatHunt.

Provides read-only guidance on CSV artifact data, analytical pivots, and hypotheses.
Agents are advisory only and do not execute actions or modify data.
"""

from .core import ThreatHuntAgent
from .providers import LLMProvider, LocalProvider, NetworkedProvider, OnlineProvider

__all__ = [
    "ThreatHuntAgent",
    "LLMProvider",
    "LocalProvider",
    "NetworkedProvider",
    "OnlineProvider",
]
59
backend/app/agents/config.py
Normal file
59
backend/app/agents/config.py
Normal file
@@ -0,0 +1,59 @@
"""Configuration for agent settings."""

import os
from typing import Literal


class AgentConfig:
    """Configuration for analyst-assist agents."""

    # Provider type: 'local', 'networked', 'online', or 'auto'
    PROVIDER_TYPE: Literal["local", "networked", "online", "auto"] = os.getenv(
        "THREAT_HUNT_AGENT_PROVIDER", "auto"
    )

    # Local provider settings
    LOCAL_MODEL_PATH: str | None = os.getenv("THREAT_HUNT_LOCAL_MODEL_PATH")

    # Networked provider settings
    NETWORKED_ENDPOINT: str | None = os.getenv("THREAT_HUNT_NETWORKED_ENDPOINT")
    NETWORKED_API_KEY: str | None = os.getenv("THREAT_HUNT_NETWORKED_KEY")

    # Online provider settings
    ONLINE_API_PROVIDER: str = os.getenv("THREAT_HUNT_ONLINE_PROVIDER", "openai")
    ONLINE_API_KEY: str | None = os.getenv("THREAT_HUNT_ONLINE_API_KEY")
    ONLINE_MODEL: str | None = os.getenv("THREAT_HUNT_ONLINE_MODEL")

    # Agent behavior settings
    MAX_RESPONSE_TOKENS: int = int(
        os.getenv("THREAT_HUNT_AGENT_MAX_TOKENS", "1024")
    )
    ENABLE_REASONING: bool = os.getenv(
        "THREAT_HUNT_AGENT_REASONING", "true"
    ).lower() in ("true", "1", "yes")
    CONVERSATION_HISTORY_LENGTH: int = int(
        os.getenv("THREAT_HUNT_AGENT_HISTORY_LENGTH", "10")
    )

    # Privacy settings
    FILTER_SENSITIVE_DATA: bool = os.getenv(
        "THREAT_HUNT_AGENT_FILTER_SENSITIVE", "true"
    ).lower() in ("true", "1", "yes")

    @classmethod
    def is_agent_enabled(cls) -> bool:
        """Check if agent is enabled and properly configured."""
        # Agent is disabled if no provider can be used
        if cls.PROVIDER_TYPE == "auto":
            return bool(
                cls.LOCAL_MODEL_PATH
                or cls.NETWORKED_ENDPOINT
                or cls.ONLINE_API_KEY
            )
        elif cls.PROVIDER_TYPE == "local":
            return bool(cls.LOCAL_MODEL_PATH)
        elif cls.PROVIDER_TYPE == "networked":
            return bool(cls.NETWORKED_ENDPOINT)
        elif cls.PROVIDER_TYPE == "online":
            return bool(cls.ONLINE_API_KEY)
        return False
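Because AgentConfig reads the environment in its class body, the variables must be set before the module is imported. A sketch with hypothetical values:

import os

os.environ["THREAT_HUNT_AGENT_PROVIDER"] = "networked"
os.environ["THREAT_HUNT_NETWORKED_ENDPOINT"] = "http://inference.internal:8080"  # hypothetical endpoint

from app.agents.config import AgentConfig  # import only after the env is set

print(AgentConfig.is_agent_enabled())  # True: a networked endpoint is configured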
208
backend/app/agents/core.py
Normal file
208
backend/app/agents/core.py
Normal file
@@ -0,0 +1,208 @@
"""Core ThreatHunt analyst-assist agent.

Provides read-only guidance on CSV artifact data, analytical pivots, and hypotheses.
Agents are advisory only - no execution, no alerts, no data modifications.
"""

import logging
from typing import Optional
from pydantic import BaseModel, Field

from .providers import LLMProvider, get_provider

logger = logging.getLogger(__name__)


class AgentContext(BaseModel):
    """Context for agent guidance requests."""

    query: str = Field(
        ..., description="Analyst question or request for guidance"
    )
    dataset_name: Optional[str] = Field(None, description="Name of CSV dataset")
    artifact_type: Optional[str] = Field(None, description="Artifact type (e.g., file, process, network)")
    host_identifier: Optional[str] = Field(
        None, description="Host name, IP, or identifier"
    )
    data_summary: Optional[str] = Field(
        None, description="Brief description of uploaded data"
    )
    conversation_history: Optional[list[dict]] = Field(
        default_factory=list, description="Previous messages in conversation"
    )


class AgentResponse(BaseModel):
    """Response from analyst-assist agent."""

    guidance: str = Field(..., description="Advisory guidance for analyst")
    confidence: float = Field(
        ..., ge=0.0, le=1.0, description="Confidence in guidance (0-1)"
    )
    suggested_pivots: list[str] = Field(
        default_factory=list, description="Suggested analytical directions"
    )
    suggested_filters: list[str] = Field(
        default_factory=list, description="Suggested data filters or queries"
    )
    caveats: Optional[str] = Field(
        None, description="Assumptions, limitations, or caveats"
    )
    reasoning: Optional[str] = Field(
        None, description="Explanation of how guidance was generated"
    )


class ThreatHuntAgent:
    """Analyst-assist agent for ThreatHunt.

    Provides guidance on:
    - Interpreting CSV artifact data
    - Suggesting analytical pivots and filters
    - Forming and testing hypotheses

    Policy:
    - Advisory guidance only (no execution)
    - No database or schema changes
    - No alert escalation
    - Transparent reasoning
    """

    def __init__(self, provider: Optional[LLMProvider] = None):
        """Initialize agent with LLM provider.

        Args:
            provider: LLM provider instance. If None, uses get_provider() with auto mode.
        """
        if provider is None:
            try:
                provider = get_provider("auto")
            except RuntimeError as e:
                logger.warning(f"Could not initialize default provider: {e}")
                provider = None

        self.provider = provider
        self.system_prompt = self._build_system_prompt()

    def _build_system_prompt(self) -> str:
        """Build the system prompt that governs agent behavior."""
        return """You are an analyst-assist agent for ThreatHunt, a threat hunting platform.

Your role:
- Interpret and explain CSV artifact data from Velociraptor
- Suggest analytical pivots, filters, and hypotheses
- Highlight anomalies, patterns, or points of interest
- Guide analysts without replacing their judgment

Your constraints:
- You ONLY provide guidance and suggestions
- You do NOT execute actions or tools
- You do NOT modify data or escalate alerts
- You do NOT make autonomous decisions
- You ONLY analyze data presented to you
- You explain your reasoning transparently
- You acknowledge limitations and assumptions
- You suggest next investigative steps

When responding:
1. Start with a clear, direct answer to the query
2. Explain your reasoning based on the data context provided
3. Suggest 2-4 analytical pivots the analyst might explore
4. Suggest 2-4 data filters or queries that might be useful
5. Include relevant caveats or assumptions
6. Be honest about what you cannot determine from the data

Remember: The analyst is the decision-maker. You are an assistant."""

    async def assist(self, context: AgentContext) -> AgentResponse:
        """Provide guidance on artifact data and analysis.

        Args:
            context: Request context including query and data context.

        Returns:
            Guidance response with suggestions and reasoning.

        Raises:
            RuntimeError: If no provider is available.
        """
        if not self.provider:
            raise RuntimeError(
                "No LLM provider available. Configure at least one of: "
                "THREAT_HUNT_LOCAL_MODEL_PATH, THREAT_HUNT_NETWORKED_ENDPOINT, "
                "or THREAT_HUNT_ONLINE_API_KEY"
            )

        # Build prompt with context
        prompt = self._build_prompt(context)

        try:
            # Get guidance from LLM provider
            guidance = await self.provider.generate(prompt, max_tokens=1024)

            # Parse response into structured format
            response = self._parse_response(guidance, context)

            logger.info(
                f"Agent assisted with query: {context.query[:50]}... "
                f"(dataset: {context.dataset_name})"
            )

            return response

        except Exception as e:
            logger.error(f"Error generating guidance: {e}")
            raise

    def _build_prompt(self, context: AgentContext) -> str:
        """Build the prompt for the LLM."""
        prompt_parts = [
            f"Analyst query: {context.query}",
        ]

        if context.dataset_name:
            prompt_parts.append(f"Dataset: {context.dataset_name}")

        if context.artifact_type:
            prompt_parts.append(f"Artifact type: {context.artifact_type}")

        if context.host_identifier:
            prompt_parts.append(f"Host: {context.host_identifier}")

        if context.data_summary:
            prompt_parts.append(f"Data summary: {context.data_summary}")

        if context.conversation_history:
            prompt_parts.append("\nConversation history:")
            for msg in context.conversation_history[-5:]:  # Last 5 messages for context
                prompt_parts.append(f"  {msg.get('role', 'unknown')}: {msg.get('content', '')}")

        return "\n".join(prompt_parts)

    def _parse_response(self, response_text: str, context: AgentContext) -> AgentResponse:
        """Parse LLM response into structured format.

        Note: This is a simplified parser. In production, use structured output
        from the LLM (JSON mode, function calling, etc.) for better reliability.
        """
        # For now, return a structured response based on the raw guidance
        # In production, parse JSON or use structured output from LLM
        return AgentResponse(
            guidance=response_text,
            confidence=0.8,  # Placeholder
            suggested_pivots=[
                "Analyze temporal patterns",
                "Cross-reference with known indicators",
                "Examine outliers in the dataset",
                "Compare with baseline behavior",
            ],
            suggested_filters=[
                "Filter by high-risk indicators",
                "Sort by timestamp for timeline analysis",
                "Group by host or user",
                "Filter by anomaly score",
            ],
            caveats="Guidance is based on available data context. "
            "Analysts should verify findings with additional sources.",
            reasoning="Analysis generated based on artifact data patterns and analyst query.",
        )
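A minimal sketch of calling the agent (illustration only; it assumes one of the THREAT_HUNT_* provider variables is configured, otherwise assist() raises RuntimeError, and that the backend package is importable):

import asyncio

from app.agents import ThreatHuntAgent
from app.agents.core import AgentContext

async def main() -> None:
    agent = ThreatHuntAgent()
    ctx = AgentContext(
        query="What should I pivot on in this autoruns export?",
        dataset_name="autoruns_host42.csv",  # hypothetical dataset
        artifact_type="persistence",
    )
    resp = await agent.assist(ctx)
    print(resp.guidance)
    print(resp.suggested_pivots)

asyncio.run(main())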
408
backend/app/agents/core_v2.py
Normal file
408
backend/app/agents/core_v2.py
Normal file
@@ -0,0 +1,408 @@
"""Core ThreatHunt analyst-assist agent — v2.

Uses TaskRouter to select the right model/node for each query,
real LLM providers (Ollama/OpenWebUI), and structured response parsing.
Integrates SANS RAG context from Open WebUI.
"""

import json
import logging
import re
import time
from typing import AsyncIterator, Optional

from pydantic import BaseModel, Field

from app.config import settings
from app.services.sans_rag import sans_rag
from .router import TaskRouter, TaskType, RoutingDecision, task_router
from .providers_v2 import OllamaProvider, OpenWebUIProvider

logger = logging.getLogger(__name__)


# ── Models ────────────────────────────────────────────────────────────


class AgentContext(BaseModel):
    """Context for agent guidance requests."""

    query: str = Field(..., description="Analyst question or request for guidance")
    dataset_name: Optional[str] = Field(None, description="Name of CSV dataset")
    artifact_type: Optional[str] = Field(None, description="Artifact type")
    host_identifier: Optional[str] = Field(None, description="Host name, IP, or identifier")
    data_summary: Optional[str] = Field(None, description="Brief description of data")
    conversation_history: Optional[list[dict]] = Field(
        default_factory=list, description="Previous messages"
    )
    active_hypotheses: Optional[list[str]] = Field(
        default_factory=list, description="Active investigation hypotheses"
    )
    annotations_summary: Optional[str] = Field(
        None, description="Summary of analyst annotations"
    )
    enrichment_summary: Optional[str] = Field(
        None, description="Summary of enrichment results"
    )
    mode: str = Field(default="quick", description="quick | deep | debate")
    model_override: Optional[str] = Field(None, description="Force a specific model")


class Perspective(BaseModel):
    """A single perspective from the debate agent."""
    role: str
    content: str
    model_used: str
    node_used: str
    latency_ms: int


class AgentResponse(BaseModel):
    """Response from analyst-assist agent."""

    guidance: str = Field(..., description="Advisory guidance for analyst")
    confidence: float = Field(..., ge=0.0, le=1.0, description="Confidence (0-1)")
    suggested_pivots: list[str] = Field(default_factory=list)
    suggested_filters: list[str] = Field(default_factory=list)
    caveats: Optional[str] = None
    reasoning: Optional[str] = None
    sans_references: list[str] = Field(
        default_factory=list, description="SANS course references"
    )
    model_used: str = Field(default="", description="Model that generated the response")
    node_used: str = Field(default="", description="Node that processed the request")
    latency_ms: int = Field(default=0, description="Total latency in ms")
    perspectives: Optional[list[Perspective]] = Field(
        None, description="Debate perspectives (only in debate mode)"
    )


# ── System prompt ─────────────────────────────────────────────────────

SYSTEM_PROMPT = """You are an analyst-assist agent for ThreatHunt, a threat hunting platform.
You have access to 300GB of SANS cybersecurity course material for reference.

Your role:
- Interpret and explain CSV artifact data from Velociraptor and other forensic tools
- Suggest analytical pivots, filters, and hypotheses
- Highlight anomalies, patterns, or points of interest
- Reference relevant SANS methodologies and techniques when applicable
- Guide analysts without replacing their judgment

Your constraints:
- You ONLY provide guidance and suggestions
- You do NOT execute actions or tools
- You do NOT modify data or escalate alerts
- You explain your reasoning transparently

RESPONSE FORMAT — you MUST respond with valid JSON:
{
    "guidance": "Your main guidance text here",
    "confidence": 0.85,
    "suggested_pivots": ["Pivot 1", "Pivot 2"],
    "suggested_filters": ["filter expression 1", "filter expression 2"],
    "caveats": "Any assumptions or limitations",
    "reasoning": "How you arrived at this guidance",
    "sans_references": ["SANS SEC504: ...", "SANS FOR508: ..."]
}

Respond ONLY with the JSON object. No markdown, no code fences, no extra text."""


# ── Agent ─────────────────────────────────────────────────────────────


class ThreatHuntAgent:
    """Analyst-assist agent backed by Wile + Roadrunner LLM cluster."""

    def __init__(self, router: TaskRouter | None = None):
        self.router = router or task_router
        self.system_prompt = SYSTEM_PROMPT

    async def assist(self, context: AgentContext) -> AgentResponse:
        """Provide guidance on artifact data and analysis."""
        start = time.monotonic()

        if context.mode == "debate":
            return await self._debate_assist(context)

        # Classify task and route
        task_type = self.router.classify_task(context.query)
        if context.mode == "deep":
            task_type = TaskType.DEEP_ANALYSIS

        decision = self.router.route(task_type, model_override=context.model_override)
        logger.info(f"Routing: {decision.reason}")

        # Enrich prompt with SANS RAG context
        prompt = self._build_prompt(context)
        try:
            rag_context = await sans_rag.enrich_prompt(
                context.query,
                investigation_context=context.data_summary or "",
            )
            if rag_context:
                prompt = f"{prompt}\n\n{rag_context}"
        except Exception as e:
            logger.warning(f"SANS RAG enrichment failed: {e}")

        # Call LLM
        provider = self.router.get_provider(decision)
        if isinstance(provider, OpenWebUIProvider):
            messages = [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": prompt},
            ]
            result = await provider.chat(
                messages,
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=settings.AGENT_TEMPERATURE,
            )
        else:
            result = await provider.generate(
                prompt,
                system=self.system_prompt,
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=settings.AGENT_TEMPERATURE,
            )

        raw_text = result.get("response", "")
        latency_ms = result.get("_latency_ms", 0)

        # Parse structured response
        response = self._parse_response(raw_text, context)
        response.model_used = decision.model
        response.node_used = decision.node.value
        response.latency_ms = latency_ms

        total_ms = int((time.monotonic() - start) * 1000)
        logger.info(
            f"Agent assist: {context.query[:60]}... → "
            f"{decision.model} on {decision.node.value} "
            f"({total_ms}ms total, {latency_ms}ms LLM)"
        )

        return response

    async def assist_stream(
        self,
        context: AgentContext,
    ) -> AsyncIterator[str]:
        """Stream agent response tokens."""
        task_type = self.router.classify_task(context.query)
        decision = self.router.route(task_type, model_override=context.model_override)
        prompt = self._build_prompt(context)

        provider = self.router.get_provider(decision)
        if isinstance(provider, OllamaProvider):
            async for token in provider.generate_stream(
                prompt,
                system=self.system_prompt,
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=settings.AGENT_TEMPERATURE,
            ):
                yield token
        elif isinstance(provider, OpenWebUIProvider):
            messages = [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": prompt},
            ]
            async for token in provider.chat_stream(
                messages,
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=settings.AGENT_TEMPERATURE,
            ):
                yield token

    async def _debate_assist(self, context: AgentContext) -> AgentResponse:
        """Multi-perspective analysis using diverse models on Wile."""
        import asyncio

        start = time.monotonic()
        prompt = self._build_prompt(context)

        # Route each perspective to a different heavy model
        roles = {
            TaskType.DEBATE_PLANNER: (
                "Planner",
                "You are the Planner for a threat hunting investigation.\n"
                "Provide a structured investigation strategy. Reference SANS methodologies.\n"
                "Focus on: investigation steps, data sources to examine, MITRE ATT&CK mapping.\n"
                "Be specific to the data context provided.\n\n",
            ),
            TaskType.DEBATE_CRITIC: (
                "Critic",
                "You are the Critic for a threat hunting investigation.\n"
                "Identify risks, false positive scenarios, missing evidence, and assumptions.\n"
                "Reference SANS training on common analyst mistakes.\n"
                "Challenge the obvious interpretation.\n\n",
            ),
            TaskType.DEBATE_PRAGMATIST: (
                "Pragmatist",
                "You are the Pragmatist for a threat hunting investigation.\n"
                "Suggest the most actionable, efficient next steps.\n"
                "Reference SANS incident response playbooks.\n"
                "Focus on: quick wins, triage priorities, what to escalate.\n\n",
            ),
        }

        async def _call_perspective(task_type: TaskType, role_name: str, prefix: str):
            decision = self.router.route(task_type)
            provider = self.router.get_provider(decision)
            full_prompt = prefix + prompt

            if isinstance(provider, OpenWebUIProvider):
                result = await provider.generate(
                    full_prompt,
                    system=f"You are the {role_name}. Provide analysis only. No execution.",
                    max_tokens=settings.AGENT_MAX_TOKENS,
                    temperature=0.4,
                )
            else:
                result = await provider.generate(
                    full_prompt,
                    system=f"You are the {role_name}. Provide analysis only. No execution.",
                    max_tokens=settings.AGENT_MAX_TOKENS,
                    temperature=0.4,
                )

            return Perspective(
                role=role_name,
                content=result.get("response", ""),
                model_used=decision.model,
                node_used=decision.node.value,
                latency_ms=result.get("_latency_ms", 0),
            )

        # Run perspectives in parallel
        perspective_tasks = [
            _call_perspective(tt, name, prefix)
            for tt, (name, prefix) in roles.items()
        ]
        perspectives = await asyncio.gather(*perspective_tasks)

        # Judge merges the perspectives
        judge_prompt = (
            "You are the Judge. Merge these three threat hunting perspectives into "
            "ONE final advisory answer.\n\n"
            "Rules:\n"
            "- Advisory only — no execution\n"
            "- Clearly list risks and assumptions\n"
            "- Highlight where perspectives agree and disagree\n"
            "- Provide a unified recommendation\n"
            "- Reference SANS methodologies where relevant\n\n"
        )
        for p in perspectives:
            judge_prompt += f"=== {p.role} (via {p.model_used}) ===\n{p.content}\n\n"

        judge_prompt += (
            f"\nOriginal analyst query:\n{context.query}\n\n"
            "Respond with the merged analysis in this JSON format:\n"
            '{"guidance": "...", "confidence": 0.85, "suggested_pivots": [...], '
            '"suggested_filters": [...], "caveats": "...", "reasoning": "...", '
            '"sans_references": [...]}'
        )

        judge_decision = self.router.route(TaskType.DEBATE_JUDGE)
        judge_provider = self.router.get_provider(judge_decision)

        if isinstance(judge_provider, OpenWebUIProvider):
            judge_result = await judge_provider.generate(
                judge_prompt,
                system="You are the Judge. Merge perspectives into a final advisory answer. Respond with JSON only.",
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=0.2,
            )
        else:
            judge_result = await judge_provider.generate(
                judge_prompt,
                system="You are the Judge. Merge perspectives into a final advisory answer. Respond with JSON only.",
                max_tokens=settings.AGENT_MAX_TOKENS,
                temperature=0.2,
            )

        raw_text = judge_result.get("response", "")
        response = self._parse_response(raw_text, context)
        response.model_used = judge_decision.model
        response.node_used = judge_decision.node.value
        response.latency_ms = int((time.monotonic() - start) * 1000)
        response.perspectives = list(perspectives)

        return response

    def _build_prompt(self, context: AgentContext) -> str:
        """Build the prompt with all available context."""
        parts = [f"Analyst query: {context.query}"]

        if context.dataset_name:
            parts.append(f"Dataset: {context.dataset_name}")
        if context.artifact_type:
            parts.append(f"Artifact type: {context.artifact_type}")
        if context.host_identifier:
            parts.append(f"Host: {context.host_identifier}")
        if context.data_summary:
            parts.append(f"Data summary: {context.data_summary}")
        if context.active_hypotheses:
            parts.append(f"Active hypotheses: {'; '.join(context.active_hypotheses)}")
        if context.annotations_summary:
            parts.append(f"Analyst annotations: {context.annotations_summary}")
        if context.enrichment_summary:
            parts.append(f"Enrichment data: {context.enrichment_summary}")
        if context.conversation_history:
            parts.append("\nRecent conversation:")
            for msg in context.conversation_history[-settings.AGENT_HISTORY_LENGTH:]:
                parts.append(f"  {msg.get('role', 'unknown')}: {msg.get('content', '')[:500]}")

        return "\n".join(parts)

    def _parse_response(self, raw: str, context: AgentContext) -> AgentResponse:
        """Parse LLM output into structured AgentResponse.

        Tries JSON extraction first, falls back to raw text with defaults.
        """
        parsed = self._try_parse_json(raw)
        if parsed:
            return AgentResponse(
                guidance=parsed.get("guidance", raw),
                confidence=min(max(float(parsed.get("confidence", 0.7)), 0.0), 1.0),
                suggested_pivots=parsed.get("suggested_pivots", [])[:6],
                suggested_filters=parsed.get("suggested_filters", [])[:6],
                caveats=parsed.get("caveats"),
                reasoning=parsed.get("reasoning"),
                sans_references=parsed.get("sans_references", []),
            )

        # Fallback: use raw text as guidance
        return AgentResponse(
            guidance=raw.strip() or "No guidance generated. Please try rephrasing your question.",
            confidence=0.5,
            suggested_pivots=[],
            suggested_filters=[],
            caveats="Response was not in structured format. Pivots and filters may be embedded in the guidance text.",
            reasoning=None,
            sans_references=[],
        )

    def _try_parse_json(self, text: str) -> dict | None:
        """Try to extract JSON from LLM output."""
        # Direct parse
        try:
            return json.loads(text.strip())
        except json.JSONDecodeError:
            pass

        # Extract from code fences
        patterns = [
            r"```json\s*(.*?)\s*```",
            r"```\s*(.*?)\s*```",
            r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}",
        ]
        for pattern in patterns:
            match = re.search(pattern, text, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1) if match.lastindex else match.group(0))
                except (json.JSONDecodeError, IndexError):
                    continue

        return None
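The fallback chain in _try_parse_json is worth seeing in isolation. This standalone sketch mirrors the same three patterns (duplicated here so it runs without the app's imports):

import json
import re

def extract_json(text: str) -> dict | None:
    # Same strategy as _try_parse_json: direct parse, then fenced or embedded JSON.
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        pass
    for pattern in (r"```json\s*(.*?)\s*```", r"```\s*(.*?)\s*```",
                    r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}"):
        m = re.search(pattern, text, re.DOTALL)
        if m:
            try:
                return json.loads(m.group(1) if m.lastindex else m.group(0))
            except (json.JSONDecodeError, IndexError):
                continue
    return None

assert extract_json('{"guidance": "ok"}') == {"guidance": "ok"}
assert extract_json('```json\n{"guidance": "ok"}\n```') == {"guidance": "ok"}
assert extract_json('The model said: {"confidence": 0.9} today') == {"confidence": 0.9}
assert extract_json("no JSON here") is None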
190
backend/app/agents/providers.py
Normal file
190
backend/app/agents/providers.py
Normal file
@@ -0,0 +1,190 @@
"""Pluggable LLM provider interface for analyst-assist agents.

Supports three provider types:
- Local: On-device or on-prem models
- Networked: Shared internal inference services
- Online: External hosted APIs
"""

import os
from abc import ABC, abstractmethod
from typing import Optional


class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 1024) -> str:
        """Generate a response from the LLM.

        Args:
            prompt: The input prompt
            max_tokens: Maximum tokens in response

        Returns:
            Generated text response
        """
        pass

    @abstractmethod
    def is_available(self) -> bool:
        """Check if provider backend is available."""
        pass


class LocalProvider(LLMProvider):
    """Local LLM provider (on-device or on-prem models)."""

    def __init__(self, model_path: Optional[str] = None):
        """Initialize local provider.

        Args:
            model_path: Path to local model. If None, uses THREAT_HUNT_LOCAL_MODEL_PATH env var.
        """
        self.model_path = model_path or os.getenv("THREAT_HUNT_LOCAL_MODEL_PATH")
        self.model = None

    def is_available(self) -> bool:
        """Check if local model is available."""
        if not self.model_path:
            return False
        # In production, would verify model file exists and can be loaded
        return os.path.exists(str(self.model_path))

    async def generate(self, prompt: str, max_tokens: int = 1024) -> str:
        """Generate response using local model.

        Note: This is a placeholder. In production, integrate with:
        - llama-cpp-python for GGML models
        - Ollama API
        - vLLM
        - Other local inference engines
        """
        if not self.is_available():
            raise RuntimeError("Local model not available")

        # Placeholder implementation
        return f"[Local model response to: {prompt[:50]}...]"


class NetworkedProvider(LLMProvider):
    """Networked LLM provider (shared internal inference services)."""

    def __init__(
        self,
        api_endpoint: Optional[str] = None,
        api_key: Optional[str] = None,
        model_name: str = "default",
    ):
        """Initialize networked provider.

        Args:
            api_endpoint: URL to inference service. Defaults to env var THREAT_HUNT_NETWORKED_ENDPOINT.
            api_key: API key for service. Defaults to env var THREAT_HUNT_NETWORKED_KEY.
            model_name: Model name/ID on the service.
        """
        self.api_endpoint = api_endpoint or os.getenv("THREAT_HUNT_NETWORKED_ENDPOINT")
        self.api_key = api_key or os.getenv("THREAT_HUNT_NETWORKED_KEY")
        self.model_name = model_name

    def is_available(self) -> bool:
        """Check if networked service is available."""
        return bool(self.api_endpoint)

    async def generate(self, prompt: str, max_tokens: int = 1024) -> str:
        """Generate response using networked service.

        Note: This is a placeholder. In production, integrate with:
        - Internal inference service API
        - LLM inference container cluster
        - Enterprise inference gateway
        """
        if not self.is_available():
            raise RuntimeError("Networked service not available")

        # Placeholder implementation
        return f"[Networked response from {self.model_name}: {prompt[:50]}...]"


class OnlineProvider(LLMProvider):
    """Online LLM provider (external hosted APIs)."""

    def __init__(
        self,
        api_provider: str = "openai",
        api_key: Optional[str] = None,
        model_name: Optional[str] = None,
    ):
        """Initialize online provider.

        Args:
            api_provider: Provider name (openai, anthropic, google, etc.)
            api_key: API key. Defaults to env var THREAT_HUNT_ONLINE_API_KEY.
            model_name: Model name. Defaults to env var THREAT_HUNT_ONLINE_MODEL.
        """
        self.api_provider = api_provider
        self.api_key = api_key or os.getenv("THREAT_HUNT_ONLINE_API_KEY")
        self.model_name = model_name or os.getenv(
            "THREAT_HUNT_ONLINE_MODEL", f"{api_provider}-default"
        )

    def is_available(self) -> bool:
        """Check if online API is available."""
        return bool(self.api_key)

    async def generate(self, prompt: str, max_tokens: int = 1024) -> str:
|
||||
"""Generate response using online API.
|
||||
|
||||
Note: This is a placeholder. In production, integrate with:
|
||||
- OpenAI API (GPT-3.5, GPT-4, etc.)
|
||||
- Anthropic Claude API
|
||||
- Google Gemini API
|
||||
- Other hosted LLM services
|
||||
"""
|
||||
if not self.is_available():
|
||||
raise RuntimeError("Online API not available or API key not set")
|
||||
|
||||
# Placeholder implementation
|
||||
return f"[Online {self.api_provider} response: {prompt[:50]}...]"
|
||||
|
||||
|
||||
def get_provider(provider_type: str = "auto") -> LLMProvider:
|
||||
"""Get an LLM provider based on configuration.
|
||||
|
||||
Args:
|
||||
provider_type: Type of provider to use: 'local', 'networked', 'online', or 'auto'.
|
||||
'auto' attempts to use the first available provider in order:
|
||||
local -> networked -> online.
|
||||
|
||||
Returns:
|
||||
Configured LLM provider instance.
|
||||
|
||||
Raises:
|
||||
RuntimeError: If no provider is available.
|
||||
"""
|
||||
# Explicit provider selection
|
||||
if provider_type == "local":
|
||||
provider = LocalProvider()
|
||||
elif provider_type == "networked":
|
||||
provider = NetworkedProvider()
|
||||
elif provider_type == "online":
|
||||
provider = OnlineProvider()
|
||||
elif provider_type == "auto":
|
||||
# Try providers in order of preference
|
||||
for Provider in [LocalProvider, NetworkedProvider, OnlineProvider]:
|
||||
provider = Provider()
|
||||
if provider.is_available():
|
||||
return provider
|
||||
raise RuntimeError(
|
||||
"No LLM provider available. Configure at least one of: "
|
||||
"THREAT_HUNT_LOCAL_MODEL_PATH, THREAT_HUNT_NETWORKED_ENDPOINT, "
|
||||
"or THREAT_HUNT_ONLINE_API_KEY"
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unknown provider type: {provider_type}")
|
||||
|
||||
if not provider.is_available():
|
||||
raise RuntimeError(f"{provider_type} provider not available")
|
||||
|
||||
return provider
|
||||
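A short usage sketch for the selection logic above (the import path is assumed from the file location, and the provider chosen depends on which of the three env vars is configured):

import asyncio

from app.agents.providers import get_provider  # assumed import path

async def main():
    # "auto" walks local -> networked -> online and returns the first available provider;
    # it raises RuntimeError if none of the three env vars is configured.
    provider = get_provider("auto")
    reply = await provider.generate("Summarise this process tree", max_tokens=256)
    print(type(provider).__name__, reply)

asyncio.run(main())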
362
backend/app/agents/providers_v2.py
Normal file
@@ -0,0 +1,362 @@
"""LLM providers — real implementations for Ollama nodes and Open WebUI cluster.
|
||||
|
||||
Three providers:
|
||||
- OllamaProvider: Direct calls to Ollama on Wile/Roadrunner via Tailscale
|
||||
- OpenWebUIProvider: Calls to the Open WebUI cluster (OpenAI-compatible)
|
||||
- EmbeddingProvider: Embedding generation via Ollama /api/embeddings
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import time
|
||||
from typing import AsyncIterator
|
||||
|
||||
import httpx
|
||||
|
||||
from app.config import settings
|
||||
from .registry import ModelEntry, Node
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Shared HTTP client with reasonable timeouts
|
||||
_client: httpx.AsyncClient | None = None
|
||||
|
||||
|
||||
def _get_client() -> httpx.AsyncClient:
|
||||
global _client
|
||||
if _client is None or _client.is_closed:
|
||||
_client = httpx.AsyncClient(
|
||||
timeout=httpx.Timeout(connect=10, read=300, write=30, pool=10),
|
||||
limits=httpx.Limits(max_connections=20, max_keepalive_connections=10),
|
||||
)
|
||||
return _client
|
||||
|
||||
|
||||
async def cleanup_client():
|
||||
global _client
|
||||
if _client and not _client.is_closed:
|
||||
await _client.aclose()
|
||||
_client = None
|
||||
|
||||
|
||||
def _ollama_url(node: Node) -> str:
|
||||
"""Get the Ollama base URL for a node."""
|
||||
if node == Node.WILE:
|
||||
return settings.wile_url
|
||||
elif node == Node.ROADRUNNER:
|
||||
return settings.roadrunner_url
|
||||
else:
|
||||
raise ValueError(f"No direct Ollama URL for node: {node}")
|
||||
|
||||
|
||||
# ── Ollama Provider ──────────────────────────────────────────────────
|
||||
|
||||
|
||||
class OllamaProvider:
|
||||
"""Direct Ollama API calls to Wile or Roadrunner."""
|
||||
|
||||
def __init__(self, model: str, node: Node):
|
||||
self.model = model
|
||||
self.node = node
|
||||
self.base_url = _ollama_url(node)
|
||||
|
||||
async def generate(
|
||||
self,
|
||||
prompt: str,
|
||||
system: str = "",
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> dict:
|
||||
"""Generate a completion. Returns dict with 'response', 'model', 'total_duration', etc."""
|
||||
client = _get_client()
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"prompt": prompt,
|
||||
"stream": False,
|
||||
"options": {
|
||||
"num_predict": max_tokens,
|
||||
"temperature": temperature,
|
||||
},
|
||||
}
|
||||
if system:
|
||||
payload["system"] = system
|
||||
|
||||
start = time.monotonic()
|
||||
try:
|
||||
resp = await client.post(
|
||||
f"{self.base_url}/api/generate",
|
||||
json=payload,
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
latency_ms = int((time.monotonic() - start) * 1000)
|
||||
data["_latency_ms"] = latency_ms
|
||||
data["_node"] = self.node.value
|
||||
logger.info(
|
||||
f"Ollama [{self.node.value}] {self.model}: "
|
||||
f"{latency_ms}ms, {data.get('eval_count', '?')} tokens"
|
||||
)
|
||||
return data
|
||||
except httpx.HTTPStatusError as e:
|
||||
logger.error(f"Ollama HTTP error [{self.node.value}]: {e.response.status_code} {e.response.text[:200]}")
|
||||
raise
|
||||
except httpx.ConnectError as e:
|
||||
logger.error(f"Cannot reach Ollama on {self.node.value} ({self.base_url}): {e}")
|
||||
raise
|
||||
|
||||
async def chat(
|
||||
self,
|
||||
messages: list[dict],
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> dict:
|
||||
"""Chat completion via Ollama /api/chat."""
|
||||
client = _get_client()
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"messages": messages,
|
||||
"stream": False,
|
||||
"options": {
|
||||
"num_predict": max_tokens,
|
||||
"temperature": temperature,
|
||||
},
|
||||
}
|
||||
|
||||
start = time.monotonic()
|
||||
resp = await client.post(f"{self.base_url}/api/chat", json=payload)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
data["_latency_ms"] = int((time.monotonic() - start) * 1000)
|
||||
data["_node"] = self.node.value
|
||||
return data
|
||||
|
||||
async def generate_stream(
|
||||
self,
|
||||
prompt: str,
|
||||
system: str = "",
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> AsyncIterator[str]:
|
||||
"""Stream tokens from Ollama."""
|
||||
client = _get_client()
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"prompt": prompt,
|
||||
"stream": True,
|
||||
"options": {
|
||||
"num_predict": max_tokens,
|
||||
"temperature": temperature,
|
||||
},
|
||||
}
|
||||
if system:
|
||||
payload["system"] = system
|
||||
|
||||
async with client.stream(
|
||||
"POST", f"{self.base_url}/api/generate", json=payload
|
||||
) as resp:
|
||||
resp.raise_for_status()
|
||||
async for line in resp.aiter_lines():
|
||||
if line.strip():
|
||||
try:
|
||||
chunk = json.loads(line)
|
||||
token = chunk.get("response", "")
|
||||
if token:
|
||||
yield token
|
||||
if chunk.get("done"):
|
||||
break
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
|
||||
async def is_available(self) -> bool:
|
||||
"""Ping the Ollama node."""
|
||||
try:
|
||||
client = _get_client()
|
||||
resp = await client.get(f"{self.base_url}/api/tags", timeout=5)
|
||||
return resp.status_code == 200
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
# ── Open WebUI Provider (OpenAI-compatible) ───────────────────────────
|
||||
|
||||
|
||||
class OpenWebUIProvider:
|
||||
"""Calls to Open WebUI cluster at ai.guapo613.beer.
|
||||
|
||||
Uses the OpenAI-compatible /v1/chat/completions endpoint.
|
||||
"""
|
||||
|
||||
def __init__(self, model: str = ""):
|
||||
self.model = model or settings.DEFAULT_FAST_MODEL
|
||||
self.base_url = settings.OPENWEBUI_URL.rstrip("/")
|
||||
self.api_key = settings.OPENWEBUI_API_KEY
|
||||
|
||||
def _headers(self) -> dict:
|
||||
h = {"Content-Type": "application/json"}
|
||||
if self.api_key:
|
||||
h["Authorization"] = f"Bearer {self.api_key}"
|
||||
return h
|
||||
|
||||
async def chat(
|
||||
self,
|
||||
messages: list[dict],
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> dict:
|
||||
"""Chat completion via OpenAI-compatible endpoint."""
|
||||
client = _get_client()
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"messages": messages,
|
||||
"max_tokens": max_tokens,
|
||||
"temperature": temperature,
|
||||
"stream": False,
|
||||
}
|
||||
|
||||
start = time.monotonic()
|
||||
resp = await client.post(
|
||||
f"{self.base_url}/v1/chat/completions",
|
||||
json=payload,
|
||||
headers=self._headers(),
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
latency_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
# Normalize to our format
|
||||
content = ""
|
||||
if data.get("choices"):
|
||||
content = data["choices"][0].get("message", {}).get("content", "")
|
||||
|
||||
result = {
|
||||
"response": content,
|
||||
"model": data.get("model", self.model),
|
||||
"_latency_ms": latency_ms,
|
||||
"_node": "cluster",
|
||||
"_usage": data.get("usage", {}),
|
||||
}
|
||||
logger.info(
|
||||
f"OpenWebUI cluster {self.model}: {latency_ms}ms"
|
||||
)
|
||||
return result
|
||||
|
||||
async def generate(
|
||||
self,
|
||||
prompt: str,
|
||||
system: str = "",
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> dict:
|
||||
"""Convert prompt-style call to chat format."""
|
||||
messages = []
|
||||
if system:
|
||||
messages.append({"role": "system", "content": system})
|
||||
messages.append({"role": "user", "content": prompt})
|
||||
return await self.chat(messages, max_tokens, temperature)
|
||||
|
||||
async def chat_stream(
|
||||
self,
|
||||
messages: list[dict],
|
||||
max_tokens: int = 2048,
|
||||
temperature: float = 0.3,
|
||||
) -> AsyncIterator[str]:
|
||||
"""Stream tokens from OpenWebUI."""
|
||||
client = _get_client()
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"messages": messages,
|
||||
"max_tokens": max_tokens,
|
||||
"temperature": temperature,
|
||||
"stream": True,
|
||||
}
|
||||
|
||||
async with client.stream(
|
||||
"POST",
|
||||
f"{self.base_url}/v1/chat/completions",
|
||||
json=payload,
|
||||
headers=self._headers(),
|
||||
) as resp:
|
||||
resp.raise_for_status()
|
||||
async for line in resp.aiter_lines():
|
||||
if line.startswith("data: "):
|
||||
data_str = line[6:].strip()
|
||||
if data_str == "[DONE]":
|
||||
break
|
||||
try:
|
||||
chunk = json.loads(data_str)
|
||||
delta = chunk.get("choices", [{}])[0].get("delta", {})
|
||||
token = delta.get("content", "")
|
||||
if token:
|
||||
yield token
|
||||
except json.JSONDecodeError:
|
||||
continue
|
||||
|
||||
async def is_available(self) -> bool:
|
||||
"""Check if Open WebUI is reachable."""
|
||||
try:
|
||||
client = _get_client()
|
||||
resp = await client.get(
|
||||
f"{self.base_url}/v1/models",
|
||||
headers=self._headers(),
|
||||
timeout=5,
|
||||
)
|
||||
return resp.status_code == 200
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
# ── Embedding Provider ────────────────────────────────────────────────
|
||||
|
||||
|
||||
class EmbeddingProvider:
|
||||
"""Generate embeddings via Ollama /api/embeddings."""
|
||||
|
||||
def __init__(self, model: str = "", node: Node = Node.ROADRUNNER):
|
||||
self.model = model or settings.DEFAULT_EMBEDDING_MODEL
|
||||
self.node = node
|
||||
self.base_url = _ollama_url(node)
|
||||
|
||||
async def embed(self, text: str) -> list[float]:
|
||||
"""Get embedding vector for a single text."""
|
||||
client = _get_client()
|
||||
resp = await client.post(
|
||||
f"{self.base_url}/api/embeddings",
|
||||
json={"model": self.model, "prompt": text},
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
return data.get("embedding", [])
|
||||
|
||||
async def embed_batch(self, texts: list[str], concurrency: int = 5) -> list[list[float]]:
|
||||
"""Embed multiple texts with controlled concurrency."""
|
||||
sem = asyncio.Semaphore(concurrency)
|
||||
|
||||
async def _embed_one(t: str) -> list[float]:
|
||||
async with sem:
|
||||
return await self.embed(t)
|
||||
|
||||
return await asyncio.gather(*[_embed_one(t) for t in texts])
|
||||
|
||||
|
||||
# ── Health check for all nodes ────────────────────────────────────────
|
||||
|
||||
|
||||
async def check_all_nodes() -> dict:
|
||||
"""Check availability of all LLM nodes."""
|
||||
wile = OllamaProvider("", Node.WILE)
|
||||
roadrunner = OllamaProvider("", Node.ROADRUNNER)
|
||||
cluster = OpenWebUIProvider()
|
||||
|
||||
wile_ok, rr_ok, cl_ok = await asyncio.gather(
|
||||
wile.is_available(),
|
||||
roadrunner.is_available(),
|
||||
cluster.is_available(),
|
||||
return_exceptions=True,
|
||||
)
|
||||
|
||||
return {
|
||||
"wile": {"available": wile_ok is True, "url": settings.wile_url},
|
||||
"roadrunner": {"available": rr_ok is True, "url": settings.roadrunner_url},
|
||||
"cluster": {"available": cl_ok is True, "url": settings.OPENWEBUI_URL},
|
||||
}
|
||||
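A minimal sketch of driving these providers directly (the model name and prompt are illustrative; import paths are assumed from the file locations above):

import asyncio

from app.agents.providers_v2 import OllamaProvider, check_all_nodes
from app.agents.registry import Node

async def main():
    health = await check_all_nodes()
    if health["roadrunner"]["available"]:
        llm = OllamaProvider("llama3.1:latest", Node.ROADRUNNER)
        result = await llm.generate(
            "List three pivots for a suspicious scheduled task.",
            system="You are a threat-hunting assistant.",
            max_tokens=512,
        )
        # generate() returns the raw Ollama payload plus _latency_ms and _node.
        print(result["response"], result["_latency_ms"], "ms on", result["_node"])

asyncio.run(main())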
161
backend/app/agents/registry.py
Normal file
@@ -0,0 +1,161 @@
"""Model registry — inventory of all Ollama models across Wile and Roadrunner.
|
||||
|
||||
Each model is tagged with capabilities (chat, code, vision, embedding) and
|
||||
performance tier (fast, medium, heavy) for the TaskRouter.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class Capability(str, Enum):
|
||||
CHAT = "chat"
|
||||
CODE = "code"
|
||||
VISION = "vision"
|
||||
EMBEDDING = "embedding"
|
||||
|
||||
|
||||
class Tier(str, Enum):
|
||||
FAST = "fast" # < 15B params — quick responses
|
||||
MEDIUM = "medium" # 15–40B params — balanced
|
||||
HEAVY = "heavy" # 40B+ params — deep analysis
|
||||
|
||||
|
||||
class Node(str, Enum):
|
||||
WILE = "wile"
|
||||
ROADRUNNER = "roadrunner"
|
||||
CLUSTER = "cluster" # Open WebUI balances across both
|
||||
|
||||
|
||||
@dataclass
|
||||
class ModelEntry:
|
||||
name: str
|
||||
node: Node
|
||||
capabilities: list[Capability]
|
||||
tier: Tier
|
||||
param_size: str = "" # e.g. "7b", "70b"
|
||||
notes: str = ""
|
||||
|
||||
|
||||
# ── Roadrunner (100.110.190.11) ──────────────────────────────────────
|
||||
|
||||
ROADRUNNER_MODELS: list[ModelEntry] = [
|
||||
# General / chat
|
||||
ModelEntry("llama3.1:latest", Node.ROADRUNNER, [Capability.CHAT], Tier.FAST, "8b"),
|
||||
ModelEntry("qwen2.5:14b-instruct", Node.ROADRUNNER, [Capability.CHAT], Tier.FAST, "14b"),
|
||||
ModelEntry("mistral:7b-instruct", Node.ROADRUNNER, [Capability.CHAT], Tier.FAST, "7b"),
|
||||
ModelEntry("mistral:7b", Node.ROADRUNNER, [Capability.CHAT], Tier.FAST, "7b"),
|
||||
ModelEntry("qwen2.5:7b", Node.ROADRUNNER, [Capability.CHAT], Tier.FAST, "7b"),
|
||||
ModelEntry("phi3:medium", Node.ROADRUNNER, [Capability.CHAT], Tier.MEDIUM, "14b"),
|
||||
# Code
|
||||
ModelEntry("qwen2.5-coder:7b", Node.ROADRUNNER, [Capability.CODE], Tier.FAST, "7b"),
|
||||
ModelEntry("qwen2.5-coder:latest", Node.ROADRUNNER, [Capability.CODE], Tier.FAST, "7b"),
|
||||
ModelEntry("codestral:latest", Node.ROADRUNNER, [Capability.CODE], Tier.MEDIUM, "22b"),
|
||||
ModelEntry("codellama:13b", Node.ROADRUNNER, [Capability.CODE], Tier.FAST, "13b"),
|
||||
# Vision
|
||||
ModelEntry("llama3.2-vision:11b", Node.ROADRUNNER, [Capability.VISION], Tier.FAST, "11b"),
|
||||
ModelEntry("minicpm-v:latest", Node.ROADRUNNER, [Capability.VISION], Tier.FAST, "8b"),
|
||||
ModelEntry("llava:13b", Node.ROADRUNNER, [Capability.VISION], Tier.FAST, "13b"),
|
||||
# Embeddings
|
||||
ModelEntry("bge-m3:latest", Node.ROADRUNNER, [Capability.EMBEDDING], Tier.FAST, "0.6b"),
|
||||
ModelEntry("nomic-embed-text:latest", Node.ROADRUNNER, [Capability.EMBEDDING], Tier.FAST, "0.1b"),
|
||||
# Heavy
|
||||
ModelEntry("llama3.1:70b-instruct-q4_K_M", Node.ROADRUNNER, [Capability.CHAT], Tier.HEAVY, "70b"),
|
||||
]
|
||||
|
||||
# ── Wile (100.110.190.12) ────────────────────────────────────────────
|
||||
|
||||
WILE_MODELS: list[ModelEntry] = [
|
||||
# General / chat
|
||||
ModelEntry("llama3.1:latest", Node.WILE, [Capability.CHAT], Tier.FAST, "8b"),
|
||||
ModelEntry("llama3:latest", Node.WILE, [Capability.CHAT], Tier.FAST, "8b"),
|
||||
ModelEntry("gemma2:27b", Node.WILE, [Capability.CHAT], Tier.MEDIUM, "27b"),
|
||||
# Code
|
||||
ModelEntry("qwen2.5-coder:7b", Node.WILE, [Capability.CODE], Tier.FAST, "7b"),
|
||||
ModelEntry("qwen2.5-coder:latest", Node.WILE, [Capability.CODE], Tier.FAST, "7b"),
|
||||
ModelEntry("qwen2.5-coder:32b", Node.WILE, [Capability.CODE], Tier.MEDIUM, "32b"),
|
||||
ModelEntry("deepseek-coder:33b", Node.WILE, [Capability.CODE], Tier.MEDIUM, "33b"),
|
||||
ModelEntry("codestral:latest", Node.WILE, [Capability.CODE], Tier.MEDIUM, "22b"),
|
||||
# Vision
|
||||
ModelEntry("llava:13b", Node.WILE, [Capability.VISION], Tier.FAST, "13b"),
|
||||
# Embeddings
|
||||
ModelEntry("bge-m3:latest", Node.WILE, [Capability.EMBEDDING], Tier.FAST, "0.6b"),
|
||||
# Heavy
|
||||
ModelEntry("llama3.1:70b", Node.WILE, [Capability.CHAT], Tier.HEAVY, "70b"),
|
||||
ModelEntry("llama3.1:70b-instruct-q4_K_M", Node.WILE, [Capability.CHAT], Tier.HEAVY, "70b"),
|
||||
ModelEntry("llama3.1:70b-instruct-q5_K_M", Node.WILE, [Capability.CHAT], Tier.HEAVY, "70b"),
|
||||
ModelEntry("mixtral:8x22b-instruct", Node.WILE, [Capability.CHAT], Tier.HEAVY, "141b"),
|
||||
ModelEntry("qwen2:72b-instruct", Node.WILE, [Capability.CHAT], Tier.HEAVY, "72b"),
|
||||
]
|
||||
|
||||
ALL_MODELS = ROADRUNNER_MODELS + WILE_MODELS
|
||||
|
||||
|
||||
class ModelRegistry:
|
||||
"""Registry of all available models and their capabilities."""
|
||||
|
||||
def __init__(self, models: list[ModelEntry] | None = None):
|
||||
self.models = models or ALL_MODELS
|
||||
self._by_name: dict[str, list[ModelEntry]] = {}
|
||||
self._by_capability: dict[Capability, list[ModelEntry]] = {}
|
||||
self._by_node: dict[Node, list[ModelEntry]] = {}
|
||||
self._index()
|
||||
|
||||
def _index(self):
|
||||
for m in self.models:
|
||||
self._by_name.setdefault(m.name, []).append(m)
|
||||
for cap in m.capabilities:
|
||||
self._by_capability.setdefault(cap, []).append(m)
|
||||
self._by_node.setdefault(m.node, []).append(m)
|
||||
|
||||
def find(
|
||||
self,
|
||||
capability: Capability | None = None,
|
||||
tier: Tier | None = None,
|
||||
node: Node | None = None,
|
||||
) -> list[ModelEntry]:
|
||||
"""Find models matching all given criteria."""
|
||||
results = list(self.models)
|
||||
if capability:
|
||||
results = [m for m in results if capability in m.capabilities]
|
||||
if tier:
|
||||
results = [m for m in results if m.tier == tier]
|
||||
if node:
|
||||
results = [m for m in results if m.node == node]
|
||||
return results
|
||||
|
||||
def get_best(
|
||||
self,
|
||||
capability: Capability,
|
||||
prefer_tier: Tier | None = None,
|
||||
prefer_node: Node | None = None,
|
||||
) -> ModelEntry | None:
|
||||
"""Get the best model for a capability, with optional preference."""
|
||||
candidates = self.find(capability=capability, tier=prefer_tier, node=prefer_node)
|
||||
if not candidates:
|
||||
candidates = self.find(capability=capability, tier=prefer_tier)
|
||||
if not candidates:
|
||||
candidates = self.find(capability=capability)
|
||||
return candidates[0] if candidates else None
|
||||
|
||||
def list_nodes(self) -> list[Node]:
|
||||
return list(self._by_node.keys())
|
||||
|
||||
def list_models_on_node(self, node: Node) -> list[ModelEntry]:
|
||||
return self._by_node.get(node, [])
|
||||
|
||||
def to_dict(self) -> list[dict]:
|
||||
return [
|
||||
{
|
||||
"name": m.name,
|
||||
"node": m.node.value,
|
||||
"capabilities": [c.value for c in m.capabilities],
|
||||
"tier": m.tier.value,
|
||||
"param_size": m.param_size,
|
||||
}
|
||||
for m in self.models
|
||||
]
|
||||
|
||||
|
||||
# Singleton
|
||||
registry = ModelRegistry()
|
||||
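How the registry singleton is typically queried (a sketch; the expected picks follow the entry order listed above):

from app.agents.registry import Capability, Node, Tier, registry

# Best heavy chat model, preferring Wile: the first HEAVY CHAT entry on Wile above.
entry = registry.get_best(Capability.CHAT, prefer_tier=Tier.HEAVY, prefer_node=Node.WILE)
print(entry.name, entry.node.value)  # llama3.1:70b on wile, given the current lists

# All vision-capable models hosted on Roadrunner.
for m in registry.find(capability=Capability.VISION, node=Node.ROADRUNNER):
    print(m.name, m.tier.value)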
183
backend/app/agents/router.py
Normal file
@@ -0,0 +1,183 @@
"""Task router — auto-selects the right model + node for each task type.
|
||||
|
||||
Routes based on task characteristics:
|
||||
- Quick chat → fast models via cluster
|
||||
- Deep analysis → 70B+ models on Wile
|
||||
- Code/script analysis → code models (32b on Wile, 7b for quick)
|
||||
- Vision/image → vision models on Roadrunner
|
||||
- Embedding → embedding models on either node
|
||||
"""
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
|
||||
from app.config import settings
|
||||
from .registry import Capability, Tier, Node, ModelEntry, registry
|
||||
from .providers_v2 import OllamaProvider, OpenWebUIProvider, EmbeddingProvider
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class TaskType(str, Enum):
|
||||
QUICK_CHAT = "quick_chat"
|
||||
DEEP_ANALYSIS = "deep_analysis"
|
||||
CODE_ANALYSIS = "code_analysis"
|
||||
VISION = "vision"
|
||||
EMBEDDING = "embedding"
|
||||
DEBATE_PLANNER = "debate_planner"
|
||||
DEBATE_CRITIC = "debate_critic"
|
||||
DEBATE_PRAGMATIST = "debate_pragmatist"
|
||||
DEBATE_JUDGE = "debate_judge"
|
||||
|
||||
|
||||
@dataclass
|
||||
class RoutingDecision:
|
||||
"""Result of the routing decision."""
|
||||
model: str
|
||||
node: Node
|
||||
task_type: TaskType
|
||||
provider_type: str # "ollama" or "openwebui"
|
||||
reason: str
|
||||
|
||||
|
||||
class TaskRouter:
|
||||
"""Routes tasks to the appropriate model and node."""
|
||||
|
||||
# Default routing rules: task_type → (capability, preferred_tier, preferred_node)
|
||||
ROUTING_RULES: dict[TaskType, tuple[Capability, Tier | None, Node | None]] = {
|
||||
TaskType.QUICK_CHAT: (Capability.CHAT, Tier.FAST, None),
|
||||
TaskType.DEEP_ANALYSIS: (Capability.CHAT, Tier.HEAVY, Node.WILE),
|
||||
TaskType.CODE_ANALYSIS: (Capability.CODE, Tier.MEDIUM, Node.WILE),
|
||||
TaskType.VISION: (Capability.VISION, None, Node.ROADRUNNER),
|
||||
TaskType.EMBEDDING: (Capability.EMBEDDING, Tier.FAST, None),
|
||||
TaskType.DEBATE_PLANNER: (Capability.CHAT, Tier.HEAVY, Node.WILE),
|
||||
TaskType.DEBATE_CRITIC: (Capability.CHAT, Tier.HEAVY, Node.WILE),
|
||||
TaskType.DEBATE_PRAGMATIST: (Capability.CHAT, Tier.HEAVY, Node.WILE),
|
||||
TaskType.DEBATE_JUDGE: (Capability.CHAT, Tier.MEDIUM, Node.WILE),
|
||||
}
|
||||
|
||||
# Specific model overrides for debate roles (use diverse models for diversity of thought)
|
||||
DEBATE_MODEL_OVERRIDES: dict[TaskType, str] = {
|
||||
TaskType.DEBATE_PLANNER: "llama3.1:70b-instruct-q4_K_M",
|
||||
TaskType.DEBATE_CRITIC: "qwen2:72b-instruct",
|
||||
TaskType.DEBATE_PRAGMATIST: "mixtral:8x22b-instruct",
|
||||
TaskType.DEBATE_JUDGE: "gemma2:27b",
|
||||
}
|
||||
|
||||
def __init__(self):
|
||||
self.registry = registry
|
||||
|
||||
def route(self, task_type: TaskType, model_override: str | None = None) -> RoutingDecision:
|
||||
"""Decide which model and node to use for a task."""
|
||||
|
||||
# Explicit model override
|
||||
if model_override:
|
||||
entries = self.registry.find()
|
||||
for entry in entries:
|
||||
if entry.name == model_override:
|
||||
return RoutingDecision(
|
||||
model=model_override,
|
||||
node=entry.node,
|
||||
task_type=task_type,
|
||||
provider_type="ollama",
|
||||
reason=f"Explicit model override: {model_override}",
|
||||
)
|
||||
# Model not in registry — try via cluster
|
||||
return RoutingDecision(
|
||||
model=model_override,
|
||||
node=Node.CLUSTER,
|
||||
task_type=task_type,
|
||||
provider_type="openwebui",
|
||||
reason=f"Override model {model_override} not in registry, routing to cluster",
|
||||
)
|
||||
|
||||
# Debate model overrides
|
||||
if task_type in self.DEBATE_MODEL_OVERRIDES:
|
||||
model_name = self.DEBATE_MODEL_OVERRIDES[task_type]
|
||||
entries = self.registry.find()
|
||||
for entry in entries:
|
||||
if entry.name == model_name:
|
||||
return RoutingDecision(
|
||||
model=model_name,
|
||||
node=entry.node,
|
||||
task_type=task_type,
|
||||
provider_type="ollama",
|
||||
reason=f"Debate role {task_type.value} → {model_name} on {entry.node.value}",
|
||||
)
|
||||
|
||||
# Standard routing
|
||||
cap, tier, node = self.ROUTING_RULES.get(
|
||||
task_type,
|
||||
(Capability.CHAT, Tier.FAST, None),
|
||||
)
|
||||
|
||||
entry = self.registry.get_best(cap, prefer_tier=tier, prefer_node=node)
|
||||
if entry:
|
||||
return RoutingDecision(
|
||||
model=entry.name,
|
||||
node=entry.node,
|
||||
task_type=task_type,
|
||||
provider_type="ollama",
|
||||
reason=f"Auto-routed {task_type.value}: {cap.value}/{tier.value if tier else 'any'} → {entry.name} on {entry.node.value}",
|
||||
)
|
||||
|
||||
# Fallback to cluster
|
||||
default_model = settings.DEFAULT_FAST_MODEL
|
||||
return RoutingDecision(
|
||||
model=default_model,
|
||||
node=Node.CLUSTER,
|
||||
task_type=task_type,
|
||||
provider_type="openwebui",
|
||||
reason=f"No registry match, falling back to cluster with {default_model}",
|
||||
)
|
||||
|
||||
def get_provider(self, decision: RoutingDecision):
|
||||
"""Create the appropriate provider for a routing decision."""
|
||||
if decision.provider_type == "openwebui":
|
||||
return OpenWebUIProvider(model=decision.model)
|
||||
else:
|
||||
return OllamaProvider(model=decision.model, node=decision.node)
|
||||
|
||||
def get_embedding_provider(self, model: str | None = None, node: Node | None = None) -> EmbeddingProvider:
|
||||
"""Get an embedding provider."""
|
||||
return EmbeddingProvider(
|
||||
model=model or settings.DEFAULT_EMBEDDING_MODEL,
|
||||
node=node or Node.ROADRUNNER,
|
||||
)
|
||||
|
||||
def classify_task(self, query: str, has_image: bool = False) -> TaskType:
|
||||
"""Heuristic classification of query into task type.
|
||||
|
||||
In practice this could be enhanced by a classifier model, but
|
||||
keyword heuristics work well for routing.
|
||||
"""
|
||||
if has_image:
|
||||
return TaskType.VISION
|
||||
|
||||
q = query.lower()
|
||||
|
||||
# Code/script indicators
|
||||
code_indicators = [
|
||||
"deobfuscate", "decode", "powershell", "script", "base64",
|
||||
"command line", "cmdline", "commandline", "obfuscated",
|
||||
"malware", "shellcode", "vbs", "vbscript", "batch",
|
||||
"python script", "code review", "reverse engineer",
|
||||
]
|
||||
if any(ind in q for ind in code_indicators):
|
||||
return TaskType.CODE_ANALYSIS
|
||||
|
||||
# Deep analysis indicators
|
||||
deep_indicators = [
|
||||
"deep analysis", "detailed", "comprehensive", "thorough",
|
||||
"investigate", "root cause", "advanced", "explain in detail",
|
||||
"full analysis", "forensic",
|
||||
]
|
||||
if any(ind in q for ind in deep_indicators):
|
||||
return TaskType.DEEP_ANALYSIS
|
||||
|
||||
return TaskType.QUICK_CHAT
|
||||
|
||||
|
||||
# Singleton
|
||||
task_router = TaskRouter()
|
||||
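A sketch of using the router end to end (the query text is illustrative; the expected routing follows ROUTING_RULES and the registry contents above):

from app.agents.router import task_router

query = "Deobfuscate this base64 PowerShell command line"
task = task_router.classify_task(query)        # "deobfuscate"/"powershell" -> CODE_ANALYSIS
decision = task_router.route(task)             # CODE/MEDIUM on Wile -> qwen2.5-coder:32b
provider = task_router.get_provider(decision)  # OllamaProvider or OpenWebUIProvider
print(decision.model, decision.node.value, decision.reason)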
1
backend/app/api/__init__.py
Normal file
@@ -0,0 +1 @@
"""API routes initialization."""
1
backend/app/api/routes/__init__.py
Normal file
@@ -0,0 +1 @@
"""API route modules."""
170
backend/app/api/routes/agent.py
Normal file
@@ -0,0 +1,170 @@
"""API routes for analyst-assist agent."""
|
||||
|
||||
import logging
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from app.agents.core import ThreatHuntAgent, AgentContext, AgentResponse
|
||||
from app.agents.config import AgentConfig
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/api/agent", tags=["agent"])
|
||||
|
||||
# Global agent instance (lazy-loaded)
|
||||
_agent: ThreatHuntAgent | None = None
|
||||
|
||||
|
||||
def get_agent() -> ThreatHuntAgent:
|
||||
"""Get or create the agent instance."""
|
||||
global _agent
|
||||
if _agent is None:
|
||||
if not AgentConfig.is_agent_enabled():
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail="Analyst-assist agent is not configured. "
|
||||
"Please configure an LLM provider.",
|
||||
)
|
||||
_agent = ThreatHuntAgent()
|
||||
return _agent
|
||||
|
||||
|
||||
class AssistRequest(BaseModel):
|
||||
"""Request for agent assistance."""
|
||||
|
||||
query: str = Field(
|
||||
..., description="Analyst question or request for guidance"
|
||||
)
|
||||
dataset_name: str | None = Field(
|
||||
None, description="Name of CSV dataset being analyzed"
|
||||
)
|
||||
artifact_type: str | None = Field(
|
||||
None, description="Type of artifact (e.g., FileList, ProcessList, NetworkConnections)"
|
||||
)
|
||||
host_identifier: str | None = Field(
|
||||
None, description="Host name, IP address, or identifier"
|
||||
)
|
||||
data_summary: str | None = Field(
|
||||
None, description="Brief summary or context about the uploaded data"
|
||||
)
|
||||
conversation_history: list[dict] | None = Field(
|
||||
None, description="Previous messages for context"
|
||||
)
|
||||
|
||||
|
||||
class AssistResponse(BaseModel):
|
||||
"""Response with agent guidance."""
|
||||
|
||||
guidance: str
|
||||
confidence: float
|
||||
suggested_pivots: list[str]
|
||||
suggested_filters: list[str]
|
||||
caveats: str | None = None
|
||||
reasoning: str | None = None
|
||||
|
||||
|
||||
@router.post(
|
||||
"/assist",
|
||||
response_model=AssistResponse,
|
||||
summary="Get analyst-assist guidance",
|
||||
description="Request guidance on CSV artifact data, analytical pivots, and hypotheses. "
|
||||
"Agent provides advisory guidance only - no execution.",
|
||||
)
|
||||
async def agent_assist(request: AssistRequest) -> AssistResponse:
|
||||
"""Provide analyst-assist guidance on artifact data.
|
||||
|
||||
The agent will:
|
||||
- Explain and interpret the provided data context
|
||||
- Suggest analytical pivots the analyst might explore
|
||||
- Suggest data filters or queries that might be useful
|
||||
- Highlight assumptions, limitations, and caveats
|
||||
|
||||
The agent will NOT:
|
||||
- Execute any tools or actions
|
||||
- Escalate findings to alerts
|
||||
- Modify any data or schema
|
||||
- Make autonomous decisions
|
||||
|
||||
Args:
|
||||
request: Assistance request with query and context
|
||||
|
||||
Returns:
|
||||
Guidance response with suggestions and reasoning
|
||||
|
||||
Raises:
|
||||
HTTPException: If agent is not configured (503) or request fails
|
||||
"""
|
||||
try:
|
||||
agent = get_agent()
|
||||
|
||||
# Build context
|
||||
context = AgentContext(
|
||||
query=request.query,
|
||||
dataset_name=request.dataset_name,
|
||||
artifact_type=request.artifact_type,
|
||||
host_identifier=request.host_identifier,
|
||||
data_summary=request.data_summary,
|
||||
conversation_history=request.conversation_history or [],
|
||||
)
|
||||
|
||||
# Get guidance
|
||||
response = await agent.assist(context)
|
||||
|
||||
logger.info(
|
||||
f"Agent assisted analyst with query: {request.query[:50]}... "
|
||||
f"(host: {request.host_identifier}, artifact: {request.artifact_type})"
|
||||
)
|
||||
|
||||
return AssistResponse(
|
||||
guidance=response.guidance,
|
||||
confidence=response.confidence,
|
||||
suggested_pivots=response.suggested_pivots,
|
||||
suggested_filters=response.suggested_filters,
|
||||
caveats=response.caveats,
|
||||
reasoning=response.reasoning,
|
||||
)
|
||||
|
||||
except RuntimeError as e:
|
||||
logger.error(f"Agent error: {e}")
|
||||
raise HTTPException(
|
||||
status_code=503,
|
||||
detail=f"Agent unavailable: {str(e)}",
|
||||
)
|
||||
except Exception as e:
|
||||
logger.exception(f"Unexpected error in agent_assist: {e}")
|
||||
raise HTTPException(
|
||||
status_code=500,
|
||||
detail="Error generating guidance. Please try again.",
|
||||
)
|
||||
|
||||
|
||||
@router.get(
|
||||
"/health",
|
||||
summary="Check agent health",
|
||||
description="Check if agent is configured and ready to assist.",
|
||||
)
|
||||
async def agent_health() -> dict:
|
||||
"""Check agent availability and configuration.
|
||||
|
||||
Returns:
|
||||
Health status with configuration details
|
||||
"""
|
||||
try:
|
||||
agent = get_agent()
|
||||
provider_type = agent.provider.__class__.__name__ if agent.provider else "None"
|
||||
return {
|
||||
"status": "healthy",
|
||||
"provider": provider_type,
|
||||
"max_tokens": AgentConfig.MAX_RESPONSE_TOKENS,
|
||||
"reasoning_enabled": AgentConfig.ENABLE_REASONING,
|
||||
}
|
||||
except HTTPException:
|
||||
return {
|
||||
"status": "unavailable",
|
||||
"reason": "No LLM provider configured",
|
||||
"configured_providers": {
|
||||
"local": bool(AgentConfig.LOCAL_MODEL_PATH),
|
||||
"networked": bool(AgentConfig.NETWORKED_ENDPOINT),
|
||||
"online": bool(AgentConfig.ONLINE_API_KEY),
|
||||
},
|
||||
}
|
||||
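A client-side sketch of calling this route (the host and port assume the local dev defaults from .env.example; the host identifier is invented):

import httpx

payload = {
    "query": "What pivots should I try on this FileList export?",
    "artifact_type": "FileList",
    "host_identifier": "WKSTN-042",  # illustrative
}
resp = httpx.post("http://localhost:8000/api/agent/assist", json=payload, timeout=120)
resp.raise_for_status()
body = resp.json()
print(body["guidance"])
print(body["suggested_pivots"])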
265
backend/app/api/routes/agent_v2.py
Normal file
@@ -0,0 +1,265 @@
"""API routes for analyst-assist agent — v2.
|
||||
|
||||
Supports quick, deep, and debate modes with streaming.
|
||||
Conversations are persisted to the database.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
|
||||
from fastapi import APIRouter, Depends, HTTPException, Query
|
||||
from fastapi.responses import StreamingResponse
|
||||
from pydantic import BaseModel, Field
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.config import settings
|
||||
from app.db import get_db
|
||||
from app.db.models import Conversation, Message
|
||||
from app.agents.core_v2 import ThreatHuntAgent, AgentContext, AgentResponse, Perspective
|
||||
from app.agents.providers_v2 import check_all_nodes
|
||||
from app.agents.registry import registry
|
||||
from app.services.sans_rag import sans_rag
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/api/agent", tags=["agent"])
|
||||
|
||||
# Global agent instance
|
||||
_agent: ThreatHuntAgent | None = None
|
||||
|
||||
|
||||
def get_agent() -> ThreatHuntAgent:
|
||||
global _agent
|
||||
if _agent is None:
|
||||
_agent = ThreatHuntAgent()
|
||||
return _agent
|
||||
|
||||
|
||||
# ── Request / Response models ─────────────────────────────────────────
|
||||
|
||||
|
||||
class AssistRequest(BaseModel):
|
||||
query: str = Field(..., max_length=4000, description="Analyst question")
|
||||
dataset_name: str | None = None
|
||||
artifact_type: str | None = None
|
||||
host_identifier: str | None = None
|
||||
data_summary: str | None = None
|
||||
conversation_history: list[dict] | None = None
|
||||
active_hypotheses: list[str] | None = None
|
||||
annotations_summary: str | None = None
|
||||
enrichment_summary: str | None = None
|
||||
mode: str = Field(default="quick", description="quick | deep | debate")
|
||||
model_override: str | None = None
|
||||
conversation_id: str | None = Field(None, description="Persist messages to this conversation")
|
||||
hunt_id: str | None = None
|
||||
|
||||
|
||||
class AssistResponseModel(BaseModel):
|
||||
guidance: str
|
||||
confidence: float
|
||||
suggested_pivots: list[str]
|
||||
suggested_filters: list[str]
|
||||
caveats: str | None = None
|
||||
reasoning: str | None = None
|
||||
sans_references: list[str] = []
|
||||
model_used: str = ""
|
||||
node_used: str = ""
|
||||
latency_ms: int = 0
|
||||
perspectives: list[dict] | None = None
|
||||
conversation_id: str | None = None
|
||||
|
||||
|
||||
# ── Routes ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@router.post(
|
||||
"/assist",
|
||||
response_model=AssistResponseModel,
|
||||
summary="Get analyst-assist guidance",
|
||||
description="Request guidance with auto-routed model selection. "
|
||||
"Supports quick (fast), deep (70B), and debate (multi-model) modes.",
|
||||
)
|
||||
async def agent_assist(
|
||||
request: AssistRequest,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
) -> AssistResponseModel:
|
||||
try:
|
||||
agent = get_agent()
|
||||
context = AgentContext(
|
||||
query=request.query,
|
||||
dataset_name=request.dataset_name,
|
||||
artifact_type=request.artifact_type,
|
||||
host_identifier=request.host_identifier,
|
||||
data_summary=request.data_summary,
|
||||
conversation_history=request.conversation_history or [],
|
||||
active_hypotheses=request.active_hypotheses or [],
|
||||
annotations_summary=request.annotations_summary,
|
||||
enrichment_summary=request.enrichment_summary,
|
||||
mode=request.mode,
|
||||
model_override=request.model_override,
|
||||
)
|
||||
|
||||
response = await agent.assist(context)
|
||||
|
||||
# Persist conversation
|
||||
conv_id = request.conversation_id
|
||||
if conv_id or request.hunt_id:
|
||||
conv_id = await _persist_conversation(
|
||||
db, conv_id, request, response
|
||||
)
|
||||
|
||||
return AssistResponseModel(
|
||||
guidance=response.guidance,
|
||||
confidence=response.confidence,
|
||||
suggested_pivots=response.suggested_pivots,
|
||||
suggested_filters=response.suggested_filters,
|
||||
caveats=response.caveats,
|
||||
reasoning=response.reasoning,
|
||||
sans_references=response.sans_references,
|
||||
model_used=response.model_used,
|
||||
node_used=response.node_used,
|
||||
latency_ms=response.latency_ms,
|
||||
perspectives=[
|
||||
{
|
||||
"role": p.role,
|
||||
"content": p.content,
|
||||
"model_used": p.model_used,
|
||||
"node_used": p.node_used,
|
||||
"latency_ms": p.latency_ms,
|
||||
}
|
||||
for p in response.perspectives
|
||||
] if response.perspectives else None,
|
||||
conversation_id=conv_id,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.exception(f"Agent error: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"Agent error: {str(e)}")
|
||||
|
||||
|
||||
@router.post(
|
||||
"/assist/stream",
|
||||
summary="Stream agent response",
|
||||
description="Stream tokens via SSE for real-time display.",
|
||||
)
|
||||
async def agent_assist_stream(request: AssistRequest):
|
||||
agent = get_agent()
|
||||
context = AgentContext(
|
||||
query=request.query,
|
||||
dataset_name=request.dataset_name,
|
||||
artifact_type=request.artifact_type,
|
||||
host_identifier=request.host_identifier,
|
||||
data_summary=request.data_summary,
|
||||
conversation_history=request.conversation_history or [],
|
||||
mode="quick", # streaming only supports quick mode
|
||||
)
|
||||
|
||||
async def _stream():
|
||||
async for token in agent.assist_stream(context):
|
||||
yield f"data: {json.dumps({'token': token})}\n\n"
|
||||
yield "data: [DONE]\n\n"
|
||||
|
||||
return StreamingResponse(
|
||||
_stream(),
|
||||
media_type="text/event-stream",
|
||||
headers={
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"X-Accel-Buffering": "no",
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@router.get(
|
||||
"/health",
|
||||
summary="Check agent and node health",
|
||||
description="Returns availability of all LLM nodes and the cluster.",
|
||||
)
|
||||
async def agent_health() -> dict:
|
||||
nodes = await check_all_nodes()
|
||||
rag_health = await sans_rag.health_check()
|
||||
return {
|
||||
"status": "healthy",
|
||||
"nodes": nodes,
|
||||
"rag": rag_health,
|
||||
"default_models": {
|
||||
"fast": settings.DEFAULT_FAST_MODEL,
|
||||
"heavy": settings.DEFAULT_HEAVY_MODEL,
|
||||
"code": settings.DEFAULT_CODE_MODEL,
|
||||
"vision": settings.DEFAULT_VISION_MODEL,
|
||||
"embedding": settings.DEFAULT_EMBEDDING_MODEL,
|
||||
},
|
||||
"config": {
|
||||
"max_tokens": settings.AGENT_MAX_TOKENS,
|
||||
"temperature": settings.AGENT_TEMPERATURE,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
@router.get(
|
||||
"/models",
|
||||
summary="List all available models",
|
||||
description="Returns the full model registry with capabilities and node assignments.",
|
||||
)
|
||||
async def list_models():
|
||||
return {
|
||||
"models": registry.to_dict(),
|
||||
"total": len(registry.models),
|
||||
}
|
||||
|
||||
|
||||
# ── Conversation persistence ──────────────────────────────────────────
|
||||
|
||||
|
||||
async def _persist_conversation(
|
||||
db: AsyncSession,
|
||||
conversation_id: str | None,
|
||||
request: AssistRequest,
|
||||
response: AgentResponse,
|
||||
) -> str:
|
||||
"""Save user message and agent response to the database."""
|
||||
if conversation_id:
|
||||
# Find existing conversation
|
||||
from sqlalchemy import select
|
||||
result = await db.execute(
|
||||
select(Conversation).where(Conversation.id == conversation_id)
|
||||
)
|
||||
conv = result.scalar_one_or_none()
|
||||
if not conv:
|
||||
conv = Conversation(id=conversation_id, hunt_id=request.hunt_id)
|
||||
db.add(conv)
|
||||
else:
|
||||
conv = Conversation(
|
||||
title=request.query[:100],
|
||||
hunt_id=request.hunt_id,
|
||||
)
|
||||
db.add(conv)
|
||||
await db.flush()
|
||||
|
||||
# User message
|
||||
user_msg = Message(
|
||||
conversation_id=conv.id,
|
||||
role="user",
|
||||
content=request.query,
|
||||
)
|
||||
db.add(user_msg)
|
||||
|
||||
# Agent message
|
||||
agent_msg = Message(
|
||||
conversation_id=conv.id,
|
||||
role="agent",
|
||||
content=response.guidance,
|
||||
model_used=response.model_used,
|
||||
node_used=response.node_used,
|
||||
latency_ms=response.latency_ms,
|
||||
response_meta={
|
||||
"confidence": response.confidence,
|
||||
"pivots": response.suggested_pivots,
|
||||
"filters": response.suggested_filters,
|
||||
"sans_refs": response.sans_references,
|
||||
},
|
||||
)
|
||||
db.add(agent_msg)
|
||||
await db.flush()
|
||||
|
||||
return conv.id
|
||||
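Consuming the SSE stream above from a client looks roughly like this (an async httpx sketch; the URL and query are illustrative):

import asyncio
import json

import httpx

async def main():
    payload = {"query": "Summarise suspicious autoruns on this host", "mode": "quick"}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8000/api/agent/assist/stream", json=payload
        ) as resp:
            async for line in resp.aiter_lines():
                if not line.startswith("data: "):
                    continue
                data = line[6:]
                if data == "[DONE]":
                    break
                # Each event carries one token, matching the server's _stream() generator.
                print(json.loads(data)["token"], end="", flush=True)

asyncio.run(main())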
402
backend/app/api/routes/analysis.py
Normal file
@@ -0,0 +1,402 @@
"""Analysis API routes - triage, host profiles, reports, IOC extraction,
|
||||
host grouping, anomaly detection, data query (SSE), and job management."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query
|
||||
from fastapi.responses import StreamingResponse
|
||||
from pydantic import BaseModel
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db import get_db
|
||||
from app.db.models import HostProfile, HuntReport, TriageResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
router = APIRouter(prefix="/api/analysis", tags=["analysis"])
|
||||
|
||||
|
||||
# --- Response models ---
|
||||
|
||||
class TriageResultResponse(BaseModel):
|
||||
id: str
|
||||
dataset_id: str
|
||||
row_start: int
|
||||
row_end: int
|
||||
risk_score: float
|
||||
verdict: str
|
||||
findings: list | None = None
|
||||
suspicious_indicators: list | None = None
|
||||
mitre_techniques: list | None = None
|
||||
model_used: str | None = None
|
||||
node_used: str | None = None
|
||||
|
||||
class Config:
|
||||
from_attributes = True
|
||||
|
||||
|
||||
class HostProfileResponse(BaseModel):
|
||||
id: str
|
||||
hunt_id: str
|
||||
hostname: str
|
||||
fqdn: str | None = None
|
||||
risk_score: float
|
||||
risk_level: str
|
||||
artifact_summary: dict | None = None
|
||||
timeline_summary: str | None = None
|
||||
suspicious_findings: list | None = None
|
||||
mitre_techniques: list | None = None
|
||||
llm_analysis: str | None = None
|
||||
model_used: str | None = None
|
||||
|
||||
class Config:
|
||||
from_attributes = True
|
||||
|
||||
|
||||
class HuntReportResponse(BaseModel):
|
||||
id: str
|
||||
hunt_id: str
|
||||
status: str
|
||||
exec_summary: str | None = None
|
||||
full_report: str | None = None
|
||||
findings: list | None = None
|
||||
recommendations: list | None = None
|
||||
mitre_mapping: dict | None = None
|
||||
ioc_table: list | None = None
|
||||
host_risk_summary: list | None = None
|
||||
models_used: list | None = None
|
||||
generation_time_ms: int | None = None
|
||||
|
||||
class Config:
|
||||
from_attributes = True
|
||||
|
||||
|
||||
class QueryRequest(BaseModel):
|
||||
question: str
|
||||
mode: str = "quick" # quick or deep
|
||||
|
||||
|
||||
# --- Triage endpoints ---
|
||||
|
||||
@router.get("/triage/{dataset_id}", response_model=list[TriageResultResponse])
|
||||
async def get_triage_results(
|
||||
dataset_id: str,
|
||||
min_risk: float = Query(0.0, ge=0.0, le=10.0),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
result = await db.execute(
|
||||
select(TriageResult)
|
||||
.where(TriageResult.dataset_id == dataset_id)
|
||||
.where(TriageResult.risk_score >= min_risk)
|
||||
.order_by(TriageResult.risk_score.desc())
|
||||
)
|
||||
return result.scalars().all()
|
||||
|
||||
|
||||
@router.post("/triage/{dataset_id}")
|
||||
async def trigger_triage(
|
||||
dataset_id: str,
|
||||
background_tasks: BackgroundTasks,
|
||||
):
|
||||
async def _run():
|
||||
from app.services.triage import triage_dataset
|
||||
await triage_dataset(dataset_id)
|
||||
|
||||
background_tasks.add_task(_run)
|
||||
return {"status": "triage_started", "dataset_id": dataset_id}
|
||||
|
||||
|
||||
# --- Host profile endpoints ---
|
||||
|
||||
@router.get("/profiles/{hunt_id}", response_model=list[HostProfileResponse])
|
||||
async def get_host_profiles(
|
||||
hunt_id: str,
|
||||
min_risk: float = Query(0.0, ge=0.0, le=10.0),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
result = await db.execute(
|
||||
select(HostProfile)
|
||||
.where(HostProfile.hunt_id == hunt_id)
|
||||
.where(HostProfile.risk_score >= min_risk)
|
||||
.order_by(HostProfile.risk_score.desc())
|
||||
)
|
||||
return result.scalars().all()
|
||||
|
||||
|
||||
@router.post("/profiles/{hunt_id}")
|
||||
async def trigger_all_profiles(
|
||||
hunt_id: str,
|
||||
background_tasks: BackgroundTasks,
|
||||
):
|
||||
async def _run():
|
||||
from app.services.host_profiler import profile_all_hosts
|
||||
await profile_all_hosts(hunt_id)
|
||||
|
||||
background_tasks.add_task(_run)
|
||||
return {"status": "profiling_started", "hunt_id": hunt_id}
|
||||
|
||||
|
||||
@router.post("/profiles/{hunt_id}/{hostname}")
|
||||
async def trigger_single_profile(
|
||||
hunt_id: str,
|
||||
hostname: str,
|
||||
background_tasks: BackgroundTasks,
|
||||
):
|
||||
async def _run():
|
||||
from app.services.host_profiler import profile_host
|
||||
await profile_host(hunt_id, hostname)
|
||||
|
||||
background_tasks.add_task(_run)
|
||||
return {"status": "profiling_started", "hunt_id": hunt_id, "hostname": hostname}
|
||||
|
||||
|
||||
# --- Report endpoints ---
|
||||
|
||||
@router.get("/reports/{hunt_id}", response_model=list[HuntReportResponse])
|
||||
async def list_reports(
|
||||
hunt_id: str,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
result = await db.execute(
|
||||
select(HuntReport)
|
||||
.where(HuntReport.hunt_id == hunt_id)
|
||||
.order_by(HuntReport.created_at.desc())
|
||||
)
|
||||
return result.scalars().all()
|
||||
|
||||
|
||||
@router.get("/reports/{hunt_id}/{report_id}", response_model=HuntReportResponse)
|
||||
async def get_report(
|
||||
hunt_id: str,
|
||||
report_id: str,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
result = await db.execute(
|
||||
select(HuntReport)
|
||||
.where(HuntReport.id == report_id)
|
||||
.where(HuntReport.hunt_id == hunt_id)
|
||||
)
|
||||
report = result.scalar_one_or_none()
|
||||
if not report:
|
||||
raise HTTPException(status_code=404, detail="Report not found")
|
||||
return report
|
||||
|
||||
|
||||
@router.post("/reports/{hunt_id}/generate")
|
||||
async def trigger_report(
|
||||
hunt_id: str,
|
||||
background_tasks: BackgroundTasks,
|
||||
):
|
||||
async def _run():
|
||||
from app.services.report_generator import generate_report
|
||||
await generate_report(hunt_id)
|
||||
|
||||
background_tasks.add_task(_run)
|
||||
return {"status": "report_generation_started", "hunt_id": hunt_id}
|
||||
|
||||
|
||||
# --- IOC extraction endpoints ---
|
||||
|
||||
@router.get("/iocs/{dataset_id}")
|
||||
async def extract_iocs(
|
||||
dataset_id: str,
|
||||
max_rows: int = Query(5000, ge=1, le=50000),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Extract IOCs (IPs, domains, hashes, etc.) from dataset rows."""
|
||||
from app.services.ioc_extractor import extract_iocs_from_dataset
|
||||
iocs = await extract_iocs_from_dataset(dataset_id, db, max_rows=max_rows)
|
||||
total = sum(len(v) for v in iocs.values())
|
||||
return {"dataset_id": dataset_id, "iocs": iocs, "total": total}
|
||||
|
||||
|
||||
# --- Host grouping endpoints ---
|
||||
|
||||
@router.get("/hosts/{hunt_id}")
|
||||
async def get_host_groups(
|
||||
hunt_id: str,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Group data by hostname across all datasets in a hunt."""
|
||||
from app.services.ioc_extractor import extract_host_groups
|
||||
groups = await extract_host_groups(hunt_id, db)
|
||||
return {"hunt_id": hunt_id, "hosts": groups}
|
||||
|
||||
|
||||
# --- Anomaly detection endpoints ---
|
||||
|
||||
@router.get("/anomalies/{dataset_id}")
|
||||
async def get_anomalies(
|
||||
dataset_id: str,
|
||||
outliers_only: bool = Query(False),
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
"""Get anomaly detection results for a dataset."""
|
||||
from app.db.models import AnomalyResult
|
||||
stmt = select(AnomalyResult).where(AnomalyResult.dataset_id == dataset_id)
|
||||
if outliers_only:
|
||||
stmt = stmt.where(AnomalyResult.is_outlier == True)
|
||||
stmt = stmt.order_by(AnomalyResult.anomaly_score.desc())
|
||||
result = await db.execute(stmt)
|
||||
rows = result.scalars().all()
|
||||
return [
|
||||
{
|
||||
"id": r.id,
|
||||
"dataset_id": r.dataset_id,
|
||||
"row_id": r.row_id,
|
||||
"anomaly_score": r.anomaly_score,
|
||||
"distance_from_centroid": r.distance_from_centroid,
|
||||
"cluster_id": r.cluster_id,
|
||||
"is_outlier": r.is_outlier,
|
||||
"explanation": r.explanation,
|
||||
}
|
||||
for r in rows
|
||||
]
|
||||
|
||||
|
||||
@router.post("/anomalies/{dataset_id}")
|
||||
async def trigger_anomaly_detection(
|
||||
dataset_id: str,
|
||||
k: int = Query(3, ge=2, le=20),
|
||||
threshold: float = Query(0.35, ge=0.1, le=0.9),
|
||||
background_tasks: BackgroundTasks = None,
|
||||
):
|
||||
"""Trigger embedding-based anomaly detection on a dataset."""
|
||||
async def _run():
|
||||
from app.services.anomaly_detector import detect_anomalies
|
||||
await detect_anomalies(dataset_id, k=k, outlier_threshold=threshold)
|
||||
|
||||
if background_tasks:
|
||||
background_tasks.add_task(_run)
|
||||
return {"status": "anomaly_detection_started", "dataset_id": dataset_id}
|
||||
else:
|
||||
from app.services.anomaly_detector import detect_anomalies
|
||||
results = await detect_anomalies(dataset_id, k=k, outlier_threshold=threshold)
|
||||
return {"status": "complete", "dataset_id": dataset_id, "count": len(results)}
|
||||
|
||||
|
||||
# --- Natural language data query (SSE streaming) ---
|
||||
|
||||
@router.post("/query/{dataset_id}")
|
||||
async def query_dataset_endpoint(
|
||||
dataset_id: str,
|
||||
body: QueryRequest,
|
||||
):
|
||||
"""Ask a natural language question about a dataset.
|
||||
|
||||
Returns an SSE stream with token-by-token LLM response.
|
||||
Event types: status, metadata, token, error, done
|
||||
"""
|
||||
from app.services.data_query import query_dataset_stream
|
||||
|
||||
return StreamingResponse(
|
||||
query_dataset_stream(dataset_id, body.question, body.mode),
|
||||
media_type="text/event-stream",
|
||||
headers={
|
||||
"Cache-Control": "no-cache",
|
||||
"Connection": "keep-alive",
|
||||
"X-Accel-Buffering": "no",
|
||||
},
|
||||
)
|
||||
|
||||
|
@router.post("/query/{dataset_id}/sync")
async def query_dataset_sync(
    dataset_id: str,
    body: QueryRequest,
):
    """Non-streaming version of data query."""
    from app.services.data_query import query_dataset

    try:
        answer = await query_dataset(dataset_id, body.question, body.mode)
        return {"dataset_id": dataset_id, "question": body.question, "answer": answer, "mode": body.mode}
    except ValueError as e:
        raise HTTPException(status_code=404, detail=str(e))
    except Exception as e:
        logger.error(f"Query failed: {e}", exc_info=True)
        raise HTTPException(status_code=500, detail=str(e))


# --- Job queue endpoints ---

@router.get("/jobs")
async def list_jobs(
    status: str | None = Query(None),
    job_type: str | None = Query(None),
    limit: int = Query(50, ge=1, le=200),
):
    """List all tracked jobs."""
    from app.services.job_queue import job_queue, JobStatus, JobType

    s = JobStatus(status) if status else None
    t = JobType(job_type) if job_type else None
    jobs = job_queue.list_jobs(status=s, job_type=t, limit=limit)
    stats = job_queue.get_stats()
    return {"jobs": jobs, "stats": stats}


@router.get("/jobs/{job_id}")
async def get_job(job_id: str):
    """Get status of a specific job."""
    from app.services.job_queue import job_queue

    job = job_queue.get_job(job_id)
    if not job:
        raise HTTPException(status_code=404, detail="Job not found")
    return job.to_dict()


@router.delete("/jobs/{job_id}")
async def cancel_job(job_id: str):
    """Cancel a running or queued job."""
    from app.services.job_queue import job_queue

    if job_queue.cancel_job(job_id):
        return {"status": "cancelled", "job_id": job_id}
    raise HTTPException(status_code=400, detail="Job cannot be cancelled (already complete or not found)")


@router.post("/jobs/submit/{job_type}")
async def submit_job(
    job_type: str,
    params: dict = {},
):
    """Submit a new job to the queue.

    Job types: triage, host_profile, report, anomaly, query
    Params vary by type (e.g., dataset_id, hunt_id, question, mode).
    """
    from app.services.job_queue import job_queue, JobType

    try:
        jt = JobType(job_type)
    except ValueError:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid job_type: {job_type}. Valid: {[t.value for t in JobType]}",
        )

    job = job_queue.submit(jt, **params)
    return {"job_id": job.id, "status": job.status.value, "job_type": job_type}

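# Illustrative sketch (not part of the original commit): submitting and polling a job
# over HTTP. The host and router prefix are assumptions; params depend on the job type.
#
#   import httpx
#
#   resp = httpx.post(
#       "http://localhost:8000/jobs/submit/triage",      # prepend the real router prefix
#       json={"dataset_id": "my-dataset-id"},
#   )
#   job_id = resp.json()["job_id"]
#   print(httpx.get(f"http://localhost:8000/jobs/{job_id}").json())
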
# --- Load balancer status ---

@router.get("/lb/status")
async def lb_status():
    """Get load balancer status for both nodes."""
    from app.services.load_balancer import lb
    return lb.get_status()


@router.post("/lb/check")
async def lb_health_check():
    """Force a health check of both nodes."""
    from app.services.load_balancer import lb
    await lb.check_health()
    return lb.get_status()
311
backend/app/api/routes/annotations.py
Normal file
@@ -0,0 +1,311 @@
"""API routes for annotations and hypotheses."""

import logging

from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.db.models import Annotation, Hypothesis

logger = logging.getLogger(__name__)

router = APIRouter(tags=["annotations"])


# ── Annotation models ─────────────────────────────────────────────────


class AnnotationCreate(BaseModel):
    row_id: int | None = None
    dataset_id: str | None = None
    text: str = Field(..., max_length=2000)
    severity: str = Field(default="info")  # info|low|medium|high|critical
    tag: str | None = None  # suspicious|benign|needs-review
    highlight_color: str | None = None


class AnnotationUpdate(BaseModel):
    text: str | None = None
    severity: str | None = None
    tag: str | None = None
    highlight_color: str | None = None


class AnnotationResponse(BaseModel):
    id: str
    row_id: int | None
    dataset_id: str | None
    author_id: str | None
    text: str
    severity: str
    tag: str | None
    highlight_color: str | None
    created_at: str
    updated_at: str


class AnnotationListResponse(BaseModel):
    annotations: list[AnnotationResponse]
    total: int


# ── Hypothesis models ─────────────────────────────────────────────────


class HypothesisCreate(BaseModel):
    hunt_id: str | None = None
    title: str = Field(..., max_length=256)
    description: str | None = None
    mitre_technique: str | None = None
    status: str = Field(default="draft")


class HypothesisUpdate(BaseModel):
    title: str | None = None
    description: str | None = None
    mitre_technique: str | None = None
    status: str | None = None  # draft|active|confirmed|rejected
    evidence_row_ids: list[int] | None = None
    evidence_notes: str | None = None


class HypothesisResponse(BaseModel):
    id: str
    hunt_id: str | None
    title: str
    description: str | None
    mitre_technique: str | None
    status: str
    evidence_row_ids: list | None
    evidence_notes: str | None
    created_at: str
    updated_at: str


class HypothesisListResponse(BaseModel):
    hypotheses: list[HypothesisResponse]
    total: int


# ── Annotation routes ─────────────────────────────────────────────────


ann_router = APIRouter(prefix="/api/annotations")


@ann_router.post("", response_model=AnnotationResponse, summary="Create annotation")
async def create_annotation(body: AnnotationCreate, db: AsyncSession = Depends(get_db)):
    ann = Annotation(
        row_id=body.row_id,
        dataset_id=body.dataset_id,
        text=body.text,
        severity=body.severity,
        tag=body.tag,
        highlight_color=body.highlight_color,
    )
    db.add(ann)
    await db.flush()
    return AnnotationResponse(
        id=ann.id, row_id=ann.row_id, dataset_id=ann.dataset_id,
        author_id=ann.author_id, text=ann.text, severity=ann.severity,
        tag=ann.tag, highlight_color=ann.highlight_color,
        created_at=ann.created_at.isoformat(), updated_at=ann.updated_at.isoformat(),
    )

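# Illustrative request sketch (not part of the original commit): creating an annotation
# through POST /api/annotations; the host and the example values are assumptions.
#
#   import httpx
#
#   payload = {
#       "dataset_id": "my-dataset-id",
#       "row_id": 42,
#       "text": "Beaconing interval looks scripted",
#       "severity": "high",       # one of: info|low|medium|high|critical
#       "tag": "suspicious",      # one of: suspicious|benign|needs-review
#   }
#   print(httpx.post("http://localhost:8000/api/annotations", json=payload).json())
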
@ann_router.get("", response_model=AnnotationListResponse, summary="List annotations")
async def list_annotations(
    dataset_id: str | None = Query(None),
    row_id: int | None = Query(None),
    tag: str | None = Query(None),
    severity: str | None = Query(None),
    limit: int = Query(100, ge=1, le=1000),
    offset: int = Query(0, ge=0),
    db: AsyncSession = Depends(get_db),
):
    stmt = select(Annotation).order_by(Annotation.created_at.desc())
    if dataset_id:
        stmt = stmt.where(Annotation.dataset_id == dataset_id)
    if row_id:
        stmt = stmt.where(Annotation.row_id == row_id)
    if tag:
        stmt = stmt.where(Annotation.tag == tag)
    if severity:
        stmt = stmt.where(Annotation.severity == severity)
    stmt = stmt.limit(limit).offset(offset)
    result = await db.execute(stmt)
    annotations = result.scalars().all()

    count_stmt = select(func.count(Annotation.id))
    if dataset_id:
        count_stmt = count_stmt.where(Annotation.dataset_id == dataset_id)
    total = (await db.execute(count_stmt)).scalar_one()

    return AnnotationListResponse(
        annotations=[
            AnnotationResponse(
                id=a.id, row_id=a.row_id, dataset_id=a.dataset_id,
                author_id=a.author_id, text=a.text, severity=a.severity,
                tag=a.tag, highlight_color=a.highlight_color,
                created_at=a.created_at.isoformat(), updated_at=a.updated_at.isoformat(),
            )
            for a in annotations
        ],
        total=total,
    )


@ann_router.put("/{annotation_id}", response_model=AnnotationResponse, summary="Update annotation")
async def update_annotation(
    annotation_id: str, body: AnnotationUpdate, db: AsyncSession = Depends(get_db)
):
    result = await db.execute(select(Annotation).where(Annotation.id == annotation_id))
    ann = result.scalar_one_or_none()
    if not ann:
        raise HTTPException(status_code=404, detail="Annotation not found")
    if body.text is not None:
        ann.text = body.text
    if body.severity is not None:
        ann.severity = body.severity
    if body.tag is not None:
        ann.tag = body.tag
    if body.highlight_color is not None:
        ann.highlight_color = body.highlight_color
    await db.flush()
    return AnnotationResponse(
        id=ann.id, row_id=ann.row_id, dataset_id=ann.dataset_id,
        author_id=ann.author_id, text=ann.text, severity=ann.severity,
        tag=ann.tag, highlight_color=ann.highlight_color,
        created_at=ann.created_at.isoformat(), updated_at=ann.updated_at.isoformat(),
    )


@ann_router.delete("/{annotation_id}", summary="Delete annotation")
async def delete_annotation(annotation_id: str, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Annotation).where(Annotation.id == annotation_id))
    ann = result.scalar_one_or_none()
    if not ann:
        raise HTTPException(status_code=404, detail="Annotation not found")
    await db.delete(ann)
    return {"message": "Annotation deleted", "id": annotation_id}


# ── Hypothesis routes ─────────────────────────────────────────────────


hyp_router = APIRouter(prefix="/api/hypotheses")


@hyp_router.post("", response_model=HypothesisResponse, summary="Create hypothesis")
async def create_hypothesis(body: HypothesisCreate, db: AsyncSession = Depends(get_db)):
    hyp = Hypothesis(
        hunt_id=body.hunt_id,
        title=body.title,
        description=body.description,
        mitre_technique=body.mitre_technique,
        status=body.status,
    )
    db.add(hyp)
    await db.flush()
    return HypothesisResponse(
        id=hyp.id, hunt_id=hyp.hunt_id, title=hyp.title,
        description=hyp.description, mitre_technique=hyp.mitre_technique,
        status=hyp.status, evidence_row_ids=hyp.evidence_row_ids,
        evidence_notes=hyp.evidence_notes,
        created_at=hyp.created_at.isoformat(), updated_at=hyp.updated_at.isoformat(),
    )


@hyp_router.get("", response_model=HypothesisListResponse, summary="List hypotheses")
async def list_hypotheses(
    hunt_id: str | None = Query(None),
    status: str | None = Query(None),
    limit: int = Query(100, ge=1, le=1000),
    offset: int = Query(0, ge=0),
    db: AsyncSession = Depends(get_db),
):
    stmt = select(Hypothesis).order_by(Hypothesis.updated_at.desc())
    if hunt_id:
        stmt = stmt.where(Hypothesis.hunt_id == hunt_id)
    if status:
        stmt = stmt.where(Hypothesis.status == status)
    stmt = stmt.limit(limit).offset(offset)
    result = await db.execute(stmt)
    hyps = result.scalars().all()

    count_stmt = select(func.count(Hypothesis.id))
    if hunt_id:
        count_stmt = count_stmt.where(Hypothesis.hunt_id == hunt_id)
    total = (await db.execute(count_stmt)).scalar_one()

    return HypothesisListResponse(
        hypotheses=[
            HypothesisResponse(
                id=h.id, hunt_id=h.hunt_id, title=h.title,
                description=h.description, mitre_technique=h.mitre_technique,
                status=h.status, evidence_row_ids=h.evidence_row_ids,
                evidence_notes=h.evidence_notes,
                created_at=h.created_at.isoformat(), updated_at=h.updated_at.isoformat(),
            )
            for h in hyps
        ],
        total=total,
    )


@hyp_router.get("/{hypothesis_id}", response_model=HypothesisResponse, summary="Get hypothesis")
async def get_hypothesis(hypothesis_id: str, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Hypothesis).where(Hypothesis.id == hypothesis_id))
    hyp = result.scalar_one_or_none()
    if not hyp:
        raise HTTPException(status_code=404, detail="Hypothesis not found")
    return HypothesisResponse(
        id=hyp.id, hunt_id=hyp.hunt_id, title=hyp.title,
        description=hyp.description, mitre_technique=hyp.mitre_technique,
        status=hyp.status, evidence_row_ids=hyp.evidence_row_ids,
        evidence_notes=hyp.evidence_notes,
        created_at=hyp.created_at.isoformat(), updated_at=hyp.updated_at.isoformat(),
    )


@hyp_router.put("/{hypothesis_id}", response_model=HypothesisResponse, summary="Update hypothesis")
async def update_hypothesis(
    hypothesis_id: str, body: HypothesisUpdate, db: AsyncSession = Depends(get_db)
):
    result = await db.execute(select(Hypothesis).where(Hypothesis.id == hypothesis_id))
    hyp = result.scalar_one_or_none()
    if not hyp:
        raise HTTPException(status_code=404, detail="Hypothesis not found")
    if body.title is not None:
        hyp.title = body.title
    if body.description is not None:
        hyp.description = body.description
    if body.mitre_technique is not None:
        hyp.mitre_technique = body.mitre_technique
    if body.status is not None:
        hyp.status = body.status
    if body.evidence_row_ids is not None:
        hyp.evidence_row_ids = body.evidence_row_ids
    if body.evidence_notes is not None:
        hyp.evidence_notes = body.evidence_notes
    await db.flush()
    return HypothesisResponse(
        id=hyp.id, hunt_id=hyp.hunt_id, title=hyp.title,
        description=hyp.description, mitre_technique=hyp.mitre_technique,
        status=hyp.status, evidence_row_ids=hyp.evidence_row_ids,
        evidence_notes=hyp.evidence_notes,
        created_at=hyp.created_at.isoformat(), updated_at=hyp.updated_at.isoformat(),
    )


@hyp_router.delete("/{hypothesis_id}", summary="Delete hypothesis")
async def delete_hypothesis(hypothesis_id: str, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Hypothesis).where(Hypothesis.id == hypothesis_id))
    hyp = result.scalar_one_or_none()
    if not hyp:
        raise HTTPException(status_code=404, detail="Hypothesis not found")
    await db.delete(hyp)
    return {"message": "Hypothesis deleted", "id": hypothesis_id}
197
backend/app/api/routes/auth.py
Normal file
@@ -0,0 +1,197 @@
"""API routes for authentication — register, login, refresh, profile."""

import logging

from fastapi import APIRouter, Depends, HTTPException, status
from pydantic import BaseModel, Field, EmailStr
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.db.models import User
from app.services.auth import (
    hash_password,
    verify_password,
    create_token_pair,
    decode_token,
    get_current_user,
    TokenPair,
)

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/auth", tags=["auth"])


# ── Request / Response models ─────────────────────────────────────────


class RegisterRequest(BaseModel):
    username: str = Field(..., min_length=3, max_length=64)
    email: str = Field(..., max_length=256)
    password: str = Field(..., min_length=8, max_length=128)
    display_name: str | None = None


class LoginRequest(BaseModel):
    username: str
    password: str


class RefreshRequest(BaseModel):
    refresh_token: str


class UserResponse(BaseModel):
    id: str
    username: str
    email: str
    display_name: str | None
    role: str
    is_active: bool
    created_at: str


class AuthResponse(BaseModel):
    user: UserResponse
    tokens: TokenPair


# ── Routes ────────────────────────────────────────────────────────────


@router.post(
    "/register",
    response_model=AuthResponse,
    status_code=status.HTTP_201_CREATED,
    summary="Register a new user",
)
async def register(body: RegisterRequest, db: AsyncSession = Depends(get_db)):
    # Check for existing username
    result = await db.execute(select(User).where(User.username == body.username))
    if result.scalar_one_or_none():
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="Username already taken",
        )

    # Check for existing email
    result = await db.execute(select(User).where(User.email == body.email))
    if result.scalar_one_or_none():
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="Email already registered",
        )

    user = User(
        username=body.username,
        email=body.email,
        password_hash=hash_password(body.password),
        display_name=body.display_name or body.username,
        role="analyst",  # Default role
    )
    db.add(user)
    await db.flush()

    tokens = create_token_pair(user.id, user.role)

    logger.info(f"New user registered: {user.username} ({user.id})")

    return AuthResponse(
        user=UserResponse(
            id=user.id,
            username=user.username,
            email=user.email,
            display_name=user.display_name,
            role=user.role,
            is_active=user.is_active,
            created_at=user.created_at.isoformat(),
        ),
        tokens=tokens,
    )


@router.post(
    "/login",
    response_model=AuthResponse,
    summary="Login with username and password",
)
async def login(body: LoginRequest, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(User).where(User.username == body.username))
    user = result.scalar_one_or_none()

    if not user or not user.password_hash:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid username or password",
        )

    if not verify_password(body.password, user.password_hash):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid username or password",
        )

    if not user.is_active:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Account is disabled",
        )

    tokens = create_token_pair(user.id, user.role)

    return AuthResponse(
        user=UserResponse(
            id=user.id,
            username=user.username,
            email=user.email,
            display_name=user.display_name,
            role=user.role,
            is_active=user.is_active,
            created_at=user.created_at.isoformat(),
        ),
        tokens=tokens,
    )


@router.post(
    "/refresh",
    response_model=TokenPair,
    summary="Refresh access token",
)
async def refresh_token(body: RefreshRequest, db: AsyncSession = Depends(get_db)):
    token_data = decode_token(body.refresh_token)

    if token_data.type != "refresh":
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid token type — use refresh token",
        )

    result = await db.execute(select(User).where(User.id == token_data.sub))
    user = result.scalar_one_or_none()

    if not user or not user.is_active:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid user",
        )

    return create_token_pair(user.id, user.role)

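# Illustrative token-flow sketch (not part of the original commit). It assumes the
# backend on localhost:8000 and that TokenPair exposes access_token / refresh_token
# fields — TokenPair is defined in app.services.auth and not shown here.
#
#   import httpx
#
#   auth = httpx.post("http://localhost:8000/api/auth/login",
#                     json={"username": "analyst1", "password": "s3cure-pass"}).json()
#   tokens = auth["tokens"]
#   me = httpx.get("http://localhost:8000/api/auth/me",
#                  headers={"Authorization": f"Bearer {tokens['access_token']}"}).json()
#   tokens = httpx.post("http://localhost:8000/api/auth/refresh",
#                       json={"refresh_token": tokens["refresh_token"]}).json()
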
@router.get(
    "/me",
    response_model=UserResponse,
    summary="Get current user profile",
)
async def get_profile(user: User = Depends(get_current_user)):
    return UserResponse(
        id=user.id,
        username=user.username,
        email=user.email,
        display_name=user.display_name,
        role=user.role,
        is_active=user.is_active,
        created_at=user.created_at.isoformat() if hasattr(user.created_at, 'isoformat') else str(user.created_at),
    )
83
backend/app/api/routes/correlation.py
Normal file
@@ -0,0 +1,83 @@
"""API routes for cross-hunt correlation analysis."""

import logging
from dataclasses import asdict

from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.services.correlation import correlation_engine

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/correlation", tags=["correlation"])


class CorrelateRequest(BaseModel):
    hunt_ids: list[str] = Field(
        ...,
        min_length=2,
        max_length=20,
        description="List of hunt IDs to correlate",
    )


@router.post(
    "/analyze",
    summary="Run correlation analysis across hunts",
    description="Find shared IOCs, overlapping time windows, common MITRE techniques, "
                "and host patterns across the specified hunts.",
)
async def correlate_hunts(
    body: CorrelateRequest,
    db: AsyncSession = Depends(get_db),
):
    result = await correlation_engine.correlate_hunts(body.hunt_ids, db)

    return {
        "hunt_ids": result.hunt_ids,
        "summary": result.summary,
        "total_correlations": result.total_correlations,
        "ioc_overlaps": [asdict(o) for o in result.ioc_overlaps],
        "time_overlaps": [asdict(o) for o in result.time_overlaps],
        "technique_overlaps": [asdict(o) for o in result.technique_overlaps],
        "host_overlaps": result.host_overlaps,
    }


@router.get(
    "/all",
    summary="Correlate all hunts",
    description="Run correlation across all hunts in the system.",
)
async def correlate_all(db: AsyncSession = Depends(get_db)):
    result = await correlation_engine.correlate_all(db)
    return {
        "hunt_ids": result.hunt_ids,
        "summary": result.summary,
        "total_correlations": result.total_correlations,
        "ioc_overlaps": [asdict(o) for o in result.ioc_overlaps[:20]],
        "time_overlaps": [asdict(o) for o in result.time_overlaps[:10]],
        "technique_overlaps": [asdict(o) for o in result.technique_overlaps[:10]],
        "host_overlaps": result.host_overlaps[:10],
    }


@router.get(
    "/ioc/{ioc_value}",
    summary="Find IOC across all hunts",
    description="Search for a specific IOC value across all datasets and hunts.",
)
async def find_ioc(
    ioc_value: str,
    db: AsyncSession = Depends(get_db),
):
    occurrences = await correlation_engine.find_ioc_across_hunts(ioc_value, db)
    return {
        "ioc_value": ioc_value,
        "occurrences": occurrences,
        "total": len(occurrences),
        "unique_hunts": len(set(o["hunt_id"] for o in occurrences if o.get("hunt_id"))),
    }
295
backend/app/api/routes/datasets.py
Normal file
@@ -0,0 +1,295 @@
"""API routes for dataset upload, listing, and management."""

import logging
import os
from pathlib import Path

from fastapi import APIRouter, Depends, File, HTTPException, Query, UploadFile
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

from app.config import settings
from app.db import get_db
from app.db.repositories.datasets import DatasetRepository
from app.services.csv_parser import parse_csv_bytes, infer_column_types
from app.services.normalizer import (
    normalize_columns,
    normalize_rows,
    detect_ioc_columns,
    detect_time_range,
)

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/datasets", tags=["datasets"])

ALLOWED_EXTENSIONS = {".csv", ".tsv", ".txt"}


# ── Response models ───────────────────────────────────────────────────


class DatasetSummary(BaseModel):
    id: str
    name: str
    filename: str
    source_tool: str | None = None
    row_count: int
    column_schema: dict | None = None
    normalized_columns: dict | None = None
    ioc_columns: dict | None = None
    file_size_bytes: int
    encoding: str | None = None
    delimiter: str | None = None
    time_range_start: str | None = None
    time_range_end: str | None = None
    hunt_id: str | None = None
    created_at: str


class DatasetListResponse(BaseModel):
    datasets: list[DatasetSummary]
    total: int


class RowsResponse(BaseModel):
    rows: list[dict]
    total: int
    offset: int
    limit: int


class UploadResponse(BaseModel):
    id: str
    name: str
    row_count: int
    columns: list[str]
    column_types: dict
    normalized_columns: dict
    ioc_columns: dict
    message: str


# ── Routes ────────────────────────────────────────────────────────────


@router.post(
    "/upload",
    response_model=UploadResponse,
    summary="Upload a CSV dataset",
    description="Upload a CSV/TSV file for analysis. The file is parsed, columns normalized, "
                "IOCs auto-detected, and rows stored in the database.",
)
async def upload_dataset(
    file: UploadFile = File(...),
    name: str | None = Query(None, description="Display name for the dataset"),
    source_tool: str | None = Query(None, description="Source tool (e.g., velociraptor)"),
    hunt_id: str | None = Query(None, description="Hunt ID to associate with"),
    db: AsyncSession = Depends(get_db),
):
    """Upload and parse a CSV dataset."""
    # Validate file
    if not file.filename:
        raise HTTPException(status_code=400, detail="No filename provided")

    ext = Path(file.filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise HTTPException(
            status_code=400,
            detail=f"File type '{ext}' not allowed. Accepted: {', '.join(ALLOWED_EXTENSIONS)}",
        )

    # Read file bytes
    raw_bytes = await file.read()
    if len(raw_bytes) == 0:
        raise HTTPException(status_code=400, detail="File is empty")

    if len(raw_bytes) > settings.max_upload_bytes:
        raise HTTPException(
            status_code=413,
            detail=f"File too large. Max size: {settings.MAX_UPLOAD_SIZE_MB} MB",
        )

    # Parse CSV
    try:
        rows, metadata = parse_csv_bytes(raw_bytes)
    except Exception as e:
        logger.error(f"CSV parse error: {e}")
        raise HTTPException(
            status_code=422,
            detail=f"Failed to parse CSV: {str(e)}. Check encoding and format.",
        )

    if not rows:
        raise HTTPException(status_code=422, detail="CSV file contains no data rows")

    columns: list[str] = metadata["columns"]
    column_types: dict = metadata["column_types"]

    # Normalize columns
    column_mapping = normalize_columns(columns)
    normalized = normalize_rows(rows, column_mapping)

    # Detect IOCs
    ioc_columns = detect_ioc_columns(columns, column_types, column_mapping)

    # Detect time range
    time_start, time_end = detect_time_range(rows, column_mapping)

    # Store in DB
    repo = DatasetRepository(db)
    dataset = await repo.create_dataset(
        name=name or Path(file.filename).stem,
        filename=file.filename,
        source_tool=source_tool,
        row_count=len(rows),
        column_schema=column_types,
        normalized_columns=column_mapping,
        ioc_columns=ioc_columns,
        file_size_bytes=len(raw_bytes),
        encoding=metadata["encoding"],
        delimiter=metadata["delimiter"],
        time_range_start=time_start,
        time_range_end=time_end,
        hunt_id=hunt_id,
    )

    await repo.bulk_insert_rows(
        dataset_id=dataset.id,
        rows=rows,
        normalized_rows=normalized,
    )

    logger.info(
        f"Uploaded dataset '{dataset.name}': {len(rows)} rows, "
        f"{len(columns)} columns, {len(ioc_columns)} IOC columns detected"
    )

    return UploadResponse(
        id=dataset.id,
        name=dataset.name,
        row_count=len(rows),
        columns=columns,
        column_types=column_types,
        normalized_columns=column_mapping,
        ioc_columns=ioc_columns,
        message=f"Successfully uploaded {len(rows)} rows with {len(ioc_columns)} IOC columns detected",
    )

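# Illustrative upload sketch (not part of the original commit): sending a CSV as
# multipart form data. The host and filename are assumptions for local testing.
#
#   import httpx
#
#   with open("velociraptor_pslist.csv", "rb") as f:
#       resp = httpx.post(
#           "http://localhost:8000/api/datasets/upload",
#           params={"name": "pslist", "source_tool": "velociraptor"},
#           files={"file": ("velociraptor_pslist.csv", f, "text/csv")},
#       )
#   print(resp.json()["message"])
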
@router.get(
    "",
    response_model=DatasetListResponse,
    summary="List datasets",
)
async def list_datasets(
    hunt_id: str | None = Query(None),
    limit: int = Query(100, ge=1, le=1000),
    offset: int = Query(0, ge=0),
    db: AsyncSession = Depends(get_db),
):
    repo = DatasetRepository(db)
    datasets = await repo.list_datasets(hunt_id=hunt_id, limit=limit, offset=offset)
    total = await repo.count_datasets(hunt_id=hunt_id)

    return DatasetListResponse(
        datasets=[
            DatasetSummary(
                id=ds.id,
                name=ds.name,
                filename=ds.filename,
                source_tool=ds.source_tool,
                row_count=ds.row_count,
                column_schema=ds.column_schema,
                normalized_columns=ds.normalized_columns,
                ioc_columns=ds.ioc_columns,
                file_size_bytes=ds.file_size_bytes,
                encoding=ds.encoding,
                delimiter=ds.delimiter,
                time_range_start=ds.time_range_start.isoformat() if ds.time_range_start else None,
                time_range_end=ds.time_range_end.isoformat() if ds.time_range_end else None,
                hunt_id=ds.hunt_id,
                created_at=ds.created_at.isoformat(),
            )
            for ds in datasets
        ],
        total=total,
    )


@router.get(
    "/{dataset_id}",
    response_model=DatasetSummary,
    summary="Get dataset details",
)
async def get_dataset(
    dataset_id: str,
    db: AsyncSession = Depends(get_db),
):
    repo = DatasetRepository(db)
    ds = await repo.get_dataset(dataset_id)
    if not ds:
        raise HTTPException(status_code=404, detail="Dataset not found")
    return DatasetSummary(
        id=ds.id,
        name=ds.name,
        filename=ds.filename,
        source_tool=ds.source_tool,
        row_count=ds.row_count,
        column_schema=ds.column_schema,
        normalized_columns=ds.normalized_columns,
        ioc_columns=ds.ioc_columns,
        file_size_bytes=ds.file_size_bytes,
        encoding=ds.encoding,
        delimiter=ds.delimiter,
        time_range_start=ds.time_range_start.isoformat() if ds.time_range_start else None,
        time_range_end=ds.time_range_end.isoformat() if ds.time_range_end else None,
        hunt_id=ds.hunt_id,
        created_at=ds.created_at.isoformat(),
    )


@router.get(
    "/{dataset_id}/rows",
    response_model=RowsResponse,
    summary="Get dataset rows",
)
async def get_dataset_rows(
    dataset_id: str,
    limit: int = Query(1000, ge=1, le=10000),
    offset: int = Query(0, ge=0),
    normalized: bool = Query(False, description="Return normalized column names"),
    db: AsyncSession = Depends(get_db),
):
    repo = DatasetRepository(db)
    ds = await repo.get_dataset(dataset_id)
    if not ds:
        raise HTTPException(status_code=404, detail="Dataset not found")

    rows = await repo.get_rows(dataset_id, limit=limit, offset=offset)
    total = await repo.count_rows(dataset_id)

    return RowsResponse(
        rows=[
            (r.normalized_data if normalized and r.normalized_data else r.data)
            for r in rows
        ],
        total=total,
        offset=offset,
        limit=limit,
    )


@router.delete(
    "/{dataset_id}",
    summary="Delete a dataset",
)
async def delete_dataset(
    dataset_id: str,
    db: AsyncSession = Depends(get_db),
):
    repo = DatasetRepository(db)
    deleted = await repo.delete_dataset(dataset_id)
    if not deleted:
        raise HTTPException(status_code=404, detail="Dataset not found")
    return {"message": "Dataset deleted", "id": dataset_id}
220
backend/app/api/routes/enrichment.py
Normal file
@@ -0,0 +1,220 @@
"""API routes for IOC enrichment."""

import logging

from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.services.enrichment import (
    enrichment_engine,
    IOCType,
    Verdict,
    EnrichmentResultData,
)

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/enrichment", tags=["enrichment"])


# ── Models ────────────────────────────────────────────────────────────


class EnrichIOCRequest(BaseModel):
    ioc_value: str = Field(..., max_length=2048, description="IOC value to enrich")
    ioc_type: str = Field(..., description="IOC type: ip, domain, hash_md5, hash_sha1, hash_sha256, url")
    skip_cache: bool = False


class EnrichBatchRequest(BaseModel):
    iocs: list[dict] = Field(
        ...,
        description="List of {value, type} pairs",
        max_length=50,
    )


class EnrichmentResultResponse(BaseModel):
    ioc_value: str
    ioc_type: str
    source: str
    verdict: str
    score: float
    tags: list[str] = []
    country: str = ""
    asn: str = ""
    org: str = ""
    last_seen: str = ""
    raw_data: dict = {}
    error: str = ""
    latency_ms: int = 0


class EnrichIOCResponse(BaseModel):
    ioc_value: str
    ioc_type: str
    results: list[EnrichmentResultResponse]
    overall_verdict: str
    overall_score: float


class EnrichBatchResponse(BaseModel):
    results: dict[str, list[EnrichmentResultResponse]]
    total_enriched: int


def _to_response(r: EnrichmentResultData) -> EnrichmentResultResponse:
    return EnrichmentResultResponse(
        ioc_value=r.ioc_value,
        ioc_type=r.ioc_type.value,
        source=r.source,
        verdict=r.verdict.value,
        score=r.score,
        tags=r.tags,
        country=r.country,
        asn=r.asn,
        org=r.org,
        last_seen=r.last_seen,
        raw_data=r.raw_data,
        error=r.error,
        latency_ms=r.latency_ms,
    )


def _compute_overall(results: list[EnrichmentResultData]) -> tuple[str, float]:
    """Compute overall verdict from multiple provider results."""
    if not results:
        return Verdict.UNKNOWN.value, 0.0

    verdicts = [r.verdict for r in results if r.verdict != Verdict.ERROR]
    if not verdicts:
        return Verdict.ERROR.value, 0.0

    if Verdict.MALICIOUS in verdicts:
        return Verdict.MALICIOUS.value, max(r.score for r in results)
    elif Verdict.SUSPICIOUS in verdicts:
        return Verdict.SUSPICIOUS.value, max(r.score for r in results)
    elif Verdict.CLEAN in verdicts:
        return Verdict.CLEAN.value, 0.0
    return Verdict.UNKNOWN.value, 0.0


# ── Routes ────────────────────────────────────────────────────────────


@router.post(
    "/ioc",
    response_model=EnrichIOCResponse,
    summary="Enrich a single IOC",
    description="Query all configured providers for an IOC (IP, hash, domain, URL).",
)
async def enrich_ioc(
    body: EnrichIOCRequest,
    db: AsyncSession = Depends(get_db),
):
    try:
        ioc_type = IOCType(body.ioc_type)
    except ValueError:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid IOC type: {body.ioc_type}. Valid: {[t.value for t in IOCType]}",
        )

    results = await enrichment_engine.enrich_ioc(
        body.ioc_value, ioc_type, db=db, skip_cache=body.skip_cache,
    )

    overall_verdict, overall_score = _compute_overall(results)

    return EnrichIOCResponse(
        ioc_value=body.ioc_value,
        ioc_type=body.ioc_type,
        results=[_to_response(r) for r in results],
        overall_verdict=overall_verdict,
        overall_score=overall_score,
    )


@router.post(
    "/batch",
    response_model=EnrichBatchResponse,
    summary="Enrich a batch of IOCs",
    description="Enrich up to 50 IOCs at once across all providers.",
)
async def enrich_batch(
    body: EnrichBatchRequest,
    db: AsyncSession = Depends(get_db),
):
    iocs = []
    for item in body.iocs:
        try:
            ioc_type = IOCType(item["type"])
            iocs.append((item["value"], ioc_type))
        except (KeyError, ValueError):
            continue

    if not iocs:
        raise HTTPException(status_code=400, detail="No valid IOCs provided")

    all_results = await enrichment_engine.enrich_batch(iocs, db=db)

    return EnrichBatchResponse(
        results={
            k: [_to_response(r) for r in v]
            for k, v in all_results.items()
        },
        total_enriched=len(all_results),
    )

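# Illustrative batch payload sketch (not part of the original commit). The host is an
# assumption and the IOC values are documentation examples, not real indicators.
#
#   import httpx
#
#   payload = {
#       "iocs": [
#           {"value": "203.0.113.7", "type": "ip"},
#           {"value": "example.com", "type": "domain"},
#           {"value": "44d88612fea8a8f36de82e1278abb02f", "type": "hash_md5"},  # EICAR test hash
#       ]
#   }
#   print(httpx.post("http://localhost:8000/api/enrichment/batch", json=payload).json())
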
@router.post(
    "/dataset/{dataset_id}",
    summary="Auto-enrich IOCs in a dataset",
    description="Automatically extract and enrich IOCs from a dataset's IOC columns.",
)
async def enrich_dataset(
    dataset_id: str,
    max_iocs: int = Query(50, ge=1, le=200),
    db: AsyncSession = Depends(get_db),
):
    from app.db.repositories.datasets import DatasetRepository

    repo = DatasetRepository(db)
    dataset = await repo.get_dataset(dataset_id)
    if not dataset:
        raise HTTPException(status_code=404, detail="Dataset not found")

    if not dataset.ioc_columns:
        return {"message": "No IOC columns detected in this dataset", "results": {}}

    rows = await repo.get_rows(dataset_id, limit=1000)
    row_data = [r.data for r in rows]

    all_results = await enrichment_engine.enrich_dataset_iocs(
        rows=row_data,
        ioc_columns=dataset.ioc_columns,
        db=db,
        max_iocs=max_iocs,
    )

    return {
        "dataset_id": dataset_id,
        "dataset_name": dataset.name,
        "ioc_columns": dataset.ioc_columns,
        "results": {
            k: [_to_response(r) for r in v]
            for k, v in all_results.items()
        },
        "total_enriched": len(all_results),
    }


@router.get(
    "/status",
    summary="Enrichment engine status",
    description="Check which providers are configured and available.",
)
async def enrichment_status():
    return enrichment_engine.status()
158
backend/app/api/routes/hunts.py
Normal file
@@ -0,0 +1,158 @@
"""API routes for hunt management."""

import logging

from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.db.models import Hunt, Conversation, Message

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/hunts", tags=["hunts"])


# ── Models ────────────────────────────────────────────────────────────


class HuntCreate(BaseModel):
    name: str = Field(..., max_length=256)
    description: str | None = None


class HuntUpdate(BaseModel):
    name: str | None = None
    description: str | None = None
    status: str | None = None  # active | closed | archived


class HuntResponse(BaseModel):
    id: str
    name: str
    description: str | None
    status: str
    owner_id: str | None
    created_at: str
    updated_at: str
    dataset_count: int = 0
    hypothesis_count: int = 0


class HuntListResponse(BaseModel):
    hunts: list[HuntResponse]
    total: int


# ── Routes ────────────────────────────────────────────────────────────


@router.post("", response_model=HuntResponse, summary="Create a new hunt")
async def create_hunt(body: HuntCreate, db: AsyncSession = Depends(get_db)):
    hunt = Hunt(name=body.name, description=body.description)
    db.add(hunt)
    await db.flush()
    return HuntResponse(
        id=hunt.id,
        name=hunt.name,
        description=hunt.description,
        status=hunt.status,
        owner_id=hunt.owner_id,
        created_at=hunt.created_at.isoformat(),
        updated_at=hunt.updated_at.isoformat(),
    )


@router.get("", response_model=HuntListResponse, summary="List hunts")
async def list_hunts(
    status: str | None = Query(None),
    limit: int = Query(50, ge=1, le=500),
    offset: int = Query(0, ge=0),
    db: AsyncSession = Depends(get_db),
):
    stmt = select(Hunt).order_by(Hunt.updated_at.desc())
    if status:
        stmt = stmt.where(Hunt.status == status)
    stmt = stmt.limit(limit).offset(offset)
    result = await db.execute(stmt)
    hunts = result.scalars().all()

    count_stmt = select(func.count(Hunt.id))
    if status:
        count_stmt = count_stmt.where(Hunt.status == status)
    total = (await db.execute(count_stmt)).scalar_one()

    return HuntListResponse(
        hunts=[
            HuntResponse(
                id=h.id,
                name=h.name,
                description=h.description,
                status=h.status,
                owner_id=h.owner_id,
                created_at=h.created_at.isoformat(),
                updated_at=h.updated_at.isoformat(),
                dataset_count=len(h.datasets) if h.datasets else 0,
                hypothesis_count=len(h.hypotheses) if h.hypotheses else 0,
            )
            for h in hunts
        ],
        total=total,
    )


@router.get("/{hunt_id}", response_model=HuntResponse, summary="Get hunt details")
async def get_hunt(hunt_id: str, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Hunt).where(Hunt.id == hunt_id))
    hunt = result.scalar_one_or_none()
    if not hunt:
        raise HTTPException(status_code=404, detail="Hunt not found")
    return HuntResponse(
        id=hunt.id,
        name=hunt.name,
        description=hunt.description,
        status=hunt.status,
        owner_id=hunt.owner_id,
        created_at=hunt.created_at.isoformat(),
        updated_at=hunt.updated_at.isoformat(),
        dataset_count=len(hunt.datasets) if hunt.datasets else 0,
        hypothesis_count=len(hunt.hypotheses) if hunt.hypotheses else 0,
    )


@router.put("/{hunt_id}", response_model=HuntResponse, summary="Update a hunt")
async def update_hunt(
    hunt_id: str, body: HuntUpdate, db: AsyncSession = Depends(get_db)
):
    result = await db.execute(select(Hunt).where(Hunt.id == hunt_id))
    hunt = result.scalar_one_or_none()
    if not hunt:
        raise HTTPException(status_code=404, detail="Hunt not found")
    if body.name is not None:
        hunt.name = body.name
    if body.description is not None:
        hunt.description = body.description
    if body.status is not None:
        hunt.status = body.status
    await db.flush()
    return HuntResponse(
        id=hunt.id,
        name=hunt.name,
        description=hunt.description,
        status=hunt.status,
        owner_id=hunt.owner_id,
        created_at=hunt.created_at.isoformat(),
        updated_at=hunt.updated_at.isoformat(),
    )


@router.delete("/{hunt_id}", summary="Delete a hunt")
async def delete_hunt(hunt_id: str, db: AsyncSession = Depends(get_db)):
    result = await db.execute(select(Hunt).where(Hunt.id == hunt_id))
    hunt = result.scalar_one_or_none()
    if not hunt:
        raise HTTPException(status_code=404, detail="Hunt not found")
    await db.delete(hunt)
    return {"message": "Hunt deleted", "id": hunt_id}
257
backend/app/api/routes/keywords.py
Normal file
@@ -0,0 +1,257 @@
"""API routes for AUP keyword themes, keyword CRUD, and scanning."""

import logging
from typing import Optional

from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel, Field
from sqlalchemy import select, func, delete
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.db.models import KeywordTheme, Keyword
from app.services.scanner import KeywordScanner

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/keywords", tags=["keywords"])


# ── Pydantic schemas ──────────────────────────────────────────────────


class ThemeCreate(BaseModel):
    name: str = Field(..., min_length=1, max_length=128)
    color: str = Field(default="#9e9e9e", max_length=16)
    enabled: bool = True


class ThemeUpdate(BaseModel):
    name: str | None = None
    color: str | None = None
    enabled: bool | None = None


class KeywordOut(BaseModel):
    id: int
    theme_id: str
    value: str
    is_regex: bool
    created_at: str


class ThemeOut(BaseModel):
    id: str
    name: str
    color: str
    enabled: bool
    is_builtin: bool
    created_at: str
    keyword_count: int
    keywords: list[KeywordOut]


class ThemeListResponse(BaseModel):
    themes: list[ThemeOut]
    total: int


class KeywordCreate(BaseModel):
    value: str = Field(..., min_length=1, max_length=256)
    is_regex: bool = False

class KeywordBulkCreate(BaseModel):
    values: list[str] = Field(..., min_length=1)
    is_regex: bool = False

class ScanRequest(BaseModel):
    dataset_ids: list[str] | None = None  # None → all datasets
    theme_ids: list[str] | None = None  # None → all enabled themes
    scan_hunts: bool = True
    scan_annotations: bool = True
    scan_messages: bool = True


class ScanHit(BaseModel):
    theme_name: str
    theme_color: str
    keyword: str
    source_type: str  # dataset_row | hunt | annotation | message
    source_id: str | int
    field: str
    matched_value: str
    row_index: int | None = None
    dataset_name: str | None = None


class ScanResponse(BaseModel):
    total_hits: int
    hits: list[ScanHit]
    themes_scanned: int
    keywords_scanned: int
    rows_scanned: int


# ── Helpers ───────────────────────────────────────────────────────────


def _theme_to_out(t: KeywordTheme) -> ThemeOut:
    return ThemeOut(
        id=t.id,
        name=t.name,
        color=t.color,
        enabled=t.enabled,
        is_builtin=t.is_builtin,
        created_at=t.created_at.isoformat(),
        keyword_count=len(t.keywords),
        keywords=[
            KeywordOut(
                id=k.id,
                theme_id=k.theme_id,
                value=k.value,
                is_regex=k.is_regex,
                created_at=k.created_at.isoformat(),
            )
            for k in t.keywords
        ],
    )


# ── Theme CRUD ────────────────────────────────────────────────────────


@router.get("/themes", response_model=ThemeListResponse)
async def list_themes(db: AsyncSession = Depends(get_db)):
    """List all keyword themes with their keywords."""
    result = await db.execute(
        select(KeywordTheme).order_by(KeywordTheme.name)
    )
    themes = result.scalars().all()
    return ThemeListResponse(
        themes=[_theme_to_out(t) for t in themes],
        total=len(themes),
    )


@router.post("/themes", response_model=ThemeOut, status_code=201)
async def create_theme(body: ThemeCreate, db: AsyncSession = Depends(get_db)):
    """Create a new keyword theme."""
    exists = await db.scalar(
        select(KeywordTheme.id).where(KeywordTheme.name == body.name)
    )
    if exists:
        raise HTTPException(409, f"Theme '{body.name}' already exists")
    theme = KeywordTheme(name=body.name, color=body.color, enabled=body.enabled)
    db.add(theme)
    await db.flush()
    await db.refresh(theme)
    return _theme_to_out(theme)


@router.put("/themes/{theme_id}", response_model=ThemeOut)
async def update_theme(theme_id: str, body: ThemeUpdate, db: AsyncSession = Depends(get_db)):
    """Update theme name, color, or enabled status."""
    theme = await db.get(KeywordTheme, theme_id)
    if not theme:
        raise HTTPException(404, "Theme not found")
    if body.name is not None:
        # check uniqueness
        dup = await db.scalar(
            select(KeywordTheme.id).where(
                KeywordTheme.name == body.name, KeywordTheme.id != theme_id
            )
        )
        if dup:
            raise HTTPException(409, f"Theme '{body.name}' already exists")
        theme.name = body.name
    if body.color is not None:
        theme.color = body.color
    if body.enabled is not None:
        theme.enabled = body.enabled
    await db.flush()
    await db.refresh(theme)
    return _theme_to_out(theme)


@router.delete("/themes/{theme_id}", status_code=204)
async def delete_theme(theme_id: str, db: AsyncSession = Depends(get_db)):
    """Delete a theme and all its keywords."""
    theme = await db.get(KeywordTheme, theme_id)
    if not theme:
        raise HTTPException(404, "Theme not found")
    await db.delete(theme)


# ── Keyword CRUD ──────────────────────────────────────────────────────


@router.post("/themes/{theme_id}/keywords", response_model=KeywordOut, status_code=201)
async def add_keyword(theme_id: str, body: KeywordCreate, db: AsyncSession = Depends(get_db)):
    """Add a single keyword to a theme."""
    theme = await db.get(KeywordTheme, theme_id)
    if not theme:
        raise HTTPException(404, "Theme not found")
    kw = Keyword(theme_id=theme_id, value=body.value, is_regex=body.is_regex)
    db.add(kw)
    await db.flush()
    await db.refresh(kw)
    return KeywordOut(
        id=kw.id, theme_id=kw.theme_id, value=kw.value,
        is_regex=kw.is_regex, created_at=kw.created_at.isoformat(),
    )


@router.post("/themes/{theme_id}/keywords/bulk", response_model=dict, status_code=201)
async def add_keywords_bulk(theme_id: str, body: KeywordBulkCreate, db: AsyncSession = Depends(get_db)):
    """Add multiple keywords to a theme at once."""
    theme = await db.get(KeywordTheme, theme_id)
    if not theme:
        raise HTTPException(404, "Theme not found")
    added = 0
    for val in body.values:
        val = val.strip()
        if not val:
            continue
        db.add(Keyword(theme_id=theme_id, value=val, is_regex=body.is_regex))
        added += 1
    await db.flush()
    return {"added": added, "theme_id": theme_id}


@router.delete("/keywords/{keyword_id}", status_code=204)
async def delete_keyword(keyword_id: int, db: AsyncSession = Depends(get_db)):
    """Delete a single keyword."""
    kw = await db.get(Keyword, keyword_id)
    if not kw:
        raise HTTPException(404, "Keyword not found")
    await db.delete(kw)


# ── Scan endpoints ────────────────────────────────────────────────────


@router.post("/scan", response_model=ScanResponse)
async def run_scan(body: ScanRequest, db: AsyncSession = Depends(get_db)):
    """Run AUP keyword scan across selected data sources."""
    scanner = KeywordScanner(db)
    result = await scanner.scan(
        dataset_ids=body.dataset_ids,
        theme_ids=body.theme_ids,
        scan_hunts=body.scan_hunts,
        scan_annotations=body.scan_annotations,
        scan_messages=body.scan_messages,
    )
    return result

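# Illustrative scan request sketch (not part of the original commit; host assumed).
# Leaving dataset_ids/theme_ids as None scans all datasets with every enabled theme.
#
#   import httpx
#
#   body = {
#       "dataset_ids": ["my-dataset-id"],
#       "theme_ids": None,
#       "scan_hunts": True,
#       "scan_annotations": True,
#       "scan_messages": False,
#   }
#   hits = httpx.post("http://localhost:8000/api/keywords/scan", json=body).json()
#   print(hits["total_hits"])
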
@router.get("/scan/quick", response_model=ScanResponse)
async def quick_scan(
    dataset_id: str = Query(..., description="Dataset to scan"),
    db: AsyncSession = Depends(get_db),
):
    """Quick scan a single dataset with all enabled themes."""
    scanner = KeywordScanner(db)
    result = await scanner.scan(dataset_ids=[dataset_id])
    return result
28
backend/app/api/routes/network.py
Normal file
@@ -0,0 +1,28 @@
"""Network topology API - host inventory endpoint."""

import logging

from fastapi import APIRouter, Depends, HTTPException, Query
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.services.host_inventory import build_host_inventory

logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/network", tags=["network"])


@router.get("/host-inventory")
async def get_host_inventory(
    hunt_id: str = Query(..., description="Hunt ID to build inventory for"),
    db: AsyncSession = Depends(get_db),
):
    """Build a deduplicated host inventory from all datasets in a hunt.

    Returns unique hosts with hostname, IPs, OS, logged-in users, and
    network connections derived from netstat/connection data.
    """
    result = await build_host_inventory(hunt_id, db)
    # The inventory is returned as-is even when no hosts were found, so the
    # caller can distinguish an empty hunt from a missing one by the stats.
    return result
67
backend/app/api/routes/reports.py
Normal file
@@ -0,0 +1,67 @@
"""API routes for report generation and export."""

import logging

from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import HTMLResponse, PlainTextResponse
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import get_db
from app.services.reports import report_generator

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/reports", tags=["reports"])


@router.get(
    "/hunt/{hunt_id}",
    summary="Generate hunt investigation report",
    description="Generate a comprehensive report for a hunt. Supports JSON, HTML, and CSV formats.",
)
async def generate_hunt_report(
    hunt_id: str,
    format: str = Query("json", description="Report format: json, html, csv"),
    include_rows: bool = Query(False, description="Include raw data rows"),
    max_rows: int = Query(500, ge=0, le=5000, description="Max rows to include"),
    db: AsyncSession = Depends(get_db),
):
    result = await report_generator.generate_hunt_report(
        hunt_id, db, format=format,
        include_rows=include_rows, max_rows=max_rows,
    )

    if isinstance(result, dict) and result.get("error"):
        raise HTTPException(status_code=404, detail=result["error"])

    if format == "html":
        return HTMLResponse(content=result, headers={
            "Content-Disposition": f"inline; filename=threathunt_report_{hunt_id}.html",
        })
    elif format == "csv":
        return PlainTextResponse(content=result, media_type="text/csv", headers={
            "Content-Disposition": f"attachment; filename=threathunt_report_{hunt_id}.csv",
        })
    else:
        return result

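# Illustrative export sketch (not part of the original commit; host and hunt ID are
# assumptions): the same endpoint serves JSON, HTML, or CSV via the format parameter.
#
#   import httpx
#
#   html = httpx.get(
#       "http://localhost:8000/api/reports/hunt/my-hunt-id",
#       params={"format": "html", "include_rows": True, "max_rows": 200},
#   ).text
#   with open("hunt_report.html", "w", encoding="utf-8") as fh:
#       fh.write(html)
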
||||
@router.get(
|
||||
"/hunt/{hunt_id}/summary",
|
||||
summary="Quick hunt summary",
|
||||
description="Get a lightweight summary of the hunt for dashboard display.",
|
||||
)
|
||||
async def hunt_summary(
|
||||
hunt_id: str,
|
||||
db: AsyncSession = Depends(get_db),
|
||||
):
|
||||
result = await report_generator.generate_hunt_report(
|
||||
hunt_id, db, format="json", include_rows=False,
|
||||
)
|
||||
if isinstance(result, dict) and result.get("error"):
|
||||
raise HTTPException(status_code=404, detail=result["error"])
|
||||
|
||||
return {
|
||||
"hunt": result.get("hunt"),
|
||||
"summary": result.get("summary"),
|
||||
}
|
||||
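A hedged usage sketch for the report route above: download the CSV variant to disk. The hunt id is a placeholder and the parameters mirror the Query declarations.

import asyncio

import httpx


async def download_csv_report(hunt_id: str, path: str) -> None:
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.get(
            f"/api/reports/hunt/{hunt_id}",
            params={"format": "csv", "include_rows": True, "max_rows": 1000},
        )
        resp.raise_for_status()  # raises on the 404 the route returns for errors
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(resp.text)


asyncio.run(download_csv_report("my-hunt-id", "threathunt_report.csv"))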
121
backend/app/config.py
Normal file
@@ -0,0 +1,121 @@
"""Application configuration — single source of truth for all settings.

Loads from environment variables with sensible defaults for local dev.
"""

from pydantic_settings import BaseSettings
from pydantic import Field


class AppConfig(BaseSettings):
    """Central configuration for the entire ThreatHunt application."""

    # ── General ────────────────────────────────────────────────────────
    APP_NAME: str = "ThreatHunt"
    APP_VERSION: str = "0.3.0"
    DEBUG: bool = Field(default=False, description="Enable debug mode")

    # ── Database ───────────────────────────────────────────────────────
    DATABASE_URL: str = Field(
        default="sqlite+aiosqlite:///./threathunt.db",
        description="Async SQLAlchemy database URL. "
        "Use sqlite+aiosqlite:///./threathunt.db for local dev, "
        "postgresql+asyncpg://user:pass@host/db for production.",
    )

    # ── CORS ───────────────────────────────────────────────────────────
    ALLOWED_ORIGINS: str = Field(
        default="http://localhost:3000,http://localhost:8000",
        description="Comma-separated list of allowed CORS origins",
    )

    # ── File uploads ───────────────────────────────────────────────────
    MAX_UPLOAD_SIZE_MB: int = Field(default=500, description="Max CSV upload in MB")
    UPLOAD_DIR: str = Field(default="./uploads", description="Directory for uploaded files")

    # ── LLM Cluster — Wile & Roadrunner ────────────────────────────────
    OPENWEBUI_URL: str = Field(
        default="https://ai.guapo613.beer",
        description="Open WebUI cluster endpoint (OpenAI-compatible API)",
    )
    OPENWEBUI_API_KEY: str = Field(
        default="",
        description="API key for Open WebUI (if required)",
    )
    WILE_HOST: str = Field(
        default="100.110.190.12",
        description="Tailscale IP for Wile (heavy models)",
    )
    WILE_OLLAMA_PORT: int = Field(default=11434, description="Ollama port on Wile")
    ROADRUNNER_HOST: str = Field(
        default="100.110.190.11",
        description="Tailscale IP for Roadrunner (fast models + vision)",
    )
    ROADRUNNER_OLLAMA_PORT: int = Field(
        default=11434, description="Ollama port on Roadrunner"
    )

    # ── LLM Routing defaults ──────────────────────────────────────────
    DEFAULT_FAST_MODEL: str = Field(
        default="llama3.1:latest",
        description="Default model for quick chat / simple queries",
    )
    DEFAULT_HEAVY_MODEL: str = Field(
        default="llama3.1:70b-instruct-q4_K_M",
        description="Default model for deep analysis / debate",
    )
    DEFAULT_CODE_MODEL: str = Field(
        default="qwen2.5-coder:32b",
        description="Default model for code / script analysis",
    )
    DEFAULT_VISION_MODEL: str = Field(
        default="llama3.2-vision:11b",
        description="Default model for image / screenshot analysis",
    )
    DEFAULT_EMBEDDING_MODEL: str = Field(
        default="bge-m3:latest",
        description="Default embedding model",
    )

    # ── Agent behaviour ───────────────────────────────────────────────
    AGENT_MAX_TOKENS: int = Field(default=2048, description="Max tokens per agent response")
    AGENT_TEMPERATURE: float = Field(default=0.3, description="LLM temperature for guidance")
    AGENT_HISTORY_LENGTH: int = Field(default=10, description="Messages to keep in context")
    FILTER_SENSITIVE_DATA: bool = Field(default=True, description="Redact sensitive patterns")

    # ── Enrichment API keys ───────────────────────────────────────────
    VIRUSTOTAL_API_KEY: str = Field(default="", description="VirusTotal API key")
    ABUSEIPDB_API_KEY: str = Field(default="", description="AbuseIPDB API key")
    SHODAN_API_KEY: str = Field(default="", description="Shodan API key")

    # ── Auth ──────────────────────────────────────────────────────────
    JWT_SECRET: str = Field(
        default="CHANGE-ME-IN-PRODUCTION-USE-A-REAL-SECRET",
        description="Secret for JWT signing",
    )
    JWT_ACCESS_TOKEN_MINUTES: int = Field(default=60, description="Access token lifetime")
    JWT_REFRESH_TOKEN_DAYS: int = Field(default=7, description="Refresh token lifetime")

    model_config = {"env_prefix": "TH_", "env_file": ".env", "extra": "ignore"}

    @property
    def cors_origins(self) -> list[str]:
        return [o.strip() for o in self.ALLOWED_ORIGINS.split(",") if o.strip()]

    @property
    def wile_url(self) -> str:
        return f"http://{self.WILE_HOST}:{self.WILE_OLLAMA_PORT}"

    @property
    def roadrunner_url(self) -> str:
        return f"http://{self.ROADRUNNER_HOST}:{self.ROADRUNNER_OLLAMA_PORT}"

    @property
    def max_upload_bytes(self) -> int:
        return self.MAX_UPLOAD_SIZE_MB * 1024 * 1024


settings = AppConfig()
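A small sketch of the TH_ env mapping in practice: a TH_-prefixed variable overrides the matching AppConfig field, and the derived properties reflect the change (real environment variables take priority over the .env file in pydantic-settings). The second origin is illustrative only.

import os

os.environ["TH_MAX_UPLOAD_SIZE_MB"] = "100"
os.environ["TH_ALLOWED_ORIGINS"] = "http://localhost:3000, https://hunt.example"

from app.config import AppConfig

cfg = AppConfig()
assert cfg.MAX_UPLOAD_SIZE_MB == 100
assert cfg.max_upload_bytes == 100 * 1024 * 1024
# cors_origins strips the whitespace around each comma-separated origin.
assert cfg.cors_origins == ["http://localhost:3000", "https://hunt.example"]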
12
backend/app/db/__init__.py
Normal file
@@ -0,0 +1,12 @@
"""Database package."""

from .engine import Base, get_db, init_db, dispose_db, engine, async_session_factory

__all__ = [
    "Base",
    "get_db",
    "init_db",
    "dispose_db",
    "engine",
    "async_session_factory",
]
75
backend/app/db/engine.py
Normal file
@@ -0,0 +1,75 @@
"""Database engine, session factory, and base model.

Uses async SQLAlchemy with aiosqlite for local dev and asyncpg for production PostgreSQL.
"""

from sqlalchemy import event
from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)
from sqlalchemy.orm import DeclarativeBase

from app.config import settings

_is_sqlite = settings.DATABASE_URL.startswith("sqlite")

_engine_kwargs: dict = dict(
    echo=settings.DEBUG,
    future=True,
)

if _is_sqlite:
    _engine_kwargs["connect_args"] = {"timeout": 30}
    _engine_kwargs["pool_size"] = 1
    _engine_kwargs["max_overflow"] = 0

engine = create_async_engine(settings.DATABASE_URL, **_engine_kwargs)


@event.listens_for(engine.sync_engine, "connect")
def _set_sqlite_pragmas(dbapi_conn, connection_record):
    """Enable WAL mode and tune busy-timeout for SQLite connections."""
    if _is_sqlite:
        cursor = dbapi_conn.cursor()
        cursor.execute("PRAGMA journal_mode=WAL")
        cursor.execute("PRAGMA busy_timeout=5000")
        cursor.execute("PRAGMA synchronous=NORMAL")
        cursor.close()


async_session_factory = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)


class Base(DeclarativeBase):
    """Base class for all ORM models."""
    pass


async def get_db() -> AsyncSession:  # type: ignore[misc]
    """FastAPI dependency that yields an async DB session."""
    async with async_session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
        finally:
            await session.close()


async def init_db() -> None:
    """Create all tables (for dev / first-run). In production use Alembic."""
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)


async def dispose_db() -> None:
    """Dispose of the engine connection pool."""
    await engine.dispose()
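A minimal sketch of using the engine outside a request handler (scripts, background jobs); only the get_db dependency auto-commits, so standalone sessions manage their own transactions.

import asyncio

from sqlalchemy import text

from app.db.engine import async_session_factory, init_db


async def main() -> None:
    await init_db()  # dev-only table creation; production migrations use Alembic
    async with async_session_factory() as session:
        result = await session.execute(text("SELECT 1"))
        print(result.scalar_one())  # -> 1


asyncio.run(main())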
402
backend/app/db/models.py
Normal file
@@ -0,0 +1,402 @@
"""SQLAlchemy ORM models for ThreatHunt.

All persistent entities: datasets, hunts, conversations, annotations,
hypotheses, enrichment results, users, and AI analysis tables.
"""

import uuid
from datetime import datetime, timezone
from typing import Optional

from sqlalchemy import (
    Boolean,
    DateTime,
    Float,
    ForeignKey,
    Integer,
    String,
    Text,
    JSON,
    Index,
)
from sqlalchemy.orm import Mapped, mapped_column, relationship

from .engine import Base


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


def _new_id() -> str:
    return uuid.uuid4().hex


# -- Users ---

class User(Base):
    __tablename__ = "users"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    username: Mapped[str] = mapped_column(String(64), unique=True, nullable=False, index=True)
    email: Mapped[str] = mapped_column(String(256), unique=True, nullable=False)
    hashed_password: Mapped[str] = mapped_column(String(256), nullable=False)
    role: Mapped[str] = mapped_column(String(16), default="analyst")
    is_active: Mapped[bool] = mapped_column(Boolean, default=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    hunts: Mapped[list["Hunt"]] = relationship(back_populates="owner", lazy="selectin")
    annotations: Mapped[list["Annotation"]] = relationship(back_populates="author", lazy="selectin")


# -- Hunts ---

class Hunt(Base):
    __tablename__ = "hunts"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    name: Mapped[str] = mapped_column(String(256), nullable=False)
    description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String(32), default="active")
    owner_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("users.id"), nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    owner: Mapped[Optional["User"]] = relationship(back_populates="hunts", lazy="selectin")
    datasets: Mapped[list["Dataset"]] = relationship(back_populates="hunt", lazy="selectin")
    conversations: Mapped[list["Conversation"]] = relationship(back_populates="hunt", lazy="selectin")
    hypotheses: Mapped[list["Hypothesis"]] = relationship(back_populates="hunt", lazy="selectin")
    host_profiles: Mapped[list["HostProfile"]] = relationship(back_populates="hunt", lazy="noload")
    reports: Mapped[list["HuntReport"]] = relationship(back_populates="hunt", lazy="noload")


# -- Datasets ---

class Dataset(Base):
    __tablename__ = "datasets"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    name: Mapped[str] = mapped_column(String(256), nullable=False, index=True)
    filename: Mapped[str] = mapped_column(String(512), nullable=False)
    source_tool: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    row_count: Mapped[int] = mapped_column(Integer, default=0)
    column_schema: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    normalized_columns: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    ioc_columns: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    file_size_bytes: Mapped[int] = mapped_column(Integer, default=0)
    encoding: Mapped[Optional[str]] = mapped_column(String(32), nullable=True)
    delimiter: Mapped[Optional[str]] = mapped_column(String(4), nullable=True)
    time_range_start: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
    time_range_end: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)

    # New Phase 1-2 columns
    processing_status: Mapped[str] = mapped_column(String(20), default="ready")
    artifact_type: Mapped[Optional[str]] = mapped_column(String(128), nullable=True)
    error_message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    file_path: Mapped[Optional[str]] = mapped_column(String(512), nullable=True)

    hunt_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("hunts.id"), nullable=True
    )
    uploaded_by: Mapped[Optional[str]] = mapped_column(String(32), nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    hunt: Mapped[Optional["Hunt"]] = relationship(back_populates="datasets", lazy="selectin")
    rows: Mapped[list["DatasetRow"]] = relationship(
        back_populates="dataset", lazy="noload", cascade="all, delete-orphan"
    )
    triage_results: Mapped[list["TriageResult"]] = relationship(
        back_populates="dataset", lazy="noload", cascade="all, delete-orphan"
    )

    __table_args__ = (
        Index("ix_datasets_hunt", "hunt_id"),
        Index("ix_datasets_status", "processing_status"),
    )


class DatasetRow(Base):
    __tablename__ = "dataset_rows"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    dataset_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("datasets.id", ondelete="CASCADE"), nullable=False
    )
    row_index: Mapped[int] = mapped_column(Integer, nullable=False)
    data: Mapped[dict] = mapped_column(JSON, nullable=False)
    normalized_data: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)

    dataset: Mapped["Dataset"] = relationship(back_populates="rows")
    annotations: Mapped[list["Annotation"]] = relationship(
        back_populates="row", lazy="noload"
    )

    __table_args__ = (
        Index("ix_dataset_rows_dataset", "dataset_id"),
        Index("ix_dataset_rows_dataset_idx", "dataset_id", "row_index"),
    )


# -- Conversations ---

class Conversation(Base):
    __tablename__ = "conversations"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    title: Mapped[Optional[str]] = mapped_column(String(256), nullable=True)
    hunt_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("hunts.id"), nullable=True
    )
    dataset_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("datasets.id"), nullable=True
    )
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    hunt: Mapped[Optional["Hunt"]] = relationship(back_populates="conversations", lazy="selectin")
    messages: Mapped[list["Message"]] = relationship(
        back_populates="conversation", lazy="selectin", cascade="all, delete-orphan",
        order_by="Message.created_at",
    )


class Message(Base):
    __tablename__ = "messages"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    conversation_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("conversations.id", ondelete="CASCADE"), nullable=False
    )
    role: Mapped[str] = mapped_column(String(16), nullable=False)
    content: Mapped[str] = mapped_column(Text, nullable=False)
    model_used: Mapped[Optional[str]] = mapped_column(String(128), nullable=True)
    node_used: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    token_count: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    latency_ms: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    response_meta: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    conversation: Mapped["Conversation"] = relationship(back_populates="messages")

    __table_args__ = (
        Index("ix_messages_conversation", "conversation_id"),
    )


# -- Annotations ---

class Annotation(Base):
    __tablename__ = "annotations"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    row_id: Mapped[Optional[int]] = mapped_column(
        Integer, ForeignKey("dataset_rows.id", ondelete="SET NULL"), nullable=True
    )
    dataset_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("datasets.id"), nullable=True
    )
    author_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("users.id"), nullable=True
    )
    text: Mapped[str] = mapped_column(Text, nullable=False)
    severity: Mapped[str] = mapped_column(String(16), default="info")
    tag: Mapped[Optional[str]] = mapped_column(String(32), nullable=True)
    highlight_color: Mapped[Optional[str]] = mapped_column(String(16), nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    row: Mapped[Optional["DatasetRow"]] = relationship(back_populates="annotations")
    author: Mapped[Optional["User"]] = relationship(back_populates="annotations")

    __table_args__ = (
        Index("ix_annotations_dataset", "dataset_id"),
        Index("ix_annotations_row", "row_id"),
    )


# -- Hypotheses ---

class Hypothesis(Base):
    __tablename__ = "hypotheses"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    hunt_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("hunts.id"), nullable=True
    )
    title: Mapped[str] = mapped_column(String(256), nullable=False)
    description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    mitre_technique: Mapped[Optional[str]] = mapped_column(String(32), nullable=True)
    status: Mapped[str] = mapped_column(String(16), default="draft")
    evidence_row_ids: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    evidence_notes: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    hunt: Mapped[Optional["Hunt"]] = relationship(back_populates="hypotheses", lazy="selectin")

    __table_args__ = (
        Index("ix_hypotheses_hunt", "hunt_id"),
    )


# -- Enrichment Results ---

class EnrichmentResult(Base):
    __tablename__ = "enrichment_results"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    ioc_value: Mapped[str] = mapped_column(String(512), nullable=False, index=True)
    ioc_type: Mapped[str] = mapped_column(String(32), nullable=False)
    source: Mapped[str] = mapped_column(String(32), nullable=False)
    verdict: Mapped[Optional[str]] = mapped_column(String(16), nullable=True)
    confidence: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
    raw_result: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    dataset_id: Mapped[Optional[str]] = mapped_column(
        String(32), ForeignKey("datasets.id"), nullable=True
    )
    cached_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    expires_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)

    __table_args__ = (
        Index("ix_enrichment_ioc_source", "ioc_value", "source"),
    )


# -- AUP Keyword Themes & Keywords ---

class KeywordTheme(Base):
    __tablename__ = "keyword_themes"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    name: Mapped[str] = mapped_column(String(128), unique=True, nullable=False, index=True)
    color: Mapped[str] = mapped_column(String(16), default="#9e9e9e")
    enabled: Mapped[bool] = mapped_column(Boolean, default=True)
    is_builtin: Mapped[bool] = mapped_column(Boolean, default=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    keywords: Mapped[list["Keyword"]] = relationship(
        back_populates="theme", lazy="selectin", cascade="all, delete-orphan"
    )


class Keyword(Base):
    __tablename__ = "keywords"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    theme_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("keyword_themes.id", ondelete="CASCADE"), nullable=False
    )
    value: Mapped[str] = mapped_column(String(256), nullable=False)
    is_regex: Mapped[bool] = mapped_column(Boolean, default=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    theme: Mapped["KeywordTheme"] = relationship(back_populates="keywords")

    __table_args__ = (
        Index("ix_keywords_theme", "theme_id"),
        Index("ix_keywords_value", "value"),
    )


# -- AI Analysis Tables (Phase 2) ---

class TriageResult(Base):
    __tablename__ = "triage_results"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    dataset_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("datasets.id", ondelete="CASCADE"), nullable=False, index=True
    )
    row_start: Mapped[int] = mapped_column(Integer, nullable=False)
    row_end: Mapped[int] = mapped_column(Integer, nullable=False)
    risk_score: Mapped[float] = mapped_column(Float, default=0.0)
    verdict: Mapped[str] = mapped_column(String(20), default="pending")
    findings: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    suspicious_indicators: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    mitre_techniques: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    model_used: Mapped[Optional[str]] = mapped_column(String(128), nullable=True)
    node_used: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)

    dataset: Mapped["Dataset"] = relationship(back_populates="triage_results")


class HostProfile(Base):
    __tablename__ = "host_profiles"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    hunt_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("hunts.id", ondelete="CASCADE"), nullable=False, index=True
    )
    hostname: Mapped[str] = mapped_column(String(256), nullable=False)
    fqdn: Mapped[Optional[str]] = mapped_column(String(512), nullable=True)
    client_id: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    risk_score: Mapped[float] = mapped_column(Float, default=0.0)
    risk_level: Mapped[str] = mapped_column(String(20), default="unknown")
    artifact_summary: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    timeline_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    suspicious_findings: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    mitre_techniques: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    llm_analysis: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    model_used: Mapped[Optional[str]] = mapped_column(String(128), nullable=True)
    node_used: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    hunt: Mapped["Hunt"] = relationship(back_populates="host_profiles")


class HuntReport(Base):
    __tablename__ = "hunt_reports"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    hunt_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("hunts.id", ondelete="CASCADE"), nullable=False, index=True
    )
    status: Mapped[str] = mapped_column(String(20), default="pending")
    exec_summary: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    full_report: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    findings: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    recommendations: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    mitre_mapping: Mapped[Optional[dict]] = mapped_column(JSON, nullable=True)
    ioc_table: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    host_risk_summary: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    models_used: Mapped[Optional[list]] = mapped_column(JSON, nullable=True)
    generation_time_ms: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
    updated_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=_utcnow, onupdate=_utcnow
    )

    hunt: Mapped["Hunt"] = relationship(back_populates="reports")


class AnomalyResult(Base):
    __tablename__ = "anomaly_results"

    id: Mapped[str] = mapped_column(String(32), primary_key=True, default=_new_id)
    dataset_id: Mapped[str] = mapped_column(
        String(32), ForeignKey("datasets.id", ondelete="CASCADE"), nullable=False, index=True
    )
    row_id: Mapped[Optional[int]] = mapped_column(
        Integer, ForeignKey("dataset_rows.id", ondelete="CASCADE"), nullable=True
    )
    anomaly_score: Mapped[float] = mapped_column(Float, default=0.0)
    distance_from_centroid: Mapped[Optional[float]] = mapped_column(Float, nullable=True)
    cluster_id: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
    is_outlier: Mapped[bool] = mapped_column(Boolean, default=False)
    explanation: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=_utcnow)
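A hedged sketch of how the core models compose: create a hunt, attach a dataset through the relationship, and add a row. All field values are illustrative.

import asyncio

from app.db import async_session_factory, init_db
from app.db.models import Dataset, DatasetRow, Hunt


async def seed_example() -> None:
    await init_db()
    async with async_session_factory() as session:
        hunt = Hunt(name="Example hunt", description="Illustrative only")
        dataset = Dataset(name="pslist", filename="pslist.csv", hunt=hunt)
        session.add_all([hunt, dataset])
        await session.flush()  # populates the uuid-hex primary keys
        session.add(DatasetRow(dataset_id=dataset.id, row_index=0,
                               data={"Pid": "4", "Name": "System"}))
        await session.commit()


asyncio.run(seed_example())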
1
backend/app/db/repositories/__init__.py
Normal file
@@ -0,0 +1 @@
"""Repositories package — typed CRUD operations for each model."""
127
backend/app/db/repositories/datasets.py
Normal file
@@ -0,0 +1,127 @@
"""Dataset repository — CRUD operations for datasets and their rows."""

import logging
from typing import Sequence

from sqlalchemy import select, func, delete
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import Dataset, DatasetRow

logger = logging.getLogger(__name__)


class DatasetRepository:
    """Typed CRUD for Dataset and DatasetRow models."""

    def __init__(self, session: AsyncSession):
        self.session = session

    # ── Dataset CRUD ──────────────────────────────────────────────────

    async def create_dataset(self, **kwargs) -> Dataset:
        ds = Dataset(**kwargs)
        self.session.add(ds)
        await self.session.flush()
        return ds

    async def get_dataset(self, dataset_id: str) -> Dataset | None:
        result = await self.session.execute(
            select(Dataset).where(Dataset.id == dataset_id)
        )
        return result.scalar_one_or_none()

    async def list_datasets(
        self,
        hunt_id: str | None = None,
        limit: int = 100,
        offset: int = 0,
    ) -> Sequence[Dataset]:
        stmt = select(Dataset).order_by(Dataset.created_at.desc())
        if hunt_id:
            stmt = stmt.where(Dataset.hunt_id == hunt_id)
        stmt = stmt.limit(limit).offset(offset)
        result = await self.session.execute(stmt)
        return result.scalars().all()

    async def count_datasets(self, hunt_id: str | None = None) -> int:
        stmt = select(func.count(Dataset.id))
        if hunt_id:
            stmt = stmt.where(Dataset.hunt_id == hunt_id)
        result = await self.session.execute(stmt)
        return result.scalar_one()

    async def delete_dataset(self, dataset_id: str) -> bool:
        ds = await self.get_dataset(dataset_id)
        if not ds:
            return False
        await self.session.delete(ds)
        await self.session.flush()
        return True

    # ── Row CRUD ──────────────────────────────────────────────────────

    async def bulk_insert_rows(
        self,
        dataset_id: str,
        rows: list[dict],
        normalized_rows: list[dict] | None = None,
        batch_size: int = 500,
    ) -> int:
        """Insert rows in batches. Returns count inserted."""
        count = 0
        for i in range(0, len(rows), batch_size):
            batch = rows[i : i + batch_size]
            norm_batch = normalized_rows[i : i + batch_size] if normalized_rows else [None] * len(batch)
            objects = [
                DatasetRow(
                    dataset_id=dataset_id,
                    row_index=i + j,
                    data=row,
                    normalized_data=norm,
                )
                for j, (row, norm) in enumerate(zip(batch, norm_batch))
            ]
            self.session.add_all(objects)
            await self.session.flush()
            count += len(objects)
        return count

    async def get_rows(
        self,
        dataset_id: str,
        limit: int = 1000,
        offset: int = 0,
    ) -> Sequence[DatasetRow]:
        stmt = (
            select(DatasetRow)
            .where(DatasetRow.dataset_id == dataset_id)
            .order_by(DatasetRow.row_index)
            .limit(limit)
            .offset(offset)
        )
        result = await self.session.execute(stmt)
        return result.scalars().all()

    async def count_rows(self, dataset_id: str) -> int:
        stmt = select(func.count(DatasetRow.id)).where(
            DatasetRow.dataset_id == dataset_id
        )
        result = await self.session.execute(stmt)
        return result.scalar_one()

    async def get_row_by_index(
        self, dataset_id: str, row_index: int
    ) -> DatasetRow | None:
        stmt = select(DatasetRow).where(
            DatasetRow.dataset_id == dataset_id,
            DatasetRow.row_index == row_index,
        )
        result = await self.session.execute(stmt)
        return result.scalar_one_or_none()

    async def delete_rows(self, dataset_id: str) -> int:
        result = await self.session.execute(
            delete(DatasetRow).where(DatasetRow.dataset_id == dataset_id)
        )
        return result.rowcount  # type: ignore[return-value]
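A usage sketch for the repository: it flushes but never commits, so the caller owns the transaction. Names and values are placeholders.

import asyncio

from app.db import async_session_factory
from app.db.repositories.datasets import DatasetRepository


async def demo() -> None:
    async with async_session_factory() as session:
        repo = DatasetRepository(session)
        ds = await repo.create_dataset(name="netstat", filename="netstat.csv")
        inserted = await repo.bulk_insert_rows(
            ds.id, rows=[{"Laddr": "10.0.0.5:445", "Status": "LISTEN"}]
        )
        await session.commit()  # repository methods only flush
        print(ds.id, inserted, await repo.count_rows(ds.id))


asyncio.run(demo())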
123
backend/app/main.py
Normal file
@@ -0,0 +1,123 @@
"""ThreatHunt backend application.

Wires together: database, CORS, agent routes, dataset routes, hunt routes,
annotation/hypothesis routes, analysis routes, network routes, job queue,
load balancer. DB tables are auto-created on startup.
"""

import logging
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.config import settings
from app.db import init_db, dispose_db
from app.api.routes.agent_v2 import router as agent_router
from app.api.routes.datasets import router as datasets_router
from app.api.routes.hunts import router as hunts_router
from app.api.routes.annotations import ann_router, hyp_router
from app.api.routes.enrichment import router as enrichment_router
from app.api.routes.correlation import router as correlation_router
from app.api.routes.reports import router as reports_router
from app.api.routes.auth import router as auth_router
from app.api.routes.keywords import router as keywords_router
from app.api.routes.analysis import router as analysis_router
from app.api.routes.network import router as network_router

logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Startup / shutdown lifecycle."""
    logger.info("Starting ThreatHunt API ...")
    await init_db()
    logger.info("Database initialised")

    # Ensure uploads directory exists
    os.makedirs(settings.UPLOAD_DIR, exist_ok=True)
    logger.info("Upload dir: %s", os.path.abspath(settings.UPLOAD_DIR))

    # Seed default AUP keyword themes
    from app.db import async_session_factory
    from app.services.keyword_defaults import seed_defaults
    async with async_session_factory() as seed_db:
        await seed_defaults(seed_db)
    logger.info("AUP keyword defaults checked")

    # Start job queue (Phase 10)
    from app.services.job_queue import job_queue, register_all_handlers
    register_all_handlers()
    await job_queue.start()
    logger.info("Job queue started (%d workers)", job_queue._max_workers)

    # Start load balancer health loop (Phase 10)
    from app.services.load_balancer import lb
    await lb.start_health_loop(interval=30.0)
    logger.info("Load balancer health loop started")

    yield

    logger.info("Shutting down ...")
    # Stop job queue
    from app.services.job_queue import job_queue as jq
    await jq.stop()
    logger.info("Job queue stopped")

    # Stop load balancer
    from app.services.load_balancer import lb as _lb
    await _lb.stop_health_loop()
    logger.info("Load balancer stopped")

    from app.agents.providers_v2 import cleanup_client
    from app.services.enrichment import enrichment_engine
    await cleanup_client()
    await enrichment_engine.cleanup()
    await dispose_db()


app = FastAPI(
    title="ThreatHunt API",
    description="Analyst-assist threat hunting platform powered by Wile & Roadrunner LLM cluster",
    version=settings.APP_VERSION,
    lifespan=lifespan,
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.cors_origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Include routes
app.include_router(auth_router)
app.include_router(agent_router)
app.include_router(datasets_router)
app.include_router(hunts_router)
app.include_router(ann_router)
app.include_router(hyp_router)
app.include_router(enrichment_router)
app.include_router(correlation_router)
app.include_router(reports_router)
app.include_router(keywords_router)
app.include_router(analysis_router)
app.include_router(network_router)


@app.get("/", tags=["health"])
async def root():
    return {
        "service": "ThreatHunt API",
        "version": settings.APP_VERSION,
        "status": "running",
        "docs": "/docs",
        "cluster": {
            "wile": settings.wile_url,
            "roadrunner": settings.roadrunner_url,
            "openwebui": settings.OPENWEBUI_URL,
        },
    }
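For local development the app can be launched with uvicorn; the snippet below is equivalent to running `uvicorn app.main:app --reload` from the backend directory.

import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)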
1
backend/app/services/__init__.py
Normal file
@@ -0,0 +1 @@
"""Services package."""
199
backend/app/services/anomaly_detector.py
Normal file
@@ -0,0 +1,199 @@
"""Embedding-based anomaly detection using Roadrunner's bge-m3 model.

Converts dataset rows to embeddings, clusters them, and flags outliers
that deviate significantly from the cluster centroids. Uses cosine
distance and simple k-means-like centroid computation.
"""

import logging
import math

import httpx
from sqlalchemy import select

from app.config import settings
from app.db import async_session_factory
from app.db.models import AnomalyResult, DatasetRow

logger = logging.getLogger(__name__)

EMBED_URL = f"{settings.roadrunner_url}/api/embed"
EMBED_MODEL = "bge-m3"
BATCH_SIZE = 32  # rows per embedding batch
MAX_ROWS = 2000  # cap for anomaly detection


# --- math helpers (no numpy required) ---

def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def _cosine_distance(a: list[float], b: list[float]) -> float:
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - _dot(a, b) / (na * nb)


def _mean_vector(vectors: list[list[float]]) -> list[float]:
    if not vectors:
        return []
    dim = len(vectors[0])
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def _row_to_text(data: dict) -> str:
    """Flatten a row dict to a single string for embedding."""
    parts = []
    for k, v in data.items():
        sv = str(v).strip()
        if sv and sv.lower() not in ('none', 'null', ''):
            parts.append(f"{k}={sv}")
    return " | ".join(parts)[:2000]  # cap length


async def _embed_batch(texts: list[str], client: httpx.AsyncClient) -> list[list[float]]:
    """Get embeddings from Roadrunner's Ollama API."""
    resp = await client.post(
        EMBED_URL,
        json={"model": EMBED_MODEL, "input": texts},
        timeout=120.0,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama returns {"embeddings": [[...], ...]}
    return data.get("embeddings", [])


def _simple_cluster(
    embeddings: list[list[float]],
    k: int = 3,
    max_iter: int = 20,
) -> tuple[list[int], list[list[float]]]:
    """Simple k-means clustering (no numpy dependency).

    Returns (assignments, centroids).
    """
    n = len(embeddings)
    if n <= k:
        return list(range(n)), embeddings[:]

    # Init centroids: evenly spaced indices
    step = max(n // k, 1)
    centroids = [embeddings[i * step % n] for i in range(k)]
    assignments = [0] * n

    for _ in range(max_iter):
        # Assign to nearest centroid
        new_assignments = []
        for emb in embeddings:
            dists = [_cosine_distance(emb, c) for c in centroids]
            new_assignments.append(dists.index(min(dists)))

        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Recompute centroids
        for ci in range(k):
            members = [embeddings[j] for j in range(n) if assignments[j] == ci]
            if members:
                centroids[ci] = _mean_vector(members)

    return assignments, centroids


async def detect_anomalies(
    dataset_id: str,
    k: int = 3,
    outlier_threshold: float = 0.35,
) -> list[dict]:
    """Run embedding-based anomaly detection on a dataset.

    1. Load rows 2. Embed via bge-m3 3. Cluster 4. Flag outliers.
    """
    async with async_session_factory() as db:
        # Load rows
        result = await db.execute(
            select(DatasetRow.id, DatasetRow.row_index, DatasetRow.data)
            .where(DatasetRow.dataset_id == dataset_id)
            .order_by(DatasetRow.row_index)
            .limit(MAX_ROWS)
        )
        rows = result.all()
        if not rows:
            logger.info("No rows for anomaly detection in dataset %s", dataset_id)
            return []

        row_ids = [r[0] for r in rows]
        row_indices = [r[1] for r in rows]
        texts = [_row_to_text(r[2]) for r in rows]

        logger.info("Anomaly detection: %d rows, embedding with %s", len(texts), EMBED_MODEL)

        # Embed in batches
        all_embeddings: list[list[float]] = []
        async with httpx.AsyncClient() as client:
            for i in range(0, len(texts), BATCH_SIZE):
                batch = texts[i : i + BATCH_SIZE]
                try:
                    embs = await _embed_batch(batch, client)
                    all_embeddings.extend(embs)
                except Exception as e:
                    logger.error("Embedding batch %d failed: %s", i, e)
                    # Fill with zeros so indices stay aligned
                    all_embeddings.extend([[0.0] * 1024] * len(batch))

        if not all_embeddings or len(all_embeddings) != len(texts):
            logger.error("Embedding count mismatch")
            return []

        # Cluster
        actual_k = min(k, len(all_embeddings))
        assignments, centroids = _simple_cluster(all_embeddings, k=actual_k)

        # Compute distances from centroid
        anomalies: list[dict] = []
        for idx, (emb, cluster_id) in enumerate(zip(all_embeddings, assignments)):
            dist = _cosine_distance(emb, centroids[cluster_id])
            is_outlier = dist > outlier_threshold
            anomalies.append({
                "row_id": row_ids[idx],
                "row_index": row_indices[idx],
                "anomaly_score": round(dist, 4),
                "distance_from_centroid": round(dist, 4),
                "cluster_id": cluster_id,
                "is_outlier": is_outlier,
            })

        # Save to DB
        outlier_count = 0
        for a in anomalies:
            ar = AnomalyResult(
                dataset_id=dataset_id,
                row_id=a["row_id"],
                anomaly_score=a["anomaly_score"],
                distance_from_centroid=a["distance_from_centroid"],
                cluster_id=a["cluster_id"],
                is_outlier=a["is_outlier"],
            )
            db.add(ar)
            if a["is_outlier"]:
                outlier_count += 1

        await db.commit()
        logger.info(
            "Anomaly detection complete: %d rows, %d outliers (threshold=%.2f)",
            len(anomalies), outlier_count, outlier_threshold,
        )

        return sorted(anomalies, key=lambda x: x["anomaly_score"], reverse=True)
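A small driver sketch for the detector: the dataset id is a placeholder, and Roadrunner's Ollama endpoint must be reachable for the embedding calls to succeed.

import asyncio

from app.services.anomaly_detector import detect_anomalies


async def main() -> None:
    results = await detect_anomalies("my-dataset-id", k=3, outlier_threshold=0.35)
    # Results come back sorted by anomaly_score, highest first.
    for r in results[:10]:
        if r["is_outlier"]:
            print(f"row {r['row_index']}: score={r['anomaly_score']} (cluster {r['cluster_id']})")


asyncio.run(main())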
81
backend/app/services/artifact_classifier.py
Normal file
@@ -0,0 +1,81 @@
"""Artifact classifier - identify Velociraptor artifact types from CSV headers."""

from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

# (required_columns, artifact_type)
FINGERPRINTS: list[tuple[set[str], str]] = [
    ({"Pid", "Name", "CommandLine", "Exe"}, "Windows.System.Pslist"),
    ({"Pid", "Name", "Ppid", "CommandLine"}, "Windows.System.Pslist"),
    ({"Laddr.IP", "Raddr.IP", "Status", "Pid"}, "Windows.Network.Netstat"),
    ({"Laddr", "Raddr", "Status", "Pid"}, "Windows.Network.Netstat"),
    ({"FamilyString", "TypeString", "Status", "Pid"}, "Windows.Network.Netstat"),
    ({"ServiceName", "DisplayName", "StartMode", "PathName"}, "Windows.System.Services"),
    ({"DisplayName", "PathName", "ServiceDll", "StartMode"}, "Windows.System.Services"),
    ({"OSPath", "Size", "Mtime", "Hash"}, "Windows.Search.FileFinder"),
    ({"FullPath", "Size", "Mtime"}, "Windows.Search.FileFinder"),
    ({"PrefetchFileName", "RunCount", "LastRunTimes"}, "Windows.Forensics.Prefetch"),
    ({"Executable", "RunCount", "LastRunTimes"}, "Windows.Forensics.Prefetch"),
    ({"KeyPath", "Type", "Data"}, "Windows.Registry.Finder"),
    ({"Key", "Type", "Value"}, "Windows.Registry.Finder"),
    ({"EventTime", "Channel", "EventID", "EventData"}, "Windows.EventLogs.EvtxHunter"),
    ({"TimeCreated", "Channel", "EventID", "Provider"}, "Windows.EventLogs.EvtxHunter"),
    ({"Entry", "Category", "Profile", "Launch String"}, "Windows.Sys.Autoruns"),
    ({"Entry", "Category", "LaunchString"}, "Windows.Sys.Autoruns"),
    ({"Name", "Record", "Type", "TTL"}, "Windows.Network.DNS"),
    ({"QueryName", "QueryType", "QueryResults"}, "Windows.Network.DNS"),
    ({"Path", "MD5", "SHA1", "SHA256"}, "Windows.Analysis.Hash"),
    ({"Md5", "Sha256", "FullPath"}, "Windows.Analysis.Hash"),
    ({"Name", "Actions", "NextRunTime", "Path"}, "Windows.System.TaskScheduler"),
    ({"Name", "Uid", "Gid", "Description"}, "Windows.Sys.Users"),
    ({"os_info.hostname", "os_info.system"}, "Server.Information.Client"),
    ({"ClientId", "os_info.fqdn"}, "Server.Information.Client"),
    ({"Pid", "Name", "Cmdline", "Exe"}, "Linux.Sys.Pslist"),
    ({"Laddr", "Raddr", "Status", "FamilyString"}, "Linux.Network.Netstat"),
    ({"Namespace", "ClassName", "PropertyName"}, "Windows.System.WMI"),
    ({"RemoteAddress", "RemoteMACAddress", "InterfaceAlias"}, "Windows.Network.ArpCache"),
    ({"URL", "Title", "VisitCount", "LastVisitTime"}, "Windows.Applications.BrowserHistory"),
    ({"Url", "Title", "Visits"}, "Windows.Applications.BrowserHistory"),
]

VELOCIRAPTOR_META = {"_Source", "ClientId", "FlowId", "Fqdn", "HuntId"}

CATEGORY_MAP = {
    "Pslist": "process",
    "Netstat": "network",
    "Services": "persistence",
    "FileFinder": "filesystem",
    "Prefetch": "execution",
    "Registry": "persistence",
    "EvtxHunter": "eventlog",
    "EventLogs": "eventlog",
    "Autoruns": "persistence",
    "DNS": "network",
    "Hash": "filesystem",
    "TaskScheduler": "persistence",
    "Users": "account",
    "Client": "system",
    "WMI": "persistence",
    "ArpCache": "network",
    "BrowserHistory": "application",
}


def classify_artifact(columns: list[str]) -> str:
    col_set = set(columns)
    for required, artifact_type in FINGERPRINTS:
        if required.issubset(col_set):
            return artifact_type
    if VELOCIRAPTOR_META.intersection(col_set):
        return "Velociraptor.Unknown"
    return "Unknown"


def get_artifact_category(artifact_type: str) -> str:
    for key, category in CATEGORY_MAP.items():
        if key.lower() in artifact_type.lower():
            return category
    return "unknown"
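The classifier is pure set logic, so it can be exercised directly; these headers satisfy the first Pslist fingerprint above.

from app.services.artifact_classifier import classify_artifact, get_artifact_category

cols = ["Pid", "Name", "CommandLine", "Exe", "Username"]
artifact = classify_artifact(cols)          # -> "Windows.System.Pslist"
category = get_artifact_category(artifact)  # -> "process"
print(artifact, category)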
201
backend/app/services/auth.py
Normal file
@@ -0,0 +1,201 @@
"""Authentication & security — JWT tokens, password hashing, role-based access.

Provides:
- Password hashing (bcrypt via passlib)
- JWT access/refresh token creation and verification
- FastAPI dependency for protecting routes
- Role-based enforcement (analyst, admin, viewer)
"""

import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from jose import JWTError, jwt
from passlib.context import CryptContext
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.config import settings
from app.db import get_db
from app.db.models import User

logger = logging.getLogger(__name__)

# ── Password hashing ─────────────────────────────────────────────────

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


def hash_password(password: str) -> str:
    return pwd_context.hash(password)


def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)


# ── JWT tokens ────────────────────────────────────────────────────────

ALGORITHM = "HS256"

security = HTTPBearer(auto_error=False)


class TokenPair(BaseModel):
    access_token: str
    refresh_token: str
    token_type: str = "bearer"
    expires_in: int  # seconds


class TokenPayload(BaseModel):
    sub: str  # user_id
    role: str
    exp: datetime
    type: str  # "access" or "refresh"


def create_access_token(user_id: str, role: str) -> str:
    expires = datetime.now(timezone.utc) + timedelta(
        minutes=settings.JWT_ACCESS_TOKEN_MINUTES
    )
    payload = {
        "sub": user_id,
        "role": role,
        "exp": expires,
        "type": "access",
    }
    return jwt.encode(payload, settings.JWT_SECRET, algorithm=ALGORITHM)


def create_refresh_token(user_id: str, role: str) -> str:
    expires = datetime.now(timezone.utc) + timedelta(
        days=settings.JWT_REFRESH_TOKEN_DAYS
    )
    payload = {
        "sub": user_id,
        "role": role,
        "exp": expires,
        "type": "refresh",
    }
    return jwt.encode(payload, settings.JWT_SECRET, algorithm=ALGORITHM)


def create_token_pair(user_id: str, role: str) -> TokenPair:
    return TokenPair(
        access_token=create_access_token(user_id, role),
        refresh_token=create_refresh_token(user_id, role),
        expires_in=settings.JWT_ACCESS_TOKEN_MINUTES * 60,
    )


def decode_token(token: str) -> TokenPayload:
    """Decode and validate a JWT token."""
    try:
        payload = jwt.decode(token, settings.JWT_SECRET, algorithms=[ALGORITHM])
        return TokenPayload(**payload)
    except JWTError as e:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail=f"Invalid token: {e}",
            headers={"WWW-Authenticate": "Bearer"},
        )


# ── FastAPI dependencies ──────────────────────────────────────────────


async def get_current_user(
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    db: AsyncSession = Depends(get_db),
) -> User:
    """Extract and validate the current user from JWT.

    When auth is disabled (no JWT secret configured), returns a default analyst user.
    """
    # If auth is disabled (dev mode), return a default user
    if settings.JWT_SECRET == "CHANGE-ME-IN-PRODUCTION-USE-A-REAL-SECRET":
        return User(
            id="dev-user",
            username="analyst",
            email="analyst@local",
            role="analyst",
        )

    if not credentials:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Authentication required",
            headers={"WWW-Authenticate": "Bearer"},
        )

    token_data = decode_token(credentials.credentials)

    if token_data.type != "access":
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid token type — use access token",
        )

    result = await db.execute(select(User).where(User.id == token_data.sub))
    user = result.scalar_one_or_none()

    if not user:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="User not found",
        )

    if not user.is_active:
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="User account is disabled",
        )

    return user


async def get_optional_user(
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(security),
    db: AsyncSession = Depends(get_db),
) -> Optional[User]:
    """Like get_current_user, but returns None instead of raising if no token."""
    if not credentials:
        if settings.JWT_SECRET == "CHANGE-ME-IN-PRODUCTION-USE-A-REAL-SECRET":
            return User(
                id="dev-user",
                username="analyst",
                email="analyst@local",
                role="analyst",
            )
        return None

    try:
        return await get_current_user(credentials, db)
    except HTTPException:
        return None


def require_role(*roles: str):
    """Dependency factory that requires the current user to have one of the specified roles."""

    async def _check(user: User = Depends(get_current_user)) -> User:
        if user.role not in roles:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail=f"Requires one of roles: {', '.join(roles)}. You have: {user.role}",
            )
        return user

    return _check


# Convenience dependencies
require_analyst = require_role("analyst", "admin")
require_admin = require_role("admin")
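A hedged sketch of wiring the role dependency into a route; the router and path are illustrative, not part of this changeset.

from fastapi import APIRouter, Depends

from app.db.models import User
from app.services.auth import require_analyst

example_router = APIRouter(prefix="/api/example", tags=["example"])


@example_router.get("/whoami")
async def whoami(user: User = Depends(require_analyst)):
    # Reached only for active users with role analyst or admin
    # (or the default dev user while JWT_SECRET is unchanged).
    return {"username": user.username, "role": user.role}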
400
backend/app/services/correlation.py
Normal file
@@ -0,0 +1,400 @@
|
||||
"""Cross-hunt correlation engine — find IOC overlaps, timeline patterns, and shared TTPs.
|
||||
|
||||
Identifies connections between hunts by analyzing:
|
||||
1. Shared IOC values across datasets
|
||||
2. Overlapping time ranges and temporal proximity
|
||||
3. Common MITRE ATT&CK techniques across hypotheses
|
||||
4. Host-to-host lateral movement patterns
|
||||
"""
|
||||
|
||||
import logging
|
||||
from collections import Counter, defaultdict
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
from typing import Optional
|
||||
|
||||
from sqlalchemy import select, func, text
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import Dataset, DatasetRow, Hunt, Hypothesis, EnrichmentResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class IOCOverlap:
|
||||
"""Shared IOC between two or more hunts/datasets."""
|
||||
ioc_value: str
|
||||
ioc_type: str
|
||||
datasets: list[dict] = field(default_factory=list) # [{dataset_id, hunt_id, name}]
|
||||
hunt_ids: list[str] = field(default_factory=list)
|
||||
count: int = 0
|
||||
enrichment_verdict: str = ""
|
||||
|
||||
|
||||
@dataclass
|
||||
class TimeOverlap:
|
||||
"""Overlapping time window between datasets."""
|
||||
dataset_a: dict = field(default_factory=dict)
|
||||
dataset_b: dict = field(default_factory=dict)
|
||||
overlap_start: str = ""
|
||||
overlap_end: str = ""
|
||||
overlap_hours: float = 0.0
|
||||
|
||||
|
||||
@dataclass
|
||||
class TechniqueOverlap:
|
||||
"""Shared MITRE ATT&CK technique across hunts."""
|
||||
technique_id: str
|
||||
technique_name: str = ""
|
||||
hypotheses: list[dict] = field(default_factory=list)
|
||||
hunt_ids: list[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
class CorrelationResult:
|
||||
"""Complete correlation analysis result."""
|
||||
hunt_ids: list[str]
|
||||
ioc_overlaps: list[IOCOverlap] = field(default_factory=list)
|
||||
time_overlaps: list[TimeOverlap] = field(default_factory=list)
|
||||
technique_overlaps: list[TechniqueOverlap] = field(default_factory=list)
|
||||
host_overlaps: list[dict] = field(default_factory=list)
|
||||
summary: str = ""
|
||||
total_correlations: int = 0
|
||||
|
||||
|
||||
class CorrelationEngine:
|
||||
"""Engine for finding correlations across hunts and datasets."""
|
||||
|
||||
async def correlate_hunts(
|
||||
self,
|
||||
hunt_ids: list[str],
|
||||
db: AsyncSession,
|
||||
) -> CorrelationResult:
|
||||
"""Run full correlation analysis across specified hunts."""
|
||||
result = CorrelationResult(hunt_ids=hunt_ids)
|
||||
|
||||
# Run all correlation types
|
||||
result.ioc_overlaps = await self._find_ioc_overlaps(hunt_ids, db)
|
||||
result.time_overlaps = await self._find_time_overlaps(hunt_ids, db)
|
||||
result.technique_overlaps = await self._find_technique_overlaps(hunt_ids, db)
|
||||
result.host_overlaps = await self._find_host_overlaps(hunt_ids, db)
|
||||
|
||||
result.total_correlations = (
|
||||
len(result.ioc_overlaps)
|
||||
+ len(result.time_overlaps)
|
||||
+ len(result.technique_overlaps)
|
||||
+ len(result.host_overlaps)
|
||||
)
|
||||
|
||||
result.summary = self._build_summary(result)
|
||||
return result
|
||||
|
||||
async def correlate_all(self, db: AsyncSession) -> CorrelationResult:
|
||||
"""Correlate across ALL hunts in the system."""
|
||||
stmt = select(Hunt.id)
|
||||
result = await db.execute(stmt)
|
||||
hunt_ids = [row[0] for row in result.fetchall()]
|
||||
|
||||
if len(hunt_ids) < 2:
|
||||
return CorrelationResult(
|
||||
hunt_ids=hunt_ids,
|
||||
summary="Need at least 2 hunts for correlation analysis.",
|
||||
)
|
||||
|
||||
return await self.correlate_hunts(hunt_ids, db)
|
||||
|
||||
async def find_ioc_across_hunts(
|
||||
self,
|
||||
ioc_value: str,
|
||||
db: AsyncSession,
|
||||
) -> list[dict]:
|
||||
"""Find all occurrences of a specific IOC across all datasets/hunts."""
|
||||
        # Load rows joined to their parent datasets; the IOC value is matched in Python below
|
||||
stmt = select(DatasetRow, Dataset).join(
|
||||
Dataset, DatasetRow.dataset_id == Dataset.id
|
||||
)
|
||||
result = await db.execute(stmt.limit(5000))
|
||||
rows = result.all()
|
||||
|
||||
occurrences = []
|
||||
for row, dataset in rows:
|
||||
data = row.data or {}
|
||||
normalized = row.normalized_data or {}
|
||||
|
||||
# Search both raw and normalized data
|
||||
for col, val in {**data, **normalized}.items():
|
||||
if str(val) == ioc_value:
|
||||
occurrences.append({
|
||||
"dataset_id": dataset.id,
|
||||
"dataset_name": dataset.name,
|
||||
"hunt_id": dataset.hunt_id,
|
||||
"row_index": row.row_index,
|
||||
"column": col,
|
||||
})
|
||||
break
|
||||
|
||||
return occurrences
|
||||
|
||||
# ── IOC overlap detection ─────────────────────────────────────────
|
||||
|
||||
async def _find_ioc_overlaps(
|
||||
self,
|
||||
hunt_ids: list[str],
|
||||
db: AsyncSession,
|
||||
) -> list[IOCOverlap]:
|
||||
"""Find IOC values that appear in datasets from different hunts."""
|
||||
# Get all datasets for the specified hunts
|
||||
stmt = select(Dataset).where(Dataset.hunt_id.in_(hunt_ids))
|
||||
result = await db.execute(stmt)
|
||||
datasets = result.scalars().all()
|
||||
|
||||
if len(datasets) < 2:
|
||||
return []
|
||||
|
||||
# Build IOC → dataset mapping
|
||||
ioc_map: dict[str, list[dict]] = defaultdict(list)
|
||||
|
||||
for dataset in datasets:
|
||||
if not dataset.ioc_columns:
|
||||
continue
|
||||
|
||||
ioc_cols = list(dataset.ioc_columns.keys())
|
||||
rows_stmt = select(DatasetRow).where(
|
||||
DatasetRow.dataset_id == dataset.id
|
||||
).limit(2000)
|
||||
rows_result = await db.execute(rows_stmt)
|
||||
rows = rows_result.scalars().all()
|
||||
|
||||
for row in rows:
|
||||
data = row.data or {}
|
||||
for col in ioc_cols:
|
||||
val = data.get(col, "")
|
||||
if val and str(val).strip():
|
||||
ioc_map[str(val).strip()].append({
|
||||
"dataset_id": dataset.id,
|
||||
"dataset_name": dataset.name,
|
||||
"hunt_id": dataset.hunt_id,
|
||||
"column": col,
|
||||
"ioc_type": dataset.ioc_columns.get(col, "unknown"),
|
||||
})
|
||||
|
||||
# Filter to IOCs appearing in multiple hunts
|
||||
overlaps = []
|
||||
for ioc_value, appearances in ioc_map.items():
|
||||
hunt_set = set(a["hunt_id"] for a in appearances if a["hunt_id"])
|
||||
if len(hunt_set) >= 2:
|
||||
# Check for enrichment data
|
||||
enrich_stmt = select(EnrichmentResult).where(
|
||||
EnrichmentResult.ioc_value == ioc_value
|
||||
).limit(1)
|
||||
enrich_result = await db.execute(enrich_stmt)
|
||||
enrichment = enrich_result.scalar_one_or_none()
|
||||
|
||||
overlaps.append(IOCOverlap(
|
||||
ioc_value=ioc_value,
|
||||
ioc_type=appearances[0].get("ioc_type", "unknown"),
|
||||
datasets=appearances,
|
||||
hunt_ids=sorted(hunt_set),
|
||||
count=len(appearances),
|
||||
enrichment_verdict=enrichment.verdict if enrichment else "",
|
||||
))
|
||||
|
||||
# Sort by count descending
|
||||
overlaps.sort(key=lambda x: x.count, reverse=True)
|
||||
return overlaps[:100] # Limit results
|
||||
|
||||
# ── Time window overlap ───────────────────────────────────────────
|
||||
|
||||
async def _find_time_overlaps(
|
||||
self,
|
||||
hunt_ids: list[str],
|
||||
db: AsyncSession,
|
||||
) -> list[TimeOverlap]:
|
||||
"""Find datasets across hunts with overlapping time ranges."""
|
||||
stmt = select(Dataset).where(
|
||||
Dataset.hunt_id.in_(hunt_ids),
|
||||
Dataset.time_range_start.isnot(None),
|
||||
Dataset.time_range_end.isnot(None),
|
||||
)
|
||||
result = await db.execute(stmt)
|
||||
datasets = result.scalars().all()
|
||||
|
||||
overlaps = []
|
||||
for i, ds_a in enumerate(datasets):
|
||||
for ds_b in datasets[i + 1:]:
|
||||
if ds_a.hunt_id == ds_b.hunt_id:
|
||||
continue # Same hunt, skip
|
||||
|
||||
try:
|
||||
a_start = datetime.fromisoformat(ds_a.time_range_start)
|
||||
a_end = datetime.fromisoformat(ds_a.time_range_end)
|
||||
b_start = datetime.fromisoformat(ds_b.time_range_start)
|
||||
b_end = datetime.fromisoformat(ds_b.time_range_end)
|
||||
except (ValueError, TypeError):
|
||||
continue
|
||||
|
||||
# Check overlap
|
||||
overlap_start = max(a_start, b_start)
|
||||
overlap_end = min(a_end, b_end)
|
||||
|
||||
if overlap_start < overlap_end:
|
||||
hours = (overlap_end - overlap_start).total_seconds() / 3600
|
||||
overlaps.append(TimeOverlap(
|
||||
dataset_a={
|
||||
"id": ds_a.id,
|
||||
"name": ds_a.name,
|
||||
"hunt_id": ds_a.hunt_id,
|
||||
"start": ds_a.time_range_start,
|
||||
"end": ds_a.time_range_end,
|
||||
},
|
||||
dataset_b={
|
||||
"id": ds_b.id,
|
||||
"name": ds_b.name,
|
||||
"hunt_id": ds_b.hunt_id,
|
||||
"start": ds_b.time_range_start,
|
||||
"end": ds_b.time_range_end,
|
||||
},
|
||||
overlap_start=overlap_start.isoformat(),
|
||||
overlap_end=overlap_end.isoformat(),
|
||||
overlap_hours=round(hours, 2),
|
||||
))
|
||||
|
||||
overlaps.sort(key=lambda x: x.overlap_hours, reverse=True)
|
||||
return overlaps[:50]
|
||||
|
||||
# ── MITRE technique overlap ───────────────────────────────────────
|
||||
|
||||
async def _find_technique_overlaps(
|
||||
self,
|
||||
hunt_ids: list[str],
|
||||
db: AsyncSession,
|
||||
) -> list[TechniqueOverlap]:
|
||||
"""Find MITRE ATT&CK techniques shared across hunts."""
|
||||
stmt = select(Hypothesis).where(
|
||||
Hypothesis.hunt_id.in_(hunt_ids),
|
||||
Hypothesis.mitre_technique.isnot(None),
|
||||
)
|
||||
result = await db.execute(stmt)
|
||||
hypotheses = result.scalars().all()
|
||||
|
||||
technique_map: dict[str, list[dict]] = defaultdict(list)
|
||||
for hyp in hypotheses:
|
||||
technique = hyp.mitre_technique.strip()
|
||||
if technique:
|
||||
technique_map[technique].append({
|
||||
"hypothesis_id": hyp.id,
|
||||
"hypothesis_title": hyp.title,
|
||||
"hunt_id": hyp.hunt_id,
|
||||
"status": hyp.status,
|
||||
})
|
||||
|
||||
overlaps = []
|
||||
for technique, hyps in technique_map.items():
|
||||
hunt_set = set(h["hunt_id"] for h in hyps if h["hunt_id"])
|
||||
if len(hunt_set) >= 2:
|
||||
overlaps.append(TechniqueOverlap(
|
||||
technique_id=technique,
|
||||
hypotheses=hyps,
|
||||
hunt_ids=sorted(hunt_set),
|
||||
))
|
||||
|
||||
return overlaps
|
||||
|
||||
# ── Host overlap ──────────────────────────────────────────────────
|
||||
|
||||
async def _find_host_overlaps(
|
||||
self,
|
||||
hunt_ids: list[str],
|
||||
db: AsyncSession,
|
||||
) -> list[dict]:
|
||||
"""Find hostnames that appear in datasets from different hunts.
|
||||
|
||||
Useful for detecting lateral movement patterns.
|
||||
"""
|
||||
stmt = select(Dataset).where(Dataset.hunt_id.in_(hunt_ids))
|
||||
result = await db.execute(stmt)
|
||||
datasets = result.scalars().all()
|
||||
|
||||
host_map: dict[str, list[dict]] = defaultdict(list)
|
||||
|
||||
for dataset in datasets:
|
||||
norm_cols = dataset.normalized_columns or {}
|
||||
# Look for hostname columns
|
||||
hostname_cols = [
|
||||
orig for orig, canon in norm_cols.items()
|
||||
if canon in ("hostname", "host", "computer_name", "src_host", "dst_host")
|
||||
]
|
||||
if not hostname_cols:
|
||||
continue
|
||||
|
||||
rows_stmt = select(DatasetRow).where(
|
||||
DatasetRow.dataset_id == dataset.id
|
||||
).limit(2000)
|
||||
rows_result = await db.execute(rows_stmt)
|
||||
rows = rows_result.scalars().all()
|
||||
|
||||
for row in rows:
|
||||
data = row.data or {}
|
||||
for col in hostname_cols:
|
||||
val = data.get(col, "")
|
||||
if val and str(val).strip():
|
||||
host_name = str(val).strip().upper()
|
||||
host_map[host_name].append({
|
||||
"dataset_id": dataset.id,
|
||||
"dataset_name": dataset.name,
|
||||
"hunt_id": dataset.hunt_id,
|
||||
})
|
||||
|
||||
# Filter to hosts appearing in multiple hunts
|
||||
overlaps = []
|
||||
for host, appearances in host_map.items():
|
||||
hunt_set = set(a["hunt_id"] for a in appearances if a["hunt_id"])
|
||||
if len(hunt_set) >= 2:
|
||||
overlaps.append({
|
||||
"hostname": host,
|
||||
"hunt_ids": sorted(hunt_set),
|
||||
"dataset_count": len(appearances),
|
||||
"datasets": appearances[:10],
|
||||
})
|
||||
|
||||
overlaps.sort(key=lambda x: x["dataset_count"], reverse=True)
|
||||
return overlaps[:50]
|
||||
|
||||
# ── Summary builder ───────────────────────────────────────────────
|
||||
|
||||
def _build_summary(self, result: CorrelationResult) -> str:
|
||||
"""Build a human-readable summary of correlations."""
|
||||
parts = [f"Correlation analysis across {len(result.hunt_ids)} hunts:"]
|
||||
|
||||
if result.ioc_overlaps:
|
||||
malicious = [o for o in result.ioc_overlaps if o.enrichment_verdict == "malicious"]
|
||||
parts.append(
|
||||
f" - {len(result.ioc_overlaps)} shared IOCs "
|
||||
f"({len(malicious)} flagged malicious)"
|
||||
)
|
||||
else:
|
||||
parts.append(" - No shared IOCs found")
|
||||
|
||||
if result.time_overlaps:
|
||||
parts.append(f" - {len(result.time_overlaps)} overlapping time windows")
|
||||
|
||||
if result.technique_overlaps:
|
||||
parts.append(
|
||||
f" - {len(result.technique_overlaps)} shared MITRE techniques"
|
||||
)
|
||||
|
||||
if result.host_overlaps:
|
||||
parts.append(
|
||||
f" - {len(result.host_overlaps)} hosts appearing in multiple hunts "
|
||||
"(potential lateral movement)"
|
||||
)
|
||||
|
||||
if result.total_correlations == 0:
|
||||
parts.append(" No significant correlations detected.")
|
||||
|
||||
return "\n".join(parts)
|
||||
|
||||
|
||||
# Singleton
|
||||
correlation_engine = CorrelationEngine()
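A minimal usage sketch for the engine above; the async_session_factory import mirrors the one used in app/services/data_query.py and is an assumption about how sessions are obtained.

from app.db import async_session_factory  # assumed path, as in data_query.py


async def correlate_two_hunts(hunt_a: str, hunt_b: str) -> None:
    # Run the full correlation pass for two hunts and print the findings.
    async with async_session_factory() as db:
        result = await correlation_engine.correlate_hunts([hunt_a, hunt_b], db)
        print(result.summary)
        for overlap in result.ioc_overlaps[:5]:
            print(overlap.ioc_value, overlap.hunt_ids, overlap.enrichment_verdict)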
|
||||
165
backend/app/services/csv_parser.py
Normal file
@@ -0,0 +1,165 @@
|
||||
"""CSV parsing engine with encoding detection, delimiter sniffing, and streaming.
|
||||
|
||||
Handles large Velociraptor CSV exports with resilience to encoding issues,
|
||||
varied delimiters, and malformed rows.
|
||||
"""
|
||||
|
||||
import csv
|
||||
import io
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from typing import AsyncIterator
|
||||
|
||||
import chardet
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Reasonable defaults
|
||||
MAX_FIELD_SIZE = 1024 * 1024 # 1 MB per field
|
||||
csv.field_size_limit(MAX_FIELD_SIZE)
|
||||
|
||||
|
||||
def detect_encoding(file_bytes: bytes, sample_size: int = 65536) -> str:
|
||||
"""Detect file encoding from a sample of bytes."""
|
||||
result = chardet.detect(file_bytes[:sample_size])
|
||||
encoding = result.get("encoding", "utf-8") or "utf-8"
|
||||
confidence = result.get("confidence", 0)
|
||||
logger.info(f"Detected encoding: {encoding} (confidence: {confidence:.2f})")
|
||||
# Fall back to utf-8 if confidence is very low
|
||||
if confidence < 0.5:
|
||||
encoding = "utf-8"
|
||||
return encoding
|
||||
|
||||
|
||||
def detect_delimiter(text_sample: str) -> str:
|
||||
"""Sniff the CSV delimiter from a text sample."""
|
||||
try:
|
||||
dialect = csv.Sniffer().sniff(text_sample, delimiters=",\t;|")
|
||||
return dialect.delimiter
|
||||
except csv.Error:
|
||||
return ","
|
||||
|
||||
|
||||
def infer_column_types(rows: list[dict], sample_size: int = 100) -> dict[str, str]:
|
||||
"""Infer column types from a sample of rows.
|
||||
|
||||
Returns a mapping of column_name → type_hint where type_hint is one of:
|
||||
timestamp, integer, float, ip, hash_md5, hash_sha1, hash_sha256, domain, path, string
|
||||
"""
|
||||
import re
|
||||
|
||||
type_map: dict[str, dict[str, int]] = {}
|
||||
sample = rows[:sample_size]
|
||||
|
||||
patterns = {
|
||||
"ip": re.compile(
|
||||
r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"
|
||||
),
|
||||
"hash_md5": re.compile(r"^[a-fA-F0-9]{32}$"),
|
||||
"hash_sha1": re.compile(r"^[a-fA-F0-9]{40}$"),
|
||||
"hash_sha256": re.compile(r"^[a-fA-F0-9]{64}$"),
|
||||
"integer": re.compile(r"^-?\d+$"),
|
||||
"float": re.compile(r"^-?\d+\.\d+$"),
|
||||
"timestamp": re.compile(
|
||||
r"^\d{4}[-/]\d{2}[-/]\d{2}[T ]\d{2}:\d{2}"
|
||||
),
|
||||
"domain": re.compile(
|
||||
r"^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z]{2,})+$"
|
||||
),
|
||||
"path": re.compile(r"^([A-Z]:\\|/)", re.IGNORECASE),
|
||||
}
|
||||
|
||||
for row in sample:
|
||||
for col, val in row.items():
|
||||
if col not in type_map:
|
||||
type_map[col] = {}
|
||||
val_str = str(val).strip()
|
||||
if not val_str:
|
||||
continue
|
||||
matched = False
|
||||
for type_name, pattern in patterns.items():
|
||||
if pattern.match(val_str):
|
||||
type_map[col][type_name] = type_map[col].get(type_name, 0) + 1
|
||||
matched = True
|
||||
break
|
||||
if not matched:
|
||||
type_map[col]["string"] = type_map[col].get("string", 0) + 1
|
||||
|
||||
result: dict[str, str] = {}
|
||||
for col, counts in type_map.items():
|
||||
if counts:
|
||||
result[col] = max(counts, key=counts.get) # type: ignore[arg-type]
|
||||
else:
|
||||
result[col] = "string"
|
||||
|
||||
return result
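# Note on infer_column_types (illustrative, not part of the original module):
# for rows such as
#   {"EventTime": "2024-05-01 12:00:00", "SrcIP": "10.0.0.5", "Hash": "<64 hex chars>"}
# the pattern voting above would typically yield
#   {"EventTime": "timestamp", "SrcIP": "ip", "Hash": "hash_sha256"},
# while values matching no pattern fall back to "string".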
|
||||
|
||||
|
||||
def parse_csv_bytes(
|
||||
raw_bytes: bytes,
|
||||
max_rows: int | None = None,
|
||||
) -> tuple[list[dict], dict]:
|
||||
"""Parse a CSV file from raw bytes.
|
||||
|
||||
Returns:
|
||||
(rows, metadata) where metadata contains encoding, delimiter, columns, etc.
|
||||
"""
|
||||
encoding = detect_encoding(raw_bytes)
|
||||
|
||||
try:
|
||||
text = raw_bytes.decode(encoding, errors="replace")
|
||||
except (UnicodeDecodeError, LookupError):
|
||||
text = raw_bytes.decode("utf-8", errors="replace")
|
||||
encoding = "utf-8"
|
||||
|
||||
# Detect delimiter from first few KB
|
||||
delimiter = detect_delimiter(text[:8192])
|
||||
|
||||
reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
|
||||
columns = reader.fieldnames or []
|
||||
|
||||
rows: list[dict] = []
|
||||
for i, row in enumerate(reader):
|
||||
if max_rows is not None and i >= max_rows:
|
||||
break
|
||||
rows.append(dict(row))
|
||||
|
||||
column_types = infer_column_types(rows) if rows else {}
|
||||
|
||||
metadata = {
|
||||
"encoding": encoding,
|
||||
"delimiter": delimiter,
|
||||
"columns": columns,
|
||||
"column_types": column_types,
|
||||
"row_count": len(rows),
|
||||
"total_rows_in_file": len(rows), # same when no max_rows
|
||||
}
|
||||
|
||||
return rows, metadata
|
||||
|
||||
|
||||
async def parse_csv_streaming(
|
||||
file_path: Path,
|
||||
chunk_size: int = 8192,
|
||||
) -> AsyncIterator[tuple[int, dict]]:
|
||||
"""Stream-parse a CSV file yielding (row_index, row_dict) tuples.
|
||||
|
||||
    Reads the whole file asynchronously, then yields rows one at a time from the decoded text.
|
||||
"""
|
||||
import aiofiles # type: ignore[import-untyped]
|
||||
|
||||
# Read a sample for encoding/delimiter detection
|
||||
with open(file_path, "rb") as f:
|
||||
sample_bytes = f.read(65536)
|
||||
|
||||
encoding = detect_encoding(sample_bytes)
|
||||
text_sample = sample_bytes.decode(encoding, errors="replace")
|
||||
delimiter = detect_delimiter(text_sample[:8192])
|
||||
|
||||
# Now stream-read
|
||||
async with aiofiles.open(file_path, mode="r", encoding=encoding, errors="replace") as f:
|
||||
content = await f.read() # For DictReader compatibility
|
||||
|
||||
reader = csv.DictReader(io.StringIO(content), delimiter=delimiter)
|
||||
for i, row in enumerate(reader):
|
||||
yield i, dict(row)
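A small usage sketch for parse_csv_bytes with made-up sample bytes; the values in the comments are what the helpers above would typically return.

sample = (
    b"Hostname,SrcIP\n"
    b"WS01,10.0.0.5\n"
    b"WS02,10.0.0.7\n"
)
rows, meta = parse_csv_bytes(sample, max_rows=100)
# Typically: meta["delimiter"] == "," and meta["column_types"]["SrcIP"] == "ip"
print(meta["row_count"], rows[0]["Hostname"])  # -> 2 WS01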
|
||||
238
backend/app/services/data_query.py
Normal file
@@ -0,0 +1,238 @@
|
||||
"""Natural-language data query service with SSE streaming.
|
||||
|
||||
Lets analysts ask questions about dataset rows in plain English.
|
||||
Routes to fast model (Roadrunner) for quick queries, heavy model (Wile)
|
||||
for deep analysis. Supports streaming via OllamaProvider.generate_stream().
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
import time
|
||||
from typing import AsyncIterator
|
||||
|
||||
from sqlalchemy import select, func
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.config import settings
|
||||
from app.db import async_session_factory
|
||||
from app.db.models import Dataset, DatasetRow
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Maximum rows to include in context window
|
||||
MAX_CONTEXT_ROWS = 60
|
||||
MAX_ROW_TEXT_CHARS = 300
|
||||
|
||||
|
||||
def _rows_to_text(rows: list[dict], columns: list[str]) -> str:
|
||||
"""Convert dataset rows to a compact text table for the LLM context."""
|
||||
if not rows:
|
||||
return "(no rows)"
|
||||
# Header
|
||||
header = " | ".join(columns[:20]) # cap columns to avoid overflow
|
||||
lines = [header, "-" * min(len(header), 120)]
|
||||
for row in rows[:MAX_CONTEXT_ROWS]:
|
||||
vals = []
|
||||
for c in columns[:20]:
|
||||
v = str(row.get(c, ""))
|
||||
if len(v) > 80:
|
||||
v = v[:77] + "..."
|
||||
vals.append(v)
|
||||
line = " | ".join(vals)
|
||||
if len(line) > MAX_ROW_TEXT_CHARS:
|
||||
line = line[:MAX_ROW_TEXT_CHARS] + "..."
|
||||
lines.append(line)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
QUERY_SYSTEM_PROMPT = """You are a cybersecurity data analyst assistant for ThreatHunt.
|
||||
You have been given a sample of rows from a forensic artifact dataset (Velociraptor, etc.).
|
||||
|
||||
Your job:
|
||||
- Answer the analyst's question about this data accurately and concisely
|
||||
- Point out suspicious patterns, anomalies, or indicators of compromise
|
||||
- Reference MITRE ATT&CK techniques when relevant
|
||||
- Suggest follow-up queries or pivots
|
||||
- If you cannot answer from the data provided, say so clearly
|
||||
|
||||
Rules:
|
||||
- Be factual - only reference data you can see
|
||||
- Use forensic terminology appropriate for SOC/DFIR analysts
|
||||
- Format your answer with clear sections using markdown
|
||||
- If the data seems benign, say so - do not fabricate threats"""
|
||||
|
||||
|
||||
async def _load_dataset_context(
|
||||
dataset_id: str,
|
||||
db: AsyncSession,
|
||||
sample_size: int = MAX_CONTEXT_ROWS,
|
||||
) -> tuple[dict, str, int]:
|
||||
"""Load dataset metadata + sample rows for context.
|
||||
|
||||
Returns (metadata_dict, rows_text, total_row_count).
|
||||
"""
|
||||
ds = await db.get(Dataset, dataset_id)
|
||||
if not ds:
|
||||
raise ValueError(f"Dataset {dataset_id} not found")
|
||||
|
||||
# Get total count
|
||||
count_q = await db.execute(
|
||||
select(func.count()).where(DatasetRow.dataset_id == dataset_id)
|
||||
)
|
||||
total = count_q.scalar() or 0
|
||||
|
||||
# Sample rows - get first batch + some from the middle
|
||||
half = sample_size // 2
|
||||
result = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == dataset_id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.limit(half)
|
||||
)
|
||||
first_rows = result.scalars().all()
|
||||
|
||||
# If dataset is large, also sample from the middle
|
||||
middle_rows = []
|
||||
if total > sample_size:
|
||||
mid_offset = total // 2
|
||||
result2 = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == dataset_id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.offset(mid_offset)
|
||||
.limit(sample_size - half)
|
||||
)
|
||||
middle_rows = result2.scalars().all()
|
||||
else:
|
||||
result2 = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == dataset_id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.offset(half)
|
||||
.limit(sample_size - half)
|
||||
)
|
||||
middle_rows = result2.scalars().all()
|
||||
|
||||
all_rows = first_rows + middle_rows
|
||||
row_dicts = [r.data if isinstance(r.data, dict) else {} for r in all_rows]
|
||||
|
||||
columns = list(ds.column_schema.keys()) if ds.column_schema else []
|
||||
if not columns and row_dicts:
|
||||
columns = list(row_dicts[0].keys())
|
||||
|
||||
rows_text = _rows_to_text(row_dicts, columns)
|
||||
|
||||
metadata = {
|
||||
"name": ds.name,
|
||||
"filename": ds.filename,
|
||||
"source_tool": ds.source_tool,
|
||||
"artifact_type": getattr(ds, "artifact_type", None),
|
||||
"row_count": total,
|
||||
"columns": columns[:30],
|
||||
"sample_rows_shown": len(all_rows),
|
||||
}
|
||||
return metadata, rows_text, total
|
||||
|
||||
|
||||
async def query_dataset(
|
||||
dataset_id: str,
|
||||
question: str,
|
||||
mode: str = "quick",
|
||||
) -> str:
|
||||
"""Non-streaming query: returns full answer text."""
|
||||
from app.agents.providers_v2 import OllamaProvider, Node
|
||||
|
||||
async with async_session_factory() as db:
|
||||
meta, rows_text, total = await _load_dataset_context(dataset_id, db)
|
||||
|
||||
prompt = _build_prompt(question, meta, rows_text, total)
|
||||
|
||||
if mode == "deep":
|
||||
provider = OllamaProvider(settings.DEFAULT_HEAVY_MODEL, Node.WILE)
|
||||
max_tokens = 4096
|
||||
else:
|
||||
provider = OllamaProvider(settings.DEFAULT_FAST_MODEL, Node.ROADRUNNER)
|
||||
max_tokens = 2048
|
||||
|
||||
result = await provider.generate(
|
||||
prompt,
|
||||
system=QUERY_SYSTEM_PROMPT,
|
||||
max_tokens=max_tokens,
|
||||
temperature=0.3,
|
||||
)
|
||||
return result.get("response", "No response generated.")
|
||||
|
||||
|
||||
async def query_dataset_stream(
|
||||
dataset_id: str,
|
||||
question: str,
|
||||
mode: str = "quick",
|
||||
) -> AsyncIterator[str]:
|
||||
"""Streaming query: yields SSE-formatted events."""
|
||||
from app.agents.providers_v2 import OllamaProvider, Node
|
||||
|
||||
start = time.monotonic()
|
||||
|
||||
# Send initial metadata event
|
||||
yield f"data: {json.dumps({'type': 'status', 'message': 'Loading dataset...'})}\n\n"
|
||||
|
||||
async with async_session_factory() as db:
|
||||
meta, rows_text, total = await _load_dataset_context(dataset_id, db)
|
||||
|
||||
yield f"data: {json.dumps({'type': 'metadata', 'dataset': meta})}\n\n"
|
||||
yield f"data: {json.dumps({'type': 'status', 'message': f'Querying LLM ({mode} mode)...'})}\n\n"
|
||||
|
||||
prompt = _build_prompt(question, meta, rows_text, total)
|
||||
|
||||
if mode == "deep":
|
||||
provider = OllamaProvider(settings.DEFAULT_HEAVY_MODEL, Node.WILE)
|
||||
max_tokens = 4096
|
||||
model_name = settings.DEFAULT_HEAVY_MODEL
|
||||
node_name = "wile"
|
||||
else:
|
||||
provider = OllamaProvider(settings.DEFAULT_FAST_MODEL, Node.ROADRUNNER)
|
||||
max_tokens = 2048
|
||||
model_name = settings.DEFAULT_FAST_MODEL
|
||||
node_name = "roadrunner"
|
||||
|
||||
# Stream tokens
|
||||
token_count = 0
|
||||
try:
|
||||
async for token in provider.generate_stream(
|
||||
prompt,
|
||||
system=QUERY_SYSTEM_PROMPT,
|
||||
max_tokens=max_tokens,
|
||||
temperature=0.3,
|
||||
):
|
||||
token_count += 1
|
||||
yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
|
||||
except Exception as e:
|
||||
logger.error(f"Streaming error: {e}")
|
||||
yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"
|
||||
|
||||
elapsed_ms = int((time.monotonic() - start) * 1000)
|
||||
yield f"data: {json.dumps({'type': 'done', 'tokens': token_count, 'elapsed_ms': elapsed_ms, 'model': model_name, 'node': node_name})}\n\n"
|
||||
|
||||
|
||||
def _build_prompt(question: str, meta: dict, rows_text: str, total: int) -> str:
|
||||
"""Construct the full prompt with data context."""
|
||||
parts = [
|
||||
f"## Dataset: {meta['name']}",
|
||||
f"- Source: {meta.get('source_tool', 'unknown')}",
|
||||
f"- Artifact type: {meta.get('artifact_type', 'unknown')}",
|
||||
f"- Total rows: {total}",
|
||||
f"- Columns: {', '.join(meta.get('columns', []))}",
|
||||
f"- Showing {meta['sample_rows_shown']} sample rows below",
|
||||
"",
|
||||
"## Sample Data",
|
||||
"```",
|
||||
rows_text,
|
||||
"```",
|
||||
"",
|
||||
f"## Analyst Question",
|
||||
question,
|
||||
]
|
||||
return "\n".join(parts)
|
||||
655
backend/app/services/enrichment.py
Normal file
@@ -0,0 +1,655 @@
|
||||
"""IOC Enrichment Engine — VirusTotal, AbuseIPDB, Shodan integrations.
|
||||
|
||||
Provides automated IOC enrichment with caching and rate limiting.
|
||||
Enriches IPs, hashes, domains with threat intelligence verdicts.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import hashlib
|
||||
import logging
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from enum import Enum
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.config import settings
|
||||
from app.db.models import EnrichmentResult as EnrichmentDB
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class IOCType(str, Enum):
|
||||
IP = "ip"
|
||||
DOMAIN = "domain"
|
||||
HASH_MD5 = "hash_md5"
|
||||
HASH_SHA1 = "hash_sha1"
|
||||
HASH_SHA256 = "hash_sha256"
|
||||
URL = "url"
|
||||
|
||||
|
||||
class Verdict(str, Enum):
|
||||
CLEAN = "clean"
|
||||
SUSPICIOUS = "suspicious"
|
||||
MALICIOUS = "malicious"
|
||||
UNKNOWN = "unknown"
|
||||
ERROR = "error"
|
||||
|
||||
|
||||
@dataclass
|
||||
class EnrichmentResultData:
|
||||
"""Enrichment result from a provider."""
|
||||
ioc_value: str
|
||||
ioc_type: IOCType
|
||||
source: str
|
||||
verdict: Verdict
|
||||
score: float = 0.0 # 0-100 normalized threat score
|
||||
raw_data: dict = field(default_factory=dict)
|
||||
tags: list[str] = field(default_factory=list)
|
||||
country: str = ""
|
||||
asn: str = ""
|
||||
org: str = ""
|
||||
last_seen: str = ""
|
||||
error: str = ""
|
||||
latency_ms: int = 0
|
||||
|
||||
|
||||
# ── Rate limiter ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class RateLimiter:
|
||||
"""Simple token bucket rate limiter for API calls."""
|
||||
|
||||
def __init__(self, calls_per_minute: int = 4):
|
||||
self.calls_per_minute = calls_per_minute
|
||||
self.interval = 60.0 / calls_per_minute
|
||||
self._last_call: float = 0.0
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
async def acquire(self):
|
||||
async with self._lock:
|
||||
now = time.monotonic()
|
||||
elapsed = now - self._last_call
|
||||
if elapsed < self.interval:
|
||||
await asyncio.sleep(self.interval - elapsed)
|
||||
self._last_call = time.monotonic()
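        # Illustrative timing (not in the original code): with calls_per_minute=4
        # the interval is 15 s, so concurrent acquire() calls are serialized by
        # the lock and spaced roughly 15 seconds apart.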
|
||||
|
||||
|
||||
# ── Provider base ─────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class EnrichmentProvider:
|
||||
"""Base class for enrichment providers."""
|
||||
|
||||
name: str = "base"
|
||||
|
||||
def __init__(self, api_key: str = "", rate_limit: int = 4):
|
||||
self.api_key = api_key
|
||||
self.rate_limiter = RateLimiter(rate_limit)
|
||||
self._client: httpx.AsyncClient | None = None
|
||||
|
||||
def _get_client(self) -> httpx.AsyncClient:
|
||||
if self._client is None or self._client.is_closed:
|
||||
self._client = httpx.AsyncClient(
|
||||
timeout=httpx.Timeout(connect=10, read=30, write=10, pool=5),
|
||||
)
|
||||
return self._client
|
||||
|
||||
async def cleanup(self):
|
||||
if self._client and not self._client.is_closed:
|
||||
await self._client.aclose()
|
||||
|
||||
@property
|
||||
def is_configured(self) -> bool:
|
||||
return bool(self.api_key)
|
||||
|
||||
async def enrich(self, ioc_value: str, ioc_type: IOCType) -> EnrichmentResultData:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
# ── VirusTotal ────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class VirusTotalProvider(EnrichmentProvider):
|
||||
"""VirusTotal v3 API provider."""
|
||||
|
||||
name = "virustotal"
|
||||
BASE_URL = "https://www.virustotal.com/api/v3"
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(api_key=settings.VIRUSTOTAL_API_KEY, rate_limit=4)
|
||||
|
||||
def _headers(self) -> dict:
|
||||
return {"x-apikey": self.api_key}
|
||||
|
||||
async def enrich(self, ioc_value: str, ioc_type: IOCType) -> EnrichmentResultData:
|
||||
if not self.is_configured:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error="VirusTotal API key not configured",
|
||||
)
|
||||
|
||||
await self.rate_limiter.acquire()
|
||||
start = time.monotonic()
|
||||
|
||||
try:
|
||||
endpoint = self._get_endpoint(ioc_value, ioc_type)
|
||||
if not endpoint:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error=f"Unsupported IOC type: {ioc_type}",
|
||||
)
|
||||
|
||||
client = self._get_client()
|
||||
resp = await client.get(endpoint, headers=self._headers())
|
||||
latency_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
if resp.status_code == 404:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.UNKNOWN,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
attrs = data.get("data", {}).get("attributes", {})
|
||||
stats = attrs.get("last_analysis_stats", {})
|
||||
|
||||
malicious = stats.get("malicious", 0)
|
||||
suspicious = stats.get("suspicious", 0)
|
||||
total = sum(stats.values()) if stats else 0
|
||||
|
||||
# Determine verdict
|
||||
if malicious > 3:
|
||||
verdict = Verdict.MALICIOUS
|
||||
elif malicious > 0 or suspicious > 2:
|
||||
verdict = Verdict.SUSPICIOUS
|
||||
elif total > 0:
|
||||
verdict = Verdict.CLEAN
|
||||
else:
|
||||
verdict = Verdict.UNKNOWN
|
||||
|
||||
score = (malicious / total * 100) if total > 0 else 0
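            # Worked example (illustrative, not in the original code): 5 of 70
            # engines flagging the sample as malicious gives verdict=MALICIOUS
            # and a score of about 7.1.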
|
||||
|
||||
tags = attrs.get("tags", [])
|
||||
if attrs.get("type_description"):
|
||||
tags.append(attrs["type_description"])
|
||||
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value,
|
||||
ioc_type=ioc_type,
|
||||
source=self.name,
|
||||
verdict=verdict,
|
||||
score=round(score, 1),
|
||||
raw_data={
|
||||
"stats": stats,
|
||||
"reputation": attrs.get("reputation", 0),
|
||||
"type_description": attrs.get("type_description", ""),
|
||||
"names": attrs.get("names", [])[:5],
|
||||
},
|
||||
tags=tags[:10],
|
||||
country=attrs.get("country", ""),
|
||||
asn=str(attrs.get("asn", "")),
|
||||
org=attrs.get("as_owner", ""),
|
||||
last_seen=attrs.get("last_analysis_date", ""),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
except httpx.HTTPStatusError as e:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error=f"HTTP {e.response.status_code}",
|
||||
latency_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
except Exception as e:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error=str(e),
|
||||
latency_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
def _get_endpoint(self, ioc_value: str, ioc_type: IOCType) -> str | None:
|
||||
if ioc_type == IOCType.IP:
|
||||
return f"{self.BASE_URL}/ip_addresses/{ioc_value}"
|
||||
elif ioc_type == IOCType.DOMAIN:
|
||||
return f"{self.BASE_URL}/domains/{ioc_value}"
|
||||
elif ioc_type in (IOCType.HASH_MD5, IOCType.HASH_SHA1, IOCType.HASH_SHA256):
|
||||
return f"{self.BASE_URL}/files/{ioc_value}"
|
||||
elif ioc_type == IOCType.URL:
|
||||
url_id = hashlib.sha256(ioc_value.encode()).hexdigest()
|
||||
return f"{self.BASE_URL}/urls/{url_id}"
|
||||
return None
|
||||
|
||||
|
||||
# ── AbuseIPDB ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class AbuseIPDBProvider(EnrichmentProvider):
|
||||
"""AbuseIPDB API provider — IP reputation."""
|
||||
|
||||
name = "abuseipdb"
|
||||
BASE_URL = "https://api.abuseipdb.com/api/v2"
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(api_key=settings.ABUSEIPDB_API_KEY, rate_limit=10)
|
||||
|
||||
async def enrich(self, ioc_value: str, ioc_type: IOCType) -> EnrichmentResultData:
|
||||
if ioc_type != IOCType.IP:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error="AbuseIPDB only supports IP lookups",
|
||||
)
|
||||
|
||||
if not self.is_configured:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error="AbuseIPDB API key not configured",
|
||||
)
|
||||
|
||||
await self.rate_limiter.acquire()
|
||||
start = time.monotonic()
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
resp = await client.get(
|
||||
f"{self.BASE_URL}/check",
|
||||
params={"ipAddress": ioc_value, "maxAgeInDays": 90, "verbose": "true"},
|
||||
headers={"Key": self.api_key, "Accept": "application/json"},
|
||||
)
|
||||
latency_ms = int((time.monotonic() - start) * 1000)
|
||||
resp.raise_for_status()
|
||||
data = resp.json().get("data", {})
|
||||
|
||||
abuse_score = data.get("abuseConfidenceScore", 0)
|
||||
total_reports = data.get("totalReports", 0)
|
||||
|
||||
if abuse_score >= 75:
|
||||
verdict = Verdict.MALICIOUS
|
||||
elif abuse_score >= 25 or total_reports > 5:
|
||||
verdict = Verdict.SUSPICIOUS
|
||||
elif total_reports == 0:
|
||||
verdict = Verdict.UNKNOWN
|
||||
else:
|
||||
verdict = Verdict.CLEAN
|
||||
|
||||
categories = data.get("reports", [])
|
||||
tags = []
|
||||
for report in categories[:10]:
|
||||
for cat_id in report.get("categories", []):
|
||||
tag = self._category_name(cat_id)
|
||||
if tag and tag not in tags:
|
||||
tags.append(tag)
|
||||
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value,
|
||||
ioc_type=ioc_type,
|
||||
source=self.name,
|
||||
verdict=verdict,
|
||||
score=float(abuse_score),
|
||||
raw_data={
|
||||
"abuse_confidence_score": abuse_score,
|
||||
"total_reports": total_reports,
|
||||
"is_whitelisted": data.get("isWhitelisted"),
|
||||
"is_tor": data.get("isTor", False),
|
||||
"usage_type": data.get("usageType", ""),
|
||||
"isp": data.get("isp", ""),
|
||||
},
|
||||
tags=tags[:10],
|
||||
country=data.get("countryCode", ""),
|
||||
org=data.get("isp", ""),
|
||||
last_seen=data.get("lastReportedAt", ""),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error=str(e),
|
||||
latency_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _category_name(cat_id: int) -> str:
|
||||
categories = {
|
||||
1: "DNS Compromise", 2: "DNS Poisoning", 3: "Fraud Orders",
|
||||
4: "DDoS Attack", 5: "FTP Brute-Force", 6: "Ping of Death",
|
||||
7: "Phishing", 8: "Fraud VoIP", 9: "Open Proxy",
|
||||
10: "Web Spam", 11: "Email Spam", 12: "Blog Spam",
|
||||
13: "VPN IP", 14: "Port Scan", 15: "Hacking",
|
||||
16: "SQL Injection", 17: "Spoofing", 18: "Brute-Force",
|
||||
19: "Bad Web Bot", 20: "Exploited Host", 21: "Web App Attack",
|
||||
22: "SSH", 23: "IoT Targeted",
|
||||
}
|
||||
return categories.get(cat_id, "")
|
||||
|
||||
|
||||
# ── Shodan ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
class ShodanProvider(EnrichmentProvider):
|
||||
"""Shodan API provider — infrastructure intelligence."""
|
||||
|
||||
name = "shodan"
|
||||
BASE_URL = "https://api.shodan.io"
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(api_key=settings.SHODAN_API_KEY, rate_limit=1)
|
||||
|
||||
async def enrich(self, ioc_value: str, ioc_type: IOCType) -> EnrichmentResultData:
|
||||
if ioc_type != IOCType.IP:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error="Shodan only supports IP lookups",
|
||||
)
|
||||
|
||||
if not self.is_configured:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error="Shodan API key not configured",
|
||||
)
|
||||
|
||||
await self.rate_limiter.acquire()
|
||||
start = time.monotonic()
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
resp = await client.get(
|
||||
f"{self.BASE_URL}/shodan/host/{ioc_value}",
|
||||
params={"key": self.api_key, "minify": "true"},
|
||||
)
|
||||
latency_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
if resp.status_code == 404:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.UNKNOWN,
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
|
||||
ports = data.get("ports", [])
|
||||
vulns = data.get("vulns", [])
|
||||
tags_raw = data.get("tags", [])
|
||||
|
||||
# Determine verdict based on open ports and vulns
|
||||
if vulns:
|
||||
verdict = Verdict.SUSPICIOUS
|
||||
score = min(len(vulns) * 15, 100.0)
|
||||
elif len(ports) > 20:
|
||||
verdict = Verdict.SUSPICIOUS
|
||||
score = 40.0
|
||||
else:
|
||||
verdict = Verdict.CLEAN
|
||||
score = 0.0
|
||||
|
||||
tags = tags_raw[:10]
|
||||
if vulns:
|
||||
tags.extend([f"CVE: {v}" for v in vulns[:5]])
|
||||
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value,
|
||||
ioc_type=ioc_type,
|
||||
source=self.name,
|
||||
verdict=verdict,
|
||||
score=score,
|
||||
raw_data={
|
||||
"ports": ports[:20],
|
||||
"vulns": vulns[:10],
|
||||
"os": data.get("os"),
|
||||
"hostnames": data.get("hostnames", [])[:5],
|
||||
"domains": data.get("domains", [])[:5],
|
||||
"last_update": data.get("last_update", ""),
|
||||
},
|
||||
tags=tags[:15],
|
||||
country=data.get("country_code", ""),
|
||||
asn=data.get("asn", ""),
|
||||
org=data.get("org", ""),
|
||||
last_seen=data.get("last_update", ""),
|
||||
latency_ms=latency_ms,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
return EnrichmentResultData(
|
||||
ioc_value=ioc_value, ioc_type=ioc_type,
|
||||
source=self.name, verdict=Verdict.ERROR,
|
||||
error=str(e),
|
||||
latency_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
|
||||
# ── Enrichment Engine (orchestrator) ──────────────────────────────────
|
||||
|
||||
|
||||
class EnrichmentEngine:
|
||||
"""Orchestrates IOC enrichment across all providers with caching."""
|
||||
|
||||
CACHE_TTL_HOURS = 24
|
||||
|
||||
def __init__(self):
|
||||
self.providers: list[EnrichmentProvider] = [
|
||||
VirusTotalProvider(),
|
||||
AbuseIPDBProvider(),
|
||||
ShodanProvider(),
|
||||
]
|
||||
|
||||
@property
|
||||
def configured_providers(self) -> list[EnrichmentProvider]:
|
||||
return [p for p in self.providers if p.is_configured]
|
||||
|
||||
async def enrich_ioc(
|
||||
self,
|
||||
ioc_value: str,
|
||||
ioc_type: IOCType,
|
||||
db: AsyncSession | None = None,
|
||||
skip_cache: bool = False,
|
||||
) -> list[EnrichmentResultData]:
|
||||
"""Enrich a single IOC across all configured providers.
|
||||
|
||||
Uses cached results from DB when available.
|
||||
"""
|
||||
results: list[EnrichmentResultData] = []
|
||||
|
||||
# Check cache first
|
||||
if db and not skip_cache:
|
||||
cached = await self._get_cached(db, ioc_value, ioc_type)
|
||||
if cached:
|
||||
logger.info(f"Cache hit for {ioc_type.value}:{ioc_value} ({len(cached)} results)")
|
||||
return cached
|
||||
|
||||
# Query all applicable providers in parallel
|
||||
tasks = []
|
||||
for provider in self.configured_providers:
|
||||
# Skip providers that don't support this IOC type
|
||||
if ioc_type in (IOCType.DOMAIN,) and provider.name in ("abuseipdb", "shodan"):
|
||||
continue
|
||||
if ioc_type == IOCType.IP and provider.name == "virustotal":
|
||||
tasks.append(provider.enrich(ioc_value, ioc_type))
|
||||
elif ioc_type == IOCType.IP:
|
||||
tasks.append(provider.enrich(ioc_value, ioc_type))
|
||||
elif ioc_type in (IOCType.HASH_MD5, IOCType.HASH_SHA1, IOCType.HASH_SHA256):
|
||||
if provider.name == "virustotal":
|
||||
tasks.append(provider.enrich(ioc_value, ioc_type))
|
||||
elif ioc_type == IOCType.DOMAIN:
|
||||
if provider.name == "virustotal":
|
||||
tasks.append(provider.enrich(ioc_value, ioc_type))
|
||||
elif ioc_type == IOCType.URL:
|
||||
if provider.name == "virustotal":
|
||||
tasks.append(provider.enrich(ioc_value, ioc_type))
|
||||
|
||||
if tasks:
|
||||
results = list(await asyncio.gather(*tasks, return_exceptions=False))
|
||||
|
||||
# Cache results
|
||||
if db and results:
|
||||
await self._cache_results(db, results)
|
||||
|
||||
return results
|
||||
|
||||
async def enrich_batch(
|
||||
self,
|
||||
iocs: list[tuple[str, IOCType]],
|
||||
db: AsyncSession | None = None,
|
||||
concurrency: int = 3,
|
||||
) -> dict[str, list[EnrichmentResultData]]:
|
||||
"""Enrich a batch of IOCs with controlled concurrency."""
|
||||
sem = asyncio.Semaphore(concurrency)
|
||||
all_results: dict[str, list[EnrichmentResultData]] = {}
|
||||
|
||||
async def _enrich_one(value: str, ioc_type: IOCType):
|
||||
async with sem:
|
||||
result = await self.enrich_ioc(value, ioc_type, db=db)
|
||||
all_results[value] = result
|
||||
|
||||
tasks = [_enrich_one(v, t) for v, t in iocs]
|
||||
await asyncio.gather(*tasks, return_exceptions=True)
|
||||
return all_results
|
||||
|
||||
async def enrich_dataset_iocs(
|
||||
self,
|
||||
rows: list[dict],
|
||||
ioc_columns: dict,
|
||||
db: AsyncSession | None = None,
|
||||
max_iocs: int = 50,
|
||||
) -> dict[str, list[EnrichmentResultData]]:
|
||||
"""Auto-enrich IOCs found in a dataset.
|
||||
|
||||
Extracts unique IOC values from the identified columns and enriches them.
|
||||
"""
|
||||
iocs_to_enrich: list[tuple[str, IOCType]] = []
|
||||
seen = set()
|
||||
|
||||
for col_name, col_type in ioc_columns.items():
|
||||
ioc_type = self._map_column_type(col_type)
|
||||
if not ioc_type:
|
||||
continue
|
||||
|
||||
for row in rows:
|
||||
value = row.get(col_name, "")
|
||||
if value and value not in seen:
|
||||
seen.add(value)
|
||||
iocs_to_enrich.append((str(value), ioc_type))
|
||||
|
||||
if len(iocs_to_enrich) >= max_iocs:
|
||||
break
|
||||
|
||||
if len(iocs_to_enrich) >= max_iocs:
|
||||
break
|
||||
|
||||
if iocs_to_enrich:
|
||||
return await self.enrich_batch(iocs_to_enrich, db=db)
|
||||
return {}
|
||||
|
||||
async def _get_cached(
|
||||
self,
|
||||
db: AsyncSession,
|
||||
ioc_value: str,
|
||||
ioc_type: IOCType,
|
||||
) -> list[EnrichmentResultData] | None:
|
||||
"""Check for cached enrichment results."""
|
||||
cutoff = datetime.now(timezone.utc) - timedelta(hours=self.CACHE_TTL_HOURS)
|
||||
stmt = (
|
||||
select(EnrichmentDB)
|
||||
.where(
|
||||
EnrichmentDB.ioc_value == ioc_value,
|
||||
EnrichmentDB.ioc_type == ioc_type.value,
|
||||
EnrichmentDB.cached_at >= cutoff,
|
||||
)
|
||||
)
|
||||
result = await db.execute(stmt)
|
||||
cached = result.scalars().all()
|
||||
|
||||
if not cached:
|
||||
return None
|
||||
|
||||
return [
|
||||
EnrichmentResultData(
|
||||
ioc_value=c.ioc_value,
|
||||
ioc_type=IOCType(c.ioc_type),
|
||||
source=c.source,
|
||||
verdict=Verdict(c.verdict),
|
||||
score=c.score or 0.0,
|
||||
raw_data=c.raw_data or {},
|
||||
tags=c.tags or [],
|
||||
country=c.country or "",
|
||||
asn=c.asn or "",
|
||||
org=c.org or "",
|
||||
)
|
||||
for c in cached
|
||||
]
|
||||
|
||||
async def _cache_results(
|
||||
self,
|
||||
db: AsyncSession,
|
||||
results: list[EnrichmentResultData],
|
||||
):
|
||||
"""Cache enrichment results in the database."""
|
||||
for r in results:
|
||||
if r.verdict == Verdict.ERROR:
|
||||
continue # Don't cache errors
|
||||
entry = EnrichmentDB(
|
||||
ioc_value=r.ioc_value,
|
||||
ioc_type=r.ioc_type.value,
|
||||
source=r.source,
|
||||
verdict=r.verdict.value,
|
||||
score=r.score,
|
||||
raw_data=r.raw_data,
|
||||
tags=r.tags,
|
||||
country=r.country,
|
||||
asn=r.asn,
|
||||
org=r.org,
|
||||
)
|
||||
db.add(entry)
|
||||
try:
|
||||
await db.flush()
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to cache enrichment: {e}")
|
||||
|
||||
@staticmethod
|
||||
def _map_column_type(col_type: str) -> IOCType | None:
|
||||
"""Map column type from normalizer to IOCType."""
|
||||
mapping = {
|
||||
"ip": IOCType.IP,
|
||||
"ip_address": IOCType.IP,
|
||||
"src_ip": IOCType.IP,
|
||||
"dst_ip": IOCType.IP,
|
||||
"domain": IOCType.DOMAIN,
|
||||
"hash_md5": IOCType.HASH_MD5,
|
||||
"hash_sha1": IOCType.HASH_SHA1,
|
||||
"hash_sha256": IOCType.HASH_SHA256,
|
||||
"url": IOCType.URL,
|
||||
}
|
||||
return mapping.get(col_type)
|
||||
|
||||
async def cleanup(self):
|
||||
for provider in self.providers:
|
||||
await provider.cleanup()
|
||||
|
||||
def status(self) -> dict:
|
||||
"""Return enrichment engine status."""
|
||||
return {
|
||||
"providers": {
|
||||
p.name: {"configured": p.is_configured}
|
||||
for p in self.providers
|
||||
},
|
||||
"cache_ttl_hours": self.CACHE_TTL_HOURS,
|
||||
}
|
||||
|
||||
|
||||
# Singleton
|
||||
enrichment_engine = EnrichmentEngine()
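A minimal sketch of enriching a single IP through the engine above, assuming at least one provider API key is configured; the async_session_factory import mirrors app/services/data_query.py and is an assumption here.

from app.db import async_session_factory  # assumed path, as in data_query.py


async def check_ip(ip: str) -> None:
    async with async_session_factory() as db:
        results = await enrichment_engine.enrich_ioc(ip, IOCType.IP, db=db)
        await db.commit()  # persist anything _cache_results flushed
        for r in results:
            print(f"{r.source}: {r.verdict.value} (score={r.score}, {r.latency_ms} ms)")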
|
||||
290
backend/app/services/host_inventory.py
Normal file
@@ -0,0 +1,290 @@
|
||||
"""Host Inventory Service - builds a deduplicated host-centric network view.
|
||||
|
||||
Scans all datasets in a hunt to identify unique hosts, their IPs, OS,
|
||||
logged-in users, and network connections between them.
|
||||
"""
|
||||
|
||||
import re
|
||||
import logging
|
||||
from collections import defaultdict
|
||||
from typing import Any
|
||||
|
||||
from sqlalchemy import select, func
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import Dataset, DatasetRow
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# --- Column-name patterns (Velociraptor + generic forensic tools) ---
|
||||
|
||||
_HOST_ID_RE = re.compile(
|
||||
r'^(client_?id|clientid|agent_?id|endpoint_?id|host_?id|sensor_?id)$', re.I)
|
||||
_FQDN_RE = re.compile(
|
||||
r'^(fqdn|fully_?qualified|computer_?name|hostname|host_?name|host|'
|
||||
r'system_?name|machine_?name|nodename|workstation)$', re.I)
|
||||
_USERNAME_RE = re.compile(
|
||||
r'^(user|username|user_?name|logon_?name|account_?name|owner|'
|
||||
r'logged_?in_?user|sam_?account_?name|samaccountname)$', re.I)
|
||||
_LOCAL_IP_RE = re.compile(
|
||||
r'^(laddr\.?ip|laddr|local_?addr(ess)?|src_?ip|source_?ip)$', re.I)
|
||||
_REMOTE_IP_RE = re.compile(
|
||||
r'^(raddr\.?ip|raddr|remote_?addr(ess)?|dst_?ip|dest_?ip)$', re.I)
|
||||
_REMOTE_PORT_RE = re.compile(
|
||||
r'^(raddr\.?port|rport|remote_?port|dst_?port|dest_?port)$', re.I)
|
||||
_OS_RE = re.compile(
|
||||
r'^(os|operating_?system|os_?version|os_?name|platform|os_?type|os_?build)$', re.I)
|
||||
_IP_VALID_RE = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
|
||||
|
||||
_IGNORE_IPS = frozenset({
|
||||
'0.0.0.0', '::', '::1', '127.0.0.1', '', '-', '*', 'None', 'null',
|
||||
})
|
||||
_SYSTEM_DOMAINS = frozenset({
|
||||
'NT AUTHORITY', 'NT SERVICE', 'FONT DRIVER HOST', 'WINDOW MANAGER',
|
||||
})
|
||||
_SYSTEM_USERS = frozenset({
|
||||
'SYSTEM', 'LOCAL SERVICE', 'NETWORK SERVICE',
|
||||
'UMFD-0', 'UMFD-1', 'DWM-1', 'DWM-2', 'DWM-3',
|
||||
})
|
||||
|
||||
|
||||
def _is_valid_ip(v: str) -> bool:
|
||||
if not v or v in _IGNORE_IPS:
|
||||
return False
|
||||
return bool(_IP_VALID_RE.match(v))
|
||||
|
||||
|
||||
def _clean(v: Any) -> str:
|
||||
s = str(v or '').strip()
|
||||
return s if s and s not in ('-', 'None', 'null', '') else ''
|
||||
|
||||
|
||||
_SYSTEM_USER_RE = re.compile(
|
||||
r'^(SYSTEM|LOCAL SERVICE|NETWORK SERVICE|DWM-\d+|UMFD-\d+)$', re.I)
|
||||
|
||||
|
||||
def _extract_username(raw: str) -> str:
|
||||
"""Clean username, stripping domain prefixes and filtering system accounts."""
|
||||
if not raw:
|
||||
return ''
|
||||
name = raw.strip()
|
||||
if '\\' in name:
|
||||
domain, _, name = name.rpartition('\\')
|
||||
name = name.strip()
|
||||
if domain.strip().upper() in _SYSTEM_DOMAINS:
|
||||
if not name or _SYSTEM_USER_RE.match(name):
|
||||
return ''
|
||||
if _SYSTEM_USER_RE.match(name):
|
||||
return ''
|
||||
return name or ''
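# Behaviour of _extract_username, illustrative only (not in the original module):
#   'CORP\\jdoe'            -> 'jdoe'
#   'NT AUTHORITY\\SYSTEM'  -> ''      (system account filtered out)
#   'DWM-1'                 -> ''      (matches _SYSTEM_USER_RE)
#   'alice'                 -> 'alice'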
|
||||
|
||||
|
||||
def _infer_os(fqdn: str) -> str:
|
||||
u = fqdn.upper()
|
||||
if 'W10-' in u or 'WIN10' in u:
|
||||
return 'Windows 10'
|
||||
if 'W11-' in u or 'WIN11' in u:
|
||||
return 'Windows 11'
|
||||
if 'W7-' in u or 'WIN7' in u:
|
||||
return 'Windows 7'
|
||||
if 'SRV' in u or 'SERVER' in u or 'DC-' in u:
|
||||
return 'Windows Server'
|
||||
if any(k in u for k in ('LINUX', 'UBUNTU', 'CENTOS', 'RHEL', 'DEBIAN')):
|
||||
return 'Linux'
|
||||
if 'MAC' in u or 'DARWIN' in u:
|
||||
return 'macOS'
|
||||
return 'Windows'
|
||||
|
||||
|
||||
def _identify_columns(ds: Dataset) -> dict:
|
||||
norm = ds.normalized_columns or {}
|
||||
schema = ds.column_schema or {}
|
||||
raw_cols = list(schema.keys()) if schema else list(norm.keys())
|
||||
|
||||
result = {
|
||||
'host_id': [], 'fqdn': [], 'username': [],
|
||||
'local_ip': [], 'remote_ip': [], 'remote_port': [], 'os': [],
|
||||
}
|
||||
|
||||
for col in raw_cols:
|
||||
canonical = (norm.get(col) or '').lower()
|
||||
lower = col.lower()
|
||||
|
||||
if _HOST_ID_RE.match(lower) or (canonical == 'hostname' and lower not in ('hostname', 'host_name', 'host')):
|
||||
result['host_id'].append(col)
|
||||
|
||||
if _FQDN_RE.match(lower) or canonical == 'fqdn':
|
||||
result['fqdn'].append(col)
|
||||
|
||||
if _USERNAME_RE.match(lower) or canonical in ('username', 'user'):
|
||||
result['username'].append(col)
|
||||
|
||||
if _LOCAL_IP_RE.match(lower):
|
||||
result['local_ip'].append(col)
|
||||
elif _REMOTE_IP_RE.match(lower):
|
||||
result['remote_ip'].append(col)
|
||||
|
||||
if _REMOTE_PORT_RE.match(lower):
|
||||
result['remote_port'].append(col)
|
||||
|
||||
if _OS_RE.match(lower) or canonical == 'os':
|
||||
result['os'].append(col)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
async def build_host_inventory(hunt_id: str, db: AsyncSession) -> dict:
|
||||
"""Build a deduplicated host inventory from all datasets in a hunt.
|
||||
|
||||
Returns dict with 'hosts', 'connections', and 'stats'.
|
||||
Each host has: id, hostname, fqdn, client_id, ips, os, users, datasets, row_count.
|
||||
"""
|
||||
ds_result = await db.execute(
|
||||
select(Dataset).where(Dataset.hunt_id == hunt_id)
|
||||
)
|
||||
all_datasets = ds_result.scalars().all()
|
||||
|
||||
if not all_datasets:
|
||||
return {"hosts": [], "connections": [], "stats": {
|
||||
"total_hosts": 0, "total_datasets_scanned": 0,
|
||||
"total_rows_scanned": 0,
|
||||
}}
|
||||
|
||||
hosts: dict[str, dict] = {} # fqdn -> host record
|
||||
ip_to_host: dict[str, str] = {} # local-ip -> fqdn
|
||||
connections: dict[tuple, int] = defaultdict(int)
|
||||
total_rows = 0
|
||||
ds_with_hosts = 0
|
||||
|
||||
for ds in all_datasets:
|
||||
cols = _identify_columns(ds)
|
||||
if not cols['fqdn'] and not cols['host_id']:
|
||||
continue
|
||||
ds_with_hosts += 1
|
||||
|
||||
batch_size = 5000
|
||||
offset = 0
|
||||
while True:
|
||||
rr = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == ds.id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.offset(offset).limit(batch_size)
|
||||
)
|
||||
rows = rr.scalars().all()
|
||||
if not rows:
|
||||
break
|
||||
|
||||
for ro in rows:
|
||||
data = ro.data or {}
|
||||
total_rows += 1
|
||||
|
||||
fqdn = ''
|
||||
for c in cols['fqdn']:
|
||||
fqdn = _clean(data.get(c))
|
||||
if fqdn:
|
||||
break
|
||||
client_id = ''
|
||||
for c in cols['host_id']:
|
||||
client_id = _clean(data.get(c))
|
||||
if client_id:
|
||||
break
|
||||
|
||||
if not fqdn and not client_id:
|
||||
continue
|
||||
|
||||
host_key = fqdn or client_id
|
||||
|
||||
if host_key not in hosts:
|
||||
short = fqdn.split('.')[0] if fqdn and '.' in fqdn else fqdn
|
||||
hosts[host_key] = {
|
||||
'id': host_key,
|
||||
'hostname': short or client_id,
|
||||
'fqdn': fqdn,
|
||||
'client_id': client_id,
|
||||
'ips': set(),
|
||||
'os': '',
|
||||
'users': set(),
|
||||
'datasets': set(),
|
||||
'row_count': 0,
|
||||
}
|
||||
|
||||
h = hosts[host_key]
|
||||
h['datasets'].add(ds.name)
|
||||
h['row_count'] += 1
|
||||
if client_id and not h['client_id']:
|
||||
h['client_id'] = client_id
|
||||
|
||||
for c in cols['username']:
|
||||
u = _extract_username(_clean(data.get(c)))
|
||||
if u:
|
||||
h['users'].add(u)
|
||||
|
||||
for c in cols['local_ip']:
|
||||
ip = _clean(data.get(c))
|
||||
if _is_valid_ip(ip):
|
||||
h['ips'].add(ip)
|
||||
ip_to_host[ip] = host_key
|
||||
|
||||
for c in cols['os']:
|
||||
ov = _clean(data.get(c))
|
||||
if ov and not h['os']:
|
||||
h['os'] = ov
|
||||
|
||||
for c in cols['remote_ip']:
|
||||
rip = _clean(data.get(c))
|
||||
if _is_valid_ip(rip):
|
||||
rport = ''
|
||||
for pc in cols['remote_port']:
|
||||
rport = _clean(data.get(pc))
|
||||
if rport:
|
||||
break
|
||||
connections[(host_key, rip, rport)] += 1
|
||||
|
||||
offset += batch_size
|
||||
if len(rows) < batch_size:
|
||||
break
|
||||
|
||||
# Post-process hosts
|
||||
for h in hosts.values():
|
||||
if not h['os'] and h['fqdn']:
|
||||
h['os'] = _infer_os(h['fqdn'])
|
||||
h['ips'] = sorted(h['ips'])
|
||||
h['users'] = sorted(h['users'])
|
||||
h['datasets'] = sorted(h['datasets'])
|
||||
|
||||
# Build connections, resolving IPs to host keys
|
||||
conn_list = []
|
||||
seen = set()
|
||||
for (src, dst_ip, dst_port), cnt in connections.items():
|
||||
if dst_ip in _IGNORE_IPS:
|
||||
continue
|
||||
dst_host = ip_to_host.get(dst_ip, '')
|
||||
if dst_host == src:
|
||||
continue
|
||||
key = tuple(sorted([src, dst_host or dst_ip]))
|
||||
if key in seen:
|
||||
continue
|
||||
seen.add(key)
|
||||
conn_list.append({
|
||||
'source': src,
|
||||
'target': dst_host or dst_ip,
|
||||
'target_ip': dst_ip,
|
||||
'port': dst_port,
|
||||
'count': cnt,
|
||||
})
|
||||
|
||||
host_list = sorted(hosts.values(), key=lambda x: x['row_count'], reverse=True)
|
||||
|
||||
return {
|
||||
"hosts": host_list,
|
||||
"connections": conn_list,
|
||||
"stats": {
|
||||
"total_hosts": len(host_list),
|
||||
"total_datasets_scanned": len(all_datasets),
|
||||
"datasets_with_hosts": ds_with_hosts,
|
||||
"total_rows_scanned": total_rows,
|
||||
"hosts_with_ips": sum(1 for h in host_list if h['ips']),
|
||||
"hosts_with_users": sum(1 for h in host_list if h['users']),
|
||||
},
|
||||
}
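A minimal usage sketch for build_host_inventory; the async_session_factory import path is an assumption borrowed from app/services/data_query.py.

from app.db import async_session_factory  # assumed path, as in data_query.py


async def show_inventory(hunt_id: str) -> None:
    async with async_session_factory() as db:
        inv = await build_host_inventory(hunt_id, db)
        print(inv["stats"])
        for host in inv["hosts"][:10]:
            print(host["hostname"], host["ips"], host["users"])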
|
||||
198
backend/app/services/host_profiler.py
Normal file
@@ -0,0 +1,198 @@
|
||||
"""Host profiler - per-host deep threat analysis via Wile heavy models."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
import httpx
|
||||
from sqlalchemy import select
|
||||
|
||||
from app.config import settings
|
||||
from app.db.engine import async_session
|
||||
from app.db.models import Dataset, DatasetRow, HostProfile, TriageResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
HEAVY_MODEL = settings.DEFAULT_HEAVY_MODEL
|
||||
WILE_URL = f"{settings.wile_url}/api/generate"
|
||||
|
||||
|
||||
async def _get_triage_summary(db, dataset_id: str) -> str:
|
||||
result = await db.execute(
|
||||
select(TriageResult)
|
||||
.where(TriageResult.dataset_id == dataset_id)
|
||||
.where(TriageResult.risk_score >= 3.0)
|
||||
.order_by(TriageResult.risk_score.desc())
|
||||
.limit(10)
|
||||
)
|
||||
triages = result.scalars().all()
|
||||
if not triages:
|
||||
return "No significant triage findings."
|
||||
lines = []
|
||||
for t in triages:
|
||||
lines.append(
|
||||
f"- Rows {t.row_start}-{t.row_end}: risk={t.risk_score:.1f} "
|
||||
f"verdict={t.verdict} findings={json.dumps(t.findings, default=str)[:300]}"
|
||||
)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
async def _collect_host_data(db, hunt_id: str, hostname: str, fqdn: str | None = None) -> dict:
|
||||
result = await db.execute(select(Dataset).where(Dataset.hunt_id == hunt_id))
|
||||
datasets = result.scalars().all()
|
||||
|
||||
host_data: dict[str, list[dict]] = {}
|
||||
triage_parts: list[str] = []
|
||||
|
||||
for ds in datasets:
|
||||
artifact_type = getattr(ds, "artifact_type", None) or "Unknown"
|
||||
rows_result = await db.execute(
|
||||
select(DatasetRow).where(DatasetRow.dataset_id == ds.id).limit(500)
|
||||
)
|
||||
rows = rows_result.scalars().all()
|
||||
|
||||
matching = []
|
||||
for r in rows:
|
||||
data = r.normalized_data or r.data
|
||||
row_host = (
|
||||
data.get("hostname", "") or data.get("Fqdn", "")
|
||||
or data.get("ClientId", "") or data.get("client_id", "")
|
||||
)
|
||||
if hostname.lower() in str(row_host).lower():
|
||||
matching.append(data)
|
||||
elif fqdn and fqdn.lower() in str(row_host).lower():
|
||||
matching.append(data)
|
||||
|
||||
if matching:
|
||||
host_data[artifact_type] = matching[:50]
|
||||
triage_info = await _get_triage_summary(db, ds.id)
|
||||
triage_parts.append(f"\n### {artifact_type} ({len(matching)} rows)\n{triage_info}")
|
||||
|
||||
return {
|
||||
"artifacts": host_data,
|
||||
"triage_summary": "\n".join(triage_parts) or "No triage data.",
|
||||
"artifact_count": sum(len(v) for v in host_data.values()),
|
||||
}
|
||||
|
||||
|
||||
async def profile_host(
|
||||
hunt_id: str, hostname: str, fqdn: str | None = None, client_id: str | None = None,
|
||||
) -> None:
|
||||
logger.info("Profiling host %s in hunt %s", hostname, hunt_id)
|
||||
|
||||
async with async_session() as db:
|
||||
host_data = await _collect_host_data(db, hunt_id, hostname, fqdn)
|
||||
if host_data["artifact_count"] == 0:
|
||||
logger.info("No data found for host %s, skipping", hostname)
|
||||
return
|
||||
|
||||
system_prompt = (
|
||||
"You are a senior threat hunting analyst performing deep host analysis.\n"
|
||||
"You receive consolidated forensic artifacts and prior triage results for a single host.\n\n"
|
||||
"Provide a comprehensive host threat profile as JSON:\n"
|
||||
"- risk_score: 0.0 (clean) to 10.0 (actively compromised)\n"
|
||||
"- risk_level: low/medium/high/critical\n"
|
||||
"- suspicious_findings: list of specific concerns\n"
|
||||
"- mitre_techniques: list of MITRE ATT&CK technique IDs\n"
|
||||
"- timeline_summary: brief timeline of suspicious activity\n"
|
||||
"- analysis: detailed narrative assessment\n\n"
|
||||
"Consider: cross-artifact correlation, attack patterns, LOLBins, anomalies.\n"
|
||||
"Respond with valid JSON only."
|
||||
)
|
||||
|
||||
artifact_summary = {}
|
||||
for art_type, rows in host_data["artifacts"].items():
|
||||
artifact_summary[art_type] = [
|
||||
{k: str(v)[:150] for k, v in row.items() if v} for row in rows[:20]
|
||||
]
|
||||
|
||||
prompt = (
|
||||
f"Host: {hostname}\nFQDN: {fqdn or 'unknown'}\n\n"
|
||||
f"## Prior Triage Results\n{host_data['triage_summary']}\n\n"
|
||||
f"## Artifact Data ({host_data['artifact_count']} total rows)\n"
|
||||
f"{json.dumps(artifact_summary, indent=1, default=str)[:8000]}\n\n"
|
||||
"Provide your comprehensive host threat profile as JSON."
|
||||
)
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=300.0) as client:
|
||||
resp = await client.post(
|
||||
WILE_URL,
|
||||
json={
|
||||
"model": HEAVY_MODEL,
|
||||
"prompt": prompt,
|
||||
"system": system_prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.3, "num_predict": 4096},
|
||||
},
|
||||
)
|
||||
resp.raise_for_status()
|
||||
llm_text = resp.json().get("response", "")
|
||||
|
||||
from app.services.triage import _parse_llm_response
|
||||
parsed = _parse_llm_response(llm_text)
|
||||
|
||||
profile = HostProfile(
|
||||
hunt_id=hunt_id,
|
||||
hostname=hostname,
|
||||
fqdn=fqdn,
|
||||
client_id=client_id,
|
||||
risk_score=float(parsed.get("risk_score", 0.0)),
|
||||
risk_level=parsed.get("risk_level", "low"),
|
||||
artifact_summary={a: len(r) for a, r in host_data["artifacts"].items()},
|
||||
timeline_summary=parsed.get("timeline_summary", ""),
|
||||
suspicious_findings=parsed.get("suspicious_findings", []),
|
||||
mitre_techniques=parsed.get("mitre_techniques", []),
|
||||
llm_analysis=parsed.get("analysis", llm_text[:5000]),
|
||||
model_used=HEAVY_MODEL,
|
||||
node_used="wile",
|
||||
)
|
||||
db.add(profile)
|
||||
await db.commit()
|
||||
logger.info("Host profile %s: risk=%.1f level=%s", hostname, profile.risk_score, profile.risk_level)
|
||||
|
||||
except Exception as e:
|
||||
logger.error("Failed to profile host %s: %s", hostname, e)
|
||||
profile = HostProfile(
|
||||
hunt_id=hunt_id, hostname=hostname, fqdn=fqdn,
|
||||
risk_score=0.0, risk_level="unknown",
|
||||
llm_analysis=f"Error: {e}",
|
||||
model_used=HEAVY_MODEL, node_used="wile",
|
||||
)
|
||||
db.add(profile)
|
||||
await db.commit()
|
||||
|
||||
|
||||
async def profile_all_hosts(hunt_id: str) -> None:
|
||||
logger.info("Starting host profiling for hunt %s", hunt_id)
|
||||
|
||||
async with async_session() as db:
|
||||
result = await db.execute(select(Dataset).where(Dataset.hunt_id == hunt_id))
|
||||
datasets = result.scalars().all()
|
||||
|
||||
hostnames: dict[str, str | None] = {}
|
||||
for ds in datasets:
|
||||
rows_result = await db.execute(
|
||||
select(DatasetRow).where(DatasetRow.dataset_id == ds.id).limit(2000)
|
||||
)
|
||||
for r in rows_result.scalars().all():
|
||||
data = r.normalized_data or r.data
|
||||
host = data.get("hostname") or data.get("Fqdn") or data.get("Hostname")
|
||||
if host and str(host).strip():
|
||||
h = str(host).strip()
|
||||
if h not in hostnames:
|
||||
hostnames[h] = data.get("fqdn") or data.get("Fqdn")
|
||||
|
||||
logger.info("Discovered %d unique hosts in hunt %s", len(hostnames), hunt_id)
|
||||
|
||||
semaphore = asyncio.Semaphore(settings.HOST_PROFILE_CONCURRENCY)
|
||||
|
||||
async def _bounded(hostname: str, fqdn: str | None):
|
||||
async with semaphore:
|
||||
await profile_host(hunt_id, hostname, fqdn)
|
||||
|
||||
tasks = [_bounded(h, f) for h, f in hostnames.items()]
|
||||
await asyncio.gather(*tasks, return_exceptions=True)
|
||||
logger.info("Host profiling complete for hunt %s (%d hosts)", hunt_id, len(hostnames))
|
||||
210  backend/app/services/ioc_extractor.py  Normal file
@@ -0,0 +1,210 @@
"""IOC extraction service — extracts indicators of compromise from dataset rows.

Identifies: IPv4/IPv6 addresses, domain names, MD5/SHA1/SHA256 hashes,
email addresses, URLs, and file paths that look suspicious.
"""

import re
import logging
from collections import defaultdict
from typing import Optional

from sqlalchemy import select, func
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import Dataset, DatasetRow

logger = logging.getLogger(__name__)

# Patterns

_IPV4 = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'
)
_IPV6 = re.compile(r'\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b')
_DOMAIN = re.compile(
    r'\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)'
    r'+(?:com|net|org|io|info|biz|co|us|uk|de|ru|cn|cc|tk|xyz|top|'
    r'online|site|club|win|work|download|stream|gdn|bid|review|racing|'
    r'loan|date|faith|accountant|cricket|science|trade|party|men)\b',
    re.IGNORECASE,
)
_MD5 = re.compile(r'\b[0-9a-fA-F]{32}\b')
_SHA1 = re.compile(r'\b[0-9a-fA-F]{40}\b')
_SHA256 = re.compile(r'\b[0-9a-fA-F]{64}\b')
_EMAIL = re.compile(r'\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b')
_URL = re.compile(r'https?://[^\s<>"\']+', re.IGNORECASE)

# Private / reserved IPs to skip
_PRIVATE_NETS = re.compile(
    r'^(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|127\.|0\.|255\.)'
)

PATTERNS = {
    'ipv4': _IPV4,
    'ipv6': _IPV6,
    'domain': _DOMAIN,
    'md5': _MD5,
    'sha1': _SHA1,
    'sha256': _SHA256,
    'email': _EMAIL,
    'url': _URL,
}


def _is_private_ip(ip: str) -> bool:
    return bool(_PRIVATE_NETS.match(ip))


def extract_iocs_from_text(text: str, skip_private: bool = True) -> dict[str, set[str]]:
    """Extract all IOC types from a block of text."""
    result: dict[str, set[str]] = defaultdict(set)
    for ioc_type, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            val = match.strip().lower() if ioc_type != 'url' else match.strip()
            # Filter private IPs
            if ioc_type == 'ipv4' and skip_private and _is_private_ip(val):
                continue
            # Filter hex strings that are too generic (< 32 chars not a hash)
            result[ioc_type].add(val)
    return result


async def extract_iocs_from_dataset(
    dataset_id: str,
    db: AsyncSession,
    max_rows: int = 5000,
    skip_private: bool = True,
) -> dict[str, list[str]]:
    """Extract IOCs from all rows of a dataset.

    Returns {ioc_type: [sorted unique values]}.
    """
    # Load rows in batches
    all_iocs: dict[str, set[str]] = defaultdict(set)
    offset = 0
    batch_size = 500

    while offset < max_rows:
        result = await db.execute(
            select(DatasetRow.data)
            .where(DatasetRow.dataset_id == dataset_id)
            .order_by(DatasetRow.row_index)
            .offset(offset)
            .limit(batch_size)
        )
        rows = result.scalars().all()
        if not rows:
            break

        for data in rows:
            # Flatten all values to a single string for scanning
            text = ' '.join(str(v) for v in data.values()) if isinstance(data, dict) else str(data)
            batch_iocs = extract_iocs_from_text(text, skip_private)
            for ioc_type, values in batch_iocs.items():
                all_iocs[ioc_type].update(values)

        offset += batch_size

    # Convert sets to sorted lists
    return {k: sorted(v) for k, v in all_iocs.items() if v}


async def extract_host_groups(
    hunt_id: str,
    db: AsyncSession,
) -> list[dict]:
    """Group all data by hostname across datasets in a hunt.

    Returns a list of host group dicts with dataset count, total rows,
    artifact types, and time range.
    """
    # Get all datasets for this hunt
    result = await db.execute(
        select(Dataset).where(Dataset.hunt_id == hunt_id)
    )
    ds_list = result.scalars().all()
    if not ds_list:
        return []

    # Known host columns (check normalized data first, then raw)
    HOST_COLS = [
        'hostname', 'host', 'computer_name', 'computername', 'system',
        'machine', 'device_name', 'devicename', 'endpoint',
        'ClientId', 'Fqdn', 'client_id', 'fqdn',
    ]

    hosts: dict[str, dict] = {}

    for ds in ds_list:
        # Sample first few rows to find host column
        sample_result = await db.execute(
            select(DatasetRow.data, DatasetRow.normalized_data)
            .where(DatasetRow.dataset_id == ds.id)
            .limit(5)
        )
        samples = sample_result.all()
        if not samples:
            continue

        # Find which host column exists
        host_col = None
        for row_data, norm_data in samples:
            check = norm_data if norm_data else row_data
            if not isinstance(check, dict):
                continue
            for col in HOST_COLS:
                if col in check and check[col]:
                    host_col = col
                    break
            if host_col:
                break

        if not host_col:
            continue

        # Count rows per host in this dataset
        all_rows_result = await db.execute(
            select(DatasetRow.data, DatasetRow.normalized_data)
            .where(DatasetRow.dataset_id == ds.id)
        )
        all_rows = all_rows_result.all()
        for row_data, norm_data in all_rows:
            check = norm_data if norm_data else row_data
            if not isinstance(check, dict):
                continue
            host_val = check.get(host_col, '')
            if not host_val or not isinstance(host_val, str):
                continue
            host_val = host_val.strip()
            if not host_val:
                continue

            if host_val not in hosts:
                hosts[host_val] = {
                    'hostname': host_val,
                    'dataset_ids': set(),
                    'total_rows': 0,
                    'artifact_types': set(),
                    'first_seen': None,
                    'last_seen': None,
                }
            hosts[host_val]['dataset_ids'].add(ds.id)
            hosts[host_val]['total_rows'] += 1
            if ds.artifact_type:
                hosts[host_val]['artifact_types'].add(ds.artifact_type)

    # Convert to output format
    result_list = []
    for h in sorted(hosts.values(), key=lambda x: x['total_rows'], reverse=True):
        result_list.append({
            'hostname': h['hostname'],
            'dataset_count': len(h['dataset_ids']),
            'total_rows': h['total_rows'],
            'artifact_types': sorted(h['artifact_types']),
            'first_seen': None,  # TODO: extract from timestamp columns
            'last_seen': None,
            'risk_score': None,  # TODO: link to host profiles
        })

    return result_list
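A minimal usage sketch (illustrative only, not part of the commit): extract_iocs_from_text can be exercised directly on any blob of log text. The sample string and values below are invented.

# Illustrative only — sample text and values are made up, not from the repo.
sample = "conn from 203.0.113.7 to evil-domain.top, dropper sha256 " + "a" * 64
iocs = extract_iocs_from_text(sample, skip_private=True)
# iocs['ipv4'] == {'203.0.113.7'}; iocs['domain'] == {'evil-domain.top'}
# iocs['sha256'] contains the 64-char hex string; private RFC1918 addresses would be dropped.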
316  backend/app/services/job_queue.py  Normal file
@@ -0,0 +1,316 @@
"""Async job queue for background AI tasks.

Manages triage, profiling, report generation, anomaly detection,
and data queries as trackable jobs with status, progress, and
cancellation support.
"""

from __future__ import annotations

import asyncio
import logging
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Coroutine, Optional

logger = logging.getLogger(__name__)


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class JobType(str, Enum):
    TRIAGE = "triage"
    HOST_PROFILE = "host_profile"
    REPORT = "report"
    ANOMALY = "anomaly"
    QUERY = "query"


@dataclass
class Job:
    id: str
    job_type: JobType
    status: JobStatus = JobStatus.QUEUED
    progress: float = 0.0  # 0-100
    message: str = ""
    result: Any = None
    error: str | None = None
    created_at: float = field(default_factory=time.time)
    started_at: float | None = None
    completed_at: float | None = None
    params: dict = field(default_factory=dict)
    _cancel_event: asyncio.Event = field(default_factory=asyncio.Event, repr=False)

    @property
    def elapsed_ms(self) -> int:
        end = self.completed_at or time.time()
        start = self.started_at or self.created_at
        return int((end - start) * 1000)

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "job_type": self.job_type.value,
            "status": self.status.value,
            "progress": round(self.progress, 1),
            "message": self.message,
            "error": self.error,
            "created_at": self.created_at,
            "started_at": self.started_at,
            "completed_at": self.completed_at,
            "elapsed_ms": self.elapsed_ms,
            "params": self.params,
        }

    @property
    def is_cancelled(self) -> bool:
        return self._cancel_event.is_set()

    def cancel(self):
        self._cancel_event.set()
        self.status = JobStatus.CANCELLED
        self.completed_at = time.time()
        self.message = "Cancelled by user"


class JobQueue:
    """In-memory async job queue with concurrency control.

    Jobs are tracked by ID and can be listed, polled, or cancelled.
    A configurable number of workers process jobs from the queue.
    """

    def __init__(self, max_workers: int = 3):
        self._jobs: dict[str, Job] = {}
        self._queue: asyncio.Queue[str] = asyncio.Queue()
        self._max_workers = max_workers
        self._workers: list[asyncio.Task] = []
        self._handlers: dict[JobType, Callable] = {}
        self._started = False

    def register_handler(
        self,
        job_type: JobType,
        handler: Callable[[Job], Coroutine],
    ):
        """Register an async handler for a job type.

        Handler signature: async def handler(job: Job) -> Any
        The handler can update job.progress and job.message during execution.
        It should check job.is_cancelled periodically and return early.
        """
        self._handlers[job_type] = handler
        logger.info(f"Registered handler for {job_type.value}")

    async def start(self):
        """Start worker tasks."""
        if self._started:
            return
        self._started = True
        for i in range(self._max_workers):
            task = asyncio.create_task(self._worker(i))
            self._workers.append(task)
        logger.info(f"Job queue started with {self._max_workers} workers")

    async def stop(self):
        """Stop all workers."""
        self._started = False
        for w in self._workers:
            w.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)
        self._workers.clear()
        logger.info("Job queue stopped")

    def submit(self, job_type: JobType, **params) -> Job:
        """Submit a new job. Returns the Job object immediately."""
        job = Job(
            id=str(uuid.uuid4()),
            job_type=job_type,
            params=params,
        )
        self._jobs[job.id] = job
        self._queue.put_nowait(job.id)
        logger.info(f"Job submitted: {job.id} ({job_type.value}) params={params}")
        return job

    def get_job(self, job_id: str) -> Job | None:
        return self._jobs.get(job_id)

    def cancel_job(self, job_id: str) -> bool:
        job = self._jobs.get(job_id)
        if not job:
            return False
        if job.status in (JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED):
            return False
        job.cancel()
        return True

    def list_jobs(
        self,
        status: JobStatus | None = None,
        job_type: JobType | None = None,
        limit: int = 50,
    ) -> list[dict]:
        """List jobs, newest first."""
        jobs = sorted(self._jobs.values(), key=lambda j: j.created_at, reverse=True)
        if status:
            jobs = [j for j in jobs if j.status == status]
        if job_type:
            jobs = [j for j in jobs if j.job_type == job_type]
        return [j.to_dict() for j in jobs[:limit]]

    def get_stats(self) -> dict:
        """Get queue statistics."""
        by_status = {}
        for j in self._jobs.values():
            by_status[j.status.value] = by_status.get(j.status.value, 0) + 1
        return {
            "total": len(self._jobs),
            "queued": self._queue.qsize(),
            "by_status": by_status,
            "workers": self._max_workers,
            "active_workers": sum(
                1 for j in self._jobs.values() if j.status == JobStatus.RUNNING
            ),
        }

    def cleanup(self, max_age_seconds: float = 3600):
        """Remove old completed/failed/cancelled jobs."""
        now = time.time()
        to_remove = [
            jid for jid, j in self._jobs.items()
            if j.status in (JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED)
            and (now - j.created_at) > max_age_seconds
        ]
        for jid in to_remove:
            del self._jobs[jid]
        if to_remove:
            logger.info(f"Cleaned up {len(to_remove)} old jobs")

    async def _worker(self, worker_id: int):
        """Worker loop: pull jobs from queue and execute handlers."""
        logger.info(f"Worker {worker_id} started")
        while self._started:
            try:
                job_id = await asyncio.wait_for(self._queue.get(), timeout=5.0)
            except asyncio.TimeoutError:
                continue
            except asyncio.CancelledError:
                break

            job = self._jobs.get(job_id)
            if not job or job.is_cancelled:
                continue

            handler = self._handlers.get(job.job_type)
            if not handler:
                job.status = JobStatus.FAILED
                job.error = f"No handler for {job.job_type.value}"
                job.completed_at = time.time()
                logger.error(f"No handler for job type {job.job_type.value}")
                continue

            job.status = JobStatus.RUNNING
            job.started_at = time.time()
            job.message = "Running..."
            logger.info(f"Worker {worker_id}: executing {job.id} ({job.job_type.value})")

            try:
                result = await handler(job)
                if not job.is_cancelled:
                    job.status = JobStatus.COMPLETED
                    job.progress = 100.0
                    job.result = result
                    job.message = "Completed"
                    job.completed_at = time.time()
                    logger.info(
                        f"Worker {worker_id}: completed {job.id} "
                        f"in {job.elapsed_ms}ms"
                    )
            except Exception as e:
                if not job.is_cancelled:
                    job.status = JobStatus.FAILED
                    job.error = str(e)
                    job.message = f"Failed: {e}"
                    job.completed_at = time.time()
                    logger.error(
                        f"Worker {worker_id}: failed {job.id}: {e}",
                        exc_info=True,
                    )


# Singleton + job handlers

job_queue = JobQueue(max_workers=3)


async def _handle_triage(job: Job):
    """Triage handler."""
    from app.services.triage import triage_dataset
    dataset_id = job.params.get("dataset_id")
    job.message = f"Triaging dataset {dataset_id}"
    results = await triage_dataset(dataset_id)
    return {"count": len(results) if results else 0}


async def _handle_host_profile(job: Job):
    """Host profiling handler."""
    from app.services.host_profiler import profile_all_hosts, profile_host
    hunt_id = job.params.get("hunt_id")
    hostname = job.params.get("hostname")
    if hostname:
        job.message = f"Profiling host {hostname}"
        await profile_host(hunt_id, hostname)
        return {"hostname": hostname}
    else:
        job.message = f"Profiling all hosts in hunt {hunt_id}"
        await profile_all_hosts(hunt_id)
        return {"hunt_id": hunt_id}


async def _handle_report(job: Job):
    """Report generation handler."""
    from app.services.report_generator import generate_report
    hunt_id = job.params.get("hunt_id")
    job.message = f"Generating report for hunt {hunt_id}"
    report = await generate_report(hunt_id)
    return {"report_id": report.id if report else None}


async def _handle_anomaly(job: Job):
    """Anomaly detection handler."""
    from app.services.anomaly_detector import detect_anomalies
    dataset_id = job.params.get("dataset_id")
    k = job.params.get("k", 3)
    threshold = job.params.get("threshold", 0.35)
    job.message = f"Detecting anomalies in dataset {dataset_id}"
    results = await detect_anomalies(dataset_id, k=k, outlier_threshold=threshold)
    return {"count": len(results) if results else 0}


async def _handle_query(job: Job):
    """Data query handler (non-streaming)."""
    from app.services.data_query import query_dataset
    dataset_id = job.params.get("dataset_id")
    question = job.params.get("question", "")
    mode = job.params.get("mode", "quick")
    job.message = f"Querying dataset {dataset_id}"
    answer = await query_dataset(dataset_id, question, mode)
    return {"answer": answer}


def register_all_handlers():
    """Register all job handlers."""
    job_queue.register_handler(JobType.TRIAGE, _handle_triage)
    job_queue.register_handler(JobType.HOST_PROFILE, _handle_host_profile)
    job_queue.register_handler(JobType.REPORT, _handle_report)
    job_queue.register_handler(JobType.ANOMALY, _handle_anomaly)
    job_queue.register_handler(JobType.QUERY, _handle_query)
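For context, a minimal sketch of how this queue might be driven — the wiring below is hypothetical (dataset ID and the standalone main() are invented), but it uses only the submit/get_job/to_dict API defined above.

# Hypothetical wiring — illustrative, not code from this commit.
import asyncio

async def main():
    register_all_handlers()
    await job_queue.start()

    job = job_queue.submit(JobType.TRIAGE, dataset_id="ds-123")  # "ds-123" is a made-up ID
    while job_queue.get_job(job.id).status in (JobStatus.QUEUED, JobStatus.RUNNING):
        await asyncio.sleep(1)                                   # poll until a worker finishes it

    print(job.to_dict())                                         # status, progress, result, elapsed_ms
    await job_queue.stop()

asyncio.run(main())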
145  backend/app/services/keyword_defaults.py  Normal file
@@ -0,0 +1,145 @@
"""Default AUP keyword themes and their seed keywords.

Called once on startup — only inserts themes that don't already exist,
so user edits are never overwritten.
"""

import logging
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.db.models import KeywordTheme, Keyword

logger = logging.getLogger(__name__)

# ── Default themes + keywords ─────────────────────────────────────────

DEFAULTS: dict[str, dict] = {
    "Gambling": {
        "color": "#f44336",
        "keywords": [
            "poker", "casino", "blackjack", "roulette", "sportsbook",
            "sports betting", "bet365", "draftkings", "fanduel", "bovada",
            "betonline", "mybookie", "slots", "slot machine", "parlay",
            "wager", "bookie", "betway", "888casino", "pokerstars",
            "william hill", "ladbrokes", "betfair", "unibet", "pinnacle",
        ],
    },
    "Gaming": {
        "color": "#9c27b0",
        "keywords": [
            "steam", "steamcommunity", "steampowered", "epic games",
            "epicgames", "origin.com", "battle.net", "blizzard",
            "roblox", "minecraft", "fortnite", "valorant", "league of legends",
            "twitch", "twitch.tv", "discord", "discord.gg", "xbox live",
            "playstation network", "gog.com", "itch.io", "gamepass",
            "riot games", "ubisoft", "ea.com",
        ],
    },
    "Streaming": {
        "color": "#ff9800",
        "keywords": [
            "netflix", "hulu", "disney+", "disneyplus", "hbomax",
            "amazon prime video", "peacock", "paramount+", "crunchyroll",
            "funimation", "spotify", "pandora", "soundcloud", "deezer",
            "tidal", "apple music", "youtube music", "pluto tv",
            "tubi", "vudu", "plex",
        ],
    },
    "Downloads / Piracy": {
        "color": "#ff5722",
        "keywords": [
            "torrent", "bittorrent", "utorrent", "qbittorrent", "piratebay",
            "thepiratebay", "1337x", "rarbg", "yts", "kickass",
            "limewire", "frostwire", "mega.nz", "rapidshare", "mediafire",
            "zippyshare", "uploadhaven", "fitgirl", "repack", "crack",
            "keygen", "warez", "nulled", "pirate", "magnet:",
        ],
    },
    "Adult Content": {
        "color": "#e91e63",
        "keywords": [
            "pornhub", "xvideos", "xhamster", "onlyfans", "chaturbate",
            "livejasmin", "brazzers", "redtube", "youporn", "xnxx",
            "porn", "xxx", "nsfw", "adult content", "cam site",
            "stripchat", "bongacams",
        ],
    },
    "Social Media": {
        "color": "#2196f3",
        "keywords": [
            "facebook", "instagram", "tiktok", "snapchat", "pinterest",
            "reddit", "tumblr", "myspace", "whatsapp web", "telegram web",
            "signal web", "wechat web", "twitter.com", "x.com",
            "threads.net", "mastodon", "bluesky",
        ],
    },
    "Job Search": {
        "color": "#4caf50",
        "keywords": [
            "indeed", "linkedin jobs", "glassdoor", "monster.com",
            "ziprecruiter", "careerbuilder", "dice.com", "hired.com",
            "angel.co", "wellfound", "levels.fyi", "salary.com",
            "payscale", "resume", "cover letter", "job application",
        ],
    },
    "Shopping": {
        "color": "#00bcd4",
        "keywords": [
            "amazon.com", "ebay", "etsy", "walmart.com", "target.com",
            "bestbuy", "aliexpress", "wish.com", "shein", "temu",
            "wayfair", "overstock", "newegg", "zappos", "coupon",
            "promo code", "add to cart",
        ],
    },
}


async def seed_defaults(db: AsyncSession) -> int:
    """Insert default themes + keywords for any theme name not already in DB.

    Returns the number of themes inserted (0 if all already exist).
    """
    # Rename legacy theme names
    _renames = [("Social Media (Personal)", "Social Media")]
    for old_name, new_name in _renames:
        old = await db.scalar(select(KeywordTheme.id).where(KeywordTheme.name == old_name))
        if old:
            await db.execute(
                KeywordTheme.__table__.update()
                .where(KeywordTheme.name == old_name)
                .values(name=new_name)
            )
            await db.commit()
            logger.info("Renamed AUP theme '%s' → '%s'", old_name, new_name)

    inserted = 0
    for theme_name, meta in DEFAULTS.items():
        exists = await db.scalar(
            select(KeywordTheme.id).where(KeywordTheme.name == theme_name)
        )
        if exists:
            continue

        theme = KeywordTheme(
            name=theme_name,
            color=meta["color"],
            enabled=True,
            is_builtin=True,
        )
        db.add(theme)
        await db.flush()  # get theme.id

        for kw in meta["keywords"]:
            db.add(Keyword(theme_id=theme.id, value=kw))

        inserted += 1
        logger.info("Seeded AUP theme '%s' with %d keywords", theme_name, len(meta["keywords"]))

    if inserted:
        await db.commit()
        logger.info("Seeded %d AUP keyword themes", inserted)
    else:
        logger.debug("All default AUP themes already present")

    return inserted
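A sketch of the intended call site (assumed — the startup hook itself is not shown in this commit): seed_defaults is meant to run once with an AsyncSession during application startup.

# Assumed startup hook — the hook name is illustrative; async_session comes from app.db.engine.
from app.db.engine import async_session

async def on_startup():
    async with async_session() as db:
        added = await seed_defaults(db)
        # added == 0 on every start after the first, so user-edited themes survive restarts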
193  backend/app/services/load_balancer.py  Normal file
@@ -0,0 +1,193 @@
"""Smart load balancer for Wile & Roadrunner LLM nodes.

Tracks active jobs per node, health status, and routes new work
to the least-busy healthy node. Periodically pings both nodes
to maintain an up-to-date health map.
"""

from __future__ import annotations

import asyncio
import logging
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

from app.config import settings

logger = logging.getLogger(__name__)


class NodeId(str, Enum):
    WILE = "wile"
    ROADRUNNER = "roadrunner"


class WorkloadTier(str, Enum):
    """What kind of workload is this?"""
    HEAVY = "heavy"        # 70B models, deep analysis, reports
    FAST = "fast"          # 7-14B models, triage, quick queries
    EMBEDDING = "embed"    # bge-m3 embeddings
    ANY = "any"


@dataclass
class NodeStatus:
    node_id: NodeId
    url: str
    healthy: bool = True
    last_check: float = 0.0
    active_jobs: int = 0
    total_completed: int = 0
    total_errors: int = 0
    avg_latency_ms: float = 0.0
    _latencies: list[float] = field(default_factory=list)

    def record_completion(self, latency_ms: float):
        self.active_jobs = max(0, self.active_jobs - 1)
        self.total_completed += 1
        self._latencies.append(latency_ms)
        # Rolling average of last 50
        if len(self._latencies) > 50:
            self._latencies = self._latencies[-50:]
        self.avg_latency_ms = sum(self._latencies) / len(self._latencies)

    def record_error(self):
        self.active_jobs = max(0, self.active_jobs - 1)
        self.total_errors += 1

    def record_start(self):
        self.active_jobs += 1


class LoadBalancer:
    """Routes LLM work to the least-busy healthy node.

    Node capabilities:
    - Wile: Heavy models (70B), code models (32B)
    - Roadrunner: Fast models (7-14B), embeddings (bge-m3), vision
    """

    # Which nodes can handle which tiers
    TIER_NODES = {
        WorkloadTier.HEAVY: [NodeId.WILE],
        WorkloadTier.FAST: [NodeId.ROADRUNNER, NodeId.WILE],
        WorkloadTier.EMBEDDING: [NodeId.ROADRUNNER],
        WorkloadTier.ANY: [NodeId.ROADRUNNER, NodeId.WILE],
    }

    def __init__(self):
        self._nodes: dict[NodeId, NodeStatus] = {
            NodeId.WILE: NodeStatus(
                node_id=NodeId.WILE,
                url=f"http://{settings.WILE_HOST}:{settings.WILE_OLLAMA_PORT}",
            ),
            NodeId.ROADRUNNER: NodeStatus(
                node_id=NodeId.ROADRUNNER,
                url=f"http://{settings.ROADRUNNER_HOST}:{settings.ROADRUNNER_OLLAMA_PORT}",
            ),
        }
        self._lock = asyncio.Lock()
        self._health_task: Optional[asyncio.Task] = None

    async def start_health_loop(self, interval: float = 30.0):
        """Start background health-check loop."""
        if self._health_task and not self._health_task.done():
            return
        self._health_task = asyncio.create_task(self._health_loop(interval))
        logger.info("Load balancer health loop started (%.0fs interval)", interval)

    async def stop_health_loop(self):
        if self._health_task:
            self._health_task.cancel()
            try:
                await self._health_task
            except asyncio.CancelledError:
                pass
            self._health_task = None

    async def _health_loop(self, interval: float):
        while True:
            try:
                await self.check_health()
            except Exception as e:
                logger.warning(f"Health check error: {e}")
            await asyncio.sleep(interval)

    async def check_health(self):
        """Ping both nodes and update status."""
        import httpx
        async with httpx.AsyncClient(timeout=5) as client:
            for nid, status in self._nodes.items():
                try:
                    resp = await client.get(f"{status.url}/api/tags")
                    status.healthy = resp.status_code == 200
                except Exception:
                    status.healthy = False
                status.last_check = time.time()
                logger.debug(
                    f"Health: {nid.value} = {'OK' if status.healthy else 'DOWN'} "
                    f"(active={status.active_jobs})"
                )

    def select_node(self, tier: WorkloadTier = WorkloadTier.ANY) -> NodeId:
        """Select the best node for a workload tier.

        Strategy: among healthy nodes that support the tier,
        pick the one with fewest active jobs.
        Falls back to any node if none healthy.
        """
        candidates = self.TIER_NODES.get(tier, [NodeId.ROADRUNNER, NodeId.WILE])

        # Filter to healthy candidates
        healthy = [
            nid for nid in candidates
            if self._nodes[nid].healthy
        ]

        if not healthy:
            logger.warning(f"No healthy nodes for tier {tier.value}, using first candidate")
            healthy = candidates

        # Pick least busy
        best = min(healthy, key=lambda nid: self._nodes[nid].active_jobs)
        return best

    def acquire(self, tier: WorkloadTier = WorkloadTier.ANY) -> NodeId:
        """Select node and mark a job started."""
        node = self.select_node(tier)
        self._nodes[node].record_start()
        logger.info(
            f"LB: dispatched {tier.value} -> {node.value} "
            f"(active={self._nodes[node].active_jobs})"
        )
        return node

    def release(self, node: NodeId, latency_ms: float = 0, error: bool = False):
        """Mark a job completed on a node."""
        status = self._nodes.get(node)
        if not status:
            return
        if error:
            status.record_error()
        else:
            status.record_completion(latency_ms)

    def get_status(self) -> dict:
        """Get current load balancer status."""
        return {
            nid.value: {
                "healthy": s.healthy,
                "active_jobs": s.active_jobs,
                "total_completed": s.total_completed,
                "total_errors": s.total_errors,
                "avg_latency_ms": round(s.avg_latency_ms, 1),
                "last_check": s.last_check,
            }
            for nid, s in self._nodes.items()
        }


# Singleton
lb = LoadBalancer()
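A hedged sketch of the acquire/release contract around an Ollama generate call. The wrapper function is illustrative (it is not part of this commit); the /api/generate payload mirrors what the other services in this diff send, and reading lb._nodes directly is a shortcut for the sketch.

# Illustrative wrapper — not from the commit; error handling kept minimal on purpose.
import time
import httpx

async def generate_on_best_node(prompt: str, model: str = "llama3.1:latest") -> str:
    node = lb.acquire(WorkloadTier.FAST)     # least-busy healthy node for this tier
    url = lb._nodes[node].url                # internal map; a public accessor may be preferable
    start = time.monotonic()
    try:
        async with httpx.AsyncClient(timeout=120.0) as client:
            resp = await client.post(
                f"{url}/api/generate",
                json={"model": model, "prompt": prompt, "stream": False},
            )
            resp.raise_for_status()
            text = resp.json().get("response", "")
    except Exception:
        lb.release(node, error=True)         # record the failure against this node
        raise
    lb.release(node, latency_ms=(time.monotonic() - start) * 1000)
    return text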
196  backend/app/services/normalizer.py  Normal file
@@ -0,0 +1,196 @@
"""Artifact normalizer — maps Velociraptor and common tool columns to canonical schema.

The canonical schema provides consistent field names regardless of which tool
exported the CSV (Velociraptor, OSQuery, Sysmon, etc.).
"""

import logging
import re
from datetime import datetime
from typing import Any

logger = logging.getLogger(__name__)

# ── Column mapping: source_column_pattern → canonical_name ─────────────
# Patterns are case-insensitive regexes matched against column names.

COLUMN_MAPPINGS: list[tuple[str, str]] = [
    # Timestamps
    (r"^(timestamp|time|event_?time|date_?time|created?_?(at|time|date)|modified_?(at|time|date)|mtime|ctime|atime|start_?time|end_?time)$", "timestamp"),
    (r"^(eventtime|system\.timecreated)$", "timestamp"),
    # Host identifiers
    (r"^(hostname|host|fqdn|computer_?name|system_?name|machinename|clientid)$", "hostname"),
    # Operating system
    (r"^(os|operating_?system|os_?version|os_?name|platform|os_?type)$", "os"),
    # Source / destination IPs
    (r"^(source_?ip|src_?ip|srcaddr|local_?address|sourceaddress)$", "src_ip"),
    (r"^(dest_?ip|dst_?ip|dstaddr|remote_?address|destinationaddress|destaddress)$", "dst_ip"),
    (r"^(ip_?address|ipaddress|ip)$", "ip_address"),
    # Ports
    (r"^(source_?port|src_?port|localport)$", "src_port"),
    (r"^(dest_?port|dst_?port|remoteport|destinationport)$", "dst_port"),
    # Process info
    (r"^(process_?name|name|image|exe|executable|binary)$", "process_name"),
    (r"^(pid|process_?id)$", "pid"),
    (r"^(ppid|parent_?pid|parentprocessid)$", "ppid"),
    (r"^(command_?line|cmdline|commandline|cmd)$", "command_line"),
    (r"^(parent_?command_?line|parentcommandline)$", "parent_command_line"),
    # User info
    (r"^(user|username|user_?name|account_?name|subjectusername)$", "username"),
    (r"^(user_?id|uid|sid|subjectusersid)$", "user_id"),
    # File info
    (r"^(file_?path|fullpath|full_?name|path|filepath)$", "file_path"),
    (r"^(file_?name|filename|name)$", "file_name"),
    (r"^(file_?size|size|bytes|length)$", "file_size"),
    (r"^(extension|file_?ext)$", "file_extension"),
    # Hashes
    (r"^(md5|md5hash|hash_?md5)$", "hash_md5"),
    (r"^(sha1|sha1hash|hash_?sha1)$", "hash_sha1"),
    (r"^(sha256|sha256hash|hash_?sha256|hash|filehash)$", "hash_sha256"),
    # Network
    (r"^(protocol|proto)$", "protocol"),
    (r"^(domain|dns_?name|query_?name|queriedname)$", "domain"),
    (r"^(url|uri|request_?url)$", "url"),
    # Event info
    (r"^(event_?id|eventid|eid)$", "event_id"),
    (r"^(event_?type|eventtype|category|action)$", "event_type"),
    (r"^(description|message|msg|detail)$", "description"),
    (r"^(severity|level|priority)$", "severity"),
    # Registry
    (r"^(reg_?key|registry_?key|targetobject)$", "registry_key"),
    (r"^(reg_?value|registry_?value)$", "registry_value"),
]


def normalize_columns(columns: list[str]) -> dict[str, str]:
    """Map raw column names to canonical names.

    Returns:
        Dict of {raw_column_name: canonical_column_name}.
        Columns with no match map to themselves (lowered + underscored).
    """
    mapping: dict[str, str] = {}
    used_canonical: set[str] = set()

    for col in columns:
        col_lower = col.strip().lower()
        matched = False
        for pattern, canonical in COLUMN_MAPPINGS:
            if re.match(pattern, col_lower, re.IGNORECASE):
                # Avoid duplicate canonical names
                if canonical not in used_canonical:
                    mapping[col] = canonical
                    used_canonical.add(canonical)
                    matched = True
                break
        if not matched:
            # Produce a clean snake_case version
            clean = re.sub(r"[^a-z0-9]+", "_", col_lower).strip("_")
            mapping[col] = clean or col

    return mapping


def normalize_row(row: dict[str, Any], column_mapping: dict[str, str]) -> dict[str, Any]:
    """Apply column mapping to a single row."""
    return {column_mapping.get(k, k): v for k, v in row.items()}


def normalize_rows(rows: list[dict], column_mapping: dict[str, str]) -> list[dict]:
    """Apply column mapping to all rows."""
    return [normalize_row(row, column_mapping) for row in rows]


def detect_ioc_columns(
    columns: list[str],
    column_types: dict[str, str],
    column_mapping: dict[str, str],
) -> dict[str, str]:
    """Detect which columns contain IOCs (IPs, hashes, domains).

    Returns:
        Dict of {column_name: ioc_type}.
    """
    ioc_columns: dict[str, str] = {}
    ioc_type_map = {
        "ip": "ip",
        "hash_md5": "hash_md5",
        "hash_sha1": "hash_sha1",
        "hash_sha256": "hash_sha256",
        "domain": "domain",
    }

    for col in columns:
        col_type = column_types.get(col)
        if col_type in ioc_type_map:
            ioc_columns[col] = ioc_type_map[col_type]

        # Also check canonical name
        canonical = column_mapping.get(col, "")
        if canonical in ("src_ip", "dst_ip", "ip_address"):
            ioc_columns[col] = "ip"
        elif canonical == "hash_md5":
            ioc_columns[col] = "hash_md5"
        elif canonical == "hash_sha1":
            ioc_columns[col] = "hash_sha1"
        elif canonical in ("hash_sha256",):
            ioc_columns[col] = "hash_sha256"
        elif canonical == "domain":
            ioc_columns[col] = "domain"
        elif canonical == "url":
            ioc_columns[col] = "url"

    return ioc_columns


def detect_time_range(
    rows: list[dict],
    column_mapping: dict[str, str],
) -> tuple[datetime | None, datetime | None]:
    """Find the earliest and latest timestamps in the dataset."""
    ts_col = None
    for raw_col, canonical in column_mapping.items():
        if canonical == "timestamp":
            ts_col = raw_col
            break

    if not ts_col:
        return None, None

    timestamps: list[datetime] = []
    for row in rows:
        val = row.get(ts_col)
        if not val:
            continue
        try:
            dt = _parse_timestamp(str(val))
            if dt:
                timestamps.append(dt)
        except (ValueError, TypeError):
            continue

    if not timestamps:
        return None, None

    return min(timestamps), max(timestamps)


def _parse_timestamp(value: str) -> datetime | None:
    """Try multiple timestamp formats."""
    formats = [
        "%Y-%m-%dT%H:%M:%S.%fZ",
        "%Y-%m-%dT%H:%M:%SZ",
        "%Y-%m-%dT%H:%M:%S.%f",
        "%Y-%m-%dT%H:%M:%S",
        "%Y-%m-%d %H:%M:%S.%f",
        "%Y-%m-%d %H:%M:%S",
        "%Y/%m/%d %H:%M:%S",
        "%m/%d/%Y %H:%M:%S",
        "%d/%m/%Y %H:%M:%S",
    ]
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None
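A quick illustration of the mapping behaviour. The column names below are typical of Velociraptor/Sysmon exports and were chosen for the example; they are not taken from a specific dataset in this repository.

# Example input — representative column names, made-up row values.
cols = ["ClientId", "EventTime", "CommandLine", "ParentProcessId", "WeirdCustomField"]
mapping = normalize_columns(cols)
# {'ClientId': 'hostname', 'EventTime': 'timestamp', 'CommandLine': 'command_line',
#  'ParentProcessId': 'ppid', 'WeirdCustomField': 'weirdcustomfield'}
row = {"ClientId": "C.1234", "EventTime": "2024-01-01T00:00:00Z", "CommandLine": "whoami"}
normalized = normalize_row(row, mapping)   # keys renamed to the canonical schema, values untouched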
198  backend/app/services/report_generator.py  Normal file
@@ -0,0 +1,198 @@
"""Report generator - debate-powered hunt report generation using Wile + Roadrunner."""

from __future__ import annotations

import json
import logging
import time

import httpx
from sqlalchemy import select

from app.config import settings
from app.db.engine import async_session
from app.db.models import (
    Dataset, HostProfile, HuntReport, TriageResult,
)
from app.services.triage import _parse_llm_response

logger = logging.getLogger(__name__)

WILE_URL = f"{settings.wile_url}/api/generate"
ROADRUNNER_URL = f"{settings.roadrunner_url}/api/generate"
HEAVY_MODEL = settings.DEFAULT_HEAVY_MODEL
FAST_MODEL = "qwen2.5-coder:7b-instruct-q4_K_M"


async def _llm_call(url: str, model: str, system: str, prompt: str, timeout: float = 300.0) -> str:
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(
            url,
            json={
                "model": model,
                "prompt": prompt,
                "system": system,
                "stream": False,
                "options": {"temperature": 0.3, "num_predict": 8192},
            },
        )
        resp.raise_for_status()
        return resp.json().get("response", "")


async def _gather_evidence(db, hunt_id: str) -> dict:
    ds_result = await db.execute(select(Dataset).where(Dataset.hunt_id == hunt_id))
    datasets = ds_result.scalars().all()

    dataset_summary = []
    all_triage = []
    for ds in datasets:
        ds_info = {
            "name": ds.name,
            "artifact_type": getattr(ds, "artifact_type", "Unknown"),
            "row_count": ds.row_count or 0,
        }
        dataset_summary.append(ds_info)

        triage_result = await db.execute(
            select(TriageResult)
            .where(TriageResult.dataset_id == ds.id)
            .where(TriageResult.risk_score >= 3.0)
            .order_by(TriageResult.risk_score.desc())
            .limit(15)
        )
        for t in triage_result.scalars().all():
            all_triage.append({
                "dataset": ds.name,
                "artifact_type": ds_info["artifact_type"],
                "rows": f"{t.row_start}-{t.row_end}",
                "risk_score": t.risk_score,
                "verdict": t.verdict,
                "findings": t.findings[:5] if t.findings else [],
                "indicators": t.suspicious_indicators[:5] if t.suspicious_indicators else [],
                "mitre": t.mitre_techniques or [],
            })

    profile_result = await db.execute(
        select(HostProfile)
        .where(HostProfile.hunt_id == hunt_id)
        .order_by(HostProfile.risk_score.desc())
    )
    profiles = profile_result.scalars().all()
    host_summaries = []
    for p in profiles:
        host_summaries.append({
            "hostname": p.hostname,
            "risk_score": p.risk_score,
            "risk_level": p.risk_level,
            "findings": p.suspicious_findings[:5] if p.suspicious_findings else [],
            "mitre": p.mitre_techniques or [],
            "timeline": (p.timeline_summary or "")[:300],
        })

    return {
        "datasets": dataset_summary,
        "triage_findings": all_triage[:30],
        "host_profiles": host_summaries,
        "total_datasets": len(datasets),
        "total_rows": sum(d["row_count"] for d in dataset_summary),
        "high_risk_hosts": len([h for h in host_summaries if h["risk_score"] >= 7.0]),
    }


async def generate_report(hunt_id: str) -> None:
    logger.info("Generating report for hunt %s", hunt_id)
    start = time.monotonic()

    async with async_session() as db:
        report = HuntReport(
            hunt_id=hunt_id,
            status="generating",
            models_used=[HEAVY_MODEL, FAST_MODEL],
        )
        db.add(report)
        await db.commit()
        await db.refresh(report)
        report_id = report.id

        try:
            evidence = await _gather_evidence(db, hunt_id)
            evidence_text = json.dumps(evidence, indent=1, default=str)[:12000]

            # Phase 1: Wile initial analysis
            logger.info("Report phase 1: Wile initial analysis")
            phase1 = await _llm_call(
                WILE_URL, HEAVY_MODEL,
                system=(
                    "You are a senior threat intelligence analyst writing a hunt report.\n"
                    "Analyze all evidence and produce a structured threat assessment.\n"
                    "Include: executive summary, detailed findings per host, MITRE mapping,\n"
                    "IOC table, risk rankings, and actionable recommendations.\n"
                    "Use markdown formatting. Be thorough and specific."
                ),
                prompt=f"Hunt evidence:\n{evidence_text}\n\nProduce your initial threat assessment.",
            )

            # Phase 2: Roadrunner critical review
            logger.info("Report phase 2: Roadrunner critical review")
            phase2 = await _llm_call(
                ROADRUNNER_URL, FAST_MODEL,
                system=(
                    "You are a critical reviewer of threat hunt reports.\n"
                    "Review the initial assessment and identify:\n"
                    "- Missing correlations or overlooked indicators\n"
                    "- False positive risks or overblown findings\n"
                    "- Additional MITRE techniques that should be mapped\n"
                    "- Gaps in recommendations\n"
                    "Be specific and constructive. Respond in markdown."
                ),
                prompt=f"Evidence:\n{evidence_text[:4000]}\n\nInitial Assessment:\n{phase1[:6000]}\n\nProvide your critical review.",
                timeout=120.0,
            )

            # Phase 3: Wile final synthesis
            logger.info("Report phase 3: Wile final synthesis")
            synthesis_prompt = (
                f"Original evidence:\n{evidence_text[:6000]}\n\n"
                f"Initial assessment:\n{phase1[:5000]}\n\n"
                f"Critical review:\n{phase2[:3000]}\n\n"
                "Produce the FINAL hunt report incorporating the review feedback.\n"
                "Return JSON with these keys:\n"
                "- executive_summary: 2-3 paragraph executive summary\n"
                "- findings: list of {title, severity, description, evidence, mitre_ids}\n"
                "- recommendations: list of {priority, action, rationale}\n"
                "- mitre_mapping: dict of technique_id -> {name, description, evidence}\n"
                "- ioc_table: list of {type, value, context, confidence}\n"
                "- host_risk_summary: list of {hostname, risk_score, risk_level, key_findings}\n"
                "Respond with valid JSON only."
            )
            phase3_text = await _llm_call(
                WILE_URL, HEAVY_MODEL,
                system="You are producing the final, definitive threat hunt report. Incorporate all feedback. Respond with valid JSON only.",
                prompt=synthesis_prompt,
            )

            parsed = _parse_llm_response(phase3_text)
            elapsed_ms = int((time.monotonic() - start) * 1000)

            full_report = f"# Threat Hunt Report\n\n{phase1}\n\n---\n## Review Notes\n{phase2}\n\n---\n## Final Synthesis\n{phase3_text}"

            report.status = "complete"
            report.exec_summary = parsed.get("executive_summary", phase1[:2000])
            report.full_report = full_report
            report.findings = parsed.get("findings", [])
            report.recommendations = parsed.get("recommendations", [])
            report.mitre_mapping = parsed.get("mitre_mapping", {})
            report.ioc_table = parsed.get("ioc_table", [])
            report.host_risk_summary = parsed.get("host_risk_summary", [])
            report.generation_time_ms = elapsed_ms
            await db.commit()

            logger.info("Report %s complete in %dms", report_id, elapsed_ms)

        except Exception as e:
            logger.error("Report generation failed for hunt %s: %s", hunt_id, e)
            report.status = "error"
            report.exec_summary = f"Report generation failed: {e}"
            report.generation_time_ms = int((time.monotonic() - start) * 1000)
            await db.commit()
425
backend/app/services/reports.py
Normal file
425
backend/app/services/reports.py
Normal file
@@ -0,0 +1,425 @@
|
||||
"""Report generation — JSON, HTML, and CSV export for hunt investigations.
|
||||
|
||||
Generates comprehensive investigation reports including:
|
||||
- Hunt metadata and status
|
||||
- Dataset summaries with IOC counts
|
||||
- Hypotheses and their evidence
|
||||
- Annotations timeline
|
||||
- Enrichment verdicts
|
||||
- Agent conversation history
|
||||
- Cross-hunt correlations
|
||||
"""
|
||||
|
||||
import csv
|
||||
import io
|
||||
import json
|
||||
import logging
|
||||
from dataclasses import asdict
|
||||
from datetime import datetime, timezone
|
||||
from typing import Optional
|
||||
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import (
|
||||
Hunt, Dataset, DatasetRow, Hypothesis,
|
||||
Annotation, Conversation, Message, EnrichmentResult,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ReportGenerator:
|
||||
"""Generates exportable investigation reports."""
|
||||
|
||||
async def generate_hunt_report(
|
||||
self,
|
||||
hunt_id: str,
|
||||
db: AsyncSession,
|
||||
format: str = "json",
|
||||
include_rows: bool = False,
|
||||
max_rows: int = 500,
|
||||
) -> dict | str:
|
||||
"""Generate a comprehensive report for a hunt investigation."""
|
||||
|
||||
# Gather all hunt data
|
||||
report_data = await self._gather_hunt_data(
|
||||
hunt_id, db, include_rows=include_rows, max_rows=max_rows,
|
||||
)
|
||||
|
||||
if not report_data:
|
||||
return {"error": "Hunt not found"}
|
||||
|
||||
if format == "json":
|
||||
return report_data
|
||||
elif format == "html":
|
||||
return self._render_html(report_data)
|
||||
elif format == "csv":
|
||||
return self._render_csv(report_data)
|
||||
else:
|
||||
return report_data
|
||||
|
||||
async def _gather_hunt_data(
|
||||
self,
|
||||
hunt_id: str,
|
||||
db: AsyncSession,
|
||||
include_rows: bool = False,
|
||||
max_rows: int = 500,
|
||||
) -> dict | None:
|
||||
"""Gather all data for a hunt report."""
|
||||
|
||||
# Hunt metadata
|
||||
result = await db.execute(select(Hunt).where(Hunt.id == hunt_id))
|
||||
hunt = result.scalar_one_or_none()
|
||||
if not hunt:
|
||||
return None
|
||||
|
||||
# Datasets
|
||||
ds_result = await db.execute(
|
||||
select(Dataset).where(Dataset.hunt_id == hunt_id)
|
||||
)
|
||||
datasets = ds_result.scalars().all()
|
||||
|
||||
dataset_summaries = []
|
||||
all_iocs = {}
|
||||
for ds in datasets:
|
||||
summary = {
|
||||
"id": ds.id,
|
||||
"name": ds.name,
|
||||
"filename": ds.filename,
|
||||
"source_tool": ds.source_tool,
|
||||
"row_count": ds.row_count,
|
||||
"columns": list((ds.column_schema or {}).keys()),
|
||||
"ioc_columns": ds.ioc_columns or {},
|
||||
"time_range": {
|
||||
"start": ds.time_range_start,
|
||||
"end": ds.time_range_end,
|
||||
},
|
||||
"created_at": ds.created_at.isoformat(),
|
||||
}
|
||||
|
||||
if include_rows:
|
||||
rows_result = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == ds.id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.limit(max_rows)
|
||||
)
|
||||
rows = rows_result.scalars().all()
|
||||
summary["rows"] = [r.data for r in rows]
|
||||
|
||||
dataset_summaries.append(summary)
|
||||
|
||||
# Collect IOCs for enrichment lookup
|
||||
if ds.ioc_columns:
|
||||
all_iocs.update(ds.ioc_columns)
|
||||
|
||||
# Hypotheses
|
||||
hyp_result = await db.execute(
|
||||
select(Hypothesis).where(Hypothesis.hunt_id == hunt_id)
|
||||
)
|
||||
hypotheses = hyp_result.scalars().all()
|
||||
|
||||
hypotheses_data = [
|
||||
{
|
||||
"id": h.id,
|
||||
"title": h.title,
|
||||
"description": h.description,
|
||||
"mitre_technique": h.mitre_technique,
|
||||
"status": h.status,
|
||||
"evidence_row_ids": h.evidence_row_ids,
|
||||
"evidence_notes": h.evidence_notes,
|
||||
"created_at": h.created_at.isoformat(),
|
||||
"updated_at": h.updated_at.isoformat(),
|
||||
}
|
||||
for h in hypotheses
|
||||
]
|
||||
|
||||
# Annotations (across all datasets in this hunt)
|
||||
dataset_ids = [ds.id for ds in datasets]
|
||||
annotations_data = []
|
||||
if dataset_ids:
|
||||
ann_result = await db.execute(
|
||||
select(Annotation)
|
||||
.where(Annotation.dataset_id.in_(dataset_ids))
|
||||
.order_by(Annotation.created_at)
|
||||
)
|
||||
annotations = ann_result.scalars().all()
|
||||
annotations_data = [
|
||||
{
|
||||
"id": a.id,
|
||||
"dataset_id": a.dataset_id,
|
||||
"row_id": a.row_id,
|
||||
"text": a.text,
|
||||
"severity": a.severity,
|
||||
"tag": a.tag,
|
||||
"created_at": a.created_at.isoformat(),
|
||||
}
|
||||
for a in annotations
|
||||
]
|
||||
|
||||
# Conversations
|
||||
conv_result = await db.execute(
|
||||
select(Conversation).where(Conversation.hunt_id == hunt_id)
|
||||
)
|
||||
conversations = conv_result.scalars().all()
|
||||
|
||||
conversations_data = []
|
||||
for conv in conversations:
|
||||
msg_result = await db.execute(
|
||||
select(Message)
|
||||
.where(Message.conversation_id == conv.id)
|
||||
.order_by(Message.created_at)
|
||||
)
|
||||
messages = msg_result.scalars().all()
|
||||
conversations_data.append({
|
||||
"id": conv.id,
|
||||
"title": conv.title,
|
||||
"messages": [
|
||||
{
|
||||
"role": m.role,
|
||||
"content": m.content,
|
||||
"model_used": m.model_used,
|
||||
"node_used": m.node_used,
|
||||
"latency_ms": m.latency_ms,
|
||||
"created_at": m.created_at.isoformat(),
|
||||
}
|
||||
for m in messages
|
||||
],
|
||||
})
|
||||
|
||||
# Enrichment results
# Note: EnrichmentResult rows are not tied to a specific dataset or column in
# this query, so pull a single capped batch of enriched IOCs rather than
# re-running (and re-appending) the same unfiltered query once per dataset.
enrichment_data = []
if any(ds.ioc_columns for ds in datasets):
    enrich_result = await db.execute(
        select(EnrichmentResult)
        .where(EnrichmentResult.source.isnot(None))
        .limit(100)
    )
    for e in enrich_result.scalars().all():
        enrichment_data.append({
            "ioc_value": e.ioc_value,
            "ioc_type": e.ioc_type,
            "source": e.source,
            "verdict": e.verdict,
            "score": e.score,
            "tags": e.tags,
            "country": e.country,
        })
|
||||
|
||||
# Build report
|
||||
now = datetime.now(timezone.utc)
|
||||
return {
|
||||
"report_metadata": {
|
||||
"generated_at": now.isoformat(),
|
||||
"format_version": "1.0",
|
||||
"generator": "ThreatHunt Report Engine",
|
||||
},
|
||||
"hunt": {
|
||||
"id": hunt.id,
|
||||
"name": hunt.name,
|
||||
"description": hunt.description,
|
||||
"status": hunt.status,
|
||||
"created_at": hunt.created_at.isoformat(),
|
||||
"updated_at": hunt.updated_at.isoformat(),
|
||||
},
|
||||
"summary": {
|
||||
"dataset_count": len(datasets),
|
||||
"total_rows": sum(ds.row_count for ds in datasets),
|
||||
"hypothesis_count": len(hypotheses),
|
||||
"confirmed_hypotheses": len([h for h in hypotheses if h.status == "confirmed"]),
|
||||
"annotation_count": len(annotations_data),
|
||||
"critical_annotations": len([a for a in annotations_data if a["severity"] == "critical"]),
|
||||
"conversation_count": len(conversations_data),
|
||||
"enrichment_count": len(enrichment_data),
|
||||
"malicious_iocs": len([e for e in enrichment_data if e["verdict"] == "malicious"]),
|
||||
},
|
||||
"datasets": dataset_summaries,
|
||||
"hypotheses": hypotheses_data,
|
||||
"annotations": annotations_data,
|
||||
"conversations": conversations_data,
|
||||
"enrichments": enrichment_data[:100],
|
||||
}
|
||||
|
||||
def _render_html(self, data: dict) -> str:
|
||||
"""Render report as self-contained HTML."""
|
||||
hunt = data.get("hunt", {})
|
||||
summary = data.get("summary", {})
|
||||
hypotheses = data.get("hypotheses", [])
|
||||
annotations = data.get("annotations", [])
|
||||
datasets = data.get("datasets", [])
|
||||
enrichments = data.get("enrichments", [])
|
||||
meta = data.get("report_metadata", {})
|
||||
|
||||
html = f"""<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>ThreatHunt Report: {hunt.get('name', 'Unknown')}</title>
|
||||
<style>
|
||||
:root {{ --bg: #0d1117; --surface: #161b22; --border: #30363d; --text: #c9d1d9; --accent: #58a6ff; --red: #f85149; --orange: #d29922; --green: #3fb950; }}
|
||||
* {{ box-sizing: border-box; margin: 0; padding: 0; }}
|
||||
body {{ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); line-height: 1.6; padding: 2rem; }}
|
||||
.container {{ max-width: 1200px; margin: 0 auto; }}
|
||||
h1 {{ color: var(--accent); border-bottom: 2px solid var(--border); padding-bottom: 0.5rem; margin-bottom: 1rem; }}
|
||||
h2 {{ color: var(--accent); margin: 1.5rem 0 0.75rem; }}
|
||||
h3 {{ color: var(--text); margin: 1rem 0 0.5rem; }}
|
||||
.card {{ background: var(--surface); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; margin: 0.75rem 0; }}
|
||||
.stat-grid {{ display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 0.75rem; }}
|
||||
.stat {{ background: var(--surface); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; text-align: center; }}
|
||||
.stat .value {{ font-size: 2rem; font-weight: 700; color: var(--accent); }}
|
||||
.stat .label {{ font-size: 0.85rem; color: #8b949e; }}
|
||||
table {{ width: 100%; border-collapse: collapse; margin: 0.5rem 0; }}
|
||||
th, td {{ padding: 0.5rem 0.75rem; border: 1px solid var(--border); text-align: left; }}
|
||||
th {{ background: var(--surface); color: var(--accent); }}
|
||||
.badge {{ display: inline-block; padding: 0.15rem 0.5rem; border-radius: 999px; font-size: 0.8rem; font-weight: 600; }}
|
||||
.badge-malicious {{ background: var(--red); color: white; }}
|
||||
.badge-suspicious {{ background: var(--orange); color: #000; }}
|
||||
.badge-clean {{ background: var(--green); color: #000; }}
|
||||
.badge-critical {{ background: var(--red); color: white; }}
|
||||
.badge-high {{ background: #da3633; color: white; }}
|
||||
.badge-medium {{ background: var(--orange); color: #000; }}
|
||||
.badge-confirmed {{ background: var(--green); color: #000; }}
|
||||
.badge-active {{ background: var(--accent); color: #000; }}
|
||||
.footer {{ margin-top: 2rem; padding-top: 1rem; border-top: 1px solid var(--border); color: #8b949e; font-size: 0.85rem; }}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<h1>🔍 ThreatHunt Report: {hunt.get('name', 'Untitled')}</h1>
|
||||
<p><strong>Hunt ID:</strong> {hunt.get('id', '')}<br>
|
||||
<strong>Status:</strong> {hunt.get('status', 'unknown')}<br>
|
||||
<strong>Description:</strong> {hunt.get('description', 'N/A')}<br>
|
||||
<strong>Created:</strong> {hunt.get('created_at', '')}</p>
|
||||
|
||||
<h2>Summary</h2>
|
||||
<div class="stat-grid">
|
||||
<div class="stat"><div class="value">{summary.get('dataset_count', 0)}</div><div class="label">Datasets</div></div>
|
||||
<div class="stat"><div class="value">{summary.get('total_rows', 0):,}</div><div class="label">Total Rows</div></div>
|
||||
<div class="stat"><div class="value">{summary.get('hypothesis_count', 0)}</div><div class="label">Hypotheses</div></div>
|
||||
<div class="stat"><div class="value">{summary.get('confirmed_hypotheses', 0)}</div><div class="label">Confirmed</div></div>
|
||||
<div class="stat"><div class="value">{summary.get('annotation_count', 0)}</div><div class="label">Annotations</div></div>
|
||||
<div class="stat"><div class="value">{summary.get('malicious_iocs', 0)}</div><div class="label">Malicious IOCs</div></div>
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Hypotheses section
|
||||
if hypotheses:
|
||||
html += "<h2>Hypotheses</h2>\n"
|
||||
html += "<table><tr><th>Title</th><th>MITRE</th><th>Status</th><th>Description</th></tr>\n"
|
||||
for h in hypotheses:
|
||||
status_class = f"badge-{h['status']}" if h['status'] in ('confirmed', 'active') else ""
|
||||
html += (
|
||||
f"<tr><td>{h['title']}</td>"
|
||||
f"<td>{h.get('mitre_technique', 'N/A')}</td>"
|
||||
f"<td><span class='badge {status_class}'>{h['status']}</span></td>"
|
||||
f"<td>{h.get('description', '') or ''}</td></tr>\n"
|
||||
)
|
||||
html += "</table>\n"
|
||||
|
||||
# Datasets section
|
||||
if datasets:
|
||||
html += "<h2>Datasets</h2>\n"
|
||||
for ds in datasets:
|
||||
html += f"""<div class="card">
|
||||
<h3>{ds['name']} ({ds.get('filename', '')})</h3>
|
||||
<p><strong>Source:</strong> {ds.get('source_tool', 'N/A')} |
|
||||
<strong>Rows:</strong> {ds['row_count']:,} |
|
||||
<strong>IOC Columns:</strong> {len(ds.get('ioc_columns', {}))} |
|
||||
<strong>Time Range:</strong> {ds.get('time_range', {}).get('start', 'N/A')} to {ds.get('time_range', {}).get('end', 'N/A')}</p>
|
||||
</div>\n"""
|
||||
|
||||
# Annotations
|
||||
if annotations:
|
||||
critical = [a for a in annotations if a['severity'] in ('critical', 'high')]
|
||||
html += f"<h2>Annotations ({len(annotations)} total, {len(critical)} critical/high)</h2>\n"
|
||||
html += "<table><tr><th>Severity</th><th>Tag</th><th>Text</th><th>Created</th></tr>\n"
|
||||
for a in annotations[:50]:
|
||||
sev_class = f"badge-{a['severity']}" if a['severity'] in ('critical', 'high', 'medium') else ""
|
||||
html += (
|
||||
f"<tr><td><span class='badge {sev_class}'>{a['severity']}</span></td>"
|
||||
f"<td>{a.get('tag', 'N/A')}</td>"
|
||||
f"<td>{a['text'][:200]}</td>"
|
||||
f"<td>{a['created_at'][:19]}</td></tr>\n"
|
||||
)
|
||||
html += "</table>\n"
|
||||
|
||||
# Enrichments
|
||||
if enrichments:
|
||||
malicious = [e for e in enrichments if e['verdict'] == 'malicious']
|
||||
html += f"<h2>IOC Enrichment ({len(enrichments)} results, {len(malicious)} malicious)</h2>\n"
|
||||
html += "<table><tr><th>IOC</th><th>Type</th><th>Source</th><th>Verdict</th><th>Score</th></tr>\n"
|
||||
for e in enrichments[:50]:
|
||||
verdict_class = f"badge-{e['verdict']}"
|
||||
html += (
|
||||
f"<tr><td><code>{e['ioc_value']}</code></td>"
|
||||
f"<td>{e['ioc_type']}</td>"
|
||||
f"<td>{e['source']}</td>"
|
||||
f"<td><span class='badge {verdict_class}'>{e['verdict']}</span></td>"
|
||||
f"<td>{e.get('score', 0)}</td></tr>\n"
|
||||
)
|
||||
html += "</table>\n"
|
||||
|
||||
html += f"""
|
||||
<div class="footer">
|
||||
<p>Generated by ThreatHunt Report Engine | {meta.get('generated_at', '')[:19]}</p>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
</html>"""
|
||||
|
||||
return html
|
||||
|
||||
def _render_csv(self, data: dict) -> str:
|
||||
"""Render key report data as CSV."""
|
||||
output = io.StringIO()
|
||||
|
||||
# Hypotheses sheet
|
||||
output.write("=== HYPOTHESES ===\n")
|
||||
writer = csv.writer(output)
|
||||
writer.writerow(["Title", "MITRE Technique", "Status", "Description", "Evidence Notes"])
|
||||
for h in data.get("hypotheses", []):
|
||||
writer.writerow([
|
||||
h.get("title", ""),
|
||||
h.get("mitre_technique", ""),
|
||||
h.get("status", ""),
|
||||
h.get("description", ""),
|
||||
h.get("evidence_notes", ""),
|
||||
])
|
||||
|
||||
output.write("\n=== ANNOTATIONS ===\n")
|
||||
writer.writerow(["Severity", "Tag", "Text", "Dataset ID", "Row ID", "Created"])
|
||||
for a in data.get("annotations", []):
|
||||
writer.writerow([
|
||||
a.get("severity", ""),
|
||||
a.get("tag", ""),
|
||||
a.get("text", ""),
|
||||
a.get("dataset_id", ""),
|
||||
a.get("row_id", ""),
|
||||
a.get("created_at", ""),
|
||||
])
|
||||
|
||||
output.write("\n=== ENRICHMENTS ===\n")
|
||||
writer.writerow(["IOC Value", "IOC Type", "Source", "Verdict", "Score", "Country"])
|
||||
for e in data.get("enrichments", []):
|
||||
writer.writerow([
|
||||
e.get("ioc_value", ""),
|
||||
e.get("ioc_type", ""),
|
||||
e.get("source", ""),
|
||||
e.get("verdict", ""),
|
||||
e.get("score", ""),
|
||||
e.get("country", ""),
|
||||
])
|
||||
|
||||
return output.getvalue()
|
||||
|
||||
|
||||
# Singleton
|
||||
report_generator = ReportGenerator()
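# Usage sketch (illustrative, not executed): the private helpers above can be
# wired together as below. The hunt id is a placeholder and async_session is
# the session factory from app.db.engine used elsewhere in the backend.
#
#     async with async_session() as db:
#         data = await report_generator._gather_hunt_data("<hunt-id>", db)
#         if data:
#             html_report = report_generator._render_html(data)
#             csv_report = report_generator._render_csv(data)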
|
||||
346
backend/app/services/sans_rag.py
Normal file
346
backend/app/services/sans_rag.py
Normal file
@@ -0,0 +1,346 @@
|
||||
"""SANS RAG service — queries the 300GB SANS courseware indexed in Open WebUI.
|
||||
|
||||
Provides contextual SANS references for threat hunting guidance.
|
||||
Uses two approaches:
|
||||
1. Open WebUI RAG pipeline (if configured with a knowledge collection)
|
||||
2. Embedding-based semantic search against locally indexed SANS content
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
|
||||
from app.config import settings
|
||||
from app.agents.providers_v2 import _get_client
|
||||
from app.agents.registry import Node
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# ── SANS course catalog for reference matching ────────────────────────
|
||||
|
||||
SANS_COURSES = {
|
||||
"SEC401": "Security Essentials",
|
||||
"SEC504": "Hacker Tools, Techniques, and Incident Handling",
|
||||
"SEC503": "Network Monitoring and Threat Detection In-Depth",
|
||||
"SEC505": "Securing Windows and PowerShell Automation",
|
||||
"SEC506": "Securing Linux/Unix",
|
||||
"SEC510": "Public Cloud Security: AWS, Azure, and GCP",
|
||||
"SEC511": "Continuous Monitoring and Security Operations",
|
||||
"SEC530": "Defensible Security Architecture and Engineering",
|
||||
"SEC540": "Cloud Security and DevSecOps Automation",
|
||||
"SEC555": "SIEM with Tactical Analytics",
|
||||
"SEC560": "Enterprise Penetration Testing",
|
||||
"SEC565": "Red Team Operations and Adversary Emulation",
|
||||
"SEC573": "Automating Information Security with Python",
|
||||
"SEC575": "Mobile Device Security and Ethical Hacking",
|
||||
"SEC588": "Cloud Penetration Testing",
|
||||
"SEC599": "Defeating Advanced Adversaries - Purple Team Tactics",
|
||||
"FOR408": "Windows Forensic Analysis",
|
||||
"FOR498": "Digital Acquisition and Rapid Triage",
|
||||
"FOR500": "Windows Forensic Analysis",
|
||||
"FOR508": "Advanced Incident Response, Threat Hunting, and Digital Forensics",
|
||||
"FOR509": "Enterprise Cloud Forensics and Incident Response",
|
||||
"FOR518": "Mac and iOS Forensic Analysis and Incident Response",
|
||||
"FOR572": "Advanced Network Forensics: Threat Hunting, Analysis, and Incident Response",
|
||||
"FOR578": "Cyber Threat Intelligence",
|
||||
"FOR585": "Smartphone Forensic Analysis In-Depth",
|
||||
"FOR610": "Reverse-Engineering Malware: Malware Analysis Tools and Techniques",
|
||||
"FOR710": "Reverse-Engineering Malware: Advanced Code Analysis",
|
||||
"ICS410": "ICS/SCADA Security Essentials",
|
||||
"ICS515": "ICS Visibility, Detection, and Response",
|
||||
}
|
||||
|
||||
# Topic-to-course mapping for fallback recommendations
|
||||
TOPIC_COURSE_MAP = {
|
||||
"malware": ["FOR610", "FOR710", "SEC504"],
|
||||
"reverse engineer": ["FOR610", "FOR710"],
|
||||
"incident response": ["FOR508", "SEC504"],
|
||||
"forensic": ["FOR508", "FOR500", "FOR408"],
|
||||
"windows forensic": ["FOR500", "FOR408"],
|
||||
"network forensic": ["FOR572"],
|
||||
"threat hunting": ["FOR508", "SEC504", "FOR578"],
|
||||
"threat intelligence": ["FOR578"],
|
||||
"powershell": ["SEC505", "FOR508"],
|
||||
"lateral movement": ["SEC504", "FOR508"],
|
||||
"persistence": ["FOR508", "SEC504"],
|
||||
"privilege escalation": ["SEC504", "SEC560"],
|
||||
"credential": ["SEC504", "SEC560"],
|
||||
"memory forensic": ["FOR508"],
|
||||
"disk forensic": ["FOR500", "FOR408"],
|
||||
"registry": ["FOR500", "FOR408"],
|
||||
"event log": ["FOR508", "SEC555"],
|
||||
"siem": ["SEC555"],
|
||||
"log analysis": ["SEC555", "SEC503"],
|
||||
"network monitor": ["SEC503"],
|
||||
"pcap": ["SEC503", "FOR572"],
|
||||
"cloud": ["SEC510", "SEC540", "FOR509"],
|
||||
"aws": ["SEC510", "SEC540", "FOR509"],
|
||||
"azure": ["SEC510", "FOR509"],
|
||||
"linux": ["SEC506"],
|
||||
"mobile": ["SEC575", "FOR585"],
|
||||
"penetration test": ["SEC560", "SEC565"],
|
||||
"red team": ["SEC565", "SEC599"],
|
||||
"purple team": ["SEC599"],
|
||||
"python": ["SEC573"],
|
||||
"automation": ["SEC573", "SEC540"],
|
||||
"deobfusc": ["FOR610", "SEC504"],
|
||||
"base64": ["FOR610", "SEC504"],
|
||||
"shellcode": ["FOR610", "FOR710"],
|
||||
"ransomware": ["FOR508", "FOR610"],
|
||||
"phishing": ["SEC504", "FOR578"],
|
||||
"c2": ["FOR508", "SEC504", "FOR572"],
|
||||
"command and control": ["FOR508", "SEC504"],
|
||||
"exfiltration": ["FOR508", "FOR572", "SEC503"],
|
||||
"dns": ["FOR572", "SEC503"],
|
||||
"ioc": ["FOR508", "FOR578"],
|
||||
"mitre": ["FOR508", "SEC504", "SEC599"],
|
||||
"att&ck": ["FOR508", "SEC504"],
|
||||
"velociraptor": ["FOR508"],
|
||||
"volatility": ["FOR508"],
|
||||
"scheduled task": ["FOR508", "SEC504"],
|
||||
"service": ["FOR508", "SEC504"],
|
||||
"wmi": ["FOR508", "SEC504"],
|
||||
"process": ["FOR508"],
|
||||
"dll": ["FOR610", "FOR508"],
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class RAGResult:
|
||||
"""Result from a RAG query."""
|
||||
query: str
|
||||
context: str # Retrieved relevant text
|
||||
sources: list[str] = field(default_factory=list) # Source document names
|
||||
course_references: list[str] = field(default_factory=list) # SANS course IDs
|
||||
confidence: float = 0.0
|
||||
latency_ms: int = 0
|
||||
|
||||
|
||||
class SANSRAGService:
|
||||
"""Service for querying SANS courseware via Open WebUI RAG pipeline."""
|
||||
|
||||
def __init__(self):
|
||||
self.openwebui_url = settings.OPENWEBUI_URL.rstrip("/")
|
||||
self.api_key = settings.OPENWEBUI_API_KEY
|
||||
self.rag_model = settings.DEFAULT_FAST_MODEL
|
||||
self._available: bool | None = None
|
||||
|
||||
def _headers(self) -> dict:
|
||||
h = {"Content-Type": "application/json"}
|
||||
if self.api_key:
|
||||
h["Authorization"] = f"Bearer {self.api_key}"
|
||||
return h
|
||||
|
||||
async def query(
|
||||
self,
|
||||
question: str,
|
||||
context: str = "",
|
||||
max_tokens: int = 1024,
|
||||
) -> RAGResult:
|
||||
"""Query SANS courseware for relevant context.
|
||||
|
||||
Uses Open WebUI's RAG-enabled chat to retrieve from indexed SANS content.
|
||||
Falls back to topic-based course recommendations if RAG is unavailable.
|
||||
"""
|
||||
start = time.monotonic()
|
||||
|
||||
# Try Open WebUI RAG pipeline first
|
||||
try:
|
||||
result = await self._query_openwebui_rag(question, context, max_tokens)
|
||||
result.latency_ms = int((time.monotonic() - start) * 1000)
|
||||
|
||||
# Enrich with course references if not already present
|
||||
if not result.course_references:
|
||||
result.course_references = self._match_courses(question)
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"RAG query failed, using fallback: {e}")
|
||||
# Fallback to topic-based matching
|
||||
courses = self._match_courses(question)
|
||||
return RAGResult(
|
||||
query=question,
|
||||
context="",
|
||||
sources=[],
|
||||
course_references=courses,
|
||||
confidence=0.3 if courses else 0.0,
|
||||
latency_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
async def _query_openwebui_rag(
|
||||
self,
|
||||
question: str,
|
||||
context: str,
|
||||
max_tokens: int,
|
||||
) -> RAGResult:
|
||||
"""Query Open WebUI with RAG context retrieval.
|
||||
|
||||
Open WebUI automatically retrieves from its indexed knowledge base
|
||||
when the model is configured with a knowledge collection.
|
||||
"""
|
||||
client = _get_client()
|
||||
|
||||
system_msg = (
|
||||
"You are a SANS cybersecurity knowledge assistant. "
|
||||
"Use your indexed SANS courseware to answer the question. "
|
||||
"Always cite the specific SANS course (e.g., FOR508, SEC504) "
|
||||
"and relevant section when referencing material. "
|
||||
"If the question relates to threat hunting procedures, "
|
||||
"reference the specific SANS methodology or framework."
|
||||
)
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": system_msg},
|
||||
]
|
||||
|
||||
if context:
|
||||
messages.append({
|
||||
"role": "user",
|
||||
"content": f"Investigation context:\n{context}\n\nQuestion: {question}",
|
||||
})
|
||||
else:
|
||||
messages.append({"role": "user", "content": question})
|
||||
|
||||
payload = {
|
||||
"model": self.rag_model,
|
||||
"messages": messages,
|
||||
"max_tokens": max_tokens,
|
||||
"temperature": 0.2,
|
||||
"stream": False,
|
||||
}
|
||||
|
||||
resp = await client.post(
|
||||
f"{self.openwebui_url}/v1/chat/completions",
|
||||
json=payload,
|
||||
headers=self._headers(),
|
||||
)
|
||||
resp.raise_for_status()
|
||||
data = resp.json()
|
||||
|
||||
content = ""
|
||||
if data.get("choices"):
|
||||
content = data["choices"][0].get("message", {}).get("content", "")
|
||||
|
||||
# Extract course references from response
|
||||
course_refs = self._extract_course_refs(content)
|
||||
sources = self._extract_sources(data)
|
||||
|
||||
return RAGResult(
|
||||
query=question,
|
||||
context=content,
|
||||
sources=sources,
|
||||
course_references=course_refs,
|
||||
confidence=0.8 if content else 0.0,
|
||||
)
|
||||
|
||||
def _extract_course_refs(self, text: str) -> list[str]:
|
||||
"""Extract SANS course references from response text."""
|
||||
refs = set()
# Match course IDs like SEC504, FOR508, ICS410. The group is non-capturing so
# re.findall returns the full course ID rather than just the prefix.
full_pattern = r'\b(?:SEC|FOR|ICS|MGT|AUD|DEV|LEG)\d{3}\b'
full_matches = re.findall(full_pattern, text, re.IGNORECASE)
|
||||
for m in full_matches:
|
||||
course_id = m.upper()
|
||||
if course_id in SANS_COURSES:
|
||||
refs.add(f"{course_id}: {SANS_COURSES[course_id]}")
|
||||
else:
|
||||
refs.add(course_id)
|
||||
return sorted(refs)
|
||||
|
||||
def _extract_sources(self, api_response: dict) -> list[str]:
|
||||
"""Extract source document references from Open WebUI response metadata."""
|
||||
sources = []
|
||||
# Open WebUI may include source metadata in various formats
|
||||
if "sources" in api_response:
|
||||
for src in api_response["sources"]:
|
||||
if isinstance(src, dict):
|
||||
sources.append(src.get("name", src.get("title", str(src))))
|
||||
else:
|
||||
sources.append(str(src))
|
||||
# Check in metadata
|
||||
for choice in api_response.get("choices", []):
|
||||
meta = choice.get("metadata", {})
|
||||
if "sources" in meta:
|
||||
for src in meta["sources"]:
|
||||
if isinstance(src, dict):
|
||||
sources.append(src.get("name", str(src)))
|
||||
else:
|
||||
sources.append(str(src))
|
||||
return sources[:10] # Limit
|
||||
|
||||
def _match_courses(self, query: str) -> list[str]:
|
||||
"""Match query keywords to SANS courses using topic map."""
|
||||
q = query.lower()
|
||||
matched = set()
|
||||
for topic, courses in TOPIC_COURSE_MAP.items():
|
||||
if topic in q:
|
||||
for course_id in courses:
|
||||
if course_id in SANS_COURSES:
|
||||
matched.add(f"{course_id}: {SANS_COURSES[course_id]}")
|
||||
return sorted(matched)[:5]
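# Example: _match_courses("encoded powershell used for lateral movement")
# matches the "powershell" and "lateral movement" topics above and returns
# FOR508, SEC504, and SEC505 with their titles, deduplicated and capped at 5.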
|
||||
|
||||
async def get_course_context(self, course_id: str) -> str:
|
||||
"""Get a brief course description for context injection."""
|
||||
course_id = course_id.upper().split(":")[0].strip()
|
||||
if course_id in SANS_COURSES:
|
||||
return f"{course_id}: {SANS_COURSES[course_id]}"
|
||||
return ""
|
||||
|
||||
async def enrich_prompt(
|
||||
self,
|
||||
query: str,
|
||||
investigation_context: str = "",
|
||||
) -> str:
|
||||
"""Generate SANS-enriched context to inject into agent prompts.
|
||||
|
||||
Returns a context string with relevant SANS references.
|
||||
"""
|
||||
result = await self.query(query, context=investigation_context, max_tokens=512)
|
||||
|
||||
parts = []
|
||||
if result.context:
|
||||
parts.append(f"SANS Reference Context:\n{result.context}")
|
||||
if result.course_references:
|
||||
parts.append(f"Relevant SANS Courses: {', '.join(result.course_references)}")
|
||||
if result.sources:
|
||||
parts.append(f"Sources: {', '.join(result.sources[:5])}")
|
||||
|
||||
return "\n".join(parts) if parts else ""
|
||||
|
||||
async def health_check(self) -> dict:
|
||||
"""Check RAG service availability."""
|
||||
try:
|
||||
client = _get_client()
|
||||
resp = await client.get(
|
||||
f"{self.openwebui_url}/v1/models",
|
||||
headers=self._headers(),
|
||||
timeout=5,
|
||||
)
|
||||
available = resp.status_code == 200
|
||||
self._available = available
|
||||
return {
|
||||
"available": available,
|
||||
"url": self.openwebui_url,
|
||||
"model": self.rag_model,
|
||||
}
|
||||
except Exception as e:
|
||||
self._available = False
|
||||
return {
|
||||
"available": False,
|
||||
"url": self.openwebui_url,
|
||||
"error": str(e),
|
||||
}
|
||||
|
||||
|
||||
# Singleton
|
||||
sans_rag = SANSRAGService()
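# Minimal smoke-test sketch: exercise the service from the command line. If the
# Open WebUI endpoint configured in settings is unreachable, query() still
# returns topic-map course recommendations via the fallback path.
if __name__ == "__main__":  # pragma: no cover
    async def _demo() -> None:
        print("Health:", await sans_rag.health_check())
        result = await sans_rag.query("How do I hunt for WMI persistence?")
        print("Courses:", result.course_references)
        print("Confidence:", result.confidence, "Latency (ms):", result.latency_ms)

    asyncio.run(_demo())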
|
||||
233
backend/app/services/scanner.py
Normal file
233
backend/app/services/scanner.py
Normal file
@@ -0,0 +1,233 @@
|
||||
"""AUP Keyword Scanner — searches dataset rows, hunts, annotations, and
|
||||
messages for keyword matches.
|
||||
|
||||
Scanning is done in Python (not SQL LIKE on JSON columns) for portability
|
||||
across SQLite / PostgreSQL and to provide per-cell match context.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from sqlalchemy import select, func
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db.models import (
|
||||
KeywordTheme,
|
||||
Keyword,
|
||||
DatasetRow,
|
||||
Dataset,
|
||||
Hunt,
|
||||
Annotation,
|
||||
Message,
|
||||
Conversation,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
BATCH_SIZE = 500
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScanHit:
|
||||
theme_name: str
|
||||
theme_color: str
|
||||
keyword: str
|
||||
source_type: str # dataset_row | hunt | annotation | message
|
||||
source_id: str | int
|
||||
field: str
|
||||
matched_value: str
|
||||
row_index: int | None = None
|
||||
dataset_name: str | None = None
|
||||
|
||||
|
||||
@dataclass
|
||||
class ScanResult:
|
||||
total_hits: int = 0
|
||||
hits: list[ScanHit] = field(default_factory=list)
|
||||
themes_scanned: int = 0
|
||||
keywords_scanned: int = 0
|
||||
rows_scanned: int = 0
|
||||
|
||||
|
||||
class KeywordScanner:
|
||||
"""Scans multiple data sources for keyword/regex matches."""
|
||||
|
||||
def __init__(self, db: AsyncSession):
|
||||
self.db = db
|
||||
|
||||
# ── Public API ────────────────────────────────────────────────────
|
||||
|
||||
async def scan(
|
||||
self,
|
||||
dataset_ids: list[str] | None = None,
|
||||
theme_ids: list[str] | None = None,
|
||||
scan_hunts: bool = True,
|
||||
scan_annotations: bool = True,
|
||||
scan_messages: bool = True,
|
||||
) -> dict:
|
||||
"""Run a full AUP scan and return dict matching ScanResponse."""
|
||||
# Load themes + keywords
|
||||
themes = await self._load_themes(theme_ids)
|
||||
if not themes:
|
||||
return ScanResult().__dict__
|
||||
|
||||
# Pre-compile patterns per theme
|
||||
patterns = self._compile_patterns(themes)
|
||||
result = ScanResult(
|
||||
themes_scanned=len(themes),
|
||||
keywords_scanned=sum(len(kws) for kws in patterns.values()),
|
||||
)
|
||||
|
||||
# Scan dataset rows
|
||||
await self._scan_datasets(patterns, result, dataset_ids)
|
||||
|
||||
# Scan hunts
|
||||
if scan_hunts:
|
||||
await self._scan_hunts(patterns, result)
|
||||
|
||||
# Scan annotations
|
||||
if scan_annotations:
|
||||
await self._scan_annotations(patterns, result)
|
||||
|
||||
# Scan messages
|
||||
if scan_messages:
|
||||
await self._scan_messages(patterns, result)
|
||||
|
||||
result.total_hits = len(result.hits)
|
||||
return {
|
||||
"total_hits": result.total_hits,
|
||||
"hits": [h.__dict__ for h in result.hits],
|
||||
"themes_scanned": result.themes_scanned,
|
||||
"keywords_scanned": result.keywords_scanned,
|
||||
"rows_scanned": result.rows_scanned,
|
||||
}
|
||||
|
||||
# ── Internal ──────────────────────────────────────────────────────
|
||||
|
||||
async def _load_themes(self, theme_ids: list[str] | None) -> list[KeywordTheme]:
|
||||
q = select(KeywordTheme).where(KeywordTheme.enabled == True) # noqa: E712
|
||||
if theme_ids:
|
||||
q = q.where(KeywordTheme.id.in_(theme_ids))
|
||||
result = await self.db.execute(q)
|
||||
return list(result.scalars().all())
|
||||
|
||||
def _compile_patterns(
|
||||
self, themes: list[KeywordTheme]
|
||||
) -> dict[tuple[str, str, str], list[tuple[str, re.Pattern]]]:
|
||||
"""Returns {(theme_id, theme_name, theme_color): [(keyword_value, compiled_pattern), ...]}"""
|
||||
patterns: dict[tuple[str, str, str], list[tuple[str, re.Pattern]]] = {}
|
||||
for theme in themes:
|
||||
key = (theme.id, theme.name, theme.color)
|
||||
compiled = []
|
||||
for kw in theme.keywords:
|
||||
try:
|
||||
if kw.is_regex:
|
||||
pat = re.compile(kw.value, re.IGNORECASE)
|
||||
else:
|
||||
pat = re.compile(re.escape(kw.value), re.IGNORECASE)
|
||||
compiled.append((kw.value, pat))
|
||||
except re.error:
|
||||
logger.warning("Invalid regex pattern '%s' in theme '%s', skipping",
|
||||
kw.value, theme.name)
|
||||
patterns[key] = compiled
|
||||
return patterns
|
||||
|
||||
def _match_text(
|
||||
self,
|
||||
text: str,
|
||||
patterns: dict,
|
||||
source_type: str,
|
||||
source_id: str | int,
|
||||
field_name: str,
|
||||
hits: list[ScanHit],
|
||||
row_index: int | None = None,
|
||||
dataset_name: str | None = None,
|
||||
) -> None:
|
||||
"""Check text against all compiled patterns, append hits."""
|
||||
if not text:
|
||||
return
|
||||
for (theme_id, theme_name, theme_color), keyword_patterns in patterns.items():
|
||||
for kw_value, pat in keyword_patterns:
|
||||
if pat.search(text):
|
||||
# Truncate matched_value for display
|
||||
matched_preview = text[:200] + ("…" if len(text) > 200 else "")
|
||||
hits.append(ScanHit(
|
||||
theme_name=theme_name,
|
||||
theme_color=theme_color,
|
||||
keyword=kw_value,
|
||||
source_type=source_type,
|
||||
source_id=source_id,
|
||||
field=field_name,
|
||||
matched_value=matched_preview,
|
||||
row_index=row_index,
|
||||
dataset_name=dataset_name,
|
||||
))
|
||||
|
||||
async def _scan_datasets(
|
||||
self, patterns: dict, result: ScanResult, dataset_ids: list[str] | None
|
||||
) -> None:
|
||||
"""Scan dataset rows in batches."""
|
||||
# Build dataset name lookup
|
||||
ds_q = select(Dataset.id, Dataset.name)
|
||||
if dataset_ids:
|
||||
ds_q = ds_q.where(Dataset.id.in_(dataset_ids))
|
||||
ds_result = await self.db.execute(ds_q)
|
||||
ds_map = {r[0]: r[1] for r in ds_result.fetchall()}
|
||||
|
||||
if not ds_map:
|
||||
return
|
||||
|
||||
# Iterate rows in batches
|
||||
offset = 0
|
||||
row_q_base = select(DatasetRow).where(
|
||||
DatasetRow.dataset_id.in_(list(ds_map.keys()))
|
||||
).order_by(DatasetRow.id)
|
||||
|
||||
while True:
|
||||
rows_result = await self.db.execute(
|
||||
row_q_base.offset(offset).limit(BATCH_SIZE)
|
||||
)
|
||||
rows = rows_result.scalars().all()
|
||||
if not rows:
|
||||
break
|
||||
|
||||
for row in rows:
|
||||
result.rows_scanned += 1
|
||||
data = row.data or {}
|
||||
for col_name, cell_value in data.items():
|
||||
if cell_value is None:
|
||||
continue
|
||||
text = str(cell_value)
|
||||
self._match_text(
|
||||
text, patterns, "dataset_row", row.id,
|
||||
col_name, result.hits,
|
||||
row_index=row.row_index,
|
||||
dataset_name=ds_map.get(row.dataset_id),
|
||||
)
|
||||
|
||||
offset += BATCH_SIZE
|
||||
if len(rows) < BATCH_SIZE:
|
||||
break
|
||||
|
||||
async def _scan_hunts(self, patterns: dict, result: ScanResult) -> None:
|
||||
"""Scan hunt names and descriptions."""
|
||||
hunts_result = await self.db.execute(select(Hunt))
|
||||
for hunt in hunts_result.scalars().all():
|
||||
self._match_text(hunt.name, patterns, "hunt", hunt.id, "name", result.hits)
|
||||
if hunt.description:
|
||||
self._match_text(hunt.description, patterns, "hunt", hunt.id, "description", result.hits)
|
||||
|
||||
async def _scan_annotations(self, patterns: dict, result: ScanResult) -> None:
|
||||
"""Scan annotation text."""
|
||||
ann_result = await self.db.execute(select(Annotation))
|
||||
for ann in ann_result.scalars().all():
|
||||
self._match_text(ann.text, patterns, "annotation", ann.id, "text", result.hits)
|
||||
|
||||
async def _scan_messages(self, patterns: dict, result: ScanResult) -> None:
|
||||
"""Scan conversation messages (user messages only)."""
|
||||
msg_result = await self.db.execute(
|
||||
select(Message).where(Message.role == "user")
|
||||
)
|
||||
for msg in msg_result.scalars().all():
|
||||
self._match_text(msg.content, patterns, "message", msg.id, "content", result.hits)
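# Minimal usage sketch (not wired into the API): run an AUP scan against the
# configured database. async_session is the factory from app.db.engine, used
# the same way in triage.py; the returned dict matches the ScanResponse shape.
if __name__ == "__main__":  # pragma: no cover
    import asyncio

    from app.db.engine import async_session

    async def _demo() -> None:
        async with async_session() as db:
            summary = await KeywordScanner(db).scan(scan_messages=False)
            print(f"{summary['total_hits']} hits across {summary['rows_scanned']} rows")

    asyncio.run(_demo())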
|
||||
170
backend/app/services/triage.py
Normal file
170
backend/app/services/triage.py
Normal file
@@ -0,0 +1,170 @@
|
||||
"""Auto-triage service - fast LLM analysis of dataset batches via Roadrunner."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
|
||||
import httpx
|
||||
from sqlalchemy import func, select
|
||||
|
||||
from app.config import settings
|
||||
from app.db.engine import async_session
|
||||
from app.db.models import Dataset, DatasetRow, TriageResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
DEFAULT_FAST_MODEL = "qwen2.5-coder:7b-instruct-q4_K_M"
|
||||
ROADRUNNER_URL = f"{settings.roadrunner_url}/api/generate"
|
||||
|
||||
ARTIFACT_FOCUS = {
|
||||
"Windows.System.Pslist": "Look for: suspicious parent-child, LOLBins, unsigned, injection indicators, abnormal paths.",
|
||||
"Windows.Network.Netstat": "Look for: C2 beaconing, unusual ports, connections to rare IPs, non-browser high-port listeners.",
|
||||
"Windows.System.Services": "Look for: services in temp dirs, misspelled names, unsigned ServiceDll, unusual start modes.",
|
||||
"Windows.Forensics.Prefetch": "Look for: recon tools, lateral movement tools, rarely-run executables with high run counts.",
|
||||
"Windows.EventLogs.EvtxHunter": "Look for: logon type 10/3 anomalies, service installs, PowerShell script blocks, clearing.",
|
||||
"Windows.Sys.Autoruns": "Look for: recently added entries, entries in temp/user dirs, encoded commands, suspicious DLLs.",
|
||||
"Windows.Registry.Finder": "Look for: run keys, image file execution options, hidden services, encoded payloads.",
|
||||
"Windows.Search.FileFinder": "Look for: files in unusual locations, recently modified system files, known tool names.",
|
||||
}
|
||||
|
||||
|
||||
def _parse_llm_response(text: str) -> dict:
|
||||
text = text.strip()
|
||||
# Strip a surrounding markdown code fence, if the model wrapped its JSON reply
fence = re.search(r"```(?:json)?\s*\n?(.*?)\n?\s*```", text, re.DOTALL)
|
||||
if fence:
|
||||
text = fence.group(1).strip()
|
||||
try:
|
||||
return json.loads(text)
|
||||
except json.JSONDecodeError:
|
||||
brace = text.find("{")
|
||||
bracket = text.rfind("}")
|
||||
if brace != -1 and bracket != -1 and bracket > brace:
|
||||
try:
|
||||
return json.loads(text[brace : bracket + 1])
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
return {"raw_response": text[:3000]}
|
||||
|
||||
|
||||
async def triage_dataset(dataset_id: str) -> None:
|
||||
logger.info("Starting triage for dataset %s", dataset_id)
|
||||
|
||||
async with async_session() as db:
|
||||
ds_result = await db.execute(
|
||||
select(Dataset).where(Dataset.id == dataset_id)
|
||||
)
|
||||
dataset = ds_result.scalar_one_or_none()
|
||||
if not dataset:
|
||||
logger.error("Dataset %s not found", dataset_id)
|
||||
return
|
||||
|
||||
artifact_type = getattr(dataset, "artifact_type", None) or "Unknown"
|
||||
focus = ARTIFACT_FOCUS.get(artifact_type, "Analyze for any suspicious indicators.")
|
||||
|
||||
count_result = await db.execute(
|
||||
select(func.count()).where(DatasetRow.dataset_id == dataset_id)
|
||||
)
|
||||
total_rows = count_result.scalar() or 0
|
||||
|
||||
batch_size = settings.TRIAGE_BATCH_SIZE
|
||||
suspicious_count = 0
|
||||
offset = 0
|
||||
|
||||
while offset < total_rows:
|
||||
if suspicious_count >= settings.TRIAGE_MAX_SUSPICIOUS_ROWS:
|
||||
logger.info("Reached suspicious row cap for dataset %s", dataset_id)
|
||||
break
|
||||
|
||||
rows_result = await db.execute(
|
||||
select(DatasetRow)
|
||||
.where(DatasetRow.dataset_id == dataset_id)
|
||||
.order_by(DatasetRow.row_index)
|
||||
.offset(offset)
|
||||
.limit(batch_size)
|
||||
)
|
||||
rows = rows_result.scalars().all()
|
||||
if not rows:
|
||||
break
|
||||
|
||||
batch_data = []
|
||||
for r in rows:
|
||||
data = r.normalized_data or r.data
|
||||
compact = {k: str(v)[:200] for k, v in data.items() if v}
|
||||
batch_data.append(compact)
|
||||
|
||||
system_prompt = f"""You are a cybersecurity triage analyst. Analyze this batch of {artifact_type} forensic data.
|
||||
{focus}
|
||||
|
||||
Return JSON with:
|
||||
- risk_score: 0.0 (benign) to 10.0 (critical threat)
|
||||
- verdict: "clean", "suspicious", "malicious", or "inconclusive"
|
||||
- findings: list of key observations
|
||||
- suspicious_indicators: list of specific IOCs or anomalies
|
||||
- mitre_techniques: list of MITRE ATT&CK IDs if applicable
|
||||
|
||||
Be precise. Only flag genuinely suspicious items. Respond with valid JSON only."""
|
||||
|
||||
prompt = f"Rows {offset+1}-{offset+len(rows)} of {total_rows}:\n{json.dumps(batch_data, default=str)[:6000]}"
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=120.0) as client:
|
||||
resp = await client.post(
|
||||
ROADRUNNER_URL,
|
||||
json={
|
||||
"model": DEFAULT_FAST_MODEL,
|
||||
"prompt": prompt,
|
||||
"system": system_prompt,
|
||||
"stream": False,
|
||||
"options": {"temperature": 0.2, "num_predict": 2048},
|
||||
},
|
||||
)
|
||||
resp.raise_for_status()
|
||||
result = resp.json()
|
||||
llm_text = result.get("response", "")
|
||||
|
||||
parsed = _parse_llm_response(llm_text)
|
||||
risk = float(parsed.get("risk_score", 0.0))
|
||||
|
||||
triage = TriageResult(
|
||||
dataset_id=dataset_id,
|
||||
row_start=offset,
|
||||
row_end=offset + len(rows) - 1,
|
||||
risk_score=risk,
|
||||
verdict=parsed.get("verdict", "inconclusive"),
|
||||
findings=parsed.get("findings", []),
|
||||
suspicious_indicators=parsed.get("suspicious_indicators", []),
|
||||
mitre_techniques=parsed.get("mitre_techniques", []),
|
||||
model_used=DEFAULT_FAST_MODEL,
|
||||
node_used="roadrunner",
|
||||
)
|
||||
db.add(triage)
|
||||
await db.commit()
|
||||
|
||||
if risk >= settings.TRIAGE_ESCALATION_THRESHOLD:
|
||||
suspicious_count += len(rows)
|
||||
|
||||
logger.debug(
|
||||
"Triage batch %d-%d: risk=%.1f verdict=%s",
|
||||
offset, offset + len(rows) - 1, risk, triage.verdict,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error("Triage batch %d failed: %s", offset, e)
|
||||
triage = TriageResult(
|
||||
dataset_id=dataset_id,
|
||||
row_start=offset,
|
||||
row_end=offset + len(rows) - 1,
|
||||
risk_score=0.0,
|
||||
verdict="error",
|
||||
findings=[f"Error: {e}"],
|
||||
model_used=DEFAULT_FAST_MODEL,
|
||||
node_used="roadrunner",
|
||||
)
|
||||
db.add(triage)
|
||||
await db.commit()
|
||||
|
||||
offset += batch_size
|
||||
|
||||
logger.info("Triage complete for dataset %s", dataset_id)
|
||||
12
backend/pyproject.toml
Normal file
12
backend/pyproject.toml
Normal file
@@ -0,0 +1,12 @@
|
||||
[tool.pytest.ini_options]
|
||||
testpaths = ["tests"]
|
||||
asyncio_mode = "auto"
|
||||
filterwarnings = ["ignore::DeprecationWarning"]
|
||||
addopts = "-v --tb=short"
|
||||
|
||||
[tool.coverage.run]
|
||||
source = ["app"]
|
||||
omit = ["app/agent/*"]
|
||||
|
||||
[tool.coverage.report]
|
||||
show_missing = true
|
||||
@@ -1,9 +1,29 @@
|
||||
flask==3.0.0
|
||||
flask-cors==4.0.0
|
||||
flask-sqlalchemy==3.1.1
|
||||
flask-jwt-extended==4.6.0
|
||||
psycopg2-binary==2.9.9
|
||||
python-dotenv==1.0.0
|
||||
requests==2.31.0
|
||||
werkzeug==3.0.1
|
||||
bcrypt==4.1.2
|
||||
# ── Core ──────────────────────────────────────
|
||||
fastapi>=0.104.1
|
||||
uvicorn[standard]>=0.24.0
|
||||
pydantic>=2.5.0
|
||||
pydantic-settings>=2.1.0
|
||||
|
||||
# ── Database ──────────────────────────────────
|
||||
sqlalchemy>=2.0.23
|
||||
alembic>=1.13.0
|
||||
aiosqlite>=0.19.0
|
||||
# asyncpg>=0.29.0 # uncomment for PostgreSQL in production
|
||||
|
||||
# ── HTTP / LLM ───────────────────────────────
|
||||
httpx>=0.25.1
|
||||
|
||||
# ── CSV / File handling ──────────────────────
|
||||
chardet>=5.2.0
|
||||
python-multipart>=0.0.6
|
||||
|
||||
# ── Auth / Security ──────────────────────────
|
||||
python-jose[cryptography]>=3.3.0
|
||||
passlib[bcrypt]>=1.7.4
|
||||
bcrypt>=4.0.0
|
||||
|
||||
# ── Development / Testing ────────────────────
|
||||
pytest>=7.4.3
|
||||
pytest-asyncio>=0.21.1
|
||||
coverage>=7.3.0
|
||||
|
||||
|
||||
17
backend/run.py
Normal file
17
backend/run.py
Normal file
@@ -0,0 +1,17 @@
|
||||
"""Entry point for backend server."""
|
||||
|
||||
import logging
|
||||
import uvicorn
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
|
||||
if __name__ == "__main__":
|
||||
uvicorn.run(
|
||||
"app.main:app",
|
||||
host="0.0.0.0",
|
||||
port=8000,
|
||||
reload=False,
|
||||
)
|
||||
8
backend/scan_cols.py
Normal file
8
backend/scan_cols.py
Normal file
@@ -0,0 +1,8 @@
|
||||
import json, urllib.request
|
||||
url = "http://localhost:8000/api/datasets?skip=0&limit=20&hunt_id=fd8ba3fb45de4d65bea072f73d80544d"
|
||||
data = json.loads(urllib.request.urlopen(url).read())
|
||||
for d in data["datasets"]:
|
||||
ioc = list((d["ioc_columns"] or {}).items())
|
||||
norm = d.get("normalized_columns") or {}
|
||||
hc = {k: v for k, v in norm.items() if v in ("hostname", "fqdn", "username", "src_ip", "dst_ip", "ip_address", "os")}
|
||||
print(d["name"], "|", d["row_count"], "|", ioc, "|", hc)
|
||||
23
backend/scan_rows.py
Normal file
23
backend/scan_rows.py
Normal file
@@ -0,0 +1,23 @@
|
||||
import json, urllib.request
|
||||
|
||||
def get(path):
|
||||
return json.loads(urllib.request.urlopen("http://localhost:8000" + path).read())
|
||||
|
||||
# Check ip_to_hostname_mapping
|
||||
ds_list = get("/api/datasets?skip=0&limit=20&hunt_id=fd8ba3fb45de4d65bea072f73d80544d")
|
||||
for d in ds_list["datasets"]:
|
||||
if d["name"] == "ip_to_hostname_mapping":
|
||||
rows = get(f"/api/datasets/{d['id']}/rows?offset=0&limit=5")
|
||||
print("=== ip_to_hostname_mapping ===")
|
||||
for r in rows["rows"]:
|
||||
print(r)
|
||||
if d["name"] == "Netstat":
|
||||
rows = get(f"/api/datasets/{d['id']}/rows?offset=0&limit=3")
|
||||
print("=== Netstat ===")
|
||||
for r in rows["rows"]:
|
||||
print(r)
|
||||
if d["name"] == "netstat_enrich2":
|
||||
rows = get(f"/api/datasets/{d['id']}/rows?offset=0&limit=3")
|
||||
print("=== netstat_enrich2 ===")
|
||||
for r in rows["rows"]:
|
||||
print(r)
|
||||
1
backend/tests/__init__.py
Normal file
1
backend/tests/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
# Tests package
|
||||
108
backend/tests/conftest.py
Normal file
108
backend/tests/conftest.py
Normal file
@@ -0,0 +1,108 @@
|
||||
"""Shared pytest fixtures for ThreatHunt tests.
|
||||
|
||||
Provides:
|
||||
- Async test database (in-memory SQLite)
|
||||
- Test client (httpx AsyncClient on the FastAPI app)
|
||||
- Factory functions for creating test hunts, datasets, etc.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
from typing import AsyncGenerator
|
||||
|
||||
import pytest
|
||||
import pytest_asyncio
|
||||
from httpx import ASGITransport, AsyncClient
|
||||
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
|
||||
# Force test database
|
||||
os.environ["TH_DATABASE_URL"] = "sqlite+aiosqlite:///:memory:"
|
||||
os.environ["TH_JWT_SECRET"] = "test-secret-key-for-tests"
|
||||
|
||||
from app.db.engine import Base, get_db
|
||||
from app.main import app
|
||||
|
||||
|
||||
# ── Database fixtures ─────────────────────────────────────────────────
|
||||
|
||||
@pytest_asyncio.fixture(scope="session")
|
||||
def event_loop():
|
||||
"""Create an event loop for the test session."""
|
||||
loop = asyncio.new_event_loop()
|
||||
yield loop
|
||||
loop.close()
|
||||
|
||||
|
||||
@pytest_asyncio.fixture(scope="session")
|
||||
async def test_engine():
|
||||
"""Create test database engine."""
|
||||
engine = create_async_engine(
|
||||
"sqlite+aiosqlite:///:memory:",
|
||||
echo=False,
|
||||
)
|
||||
async with engine.begin() as conn:
|
||||
await conn.run_sync(Base.metadata.create_all)
|
||||
yield engine
|
||||
await engine.dispose()
|
||||
|
||||
|
||||
@pytest_asyncio.fixture
|
||||
async def db_session(test_engine) -> AsyncGenerator[AsyncSession, None]:
|
||||
"""Create a fresh database session for each test."""
|
||||
async_session = sessionmaker(
|
||||
test_engine, class_=AsyncSession, expire_on_commit=False
|
||||
)
|
||||
async with async_session() as session:
|
||||
yield session
|
||||
await session.rollback()
|
||||
|
||||
|
||||
@pytest_asyncio.fixture
|
||||
async def client(db_session) -> AsyncGenerator[AsyncClient, None]:
|
||||
"""Create an async test client with overridden DB dependency."""
|
||||
|
||||
async def _override_get_db():
|
||||
yield db_session
|
||||
|
||||
app.dependency_overrides[get_db] = _override_get_db
|
||||
|
||||
transport = ASGITransport(app=app)
|
||||
async with AsyncClient(transport=transport, base_url="http://test") as ac:
|
||||
yield ac
|
||||
|
||||
app.dependency_overrides.clear()
|
||||
|
||||
|
||||
# ── Factory helpers ───────────────────────────────────────────────────
|
||||
|
||||
def make_csv_bytes(
|
||||
columns: list[str],
|
||||
rows: list[list[str]],
|
||||
delimiter: str = ",",
|
||||
) -> bytes:
|
||||
"""Create CSV content as bytes for upload tests."""
|
||||
lines = [delimiter.join(columns)]
|
||||
for row in rows:
|
||||
lines.append(delimiter.join(str(v) for v in row))
|
||||
return "\n".join(lines).encode("utf-8")
|
||||
|
||||
|
||||
SAMPLE_CSV = make_csv_bytes(
|
||||
["timestamp", "hostname", "src_ip", "dst_ip", "process_name", "command_line"],
|
||||
[
|
||||
["2025-01-15T10:30:00Z", "DESKTOP-ABC", "192.168.1.100", "10.0.0.50", "cmd.exe", "cmd /c whoami"],
|
||||
["2025-01-15T10:31:00Z", "DESKTOP-ABC", "192.168.1.100", "10.0.0.51", "powershell.exe", "powershell -enc SGVsbG8="],
|
||||
["2025-01-15T10:32:00Z", "DESKTOP-XYZ", "192.168.1.101", "8.8.8.8", "chrome.exe", "chrome.exe --no-sandbox"],
|
||||
["2025-01-15T10:33:00Z", "DESKTOP-ABC", "192.168.1.100", "203.0.113.5", "svchost.exe", "svchost.exe -k netsvcs"],
|
||||
["2025-01-15T10:34:00Z", "SERVER-DC01", "10.0.0.1", "10.0.0.50", "lsass.exe", "lsass.exe"],
|
||||
],
|
||||
)
|
||||
|
||||
SAMPLE_HASH_CSV = make_csv_bytes(
|
||||
["filename", "md5", "sha256", "size"],
|
||||
[
|
||||
["malware.exe", "d41d8cd98f00b204e9800998ecf8427e", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", "1024"],
|
||||
["benign.dll", "098f6bcd4621d373cade4e832627b4f6", "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8", "2048"],
|
||||
],
|
||||
)
|
||||
117
backend/tests/test_agents.py
Normal file
117
backend/tests/test_agents.py
Normal file
@@ -0,0 +1,117 @@
|
||||
"""Tests for model registry and task router."""
|
||||
|
||||
import pytest
|
||||
from app.agents.registry import (
|
||||
ModelRegistry, ModelEntry, Capability, Tier, Node,
|
||||
registry, ROADRUNNER_MODELS, WILE_MODELS,
|
||||
)
|
||||
from app.agents.router import TaskRouter, TaskType, task_router
|
||||
|
||||
|
||||
class TestModelRegistry:
|
||||
"""Tests for the model registry."""
|
||||
|
||||
def test_registry_has_models(self):
|
||||
assert len(registry.models) > 0
|
||||
assert len(ROADRUNNER_MODELS) > 0
|
||||
assert len(WILE_MODELS) > 0
|
||||
|
||||
def test_find_by_capability(self):
|
||||
chat_models = registry.find(capability=Capability.CHAT)
|
||||
assert len(chat_models) > 0
|
||||
for m in chat_models:
|
||||
assert Capability.CHAT in m.capabilities
|
||||
|
||||
def test_find_code_models(self):
|
||||
code_models = registry.find(capability=Capability.CODE)
|
||||
assert len(code_models) > 0
|
||||
|
||||
def test_find_vision_models(self):
|
||||
vision_models = registry.find(capability=Capability.VISION)
|
||||
assert len(vision_models) > 0
|
||||
|
||||
def test_find_embedding_models(self):
|
||||
embed_models = registry.find(capability=Capability.EMBEDDING)
|
||||
assert len(embed_models) > 0
|
||||
|
||||
def test_find_by_node(self):
|
||||
wile_models = registry.find(node=Node.WILE)
|
||||
rr_models = registry.find(node=Node.ROADRUNNER)
|
||||
assert len(wile_models) > 0
|
||||
assert len(rr_models) > 0
|
||||
|
||||
def test_find_heavy_models(self):
|
||||
heavy = registry.find(tier=Tier.HEAVY)
|
||||
assert len(heavy) > 0
|
||||
for m in heavy:
|
||||
assert m.tier == Tier.HEAVY
|
||||
|
||||
def test_get_best(self):
|
||||
best = registry.get_best(Capability.CHAT, prefer_tier=Tier.FAST)
|
||||
assert best is not None
|
||||
assert Capability.CHAT in best.capabilities
|
||||
|
||||
def test_get_best_vision_on_roadrunner(self):
|
||||
best = registry.get_best(Capability.VISION, prefer_node=Node.ROADRUNNER)
|
||||
assert best is not None
|
||||
assert Capability.VISION in best.capabilities
|
||||
|
||||
def test_to_dict(self):
|
||||
result = registry.to_dict()
|
||||
assert isinstance(result, list)
|
||||
assert len(result) > 0
|
||||
assert "name" in result[0]
|
||||
assert "capabilities" in result[0]
|
||||
|
||||
|
||||
class TestTaskRouter:
|
||||
"""Tests for the task router."""
|
||||
|
||||
def test_route_quick_chat(self):
|
||||
decision = task_router.route(TaskType.QUICK_CHAT)
|
||||
assert decision.model
|
||||
assert decision.node
|
||||
|
||||
def test_route_deep_analysis(self):
|
||||
decision = task_router.route(TaskType.DEEP_ANALYSIS)
|
||||
assert decision.model
|
||||
# Deep should route to heavy model
|
||||
assert decision.task_type == TaskType.DEEP_ANALYSIS
|
||||
|
||||
def test_route_code_analysis(self):
|
||||
decision = task_router.route(TaskType.CODE_ANALYSIS)
|
||||
assert decision.model
|
||||
assert "coder" in decision.model.lower() or "code" in decision.model.lower()
|
||||
|
||||
def test_route_vision(self):
|
||||
decision = task_router.route(TaskType.VISION)
|
||||
assert decision.model
|
||||
assert decision.node == Node.ROADRUNNER
|
||||
|
||||
def test_route_with_model_override(self):
|
||||
decision = task_router.route(TaskType.QUICK_CHAT, model_override="llama3.1:latest")
|
||||
assert decision.model == "llama3.1:latest"
|
||||
|
||||
def test_route_unknown_model_to_cluster(self):
|
||||
decision = task_router.route(TaskType.QUICK_CHAT, model_override="nonexistent-model:99b")
|
||||
assert decision.node == Node.CLUSTER
|
||||
assert decision.provider_type == "openwebui"
|
||||
|
||||
def test_classify_code_task(self):
|
||||
assert task_router.classify_task("deobfuscate this powershell script") == TaskType.CODE_ANALYSIS
|
||||
assert task_router.classify_task("decode this base64 payload") == TaskType.CODE_ANALYSIS
|
||||
|
||||
def test_classify_deep_task(self):
|
||||
assert task_router.classify_task("detailed forensic analysis of this process tree") == TaskType.DEEP_ANALYSIS
|
||||
|
||||
def test_classify_vision_task(self):
|
||||
assert task_router.classify_task("analyze this screenshot", has_image=True) == TaskType.VISION
|
||||
|
||||
def test_classify_quick_task(self):
|
||||
assert task_router.classify_task("what does this process do?") == TaskType.QUICK_CHAT
|
||||
|
||||
def test_debate_model_overrides(self):
|
||||
for task_type in [TaskType.DEBATE_PLANNER, TaskType.DEBATE_CRITIC, TaskType.DEBATE_PRAGMATIST, TaskType.DEBATE_JUDGE]:
|
||||
decision = task_router.route(task_type)
|
||||
assert decision.model
|
||||
assert decision.task_type == task_type
|
||||
189
backend/tests/test_api.py
Normal file
189
backend/tests/test_api.py
Normal file
@@ -0,0 +1,189 @@
|
||||
"""Tests for API endpoints — datasets, hunts, annotations."""
|
||||
|
||||
import io
|
||||
import pytest
|
||||
from tests.conftest import SAMPLE_CSV
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
class TestHealthEndpoints:
|
||||
"""Test basic health endpoints."""
|
||||
|
||||
async def test_root(self, client):
|
||||
resp = await client.get("/")
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["service"] == "ThreatHunt API"
|
||||
assert data["status"] == "running"
|
||||
|
||||
async def test_openapi_docs(self, client):
|
||||
resp = await client.get("/openapi.json")
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert "/api/agent/assist" in data["paths"]
|
||||
assert "/api/datasets/upload" in data["paths"]
|
||||
assert "/api/hunts" in data["paths"]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
class TestHuntEndpoints:
|
||||
"""Test hunt CRUD operations."""
|
||||
|
||||
async def test_create_hunt(self, client):
|
||||
resp = await client.post("/api/hunts", json={
|
||||
"name": "Test Hunt",
|
||||
"description": "Testing hunt creation",
|
||||
})
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["name"] == "Test Hunt"
|
||||
assert data["status"] == "active"
|
||||
assert data["id"]
|
||||
|
||||
async def test_list_hunts(self, client):
|
||||
# Create a hunt first
|
||||
await client.post("/api/hunts", json={"name": "Hunt 1"})
|
||||
await client.post("/api/hunts", json={"name": "Hunt 2"})
|
||||
|
||||
resp = await client.get("/api/hunts")
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["total"] >= 2
|
||||
|
||||
async def test_get_hunt(self, client):
|
||||
# Create
|
||||
create_resp = await client.post("/api/hunts", json={"name": "Specific Hunt"})
|
||||
hunt_id = create_resp.json()["id"]
|
||||
|
||||
# Get
|
||||
resp = await client.get(f"/api/hunts/{hunt_id}")
|
||||
assert resp.status_code == 200
|
||||
assert resp.json()["name"] == "Specific Hunt"
|
||||
|
||||
async def test_update_hunt(self, client):
|
||||
create_resp = await client.post("/api/hunts", json={"name": "Original"})
|
||||
hunt_id = create_resp.json()["id"]
|
||||
|
||||
resp = await client.put(f"/api/hunts/{hunt_id}", json={
|
||||
"name": "Updated",
|
||||
"status": "closed",
|
||||
})
|
||||
assert resp.status_code == 200
|
||||
assert resp.json()["name"] == "Updated"
|
||||
assert resp.json()["status"] == "closed"
|
||||
|
||||
async def test_get_nonexistent_hunt(self, client):
|
||||
        resp = await client.get("/api/hunts/nonexistent-id")
        assert resp.status_code == 404


@pytest.mark.asyncio
class TestDatasetEndpoints:
    """Test dataset upload and retrieval."""

    async def test_upload_csv(self, client):
        files = {"file": ("test.csv", io.BytesIO(SAMPLE_CSV), "text/csv")}
        resp = await client.post(
            "/api/datasets/upload",
            files=files,
            params={"name": "Test Dataset"},
        )
        assert resp.status_code == 200
        data = resp.json()
        assert data["name"] == "Test Dataset"
        assert data["row_count"] == 5
        assert "timestamp" in data["columns"]

    async def test_upload_invalid_extension(self, client):
        files = {"file": ("bad.exe", io.BytesIO(b"not csv"), "application/octet-stream")}
        resp = await client.post("/api/datasets/upload", files=files)
        assert resp.status_code == 400

    async def test_upload_empty_file(self, client):
        files = {"file": ("empty.csv", io.BytesIO(b""), "text/csv")}
        resp = await client.post("/api/datasets/upload", files=files)
        assert resp.status_code == 400

    async def test_list_datasets(self, client):
        # Upload first
        files = {"file": ("test.csv", io.BytesIO(SAMPLE_CSV), "text/csv")}
        await client.post("/api/datasets/upload", files=files, params={"name": "DS1"})

        resp = await client.get("/api/datasets")
        assert resp.status_code == 200
        data = resp.json()
        assert data["total"] >= 1

    async def test_get_dataset_rows(self, client):
        files = {"file": ("test.csv", io.BytesIO(SAMPLE_CSV), "text/csv")}
        upload_resp = await client.post("/api/datasets/upload", files=files, params={"name": "RowTest"})
        ds_id = upload_resp.json()["id"]

        resp = await client.get(f"/api/datasets/{ds_id}/rows")
        assert resp.status_code == 200
        data = resp.json()
        assert data["total"] == 5
        assert len(data["rows"]) == 5


@pytest.mark.asyncio
class TestAnnotationEndpoints:
    """Test annotation CRUD."""

    async def test_create_annotation(self, client):
        resp = await client.post("/api/annotations", json={
            "text": "Suspicious process detected",
            "severity": "high",
            "tag": "suspicious",
        })
        assert resp.status_code == 200
        data = resp.json()
        assert data["text"] == "Suspicious process detected"
        assert data["severity"] == "high"

    async def test_list_annotations(self, client):
        await client.post("/api/annotations", json={"text": "Ann 1", "severity": "info"})
        await client.post("/api/annotations", json={"text": "Ann 2", "severity": "critical"})

        resp = await client.get("/api/annotations")
        assert resp.status_code == 200
        assert resp.json()["total"] >= 2

    async def test_filter_annotations_by_severity(self, client):
        await client.post("/api/annotations", json={"text": "Critical finding", "severity": "critical"})

        resp = await client.get("/api/annotations", params={"severity": "critical"})
        assert resp.status_code == 200
        for ann in resp.json()["annotations"]:
            assert ann["severity"] == "critical"


@pytest.mark.asyncio
class TestHypothesisEndpoints:
    """Test hypothesis CRUD."""

    async def test_create_hypothesis(self, client):
        resp = await client.post("/api/hypotheses", json={
            "title": "Living off the Land",
            "description": "Attacker using LOLBins for execution",
            "mitre_technique": "T1059",
            "status": "active",
        })
        assert resp.status_code == 200
        data = resp.json()
        assert data["title"] == "Living off the Land"
        assert data["mitre_technique"] == "T1059"

    async def test_update_hypothesis_status(self, client):
        create_resp = await client.post("/api/hypotheses", json={
            "title": "Test Hyp",
            "status": "draft",
        })
        hyp_id = create_resp.json()["id"]

        resp = await client.put(f"/api/hypotheses/{hyp_id}", json={
            "status": "confirmed",
            "evidence_notes": "Confirmed via process tree analysis",
        })
        assert resp.status_code == 200
        assert resp.json()["status"] == "confirmed"
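These endpoint tests all rely on a `client` fixture and the `SAMPLE_CSV` bytes defined in `tests/conftest.py`, which is not shown in this diff. A minimal sketch of how such an in-process fixture could be wired with httpx and pytest-asyncio follows; the fixture name matches the tests, but the `app` import path and setup details are assumptions, not the project's actual conftest:

    # Hypothetical conftest.py sketch -- illustrates only the assumed `client` fixture.
    import pytest_asyncio
    from httpx import ASGITransport, AsyncClient

    from app.main import app  # assumed FastAPI application import path


    @pytest_asyncio.fixture
    async def client():
        # Drive the FastAPI app in-process; no real network socket is opened.
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as c:
            yield c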
104
backend/tests/test_csv_parser.py
Normal file
104
backend/tests/test_csv_parser.py
Normal file
@@ -0,0 +1,104 @@
"""Tests for CSV parser and normalizer services."""

import pytest

from app.services.csv_parser import parse_csv_bytes, detect_encoding, detect_delimiter, infer_column_types
from app.services.normalizer import normalize_columns, normalize_rows, detect_ioc_columns, detect_time_range
from tests.conftest import SAMPLE_CSV, SAMPLE_HASH_CSV, make_csv_bytes


class TestCSVParser:
    """Tests for CSV parsing."""

    def test_parse_csv_basic(self):
        rows, meta = parse_csv_bytes(SAMPLE_CSV)
        assert len(rows) == 5
        assert "timestamp" in meta["columns"]
        assert "hostname" in meta["columns"]
        assert meta["encoding"] is not None
        assert meta["delimiter"] == ","

    def test_parse_csv_columns(self):
        rows, meta = parse_csv_bytes(SAMPLE_CSV)
        assert meta["columns"] == ["timestamp", "hostname", "src_ip", "dst_ip", "process_name", "command_line"]

    def test_parse_csv_row_data(self):
        rows, meta = parse_csv_bytes(SAMPLE_CSV)
        assert rows[0]["hostname"] == "DESKTOP-ABC"
        assert rows[0]["src_ip"] == "192.168.1.100"
        assert rows[2]["process_name"] == "chrome.exe"

    def test_parse_csv_hash_file(self):
        rows, meta = parse_csv_bytes(SAMPLE_HASH_CSV)
        assert len(rows) == 2
        assert "md5" in meta["columns"]
        assert "sha256" in meta["columns"]

    def test_parse_tsv(self):
        tsv_data = make_csv_bytes(
            ["host", "ip", "port"],
            [["server1", "10.0.0.1", "443"], ["server2", "10.0.0.2", "80"]],
            delimiter="\t",
        )
        rows, meta = parse_csv_bytes(tsv_data)
        assert len(rows) == 2

    def test_parse_empty_file(self):
        with pytest.raises(Exception):
            parse_csv_bytes(b"")

    def test_detect_encoding_utf8(self):
        enc = detect_encoding(SAMPLE_CSV)
        assert enc is not None
        assert "ascii" in enc.lower() or "utf" in enc.lower()

    def test_infer_column_types(self):
        types = infer_column_types(
            ["192.168.1.1", "10.0.0.1", "8.8.8.8"],
            "src_ip",
        )
        assert types == "ip"

    def test_infer_column_types_hash(self):
        types = infer_column_types(
            ["d41d8cd98f00b204e9800998ecf8427e"],
            "hash",
        )
        assert types == "hash_md5"


class TestNormalizer:
    """Tests for column normalization."""

    def test_normalize_columns(self):
        mapping = normalize_columns(["SourceAddr", "DestAddr", "ProcessName"])
        assert "SourceAddr" in mapping
        # Should map to canonical names
        assert mapping.get("SourceAddr") in ("src_ip", "source_address", None) or isinstance(mapping.get("SourceAddr"), str)

    def test_normalize_known_columns(self):
        mapping = normalize_columns(["timestamp", "hostname", "src_ip"])
        assert mapping.get("timestamp") == "timestamp"
        assert mapping.get("hostname") == "hostname"
        assert mapping.get("src_ip") == "src_ip"

    def test_detect_ioc_columns(self):
        rows, meta = parse_csv_bytes(SAMPLE_CSV)
        column_mapping = normalize_columns(meta["columns"])
        iocs = detect_ioc_columns(meta["columns"], meta["column_types"], column_mapping)
        # Should detect IP columns
        assert isinstance(iocs, dict)

    def test_detect_time_range(self):
        rows, meta = parse_csv_bytes(SAMPLE_CSV)
        column_mapping = normalize_columns(meta["columns"])
        start, end = detect_time_range(rows, column_mapping)
        # Should detect time range from timestamp column
        if start:
            assert "2025" in start

    def test_normalize_rows(self):
        rows = [{"SourceAddr": "10.0.0.1", "ProcessName": "cmd.exe"}]
        mapping = {"SourceAddr": "src_ip", "ProcessName": "process_name"}
        normalized = normalize_rows(rows, mapping)
        assert len(normalized) == 1
        assert normalized[0].get("src_ip") == "10.0.0.1"
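The type-inference tests above expect string labels such as "ip" and "hash_md5" from `infer_column_types`, whose implementation is not part of this diff. The following is only a rough sketch of one way such a heuristic could satisfy those assertions, using sample values plus the column name; the regexes, sample cap, and fallback label are assumptions, not the project's code:

    # Hypothetical sketch of a column-type heuristic, not the project's implementation.
    import re

    IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
    MD5_RE = re.compile(r"^[0-9a-fA-F]{32}$")
    SHA256_RE = re.compile(r"^[0-9a-fA-F]{64}$")


    def infer_column_types_sketch(values, column_name):
        sample = [v for v in values if v][:100]  # skip blanks, cap the sample size
        if sample and all(IP_RE.match(v) for v in sample):
            return "ip"
        if sample and all(MD5_RE.match(v) for v in sample):
            return "hash_md5"
        if sample and all(SHA256_RE.match(v) for v in sample):
            return "hash_sha256"
        if "time" in column_name.lower() or "date" in column_name.lower():
            return "datetime"
        return "string"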
199
backend/tests/test_keywords.py
Normal file
199
backend/tests/test_keywords.py
Normal file
@@ -0,0 +1,199 @@
"""Tests for AUP keyword themes, keyword CRUD, and scanner."""

import pytest
import pytest_asyncio
from httpx import AsyncClient


# ── Theme CRUD ────────────────────────────────────────────────────────


@pytest.mark.asyncio
async def test_list_themes_empty(client: AsyncClient):
    """Initially (no seed in tests) the themes list should be empty or seeded."""
    res = await client.get("/api/keywords/themes")
    assert res.status_code == 200
    data = res.json()
    assert "themes" in data
    assert "total" in data


@pytest.mark.asyncio
async def test_create_theme(client: AsyncClient):
    res = await client.post("/api/keywords/themes", json={
        "name": "Test Gambling", "color": "#f44336", "enabled": True,
    })
    assert res.status_code == 201
    data = res.json()
    assert data["name"] == "Test Gambling"
    assert data["color"] == "#f44336"
    assert data["enabled"] is True
    assert data["keyword_count"] == 0
    return data["id"]


@pytest.mark.asyncio
async def test_create_duplicate_theme(client: AsyncClient):
    await client.post("/api/keywords/themes", json={"name": "Dup Theme"})
    res = await client.post("/api/keywords/themes", json={"name": "Dup Theme"})
    assert res.status_code == 409


@pytest.mark.asyncio
async def test_update_theme(client: AsyncClient):
    create = await client.post("/api/keywords/themes", json={"name": "Updatable"})
    tid = create.json()["id"]
    res = await client.put(f"/api/keywords/themes/{tid}", json={
        "name": "Updated Name", "color": "#00ff00", "enabled": False,
    })
    assert res.status_code == 200
    data = res.json()
    assert data["name"] == "Updated Name"
    assert data["color"] == "#00ff00"
    assert data["enabled"] is False


@pytest.mark.asyncio
async def test_delete_theme(client: AsyncClient):
    create = await client.post("/api/keywords/themes", json={"name": "ToDelete"})
    tid = create.json()["id"]
    res = await client.delete(f"/api/keywords/themes/{tid}")
    assert res.status_code == 204

    # Verify gone
    check = await client.get("/api/keywords/themes")
    names = [t["name"] for t in check.json()["themes"]]
    assert "ToDelete" not in names


@pytest.mark.asyncio
async def test_delete_nonexistent_theme(client: AsyncClient):
    res = await client.delete("/api/keywords/themes/nonexistent")
    assert res.status_code == 404


# ── Keyword CRUD ──────────────────────────────────────────────────────


@pytest.mark.asyncio
async def test_add_keyword(client: AsyncClient):
    create = await client.post("/api/keywords/themes", json={"name": "KW Test Theme"})
    tid = create.json()["id"]

    res = await client.post(f"/api/keywords/themes/{tid}/keywords", json={
        "value": "poker", "is_regex": False,
    })
    assert res.status_code == 201
    data = res.json()
    assert data["value"] == "poker"
    assert data["is_regex"] is False
    assert data["theme_id"] == tid


@pytest.mark.asyncio
async def test_add_keywords_bulk(client: AsyncClient):
    create = await client.post("/api/keywords/themes", json={"name": "Bulk KW Theme"})
    tid = create.json()["id"]

    res = await client.post(f"/api/keywords/themes/{tid}/keywords/bulk", json={
        "values": ["steam", "epic games", "discord"],
    })
    assert res.status_code == 201
    data = res.json()
    assert data["added"] == 3
    assert data["theme_id"] == tid

    # Verify via theme list
    themes = await client.get("/api/keywords/themes")
    theme = [t for t in themes.json()["themes"] if t["id"] == tid][0]
    assert theme["keyword_count"] == 3


@pytest.mark.asyncio
async def test_delete_keyword(client: AsyncClient):
    create = await client.post("/api/keywords/themes", json={"name": "Del KW Theme"})
    tid = create.json()["id"]

    kw_res = await client.post(f"/api/keywords/themes/{tid}/keywords", json={"value": "removeme"})
    kw_id = kw_res.json()["id"]

    res = await client.delete(f"/api/keywords/keywords/{kw_id}")
    assert res.status_code == 204


@pytest.mark.asyncio
async def test_add_keyword_to_nonexistent_theme(client: AsyncClient):
    res = await client.post("/api/keywords/themes/fakeid/keywords", json={"value": "test"})
    assert res.status_code == 404


# ── Scanner ───────────────────────────────────────────────────────────


@pytest.mark.asyncio
async def test_scan_empty(client: AsyncClient):
    """Scan with no data should return zero hits."""
    res = await client.post("/api/keywords/scan", json={})
    assert res.status_code == 200
    data = res.json()
    assert data["total_hits"] == 0
    assert data["hits"] == []


@pytest.mark.asyncio
async def test_scan_with_dataset(client: AsyncClient):
    """Upload a dataset with known keywords, verify scanner finds them."""
    # Create a theme + keyword
    theme_res = await client.post("/api/keywords/themes", json={
        "name": "Scan Test", "color": "#ff0000",
    })
    tid = theme_res.json()["id"]
    await client.post(f"/api/keywords/themes/{tid}/keywords", json={"value": "chrome.exe"})

    # Upload CSV dataset that contains "chrome.exe"
    from tests.conftest import SAMPLE_CSV
    import io
    files = {"file": ("test_scan.csv", io.BytesIO(SAMPLE_CSV), "text/csv")}
    upload = await client.post("/api/datasets/upload", files=files)
    assert upload.status_code == 200
    ds_id = upload.json()["id"]

    # Scan
    res = await client.post("/api/keywords/scan", json={
        "dataset_ids": [ds_id],
        "theme_ids": [tid],
        "scan_hunts": False,
        "scan_annotations": False,
        "scan_messages": False,
    })
    assert res.status_code == 200
    data = res.json()
    assert data["total_hits"] > 0
    # Verify the hit references chrome.exe
    kw_hits = [h for h in data["hits"] if h["keyword"] == "chrome.exe"]
    assert len(kw_hits) > 0


@pytest.mark.asyncio
async def test_quick_scan(client: AsyncClient):
    """Quick scan endpoint should work with a dataset_id parameter."""
    # Create theme + keyword
    theme_res = await client.post("/api/keywords/themes", json={
        "name": "Quick Scan Theme", "color": "#00ff00",
    })
    tid = theme_res.json()["id"]
    await client.post(f"/api/keywords/themes/{tid}/keywords", json={"value": "powershell"})

    # Upload dataset
    from tests.conftest import SAMPLE_CSV
    import io
    files = {"file": ("quick_scan.csv", io.BytesIO(SAMPLE_CSV), "text/csv")}
    upload = await client.post("/api/datasets/upload", files=files)
    ds_id = upload.json()["id"]

    res = await client.get(f"/api/keywords/scan/quick?dataset_id={ds_id}")
    assert res.status_code == 200
    data = res.json()
    assert "total_hits" in data
    # powershell should match at least one row
    assert data["total_hits"] > 0
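The scan tests above only assert on the response shape: a `total_hits` count and a `hits` list whose entries carry the matched `keyword`. The scanner service itself is outside this diff; the snippet below is a minimal, hypothetical sketch of the kind of case-insensitive substring/regex matching those assertions imply. The function name, the hit field names, and the keyword dictionary shape are assumptions for illustration only:

    # Hypothetical sketch of the keyword-matching core the scan endpoints appear to exercise.
    import re


    def scan_rows_sketch(rows, keywords):
        """Return one hit per (row, keyword) match.

        keywords look like {"value": "chrome.exe", "is_regex": False, "theme_id": "..."}.
        """
        hits = []
        for idx, row in enumerate(rows):
            # Flatten the row into a single lower-cased haystack string.
            haystack = " ".join(str(v) for v in row.values()).lower()
            for kw in keywords:
                value = kw["value"]
                if kw.get("is_regex"):
                    matched = re.search(value, haystack, re.IGNORECASE) is not None
                else:
                    matched = value.lower() in haystack
                if matched:
                    hits.append({
                        "keyword": value,
                        "theme_id": kw.get("theme_id"),
                        "row_index": idx,
                    })
        return hits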
BIN
backend/threathunt.db-shm
Normal file
BIN
backend/threathunt.db-shm
Normal file
Binary file not shown.
Some files were not shown because too many files have changed in this diff