ThreatHunt/write_update.py
mblanke 5a2ad8ec1c feat: Add Playbook Manager, Saved Searches, and Timeline View components
- Implemented PlaybookManager for creating and managing investigation playbooks with templates.
- Added SavedSearches component for managing bookmarked queries and recurring scans.
- Introduced TimelineView for visualizing forensic event timelines with zoomable charts.
- Enhanced backend processing with auto-queued jobs for dataset uploads and improved database concurrency.
- Updated frontend components for better user experience and performance optimizations.
- Documented changes in update log for future reference.
2026-02-23 14:23:07 -05:00


"""Regenerate ThreatHunt's update.md from the changelog entries below."""
lines = []
a = lines.append
a("# ThreatHunt Update Log")
a("")
a("## 2026-02-22: Full Auto-Processing Pipeline, Performance Fixes, DB Concurrency")
a("")
a("### Auto-Processing Pipeline (Import-Time)")
a("- **Problem**: Only HOST_INVENTORY ran on dataset upload. Triage, anomaly detection, keyword scanning, and IOC extraction were manual-only, effectively dead code.")
a("- **Solution**: Wired ALL processing modules into the upload endpoint. On CSV import, 5 jobs are now auto-queued: TRIAGE, ANOMALY, KEYWORD_SCAN, IOC_EXTRACT, HOST_INVENTORY.")
a("- **Startup reprocessing**: On backend boot, queries for datasets with no anomaly results and queues the full pipeline for them.")
a("- **Completion tracking**: Pipeline completion callback updates `Dataset.processing_status` to `completed` or `completed_with_errors` when all 4 analysis jobs finish.")
a("- **Triage chaining**: After triage completes, automatically queues a HOST_PROFILE job for deep per-host LLM analysis.")
a("")
a("### Artifact Classification (Was Dead Code)")
a("- **Problem**: `classify_artifact()` in `artifact_classifier.py` existed but was never called.")
a("- **Fix**: Upload endpoint now calls `classify_artifact(columns)` to identify Velociraptor artifact types (30+ fingerprints) and stores `artifact_type` on the dataset.")
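a("")
a("Shape of the call at upload time (sketch; the fingerprint table lives in `artifact_classifier.py`):")
a("")
a("```python")
a("columns = list(reader.fieldnames)  # column headers from the uploaded CSV")
a("dataset.artifact_type = classify_artifact(columns)")
a("dataset.processing_status = \"processing\"")
a("```")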
a("")
a("### Database Concurrency Fix")
a("- **Problem**: SQLite with `StaticPool` = single shared connection. Any long-running job (keyword scan, triage) blocked ALL other DB queries, freezing the entire app.")
a("- **Fix**: Switched to `NullPool` so each async session gets its own connection, combined with WAL mode (`PRAGMA journal_mode=WAL`), `busy_timeout=30000`, and `synchronous=NORMAL` to allow concurrent reads during writes.")
a("")
a("#### Modified: `backend/app/db/engine.py`")
a("- `StaticPool` -> `NullPool` for SQLite")
a("- Added `_set_sqlite_pragmas` event listener: WAL mode, 30s busy timeout, NORMAL sync")
a("- Connection args: `timeout=60`, `check_same_thread=False`")
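a("")
a("A minimal sync sketch of the pool and pragma setup (the real engine is async; listener name as in `engine.py`, database path illustrative):")
a("")
a("```python")
a("from sqlalchemy import create_engine, event")
a("from sqlalchemy.pool import NullPool")
a("")
a("engine = create_engine(")
a("    \"sqlite:///threathunt.db\",  # illustrative path")
a("    poolclass=NullPool,")
a("    connect_args={\"timeout\": 60, \"check_same_thread\": False},")
a(")")
a("")
a("@event.listens_for(engine, \"connect\")")
a("def _set_sqlite_pragmas(dbapi_conn, _connection_record):")
a("    cur = dbapi_conn.cursor()")
a("    cur.execute(\"PRAGMA journal_mode=WAL\")")
a("    cur.execute(\"PRAGMA busy_timeout=30000\")")
a("    cur.execute(\"PRAGMA synchronous=NORMAL\")")
a("    cur.close()")
a("```")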
a("")
a("### Triage Model Fix")
a("- **Problem**: `triage.py` hardcoded `DEFAULT_FAST_MODEL = \"qwen2.5-coder:7b-instruct-q4_K_M\"` which didn't exist on Roadrunner, causing 404 errors on every triage batch.")
a("- **Fix**: Changed to `settings.DEFAULT_FAST_MODEL` which resolves to `llama3.1:latest` (available on Roadrunner). Configurable via `TH_DEFAULT_FAST_MODEL` env var.")
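a("")
a("Resolution order, sketched (`TH_DEFAULT_FAST_MODEL` overrides the default):")
a("")
a("```python")
a("import os")
a("")
a("DEFAULT_FAST_MODEL = os.environ.get(\"TH_DEFAULT_FAST_MODEL\", \"llama3.1:latest\")")
a("```")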
a("")
a("### Host Profiler ClientID Fix")
a("- **Problem**: Velociraptor ClientID-format hostnames (`C.82465a50d075ea20`) were sent to the LLM for profiling, producing empty/useless results.")
a("- **Fix**: Added regex filter `^C\\.[0-9a-fA-F]{8,}$` to skip ClientID entries before profiling.")
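a("")
a("The filter as a one-liner (sketch):")
a("")
a("```python")
a("import re")
a("")
a("CLIENT_ID_RE = re.compile(r\"^C\\.[0-9a-fA-F]{8,}$\")")
a("")
a("hosts = [h for h in hosts if not CLIENT_ID_RE.match(h)]")
a("```")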
a("")
a("### Job Queue Expansion")
a("- **Before**: 3 job types (TRIAGE, HOST_PROFILE, REPORT), 3 workers")
a("- **After**: 8 job types, 5 workers, pipeline completion callbacks")
a("- Added: KEYWORD_SCAN, IOC_EXTRACT to JobType enum")
a("- Added: `PIPELINE_JOB_TYPES` frozenset (TRIAGE, ANOMALY, KEYWORD_SCAN, IOC_EXTRACT)")
a("- Added: `_on_pipeline_job_complete` callback updates `processing_status`")
a("- Added: `_handle_keyword_scan` using `KeywordScanner(db).scan()`")
a("- Added: `_handle_ioc_extract` using `extract_iocs_from_dataset()`")
a("- Triage now chains HOST_PROFILE after completion")
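a("")
a("Sketch of the completion callback (illustrative; helper functions are hypothetical, real code lives in `job_queue.py`):")
a("")
a("```python")
a("PIPELINE_JOB_TYPES = frozenset({")
a("    JobType.TRIAGE, JobType.ANOMALY,")
a("    JobType.KEYWORD_SCAN, JobType.IOC_EXTRACT,")
a("})")
a("")
a("async def _on_pipeline_job_complete(dataset_id: str) -> None:")
a("    jobs = await get_jobs(dataset_id, types=PIPELINE_JOB_TYPES)  # hypothetical helper")
a("    if all(j.status in (\"done\", \"failed\") for j in jobs):")
a("        ok = all(j.status == \"done\" for j in jobs)")
a("        status = \"completed\" if ok else \"completed_with_errors\"")
a("        await set_processing_status(dataset_id, status)  # hypothetical helper")
a("```")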
a("")
a("#### Modified: `backend/app/api/routes/datasets.py`")
a("- Upload calls `classify_artifact(columns)` for artifact type detection")
a("- Sets `artifact_type` and `processing_status=\"processing\"` on create")
a("- Queues 5 jobs: TRIAGE, ANOMALY, KEYWORD_SCAN, IOC_EXTRACT, HOST_INVENTORY")
a("- `UploadResponse` includes `artifact_type`, `processing_status`, `jobs_queued`")
a("")
a("#### Modified: `backend/app/main.py`")
a("- Startup reprocessing: finds datasets with no `AnomalyResult` records, queues full pipeline")
a("- Marks reprocessed datasets as `processing_status=\"processing\"`")
a("- Logs skip message when all datasets already processed")
a("")
a("### Network Map Performance Fix")
a("- **Problem**: 163 hosts + 1121 connections created 528 total nodes (365 external IPs). The O(N^2) force simulation did 278,784 pairwise calculations per animation frame, freezing the browser.")
a("- **Fix**: 6 optimizations applied to `frontend/src/components/NetworkMap.tsx`:")
a("")
a("| Fix | Detail |")
a("|-----|--------|")
a("| Cap external IPs | `MAX_EXTERNAL_NODES = 30` (was unlimited: 365) |")
a("| Sampling simulation | For N > 150 nodes, sample 40 random per node instead of N^2 pairs |")
a("| Distance cutoff | Skip repulsion for pairs > 600px apart |")
a("| Single redraw on hover | Was restarting full animation loop on every mouse hover |")
a("| Faster alpha decay | 0.97 -> 0.93 per frame (settles ~2x faster) |")
a("| Lower initial energy | simAlpha 0.6 -> 0.3, sim steps 80 -> 60 |")
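a("")
a("The sampling idea, sketched in Python for brevity (the real implementation is TypeScript in `NetworkMap.tsx`):")
a("")
a("```python")
a("import random")
a("")
a("def repulsion_pairs(nodes, sample_size=40, threshold=150):")
a("    # Below the threshold, exact O(N^2); above it, sample partners per node.")
a("    for i, node in enumerate(nodes):")
a("        if len(nodes) <= threshold:")
a("            others = nodes[i + 1:]")
a("        else:")
a("            others = random.sample(nodes, min(sample_size, len(nodes)))")
a("        for other in others:")
a("            if other is not node:")
a("                yield node, other")
a("```")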
a("")
a("### Test Results")
a("- **79/79 backend tests passing** (0.72s)")
a("- Both Docker containers healthy")
a("- 21/21 frontend-facing endpoints return 200 OK through nginx")
a("")
a("### Endpoint Verification (via nginx on port 3000)")
a("")
a("| Endpoint | Status | Size |")
a("|----------|--------|------|")
a("| /api/agent/health | 200 | 522b |")
a("| /api/hunts | 200 | 259b |")
a("| /api/datasets?hunt_id=... | 200 | 23KB |")
a("| /api/datasets/{id}/rows | 200 | 144KB |")
a("| /api/analysis/anomalies/{id} | 200 | 104KB |")
a("| /api/analysis/iocs/{id} | 200 | 1.2KB |")
a("| /api/analysis/triage/{id} | 200 | 9.5KB |")
a("| /api/analysis/profiles/{hunt} | 200 | 177KB |")
a("| /api/network/host-inventory | 200 | 181KB |")
a("| /api/timeline/hunt/{hunt} | 200 | 351KB |")
a("| /api/keywords/themes | 200 | 23KB |")
a("| /api/playbooks/templates | 200 | 2.5KB |")
a("| /api/reports/hunt/{hunt} | 200 | 10.6KB |")
a("| /api/export/stix/{hunt} | 200 | 391b |")
a("")
a("---")
a("")
a("## 2026-02-21: Feature Expansion, Dashboard Rewrite, Docker Deployment")
a("")
a("### New Features Added")
a("- **MITRE ATT&CK Matrix** (`/api/mitre/coverage`, `MitreMatrix.tsx`) - technique coverage visualization")
a("- **Timeline View** (`/api/timeline/hunt/{hunt}`, `TimelineView.tsx`) - chronological event explorer")
a("- **Playbook Manager** (`/api/playbooks`, `PlaybookManager.tsx`) - investigation playbook CRUD with templates")
a("- **Saved Searches** (`/api/searches`, `SavedSearches.tsx`) - save/run named queries")
a("- **STIX Export** (`/api/export/stix/{hunt}`) - STIX 2.1 bundle export for threat intel sharing")
a("")
a("### DB Models Added")
a("- `Playbook`, `PlaybookStep` - investigation playbook tracking")
a("- `SavedSearch` - persisted named queries")
a("")
a("### Dashboard & Correlation Rewrite")
a("- `Dashboard.tsx` - rewrote with live stat cards, dataset table, processing status indicators")
a("- `CorrelationView.tsx` - rewrote with working correlation analysis UI")
a("- `AgentPanel.tsx` - added SSE streaming for real-time agent responses")
a("")
a("### Docker Deployment")
a("- `Dockerfile.frontend` - added `TSC_COMPILE_ON_ERROR=true` for MUI X v8 compatibility")
a("- `nginx.conf` - SSE proxy headers, 500MB upload, 300s proxy timeout, SPA fallback")
a("- Frontend healthcheck changed from wget to curl with 127.0.0.1")
a("")
a("---")
a("")
a("## 2026-02-20: Host-Centric Network Map & Analysis Platform")
a("")
a("### Network Map Overhaul")
a("- **Problem**: Network Map showed 409 misclassified domain nodes (mostly process names like svchost.exe) and 0 hosts. No deduplication.")
a("- **Root Cause**: IOC column detection misclassified `Fqdn` as domain instead of hostname; `Name` column (process names) wrongly tagged as domain IOC.")
a("- **Solution**: Created host-centric inventory system. Scans all datasets, groups by `Fqdn`/`ClientId`, extracts IPs, users, OS, and network connections.")
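a("")
a("Grouping sketch (column names as seen in the datasets; structure illustrative):")
a("")
a("```python")
a("from collections import defaultdict")
a("")
a("hosts = defaultdict(lambda: {\"ips\": set(), \"users\": set()})")
a("for row in rows:")
a("    key = row.get(\"Fqdn\") or row.get(\"ClientId\")")
a("    if not key:")
a("        continue")
a("    host = hosts[key.lower()]  # dedupe case-insensitively")
a("    if row.get(\"Ip\"):")
a("        host[\"ips\"].add(row[\"Ip\"])")
a("    if row.get(\"Username\"):")
a("        host[\"users\"].add(row[\"Username\"])")
a("```")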
a("")
a("#### New Backend Files")
a("- `host_inventory.py` - Deduplicated host inventory builder with in-memory cache, background job pattern (202 polling), 5000-row batches")
a("- `network.py` routes - `GET /api/network/host-inventory`, `/inventory-status`, `/rebuild-inventory`")
a("- `ioc_extractor.py` - Regex IOC extraction (IP, domain, hash, email, URL)")
a("- `anomaly_detector.py` - Embedding-based outlier detection via bge-m3")
a("- `data_query.py` - Natural language to structured query translation")
a("- `load_balancer.py` - Round-robin load balancer for Ollama LLM nodes")
a("- `job_queue.py` - Async job queue (initially 3 workers, 3 job types)")
a("- `analysis.py` routes - 16 analysis endpoints")
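a("")
a("Round-robin selection in `load_balancer.py`, sketched (node list illustrative):")
a("")
a("```python")
a("import itertools")
a("")
a("NODES = [\"http://100.110.190.11:11434\", \"http://100.110.190.12:11434\"]")
a("_cycle = itertools.cycle(NODES)")
a("")
a("def next_node() -> str:")
a("    return next(_cycle)")
a("```")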
a("")
a("#### Frontend")
a("- `NetworkMap.tsx` - Canvas 2D force-directed graph, HiDPI, node dragging, search, popover, module-level cache")
a("- `AnalysisDashboard.tsx` - 6-tab analysis dashboard")
a("- `client.ts` - `network.*` and `analysis.*` API namespaces")
a("")
a("### Results (Radio Hunt - 20 Velociraptor datasets, 394K rows)")
a("")
a("| Metric | Before | After |")
a("|--------|--------|-------|")
a("| Nodes shown | 409 misclassified domains | **163 unique hosts** |")
a("| Hosts identified | 0 | **163** |")
a("| With IP addresses | N/A | **48** (172.17.x.x LAN) |")
a("| With logged-in users | N/A | **43** (real names only) |")
a("| OS detected | None | **Windows 10** (inferred) |")
a("| Deduplication | None | **Full** (by FQDN/ClientId) |")
a("")
a("### LLM Infrastructure")
a("- **Roadrunner** (100.110.190.11:11434): llama3.1:latest, qwen2.5-coder:7b, qwen2.5:14b, bge-m3 embeddings")
a("- **Wile** (100.110.190.12:11434): llama3.1:70b-instruct-q4_K_M (heavy analysis)")
a("- **Open WebUI** (ai.guapo613.beer): Cluster management interface")
path = r'd:\Projects\Dev\ThreatHunt\update.md'
with open(path, 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines) + '\n')
print(f'Written {len(lines)} lines to update.md')