Files
ThreatHunt/docs/update.md
2026-02-20 14:32:42 -05:00

548 lines
27 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ThreatHunt — Session Update (February 1920, 2026)
**Version: 0.4.0**
## Summary
This document covers feature improvements and bug fixes across the ThreatHunt platform.
---
## 1. Network Map — Clickable Type Filters (IP / Host / Domain / URL)
**Request:** "IP, Host, Domain, URL on the network map… can I click on these and they would be filtered in or out"
**Changes (`NetworkMap.tsx`):**
- Legend chips (IP, Host, Domain, URL) are now clickable toggle filters
- Added `visibleTypes` state (`Set<NodeType>`) and `filteredGraph` useMemo that filters nodes + edges by visible types
- Active chips: filled background with type color, fully opaque
- Inactive chips: outlined with type color, dimmed at 50% opacity
- Each chip shows node count for that type, e.g. `IP (42)`
- At least one type must stay visible (can't disable all four)
- Stats chips (nodes/edges) update to reflect the filtered view
- All mouse handlers (pan, hover, click, wheel zoom) updated to use `filteredGraph` instead of raw `graph`
---
## 2. Network Map — Cleaner Nodes (Brighter Colors, 20% Smaller)
**Request:** "Clean up the icons, the colours are dull and the icons are too big in the map… shrink by 20%"
**Changes (`NetworkMap.tsx`):**
- Colors bumped to more saturated variants:
- IP: `#60a5fa``#3b82f6`
- Host: `#10b981``#22c55e`
- Domain: `#f59e0b``#eab308`
- URL: `#a78bfa``#8b5cf6`
- Node radius shrunk ~20%:
- Max: 22 → 18
- Base: 5 → 4
- Multiplier: 2.0 → 1.6
---
## 3. Dataset Viewer — IOC Column Highlighting
**Request:** "I'm looking at one of the datasets and 4 IOCs are showing… but I can't see it… is there a way to highlight that"
**Changes (`DatasetViewer.tsx`):**
- IOC columns in the DataGrid are now visually highlighted
- **Header:** colored background tint + bold colored text + IOC type label (e.g. `src_ip ◆ IP`)
- **Cells:** subtle colored background + left border stripe
- Color-coded by IOC type matching the Network Map palette:
- IP — blue (`#3b82f6`)
- Hostname — green (`#22c55e`)
- Domain — amber (`#eab308`)
- URL — purple (`#8b5cf6`)
- Hashes (MD5/SHA1/SHA256) — rose (`#f43f5e`)
- Added `IOC_COLORS` mapping, `iocTypeFor()` helper, and dynamic `headerClassName`/`cellClassName` on DataGrid columns
- CSS-in-JS styles injected via DataGrid `sx` prop
---
## 4. AUP Scanner — "Social Media (Personal)" → "Social Media" Rename
**Request:** "On the AUP page there is Social Media (Personal) — can we remove personal, it's messing up with the formatting"
**Changes (`keyword_defaults.py`):**
- Default theme key renamed from `"Social Media (Personal)"` to `"Social Media"`
- Added rename migration in `seed_defaults()`: checks for old name in DB, if found renames via SQL UPDATE + commit before normal seeding
- Backend log confirmed: `Renamed AUP theme 'Social Media (Personal)' → 'Social Media'`
---
## 5. AUP Scanner — Hunt Dropdown (previous session, deployed)
- Replaced individual dataset checkboxes with a hunt selector dropdown
- Selecting a hunt auto-loads and selects all its datasets
- Shows dataset/row counts below the dropdown
---
## 6. Network Map — Hunt-Scoped Interactive Map (previous session, deployed)
- Hunt selector dropdown loads only that hunt's datasets
- Enriched nodes with hostname, IP, OS metadata
- Click-to-inspect MUI Popover showing node details
- Zoom (wheel + buttons) and pan (drag) with full viewport transform
- Force-directed layout with co-occurrence edges
---
## 7. Agent Assist — "Failed to Fetch" Diagnosis
**Request:** "AI assist failed: Error: Failed to fetch"
**Diagnosis:**
- Backend agent endpoint works correctly (tested via PowerShell `Invoke-RestMethod` through nginx proxy)
- Health endpoint healthy — both LLM nodes (Wile + Roadrunner) available
- Extra fields sent by frontend (`mode`, `hunt_id`, `conversation_id`) are accepted by Pydantic v2 (ignored, not rejected)
- "Failed to fetch" was a transient browser-level network error, not a backend issue
- Response time ~5s from LLM — within nginx 120s proxy timeout
- **Resolution:** Hard refresh (Ctrl+Shift+R) resolves the issue
---
## 8. Performance Fix — /api/hunts Timeout (previous session)
- Root cause: `Dataset.rows` relationship had `lazy="selectin"` causing SQLAlchemy to cascade-load every DatasetRow when listing hunts
- Fix: Changed `Dataset.rows` and `DatasetRow.annotations` to `lazy="noload"` in `backend/app/db/models.py`
- Result: Hunts endpoint returns instantly
---
## Files Modified This Session
| File | Change |
|------|--------|
| `frontend/src/components/NetworkMap.tsx` | Type filter toggles, brighter colors, smaller nodes |
| `frontend/src/components/DatasetViewer.tsx` | IOC column highlighting in DataGrid |
| `backend/app/services/keyword_defaults.py` | Theme rename + DB migration |
## Deployment
All changes built and deployed via Docker Compose:
```
docker compose build --no-cache frontend
docker compose up -d frontend
docker compose build --no-cache backend
docker compose up -d backend
```
## Git
Committed and pushed to GitHub (`main` branch):
```
92 files changed, 13050 insertions(+), 1097 deletions(-)
d0c9f88..9b98ab9 main -> main
```
---
## 9. Network Picture — Deduplicated Host Inventory (February 20, 2026)
**Request:** "While files are being dropped in, give me a clean clear network picture — like netstat. No 100 iterations of the same computer. Clean pic: hostname, IP, and any users logged into that machine."
### What It Does
New **Net Picture** page that scans all datasets in a hunt and produces a **one-row-per-host** inventory table. If a host appears in 900 netstat rows, it shows once — with all unique IPs, usernames, OS versions, MACs, and ports aggregated via server-side set deduplication. Supports networks with up to 1000 hosts; no artificial caps on unique values per host.
### Backend
**New service**`backend/app/services/network_inventory.py`:
- `build_network_picture(db, hunt_id)` streams all dataset rows in batches of 1000
- Groups rows by `hostname` (case-insensitive), falls back to `src_ip`/`ip_address` if no hostname column
- Per host, aggregates into Python `set()`s: IPs, users, OS, MAC addresses, protocols, open ports, remote targets, dataset sources
- Tracks `connection_count` (total rows), `first_seen`/`last_seen` from timestamp columns
- Returns sorted by connection count descending
**New API route**`backend/app/api/routes/network.py`:
- `GET /api/network/picture?hunt_id={id}``NetworkPictureResponse`
- Response: `hosts[]` (each with hostname, ips, users, os, mac_addresses, protocols, open_ports, remote_targets, datasets, connection_count, first_seen, last_seen) + `summary` (total_hosts, total_connections, total_unique_ips, datasets_scanned)
**Normalizer additions**`backend/app/services/normalizer.py`:
- Added `mac_address` canonical mapping: `mac`, `mac_address`, `physical_address`, `mac_addr`, `hw_addr`, `ethernet`
- Added `connection_state` canonical mapping: `state`, `status`, `tcp_state`, `conn_state`
### Frontend
**New component**`frontend/src/components/NetworkPicture.tsx`:
- Hunt selector dropdown → loads network picture from backend
- MUI Table with sortable columns: Hostname, IPs, Users, OS, MAC, Connections, Ports
- **ChipList** sub-component: shows first 5 values inline as coloured Chips; "+N more" badge expands to show all (no data hidden, just visual overflow control)
- Click any row → expand panel showing: remote targets, all open ports, protocols, MAC addresses, dataset sources, time range
- Search bar filters by hostname, IP, user, OS, or MAC
- Summary stat cards: total hosts, unique IPs, connections, datasets scanned
- Colour palette matches Network Map: IP blue, User green, OS amber, MAC purple, Port rose
**App integration**`frontend/src/App.tsx`:
- New nav item: **Net Picture** with `DevicesIcon`, route `/netpicture`
- Import and route for `NetworkPicture` component
**API client**`frontend/src/api/client.ts`:
- Added `HostEntry`, `PictureSummary`, `NetworkPictureResponse` interfaces
- Added `network.picture(huntId)` method
### Files Modified
| File | Change |
|------|--------|
| `backend/app/services/normalizer.py` | Added `mac_address` + `connection_state` mappings |
| `backend/app/services/network_inventory.py` | **New** — host aggregation service |
| `backend/app/api/routes/network.py` | **New**`/api/network/picture` endpoint |
| `backend/app/main.py` | Registered network router |
| `frontend/src/api/client.ts` | Added network picture types + API method |
| `frontend/src/components/NetworkPicture.tsx` | **New** — host inventory table component |
| `frontend/src/App.tsx` | Added Nav item + route for Net Picture |
---
## 10. Test Data — Velociraptor Mock Network CSVs (February 20, 2026)
**Request:** "Create 10-15 different CSV files you'd expect from Velociraptor for a mock network with 50-100 hosts and random network traffic with a few keywords that would trigger the AUP."
### What Was Created
12 realistic Velociraptor-style CSV files in `backend/tests/test_csvs/`, generated by a Python script (`generate_test_csvs.py`) with deterministic seed for reproducibility.
**Mock Network:**
- **82 hosts**: 75 workstations (IT-WS, HR-WS, FIN-WS, SLS-WS, ENG-WS, LEG-WS, MKT-WS, EXEC-WS) + 7 servers (DC-01, DC-02, FILE-01, EXCH-01, WEB-01, SQL-01, PROXY-01)
- **14 users**: 10 named users (jsmith, agarcia, bwilson, etc.) + 4 service accounts
- **3 subnets**: `10.10.1.x`, `10.10.2.x`, `10.10.3.x`
- **Domain**: `acme.local`
- **Time range**: Feb 1020, 2026
### Files
| # | File | Rows | Velociraptor Artifact |
|---|------|------|----------------------|
| 1 | `01_netstat_connections.csv` | 2,012 | `Windows.Network.Netstat` |
| 2 | `02_dns_queries.csv` | 2,964 | `Windows.Network.DNS` |
| 3 | `03_process_listing.csv` | 1,896 | `Windows.System.Pslist` |
| 4 | `04_network_interfaces.csv` | 94 | `Windows.Network.Interfaces` |
| 5 | `05_logged_in_users.csv` | 171 | `Windows.Sys.Users` |
| 6 | `06_scheduled_tasks.csv` | 410 | `Windows.System.TaskScheduler` |
| 7 | `07_browser_history.csv` | 1,586 | `Windows.Application.Chrome.History` |
| 8 | `08_sysmon_network.csv` | 2,517 | Sysmon Event ID 3 |
| 9 | `09_autoruns.csv` | 342 | `Windows.Sys.AutoRuns` |
| 10 | `10_logon_events.csv` | 1,604 | Windows Security (4624/4625) |
| 11 | `11_proxy_logs.csv` | 2,605 | Web proxy / filter |
| 12 | `12_file_listing.csv` | 421 | `Windows.Search.FileFinder` |
### AUP Triggers Embedded
| Category | Examples |
|----------|----------|
| Gambling | bet365.com, pokerstars.com, draftkings.com |
| Gaming | steam.exe, discord.exe, steamcommunity.com, epicgameslauncher.exe |
| Streaming | netflix.com, hulu.com, spotify.exe, open.spotify.com |
| Downloads / Piracy | thepiratebay.org, 1337x.to, utorrent.exe, qbittorrent.exe, free_movie_2026.torrent |
| Adult Content | pornhub.com, onlyfans.com, xvideos.com |
| Social Media | facebook.com, tiktok.com, reddit.com |
| Job Search | indeed.com, glassdoor.com, linkedin.com/jobs |
| Shopping | amazon.com, ebay.com, shein.com |
AUP keywords appear in: DNS queries (~12%), browser history (~15%), proxy logs (~10%), process listing (~15% of hosts), autoruns (~20% of workstations), sysmon network (~10%), file listing (crack_photoshop.exe, keygen_v2.exe, free_movie_2026.torrent).
### Files Added
| File | Purpose |
|------|---------|
| `backend/tests/generate_test_csvs.py` | **New** — deterministic CSV generator script |
| `backend/tests/test_csvs/*.csv` (×12) | **New** — mock Velociraptor test data |
---
## 11. Phase 8 — Analyzer Framework & Alerts (February 20, 2026)
### What It Does
Automated alerting engine with 6 built-in analyzers that scan dataset rows for suspicious activity and produce scored, MITRE-tagged alerts. Full alert lifecycle (new → acknowledged → in-progress → resolved / false-positive), bulk operations, and configurable alert rules.
### Backend
**New service**`backend/app/services/analyzers.py` (~350 lines):
- `BaseAnalyzer` ABC with `name`, `description`, and `async analyze(rows)` method
- 6 built-in analyzers:
- **EntropyAnalyzer** — Shannon entropy on command_line/url/path fields, threshold 4.5, maps to T1027 (Obfuscated Files or Information)
- **SuspiciousCommandAnalyzer** — 19 regex patterns (mimikatz, encoded PowerShell, schtasks, psexec, vssadmin, etc.) with per-pattern MITRE mappings
- **NetworkAnomalyAnalyzer** — Beaconing detection (dst IP frequency), suspicious ports (4444, 5555, etc.), large transfers (>10 MB), maps to T1071/T1048/T1571
- **FrequencyAnomalyAnalyzer** — Flags values <1% occurrence in process_name/username/event_type fields
- **AuthAnomalyAnalyzer** — Brute force (>5 failed logins per user), unusual logon types (3=network, 10=RDP), maps to T1110/T1021
- **PersistenceAnalyzer** — Registry Run keys, services, Winlogon, IFEO patterns, maps to T1547/T1543/T1546
- `run_all_analyzers(rows, analyzers?)` — runs all or selected analyzers, returns sorted `AlertCandidate` list
**New API route**`backend/app/api/routes/alerts.py` (~300 lines):
- `GET /api/alerts` — list with filters (status, severity, analyzer, hunt_id, dataset_id)
- `GET /api/alerts/stats` — severity/status/analyzer/MITRE breakdowns
- `GET /api/alerts/{id}`, `PUT /api/alerts/{id}`, `DELETE /api/alerts/{id}`
- `POST /api/alerts/bulk-update` — bulk status changes on selected alert IDs
- `GET /api/alerts/analyzers/list` — available analyzer metadata
- `POST /api/alerts/analyze` — run analyzers on a dataset/hunt, auto-creates Alert records
- Alert Rules CRUD: `GET /api/alerts/rules/list`, `POST /api/alerts/rules`, `PUT /api/alerts/rules/{id}`, `DELETE /api/alerts/rules/{id}`
**New ORM models**`backend/app/db/models.py`:
- `Alert` — 18 fields (id, title, description, severity, status, analyzer, score, evidence JSON, mitre_technique, tags JSON, hunt_id FK, dataset_id FK, case_id FK, assignee, acknowledged_at, resolved_at, timestamps; indexes on severity/status/hunt/dataset)
- `AlertRule` — 10 fields (id, name, description, analyzer, config JSON, severity_override, enabled, hunt_id FK, timestamps; index on analyzer)
**Alembic migration**`backend/alembic/versions/b4c2d3e5f6a7_add_alerts_and_alert_rules.py`
### Frontend
**New component**`frontend/src/components/AlertPanel.tsx` (~350 lines):
- **Alerts tab**: MUI DataGrid with severity/status chips, score, MITRE technique; checkbox selection for bulk actions (acknowledge / resolve / false-positive); click → detail dialog with evidence JSON viewer
- **Stats tab**: Cards for total, per-severity counts, status/analyzer breakdowns, top MITRE techniques
- **Rules tab**: Rule cards with enable/disable switch, delete; create rule dialog with analyzer selector
- Selector bar: Hunt + Dataset pickers, "Run Analyzers" button, severity/status filters
**API client**`frontend/src/api/client.ts`:
- Added `AlertData`, `AlertStats`, `AlertRuleData`, `AnalyzerInfo`, `AnalyzeResult` interfaces
- Added `alerts` namespace with full CRUD + analyze + rules methods
**App integration**`frontend/src/App.tsx`:
- New nav item: **Alerts** with `NotificationsActiveIcon`, route `/alerts`
### Files
| File | Change |
|------|--------|
| `backend/app/services/analyzers.py` | **New** — 6 analyzers + runner |
| `backend/app/api/routes/alerts.py` | **New** — alerts API (CRUD, stats, bulk, rules) |
| `backend/app/db/models.py` | Added `Alert` + `AlertRule` models |
| `backend/alembic/versions/b4c2d3e5f6a7_...py` | **New** — alerts migration |
| `frontend/src/components/AlertPanel.tsx` | **New** — alerts dashboard |
| `frontend/src/api/client.ts` | Added alert types + `alerts` namespace |
| `frontend/src/App.tsx` | Added Alerts nav + route |
---
## 12. Phase 9 — Investigation Notebooks & Playbooks (February 20, 2026)
### What It Does
Cell-based investigation notebooks (markdown / query / code) for documenting hunts, plus a playbook engine with 4 built-in response templates and step-by-step execution tracking.
### Backend
**New service**`backend/app/services/playbook.py` (~250 lines):
- `NotebookCell` dataclass + `validate_notebook_cells()` helper
- 4 built-in playbook templates:
- **Suspicious Process Investigation** (6 steps): identify → process tree → network → analyzers → MITRE → document
- **Lateral Movement Hunt** (5 steps): remote tools → auth anomaly → network anomaly → knowledge graph → escalate
- **Data Exfiltration Check** (5 steps): large transfers → DNS → timeline → correlate → MITRE + document
- **Ransomware Triage** (5 steps): indicators search → all analyzers → persistence → LLM deep → critical case
- Each step: order, title, description, action, action_config, expected_outcome
**New API route**`backend/app/api/routes/notebooks.py` (~280 lines):
- Notebook CRUD: `GET /api/notebooks`, `GET /api/notebooks/{id}`, `POST /api/notebooks`, `PUT /api/notebooks/{id}`, `DELETE /api/notebooks/{id}`
- Cell operations: `POST /api/notebooks/{id}/cells/upsert`, `DELETE /api/notebooks/{id}/cells/{cell_id}`
- Playbook endpoints: `GET /api/notebooks/playbooks/templates`, `GET /api/notebooks/playbooks/templates/{name}`, `POST /api/notebooks/playbooks/start`, `GET /api/notebooks/playbooks/runs`, `GET /api/notebooks/playbooks/runs/{id}`, `POST /api/notebooks/playbooks/runs/{id}/complete-step`, `POST /api/notebooks/playbooks/runs/{id}/abort`
**New ORM models**`backend/app/db/models.py`:
- `Notebook` — 10 fields (id, title, description, cells JSON, hunt_id FK, case_id FK, owner_id FK, tags JSON, timestamps; index on hunt_id)
- `PlaybookRun` — 12 fields (id, playbook_name, status, current_step, total_steps, step_results JSON, hunt_id FK, case_id FK, started_by, timestamps, completed_at; indexes on hunt_id/status)
**Alembic migration**`backend/alembic/versions/c5d3e4f6a7b8_add_notebooks_and_playbook_runs.py`
### Frontend
**New component**`frontend/src/components/InvestigationNotebook.tsx` (~280 lines):
- List view: Grid of notebook cards with title, description, cell count, tags, open/delete
- Detail view: Cell-based editor with markdown/query/code cell types
- Markdown cells render via ReactMarkdown + remarkGfm; code/query cells as monospace pre blocks
- Edit mode: multiline TextField with Ctrl+S save shortcut
- Cell operations: add (markdown/query/code), edit, save, delete, move up/down
**New component**`frontend/src/components/PlaybookManager.tsx` (~280 lines):
- **Templates tab**: Grid of template cards with category chips, tags, step count; view detail (MUI Stepper showing all steps) or start new run
- **Runs tab**: List of active/completed runs with progress bars
- Active run view: Vertical MUI Stepper with step completion, notes field, skip/abort, LinearProgress
**API client**`frontend/src/api/client.ts`:
- Added `NotebookCell`, `NotebookData`, `PlaybookTemplate`, `PlaybookStep`, `PlaybookTemplateDetail`, `PlaybookRunData` interfaces
- Added `notebooks` namespace (CRUD + cell ops) and `playbooks` namespace (templates, runs, step-complete, abort)
**App integration**`frontend/src/App.tsx`:
- New nav items: **Notebooks** (`MenuBookIcon`, `/notebooks`) and **Playbooks** (`PlaylistPlayIcon`, `/playbooks`)
### Files
| File | Change |
|------|--------|
| `backend/app/services/playbook.py` | **New** — 4 playbook templates + cell validation |
| `backend/app/api/routes/notebooks.py` | **New** — notebooks + playbooks API |
| `backend/app/db/models.py` | Added `Notebook` + `PlaybookRun` models |
| `backend/alembic/versions/c5d3e4f6a7b8_...py` | **New** — notebooks migration |
| `frontend/src/components/InvestigationNotebook.tsx` | **New** — cell-based notebook editor |
| `frontend/src/components/PlaybookManager.tsx` | **New** — playbook template browser + run stepper |
| `frontend/src/api/client.ts` | Added notebook + playbook types + namespaces |
| `frontend/src/App.tsx` | Added Notebooks + Playbooks nav + routes |
---
## 13. Docker Build Fixes (February 20, 2026)
Several issues were resolved to get the full 22-page frontend and updated backend deployed in Docker Compose.
### npm / TypeScript Fixes
| Issue | Fix |
|-------|-----|
| `npm ci` failing: "Missing: yaml@2.8.2 from lock file" | Installed `yaml@2``postcss-load-config` (from tailwindcss) required `^2.4.2` but only `1.10.2` was present |
| `GridRowSelectionModel` cast to `string[]` | MUI X DataGrid v8 changed to `{ type, ids: Set }` — fixed with `Array.from(model.ids)` |
| Recharts `formatter` type mismatch | Changed explicit `number`/`string` params to `any` in `Dashboard.tsx` |
| Missing declarations for `cytoscape-dagre` / `cytoscape-cola` | Created `frontend/src/declarations.d.ts` |
| Missing `HuntOut` type export | Added `export type HuntOut = Hunt` alias in `client.ts` |
| `cytoscape.Stylesheet` no longer exists | Renamed to `cytoscape.StylesheetStyle` in ProcessTree, StorylineGraph, KnowledgeGraph |
### Backend Startup Fixes
| Issue | Fix |
|-------|-----|
| `init_db()` crash: "table cases already exists" | Rewrote to inspect existing tables and only create missing ones |
| Alembic migration replay on startup | DB had all tables (from `init_db()`) but Alembic was stamped at `98ab619418bc` — ran `alembic stamp head` to sync to `c5d3e4f6a7b8` |
### Files Modified
| File | Change |
|------|--------|
| `frontend/package.json` / `package-lock.json` | Added `yaml@2` dependency |
| `frontend/src/declarations.d.ts` | **New** — ambient module declarations |
| `frontend/src/api/client.ts` | Added `HuntOut` type alias |
| `frontend/src/components/AlertPanel.tsx` | Fixed `GridRowSelectionModel` handling |
| `frontend/src/components/Dashboard.tsx` | Fixed Recharts formatter types |
| `frontend/src/components/ProcessTree.tsx` | `Stylesheet``StylesheetStyle` |
| `frontend/src/components/StorylineGraph.tsx` | `Stylesheet``StylesheetStyle` |
| `frontend/src/components/KnowledgeGraph.tsx` | `Stylesheet``StylesheetStyle` |
| `backend/app/db/engine.py` | Safe `init_db()` with inspector |
---
## Current Navigation (22 pages)
Dashboard · Hunts · Datasets · Upload · Agent · Analysis · Annotations · Hypotheses · Correlation · Network Map · Net Picture · Proc Tree · Storyline · Timeline · Search · MITRE Map · Knowledge · Cases · **Alerts** · **Notebooks** · **Playbooks** · AUP Scanner
## Deployment
```
docker compose build
docker compose up -d
# Backend: http://localhost:8000 (healthy)
# Frontend: http://localhost:3000 (200 OK)
```
## Alembic Migration Chain
```
9790f482da06 (initial schema)
98ab619418bc (keyword themes + keywords)
a3b1c2d4e5f6 (cases + activity logs)
b4c2d3e5f6a7 (alerts + alert rules)
c5d3e4f6a7b8 (notebooks + playbook runs) ← HEAD
```
---
## 14. Process Tree — HTTP 500 Fix (February 20, 2026)
**Request:** "process tree: HTTP 500"
**Root Cause:** `MissingGreenlet` error in `_fetch_rows` — async SQLAlchemy lazy-loading the `dataset` relationship on `DatasetRow` outside a greenlet context.
**Fix (`backend/app/services/process_tree.py`):**
- Added `selectinload(DatasetRow.dataset)` to the `_fetch_rows` query so the relationship is eagerly loaded within the async session
- Endpoint now returns data correctly (verified: 3,908 processes for full hunt)
---
## 15. Process Tree — UX Rewrite (February 20, 2026)
**Request:** "proctree view is horrible… there are 3908 processes and I can't see / zoom in to see and when I do click on one the resolution changes to show the data on the right. Ideally I'd like another dropdown to pick a host."
**Complete rewrite of `frontend/src/components/ProcessTree.tsx` (~520 lines):**
- **Host dropdown**: Autocomplete populated by extracting unique hostnames from tree data (82 hosts)
- **Server-side hostname filtering**: API `?hostname=` param reduces returned data per view
- **Grid layout fallback**: When edges < 10% of nodes (e.g. test data with no parent-child relationships), uses grid layout instead of dagre
- **Overlay detail panel**: Absolute-positioned panel on click — no longer reflows the graph
- **ResizeObserver**: Keeps Cytoscape canvas in sync with container size changes
- **Search/highlight**: Filter processes by name or PID
- **Zoom controls**: Fit, zoom-in, zoom-out buttons
---
## 16. Process Tree — White Screen Crash Fix (February 20, 2026)
**Request:** "I picked a different host and the screen went white"
**Root Cause:** Cytoscape's `destroy()` removed `<canvas>` elements that were children of the same DOM node React was managing, causing `NotFoundError: Failed to execute 'removeChild' on 'Node'` when React tried to reconcile.
**Fix (`frontend/src/components/ProcessTree.tsx`):**
- Separated Cytoscape container into a plain `<div>` with zero React children
- Loading spinner and placeholder text are now sibling overlays (not children of the Cytoscape div)
- Added explicit `cyRef.current = null` after `destroy()`
- Cleanup on unmount prevents stale references
- Verified: switching between hosts (DC-01 → EXEC-WS-039 → ENG-WS-052) works with 0 console errors
---
## 17. LLM Analysis — Response Parsing Fix (February 20, 2026)
**Request:** "llm analysis: HTTP 500 — searching for evidence of adult site access"
**Root Cause Chain:**
1. **`AttributeError: 'dict' object has no attribute 'strip'`** — `OllamaProvider.generate()` returns a dict (`{"response": "<llm text>", "model": ..., ...}`), but `_parse_analysis` expected a string.
2. **Wrong dict key extraction** — Initial dict-handling fix looked for `raw.get("analysis")` but Ollama's `/api/generate` endpoint puts the LLM text in the `"response"` key.
3. **JSON parsing failures** — The 70B model sometimes produces slightly malformed JSON (trailing commas, etc.) that `json.loads` rejects.
**Fixes (`backend/app/services/llm_analysis.py`):**
| Change | Detail |
|--------|--------|
| Dict response extraction | `raw.get("response")` instead of `raw.get("analysis")` — correctly extracts LLM text from Ollama API dict |
| Robust JSON parsing | New `_extract_json_candidates()` generator: tries full text, then outermost `{…}` block, then trailing-comma-fixed variant |
| Reduced prompt size | `max_sample` 50 → 20 rows, `max_chars` 8000 → 6000 |
| Reduced row limit | `/llm-analyze` endpoint loads 2000 rows (was 5000) |
| Timeout | `asyncio.wait_for(..., timeout=300)` wraps the LLM call — returns graceful "timed out" message instead of hanging |
| Debug logging | `_parse_analysis` logs extraction source, text length, parse success/failure with error details |
**Results:**
- Quick mode (llama3.1:latest on roadrunner): 18s, confidence 0.85, risk score 65, 3 key findings — all structured fields populated
- Deep mode (llama3.1:70b on wile): 128185s, full analysis returned — JSON parsing succeeds when model produces valid JSON, falls back to plain text otherwise
---
## 18. Version Bump — 0.3.0 → 0.4.0 (February 20, 2026)
| File | Change |
|------|--------|
| `backend/app/config.py` | `APP_VERSION` 0.3.0 → 0.4.0 |
| `frontend/package.json` | `version` 0.1.0 → 0.4.0 |
---
## Files Modified This Session (Sections 1418)
| File | Change |
|------|--------|
| `backend/app/services/process_tree.py` | Added `selectinload(DatasetRow.dataset)` to `_fetch_rows` |
| `frontend/src/components/ProcessTree.tsx` | Complete rewrite: host dropdown, grid layout, overlay panel, Cytoscape DOM separation |
| `backend/app/services/llm_analysis.py` | Fixed dict response extraction, robust JSON parsing, reduced prompt, added timeout |
| `backend/app/api/routes/analysis.py` | Row limit 5000 → 2000 |
| `backend/app/config.py` | Version 0.3.0 → 0.4.0 |
| `frontend/package.json` | Version 0.1.0 → 0.4.0 |
## Deployment
```
docker compose up -d --build backend
# Backend: http://localhost:8000 (healthy, v0.4.0)
# Frontend: http://localhost:3000 (200 OK)
```