Files
ThreatHunt/docs/update.md
2026-02-20 14:32:42 -05:00

27 KiB
Raw Blame History

ThreatHunt — Session Update (February 1920, 2026)

Version: 0.4.0

Summary

This document covers feature improvements and bug fixes across the ThreatHunt platform.


1. Network Map — Clickable Type Filters (IP / Host / Domain / URL)

Request: "IP, Host, Domain, URL on the network map… can I click on these and they would be filtered in or out"

Changes (NetworkMap.tsx):

  • Legend chips (IP, Host, Domain, URL) are now clickable toggle filters
  • Added visibleTypes state (Set<NodeType>) and filteredGraph useMemo that filters nodes + edges by visible types
  • Active chips: filled background with type color, fully opaque
  • Inactive chips: outlined with type color, dimmed at 50% opacity
  • Each chip shows node count for that type, e.g. IP (42)
  • At least one type must stay visible (can't disable all four)
  • Stats chips (nodes/edges) update to reflect the filtered view
  • All mouse handlers (pan, hover, click, wheel zoom) updated to use filteredGraph instead of raw graph

2. Network Map — Cleaner Nodes (Brighter Colors, 20% Smaller)

Request: "Clean up the icons, the colours are dull and the icons are too big in the map… shrink by 20%"

Changes (NetworkMap.tsx):

  • Colors bumped to more saturated variants:
    • IP: #60a5fa#3b82f6
    • Host: #10b981#22c55e
    • Domain: #f59e0b#eab308
    • URL: #a78bfa#8b5cf6
  • Node radius shrunk ~20%:
    • Max: 22 → 18
    • Base: 5 → 4
    • Multiplier: 2.0 → 1.6

3. Dataset Viewer — IOC Column Highlighting

Request: "I'm looking at one of the datasets and 4 IOCs are showing… but I can't see it… is there a way to highlight that"

Changes (DatasetViewer.tsx):

  • IOC columns in the DataGrid are now visually highlighted
  • Header: colored background tint + bold colored text + IOC type label (e.g. src_ip ◆ IP)
  • Cells: subtle colored background + left border stripe
  • Color-coded by IOC type matching the Network Map palette:
    • IP — blue (#3b82f6)
    • Hostname — green (#22c55e)
    • Domain — amber (#eab308)
    • URL — purple (#8b5cf6)
    • Hashes (MD5/SHA1/SHA256) — rose (#f43f5e)
  • Added IOC_COLORS mapping, iocTypeFor() helper, and dynamic headerClassName/cellClassName on DataGrid columns
  • CSS-in-JS styles injected via DataGrid sx prop

4. AUP Scanner — "Social Media (Personal)" → "Social Media" Rename

Request: "On the AUP page there is Social Media (Personal) — can we remove personal, it's messing up with the formatting"

Changes (keyword_defaults.py):

  • Default theme key renamed from "Social Media (Personal)" to "Social Media"
  • Added rename migration in seed_defaults(): checks for old name in DB, if found renames via SQL UPDATE + commit before normal seeding
  • Backend log confirmed: Renamed AUP theme 'Social Media (Personal)' → 'Social Media'

5. AUP Scanner — Hunt Dropdown (previous session, deployed)

  • Replaced individual dataset checkboxes with a hunt selector dropdown
  • Selecting a hunt auto-loads and selects all its datasets
  • Shows dataset/row counts below the dropdown

6. Network Map — Hunt-Scoped Interactive Map (previous session, deployed)

  • Hunt selector dropdown loads only that hunt's datasets
  • Enriched nodes with hostname, IP, OS metadata
  • Click-to-inspect MUI Popover showing node details
  • Zoom (wheel + buttons) and pan (drag) with full viewport transform
  • Force-directed layout with co-occurrence edges

7. Agent Assist — "Failed to Fetch" Diagnosis

Request: "AI assist failed: Error: Failed to fetch"

Diagnosis:

  • Backend agent endpoint works correctly (tested via PowerShell Invoke-RestMethod through nginx proxy)
  • Health endpoint healthy — both LLM nodes (Wile + Roadrunner) available
  • Extra fields sent by frontend (mode, hunt_id, conversation_id) are accepted by Pydantic v2 (ignored, not rejected)
  • "Failed to fetch" was a transient browser-level network error, not a backend issue
  • Response time ~5s from LLM — within nginx 120s proxy timeout
  • Resolution: Hard refresh (Ctrl+Shift+R) resolves the issue

8. Performance Fix — /api/hunts Timeout (previous session)

  • Root cause: Dataset.rows relationship had lazy="selectin" causing SQLAlchemy to cascade-load every DatasetRow when listing hunts
  • Fix: Changed Dataset.rows and DatasetRow.annotations to lazy="noload" in backend/app/db/models.py
  • Result: Hunts endpoint returns instantly

Files Modified This Session

File Change
frontend/src/components/NetworkMap.tsx Type filter toggles, brighter colors, smaller nodes
frontend/src/components/DatasetViewer.tsx IOC column highlighting in DataGrid
backend/app/services/keyword_defaults.py Theme rename + DB migration

Deployment

All changes built and deployed via Docker Compose:

docker compose build --no-cache frontend
docker compose up -d frontend
docker compose build --no-cache backend
docker compose up -d backend

Git

Committed and pushed to GitHub (main branch):

92 files changed, 13050 insertions(+), 1097 deletions(-)
d0c9f88..9b98ab9  main -> main

9. Network Picture — Deduplicated Host Inventory (February 20, 2026)

Request: "While files are being dropped in, give me a clean clear network picture — like netstat. No 100 iterations of the same computer. Clean pic: hostname, IP, and any users logged into that machine."

What It Does

New Net Picture page that scans all datasets in a hunt and produces a one-row-per-host inventory table. If a host appears in 900 netstat rows, it shows once — with all unique IPs, usernames, OS versions, MACs, and ports aggregated via server-side set deduplication. Supports networks with up to 1000 hosts; no artificial caps on unique values per host.

Backend

New servicebackend/app/services/network_inventory.py:

  • build_network_picture(db, hunt_id) streams all dataset rows in batches of 1000
  • Groups rows by hostname (case-insensitive), falls back to src_ip/ip_address if no hostname column
  • Per host, aggregates into Python set()s: IPs, users, OS, MAC addresses, protocols, open ports, remote targets, dataset sources
  • Tracks connection_count (total rows), first_seen/last_seen from timestamp columns
  • Returns sorted by connection count descending

New API routebackend/app/api/routes/network.py:

  • GET /api/network/picture?hunt_id={id}NetworkPictureResponse
  • Response: hosts[] (each with hostname, ips, users, os, mac_addresses, protocols, open_ports, remote_targets, datasets, connection_count, first_seen, last_seen) + summary (total_hosts, total_connections, total_unique_ips, datasets_scanned)

Normalizer additionsbackend/app/services/normalizer.py:

  • Added mac_address canonical mapping: mac, mac_address, physical_address, mac_addr, hw_addr, ethernet
  • Added connection_state canonical mapping: state, status, tcp_state, conn_state

Frontend

New componentfrontend/src/components/NetworkPicture.tsx:

  • Hunt selector dropdown → loads network picture from backend
  • MUI Table with sortable columns: Hostname, IPs, Users, OS, MAC, Connections, Ports
  • ChipList sub-component: shows first 5 values inline as coloured Chips; "+N more" badge expands to show all (no data hidden, just visual overflow control)
  • Click any row → expand panel showing: remote targets, all open ports, protocols, MAC addresses, dataset sources, time range
  • Search bar filters by hostname, IP, user, OS, or MAC
  • Summary stat cards: total hosts, unique IPs, connections, datasets scanned
  • Colour palette matches Network Map: IP blue, User green, OS amber, MAC purple, Port rose

App integrationfrontend/src/App.tsx:

  • New nav item: Net Picture with DevicesIcon, route /netpicture
  • Import and route for NetworkPicture component

API clientfrontend/src/api/client.ts:

  • Added HostEntry, PictureSummary, NetworkPictureResponse interfaces
  • Added network.picture(huntId) method

Files Modified

File Change
backend/app/services/normalizer.py Added mac_address + connection_state mappings
backend/app/services/network_inventory.py New — host aggregation service
backend/app/api/routes/network.py New/api/network/picture endpoint
backend/app/main.py Registered network router
frontend/src/api/client.ts Added network picture types + API method
frontend/src/components/NetworkPicture.tsx New — host inventory table component
frontend/src/App.tsx Added Nav item + route for Net Picture

10. Test Data — Velociraptor Mock Network CSVs (February 20, 2026)

Request: "Create 10-15 different CSV files you'd expect from Velociraptor for a mock network with 50-100 hosts and random network traffic with a few keywords that would trigger the AUP."

What Was Created

12 realistic Velociraptor-style CSV files in backend/tests/test_csvs/, generated by a Python script (generate_test_csvs.py) with deterministic seed for reproducibility.

Mock Network:

  • 82 hosts: 75 workstations (IT-WS, HR-WS, FIN-WS, SLS-WS, ENG-WS, LEG-WS, MKT-WS, EXEC-WS) + 7 servers (DC-01, DC-02, FILE-01, EXCH-01, WEB-01, SQL-01, PROXY-01)
  • 14 users: 10 named users (jsmith, agarcia, bwilson, etc.) + 4 service accounts
  • 3 subnets: 10.10.1.x, 10.10.2.x, 10.10.3.x
  • Domain: acme.local
  • Time range: Feb 1020, 2026

Files

# File Rows Velociraptor Artifact
1 01_netstat_connections.csv 2,012 Windows.Network.Netstat
2 02_dns_queries.csv 2,964 Windows.Network.DNS
3 03_process_listing.csv 1,896 Windows.System.Pslist
4 04_network_interfaces.csv 94 Windows.Network.Interfaces
5 05_logged_in_users.csv 171 Windows.Sys.Users
6 06_scheduled_tasks.csv 410 Windows.System.TaskScheduler
7 07_browser_history.csv 1,586 Windows.Application.Chrome.History
8 08_sysmon_network.csv 2,517 Sysmon Event ID 3
9 09_autoruns.csv 342 Windows.Sys.AutoRuns
10 10_logon_events.csv 1,604 Windows Security (4624/4625)
11 11_proxy_logs.csv 2,605 Web proxy / filter
12 12_file_listing.csv 421 Windows.Search.FileFinder

AUP Triggers Embedded

Category Examples
Gambling bet365.com, pokerstars.com, draftkings.com
Gaming steam.exe, discord.exe, steamcommunity.com, epicgameslauncher.exe
Streaming netflix.com, hulu.com, spotify.exe, open.spotify.com
Downloads / Piracy thepiratebay.org, 1337x.to, utorrent.exe, qbittorrent.exe, free_movie_2026.torrent
Adult Content pornhub.com, onlyfans.com, xvideos.com
Social Media facebook.com, tiktok.com, reddit.com
Job Search indeed.com, glassdoor.com, linkedin.com/jobs
Shopping amazon.com, ebay.com, shein.com

AUP keywords appear in: DNS queries (~12%), browser history (~15%), proxy logs (~10%), process listing (~15% of hosts), autoruns (~20% of workstations), sysmon network (~10%), file listing (crack_photoshop.exe, keygen_v2.exe, free_movie_2026.torrent).

Files Added

File Purpose
backend/tests/generate_test_csvs.py New — deterministic CSV generator script
backend/tests/test_csvs/*.csv (×12) New — mock Velociraptor test data

11. Phase 8 — Analyzer Framework & Alerts (February 20, 2026)

What It Does

Automated alerting engine with 6 built-in analyzers that scan dataset rows for suspicious activity and produce scored, MITRE-tagged alerts. Full alert lifecycle (new → acknowledged → in-progress → resolved / false-positive), bulk operations, and configurable alert rules.

Backend

New servicebackend/app/services/analyzers.py (~350 lines):

  • BaseAnalyzer ABC with name, description, and async analyze(rows) method
  • 6 built-in analyzers:
    • EntropyAnalyzer — Shannon entropy on command_line/url/path fields, threshold 4.5, maps to T1027 (Obfuscated Files or Information)
    • SuspiciousCommandAnalyzer — 19 regex patterns (mimikatz, encoded PowerShell, schtasks, psexec, vssadmin, etc.) with per-pattern MITRE mappings
    • NetworkAnomalyAnalyzer — Beaconing detection (dst IP frequency), suspicious ports (4444, 5555, etc.), large transfers (>10 MB), maps to T1071/T1048/T1571
    • FrequencyAnomalyAnalyzer — Flags values <1% occurrence in process_name/username/event_type fields
    • AuthAnomalyAnalyzer — Brute force (>5 failed logins per user), unusual logon types (3=network, 10=RDP), maps to T1110/T1021
    • PersistenceAnalyzer — Registry Run keys, services, Winlogon, IFEO patterns, maps to T1547/T1543/T1546
  • run_all_analyzers(rows, analyzers?) — runs all or selected analyzers, returns sorted AlertCandidate list

New API routebackend/app/api/routes/alerts.py (~300 lines):

  • GET /api/alerts — list with filters (status, severity, analyzer, hunt_id, dataset_id)
  • GET /api/alerts/stats — severity/status/analyzer/MITRE breakdowns
  • GET /api/alerts/{id}, PUT /api/alerts/{id}, DELETE /api/alerts/{id}
  • POST /api/alerts/bulk-update — bulk status changes on selected alert IDs
  • GET /api/alerts/analyzers/list — available analyzer metadata
  • POST /api/alerts/analyze — run analyzers on a dataset/hunt, auto-creates Alert records
  • Alert Rules CRUD: GET /api/alerts/rules/list, POST /api/alerts/rules, PUT /api/alerts/rules/{id}, DELETE /api/alerts/rules/{id}

New ORM modelsbackend/app/db/models.py:

  • Alert — 18 fields (id, title, description, severity, status, analyzer, score, evidence JSON, mitre_technique, tags JSON, hunt_id FK, dataset_id FK, case_id FK, assignee, acknowledged_at, resolved_at, timestamps; indexes on severity/status/hunt/dataset)
  • AlertRule — 10 fields (id, name, description, analyzer, config JSON, severity_override, enabled, hunt_id FK, timestamps; index on analyzer)

Alembic migrationbackend/alembic/versions/b4c2d3e5f6a7_add_alerts_and_alert_rules.py

Frontend

New componentfrontend/src/components/AlertPanel.tsx (~350 lines):

  • Alerts tab: MUI DataGrid with severity/status chips, score, MITRE technique; checkbox selection for bulk actions (acknowledge / resolve / false-positive); click → detail dialog with evidence JSON viewer
  • Stats tab: Cards for total, per-severity counts, status/analyzer breakdowns, top MITRE techniques
  • Rules tab: Rule cards with enable/disable switch, delete; create rule dialog with analyzer selector
  • Selector bar: Hunt + Dataset pickers, "Run Analyzers" button, severity/status filters

API clientfrontend/src/api/client.ts:

  • Added AlertData, AlertStats, AlertRuleData, AnalyzerInfo, AnalyzeResult interfaces
  • Added alerts namespace with full CRUD + analyze + rules methods

App integrationfrontend/src/App.tsx:

  • New nav item: Alerts with NotificationsActiveIcon, route /alerts

Files

File Change
backend/app/services/analyzers.py New — 6 analyzers + runner
backend/app/api/routes/alerts.py New — alerts API (CRUD, stats, bulk, rules)
backend/app/db/models.py Added Alert + AlertRule models
backend/alembic/versions/b4c2d3e5f6a7_...py New — alerts migration
frontend/src/components/AlertPanel.tsx New — alerts dashboard
frontend/src/api/client.ts Added alert types + alerts namespace
frontend/src/App.tsx Added Alerts nav + route

12. Phase 9 — Investigation Notebooks & Playbooks (February 20, 2026)

What It Does

Cell-based investigation notebooks (markdown / query / code) for documenting hunts, plus a playbook engine with 4 built-in response templates and step-by-step execution tracking.

Backend

New servicebackend/app/services/playbook.py (~250 lines):

  • NotebookCell dataclass + validate_notebook_cells() helper
  • 4 built-in playbook templates:
    • Suspicious Process Investigation (6 steps): identify → process tree → network → analyzers → MITRE → document
    • Lateral Movement Hunt (5 steps): remote tools → auth anomaly → network anomaly → knowledge graph → escalate
    • Data Exfiltration Check (5 steps): large transfers → DNS → timeline → correlate → MITRE + document
    • Ransomware Triage (5 steps): indicators search → all analyzers → persistence → LLM deep → critical case
  • Each step: order, title, description, action, action_config, expected_outcome

New API routebackend/app/api/routes/notebooks.py (~280 lines):

  • Notebook CRUD: GET /api/notebooks, GET /api/notebooks/{id}, POST /api/notebooks, PUT /api/notebooks/{id}, DELETE /api/notebooks/{id}
  • Cell operations: POST /api/notebooks/{id}/cells/upsert, DELETE /api/notebooks/{id}/cells/{cell_id}
  • Playbook endpoints: GET /api/notebooks/playbooks/templates, GET /api/notebooks/playbooks/templates/{name}, POST /api/notebooks/playbooks/start, GET /api/notebooks/playbooks/runs, GET /api/notebooks/playbooks/runs/{id}, POST /api/notebooks/playbooks/runs/{id}/complete-step, POST /api/notebooks/playbooks/runs/{id}/abort

New ORM modelsbackend/app/db/models.py:

  • Notebook — 10 fields (id, title, description, cells JSON, hunt_id FK, case_id FK, owner_id FK, tags JSON, timestamps; index on hunt_id)
  • PlaybookRun — 12 fields (id, playbook_name, status, current_step, total_steps, step_results JSON, hunt_id FK, case_id FK, started_by, timestamps, completed_at; indexes on hunt_id/status)

Alembic migrationbackend/alembic/versions/c5d3e4f6a7b8_add_notebooks_and_playbook_runs.py

Frontend

New componentfrontend/src/components/InvestigationNotebook.tsx (~280 lines):

  • List view: Grid of notebook cards with title, description, cell count, tags, open/delete
  • Detail view: Cell-based editor with markdown/query/code cell types
  • Markdown cells render via ReactMarkdown + remarkGfm; code/query cells as monospace pre blocks
  • Edit mode: multiline TextField with Ctrl+S save shortcut
  • Cell operations: add (markdown/query/code), edit, save, delete, move up/down

New componentfrontend/src/components/PlaybookManager.tsx (~280 lines):

  • Templates tab: Grid of template cards with category chips, tags, step count; view detail (MUI Stepper showing all steps) or start new run
  • Runs tab: List of active/completed runs with progress bars
  • Active run view: Vertical MUI Stepper with step completion, notes field, skip/abort, LinearProgress

API clientfrontend/src/api/client.ts:

  • Added NotebookCell, NotebookData, PlaybookTemplate, PlaybookStep, PlaybookTemplateDetail, PlaybookRunData interfaces
  • Added notebooks namespace (CRUD + cell ops) and playbooks namespace (templates, runs, step-complete, abort)

App integrationfrontend/src/App.tsx:

  • New nav items: Notebooks (MenuBookIcon, /notebooks) and Playbooks (PlaylistPlayIcon, /playbooks)

Files

File Change
backend/app/services/playbook.py New — 4 playbook templates + cell validation
backend/app/api/routes/notebooks.py New — notebooks + playbooks API
backend/app/db/models.py Added Notebook + PlaybookRun models
backend/alembic/versions/c5d3e4f6a7b8_...py New — notebooks migration
frontend/src/components/InvestigationNotebook.tsx New — cell-based notebook editor
frontend/src/components/PlaybookManager.tsx New — playbook template browser + run stepper
frontend/src/api/client.ts Added notebook + playbook types + namespaces
frontend/src/App.tsx Added Notebooks + Playbooks nav + routes

13. Docker Build Fixes (February 20, 2026)

Several issues were resolved to get the full 22-page frontend and updated backend deployed in Docker Compose.

npm / TypeScript Fixes

Issue Fix
npm ci failing: "Missing: yaml@2.8.2 from lock file" Installed yaml@2postcss-load-config (from tailwindcss) required ^2.4.2 but only 1.10.2 was present
GridRowSelectionModel cast to string[] MUI X DataGrid v8 changed to { type, ids: Set } — fixed with Array.from(model.ids)
Recharts formatter type mismatch Changed explicit number/string params to any in Dashboard.tsx
Missing declarations for cytoscape-dagre / cytoscape-cola Created frontend/src/declarations.d.ts
Missing HuntOut type export Added export type HuntOut = Hunt alias in client.ts
cytoscape.Stylesheet no longer exists Renamed to cytoscape.StylesheetStyle in ProcessTree, StorylineGraph, KnowledgeGraph

Backend Startup Fixes

Issue Fix
init_db() crash: "table cases already exists" Rewrote to inspect existing tables and only create missing ones
Alembic migration replay on startup DB had all tables (from init_db()) but Alembic was stamped at 98ab619418bc — ran alembic stamp head to sync to c5d3e4f6a7b8

Files Modified

File Change
frontend/package.json / package-lock.json Added yaml@2 dependency
frontend/src/declarations.d.ts New — ambient module declarations
frontend/src/api/client.ts Added HuntOut type alias
frontend/src/components/AlertPanel.tsx Fixed GridRowSelectionModel handling
frontend/src/components/Dashboard.tsx Fixed Recharts formatter types
frontend/src/components/ProcessTree.tsx StylesheetStylesheetStyle
frontend/src/components/StorylineGraph.tsx StylesheetStylesheetStyle
frontend/src/components/KnowledgeGraph.tsx StylesheetStylesheetStyle
backend/app/db/engine.py Safe init_db() with inspector

Current Navigation (22 pages)

Dashboard · Hunts · Datasets · Upload · Agent · Analysis · Annotations · Hypotheses · Correlation · Network Map · Net Picture · Proc Tree · Storyline · Timeline · Search · MITRE Map · Knowledge · Cases · Alerts · Notebooks · Playbooks · AUP Scanner

Deployment

docker compose build
docker compose up -d
# Backend: http://localhost:8000 (healthy)
# Frontend: http://localhost:3000 (200 OK)

Alembic Migration Chain

9790f482da06  (initial schema)
    ↓
98ab619418bc  (keyword themes + keywords)
    ↓
a3b1c2d4e5f6  (cases + activity logs)
    ↓
b4c2d3e5f6a7  (alerts + alert rules)
    ↓
c5d3e4f6a7b8  (notebooks + playbook runs)  ← HEAD

14. Process Tree — HTTP 500 Fix (February 20, 2026)

Request: "process tree: HTTP 500"

Root Cause: MissingGreenlet error in _fetch_rows — async SQLAlchemy lazy-loading the dataset relationship on DatasetRow outside a greenlet context.

Fix (backend/app/services/process_tree.py):

  • Added selectinload(DatasetRow.dataset) to the _fetch_rows query so the relationship is eagerly loaded within the async session
  • Endpoint now returns data correctly (verified: 3,908 processes for full hunt)

15. Process Tree — UX Rewrite (February 20, 2026)

Request: "proctree view is horrible… there are 3908 processes and I can't see / zoom in to see and when I do click on one the resolution changes to show the data on the right. Ideally I'd like another dropdown to pick a host."

Complete rewrite of frontend/src/components/ProcessTree.tsx (~520 lines):

  • Host dropdown: Autocomplete populated by extracting unique hostnames from tree data (82 hosts)
  • Server-side hostname filtering: API ?hostname= param reduces returned data per view
  • Grid layout fallback: When edges < 10% of nodes (e.g. test data with no parent-child relationships), uses grid layout instead of dagre
  • Overlay detail panel: Absolute-positioned panel on click — no longer reflows the graph
  • ResizeObserver: Keeps Cytoscape canvas in sync with container size changes
  • Search/highlight: Filter processes by name or PID
  • Zoom controls: Fit, zoom-in, zoom-out buttons

16. Process Tree — White Screen Crash Fix (February 20, 2026)

Request: "I picked a different host and the screen went white"

Root Cause: Cytoscape's destroy() removed <canvas> elements that were children of the same DOM node React was managing, causing NotFoundError: Failed to execute 'removeChild' on 'Node' when React tried to reconcile.

Fix (frontend/src/components/ProcessTree.tsx):

  • Separated Cytoscape container into a plain <div> with zero React children
  • Loading spinner and placeholder text are now sibling overlays (not children of the Cytoscape div)
  • Added explicit cyRef.current = null after destroy()
  • Cleanup on unmount prevents stale references
  • Verified: switching between hosts (DC-01 → EXEC-WS-039 → ENG-WS-052) works with 0 console errors

17. LLM Analysis — Response Parsing Fix (February 20, 2026)

Request: "llm analysis: HTTP 500 — searching for evidence of adult site access"

Root Cause Chain:

  1. AttributeError: 'dict' object has no attribute 'strip'OllamaProvider.generate() returns a dict ({"response": "<llm text>", "model": ..., ...}), but _parse_analysis expected a string.

  2. Wrong dict key extraction — Initial dict-handling fix looked for raw.get("analysis") but Ollama's /api/generate endpoint puts the LLM text in the "response" key.

  3. JSON parsing failures — The 70B model sometimes produces slightly malformed JSON (trailing commas, etc.) that json.loads rejects.

Fixes (backend/app/services/llm_analysis.py):

Change Detail
Dict response extraction raw.get("response") instead of raw.get("analysis") — correctly extracts LLM text from Ollama API dict
Robust JSON parsing New _extract_json_candidates() generator: tries full text, then outermost {…} block, then trailing-comma-fixed variant
Reduced prompt size max_sample 50 → 20 rows, max_chars 8000 → 6000
Reduced row limit /llm-analyze endpoint loads 2000 rows (was 5000)
Timeout asyncio.wait_for(..., timeout=300) wraps the LLM call — returns graceful "timed out" message instead of hanging
Debug logging _parse_analysis logs extraction source, text length, parse success/failure with error details

Results:

  • Quick mode (llama3.1:latest on roadrunner): 18s, confidence 0.85, risk score 65, 3 key findings — all structured fields populated
  • Deep mode (llama3.1:70b on wile): 128185s, full analysis returned — JSON parsing succeeds when model produces valid JSON, falls back to plain text otherwise

18. Version Bump — 0.3.0 → 0.4.0 (February 20, 2026)

File Change
backend/app/config.py APP_VERSION 0.3.0 → 0.4.0
frontend/package.json version 0.1.0 → 0.4.0

Files Modified This Session (Sections 1418)

File Change
backend/app/services/process_tree.py Added selectinload(DatasetRow.dataset) to _fetch_rows
frontend/src/components/ProcessTree.tsx Complete rewrite: host dropdown, grid layout, overlay panel, Cytoscape DOM separation
backend/app/services/llm_analysis.py Fixed dict response extraction, robust JSON parsing, reduced prompt, added timeout
backend/app/api/routes/analysis.py Row limit 5000 → 2000
backend/app/config.py Version 0.3.0 → 0.4.0
frontend/package.json Version 0.1.0 → 0.4.0

Deployment

docker compose up -d --build backend
# Backend: http://localhost:8000 (healthy, v0.4.0)
# Frontend: http://localhost:3000 (200 OK)