# ThreatHunt - Analyst-Assist Threat Hunting Platform

A modern threat hunting platform with integrated analyst-assist agent guidance. Analyze CSV artifact data exported from Velociraptor with AI-powered suggestions for investigation directions, analytical pivots, and hypothesis formation.

## Overview

ThreatHunt is a web application designed to help security analysts efficiently hunt for threats by:

- Importing CSV artifacts from Velociraptor or other sources
- Displaying data in an organized, queryable interface
- Providing AI-powered guidance through an analyst-assist agent
- Suggesting analytical directions, filters, and pivots
- Highlighting anomalies and patterns of interest

> **Agent Policy**: The analyst-assist agent provides read-only guidance only. It does not execute actions, escalate alerts, or modify data. All decisions remain with the analyst.

## Quick Start

### Docker (Recommended)

```bash
# Clone and navigate
git clone https://github.com/mblanke/ThreatHunt.git
cd ThreatHunt

# Configure provider (choose one)
cp .env.example .env
# Edit .env and set your LLM provider:

# Option 1: Online (OpenAI, etc.)
# THREAT_HUNT_AGENT_PROVIDER=online
# THREAT_HUNT_ONLINE_API_KEY=sk-your-key

# Option 2: Local (Ollama, GGML, etc.)
# THREAT_HUNT_AGENT_PROVIDER=local
# THREAT_HUNT_LOCAL_MODEL_PATH=/path/to/model

# Option 3: Networked (internal inference service)
# THREAT_HUNT_AGENT_PROVIDER=networked
# THREAT_HUNT_NETWORKED_ENDPOINT=http://service:5000

# Start services
docker-compose up -d

# Verify
curl http://localhost:8000/api/agent/health
curl http://localhost:3000
```

Access at http://localhost:3000

### Local Development

**Backend**:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure provider
export THREAT_HUNT_ONLINE_API_KEY=sk-your-key  # OR set another provider env var

# Run
python run.py
# API at http://localhost:8000/docs
```

**Frontend** (new terminal):

```bash
cd frontend
npm install
npm start
# App at http://localhost:3000
```

## Features

### Analyst-Assist Agent 🤖

- **Read-only guidance**: Explains data patterns and suggests investigation directions
- **Context-aware**: Understands current dataset, host, and artifact type
- **Pluggable providers**: Local, networked, or online LLM backends
- **Transparent reasoning**: Explains logic with caveats and confidence scores
- **Governance-compliant**: Strictly adheres to agent policy (no execution, no escalation)

### Chat Interface

- Analyst asks questions about artifact data
- Agent provides guidance with suggested pivots and filters
- Conversation history for context continuity
- Real-time typing and response indicators

### Data Management

- Import CSV artifacts from Velociraptor
- Browse and filter findings by severity, host, artifact type
- Annotate findings with analyst notes
- Track investigation progress

## Architecture

### Backend

- **Framework**: FastAPI (Python 3.11)
- **Agent Module**: Pluggable LLM provider interface
- **API**: RESTful endpoints with OpenAPI documentation
- **Structure**: Modular design with clear separation of concerns

### Frontend

- **Framework**: React 18 with TypeScript
- **Components**: Agent chat panel + analysis dashboard
- **Styling**: CSS with responsive design
- **State Management**: React hooks + Context API

### LLM Providers

Supports three provider architectures:

1. **Local**: On-device or on-prem models (GGML, Ollama, vLLM)
2. **Networked**: Shared internal inference services
3. **Online**: External hosted APIs (OpenAI, Anthropic, Google)

Auto-detection: automatically uses the first available provider.

## Project Structure

```
ThreatHunt/
├── backend/
│   ├── app/
│   │   ├── agents/              # Analyst-assist agent
│   │   │   ├── core.py          # ThreatHuntAgent class
│   │   │   ├── providers.py     # LLM provider interface
│   │   │   ├── config.py        # Configuration
│   │   │   └── __init__.py
│   │   ├── api/routes/          # API endpoints
│   │   │   ├── agent.py         # /api/agent/* routes
│   │   │   └── __init__.py
│   │   ├── main.py              # FastAPI app
│   │   └── __init__.py
│   ├── requirements.txt
│   ├── run.py
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── AgentPanel.tsx   # Chat interface
│   │   │   └── AgentPanel.css
│   │   ├── utils/
│   │   │   └── agentApi.ts      # API communication
│   │   ├── App.tsx
│   │   ├── App.css
│   │   ├── index.tsx
│   │   └── index.css
│   ├── public/index.html
│   ├── package.json
│   ├── tsconfig.json
│   └── Dockerfile
├── docker-compose.yml
├── .env.example
├── .gitignore
├── AGENT_IMPLEMENTATION.md      # Technical guide
├── INTEGRATION_GUIDE.md         # Deployment guide
├── IMPLEMENTATION_SUMMARY.md    # Overview
├── README.md                    # This file
├── ROADMAP.md
└── THREATHUNT_INTENT.md
```

## API Endpoints

### Agent Assistance

- **POST /api/agent/assist** - Request guidance on artifact data
- **GET /api/agent/health** - Check agent availability

See full API documentation at http://localhost:8000/docs

## Configuration

### LLM Provider Selection

Set via the `THREAT_HUNT_AGENT_PROVIDER` environment variable:

```bash
# Auto-detect (tries local → networked → online)
THREAT_HUNT_AGENT_PROVIDER=auto

# Local (on-device/on-prem)
THREAT_HUNT_AGENT_PROVIDER=local
THREAT_HUNT_LOCAL_MODEL_PATH=/models/model.gguf

# Networked (internal service)
THREAT_HUNT_AGENT_PROVIDER=networked
THREAT_HUNT_NETWORKED_ENDPOINT=http://inference:5000
THREAT_HUNT_NETWORKED_KEY=api-key

# Online (hosted API)
THREAT_HUNT_AGENT_PROVIDER=online
THREAT_HUNT_ONLINE_API_KEY=sk-your-key
THREAT_HUNT_ONLINE_PROVIDER=openai
THREAT_HUNT_ONLINE_MODEL=gpt-3.5-turbo
```

### Agent Behavior

```bash
THREAT_HUNT_AGENT_MAX_TOKENS=1024
THREAT_HUNT_AGENT_REASONING=true
THREAT_HUNT_AGENT_HISTORY_LENGTH=10
THREAT_HUNT_AGENT_FILTER_SENSITIVE=true
```

See `.env.example` for all configuration options.

## Governance & Compliance

This implementation strictly follows governance principles:

- ✅ **Agents assist analysts** - No autonomous execution
- ✅ **No tool execution** - Agent provides guidance only
- ✅ **No alert escalation** - Analyst controls alerts
- ✅ **No data modification** - Read-only analysis
- ✅ **Transparent reasoning** - Explains guidance with caveats
- ✅ **Analyst authority** - All decisions remain with analyst

**References**:

- `goose-core/governance/AGENT_POLICY.md`
- `goose-core/governance/AI_RULES.md`
- `THREATHUNT_INTENT.md`

## Documentation

- **[AGENT_IMPLEMENTATION.md](AGENT_IMPLEMENTATION.md)** - Detailed technical architecture
- **[INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md)** - Deployment and configuration
- **[IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md)** - Feature overview

## Testing the Agent

### Check Health

```bash
curl http://localhost:8000/api/agent/health
```

### Test API

```bash
curl -X POST http://localhost:8000/api/agent/assist \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What patterns suggest suspicious activity?",
    "dataset_name": "FileList",
    "artifact_type": "FileList",
    "host_identifier": "DESKTOP-ABC123"
  }'
```

### Use UI

1. Open http://localhost:3000
2. Enter a question in the agent panel
3. View guidance with suggested pivots and filters

## Troubleshooting

### Agent Unavailable (503)

- Check environment variables for provider configuration
- Verify the LLM provider is accessible
- See logs: `docker-compose logs backend`

### No Frontend Response

- Verify backend health: `curl http://localhost:8000/api/agent/health`
- Check the browser console for errors
- See logs: `docker-compose logs frontend`

See [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md) for detailed troubleshooting.

## Development

### Running Tests

```bash
cd backend
pytest

cd ../frontend
npm test
```

### Building Images

```bash
docker-compose build
```

### Logs

```bash
docker-compose logs -f backend
docker-compose logs -f frontend
```

## Security Notes

For production deployment:

1. Add authentication to API endpoints
2. Enable HTTPS/TLS
3. Implement rate limiting
4. Filter sensitive data before sending it to the LLM
5. Add audit logging
6. Use secrets management for API keys

See [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md#security-notes) for details.

## Future Enhancements

- [ ] Integration with actual CVE databases
- [ ] Fine-tuned models for the cybersecurity domain
- [ ] Structured output from LLMs (JSON mode)
- [ ] Feedback loop on guidance quality
- [ ] Multi-modal support (images, documents)
- [ ] Compliance reporting and audit trails
- [ ] Performance optimization and caching

## Contributing

Follow the architecture and governance principles in `goose-core`. All changes must:

- Adhere to agent policy (read-only, advisory only)
- Conform to shared terminology in goose-core
- Include appropriate documentation
- Pass tests and lint checks

## License

See the LICENSE file.

## Support

For issues or questions:

1. Check [INTEGRATION_GUIDE.md](INTEGRATION_GUIDE.md)
2. Review [AGENT_IMPLEMENTATION.md](AGENT_IMPLEMENTATION.md)
3. See API docs at http://localhost:8000/docs
4. Check backend logs for errors

## Getting Started

### Prerequisites

- Docker and Docker Compose
- Python 3.11+ (for local development)
- Node.js 18+ (for local development)

### Quick Start with Docker

1. Clone the repository:

   ```bash
   git clone https://github.com/mblanke/ThreatHunt.git
   cd ThreatHunt
   ```

2. Start all services:

   ```bash
   docker-compose up -d
   ```

3. Access the application:
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs

### Local Development

#### Backend

```bash
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your settings

# Run migrations
alembic upgrade head

# Start development server
uvicorn app.main:app --reload
```

#### Frontend

```bash
cd frontend
npm install
npm start
```

## API Endpoints

### Authentication

- `POST /api/auth/register` - Register a new user
- `POST /api/auth/login` - Log in and receive a JWT token
- `GET /api/auth/me` - Get current user profile
- `PUT /api/auth/me` - Update current user profile

### User Management (Admin only)

- `GET /api/users` - List all users in tenant
- `GET /api/users/{user_id}` - Get user by ID
- `PUT /api/users/{user_id}` - Update user
- `DELETE /api/users/{user_id}` - Deactivate user

### Tenants

- `GET /api/tenants` - List tenants
- `POST /api/tenants` - Create tenant (admin)
- `GET /api/tenants/{tenant_id}` - Get tenant by ID

### Hosts

- `GET /api/hosts` - List hosts (scoped to tenant)
- `POST /api/hosts` - Create host
- `GET /api/hosts/{host_id}` - Get host by ID

### Ingestion

- `POST /api/ingestion/ingest` - Upload and parse CSV files exported from Velociraptor

### VirusTotal

- `POST /api/vt/lookup` - Look up a hash in VirusTotal

## Authentication Flow

1. User registers or logs in via `/api/auth/login`
2. Backend returns a JWT token with user_id, tenant_id, and role
3. Frontend stores the token in localStorage
4. All subsequent API requests include the token in the Authorization header
5. Backend validates the token and enforces tenant scoping

## Multi-Tenancy

- All data is scoped to tenant_id
- Users can only access data within their tenant
- Admin users have elevated permissions within their tenant
- Cross-tenant access requires explicit permissions

## Database Migrations

Create a new migration:

```bash
cd backend
alembic revision --autogenerate -m "Description of changes"
```

Apply migrations:

```bash
alembic upgrade head
```

Roll back the most recent migration:

```bash
alembic downgrade -1
```

## Environment Variables

### Backend

- `DATABASE_URL` - PostgreSQL connection string
- `SECRET_KEY` - Secret key for JWT signing (min 32 characters)
- `ACCESS_TOKEN_EXPIRE_MINUTES` - JWT token expiration time (default: 30)
- `VT_API_KEY` - VirusTotal API key for hash lookups

### Frontend

- `REACT_APP_API_URL` - Backend API URL (default: http://localhost:8000)

## Security

- Passwords are hashed using bcrypt
- JWT tokens include an expiration time
- All API endpoints (except login/register) require authentication
- Role-based access control for admin operations
- Data isolation through tenant scoping

## Testing

### Backend

```bash
cd backend
pytest
```

### Frontend

```bash
cd frontend
npm test
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## License

[Your License Here]

## Support

For issues and questions, please open an issue on GitHub.
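
## Example: Authentication and Tenant Scoping (Sketch)

The Authentication Flow and Multi-Tenancy sections above can be illustrated end to end. The following is a minimal, standard-library-only sketch, not the project's actual implementation: the real backend would use a maintained JWT library, and the function names (`issue_token`, `decode_token`, `scoped_hosts`) and sample host rows here are hypothetical. Only the claim names (user_id, tenant_id, role) and the HS256/`SECRET_KEY` signing scheme come from the sections above.

```python
import base64
import hashlib
import hmac
import json
import time

# Placeholder secret; stands in for the SECRET_KEY env var (min 32 characters).
SECRET_KEY = "change-me-to-a-32-plus-char-secret!!"


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(user_id: str, tenant_id: str, role: str, ttl_s: int = 30 * 60) -> str:
    """Build an HS256 JWT carrying user_id, tenant_id, and role claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "user_id": user_id,
        "tenant_id": tenant_id,
        "role": role,
        "exp": int(time.time()) + ttl_s,
    }).encode())
    sig = _b64url(hmac.new(SECRET_KEY.encode(),
                           f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def decode_token(token: str) -> dict:
    """Verify the signature and expiry, then return the claims."""
    header, payload, sig = token.split(".")
    expected = _b64url(hmac.new(SECRET_KEY.encode(),
                                f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


def scoped_hosts(claims: dict, hosts: list) -> list:
    """Tenant scoping: only rows whose tenant_id matches the token are visible."""
    return [h for h in hosts if h["tenant_id"] == claims["tenant_id"]]


if __name__ == "__main__":
    hosts = [
        {"id": 1, "name": "DESKTOP-ABC123", "tenant_id": "t1"},
        {"id": 2, "name": "SRV-OTHER", "tenant_id": "t2"},
    ]
    claims = decode_token(issue_token("u1", "t1", "analyst"))
    print([h["name"] for h in scoped_hosts(claims, hosts)])  # ['DESKTOP-ABC123']
```

In the real backend the tenant filter would be enforced at the query layer (e.g. a `WHERE tenant_id = :tenant_id` clause) rather than over Python lists; the sketch only shows the relationship between the JWT claims and the scoping rule.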