ThreatHunt/docs/AGENT_IMPLEMENTATION.md

# ThreatHunt Analyst-Assist Agent Implementation

## Overview

This implementation adds an analyst-assist agent to ThreatHunt that provides read-only guidance on CSV artifact data, analytical pivots, and hypotheses. The agent strictly adheres to the governance principles defined in `goose-core/governance/AGENT_POLICY.md`.

## Architecture

### Backend Stack
- **Framework**: FastAPI (Python 3.11)
- **Agent Module**: `backend/app/agents/`
  - `core.py`: ThreatHuntAgent class with guidance logic
  - `providers.py`: Pluggable LLM provider interface
  - `config.py`: Configuration management

### Frontend Stack
- **Framework**: React with TypeScript
- **Components**: AgentPanel chat interface
- **Styling**: CSS with responsive design

### API Endpoint
- **POST /api/agent/assist**: Request analyst guidance
- **GET /api/agent/health**: Check agent availability

## LLM Provider Architecture

The agent supports three provider types, selectable via configuration:

### 1. Local Provider
**Use Case**: On-device or on-premise models

Environment variables:
```bash
THREAT_HUNT_AGENT_PROVIDER=local
THREAT_HUNT_LOCAL_MODEL_PATH=/path/to/model.gguf
```

Supported frameworks:
- llama-cpp-python (GGML models)
- Ollama API
- vLLM
- Other local inference engines

### 2. Networked Provider
**Use Case**: Shared internal inference services

Environment variables:
```bash
THREAT_HUNT_AGENT_PROVIDER=networked
THREAT_HUNT_NETWORKED_ENDPOINT=http://inference-service:5000
THREAT_HUNT_NETWORKED_KEY=api-key-here
```

Supported architectures:
- Internal inference service API
- LLM inference container clusters
- Enterprise inference gateways

### 3. Online Provider
**Use Case**: External hosted APIs

Environment variables:
```bash
THREAT_HUNT_AGENT_PROVIDER=online
THREAT_HUNT_ONLINE_API_KEY=sk-your-api-key
THREAT_HUNT_ONLINE_PROVIDER=openai
THREAT_HUNT_ONLINE_MODEL=gpt-3.5-turbo
```

Supported providers:
- OpenAI (GPT-3.5, GPT-4)
- Anthropic Claude
- Google Gemini
- Other hosted LLM services

### Auto Provider Selection
Set `THREAT_HUNT_AGENT_PROVIDER=auto` to automatically use the first available provider:
1. Local (if model path exists)
2. Networked (if endpoint is configured)
3. Online (if API key is set)

## Backend Implementation

### Agent Request/Response Flow

**Request** (AgentContext):
```python
{
    "query": "What patterns suggest suspicious file modifications?",
    "dataset_name": "FileList-2025-12-26",
    "artifact_type": "FileList",
    "host_identifier": "DESKTOP-ABC123",
    "data_summary": "File listing from system scan",
    "conversation_history": [...]
}
```

**Response** (AgentResponse):
```python
{
    "guidance": "Based on the files listed, ...",
    "confidence": 0.8,
    "suggested_pivots": ["Analyze temporal patterns", "Cross-reference with IOCs"],
    "suggested_filters": ["Filter by modification time", "Sort by file size"],
    "caveats": "Guidance is based on available data context...",
    "reasoning": "Analysis generated based on patterns..."
}
```

### Governance Enforcement

The agent is designed with hard constraints to ensure compliance:

1. **Read-Only**: Agent accepts context data but cannot:
   - Execute tools or actions
   - Modify database or schema
   - Escalate findings to alerts
   - Access external systems

2. **Advisory Only**: All guidance is clearly marked as:
   - Suggestions, not directives
   - Confidence-rated
   - Accompanied by caveats
   - Attributed to the agent

3. **Analyst Control**: The UI emphasizes:
   - Agent provides guidance only
   - Analysts retain all decision-making authority
   - All next steps require analyst action

## Frontend Implementation

### AgentPanel Component

Located in `frontend/src/components/AgentPanel.tsx`:

**Features**:
- Chat-style interface for analyst questions
- Context display showing current dataset/host/artifact
- Rich response formatting with:
  - Main guidance text
  - Suggested analytical pivots (clickable)
  - Suggested data filters
  - Confidence scores
  - Caveats and assumptions
  - Reasoning explanation
- Conversation history for context
- Responsive design (desktop and mobile)
- Loading states and error handling

**Props**:
```typescript
interface AgentPanelProps {
  dataset_name?: string;
  artifact_type?: string;
  host_identifier?: string;
  data_summary?: string;
  onAnalysisAction?: (action: string) => void;
}
```

### Integration in Main UI

The agent panel is integrated into the main ThreatHunt dashboard as a sidebar component. In `App.tsx`:

1. Main analysis view occupies left side
2. Agent panel occupies right sidebar
3. Context automatically updated when analyst switches datasets/hosts
4. Responsive layout: stacks vertically on mobile

## Configuration

### Environment Variables

```bash
# Provider selection
THREAT_HUNT_AGENT_PROVIDER=auto              # auto, local, networked, or online

# Local provider
THREAT_HUNT_LOCAL_MODEL_PATH=/models/model.gguf

# Networked provider
THREAT_HUNT_NETWORKED_ENDPOINT=http://service:5000
THREAT_HUNT_NETWORKED_KEY=api-key

# Online provider
THREAT_HUNT_ONLINE_API_KEY=sk-key
THREAT_HUNT_ONLINE_PROVIDER=openai
THREAT_HUNT_ONLINE_MODEL=gpt-3.5-turbo

# Agent behavior
THREAT_HUNT_AGENT_MAX_TOKENS=1024
THREAT_HUNT_AGENT_REASONING=true
THREAT_HUNT_AGENT_HISTORY_LENGTH=10
THREAT_HUNT_AGENT_FILTER_SENSITIVE=true

# Frontend
REACT_APP_API_URL=http://localhost:8000
```

### Docker Deployment

Use `docker-compose.yml` for full stack deployment:

```bash
# Build and start services
docker-compose up -d

# Verify health
curl http://localhost:8000/api/agent/health
curl http://localhost:3000

# View logs
docker-compose logs -f backend
docker-compose logs -f frontend

# Stop services
docker-compose down
```

## Security Considerations

1. **API Access**: Backend should be protected with authentication in production
2. **LLM Privacy**: Sensitive data (IPs, usernames) should be filtered before sending to online providers
3. **Error Messages**: Production should use generic error messages, not expose internal details
4. **Rate Limiting**: Implement rate limiting on agent endpoints
5. **Conversation History**: Consider data retention policies for conversation logs

## Testing

### Manual Testing

1. **Agent Health**:
   ```bash
   curl http://localhost:8000/api/agent/health
   ```

2. **Agent Assistance** (without frontend):
   ```bash
   curl -X POST http://localhost:8000/api/agent/assist \
     -H "Content-Type: application/json" \
     -d '{
       "query": "What suspicious patterns do you see?",
       "dataset_name": "FileList",
       "artifact_type": "FileList",
       "host_identifier": "HOST123"
     }'
   ```

3. **Frontend UI**:
   - Navigate to http://localhost:3000
   - Type question in agent panel
   - Verify response displays correctly

## Future Enhancements

1. **Structured Output**: Use LLM JSON mode or function calling for more reliable parsing
2. **Context Filtering**: Automatically filter sensitive data before sending to LLM
3. **Multi-Modal**: Support image uploads (binary analysis, network diagrams)
4. **Caching**: Cache common agent responses to reduce latency
5. **Feedback Loop**: Capture analyst feedback on guidance quality
6. **Integration**: Connect agent to actual CVE databases, threat feeds
7. **Custom Models**: Support fine-tuned models for threat hunting domain
8. **Audit Trail**: Comprehensive logging of all agent interactions

## Governance Compliance

This implementation strictly follows:
- `goose-core/governance/AGENT_POLICY.md` - Agent boundaries and allowed functions
- `goose-core/governance/AI_RULES.md` - AI system rules
- `goose-core/governance/SCOPE.md` - Shared vs application-specific responsibility
- `ThreatHunt/THREATHUNT_INTENT.md` - Agent role in threat hunting

**Key Principles**:
- ✅ Agents assist analysts, never act autonomously
- ✅ No execution without explicit analyst approval
- ✅ No database or schema changes
- ✅ No alert escalation
- ✅ Read-only guidance
- ✅ Transparent reasoning and caveats
- ✅ Analyst retains all authority

## Troubleshooting

### Agent Unavailable (503)
- Check environment variables for provider configuration
- Verify LLM provider is accessible
- Review backend logs: `docker-compose logs backend`

### Slow Responses
- Check LLM provider latency
- Reduce MAX_TOKENS if appropriate
- Consider local provider for latency-sensitive deployments

### No Responses from Frontend
- Verify backend health: `curl http://localhost:8000/api/agent/health`
- Check browser console for errors
- Verify REACT_APP_API_URL in frontend environment
- Check CORS configuration if frontend hosted separately

## File Structure

```
ThreatHunt/
├── backend/
│   ├── app/
│   │   ├── agents/              # Agent module
│   │   │   ├── __init__.py
│   │   │   ├── core.py          # ThreatHuntAgent class
│   │   │   ├── providers.py     # LLM provider interface
│   │   │   └── config.py        # Agent configuration
│   │   ├── api/
│   │   │   ├── routes/
│   │   │   │   ├── __init__.py
│   │   │   │   └── agent.py     # /api/agent/* endpoints
│   │   ├── __init__.py
│   │   └── main.py              # FastAPI app
│   ├── requirements.txt
│   └── run.py
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── AgentPanel.tsx   # Agent chat component
│   │   │   └── AgentPanel.css
│   │   ├── utils/
│   │   │   └── agentApi.ts      # API communication
│   │   ├── App.tsx              # Main app with agent
│   │   ├── App.css
│   │   ├── index.tsx
│   ├── public/
│   │   └── index.html
│   ├── package.json
│   └── tsconfig.json
├── Dockerfile.backend
├── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
├── AGENT_IMPLEMENTATION.md       # This file
├── README.md
└── THREATHUNT_INTENT.md
```