commit 1fddc3574f579ca15e1ce6c0c29ef7ea17423a7d Author: mblanke Date: Mon Feb 2 14:12:33 2026 -0500 dev backbone template diff --git a/.claude/agents/architect-cyber.md b/.claude/agents/architect-cyber.md new file mode 100644 index 0000000..5305d92 --- /dev/null +++ b/.claude/agents/architect-cyber.md @@ -0,0 +1,150 @@ +--- +name: architect-cyber +description: > + Proactively design and govern architecture for cyber apps (Threat Hunt, Cyber Goose, Cyber Intel, PAD generator). + Use for new subsystems, major refactors, data model changes, auth/tenancy, and any new integration. +--- + +# Architect (Cyber) — playbook + +## Mission +Design a maintainable, secure, observable system that agents and humans can ship safely. + +You are not a code generator first — you are a *decision generator*: +- make boundaries crisp, +- make data flow explicit, +- make risks boring, +- make “done” measurable. + +## Inputs you should ask for (but don’t block if missing) +- Primary user + workflow (1–3 sentences) +- Constraints: offline/air-gapped? data sensitivity? deployment target? +- Existing stack choices (default if unknown): **Next.js + MUI portal**, **MCP tool spine**, **RAG for knowledge**, **DoD gates**. +- Integration list (APIs, DBs, agents/models, auth provider) + +## Outputs (always produce these) +1) **Architecture sketch** (components + responsibilities) +2) **Data flow** (what moves where, and why) +3) **Security posture** (threat model + guardrails) +4) **Operational plan** (logging/tracing/metrics + runbook basics) +5) **ADR(s)** for any non-trivial choice +6) **Implementation plan** (phased, with checkpoints) +7) **DoD gates** (tests + checks that define “done”) + +--- + +## Default architecture (use unless there’s a reason not to) + +### Layers +1) **Portal UI (MUI)** + - Deterministic workflows first; chat is secondary. + - Prefer “AI → JSON → UI” patterns for agent outputs when helpful. 
+ +2) **API service** + - Thin orchestration: auth, input validation, calling MCP tools, calling RAG. + - Keep business logic in well-tested modules, not in route handlers. + +3) **MCP Tool Spine (Cyber MCP Gateway)** + - Outcome-oriented tools (hunt.*, intel.*, pad.*). + - Flat args, strict schemas, pagination, clear errors. + - Progressive disclosure: safe tools by default; “dangerous” tools require explicit unlock. + +4) **RAG / Knowledge** + - Curated corpus + citations. + - Prefer incremental ingestion and quality metrics over “ingest everything”. + +5) **Storage** + - Postgres for app state (cases, runs, users, configs) + - Object storage for artifacts (uploads, exports) + - Vector DB for embeddings (can be Postgres+pgvector or dedicated, but keep the interface stable) + +--- + +## Process (follow this order) + +### 1) Clarify the “job” +- What is the user trying to accomplish in 2 minutes? +- What are the top 3 screens/actions? + +### 2) Identify trust boundaries +- What data is sensitive? +- Where is execution allowed? +- Who can trigger “run commands” or “touch infra”? + +### 3) Define domain objects (nouns) +For cyber tools, typical objects: +- Case, Evidence, Artifact, Indicator (IOC), Finding, Detection, Query, Run, Source, Confidence, Citation. + +### 4) Define pipelines (verbs) +Threat hunt pipeline default: +- ingest → normalize → enrich → index → query → analyze → report/export + +PAD pipeline default: +- collect inputs → outline → section drafts → compliance checks → citations → export + +### 5) Choose interfaces first +- MCP tool contracts (schemas + examples) +- API endpoints (if needed) +- UI component contracts (JSON render schema if used) + +### 6) Produce ADRs +Use small ADRs (one per decision). Include: +- context, decision, alternatives, consequences, reversibility. 
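The ADR fields listed in step 6 can be sketched as a small record type. This is only an illustration: the `ADR` class, its field names, the `to_markdown` helper, and the sample values are hypothetical and not part of this template.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    """Hypothetical shape for one architecture decision record."""

    number: int
    title: str
    context: str
    decision: str
    alternatives: list[str] = field(default_factory=list)
    consequences: list[str] = field(default_factory=list)
    reversibility: str = "unknown"

    def to_markdown(self) -> str:
        """Render the record as a short markdown document."""
        lines = [f"# ADR-{self.number:03d}: {self.title}", ""]
        lines += ["## Context", self.context, ""]
        lines += ["## Decision", self.decision, ""]
        lines += ["## Alternatives"]
        lines += [f"- {item}" for item in self.alternatives]
        lines += ["", "## Consequences"]
        lines += [f"- {item}" for item in self.consequences]
        lines += ["", "## Reversibility", self.reversibility]
        return "\n".join(lines)


# Illustrative values only; the decision shown is an assumed example.
example = ADR(
    number=1,
    title="Store embeddings in Postgres with pgvector",
    context="The app already uses Postgres for app state.",
    decision="Start with pgvector behind a stable vector-store interface.",
    alternatives=["Dedicated vector database"],
    consequences=["One fewer service to operate"],
    reversibility="Medium: swap the backend behind the interface.",
)
print(example.to_markdown())
```

Keeping one record per decision makes it easy to list ADRs in the final deliverable and to revisit a single choice without rereading the others.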
+ +### 7) Define DoD gates (non-negotiable) +Minimum: +- format + lint +- typecheck (TS) / static check (Python) +- unit tests for core logic +- integration test for at least one end-to-end happy path +- secret scanning / dependency scanning +- logging + trace correlation IDs on API requests + +--- + +## Cyber-specific checklists + +### Security checklist (minimum bar) +- AuthN + AuthZ: who can do what? +- Audit logging for privileged actions (exports, deletes, tool unlocks) +- Secrets: never in repo; use env/secret manager; rotate strategy +- Input validation everywhere (uploads, URLs, query params) +- Safe tool mode by default (read-only, limited scope) +- Clear “permission boundary” text in UI for destructive actions + +### Tool safety checklist (MCP) +- Tool scopes/roles (viewer, analyst, admin) +- Rate limits for expensive tools +- Deterministic error handling (no stack traces to client) +- Replayability: tool calls logged with inputs + outputs (redact secrets) + +### Observability checklist +- Structured logs (JSON) +- OpenTelemetry traces across: UI action → API → MCP tool → RAG +- Metrics: latency, error rate, token usage, cost/throughput, queue depth +- “Why did the agent do that?” debug trail (plan + tool calls + citations) + +### UX checklist (portal) +- Default to workflow pages (Cases, Evidence, Runs, Reports) +- Every AI output must be: editable, citeable, exportable +- Show confidence + sources when claiming facts +- One-click “Generate report artifact” and “Copy as markdown” + +--- + +## Red flags (stop and redesign) +- “One mega tool” that does everything +- Agents writing directly to prod databases +- No tests, no gates, but “agent says done” +- Unbounded ingestion (“let’s embed 40TB tonight”) +- No citations for knowledge-based answers + +--- + +## Final format (what you deliver to the team) +Provide, in order: +1) 1-page overview +2) Component diagram (text-based is fine) +3) ADR list +4) Phase plan with milestones +5) DoD gates checklist diff 
--git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..cab233c --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,9 @@ + +Follow `AGENTS.md` and `SKILLS.md`. + +Rules: +- Use the PLAN → IMPLEMENT → VERIFY → REVIEW loop. +- Keep model selection on Auto unless AGENTS.md role routing says to override for that role. +- Never claim "done" unless DoD passes (`./scripts/dod.sh` or `.\scripts\dod.ps1`). +- Keep diffs small and add/update tests when behavior changes. +- Prefer reproducible commands and cite sources for generated documents. diff --git a/.github/workflows/dod.yml b/.github/workflows/dod.yml new file mode 100644 index 0000000..7cc778d --- /dev/null +++ b/.github/workflows/dod.yml @@ -0,0 +1,17 @@ + +name: DoD Gate + +on: + pull_request: + push: + branches: [ main, master ] + +jobs: + dod: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run DoD + run: | + chmod +x ./scripts/dod.sh || true + ./scripts/dod.sh diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..0b88c05 --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ + +# profiling artifacts +profiles/ diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml new file mode 100644 index 0000000..a7b0805 --- /dev/null +++ b/.gitlab-ci.yml @@ -0,0 +1,15 @@ + +stages: [dod] + +dod: + stage: dod + image: ubuntu:24.04 + before_script: + - apt-get update -y + - apt-get install -y bash git ca-certificates curl python3 python3-pip nodejs npm jq + script: + - chmod +x ./scripts/dod.sh || true + - ./scripts/dod.sh + rules: + - if: $CI_PIPELINE_SOURCE == "merge_request_event" + - if: $CI_COMMIT_BRANCH diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..fb9c97a --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,78 @@ + +# Agent Operating System (Auto-first) + +Default: use **Auto** model selection. Only override the model when a role below says it is worth it. 
+ +## Prime directive +- Never claim "done" unless **DoD passes**. +- Prefer small, reviewable diffs. +- If requirements are unclear: stop and produce a PLAN + questions inside the plan. +- Agents talk; **DoD decides**. + +## Always follow this loop +1) PLAN: goal, constraints, assumptions, steps (≤10), files to touch, test plan. +2) IMPLEMENT: smallest correct change. +3) VERIFY: run DoD until green. +4) REVIEW: summarize changes, risks, next steps. + +## DoD Gate (Definition of Done) +Required before "done": +- macOS/Linux: `./scripts/dod.sh` +- Windows: `.\scripts\dod.ps1` + +If DoD cannot be run, say exactly why and what would be run. + +## Terminal agent workflow (Copilot CLI) +Preferred terminal assistant: GitHub Copilot CLI via `gh copilot`. + +Default loop: +1) Plan: draft plan + file list + test plan. +2) Build: implement smallest slice. +3) Steer: when stuck, ask for next action using current errors/logs. +4) Verify: run DoD until green. + +Rules: +- Keep diffs small. +- If the same error repeats twice, switch to Reviewer role and produce a fix plan. + +## Role routing (choose a role explicitly) + +### Planner +Use when: new feature, refactor, multi-file, uncertain scope. +Output: plan + acceptance criteria + risks + test plan. +Model: Auto (override to a “high reasoning / Codex” model only for complex design/debugging). + +### UI/UX Specialist +Use when: screens, layout, copy, design tradeoffs, component structure. +Output: component outline, UX notes, acceptance criteria. +Model: Auto (override to Gemini only when UI/UX is the main work). + +### Coder +Use when: writing/editing code, plumbing, tests, small refactors. +Rules: follow repo conventions; keep diff small; add/update tests when behavior changes. +Model: Auto (override to Claude Haiku only when speed matters and the change is well-scoped). + +### Reviewer +Use when: before merge, failing tests, risky changes, security-sensitive areas. 
+Output: concrete issues + recommended fixes + risk assessment + verification suggestions. +Model: Auto (override to the strongest available for high-stakes diffs). + +## Non-negotiables +- Do not expose secrets/tokens/keys. Never print env vars. +- No destructive commands unless explicitly required and narrowly scoped. +- Do not add new dependencies without stating why + impact + alternatives. +- Prefer deterministic, reproducible steps. +- Cite sources when generating documents from a knowledge base. + +## Repo facts (fill these in) +- Primary stack: +- Package manager: +- Test command: +- Lint/format command: +- Build command (if any): +- Deployment (if any): + + +## Claude Code Agents (optional) +- `.claude/agents/architect-cyber.md` — architecture + security + ops decisions for cyber apps. +- Add more agents in `.claude/agents/` as you standardize roles (reviewer, tester, security-lens). diff --git a/AGENT_LOG.md b/AGENT_LOG.md new file mode 100644 index 0000000..6d91e00 --- /dev/null +++ b/AGENT_LOG.md @@ -0,0 +1,13 @@ + +# Agent Handoff Log + +Use this file when multiple agents (or humans) are working in parallel. + +## Entry template +- Date/time: +- Branch/worktree: +- Role: (Planner / UI / Coder / Reviewer) +- What changed: +- Commands run + results: +- Current blockers: +- Next step: diff --git a/README.md b/README.md new file mode 100644 index 0000000..6191b06 --- /dev/null +++ b/README.md @@ -0,0 +1,32 @@ + +# Dev Backbone Template + +Drop-in baseline for: +- `AGENTS.md` + `SKILLS/` (consistent agent behavior) +- DoD gates (`scripts/dod.sh`, `scripts/dod.ps1`) +- Copilot instructions +- CI gates (GitHub Actions + GitLab CI) + +## Use for NEW projects +1) Make this repo a GitHub **Template repository**. +2) Create new repos from the template. 
+ +## Apply to EXISTING repos +Copy these into the repo root: +- AGENTS.md +- SKILLS.md +- SKILLS/ +- scripts/ +- .github/copilot-instructions.md +- (optional) .github/workflows/dod.yml +- (optional) .gitlab-ci.yml + +Commit and push. + +## Run DoD +- macOS/Linux: `./scripts/dod.sh` +- Windows: `.\scripts\dod.ps1` + +## Notes +- DoD scripts auto-detect Node/Python and run what exists. +- Customize per repo for extra checks (docker build, e2e, mypy, etc.). diff --git a/SKILLS.md b/SKILLS.md new file mode 100644 index 0000000..a42e375 --- /dev/null +++ b/SKILLS.md @@ -0,0 +1,23 @@ +# Skills Index + +These skill files define repeatable behaviors for agents and humans. + +Agents must follow them in this order: +1) SKILLS/00-operating-model.md +2) SKILLS/05-agent-taxonomy.md +3) SKILLS/10-definition-of-done.md +4) SKILLS/20-repo-map.md (use whenever unfamiliar with the repo) +5) SKILLS/25-algorithms-performance.md +6) SKILLS/26-vibe-coding-fundamentals.md +7) SKILLS/27-performance-profiling.md +8) SKILLS/30-implementation-rules.md +9) SKILLS/40-testing-quality.md +10) SKILLS/50-pr-review.md +11) SKILLS/56-ui-material-ui.md (for React/Next portal-style apps) +12) SKILLS/60-security-safety.md +13) SKILLS/70-docs-artifacts.md +14) SKILLS/82-mcp-server-design.md (when building MCP servers/tools) +15) SKILLS/83-fastmcp-3-patterns.md (if using FastMCP 3) +16) SKILLS/80-mcp-tools.md (if this repo has MCP tools) + +Rule: If anything conflicts, **AGENTS.md wins**. diff --git a/SKILLS/00-operating-model.md b/SKILLS/00-operating-model.md new file mode 100644 index 0000000..98d16b4 --- /dev/null +++ b/SKILLS/00-operating-model.md @@ -0,0 +1,21 @@ + +# Operating Model + +## Default cadence +- Prefer iterative progress over big bangs. +- Keep diffs small: target ≤ 300 changed lines per PR unless justified. +- Update tests/docs as part of the same change when possible. + +## Working agreement +- Start with a PLAN for non-trivial tasks. 
+- Implement the smallest slice that satisfies acceptance criteria. +- Verify via DoD. +- Write a crisp PR summary: what changed, why, and how verified. + +## Stop conditions (plan first) +Stop and produce a PLAN (do not code yet) if: +- scope is unclear +- more than 3 files will change +- data model changes +- auth/security boundaries +- performance-critical paths diff --git a/SKILLS/05-agent-taxonomy.md b/SKILLS/05-agent-taxonomy.md new file mode 100644 index 0000000..3c4a494 --- /dev/null +++ b/SKILLS/05-agent-taxonomy.md @@ -0,0 +1,36 @@ +# Agent Types & Roles (Practical Taxonomy) + +Use this skill to choose the *right* kind of agent workflow for the job. + +## Common agent “types” (in practice) + +### 1) Chat assistant (no tools) +Best for: explanations, brainstorming, small edits. +Risk: can hallucinate; no grounding in repo state. + +### 2) Tool-using single agent +Best for: well-scoped tasks where the agent can read/write files and run commands. +Key control: strict DoD gates + minimal permissions. + +### 3) Planner + Executor (2-role pattern) +Best for: medium complexity work (multi-file changes, feature work). +Flow: Planner writes plan + acceptance criteria → Executor implements → Reviewer checks. + +### 4) Multi-agent (specialists) +Best for: bigger features with separable workstreams (UI, backend, docs, tests). +Rule: isolate context per role; use separate branches/worktrees. + +### 5) Supervisor / orchestrator +Best for: long-running workflows with checkpoints (pipelines, report generation, PAD docs). +Rule: supervisor delegates, enforces gates, and composes final output. + +## Decision rules (fast) +- If you can describe it in ≤ 5 steps → single tool-using agent. +- If you need tradeoffs/design → Planner + Executor. +- If UI + backend + docs/tests all move → multi-agent specialists. +- If it’s a pipeline that runs repeatedly → orchestrator. + +## Guardrails (always) +- DoD is the truth gate. +- Separate branches/worktrees for parallel work. 
+- Log decisions + commands in AGENT_LOG.md. diff --git a/SKILLS/10-definition-of-done.md b/SKILLS/10-definition-of-done.md new file mode 100644 index 0000000..d99148e --- /dev/null +++ b/SKILLS/10-definition-of-done.md @@ -0,0 +1,24 @@ + +# Definition of Done (DoD) + +A change is "done" only when: + +## Code correctness +- Builds successfully (if applicable) +- Tests pass +- Linting/formatting passes +- Types/checks pass (if applicable) + +## Quality +- No new warnings introduced +- Edge cases handled (inputs validated, errors meaningful) +- Hot paths not regressed (if applicable) + +## Hygiene +- No secrets committed +- Docs updated if behavior or usage changed +- PR summary includes verification steps + +## Commands +- macOS/Linux: `./scripts/dod.sh` +- Windows: `.\scripts\dod.ps1` diff --git a/SKILLS/20-repo-map.md b/SKILLS/20-repo-map.md new file mode 100644 index 0000000..810f986 --- /dev/null +++ b/SKILLS/20-repo-map.md @@ -0,0 +1,16 @@ + +# Repo Mapping Skill + +When entering a repo: +1) Read README.md +2) Identify entrypoints (app main / server startup / CLI) +3) Identify config (env vars, .env.example, config files) +4) Identify test/lint scripts (package.json, pyproject.toml, Makefile, etc.) +5) Write a 10-line "repo map" in the PLAN before changing code + +Output format: +- Purpose: +- Key modules: +- Data flow: +- Commands: +- Risks: diff --git a/SKILLS/25-algorithms-performance.md b/SKILLS/25-algorithms-performance.md new file mode 100644 index 0000000..7a63fc7 --- /dev/null +++ b/SKILLS/25-algorithms-performance.md @@ -0,0 +1,20 @@ +# Algorithms & Performance + +Use this skill when performance matters (large inputs, hot paths, or repeated calls). + +## Checklist +- Identify the **state** you’re recomputing. +- Add **memoization / caching** when the same subproblem repeats. +- Prefer **linear scans** + caches over nested loops when possible. +- If you can write it as a **recurrence**, you can test it. 
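The checklist above can be illustrated with a minimal sketch. It assumes a toy problem (counting monotone grid paths) chosen only because its recurrence is short; the function names and the cache size are illustrative, not part of this template.

```python
from functools import lru_cache


def count_paths_naive(rows: int, cols: int) -> int:
    """Direct recurrence: paths(r, c) = paths(r - 1, c) + paths(r, c - 1)."""
    if rows == 1 or cols == 1:
        return 1  # base case: a single row or column has exactly one path
    return count_paths_naive(rows - 1, cols) + count_paths_naive(rows, cols - 1)


@lru_cache(maxsize=4096)  # bounded cache; the size is an arbitrary illustrative choice
def count_paths(rows: int, cols: int) -> int:
    """Same recurrence, but each (rows, cols) state is computed only once."""
    if rows == 1 or cols == 1:
        return 1
    return count_paths(rows - 1, cols) + count_paths(rows, cols - 1)


def check_against_recurrence(limit: int = 8) -> bool:
    """Compare the cached version with the naive recurrence on small inputs."""
    return all(
        count_paths(r, c) == count_paths_naive(r, c)
        for r in range(1, limit + 1)
        for c in range(1, limit + 1)
    )
```

Here the repeated state is the `(rows, cols)` pair. The naive version recomputes it exponentially often, while the cached version touches each pair once, and the naive recurrence doubles as a test oracle on small inputs.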
+ +## Practical heuristics +- Measure first when possible (timing + input sizes). +- Optimize the biggest wins: avoid repeated I/O, repeated parsing, repeated network calls. +- Keep caches bounded (size/TTL) and invalidate safely. +- Choose data structures intentionally: dict/set for membership, heap for top-k, deque for queues. + +## Review notes (for PRs) +- Call out accidental O(n²) patterns. +- Suggest table/DP or memoization when repeated work is obvious. +- Add tests that cover base cases + typical cases + worst-case size. diff --git a/SKILLS/26-vibe-coding-fundamentals.md b/SKILLS/26-vibe-coding-fundamentals.md new file mode 100644 index 0000000..0eb231e --- /dev/null +++ b/SKILLS/26-vibe-coding-fundamentals.md @@ -0,0 +1,31 @@ +# Vibe Coding With Fundamentals (Safety Rails) + +Use this skill when you’re using “vibe coding” (fast, conversational building) but want production-grade outcomes. + +## The good +- Rapid scaffolding and iteration +- Fast UI prototypes +- Quick exploration of architectures and options + +## The failure mode +- “It works on my machine” code with weak tests +- Security foot-guns (auth, input validation, secrets) +- Performance cliffs (accidental O(n²), repeated I/O) +- Unmaintainable abstractions + +## Safety rails (apply every time) +- Always start with acceptance criteria (what “done” means). +- Prefer small PRs; never dump a huge AI diff. +- Require DoD gates (lint/test/build) before merge. +- Write tests for behavior changes. +- For anything security/data related: do a Reviewer pass. 
+ +## When to slow down +- Auth/session/token work +- Anything touching payments, PII, secrets +- Data migrations/schema changes +- Performance-critical paths +- “It’s flaky” or “it only fails in CI” + +## Practical prompt pattern (use in PLAN) +- “State assumptions, list files to touch, propose tests, and include rollback steps.” diff --git a/SKILLS/27-performance-profiling.md b/SKILLS/27-performance-profiling.md new file mode 100644 index 0000000..6dc5504 --- /dev/null +++ b/SKILLS/27-performance-profiling.md @@ -0,0 +1,31 @@ +# Performance Profiling (Bun/Node) + +Use this skill when: +- a hot path feels slow +- CPU usage is high +- you suspect accidental O(n²) or repeated work +- you need evidence before optimizing + +## Bun CPU profiling +Bun supports CPU profiling via `--cpu-prof` (generates a `.cpuprofile` you can open in Chrome DevTools). + +Upcoming: `bun --cpu-prof-md