LLM Integration

Seerflow uses an LLM as an auxiliary signal, never as a primary detector. Traditional ML and Sigma cover the high-volume, latency-sensitive paths; the LLM handles low-volume, high-value workflows where natural language helps:

| Service | Trigger | Output |
| --- | --- | --- |
| Alert explanation | Operator clicks Explain on an alert. | Markdown summary of the alert with contributing events, MITRE context, and remediation hints. |
| Natural-language hunt | `seerflow hunt "…"` / `POST /api/v1/hunt`. | An internal `EventQuery` translated from the prompt, executed against the event store. |
| Sigma rule suggestion | Pattern seeded by ≥ N TP feedbacks. | Draft Sigma YAML, validated by `pySigma` before display. |

All three services are opt-in and disabled by default; the core pipeline runs without ever calling the LLM.

Backend selection

Set the backend via llm.backend:

| Value | Provider | Use case |
| --- | --- | --- |
| `""` | (disabled) | Default; LLM features off. |
| `"llama_cpp"` | `llama-cpp-python` | Single-host, CPU/GPU, fully offline. |
| `"ollama"` | Local Ollama daemon (HTTP) | Shared host within the network, easy model swaps. |
| `"cloud"` | Anthropic / OpenAI SDK | Best quality, lowest latency; egress required. |

`llama_cpp`

llm:
  backend: llama_cpp
  model_path: /opt/models/phi-4-mini-instruct-q4.gguf
  n_ctx: 4096
  n_gpu_layers: 0      # set > 0 to offload to GPU

`ollama`

llm:
  backend: ollama
  ollama_url: http://ollama.internal:11434
  ollama_model: phi4-mini
  ollama_timeout_s: 8.0
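
As a rough illustration of how these settings could map onto Ollama's HTTP API, the sketch below builds a non-streaming request for the `/api/generate` endpoint. The `OllamaSettings` class and both helper functions are hypothetical names used only for this example; Seerflow's actual client code may be structured differently.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class OllamaSettings:
    """Hypothetical mirror of the llm.ollama_* keys shown above."""
    ollama_url: str = "http://ollama.internal:11434"
    ollama_model: str = "phi4-mini"
    ollama_timeout_s: float = 8.0


def build_generate_request(cfg: OllamaSettings, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": cfg.ollama_model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url=cfg.ollama_url.rstrip("/") + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(cfg: OllamaSettings, prompt: str) -> str:
    """Send the request; raises on network errors or when the timeout elapses."""
    req = build_generate_request(cfg, prompt)
    with urllib.request.urlopen(req, timeout=cfg.ollama_timeout_s) as resp:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from the network call makes the URL and payload easy to inspect without a running daemon.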

`cloud`

llm:
  backend: cloud
  cloud_provider: anthropic        # or "openai"
  cloud_model:    claude-haiku-4-5-20251001
  cloud_api_key:  ${ANTHROPIC_API_KEY}
  cloud_timeout_s: 8.0

All knobs are documented in Config Reference → LLM.

Alert explanation

When the dashboard requests an explanation, Seerflow:

1. Looks up the alert and a bounded number of contributing events (llm.explanation_max_contributing_events, default 8).
2. Builds a prompt capped at llm.explanation_max_prompt_chars.
3. Calls the backend with llm.explanation_timeout_s (default 12 s).
4. Caches the result in a per-process LRU (llm.explanation_cache_size / llm.explanation_cache_ttl_s).

Subsequent GET /api/v1/alerts/{id}/explanation calls return the cached entry until TTL expiry.
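
The caching behaviour described here (a size-bounded LRU whose entries also expire after a TTL) can be sketched as follows. This is a minimal illustration, not Seerflow's implementation: the `TTLLRUCache` class and its method names are assumptions, with `max_size` and `ttl_s` standing in for llm.explanation_cache_size and llm.explanation_cache_ttl_s.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLLRUCache:
    """Per-process LRU cache whose entries also expire after ttl_s seconds."""

    def __init__(self, max_size: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._entries: "OrderedDict[str, tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl_s:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used


# Usage: key by alert ID, store the rendered Markdown explanation.
cache = TTLLRUCache(max_size=256, ttl_s=900.0)
cache.put("alert-123", "## Summary\n...")
```

The sizes and TTL in the usage lines are placeholder values, not Seerflow defaults.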

The endpoints respond with 503 if the LLM is disabled and with 502 if the backend call fails. In either case the alert detail view degrades to a plain summary without blocking.
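
A client can treat both error codes as a signal to fall back to the plain summary. The sketch below shows one way to do that; `ExplanationView` and `render_explanation` are hypothetical names, and treating 200 as the success status is an assumption rather than something stated in this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExplanationView:
    text: str                     # what the alert detail view should display
    degraded: bool                # True when the plain summary is shown instead
    reason: Optional[str] = None  # short note for logs or a tooltip


def render_explanation(status: int, body: str, plain_summary: str) -> ExplanationView:
    """Map an explanation response onto what the alert detail view shows."""
    if status == 200:  # assumed success status
        return ExplanationView(text=body, degraded=False)
    if status == 503:
        return ExplanationView(plain_summary, True, "LLM disabled")
    if status == 502:
        return ExplanationView(plain_summary, True, "LLM backend failed")
    # Any other status is handled the same way so the view never blocks.
    return ExplanationView(plain_summary, True, f"unexpected status {status}")
```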

Hunt (NL → query)

The hunt service converts a natural-language query into an EventQuery (entity / time-range / source filters). The generated query is then executed against the configured event store; the LLM never returns event data directly.

Cache keys are derived from the canonicalised query string, so two analysts typing the same hunt share results. Cache parameters: llm.hunt_cache_size, llm.hunt_cache_ttl_s.
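
To make the idea of a canonicalised cache key concrete, here is a minimal sketch. The exact canonicalisation Seerflow applies is not specified on this page; the rules below (Unicode NFKC normalisation, case folding, whitespace collapsing, then SHA-256) are assumptions chosen for illustration, and both function names are hypothetical.

```python
import hashlib
import unicodedata


def canonicalise(query: str) -> str:
    """Normalise Unicode form, case, and whitespace (assumed rules)."""
    normalised = unicodedata.normalize("NFKC", query)
    return " ".join(normalised.casefold().split())


def hunt_cache_key(query: str) -> str:
    """Derive a stable cache key from the canonical form of the query."""
    return hashlib.sha256(canonicalise(query).encode("utf-8")).hexdigest()


# Two analysts typing the same hunt with different spacing or case
# end up with the same key and therefore share one cache entry.
key_a = hunt_cache_key("  Failed logins  from 10.0.0.5 ")
key_b = hunt_cache_key("failed LOGINS from 10.0.0.5")
```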

Hard limits:

- llm.hunt_max_query_chars: prompt size cap (default 512).
- llm.hunt_max_results: default value of --limit for seerflow hunt (can be overridden per call).

Deterministic fallback

POST /api/v1/hunt and seerflow hunt scan the query for an entity-style token (IPv4 address, hostname, or entity UUID5) before calling the LLM. When such a token is found, the request short-circuits to an entity-timeline lookup, so the hunt surface stays usable when the LLM is unavailable.
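
The token check could look roughly like the sketch below. It is an approximation built on the standard library: the hostname pattern, the punctuation stripping, and the function names are all assumptions, and Seerflow's real parser may accept or reject different inputs.

```python
import ipaddress
import re
import uuid
from typing import Optional, Tuple

# Simplified hostname pattern (assumption): dotted labels ending in an
# alphabetic top-level label, e.g. "web-01.corp.example".
_HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def classify_token(token: str) -> Optional[str]:
    """Return 'ipv4', 'entity_uuid', 'hostname', or None for a single token."""
    try:
        ipaddress.IPv4Address(token)
        return "ipv4"
    except ValueError:
        pass
    try:
        if uuid.UUID(token).version == 5:  # only name-based SHA-1 UUIDs
            return "entity_uuid"
    except ValueError:
        pass
    if _HOSTNAME_RE.match(token):
        return "hostname"
    return None


def find_entity_token(query: str) -> Optional[Tuple[str, str]]:
    """Return (kind, token) for the first entity-style token, else None."""
    for raw in query.split():
        token = raw.strip(".,;:!?\"'()[]")  # drop surrounding punctuation
        kind = classify_token(token)
        if kind is not None:
            return kind, token
    return None
```

When `find_entity_token` returns a match, the caller would route to the entity-timeline lookup; otherwise it would proceed to the LLM translation step.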

Sigma rule suggestion

When an alert accumulates ≥ llm.rule_suggestion_min_tp true-positive feedbacks (default 3), the corresponding alert pattern becomes eligible for rule drafting:

1. The operator opens the Rule Suggestions tab.
2. The operator picks an eligible pattern, and the backend generates a Sigma draft.
3. The draft is validated by pySigma before display, so a syntactically invalid draft never reaches the operator.
4. The operator can edit, dry-run, and upload the rule via POST /api/v1/sigma/rules.

Cache: llm.rule_suggestion_cache_size / llm.rule_suggestion_cache_ttl_s (TTL default 6 h, since operators rarely re-draft within a session).

Look-back window: llm.rule_suggestion_window_days (default 0 = all time).
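
The eligibility rule (a minimum count of true-positive feedbacks, optionally restricted to a look-back window) can be sketched as below. The function name and the representation of feedback as a list of timestamps are assumptions for illustration; `min_tp` and `window_days` stand in for llm.rule_suggestion_min_tp and llm.rule_suggestion_window_days.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def is_eligible(tp_feedback_times: Iterable[datetime], min_tp: int = 3,
                window_days: int = 0, now: Optional[datetime] = None) -> bool:
    """Check whether a pattern has enough true-positive feedback to be drafted.

    window_days == 0 means "all time", matching the documented default.
    """
    now = now or datetime.now(timezone.utc)
    if window_days > 0:
        cutoff = now - timedelta(days=window_days)
        counted = [t for t in tp_feedback_times if t >= cutoff]
    else:
        counted = list(tp_feedback_times)
    return len(counted) >= min_tp


# Example: three TP feedbacks, one of them 40 days old.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
times = [now - timedelta(days=d) for d in (1, 5, 40)]
all_time = is_eligible(times, now=now)                    # counts all three
last_30 = is_eligible(times, window_days=30, now=now)     # counts only two
```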

Cost and latency

- Caching is the primary cost lever. Tune *_cache_ttl_s upward for stable environments and downward when the alert distribution shifts rapidly.
- Timeouts default to 12 s for hunt / explanation and 30 s for rule suggestion (structured YAML output is heavier). Reduce them when using a cloud backend with sub-second latency.
- Model choice trumps tuning. A 14 B local model on CPU may take 20 to 40 s for an explanation, which exceeds the default 12 s timeout; a Haiku-class cloud call typically finishes in 1 to 3 s.

Security

- llm.cloud_api_key is redacted from repr(LLMConfig) and from GET /api/v1/config.
- Alert prompts include event fields but never raw entity values for fields marked sensitive (e.g. usernames are hashed before egress when cloud is selected and the relevant policy flag is enabled).
- No PII is sent unsolicited: explanations and hunts are user-initiated.
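
Two of these guarantees are easy to illustrate in isolation: keeping a secret out of `repr()` and replacing sensitive field values before a prompt leaves the host. The sketch below is not Seerflow's code. `LLMConfigSketch`, `pseudonymise`, and `scrub_event` are hypothetical names, and the use of a keyed HMAC-SHA256 truncated to 16 hex characters is an assumption, since this page only says that values are hashed.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class LLMConfigSketch:
    """Hypothetical stand-in for LLMConfig with the API key hidden from repr()."""
    backend: str = ""
    cloud_provider: str = "anthropic"
    cloud_api_key: str = field(default="", repr=False)  # omitted from repr()


def pseudonymise(value: str, salt: bytes) -> str:
    """Replace a sensitive value with a keyed, truncated SHA-256 digest."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scrub_event(event: dict, sensitive_fields: set, salt: bytes) -> dict:
    """Return a copy of the event with sensitive fields pseudonymised."""
    return {
        key: pseudonymise(str(value), salt) if key in sensitive_fields else value
        for key, value in event.items()
    }


cfg = LLMConfigSketch(backend="cloud", cloud_api_key="sk-example")
safe = scrub_event({"username": "alice", "src_ip": "10.0.0.5"}, {"username"}, b"per-deployment-salt")
```

Using a keyed digest keeps the same username mapping to the same token within a deployment, so the model can still correlate events without seeing the raw value.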

For the API surface, see REST API → Hunt / Explanations / Sigma.