LLM Integration¶
Seerflow uses an LLM as an auxiliary signal, never as a primary detector. Traditional ML and Sigma cover the high-volume, latency-sensitive paths; the LLM handles low-volume, high-value workflows where natural language helps:
| Service | Trigger | Output |
|---|---|---|
| Alert explanation | Operator clicks Explain on an alert. | Markdown summary of the alert with contributing events, MITRE context, and remediation hints. |
| Natural-language hunt | seerflow hunt "…" / POST /api/v1/hunt. | An internal EventQuery translated from the prompt, executed against the event store. |
| Sigma rule suggestion | Pattern seeded by ≥ N TP feedbacks. | Draft Sigma YAML, validated by pySigma before display. |
All three services are opt-in. They are disabled by default — the core pipeline runs without ever calling the LLM.
Backend selection¶
Set the backend via llm.backend:
| Value | Provider | Use case |
|---|---|---|
| "" | (disabled) | Default; LLM features off. |
| "llama_cpp" | llama-cpp-python | Single-host, CPU/GPU, fully offline. |
| "ollama" | Local Ollama daemon (HTTP) | Shared host within the network, easy model swaps. |
| "cloud" | Anthropic / OpenAI SDK | Best quality, lowest latency; egress required. |
llama_cpp¶
llm:
  backend: llama_cpp
  model_path: /opt/models/phi-4-mini-instruct-q4.gguf
  n_ctx: 4096
  n_gpu_layers: 0  # set > 0 to offload to GPU
ollama¶
llm:
  backend: ollama
  ollama_url: http://ollama.internal:11434
  ollama_model: phi4-mini
  ollama_timeout_s: 8.0
cloud¶
llm:
  backend: cloud
  cloud_provider: anthropic  # or "openai"
  cloud_model: claude-haiku-4-5-20251001
  cloud_api_key: ${ANTHROPIC_API_KEY}
  cloud_timeout_s: 8.0
All knobs are documented in Config Reference → LLM.
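To make the backend table concrete, here is a minimal sketch of a pre-flight check that reports which keys an `llm:` mapping is missing for its chosen backend. The helper name `missing_llm_keys` and the notion of "required" keys are assumptions for illustration; the key names are taken from the examples above, and Seerflow's real loader may apply defaults or different rules.

```python
# Hypothetical mapping: keys each backend appears to need, per the examples above.
REQUIRED_KEYS = {
    "": set(),                                    # disabled (default)
    "llama_cpp": {"model_path"},
    "ollama": {"ollama_url", "ollama_model"},
    "cloud": {"cloud_provider", "cloud_model", "cloud_api_key"},
}


def missing_llm_keys(llm_cfg: dict) -> list:
    """Return the keys absent from an `llm:` mapping for its chosen backend."""
    backend = llm_cfg.get("backend", "")
    if backend not in REQUIRED_KEYS:
        raise ValueError(f"unknown llm.backend: {backend!r}")
    return sorted(REQUIRED_KEYS[backend] - set(llm_cfg))
```

An empty mapping is treated as the disabled backend, which matches the documented default.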
Alert explanation¶
When the dashboard requests an explanation, Seerflow:
- Looks up the alert and a bounded number of contributing events (llm.explanation_max_contributing_events, default 8).
- Builds a prompt capped at llm.explanation_max_prompt_chars.
- Calls the backend with llm.explanation_timeout_s (default 12 s).
- Caches the result in a per-process LRU (llm.explanation_cache_size / llm.explanation_cache_ttl_s).
Subsequent GET /api/v1/alerts/{id}/explanation calls return the
cached entry until TTL expiry.
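The caching behaviour described above (a size-bounded LRU whose entries also expire) can be sketched as follows. This is an illustrative stand-in, not Seerflow's cache: the class name `TTLCache` and the injectable `clock` parameter are assumptions, with `max_size` and `ttl_s` playing the roles of llm.explanation_cache_size and llm.explanation_cache_ttl_s.

```python
import time
from collections import OrderedDict
from typing import Optional


class TTLCache:
    """Per-process LRU cache whose entries also expire after ttl_s seconds."""

    def __init__(self, max_size: int, ttl_s: float, clock=time.monotonic):
        self._max_size = max_size
        self._ttl_s = ttl_s
        self._clock = clock              # injectable so tests can fake time
        self._entries = OrderedDict()    # key -> (stored_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl_s:
            del self._entries[key]       # expired: caller must regenerate
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)   # evict least recently used
```

Because the cache is per-process, a deployment with several API workers would hold one such cache per worker.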
If the LLM is disabled or fails, the endpoints respond with
503 / 502 respectively — the alert detail view degrades to a
plain summary without blocking.
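A minimal sketch of the status mapping described above, assuming a hypothetical zero-argument callable `call_llm` that wraps the configured backend. Treating a timeout as a backend failure (502) is an assumption; the source only states that a disabled LLM yields 503 and a failing one yields 502.

```python
from concurrent.futures import ThreadPoolExecutor


def explain(call_llm, enabled: bool, timeout_s: float = 12.0):
    """Return (http_status, body) for an explanation request."""
    if not enabled:
        return 503, None                 # LLM disabled
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        body = pool.submit(call_llm).result(timeout=timeout_s)
        return 200, body
    except Exception:
        return 502, None                 # backend raised or exceeded timeout_s
    finally:
        pool.shutdown(wait=False)        # do not block the response on a slow call
```

A client that receives 502 or 503 can fall back to the plain alert summary rather than retrying immediately.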
Hunt (NL → query)¶
The hunt service converts a natural-language query into an
EventQuery (entity / time-range / source filters). Output is then
executed against the configured event store — the LLM never returns
event data directly.
Cache keys are derived from the canonicalised query string, so
two analysts typing the same hunt share results. Cache parameters:
llm.hunt_cache_size, llm.hunt_cache_ttl_s.
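One plausible way to derive such a cache key is shown below. The exact canonicalisation Seerflow applies is not specified here, so the steps (Unicode NFKC normalisation, whitespace collapsing, case folding, SHA-256) and the function name `hunt_cache_key` are assumptions.

```python
import hashlib
import unicodedata


def hunt_cache_key(query: str) -> str:
    """Derive a cache key from a canonicalised hunt query."""
    text = unicodedata.normalize("NFKC", query)
    text = " ".join(text.split()).casefold()   # collapse whitespace, ignore case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Under these rules, queries that differ only in spacing or letter case map to the same key, which is what lets two analysts share a cached result.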
Hard limits:
- llm.hunt_max_query_chars: prompt size cap (default 512).
- llm.hunt_max_results: default --limit for seerflow hunt (override per-call).
Deterministic fallback¶
POST /api/v1/hunt and seerflow hunt parse the query for an
entity-style token (IPv4, hostname, entity UUID5) before the LLM
call. When the token matches, the request short-circuits to an
entity-timeline lookup, so the hunt surface stays usable when the LLM
is unavailable.
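The pre-LLM check could look roughly like the sketch below, which scans a query for the three token kinds named above. The function names, the punctuation stripping, and the hostname pattern are assumptions; Seerflow's actual matching rules may be stricter or looser.

```python
import ipaddress
import re
import uuid
from typing import Optional, Tuple

# Assumed hostname shape: dot-separated labels ending in an alphabetic TLD.
_HOSTNAME = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def classify_token(token: str) -> Optional[str]:
    """Return 'ipv4', 'entity_uuid', 'hostname', or None for a single token."""
    try:
        ipaddress.IPv4Address(token)
        return "ipv4"
    except ValueError:
        pass
    try:
        if uuid.UUID(token).version == 5:
            return "entity_uuid"
    except ValueError:
        pass
    if _HOSTNAME.match(token):
        return "hostname"
    return None


def find_entity_token(query: str) -> Optional[Tuple[str, str]]:
    """Return (kind, token) for the first entity-style token, else None."""
    for raw in query.split():
        token = raw.strip(".,;:!?\"'()[]")   # drop surrounding punctuation
        kind = classify_token(token)
        if kind:
            return kind, token
    return None
```

When `find_entity_token` returns a match, the caller would route to the entity-timeline lookup; when it returns None, the request proceeds to the LLM translation step.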
Sigma rule suggestion¶
When an alert accumulates ≥ llm.rule_suggestion_min_tp true-positive
feedbacks (default 3), the corresponding alert pattern becomes
eligible for rule drafting:
- Operator opens the Rule Suggestions tab.
- Picks an eligible pattern → backend generates a Sigma draft.
- Draft is validated by pySigma before display; a syntactically invalid draft never reaches the operator.
- Operator can edit, dry-run, and upload via POST /api/v1/sigma/rules.
Cache: llm.rule_suggestion_cache_size /
llm.rule_suggestion_cache_ttl_s (default 6 h — operators rarely
re-draft within a session).
Look-back window: llm.rule_suggestion_window_days (default 0 =
all time).
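The eligibility rule (a minimum number of true-positive feedbacks, optionally restricted to a look-back window) can be sketched like this. The function name `is_eligible` and the representation of feedback as a list of timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def is_eligible(
    tp_feedback_times: Iterable[datetime],
    min_tp: int = 3,         # llm.rule_suggestion_min_tp
    window_days: int = 0,    # llm.rule_suggestion_window_days; 0 = all time
    now: Optional[datetime] = None,
) -> bool:
    """True when enough true-positive feedbacks fall inside the look-back window."""
    now = now or datetime.now(timezone.utc)
    if window_days > 0:
        cutoff = now - timedelta(days=window_days)
        counted = sum(1 for t in tp_feedback_times if t >= cutoff)
    else:
        counted = sum(1 for _ in tp_feedback_times)
    return counted >= min_tp
```

With the default window of 0, old feedback never ages out, so a pattern that once became eligible stays eligible.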
Cost and latency¶
- Caching is the primary cost lever. Tune *_cache_ttl_s upward for stable environments; downward when alert distribution shifts rapidly.
- Timeouts default to 12 s for hunt / explanation and 30 s for rule suggestion (structured YAML output is heavier). Reduce when using a cloud backend with sub-second latency.
- Model choice trumps tuning. A 14 B local model on CPU may take 20 – 40 s for an explanation; a Haiku-class cloud call typically finishes in 1 – 3 s.
Security¶
- llm.cloud_api_key is redacted from repr(LLMConfig) and from GET /api/v1/config.
- Alert prompts include event fields but never raw entity values for fields marked sensitive (e.g. usernames are hashed before egress when cloud is selected and the relevant policy flag is enabled).
- No PII is sent unsolicited; explanations and hunt are user-initiated.
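As a rough illustration of the first two points, the sketch below keeps an API key out of `repr()` and replaces a sensitive value with a keyed digest. The `LLMConfig` here is a two-field stand-in rather than the real class, and `pseudonymise`, its salt parameter, and the 16-character truncation are all assumptions; the source does not say which hashing scheme Seerflow uses.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class LLMConfig:
    """Illustrative stand-in; the real LLMConfig has many more fields."""
    backend: str = ""
    cloud_api_key: str = field(default="", repr=False)   # omitted from repr()


def pseudonymise(value: str, salt: bytes) -> str:
    """Replace a sensitive value with a keyed SHA-256 digest before egress."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]   # shortened for readability in prompts
```

A keyed digest keeps the same username mapped to the same token within a deployment, so the LLM can still correlate events without seeing the raw value.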
For the API surface, see REST API → Hunt / Explanations / Sigma.