Alerting & Feedback

This page explains how Seerflow generates alerts, delivers them to configured sinks, and refines detection thresholds through operator feedback. For the full list of alerting parameters, see the Configuration Reference.

Alert Lifecycle

Every alert passes through five stages (detection, dedup, routing, formatting, and delivery) before reaching an operator. Operator feedback then closes the loop: a false-positive mark feeds back into the detection layer.

flowchart LR
    A[Detection] --> B[Dedup]
    B --> R[Router]
    R --> C[Format]
    C --> D[Webhooks\nSlack / Teams / JSON]
    C --> E[PagerDuty]
    C --> F[OTLP]
    C --> M[Email / SMS\nTelegram / WhatsApp]
    D --> G[Feedback]
    E --> G
    F --> G
    M --> G
    G -->|FP| H[DSPOT Threshold ↑]
    G -->|FP| I[PagerDuty Resolve]
    G -->|TP/FP| J[Persist to store]

Stage	What happens
Detection	A detector (HST, Holt-Winters, CUSUM, Markov, UEBA, IoC, or Sigma) produces a scored event that crosses its DSPOT upper threshold.
Dedup	A composite dedup key (`alert_type:rule_name:entity_uuid`) collapses repeated firings into a single alert, incrementing `dedup_count`.
Router	The `NotificationRouter` evaluates `alerting.routing_rules` (first match wins) and picks the channels for this alert. Per-channel `quiet_hours_by_channel` then drops alerts that fall inside a channel's quiet window with a severity below that channel's threshold.
Format	The alert is serialised into one or more channel-specific payloads (Slack Block Kit, Teams Adaptive Card, flat JSON, HTML email, SMS body, Telegram MarkdownV2, WhatsApp template).
Delivery	Each channel has an independent worker with its own rate-limit token bucket. Common retry policy: 3 attempts with back-off delays of 1 s, 2 s, 4 s. WhatsApp adds a per-target circuit breaker.
Feedback	An operator marks the alert `tp` or `fp` via the CLI, REST API, or dashboard. Feedback is persisted alongside the alert and feeds the DSPOT threshold adjuster, PagerDuty resolve, and the LLM Sigma rule-suggestion service.

Webhook Sinks

The AlertDispatcher delivers alerts to one or more WebhookTarget entries. Each target carries a URL, a format (slack, teams, or json), and an optional min_severity integer filter (alerts with a severity_id below this value are silently skipped).

The dispatcher holds an internal queue capped at 10,000 alerts. If the queue fills, incoming alerts are dropped with a WARNING log. Delivery uses a 10-second HTTP timeout per request.

Three payload formats are available: Slack, Teams, and generic JSON.

The Slack formatter (format_slack) produces a Block Kit payload. The payload structure is:

Block	Content
`header`	Severity emoji + `[SEVERITY] rule_name`
`section`	`description` and `entity_value (entity_type)`
`section` (fields)	Severity, Alert Type, Risk Score, Occurrences
`section` (ATT&CK)	MITRE Tactics and Techniques — omitted when both are empty
`context`	`alert_id` and ISO 8601 UTC timestamp
`actions`	"View in Dashboard" button — omitted when `dashboard_url` is not configured

Example configuration:

alerting:
  webhooks:
    - url: "https://hooks.slack.com/services/T00000/B00000/xxxxxxxxxxxx"
      format: slack
      min_severity: 3   # WARNING and above
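For illustration, here is a minimal Python sketch of how a formatter could assemble these blocks from the flat alert fields listed in the JSON formatter table. It is not `format_slack` itself: the function name `slack_payload`, the emoji choices, and the exact mrkdwn layout are assumptions; only the block order and the omission rules come from the table above.

```python
from __future__ import annotations

# Hypothetical emoji choices; the real format_slack mapping is not documented here.
_EMOJI = {"FATAL": ":rotating_light:", "CRITICAL": ":rotating_light:",
          "ERROR": ":red_circle:", "WARNING": ":warning:"}


def slack_payload(alert: dict, dashboard_url: str | None = None) -> dict:
    """Build a Block Kit payload following the block order in the table above."""
    sev = alert["severity"]
    emoji = _EMOJI.get(sev, ":information_source:")
    blocks = [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"{emoji} [{sev}] {alert['rule_name']}"}},
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": f"{alert['description']}\n{alert['entity_value']} ({alert['entity_type']})"}},
        {"type": "section", "fields": [
            {"type": "mrkdwn", "text": f"*Severity*\n{sev}"},
            {"type": "mrkdwn", "text": f"*Alert Type*\n{alert['alert_type']}"},
            {"type": "mrkdwn", "text": f"*Risk Score*\n{alert['risk_score']}"},
            {"type": "mrkdwn", "text": f"*Occurrences*\n{alert['dedup_count']}"},
        ]},
    ]
    tactics = alert.get("mitre_tactics", [])
    techniques = alert.get("mitre_techniques", [])
    if tactics or techniques:  # ATT&CK section omitted when both lists are empty
        blocks.append({"type": "section", "text": {"type": "mrkdwn", "text":
            f"*MITRE Tactics:* {', '.join(tactics) or 'n/a'}\n"
            f"*MITRE Techniques:* {', '.join(techniques) or 'n/a'}"}})
    blocks.append({"type": "context", "elements": [
        {"type": "mrkdwn", "text": f"{alert['alert_id']} | {alert['timestamp']}"}]})
    if dashboard_url:  # actions block only when a dashboard URL is configured
        blocks.append({"type": "actions", "elements": [
            {"type": "button", "text": {"type": "plain_text", "text": "View in Dashboard"},
             "url": dashboard_url}]})
    return {"blocks": blocks}
```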

The Teams formatter (format_teams) produces a message/attachments envelope containing an Adaptive Card (schema version 1.4). The card body contains:

Element	Content
`TextBlock` (bold)	`[SEVERITY] rule_name`, coloured by severity
`TextBlock`	Alert description
`FactSet`	Severity, Alert Type, Entity, Risk Score, Occurrences, Timestamp, MITRE fields (when present)
`TextBlock` (subtle)	`alert_id` footer
`Action.OpenUrl`	"View in Dashboard" — appended when `dashboard_url` is configured

Severity colours map as follows: TRACE → default, INFORMATIONAL/NOTICE → accent, WARNING → warning, ERROR/CRITICAL/FATAL → attention.

Example configuration:

alerting:
  webhooks:
    - url: "https://your-tenant.webhook.office.com/webhookb2/..."
      format: teams
      min_severity: 0   # all severities
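A comparable sketch for the Teams envelope, assuming the same flat alert dict. The helper name `teams_message` and the exact fact labels are hypothetical; the colour map mirrors the severity-to-colour mapping above, and the envelope shape (a `message` with one Adaptive Card attachment, schema version 1.4) follows the description of `format_teams`.

```python
from __future__ import annotations

# Severity name -> Adaptive Card TextBlock colour, as mapped in the text above.
_TEAMS_COLOUR = {
    "TRACE": "default",
    "INFORMATIONAL": "accent", "NOTICE": "accent",
    "WARNING": "warning",
    "ERROR": "attention", "CRITICAL": "attention", "FATAL": "attention",
}


def teams_message(alert: dict, dashboard_url: str | None = None) -> dict:
    """Wrap an Adaptive Card (schema 1.4) in a message/attachments envelope."""
    sev = alert["severity"]
    facts = [
        {"title": "Severity", "value": sev},
        {"title": "Alert Type", "value": alert["alert_type"]},
        {"title": "Entity", "value": f"{alert['entity_value']} ({alert['entity_type']})"},
        {"title": "Risk Score", "value": str(alert["risk_score"])},
        {"title": "Occurrences", "value": str(alert["dedup_count"])},
        {"title": "Timestamp", "value": alert["timestamp"]},
    ]
    for key, title in (("mitre_tactics", "MITRE Tactics"), ("mitre_techniques", "MITRE Techniques")):
        if alert.get(key):  # MITRE facts only when present
            facts.append({"title": title, "value": ", ".join(alert[key])})
    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": f"[{sev}] {alert['rule_name']}",
             "weight": "bolder", "color": _TEAMS_COLOUR.get(sev, "default"), "wrap": True},
            {"type": "TextBlock", "text": alert["description"], "wrap": True},
            {"type": "FactSet", "facts": facts},
            {"type": "TextBlock", "text": alert["alert_id"], "isSubtle": True, "size": "small"},
        ],
    }
    if dashboard_url:  # Action.OpenUrl only when a dashboard URL is configured
        card["actions"] = [{"type": "Action.OpenUrl", "title": "View in Dashboard",
                            "url": dashboard_url}]
    return {"type": "message", "attachments": [
        {"contentType": "application/vnd.microsoft.card.adaptive", "content": card}]}
```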

The JSON formatter (format_json) produces a flat, JSON-serialisable dict suitable for any HTTP endpoint. All fields are top-level:

Field	Type	Notes
`alert_id`	string	UUID
`alert_type`	string	e.g. `anomaly`, `sigma`
`timestamp`	string	ISO 8601 UTC
`severity`	string	Uppercase enum name, e.g. `CRITICAL`
`rule_name`	string
`description`	string
`entity_uuid`	string	UUID
`entity_value`	string	e.g. an IP address or username
`entity_type`	string	e.g. `ip`, `user`
`mitre_tactics`	list[string]	Empty list when none
`mitre_techniques`	list[string]	Empty list when none
`risk_score`	float
`dedup_count`	int
`dashboard_url`	string	Only present when `dashboard_url` is configured

Example configuration:

alerting:
  webhooks:
    - url: "https://ingest.example.com/seerflow/alerts"
      format: json
      min_severity: 0

URL safety / SSRF defence

All webhook URLs are validated at startup and on each delivery:

Scheme must be http or https.
The host is resolved at request time and rejected if it resolves to a private, loopback, link-local, or reserved IP range.
DNS rebinding is mitigated by re-resolving on each request rather than caching the result.

This prevents internal SSRF: an operator with config write access cannot, for example, point a webhook at the cloud instance metadata service (typically the link-local address 169.254.169.254).
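A minimal sketch of the validation rules above using only the standard library (`urllib.parse`, `socket`, `ipaddress`). The function name `check_webhook_url` is hypothetical, and the real validator may differ in details such as how it treats multicast addresses.

```python
from __future__ import annotations

import ipaddress
import socket
from urllib.parse import urlsplit


def check_webhook_url(url: str) -> list[str]:
    """Return the addresses url resolves to, or raise ValueError if any is unsafe."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme {parts.scheme!r} is not allowed")
    if not parts.hostname:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Resolve on every call (no cache), so a changed DNS answer is re-checked.
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its embedded IPv4 address
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            raise ValueError(f"{parts.hostname} resolves to blocked address {ip}")
        addresses.append(str(ip))
    return addresses
```

One caveat with this sketch: if the HTTP client resolves the hostname again when it connects, a rebinding DNS server can still return a different address in between. Connecting to the address that was just validated (while sending the original `Host` header and TLS server name) closes that window.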

HMAC Authentication

Planned Feature

HMAC webhook authentication (signing the request body with a shared secret and passing the signature in a custom header) is planned for a future release. Until then, protect webhook endpoints at the network level or use token-based URL authentication built into the receiving service.

Email, SMS, Telegram, WhatsApp

Seerflow also delivers alerts directly to operator inboxes and phones. Each channel has its own target dataclass with min_severity, a token-bucket rate limiter, and credentials redacted from repr.

Channel	Provider	Config key
Email	SMTP (STARTTLS)	`alerting.email_targets`
SMS	Twilio	`alerting.sms_targets`
Telegram	Bot API	`alerting.telegram_targets`
WhatsApp	WhatsApp Cloud API + pre-approved template	`alerting.whatsapp_targets`

Each target carries a unique name — routing rules reference channels by name, not by index. See Config Reference → Alerting for the full field set.
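A sketch of what one such target could look like as a Python dataclass, assuming the field names from the email YAML example. `field(repr=False)` keeps the password out of `repr()`, and therefore out of any log line that formats the target with `%r` or `{!r}`; a small helper enforces the unique-name rule. The class name, its defaults, and the `index_by_name` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmailTarget:
    """Hypothetical email target; field names follow the YAML example."""
    name: str                      # unique; routing rules refer to this
    smtp_host: str
    from_address: str
    to_addresses: tuple[str, ...]
    smtp_port: int = 587
    use_starttls: bool = True
    smtp_user: str | None = None
    smtp_password: str | None = field(default=None, repr=False)  # hidden from repr()
    min_severity: int = 0          # illustrative default
    max_per_minute: int = 30       # illustrative default


def index_by_name(targets) -> dict:
    """Map channel name -> target, rejecting duplicate names."""
    by_name = {}
    for target in targets:
        if target.name in by_name:
            raise ValueError(f"duplicate channel name {target.name!r}")
        by_name[target.name] = target
    return by_name
```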

Minimal example

alerting:
  email_targets:
    - name: ops-email
      smtp_host: smtp.example.com
      smtp_port: 587
      use_starttls: true
      from_address: alerts@example.com
      to_addresses: [oncall@example.com]
      smtp_user: alerts
      smtp_password: ${SMTP_PASSWORD}
      max_per_minute: 30

  telegram_targets:
    - name: ops-telegram
      bot_token: ${TELEGRAM_BOT_TOKEN}
      chat_id:   ${TELEGRAM_CHAT_ID}
      min_severity: 2
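The `max_per_minute` setting suggests a token bucket along these lines. This is a sketch, assuming a burst capacity equal to `max_per_minute` and a steady refill of `max_per_minute / 60` tokens per second; the class name and the injectable clock are illustrative, and what Seerflow does with a send that finds the bucket empty is not specified here.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to per_minute sends per minute, with bursts of the same size."""

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(per_minute)
        self.rate = per_minute / 60.0      # tokens refilled per second
        self.tokens = self.capacity        # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the bucket is empty."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```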

Header injection safety

Free-text alert fields are sanitised before being placed into email subject lines and SMS bodies — CR / LF are stripped to block header injection via attacker-controlled rule names.
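A minimal sketch of that sanitisation, assuming runs of CR/LF are replaced with a single space (rather than deleted outright, which could glue two words together). The helper name `single_line` is hypothetical.

```python
import re

_CRLF = re.compile(r"[\r\n]+")


def single_line(text: str) -> str:
    """Collapse CR/LF runs to one space so text cannot start a new header line."""
    return _CRLF.sub(" ", text).strip()


# e.g. subject = f"[CRITICAL] {single_line(rule_name)}"
```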

Circuit breaker (WhatsApp)

The WhatsApp Cloud API is rate-limited by Meta. The WhatsApp target trips a per-target circuit breaker after a burst of failures: while the breaker is open, alerts to that target are logged and dropped instead of being queued (queueing during an outage would amplify the rate-limit breach on recovery).
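A minimal consecutive-failure circuit breaker illustrating the behaviour described above. The thresholds (5 failures, 60 s cooldown) and the class shape are assumptions, not Seerflow's actual values. While `allow()` returns `False`, the caller logs and drops the alert instead of queueing it.

```python
from __future__ import annotations

import time
from typing import Callable


class CircuitBreaker:
    """Open after `threshold` consecutive failures; stay open for `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown over: close tentatively, so a single further failure re-opens it.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()
```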

Routing Rules

By default, when no routing_rules are configured, every alert flows to every configured channel. Routing rules override that, selecting the channels for each alert based on its type, rule name, entity type, and severity. The first matching rule wins; remaining rules are skipped.

alerting:
  routing_rules:
    - match:
        alert_type: [sigma, correlation]
        min_severity: 5            # CRITICAL+
      notify:
        - { channel: ops-sms,   mode: immediate }
        - { channel: ops-email, mode: immediate }

    - match:
        alert_type: ml             # ML-only signal, lower urgency
        min_severity: 3
      notify:
        - { channel: ops-telegram, mode: digest, digest_window_minutes: 15 }

  default_routing:
    action: notify                 # alerts matching no rule
    notify:
      - { channel: ops-email, mode: digest, digest_window_minutes: 60 }

`match.*` field	Description
`alert_type`	One or more of `ml`, `sigma`, `correlation`, `ueba`, `ioc`.
`rule_name`	`fnmatch` glob (case-sensitive) on the alert's rule name.
`entity_type`	One or more of `user`, `ip`, `host`, …
`min_severity` / `max_severity`	Inclusive bounds (0–6).

notify[].mode is either immediate (deliver as soon as routed) or digest (buffer for digest_window_minutes and deliver the batch).
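The first-match-wins evaluation can be sketched as follows. This assumes alerts are dicts carrying `alert_type`, `rule_name`, `entity_type`, and a numeric `severity_id`, and that scalar YAML values (such as `alert_type: ml`) have already been normalised to lists; the `Match`, `Rule`, and `route` names are hypothetical. It covers only the case where `routing_rules` is non-empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Match:
    alert_type: list[str] | None = None    # None means "any"
    rule_name: str | None = None           # fnmatch glob, case-sensitive
    entity_type: list[str] | None = None
    min_severity: int = 0                  # inclusive bounds
    max_severity: int = 6

    def matches(self, alert: dict) -> bool:
        return (
            (self.alert_type is None or alert["alert_type"] in self.alert_type)
            and (self.rule_name is None or fnmatchcase(alert["rule_name"], self.rule_name))
            and (self.entity_type is None or alert["entity_type"] in self.entity_type)
            and self.min_severity <= alert["severity_id"] <= self.max_severity
        )


@dataclass
class Rule:
    match: Match
    notify: list[dict] = field(default_factory=list)


def route(alert: dict, rules: list[Rule], default_notify: list[dict] | None) -> list[dict]:
    """Return the notify list of the first matching rule, else the default."""
    for rule in rules:
        if rule.match.matches(alert):
            return rule.notify
    return default_notify or []  # None models `action: drop`
```

`fnmatchcase` is used because `fnmatch.fnmatch` normalises case on case-insensitive platforms such as Windows, which would break the case-sensitive glob semantics in the table.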

Digest mode

When a rule selects digest mode, alerts are accumulated per (rule_index, channel_name) and flushed after the configured window. Each channel's deliver_digest formatter produces a compact summary — one message containing the N alerts in the window — to avoid operator pager fatigue during alert storms.
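A sketch of the per-`(rule_index, channel_name)` buffering. It assumes each window opens with the first buffered alert for that key and that a periodic task calls `pop_due`; the class, the `digest_text` helper, and the message format are hypothetical stand-ins for a channel's `deliver_digest` formatter.

```python
from __future__ import annotations

from collections import defaultdict

Key = tuple[int, str]  # (rule_index, channel_name)


class DigestBuffer:
    def __init__(self) -> None:
        self._batches: dict[Key, list[dict]] = defaultdict(list)
        self._deadlines: dict[Key, float] = {}

    def add(self, rule_index: int, channel: str, alert: dict,
            window_minutes: float, now: float) -> None:
        key = (rule_index, channel)
        # Assumption: the window opens with the first buffered alert for this key.
        self._deadlines.setdefault(key, now + window_minutes * 60)
        self._batches[key].append(alert)

    def pop_due(self, now: float) -> dict[Key, list[dict]]:
        """Remove and return every batch whose window has elapsed."""
        due = [k for k, deadline in self._deadlines.items() if now >= deadline]
        out = {}
        for k in due:
            del self._deadlines[k]
            out[k] = self._batches.pop(k)
        return out


def digest_text(channel: str, batch: list[dict]) -> str:
    """One compact message per batch instead of one notification per alert."""
    lines = [f"{len(batch)} alerts for {channel}:"]
    lines += [f"- [{a['severity']}] {a['rule_name']}" for a in batch]
    return "\n".join(lines)
```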

Default routing

default_routing.action is either drop (the default — silently discard unmatched alerts) or notify with its own notify list. default_routing is only accepted when routing_rules is non-empty.

Quiet Hours

alerting.quiet_hours_by_channel mutes individual channels during a UTC window:

alerting:
  quiet_hours_by_channel:
    ops-sms:
      start: "22:00"
      end:   "07:00"        # wraps midnight
      min_severity: 5       # only suppress < CRITICAL

While inside the window, alerts to that channel with severity_id < min_severity are dropped (and logged at INFO). When start > end, the window wraps midnight.

Severities at or above min_severity always go through, so with min_severity: 5 quiet hours never suppress CRITICAL or FATAL alerts.
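The window check, including the midnight wrap, fits in a few lines. This sketch assumes the start time is inclusive and the end time exclusive; the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, time, timezone


def in_window(now: time, start: time, end: time) -> bool:
    """True if now falls in [start, end); a window with start > end wraps midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def quiet_drop(severity_id: int, start: time, end: time, min_severity: int,
               now: time | None = None) -> bool:
    """True if an alert to this channel should be dropped for quiet hours."""
    if now is None:
        now = datetime.now(timezone.utc).time()  # windows are evaluated in UTC
    return in_window(now, start, end) and severity_id < min_severity
```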

OTLP Export

The OtlpSink exports alerts as OpenTelemetry LogRecord protobufs to any OTLP-compatible backend (e.g. an OpenTelemetry Collector, Grafana Cloud, or Datadog Agent).

Transport comparison

	gRPC (`protocol: grpc`)	HTTP (`protocol: http`)
Endpoint format	`host:port` or `http(s)://host:port` — scheme stripped automatically	Full URL, e.g. `http://host:4318` — `/v1/logs` appended automatically
Content-Type	— (protobuf over gRPC)	`application/x-protobuf`
Connection	Persistent gRPC channel, reused across batches	`aiohttp.ClientSession` created on first flush
Retry on server error	Yes — 3 attempts, delays 1 s / 2 s / 4 s	Yes — 3 attempts, delays 1 s / 2 s / 4 s
Retry on client error (4xx)	No	No
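The endpoint rules in the table can be sketched as a small normalisation function. The helper name is hypothetical, and two details are assumptions: trailing slashes are trimmed, and `/v1/logs` is not appended twice if the URL already ends with it. The sketch also ignores what a stripped scheme implies for TLS (see `otlp_tls` below).

```python
def normalise_endpoint(endpoint: str, protocol: str) -> str:
    """Turn a configured otlp_endpoint into the target the exporter dials."""
    if protocol == "grpc":
        # gRPC targets are host:port, so drop any http:// or https:// prefix.
        for scheme in ("http://", "https://"):
            if endpoint.startswith(scheme):
                return endpoint[len(scheme):].rstrip("/")
        return endpoint
    if protocol == "http":
        base = endpoint.rstrip("/")
        return base if base.endswith("/v1/logs") else base + "/v1/logs"
    raise ValueError(f"unknown protocol {protocol!r}")
```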

Severity mapping

The _SEVERITY_MAP table translates Seerflow's 0-6 SeverityLevel to OTel SeverityNumber and SeverityText:

Seerflow level	Seerflow name	OTel SeverityNumber	OTel SeverityText
0	TRACE	1	`TRACE`
1	INFORMATIONAL	9	`INFO`
2	NOTICE	10	`INFO2`
3	WARNING	13	`WARN`
4	ERROR	17	`ERROR`
5	CRITICAL	21	`FATAL`
6	FATAL	24	`FATAL4`

Batching behaviour

OtlpSink accumulates incoming alerts in an in-memory pending list (capped at max_pending, default 10,000). Every export_interval seconds (default 5) the sink atomically swaps the pending list for a fresh one and serialises the batch into a single ExportLogsServiceRequest. A final flush is performed on shutdown before the background task exits.

If the pending list reaches max_pending, additional alerts are dropped with a WARNING log.
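A minimal asyncio sketch of the swap-and-flush loop, assuming an async `export` callable that sends one batch (for example by building an `ExportLogsServiceRequest`). The class and method names are hypothetical; the defaults mirror the documented 5-second interval and 10,000-alert cap.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("otlp_sketch")


class BatchingSink:
    """Accumulate alerts and export them as one batch per interval."""

    def __init__(self, export: Callable[[list], Awaitable[None]],
                 interval: float = 5.0, max_pending: int = 10_000):
        self._export = export
        self._interval = interval
        self._max_pending = max_pending
        self._pending: list = []
        self._stop = asyncio.Event()

    def submit(self, alert: dict) -> None:
        if len(self._pending) >= self._max_pending:
            log.warning("pending buffer full, dropping alert %s", alert.get("alert_id"))
            return
        self._pending.append(alert)

    async def _flush(self) -> None:
        # Swap before awaiting: alerts submitted during the export land in
        # the fresh list and go out with the next batch.
        batch, self._pending = self._pending, []
        if batch:
            await self._export(batch)

    async def run(self) -> None:
        while not self._stop.is_set():
            try:
                await asyncio.wait_for(self._stop.wait(), timeout=self._interval)
            except asyncio.TimeoutError:
                pass  # interval elapsed; fall through to a regular flush
            await self._flush()  # after stop() this is the final flush

    def stop(self) -> None:
        self._stop.set()
```

Because asyncio runs coroutines on one thread, the tuple assignment in `_flush` completes without any other coroutine interleaving, which is what makes the swap atomic in this sketch.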

Each batch is wrapped in a single ResourceLogs / ScopeLogs pair. The Resource carries service.name = "seerflow" and service.version (read from the installed package metadata, falling back to "dev").
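Reading the version from package metadata with a fallback can be done with `importlib.metadata`. The distribution name `seerflow` and the helper names are assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def service_version(dist_name: str = "seerflow") -> str:
    """Installed package version, or "dev" when the package metadata is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "dev"


def resource_attributes() -> dict:
    return {"service.name": "seerflow", "service.version": service_version()}
```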

Per-alert attributes exported on each LogRecord:

alert.id, alert.type, rule.name, entity.uuid, entity.value, entity.type,
risk.score, mitre.tactics, mitre.techniques, dedup.key, dedup.count,
contributing.event_ids

Configuration example

alerting:
  otlp_endpoint: "otel-collector:4317"     # gRPC: host:port (no scheme)
  otlp_protocol: grpc                       # "grpc" or "http"
  otlp_export_interval_seconds: 5           # seconds between batch flushes

Custom CA and mTLS

For private collectors signed by an internal CA, or when the collector requires client certificates:

alerting:
  otlp_endpoint: "otel-collector.internal:4317"
  otlp_protocol: grpc
  otlp_tls: true
  otlp_tls_ca_file: /etc/seerflow/tls/internal-ca.pem
  otlp_mtls_cert_file: /etc/seerflow/tls/seerflow.crt
  otlp_mtls_key_file:  /etc/seerflow/tls/seerflow.key

otlp_tls: null (the default) auto-selects TLS based on protocol and URL scheme. Set explicitly to true/false to override.
otlp_tls_ca_file adds a custom CA bundle on top of the system trust store.
otlp_mtls_cert_file / otlp_mtls_key_file enable mutual TLS. Either both or neither must be set.

The CA, cert, and key are read once at sink construction. Rotating any of them requires a process restart, as with any TLS stack configured once from static credentials (for example a Go tls.Config or an OpenSSL SSL_CTX built at startup).
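For the HTTP transport, the TLS options map naturally onto a standard-library `ssl.SSLContext`. This sketch assumes an `aiohttp`-style client that accepts the context directly; the function name is hypothetical. `ssl.create_default_context()` loads the system trust store, and `load_verify_locations` adds the internal CA on top of it.

```python
from __future__ import annotations

import ssl


def build_ssl_context(ca_file: str | None = None, cert_file: str | None = None,
                      key_file: str | None = None) -> ssl.SSLContext:
    """Client-side context: system roots, optional extra CA, optional client cert."""
    if (cert_file is None) != (key_file is None):
        raise ValueError("otlp_mtls_cert_file and otlp_mtls_key_file must be set together")
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)  # system trust store
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # adds the internal CA; system roots stay
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # mutual TLS
    return ctx
```

For gRPC, `grpc.ssl_channel_credentials` takes PEM bytes rather than file paths, and passing `root_certificates` replaces gRPC's default roots instead of extending them, so the gRPC path needs its own handling to keep the "on top of the system trust store" behaviour.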

Batch buffer

The internal pending buffer holds up to 10,000 alerts before dropping. This is not configurable — reduce otlp_export_interval_seconds if you see drop warnings.

PagerDuty Integration

Seerflow uses the PagerDuty Events API v2 to open and resolve incidents.

Trigger / resolve workflow

sequenceDiagram
    participant S as Seerflow
    participant PD as PagerDuty

    S->>PD: trigger (event_action: trigger)
    Note right of PD: Incident opened / deduped
    PD-->>S: 202 Accepted

    Note over S: Operator marks alert FP
    S->>PD: resolve (event_action: resolve, same dedup_key)
    Note right of PD: Incident resolved
    PD-->>S: 202 Accepted

Routing key

Provide the 32-character hexadecimal integration key from your PagerDuty service's Events API v2 integration page.

alerting:
  pagerduty_routing_key: "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"  # 32-char hex

Keep the routing key secret

The routing key grants the ability to open incidents on your PagerDuty service. Store it in an environment variable or a secret manager — never commit it to source control.

Dedup key format

PagerDuty dedup keys prevent duplicate incidents for the same ongoing event. Seerflow constructs the key as:

{alert_type}:{rule_name}:{entity_uuid}

For example: anomaly:high-request-rate:550e8400-e29b-41d4-a716-446655440000

The same key is used in both the trigger and the resolve event, so a PagerDuty incident is correctly closed when FP feedback is submitted.

Severity mapping

Seerflow level	PagerDuty severity
CRITICAL (5) / FATAL (6)	`critical`
ERROR (4)	`error`
WARNING (3) / NOTICE (2)	`warning`
INFORMATIONAL (1) / TRACE (0)	`info`
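A sketch of the two Events API v2 request bodies implied by the workflow above, using the dedup-key format and the severity mapping in this section. Each body would be POSTed as JSON to `https://events.pagerduty.com/v2/enqueue`. The `summary` and `source` choices and the helper names are assumptions; a resolve event needs only `routing_key`, `event_action`, and `dedup_key`.

```python
PD_ENQUEUE_URL = "https://events.pagerduty.com/v2/enqueue"


def pd_severity(severity_id: int) -> str:
    """Seerflow 0-6 severity -> PagerDuty severity, per the table above."""
    if severity_id >= 5:
        return "critical"
    if severity_id == 4:
        return "error"
    if severity_id >= 2:
        return "warning"
    return "info"


def dedup_key(alert: dict) -> str:
    return f"{alert['alert_type']}:{alert['rule_name']}:{alert['entity_uuid']}"


def trigger_event(alert: dict, routing_key: str) -> dict:
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key(alert),
        "payload": {
            "summary": f"[{alert['severity']}] {alert['rule_name']}",  # illustrative
            "source": alert["entity_value"],                           # illustrative
            "severity": pd_severity(alert["severity_id"]),
        },
    }


def resolve_event(alert: dict, routing_key: str) -> dict:
    # Same dedup_key as the trigger, so PagerDuty closes the matching incident.
    return {"routing_key": routing_key, "event_action": "resolve",
            "dedup_key": dedup_key(alert)}
```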

Auto-resolve on FP feedback

When an alert is marked as a false positive and pagerduty_routing_key is configured, the feedback processor automatically sends a resolve event to PagerDuty using the alert's dedup key. This happens synchronously within the process_feedback call.

Configuration example

alerting:
  pagerduty_routing_key: ${PAGERDUTY_ROUTING_KEY}

Feedback Loop

Operators close the detection loop by marking alerts as true positives (TP) or false positives (FP). Feedback can be submitted via the CLI, the REST API, or the dashboard — all three paths land in the same persisted feedback record.

Submitting feedback

# CLI
seerflow feedback <alert_id> tp
seerflow feedback <alert_id> fp --note "Scheduled maintenance"

# REST API
curl -X POST http://localhost:8080/api/v1/alerts/<alert_id>/feedback \
     -H 'content-type: application/json' \
     -d '{"feedback":"fp","note":"Scheduled maintenance","origin":"console"}'

<alert_id> is the UUID shown in the alert payload (the alert_id field).

What is persisted

Every feedback submission is appended to a per-alert audit log (retrievable via GET /api/v1/alerts/{id}/feedback):

Field	Description
`feedback`	`tp` or `fp`
`note`	Free-text operator note (optional)
`origin`	`cli`, `api`, `console`, or other caller identifier
`timestamp_ns`	Submission time, in nanoseconds
`analyst_id`	Resolved from the auth layer when present

Persistence has two downstream effects:

DSPOT threshold adjustment — see below.
LLM rule-suggestion eligibility — once an alert pattern accumulates ≥ llm.rule_suggestion_min_tp TPs, that pattern is eligible for Sigma rule drafting.

What happens on FP

The feedback record is persisted.
If a DetectionEnsemble is available, the DSPOT upper threshold for the alert's source is multiplied by 1.05 (_FP_THRESHOLD_FACTOR). The source key is derived from the alert's dedup_key: for HST alerts (format hst:{template_id}:{source_type}:{entity_uuid}), the source_type at index 2 is used; for all other alert types, the alert_type field is used as a fallback.
The threshold cannot grow without bound — detection.dspot_threshold_cap_multiplier (default 5.0) caps the maximum factor over the calibrated baseline, so repeated FPs can never silently disable a detector.
If a PagerDuty routing key is configured, a resolve event is posted immediately for that alert's dedup key.

Threshold accumulation

Each FP multiplies the current DSPOT upper threshold by 1.05. Three consecutive FP marks on the same source raise the threshold by approximately 16% (1.05³), progressively suppressing noisy detectors. The cap multiplier ensures the threshold never exceeds 5.0× the calibrated baseline.
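The FP adjustment can be sketched as two small functions. This assumes the threshold starts at the calibrated baseline and that HST alerts are recognised by an `hst` prefix on the dedup key; the function names are hypothetical, while the 1.05 factor and the 5.0 default cap come from the text.

```python
FP_THRESHOLD_FACTOR = 1.05   # _FP_THRESHOLD_FACTOR in the text
CAP_MULTIPLIER = 5.0         # detection.dspot_threshold_cap_multiplier default


def source_key(alert: dict) -> str:
    """DSPOT source for an alert: source_type for HST, alert_type otherwise."""
    parts = alert["dedup_key"].split(":")
    if parts[0] == "hst" and len(parts) >= 4:  # hst:{template_id}:{source_type}:{entity_uuid}
        return parts[2]
    return alert["alert_type"]


def apply_fp(threshold: float, baseline: float,
             cap_multiplier: float = CAP_MULTIPLIER) -> float:
    """Raise the upper threshold by 5%, never beyond cap_multiplier x baseline."""
    return min(threshold * FP_THRESHOLD_FACTOR, baseline * cap_multiplier)
```

Under these assumptions, 32 FPs against one source lift the threshold to about 4.76× the baseline, and the 33rd reaches the 5.0× cap.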

What happens on TP

The feedback record is persisted. No threshold or PagerDuty changes are made. If the alert's pattern accumulates enough TPs, an LLM Sigma rule suggestion may be drafted on the operator's next visit to the Rule Suggestions tab.

Troubleshooting

Symptom	Likely cause	Resolution
Webhook returns 400 Bad Request	Malformed payload or incorrect endpoint URL for the chosen format	Verify `format` matches the endpoint (e.g. do not send `json` format to a Slack incoming webhook URL). Check Seerflow logs for the sanitised response body. Client errors are not retried.
Webhook returns 401 / 403	Missing or invalid authentication token in the URL	Update the webhook URL with a valid token. Header-based authentication is not yet supported (see HMAC Authentication); until then, protect the endpoint at the network level.
Webhook returns 5xx repeatedly	Downstream service is unavailable	Seerflow retries 3 times (delays 1 s / 2 s / 4 s) then logs `ERROR: all 3 retries exhausted`. Check the downstream service health. Alerts are dropped after all retries are exhausted.
OTLP export — Connection refused	OTel Collector is not running or endpoint is wrong	Verify the collector is reachable from the Seerflow host. For gRPC, confirm the endpoint is `host:4317` (no scheme) or `http://host:4317`. For HTTP, confirm the endpoint includes scheme and resolves to the `/v1/logs` path.
OTLP export — batch drops logged	Pending buffer (10,000 alerts) filled before the next flush	Reduce `otlp_export_interval_seconds` to flush more frequently; the buffer size itself is not configurable. Consider scaling up the OTel Collector if it is the bottleneck.
PagerDuty — routing error / 400	Invalid or mismatched routing key	Confirm the 32-character hex key matches the Events API v2 integration (not a REST API key). Check the PagerDuty service is active.
PagerDuty — incident not resolving on FP	`alerting.pagerduty_routing_key` not set	Set `alerting.pagerduty_routing_key` so the feedback processor (`process_feedback`) can send the resolve event.
Alert queue full — dropping alerts	Dispatcher queue reached 10,000 items	Seerflow logs `WARNING: Alert dispatch queue full`. Investigate why the consumer loop is stalled (check for downstream HTTP timeouts), or reduce alert volume by lowering detection sensitivity.