DSPOT Adaptive Thresholds¶
Security: Compromised Service Account — Exceeding Adaptive Threshold
The compromised svc-deploy attack produces elevated blended scores from HST + Markov convergence. The 3 AM time window normally has lower scores (less activity = fewer anomalies). DSPOT's adaptive threshold accounts for this — but the attack's blended score of 4.8 exceeds even the seasonally adjusted z_q bound of 3.2.
Operations: Latency Tail Auto-Threshold
After the v2.3.1 deploy, p99 latency climbs from 200ms to 2s as connection pool exhaustion forces requests to queue. DSPOT's threshold had been adapting upward to accommodate gradual latency growth during the release window — by 03:15 the upper z_q had drifted to 892ms. But the post-deploy spike to 1,847ms and then 2,103ms lies well beyond even that adapted extreme bound.
{"timestamp": "2026-04-08T03:18:00Z", "service": "api-gateway", "metric": "p99_latency_ms",
"value": 1847, "dspot_threshold": 892, "exceeded": true, "anomaly_direction": "upper"}
{"timestamp": "2026-04-08T03:19:00Z", "service": "api-gateway", "metric": "p99_latency_ms",
"value": 2103, "dspot_threshold": 921, "exceeded": true, "anomaly_direction": "upper"}
Seerflow's DSPOT detector catches this by fitting a Generalized Pareto Distribution to the score tail — the threshold auto-adjusts for seasonal variation and gradual drift, but the post-deploy values land well beyond the expected extreme. See the Ops Primer for how deployment risk windows interact with adaptive thresholds.
Interactive chart: DSPOT adaptive thresholding. The EVT-derived threshold adapts to the stream, and the minute-180 spike clearly exceeds the current threshold.
Theory¶
Intuition¶
Instead of a fixed threshold, DSPOT asks: "Given what I've seen, how extreme is this score?" It uses Extreme Value Theory (EVT), specifically the Generalized Pareto Distribution (GPD), to model the tail of the score distribution. The threshold auto-adjusts as the environment changes, but truly extreme values still trigger. The approach is bidirectional: an upper threshold catches spikes and novel errors, while a lower threshold catches unusual silence or drops in activity. Both are statistically grounded in the same EVT framework.
The Peaks-Over-Threshold (POT) method works like this: scores above a high initial percentile (the 98th by default) are treated as tail excesses. DSPOT collects these excesses, fits a GPD to them, and derives z_q, the score value whose probability of being exceeded is at most risk_level (default 0.0001, i.e. 1 in 10,000). As new excesses accumulate, the GPD is periodically refitted and z_q is updated. If the score distribution shifts, for example because a deployment changes normal behavior, the excess pool changes, the GPD shifts, and z_q follows. Genuine anomalies remain extreme relative to the updated tail.
Key Equations¶
GPD cumulative distribution function: the probability that a tail excess is no greater than \(x\), given shape \(\xi\) and scale \(\sigma\):

\[ G_{\xi,\sigma}(x) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi} \]
Anomaly quantile: the threshold z_q above which an observation has probability at most \(q\) of occurring:

\[ z_q = t + \frac{\sigma}{\xi}\left(\left(\frac{q\,n}{N_t}\right)^{-\xi} - 1\right) \]
Where:
- \( x \) = excess above initial threshold \( t \) (i.e. the amount by which a score exceeds the 98th percentile)
- \( \xi \) = GPD shape parameter (tail heaviness — larger positive values mean heavier tails)
- \( \sigma \) = GPD scale parameter (fitted via maximum likelihood)
- \( t \) = initial threshold, set at the initial_percentile-th percentile of the calibration window
- \( n \) = total observations since calibration
- \( N_t \) = number of observations that exceeded \( t \) (exceedances used to fit the GPD)
- \( q \) = risk level (default 0.0001 — target false positive rate per observation)
When \( |\xi| < 10^{-10} \) (nearly exponential tail), the formula simplifies to:

\[ z_q = t - \sigma \ln\left(\frac{q\,n}{N_t}\right) \]
The lower threshold mirrors this: excesses are computed as deficits below the 2nd percentile, the same GPD formula gives a lower z_q, and it is reflected back into score space.
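The quantile rule in this section can be sketched in a few lines of Python. This is a minimal illustration, not Seerflow's code: the function names `anomaly_quantile` and `lower_quantile` are hypothetical, and the formula used is the standard POT quantile, z_q = t + (σ/ξ)((q·n/N_t)^(−ξ) − 1), with the logarithmic form as the ξ ≈ 0 limit.

```python
import math


def anomaly_quantile(t, sigma, xi, n, n_t, q):
    """Upper z_q from the POT quantile rule (hypothetical helper).

    t: initial threshold, sigma/xi: GPD scale/shape,
    n: total observations, n_t: exceedances of t, q: risk level.
    """
    if n <= 0 or n_t <= 0:
        raise ValueError("need at least one observation and one exceedance")
    r = q * n / n_t
    if abs(xi) < 1e-10:
        # Nearly exponential tail: use the xi -> 0 limit of the formula.
        return t - sigma * math.log(r)
    return t + (sigma / xi) * (r ** (-xi) - 1.0)


def lower_quantile(t_low, sigma, xi, n, n_t, q):
    """Lower z_q: the GPD is fitted on deficits (t_low - score), so the
    deficit quantile is subtracted from t_low to return to score space."""
    deficit_q = anomaly_quantile(0.0, sigma, xi, n, n_t, q)
    return t_low - deficit_q


# Made-up inputs, chosen only to exercise the function.
upper = anomaly_quantile(t=2.0, sigma=0.5, xi=0.1, n=1000, n_t=20, q=1e-4)
lower = lower_quantile(t_low=0.5, sigma=0.1, xi=0.0, n=1000, n_t=20, q=1e-4)
```

Because q·n/N_t is below 1 whenever the risk level is smaller than the observed exceedance rate, the upper z_q always lands above t and the lower z_q below the lower initial threshold.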
Seerflow Implementation¶
Configuration¶
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| dspot.calibration_window | int | 1000 | 200–5000 | Number of scores collected before GPD fitting begins. Larger values produce more stable initial thresholds at the cost of a longer warmup period. |
| dspot.risk_level | float | 0.0001 | 0.00001–0.01 | Target false positive rate per observation. Lower values produce a higher (more conservative) z_q threshold. |
| dspot.initial_percentile | int | 98 | 90–99 | Percentile used to set the initial tail threshold after calibration. Upper tail at the P-th percentile, lower tail at the (100 − P)-th percentile. |
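As a rough illustration of the ranges in the table, the three parameters can be validated before a detector is built. The `DspotConfig` class below is a hypothetical stand-in written for this page; how Seerflow actually loads and validates its configuration is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DspotConfig:
    """Hypothetical container mirroring the documented defaults and ranges."""

    calibration_window: int = 1000
    risk_level: float = 0.0001
    initial_percentile: int = 98

    def __post_init__(self):
        if not 200 <= self.calibration_window <= 5000:
            raise ValueError("calibration_window must be in 200..5000")
        if not 0.00001 <= self.risk_level <= 0.01:
            raise ValueError("risk_level must be in 0.00001..0.01")
        if not 90 <= self.initial_percentile <= 99:
            raise ValueError("initial_percentile must be in 90..99")

    @property
    def lower_percentile(self):
        # The lower tail uses the mirrored percentile, (100 - P).
        return 100 - self.initial_percentile


cfg = DspotConfig()
```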
Calibration Phase¶
During the first calibration_window scores (default 1000), DSPOT collects all scores and flags nothing as anomalous — the ThresholdResult always returns is_anomaly=False. After the calibration window is full:
- Upper initial threshold is set to the 98th percentile of collected scores.
- Lower initial threshold is set to the 2nd percentile.
- Scores above the upper threshold are collected as excesses; scores below the lower threshold are collected as deficits (mirrored to positive values).
- GPD is fitted independently to upper excesses and lower deficits via scipy.stats.genpareto.
- z_q is computed for each tail using the quantile formula above.
- Fallback: if the GPD fit produces a non-finite z_q (too few excesses or a degenerate fit), the initial percentile value is used as the threshold until more data arrives.
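The calibration steps above can be sketched end to end with the standard library alone. Two assumptions are worth stating loudly: the GPD is fitted here with a method-of-moments estimator instead of the scipy.stats.genpareto maximum-likelihood fit the text describes (to keep the sketch dependency-free), and every function name is hypothetical.

```python
import math
import random
import statistics


def percentile(sorted_vals, p):
    """Nearest-rank percentile (p in 0..100) of a pre-sorted list."""
    k = math.ceil(p / 100 * len(sorted_vals)) - 1
    return sorted_vals[max(0, min(len(sorted_vals) - 1, k))]


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (NOT the MLE the text uses); returns (xi, sigma)."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)


def calibrate_upper(scores, initial_percentile=98, q=1e-4):
    """Return the upper z_q, or the initial percentile if the fit degenerates."""
    t = percentile(sorted(scores), initial_percentile)
    excesses = [s - t for s in scores if s > t]
    if len(excesses) < 2:
        return t  # fallback: too few excesses to fit anything
    try:
        xi, sigma = fit_gpd_moments(excesses)
        r = q * len(scores) / len(excesses)
        if abs(xi) < 1e-10:
            z_q = t - sigma * math.log(r)
        else:
            z_q = t + (sigma / xi) * (r ** (-xi) - 1.0)
    except (ZeroDivisionError, OverflowError, ValueError):
        return t  # fallback: degenerate fit
    return z_q if math.isfinite(z_q) else t


random.seed(0)
scores = [random.expovariate(1.0) for _ in range(1000)]  # synthetic stand-in data
upper_z_q = calibrate_upper(scores)
# Lower tail: negate the scores so deficits become excesses, then reflect back.
lower_z_q = -calibrate_upper([-s for s in scores])
```

Negating the scores reuses the same routine for the lower tail: the 98th percentile of the negated scores is the mirrored lower initial threshold, and the final sign flip puts z_q back in score space.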
Bidirectional Detection¶
Every call to DSpotThreshold.update(score) returns a ThresholdResult:
| Field | Type | Description |
|---|---|---|
| is_anomaly | bool | True if the score exceeds either threshold |
| upper_threshold | float | Current upper z_q |
| lower_threshold | float | Current lower z_q |
| score | float | The input score |
| anomaly_direction | "upper" \| "lower" \| None | Direction of the anomaly, or None if not anomalous |
- Upper anomaly: score > upper_z_q (spike, novel error, elevated activity).
- Lower anomaly: score < lower_z_q (unusual silence, service dropout, metric drop).
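The two comparison rules can be sketched as a small function. The dataclass below only mirrors the documented field names; it is a stand-in for illustration, not Seerflow's actual ThresholdResult type, and `classify` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdResult:
    """Stand-in with the documented fields (not the real Seerflow class)."""

    is_anomaly: bool
    upper_threshold: float
    lower_threshold: float
    score: float
    anomaly_direction: Optional[str]


def classify(score, upper_z_q, lower_z_q):
    """Apply the strict upper/lower comparisons to one score."""
    if score > upper_z_q:
        direction = "upper"
    elif score < lower_z_q:
        direction = "lower"
    else:
        direction = None
    return ThresholdResult(direction is not None, upper_z_q, lower_z_q,
                           score, direction)


spike = classify(4.8, upper_z_q=3.2, lower_z_q=0.4)
silence = classify(0.05, upper_z_q=3.1, lower_z_q=0.18)
normal = classify(1.2, upper_z_q=3.1, lower_z_q=0.18)
```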
Re-calibration¶
After calibration, DSPOT continues to accumulate tail excesses and refits the GPD as new data arrives:
- Every new score above the upper initial threshold adds an excess to the upper pool and increments n_exceed.
- Every new score below the lower initial threshold adds a deficit to the lower pool.
- GPD is refitted every 50 new exceedances for each tail independently.
- Excess lists are capped at 10,000 entries (oldest are dropped) to bound memory growth.
- n_total (total observations since calibration) is used in the z_q formula to keep the false positive rate calibrated as the stream grows.
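A minimal sketch of the bookkeeping for one tail, assuming a bounded deque for the excess pool. The class name `ExcessPool` and its attributes are hypothetical; only the refit interval of 50, the 10,000-entry cap, and the n_total / n_exceed counters come from the description above.

```python
from collections import deque


class ExcessPool:
    """Upper-tail bookkeeping sketch (hypothetical, not Seerflow's class)."""

    def __init__(self, initial_threshold, refit_every=50, max_size=10_000):
        self.t = initial_threshold
        self.excesses = deque(maxlen=max_size)  # oldest entries drop automatically
        self.refit_every = refit_every
        self.n_total = 0
        self.n_exceed = 0
        self._since_refit = 0

    def observe(self, score):
        """Record one score; return True when a GPD refit is due."""
        self.n_total += 1
        if score <= self.t:
            return False
        self.excesses.append(score - self.t)
        self.n_exceed += 1
        self._since_refit += 1
        if self._since_refit >= self.refit_every:
            self._since_refit = 0
            return True
        return False


# Small cap and synthetic scores, just to exercise the counters.
pool = ExcessPool(initial_threshold=1.0, refit_every=50, max_size=100)
refits = sum(pool.observe(2.0) for _ in range(120))
pool.observe(0.5)  # below the threshold: counted in n_total only
```

A lower-tail pool would work the same way with the comparison reversed and deficits (threshold minus score) stored in place of excesses.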
Warmup and Memory¶
- No anomalies are flagged during the calibration window — the 1000-score warmup is a hard gate.
- Memory footprint after calibration: approximately 8 KB while the excess pools are small, bounded above by two excess lists of up to 10,000 floats each (roughly 160 KB of raw 8-byte values at the cap), plus scalar state.
- Serialization uses msgspec JSON encoding of a typed _BiDSpotState struct. Serialization is only safe after calibration (is_calibrated == True); pre-calibration scores are not persisted.
Practical Examples¶
Security: Blended Score Exceeds Adaptive Threshold¶
After 1000 calibration scores from the 3 AM quiet window, DSPOT fits the tail and arrives at (rounded, illustrative values):
- Upper initial threshold \( t \) = 2.6 (98th percentile of nighttime blended scores)
- GPD fit: \( \xi = 0.15 \), \( \sigma = 0.41 \)
- Upper z_q = 3.2 at risk_level = 0.0001
The attacker's lateral movement produces a blended score of 4.8 (HST: 0.91, Markov: 0.83, combined by signal amplification). Since 4.8 > 3.2:
{
"is_anomaly": true,
"upper_threshold": 3.2,
"lower_threshold": 0.4,
"score": 4.8,
"anomaly_direction": "upper"
}
Ops: Post-Deploy Cascade¶
After the v2.3.1 deploy, latency scores were gradually drifting — normal connection pool warm-up behavior. DSPOT's z_q adapted upward through refit cycles. By 03:15 the calibrated z_q had reached 2.8. The post-deploy cascade pushed the blended score to 5.1:
{
"is_anomaly": true,
"upper_threshold": 2.8,
"lower_threshold": 0.3,
"score": 5.1,
"anomaly_direction": "upper"
}
Lower Threshold: Detecting Silence¶
If a normally busy service suddenly goes quiet — log volume drops, blended scores fall — the lower threshold catches it. A service that ordinarily produces blended scores around 1.2 going completely silent (score = 0.05) would trigger:
{
"is_anomaly": true,
"upper_threshold": 3.1,
"lower_threshold": 0.18,
"score": 0.05,
"anomaly_direction": "lower"
}
This is useful for detecting service outages, log pipeline breaks, or an attacker who has silenced logging on a compromised host.
Tuning Guide¶
| Symptom | Adjustment | Effect |
|---|---|---|
| Too many false positives | Decrease dspot.risk_level to 0.00001 | Raises z_q; only the most extreme scores fire |
| Missing real anomalies | Increase dspot.risk_level to 0.001 | Lowers z_q; more sensitive, more alerts |
| Thresholds too volatile | Increase dspot.calibration_window to 2000 | More data before the initial GPD fit gives more stable thresholds |
| Want to catch drops | Monitor anomaly_direction == "lower" | Lower z_q fires on silence, metric drops, service outages |
| Threshold drifting too fast | Review excess accumulation rate | If many scores exceed the initial percentile, consider raising dspot.initial_percentile to 99 |
Risk level as false positive budget: at risk_level = 0.0001, you expect at most 1 anomaly flag per 10,000 normal scores. At risk_level = 0.00001, the budget is 1 per 100,000. Choose based on your alert fatigue tolerance and downstream dedup strategy.
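To turn the budget into an alert volume, multiply the risk level by the number of scores produced. The helper below is a back-of-the-envelope sketch with a hypothetical name; it assumes the fitted tail model holds and that every score is normal, so it estimates a ceiling on false flags, not a measured rate.

```python
def expected_false_flags(scores_per_second, risk_level, seconds=86_400):
    """Upper estimate of false flags over a period (default: one day)."""
    return scores_per_second * seconds * risk_level


# Example stream of 50 scores per second (an assumed rate, for illustration).
per_day_default = expected_false_flags(50, 0.0001)
per_day_strict = expected_false_flags(50, 0.00001)
```

At that assumed rate the default risk level allows a few hundred flags per day, and tightening it by a factor of ten reduces the budget by the same factor, which is the trade-off to weigh against downstream dedup.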
Calibration window sizing: the calibration window must capture a representative sample of the score distribution. For seasonal workloads (e.g. daytime vs. nighttime traffic), consider running separate DSpotThreshold instances per time window, or using a large calibration window (2000–5000) to span multiple cycles before fitting begins.
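One way to run separate instances per time window is to route each score to a detector keyed by its window. The sketch below is an assumption-heavy illustration: the day/night split at 00:00 to 06:00, the `window_key` function, and the `PerWindowThresholds` class are all hypothetical, and the factory stands in for whatever constructs a DSpotThreshold.

```python
from datetime import datetime, timezone


def window_key(ts, night_start=0, night_end=6):
    """Map a timestamp to a coarse window label (assumed split, in ts's own timezone)."""
    return "night" if night_start <= ts.hour < night_end else "day"


class PerWindowThresholds:
    """Lazily create one detector per time window (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory  # callable returning a fresh detector
        self._detectors = {}

    def detector_for(self, ts):
        key = window_key(ts)
        if key not in self._detectors:
            self._detectors[key] = self._factory()
        return self._detectors[key]


# A plain list stands in for a detector so the routing can be shown on its own.
router = PerWindowThresholds(factory=list)
night_a = router.detector_for(datetime(2026, 4, 8, 3, 18, tzinfo=timezone.utc))
night_b = router.detector_for(datetime(2026, 4, 8, 3, 19, tzinfo=timezone.utc))
day = router.detector_for(datetime(2026, 4, 8, 14, 0, tzinfo=timezone.utc))
```

Each window then needs its own full calibration window of scores before it can flag anything, which lengthens warmup for the quieter windows.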
Cross-Links¶
- Half-Space Trees: produces the per-event anomaly scores that DSPOT thresholds
- Holt-Winters: volume-level anomaly detection that also feeds the blended score
- CUSUM: change-point detection complementing DSPOT's tail approach
- Markov Chains: sequence anomaly scores entering the blended signal
- Scoring & Attack Mapping: how DSPOT's ThresholdResult combines with other detectors
- Ensemble Overview: how all detectors and DSPOT connect end-to-end
- Anomaly Detection concepts: background on EVT and streaming thresholds
- Configuration Reference: dspot.calibration_window, dspot.risk_level, dspot.initial_percentile
- Deployment Risk: ops context for threshold adaptation during deploys
Next: Scoring & Attack Mapping → — how detector scores are blended and mapped to MITRE ATT&CK.