IOCs & Entity Extraction

Every attacker leaves traces. In our SSH example, the attacker left several: an IP address (198.51.100.23), a username (deploy), a hostname (web-prod-01), and a domain (evil-c2.example.com). These traces are scattered across different log files, servers, and tools. Individually, each looks like ordinary data. Together, they tell the story of a breach.

Indicators of Compromise (IOCs)

An Indicator of Compromise (IOC) is any observable artifact that suggests a system has been breached or is under attack. Think of IOCs as digital fingerprints at a crime scene. Just as a detective collects fingerprints and shell casings, a security analyst collects IP addresses, file hashes, and domain names.

IOCs don't prove guilt on their own. A single IP in a log might be perfectly benign. But when that same IP shows up in failed SSH logins, blocked firewall connections, and DNS queries to a known malicious domain — the fingerprints form a pattern.

Common IOC Types

IOC Type	What It Is	Example	Where You Find It
IP Address	A numeric address identifying a network device	`198.51.100.23`	Firewall logs, SSH logs, web access logs
Domain	A human-readable name that resolves to an IP via DNS (Domain Name System)	`evil-c2.example.com`	DNS query logs, proxy logs, email headers
File Hash	A cryptographic fingerprint of a file's contents (SHA-256, MD5)	`a1b2c3d4e5f6...`	Endpoint detection logs, antivirus alerts
Username	An account identifier on a system	`deploy`	Authentication logs, SSH logs, audit trails
Process	A running program, identified by name or ID	`/tmp/.hidden/rev_shell`	Process execution logs, syslog
URL / Path	A web address or file path tied to malicious activity	`/wp-admin/shell.php`	Web server logs, proxy logs
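To make the table concrete, here is a minimal Python sketch of a regex-based scanner for four of the types above: IPv4 address, domain, SHA-256 hash, and file path. The patterns and the `extract_iocs` function are illustrative assumptions, not a production extractor. They ignore IPv6 and defanged indicators such as `evil-c2[.]example[.]com`, and the domain pattern can still flag ordinary dotted file names (for example `config.yaml`) as domains.

```python
import ipaddress
import re

# Illustrative patterns (assumptions, not a complete IOC grammar).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[0-9a-fA-F]{64}\b")
# The lookbehind skips dotted names that sit inside a path, e.g. /wp-admin/shell.php.
DOMAIN_RE = re.compile(
    r"(?<![\w./-])(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)
PATH_RE = re.compile(r"(?<![\w.])/(?:[\w.-]+/)*[\w.-]+")


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return candidate IOCs found in free text, grouped by type."""
    ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects out-of-range octets like 999
        except ValueError:
            continue
        ips.append(candidate)

    found = {
        "ip": ips,
        "domain": DOMAIN_RE.findall(text),
        "sha256": SHA256_RE.findall(text),
        "path": PATH_RE.findall(text),
    }
    # Deduplicate while keeping first-seen order.
    return {kind: list(dict.fromkeys(values)) for kind, values in found.items()}


if __name__ == "__main__":
    line = "web-prod-01: /tmp/.hidden/rev_shell executed by user deploy (PID 28841)"
    print(extract_iocs(line))
```

Pattern scanning only finds candidates. It cannot tell whether 198.51.100.23 is an attacker or a customer; that judgment comes from correlating the candidates across sources.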

IOCs in Our SSH Attack

As the brute-force attack progresses into a full breach, the attacker scatters fingerprints across every log source:

SSH authentication log

Failed password for deploy from 198.51.100.23 port 44231 ssh2
Accepted password for deploy from 198.51.100.23 port 44987 ssh2

DNS query log

query: evil-c2.example.com A record from 10.0.1.15 (web-prod-01)

Process execution log

web-prod-01: /tmp/.hidden/rev_shell executed by user deploy (PID 28841)

Three log sources. Five IOCs: the IP 198.51.100.23, the username deploy, the hostname web-prod-01, the domain evil-c2.example.com, and the process /tmp/.hidden/rev_shell. A human analyst could piece this together — but only if they look in all three places. An automated system needs a way to link these traces. That is where entities come in.
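Pulling these five values out automatically means parsing each log source with a format-aware pattern, so every value arrives labeled with its role (user, host, domain, and so on). The sketch below assumes the exact line formats shown above; the `PATTERNS` table, the field names, and the helper functions are hypothetical, and real SSH, DNS, and process logs vary by daemon and version. The DNS line's client address 10.0.1.15 is captured as `host_ip` rather than as an IOC because it is web-prod-01's own internal address, not a new attacker artifact.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns matching the exact sample formats shown above.
PATTERNS = {
    "ssh": re.compile(
        r"(?P<outcome>Failed|Accepted) password for (?P<user>\S+) "
        r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port \d+"
    ),
    "dns": re.compile(
        r"query: (?P<domain>\S+) A record from (?P<host_ip>\S+) \((?P<host>[^)]+)\)"
    ),
    "process": re.compile(
        r"(?P<host>\S+): (?P<process>\S+) executed by user (?P<user>\S+) "
        r"\(PID (?P<pid>\d+)\)"
    ),
}

# Fields treated as IOCs; everything else (outcome, pid, host_ip) is context.
IOC_FIELDS = ("ip", "user", "host", "domain", "process")


@dataclass
class ParsedEvent:
    source: str
    fields: dict[str, str]


def parse_line(source: str, line: str) -> Optional[ParsedEvent]:
    """Apply the source-specific pattern; return None if the line does not match."""
    match = PATTERNS[source].search(line)
    return ParsedEvent(source, match.groupdict()) if match else None


def collect_iocs(lines: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Gather distinct IOC values, keyed by field name, across all sources."""
    iocs: dict[str, set[str]] = {}
    for source, line in lines:
        event = parse_line(source, line)
        if event is None:
            continue
        for name in IOC_FIELDS:
            if name in event.fields:
                iocs.setdefault(name, set()).add(event.fields[name])
    return iocs


SAMPLE = [
    ("ssh", "Failed password for deploy from 198.51.100.23 port 44231 ssh2"),
    ("ssh", "Accepted password for deploy from 198.51.100.23 port 44987 ssh2"),
    ("dns", "query: evil-c2.example.com A record from 10.0.1.15 (web-prod-01)"),
    ("process", "web-prod-01: /tmp/.hidden/rev_shell executed by user deploy (PID 28841)"),
]

if __name__ == "__main__":
    for field, values in collect_iocs(SAMPLE).items():
        print(field, sorted(values))
```

Because the same user and host appear in more than one source, collecting into sets already collapses repeats: four log lines yield exactly the five IOCs listed above.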

From IOCs to Entities

An IOC is a single data point. An entity is the "who" or "what" behind one or more IOCs — a user, a machine, a network address — that persists across multiple log events and sources.

The distinction matters because attackers don't stay in one log file. The IP 198.51.100.23 shows up everywhere the attacker goes:

Log Source	Log Entry	Entity Involved
SSH log	`Failed password for deploy from 198.51.100.23`	IP `198.51.100.23`, User `deploy`
Firewall log	`DENY 198.51.100.23 -> 10.0.1.15:3306 (MySQL)`	IP `198.51.100.23`, Host `10.0.1.15`
DNS log	`198.51.100.23 queried evil-c2.example.com`	IP `198.51.100.23`, Domain `evil-c2.example.com`
Web access log	`198.51.100.23 POST /api/upload 200`	IP `198.51.100.23`

By treating 198.51.100.23 as an entity, a security tool can aggregate every event involving that IP across all sources. Instead of four isolated log lines, you get a timeline: the IP brute-forced SSH, was blocked from reaching the database, resolved a suspicious domain, and made a successful upload through the web API. Entities turn scattered IOCs into a narrative.
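Aggregating by entity amounts to an index from each entity to the events that mention it, sorted by time. A minimal sketch, assuming each event has already been tagged with `(entity_type, value)` pairs; the `Event` class and `build_timelines` are hypothetical, and so are the timestamps, since the log lines in the table above carry none.

```python
from collections import defaultdict
from dataclasses import dataclass

Entity = tuple[str, str]  # (entity_type, value), e.g. ("ip", "198.51.100.23")


@dataclass(frozen=True)
class Event:
    ts: str                       # ISO-8601 UTC timestamp (hypothetical values below)
    source: str                   # which log produced the event
    message: str                  # raw log line
    entities: tuple[Entity, ...]  # entities extracted from the line


def build_timelines(events: list[Event]) -> dict[Entity, list[Event]]:
    """Index events by every entity they mention, each list in time order."""
    timelines: dict[Entity, list[Event]] = defaultdict(list)
    for event in events:
        for entity in event.entities:
            timelines[entity].append(event)
    for entity_events in timelines.values():
        # Same-format UTC ISO-8601 strings sort chronologically as plain strings.
        entity_events.sort(key=lambda e: e.ts)
    return timelines


ATTACKER = ("ip", "198.51.100.23")

# Hypothetical timestamps; the input is deliberately out of order.
EVENTS = [
    Event("2024-01-01T10:05:00Z", "web", "198.51.100.23 POST /api/upload 200", (ATTACKER,)),
    Event("2024-01-01T10:00:00Z", "ssh", "Failed password for deploy from 198.51.100.23",
          (ATTACKER, ("user", "deploy"))),
    Event("2024-01-01T10:03:00Z", "dns", "198.51.100.23 queried evil-c2.example.com",
          (ATTACKER, ("domain", "evil-c2.example.com"))),
    Event("2024-01-01T10:01:00Z", "firewall", "DENY 198.51.100.23 -> 10.0.1.15:3306 (MySQL)",
          (ATTACKER, ("host", "10.0.1.15"))),
]

if __name__ == "__main__":
    for event in build_timelines(EVENTS)[ATTACKER]:
        print(event.ts, event.source, event.message)
```

Each event is indexed under every entity it mentions, so the same firewall line also shows up in the timeline for host 10.0.1.15.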

Entity Resolution

Real infrastructure is messy. The same machine can appear under different identifiers depending on which log source recorded it:

Identifier	Format	Log Source
`web-prod-01`	Hostname (short name assigned by the OS)	SSH logs, syslog
`10.0.1.15`	Internal IP address (private network)	Firewall logs, flow logs
`web-prod-01.corp.example.com`	FQDN — Fully Qualified Domain Name (complete hostname + domain)	DNS logs, certificate logs
`i-0a1b2c3d4e5f67890`	Cloud instance ID (AWS, GCP, or Azure)	Cloud audit logs

All four refer to the same server. If a security tool treats each as separate, it fragments the attack story. Entity resolution is the process of recognizing that different identifiers refer to the same thing and merging them into a single entity.

It works by maintaining a mapping between known aliases. When one log mentions 10.0.1.15 and another mentions web-prod-01, the system knows those are the same host and attributes both events to one entity. Without resolution, the firewall block and the SSH login look unrelated. With it, they become two steps in the same attack.
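One common way to maintain such a mapping is a union-find (disjoint-set) structure: each observed alias pair merges two sets, and every identifier resolves to a single canonical representative. The sketch below is an illustrative assumption, not a description of any particular tool. The `EntityResolver` class is hypothetical, and in practice the alias pairs would come from sources such as DNS records, DHCP leases, or a cloud inventory, none of which are modeled here.

```python
class EntityResolver:
    """Union-find over identifiers: each linked alias set is one entity."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, ident: str) -> str:
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk directly at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def link(self, primary: str, alias: str) -> None:
        """Record that two identifiers name the same entity."""
        root_a, root_b = self._find(primary), self._find(alias)
        if root_a != root_b:
            self._parent[root_b] = root_a  # the first argument's root stays canonical

    def resolve(self, ident: str) -> str:
        """Return the canonical identifier for ident (itself if never linked)."""
        return self._find(ident)


if __name__ == "__main__":
    resolver = EntityResolver()
    # Hypothetical alias pairs, e.g. learned from DNS, DHCP, or cloud inventory.
    resolver.link("web-prod-01", "10.0.1.15")
    resolver.link("10.0.1.15", "web-prod-01.corp.example.com")
    resolver.link("web-prod-01", "i-0a1b2c3d4e5f67890")

    # A firewall event and an SSH event now attribute to the same host entity.
    for source, ident in [("firewall", "10.0.1.15"), ("ssh", "web-prod-01")]:
        print(source, "->", resolver.resolve(ident))
```

A real resolver also has to expire aliases: internal IPs get reassigned, so 10.0.1.15 may belong to a different host next week. A plain union-find never splits sets, which is why time-bounded mappings are usually needed on top of it.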

Why Entities Matter More Than Individual IOCs

A single IOC tells you very little. One failed login could be a typo. But when entities tie IOCs together across sources and time, a picture emerges:

The IP 198.51.100.23 failed SSH login 47 times (brute force)
The same IP succeeded on attempt 48 (credential compromise)
The compromised host web-prod-01 then queried evil-c2.example.com (command-and-control)
A suspicious process launched on that host (payload execution)

No single log source contains all four facts. Entities are the connective tissue that makes cross-source correlation possible.
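The first two facts in the list come from a simple per-IP count over authentication events. A minimal sketch, assuming events are already parsed into `(ip, user, outcome)` tuples in time order; `find_bruteforce_success` is hypothetical, its `threshold` of 10 failures is an arbitrary tuning value, and a real detector would also bound the count by a time window. The benign IP and user in the usage example are made up.

```python
from collections import Counter


def find_bruteforce_success(
    auth_events: list[tuple[str, str, str]], threshold: int = 10
) -> dict[str, tuple[str, int]]:
    """Flag source IPs whose login succeeded after >= threshold failures.

    auth_events: (ip, user, outcome) tuples in time order; outcome is
    "Failed" or "Accepted", as in the sshd lines shown earlier.
    Returns {ip: (user, attempt_number_of_first_flagged_success)}.
    """
    failures = Counter()
    flagged: dict[str, tuple[str, int]] = {}
    for ip, user, outcome in auth_events:
        if outcome == "Failed":
            failures[ip] += 1
        elif outcome == "Accepted" and failures[ip] >= threshold and ip not in flagged:
            # Attempt number = failures so far plus this successful try.
            flagged[ip] = (user, failures[ip] + 1)
    return flagged


if __name__ == "__main__":
    attack = [("198.51.100.23", "deploy", "Failed")] * 47 + [("198.51.100.23", "deploy", "Accepted")]
    typo = [("10.0.2.7", "alice", "Failed"), ("10.0.2.7", "alice", "Accepted")]  # hypothetical benign user
    print(find_bruteforce_success(attack + typo))  # → {'198.51.100.23': ('deploy', 48)}
```

The single typo is not flagged, while 47 failures followed by a success surface as a compromise on attempt 48, matching the first two facts above.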

How Seerflow Uses This

Seerflow extracts six entity types from every log event: IP address, user, hostname, domain, process, and file. These are the building blocks of all downstream detection and correlation.
Entity resolution bridges identity gaps automatically — linking hostname to IP, IP to FQDN, cloud instance ID back to the same host. When web-prod-01, 10.0.1.15, and i-0a1b2c3d4e5f67890 appear in different logs, Seerflow knows they are the same machine.
Entities power the entity graph — a network data structure connecting users, IPs, hosts, and domains through observed relationships. When user deploy authenticates from 198.51.100.23 to web-prod-01, which then queries evil-c2.example.com, those relationships become edges in the graph.
The KillChainTracker tracks entity progression across all log sources, mapping each step to the corresponding kill chain stage. It recognizes that a brute-force attempt, a successful login, a C2 connection, and data exfiltration are part of the same attack — even when each step appears in a different log.
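The entity graph described above can be sketched as an adjacency map plus breadth-first search, which answers "what is connected to this IP?" in one traversal. This is a hypothetical, undirected simplification for illustration only; it is not Seerflow's implementation, and the `EntityGraph` class and the relation labels (`authenticated_as`, `logged_into`, `queried`, `executed`) are made up for the example.

```python
from collections import defaultdict, deque

Node = tuple[str, str]  # (entity_type, value)


class EntityGraph:
    """Undirected entity graph; each edge stores the observed relationship."""

    def __init__(self) -> None:
        self._adj: dict[Node, dict[Node, str]] = defaultdict(dict)

    def observe(self, a: Node, relation: str, b: Node) -> None:
        """Record a relationship seen in some log event."""
        self._adj[a][b] = relation
        self._adj[b][a] = relation

    def neighbors(self, node: Node) -> dict[Node, str]:
        return dict(self._adj.get(node, {}))

    def reachable(self, start: Node) -> dict[Node, int]:
        """Breadth-first search: every entity connected to start, with hop count."""
        hops = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self._adj.get(node, {}):
                if neighbor not in hops:
                    hops[neighbor] = hops[node] + 1
                    queue.append(neighbor)
        return hops


IP = ("ip", "198.51.100.23")
USER = ("user", "deploy")
HOST = ("host", "web-prod-01")
DOMAIN = ("domain", "evil-c2.example.com")
PROCESS = ("process", "/tmp/.hidden/rev_shell")


def build_example_graph() -> EntityGraph:
    graph = EntityGraph()
    graph.observe(IP, "authenticated_as", USER)
    graph.observe(USER, "logged_into", HOST)
    graph.observe(HOST, "queried", DOMAIN)
    graph.observe(HOST, "executed", PROCESS)
    return graph


if __name__ == "__main__":
    for node, distance in build_example_graph().reachable(IP).items():
        print(distance, node)
```

Starting the search from the attacker IP reaches the user, the host, the C2 domain, and the dropped process. A production graph would likely be directed and time-stamped, so an analyst can ask not only what is connected but in what order it happened.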
