Why Detection Speed Matters
In DDoS defense, detection speed determines outcome severity. A volumetric attack generating 10 Gbps of flood traffic will saturate a standard 1 Gbps server link in under one second. If your detection system polls every 60 seconds, the attack can run for nearly a full minute before you even know about it. Your services have been offline, your users have been impacted, and your SLA clock has been ticking for up to 59 seconds before your team gets the first alert.
The difference between 1-second and 60-second detection compounds at every stage of the response. With 1-second detection, automated mitigation can trigger within 2 to 3 seconds, potentially before end users notice any impact. The PCAP capture begins during the first moments of the attack, producing the richest forensic evidence. Escalation to upstream scrubbing starts immediately. With 60-second detection, every one of those actions is delayed by a full minute, and that minute often determines whether the incident is a brief blip or a major outage.
Per-Second vs Per-Minute Analysis
Most traditional DDoS detection systems operate on per-minute granularity. NetFlow and sFlow exporters typically aggregate data into 1-minute or 5-minute intervals. SNMP polling usually runs every 60 seconds. Cloud provider metrics (AWS CloudWatch, Azure Monitor) default to 1-minute or even 5-minute resolution. This means the detection system is working with averaged data that smooths out rapid changes.
Consider a 30-second burst attack that adds 500,000 PPS of flood traffic for 30 seconds and then stops. In a per-minute system, that burst averages out to just 250,000 PPS of extra traffic over the minute, diluting the signal. Against a 200,000 PPS baseline, the blended average may never cross an alert threshold set as a multiple of the baseline. The attack succeeds, the service goes down for 30 seconds, and the monitoring system reports nothing abnormal. Per-second analysis sees the full 500,000 PPS spike immediately and triggers detection within the first second.
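To see the dilution in numbers, here is a throwaway simulation of the scenario above; the 3x-baseline alert threshold is a hypothetical setting, not a recommendation.
# Illustrative only: compare the per-minute average of a 30-second burst
# against the per-second peak, using a hypothetical 3x-baseline threshold
awk 'BEGIN {
  baseline = 200000; attack = 500000; threshold = 3 * baseline   # 600,000 PPS alert line
  for (s = 1; s <= 60; s++) {
    pps = baseline + (s <= 30 ? attack : 0)    # burst for 30 s, then back to baseline
    sum += pps
    if (pps > peak) peak = pps
  }
  avg = sum / 60
  printf "per-minute average: %d PPS -> %s\n", avg, (avg > threshold ? "ALERT" : "no alert")
  printf "per-second peak:    %d PPS -> %s\n", peak, (peak > threshold ? "ALERT" : "no alert")
}'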
Per-second analysis also captures attack ramps. Many sophisticated attacks start slow and increase gradually, testing defenses. A per-minute system sees a gentle upward slope. A per-second system sees the exact moment the traffic transitions from organic growth to attack ramp, because the per-second rate change is far more abrupt than the per-minute average suggests.
Architecture of Real-Time Detection
Real-time DDoS detection requires a fundamentally different architecture than traditional monitoring. The data source must provide per-second (or sub-second) resolution without sampling. The analysis engine must process each data point as it arrives rather than in batches. And the alerting pipeline must deliver notifications with minimal latency.
Agent-based monitoring is the most practical approach for per-second detection. A lightweight agent on each server reads kernel counters directly from /proc/net/snmp, /proc/net/netstat, and /proc/net/dev every second. These pseudo-files are maintained by the kernel and updated as packets are processed, so reading them gives you exact, unsampled packet and byte counts. The cost of reading a few proc files is negligible: typically under 0.1% CPU, with no collection traffic on the wire, because the only data that leaves the host is the agent's compressed metric payload sent to the central platform.
# The raw counters that power per-second detection
# Packets received (all protocols)
awk '/eth0/{print $3}' /proc/net/dev
# TCP segment counters (InSegs, OutSegs, RetransSegs, InErrs)
awk '/^Tcp:/{getline;print}' /proc/net/snmp
# UDP datagram counters (InDatagrams, NoPorts, InErrors, OutDatagrams)
awk '/^Udp:/{getline;print}' /proc/net/snmp
# SYN cookie activity (indicates SYN flood pressure)
awk '/^TcpExt:/{split($0,h);getline;split($0,v);for(i in h)if(h[i]~/Syncookies/)printf "%s=%s\n",h[i],v[i]}' /proc/net/netstat
The agent computes deltas between consecutive readings to derive per-second rates: PPS, BPS, new connections per second, SYN cookie rate, and conntrack utilization. These derived metrics feed directly into the baseline comparison engine. Because the computation happens on the agent, only the final metrics (not raw counters) need to be transmitted, keeping network overhead minimal even at scale.
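As a rough sketch of that delta loop, assuming a single interface named eth0 and tracking only received packets (a real agent derives many metrics in the same pass):
# Minimal delta-loop sketch: derive RX packets-per-second for eth0 from the
# cumulative counter in /proc/net/dev; interface and output format are assumptions
prev=$(awk '/eth0/{print $3}' /proc/net/dev)
while sleep 1; do
  curr=$(awk '/eth0/{print $3}' /proc/net/dev)
  echo "$(date +%s) rx_pps=$((curr - prev))"   # counter delta over one second = packets per second
  prev=$curr
done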
How Flowtriq Achieves Sub-2-Second Detection
Flowtriq's detection pipeline has three stages, each optimized for minimal latency. Stage one is data collection: the node agent reads kernel counters every second, computes deltas, and evaluates them against the local baseline model. This takes under 5 milliseconds. Stage two is anomaly evaluation: if any metric exceeds the baseline threshold, the agent immediately classifies the attack vector based on which specific counters are elevated (SYN cookies for SYN floods, UDP InDatagrams for UDP floods, ICMP InMsgs for ICMP floods). Classification takes under 1 millisecond. Stage three is alert dispatch: the agent sends the incident to the Flowtriq platform, which triggers notifications to configured channels in parallel.
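Stage two amounts to checking which per-second deltas are elevated. The sketch below approximates that decision; the thresholds are placeholders for illustration, not Flowtriq's actual values.
# Simplified vector classification from per-second deltas; thresholds are placeholders
classify_vector() {
  local syncookies_ps=$1 udp_pps=$2 icmp_pps=$3 total_pps=$4
  if (( syncookies_ps > 500 )); then          # kernel issuing SYN cookies: SYN flood pressure
    echo "syn_flood"
  elif (( udp_pps * 2 > total_pps )); then    # UDP dominates the packet mix: UDP flood
    echo "udp_flood"
  elif (( icmp_pps > 5000 )); then            # ICMP rarely reaches thousands of PPS organically
    echo "icmp_flood"
  else
    echo "unclassified"
  fi
}
classify_vector 2400 80000 120 900000   # prints: syn_flood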
The total pipeline latency from anomalous packet arrival to alert delivery is typically under 2 seconds. The agent detects within 1 second (at the next polling cycle), classification is instant, and alert delivery to Slack, Discord, or webhook endpoints completes within the next second. PagerDuty and SMS may add an additional 1 to 3 seconds due to third-party API latency, but the incident is already created and visible in the dashboard within that first 2-second window.
The detection model runs entirely on the agent. This means detection works even if the network path to the Flowtriq platform is temporarily disrupted by the attack itself. The agent queues incidents locally and delivers them when connectivity is restored.
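A simplified picture of that store-and-forward behavior might look like the following; the spool path and API endpoint are placeholders, not Flowtriq's implementation.
# Store-and-forward sketch: persist incidents locally, flush when the platform is reachable
SPOOL=/var/lib/ddos-agent/incident.queue
ENDPOINT="https://platform.example.com/api/incidents"

queue_incident() {
  printf '%s\n' "$1" >> "$SPOOL"     # persist before any delivery attempt
}

flush_queue() {
  [ -s "$SPOOL" ] || return 0
  while IFS= read -r incident; do
    # short timeout so a saturated uplink cannot stall the agent's 1-second loop
    curl -fsS -m 2 -X POST -H 'Content-Type: application/json' \
         -d "$incident" "$ENDPOINT" || return 1    # keep the queue intact on failure
  done < "$SPOOL"
  : > "$SPOOL"                       # every line delivered: clear the spool
}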
Scaling Considerations for Large Deployments
Per-second detection on a single server is straightforward. Scaling it to hundreds or thousands of nodes introduces challenges in data volume, baseline management, and alert correlation. A fleet of 500 nodes, each sending 10 metrics per second, produces 5,000 data points per second. That is manageable for any modern time-series backend. At 5,000 nodes, the rate climbs to 50,000 per second, which requires careful architecture but is well within the capabilities of purpose-built metric ingestion systems.
Baseline management at scale benefits from hierarchical modeling. Each node maintains its own per-node baseline, but fleet-wide patterns (such as a game patch causing all servers to spike simultaneously) are recognized at the platform level. This prevents a coordinated legitimate event from triggering 500 individual alerts. Instead, the platform identifies it as a fleet-wide traffic change and adjusts thresholds accordingly, or surfaces it as a single correlated event rather than hundreds of independent incidents.
Alert correlation is critical at scale. When a carpet-bombing attack hits an entire subnet, individual per-node alerts are less useful than a single correlated alert that says "47 nodes in 198.51.100.0/24 are experiencing elevated UDP traffic from diverse sources." Flowtriq groups related incidents by subnet, attack vector, and timing to present operators with actionable, consolidated alerts rather than a wall of individual notifications.
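At its core, that correlation is a group-by over recent incidents. The sketch below groups hypothetical per-node records by /24 and attack vector; the input format and the five-node trigger are assumptions for illustration.
# Group per-node incidents (one "node_ip,vector" record per line) into one alert per /24
awk -F, '{
  split($1, oct, ".")
  key = oct[1] "." oct[2] "." oct[3] ".0/24 " $2   # /24 subnet plus attack vector
  hits[key]++
}
END {
  for (k in hits)
    if (hits[k] >= 5)                              # only surface an alert when several nodes agree
      print hits[k] " nodes affected: " k
}' incidents.csv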
Detection in under 2 seconds, at any scale
Flowtriq's per-second analysis catches attacks before they impact your users. Deploy on bare metal, VMs, or cloud instances. $9.99/node/month with a free 7-day trial.
Start your free trial →