Why DDoS Detection Belongs in Your Existing Stack

Most organizations run a monitoring stack they have spent months or years tuning. Prometheus scrapes metrics every 15 seconds. Grafana dashboards are pinned to every NOC screen. PagerDuty routes alerts to the right on-call engineer based on service ownership. Datadog anomaly detectors learn normal baselines over weeks of historical data.

Then DDoS detection arrives as a separate product with its own dashboard, its own alerting rules, and its own notification channels. Suddenly your team has two panes of glass instead of one. When an attack hits, someone has to check the DDoS tool, then switch to Grafana to see how application performance was affected, then open PagerDuty to coordinate the response. Context is lost at every tab switch.

The better approach is to treat DDoS metrics like any other infrastructure signal. Push them into Prometheus. Graph them in Grafana alongside your application latency and error rates. Route alerts through the same Alertmanager or Datadog monitors your team already trusts. When a volumetric flood hits, the same dashboard that shows your CPU spike and your 502 error rate also shows the 15 Mpps attack that caused both.

This is not just about convenience. Correlation is the real benefit. An on-call engineer who sees a latency spike alongside a PPS spike immediately understands the root cause. Without that correlation, they might spend 20 minutes investigating application code before someone thinks to check the DDoS dashboard.

The Metrics That Matter for DDoS Monitoring

Before wiring anything together, you need to know which metrics to collect. Not every network counter is useful for DDoS detection. Focus on these six:

  • Packets per second (PPS): The single most important DDoS metric. Volumetric attacks are defined by packet rate. A SYN flood that sends 10 million packets per second at 40 bytes each is only 3.2 Gbps in bandwidth but can exhaust connection tracking tables and CPU on any server.
  • Bits per second (BPS): Raw bandwidth consumption. Amplification attacks (DNS, NTP, memcached, CLDAP) generate massive BPS because each reflected packet is large. A 100 Gbps DNS amplification flood will saturate your uplink long before it exhausts your CPU.
  • Connection rate (new connections/sec): Tracks the rate of new TCP connections. SYN floods and HTTP floods drive this metric through the roof. Normal web traffic might create 500 new connections per second; an attack creates 500,000.
  • SYN-RECV count: The number of half-open TCP connections sitting in the SYN-RECV state. This is the definitive indicator of a SYN flood. A healthy server has fewer than 100 SYN-RECV sockets at any given time. During an attack, this number climbs to tens of thousands.
  • Conntrack utilization: Linux connection tracking (conntrack) has a finite table size. When it fills up, the kernel starts dropping legitimate connections. Monitoring nf_conntrack_count against nf_conntrack_max tells you how close you are to that cliff. Quick shell checks for this and for SYN-RECV follow this list.
  • Per-protocol breakdown: Knowing total PPS is useful, but knowing PPS broken down by protocol (TCP, UDP, ICMP, GRE) is far more useful. A sudden spike in UDP traffic from zero to 5 Mpps is almost certainly an amplification attack.
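
You can spot-check the last two indicators straight from a shell before any metrics pipeline exists. A quick sketch, assuming a reasonably recent iproute2 (for ss -H) and the standard Linux conntrack paths:

# Count half-open connections (the SYN flood indicator)
ss -Hn state syn-recv | wc -l

# Conntrack utilization: current entries vs. table limit
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max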

Prometheus + Grafana: The Open-Source Approach

If you run Prometheus and Grafana, you already have most of what you need. node_exporter, which is probably already running on your servers, exposes the network interface counters that form the foundation of DDoS detection.

Key Prometheus Metrics

These metrics come from node_exporter out of the box:

  • node_network_receive_packets_total - total received packets (counter)
  • node_network_receive_bytes_total - total received bytes (counter)
  • node_network_transmit_packets_total - total transmitted packets (counter)
  • node_netstat_Tcp_CurrEstab - current established TCP connections
  • node_nf_conntrack_entries - current conntrack table entries
  • node_nf_conntrack_entries_limit - conntrack table max size

To get PPS and BPS as rates, use PromQL's rate() function. Here are the queries you will use most often:

# Packets per second (inbound) per interface
rate(node_network_receive_packets_total{device="eth0"}[1m])

# Bits per second (inbound) - multiply bytes by 8
rate(node_network_receive_bytes_total{device="eth0"}[1m]) * 8

# Conntrack utilization as a percentage
node_nf_conntrack_entries / node_nf_conntrack_entries_limit * 100
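
If you reuse these expressions across dashboards and alert rules, precompute them with recording rules so every consumer sees identical values. A minimal sketch; the rule names follow Prometheus's level:metric:operations convention but are otherwise my own choice:

# recording_rules.yml
groups:
  - name: ddos_recording
    rules:
      - record: instance:network_receive_pps:rate1m
        expr: rate(node_network_receive_packets_total{device="eth0"}[1m])
      - record: instance:network_receive_bps:rate1m
        expr: rate(node_network_receive_bytes_total{device="eth0"}[1m]) * 8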

Prometheus Scrape Configuration

If you are not already scraping node_exporter, add it to your prometheus.yml. Use a 10-second scrape interval for network metrics to catch short burst attacks that a 60-second interval would miss:

# prometheus.yml
scrape_configs:
  - job_name: 'node'
    scrape_interval: 10s
    static_configs:
      - targets:
          - 'server-1:9100'
          - 'server-2:9100'
          - 'server-3:9100'
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.+):9100'
        target_label: instance
        replacement: '${1}'
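
After reloading Prometheus, confirm the counters are actually being exposed before building on them. A quick check against one of the targets above:

# Verify node_exporter exposes the packet counters
curl -s http://server-1:9100/metrics | grep node_network_receive_packets_total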

Alert Rules and Alertmanager Routing

Define alert rules that fire when DDoS indicators cross your thresholds. These go in your Prometheus rules file:

# ddos_alerts.yml
groups:
  - name: ddos_detection
    rules:
      - alert: HighPacketRate
        expr: rate(node_network_receive_packets_total{device="eth0"}[1m]) > 500000
        for: 30s
        labels:
          severity: warning
          category: ddos
        annotations:
          summary: "High PPS on {{ $labels.instance }}"
          description: "{{ $labels.instance }} receiving {{ $value | humanize }} pps"

      - alert: CriticalPacketRate
        expr: rate(node_network_receive_packets_total{device="eth0"}[1m]) > 2000000
        for: 15s
        labels:
          severity: critical
          category: ddos
        annotations:
          summary: "Possible DDoS on {{ $labels.instance }}"
          description: "{{ $labels.instance }} receiving {{ $value | humanize }} pps"

      - alert: BandwidthFlood
        expr: rate(node_network_receive_bytes_total{device="eth0"}[1m]) * 8 > 5e9
        for: 15s
        labels:
          severity: critical
          category: ddos
        annotations:
          summary: "Bandwidth flood on {{ $labels.instance }}"
          description: "{{ $labels.instance }} receiving {{ $value | humanize }}bps"

      - alert: ConntrackNearFull
        expr: node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 0.85
        for: 1m
        labels:
          severity: warning
          category: ddos
        annotations:
          summary: "Conntrack table filling on {{ $labels.instance }}"
          description: "Conntrack at {{ $value | humanizePercentage }}"

Route these alerts through Alertmanager to your existing notification channels. The category: ddos label lets you create specific routing rules if you want DDoS alerts to go to a network engineering channel rather than the general on-call rotation.
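
As a sketch, a routing block like this sends everything tagged category: ddos to a dedicated receiver; the receiver names and Slack channel here are placeholders for your own setup:

# alertmanager.yml (excerpt)
route:
  receiver: default-oncall
  routes:
    - matchers:
        - category = "ddos"
      receiver: network-engineering
receivers:
  - name: default-oncall
    pagerduty_configs:
      - routing_key: '<your-pagerduty-key>'
  - name: network-engineering
    slack_configs:
      # assumes a global slack_api_url is configured
      - channel: '#network-alerts'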

Grafana Dashboard

Build a dedicated DDoS panel row in your existing infrastructure dashboard, or create a standalone DDoS dashboard. Here is a Grafana dashboard JSON snippet for a PPS time series panel:

{
  "title": "Inbound PPS by Node",
  "type": "timeseries",
  "datasource": "Prometheus",
  "targets": [{
    "expr": "rate(node_network_receive_packets_total{device='eth0'}[1m])",
    "legendFormat": "{{ instance }}"
  }],
  "fieldConfig": {
    "defaults": {
      "unit": "pps",
      "thresholds": {
        "steps": [
          { "color": "green", "value": null },
          { "color": "yellow", "value": 500000 },
          { "color": "red", "value": 2000000 }
        ]
      }
    }
  }
}

Add attack event annotations from your DDoS detection tool. If your tool fires webhooks, you can write those events to a database that Grafana queries as an annotation source. Seeing a red vertical line on your latency graph at the exact moment an attack started is worth a thousand words.
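
One way to wire that up without a database is a small relay that turns webhook events directly into Grafana annotations via the Annotations HTTP API. A sketch in Python, assuming a payload shaped like the Flowtriq example later in this post and a Grafana service account token in GRAFANA_TOKEN:

# annotation_relay.py - relay DDoS webhook events to Grafana annotations
import json
import os
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://grafana:3000")
GRAFANA_TOKEN = os.environ["GRAFANA_TOKEN"]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumes a JSON body with event, node, attack_type, pps fields
        event = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        annotation = {
            "time": int(time.time() * 1000),  # epoch milliseconds
            "tags": ["ddos", event.get("attack_type", "unknown")],
            "text": f"{event.get('event')} on {event.get('node')}: "
                    f"{event.get('pps', 0):,} pps",
        }
        req = urllib.request.Request(
            f"{GRAFANA_URL}/api/annotations",
            data=json.dumps(annotation).encode(),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {GRAFANA_TOKEN}",
            },
        )
        urllib.request.urlopen(req)
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9099), WebhookHandler).serve_forever()

Point a dashboard annotation query at the ddos tag and every attack appears as a marker on your existing panels.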

Datadog: The Managed Approach

If your stack runs on Datadog, the integration is even simpler. The Datadog Agent already collects the network metrics you need. The key metric is system.net.packets_in.count, which provides inbound packet counts per interface.

Datadog Monitor for DDoS Detection

Create an anomaly detection monitor that learns your normal PPS baseline and alerts when traffic deviates significantly. This is more robust than a static threshold because it adapts to your traffic patterns:

{
  "name": "DDoS Detection - Anomalous Inbound PPS",
  "type": "query alert",
  "query": "avg(last_5m):anomalies(avg:system.net.packets_in.count{*}.as_rate(), 'agile', 3, direction='above') >= 1",
  "message": "Anomalous inbound packet rate detected on {{host.name}}. Current PPS is significantly above the learned baseline. Investigate for possible DDoS attack.\n\n@pagerduty-network-team @slack-noc-alerts",
  "options": {
    "thresholds": { "critical": 1.0 },
    "notify_no_data": false,
    "renotify_interval": 300
  }
}
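
You can create this monitor in the UI, or push the JSON through Datadog's Monitors API. A sketch assuming the JSON above is saved as ddos_anomaly_monitor.json and your keys are in environment variables:

# Create the monitor via the Datadog API
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d @ddos_anomaly_monitor.json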

For custom DDoS classification metrics (attack type, source count, protocol breakdown), use DogStatsD to push them from your detection tool:

# Push custom DDoS metrics via DogStatsD
echo "ddos.attack.pps:1500000|g|#attack_type:syn_flood,node:server-1" \
  | nc -u -w1 127.0.0.1 8125

echo "ddos.attack.sources:4200|g|#attack_type:syn_flood,node:server-1" \
  | nc -u -w1 127.0.0.1 8125

Build a Datadog dashboard with widgets for aggregate PPS across all hosts, per-host PPS breakdown, top attack sources, and active incident count. Overlay attack events on your APM service maps to see exactly which services were affected by each attack.

Generic Webhook Integration

Most DDoS detection tools, including Flowtriq, can send structured webhooks when attacks are detected, escalated, or resolved. This is the universal integration method that works with any downstream system.

Flowtriq sends webhook payloads with complete attack context:

{
  "event": "attack.detected",
  "timestamp": "2026-03-15T14:32:07Z",
  "node": "edge-us-east-1",
  "target_ip": "198.51.100.10",
  "attack_type": "udp_amplification",
  "sub_type": "dns_reflection",
  "pps": 8500000,
  "bps": 12800000000,
  "source_count": 4217,
  "top_sources": [
    {"ip": "203.0.113.50", "pps": 42000, "country": "CN"},
    {"ip": "203.0.113.51", "pps": 38000, "country": "RU"}
  ],
  "protocol_breakdown": {
    "udp": 97.2,
    "tcp": 2.1,
    "icmp": 0.7
  },
  "severity": "critical",
  "mitigation_level": 2
}

Forward this payload to any system that accepts webhooks:

  • Splunk: Send to an HTTP Event Collector (HEC) endpoint. Attack events become searchable log entries with full field extraction.
  • Elasticsearch / ELK: POST to an Elasticsearch index. Build Kibana dashboards for attack history, source analysis, and trend visualization.
  • PagerDuty: Trigger incidents via the Events API v2. Include attack details in the custom_details field so responders have full context without leaving PagerDuty (see the forwarder sketch after this list).
  • OpsGenie: Create alerts with priority mapped from severity. Attach attack metadata as extra properties.
  • Custom SIEM: Any system with an HTTP ingest endpoint can receive these webhooks. Parse the JSON, enrich with your own asset inventory, and correlate with other security events.
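
As one concrete example, here is a minimal forwarder that turns the payload above into a PagerDuty incident via the Events API v2. A sketch, assuming a routing key in the PD_ROUTING_KEY environment variable and the Flowtriq-style payload shown earlier:

# pagerduty_forwarder.py - turn DDoS webhooks into PagerDuty incidents
import json
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PD_URL = "https://events.pagerduty.com/v2/enqueue"
PD_ROUTING_KEY = os.environ["PD_ROUTING_KEY"]

class Forwarder(BaseHTTPRequestHandler):
    def do_POST(self):
        attack = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        pd_event = {
            "routing_key": PD_ROUTING_KEY,
            "event_action": "trigger",
            # dedup_key lets a later resolve event close the same incident
            "dedup_key": f"ddos-{attack['node']}-{attack['target_ip']}",
            "payload": {
                "summary": f"{attack['attack_type']} against "
                           f"{attack['target_ip']} ({attack['pps']:,} pps)",
                "source": attack["node"],
                "severity": attack.get("severity", "critical"),
                "custom_details": attack,  # full context inside the incident
            },
        }
        req = urllib.request.Request(
            PD_URL,
            data=json.dumps(pd_event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9098), Forwarder).serve_forever()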

Webhook-based integration is fire-and-forget. Your DDoS tool pushes events as they happen. No polling, no API rate limits, no scrape intervals. The downstream system receives attack data within seconds of detection.

Building a DDoS-Specific Grafana Dashboard

Whether you use Prometheus, Datadog, or another data source, here is the dashboard layout that works best for DDoS monitoring. Think of it as five panels arranged top to bottom:

Panel 1: Aggregate PPS Across All Nodes

A single time series showing total inbound PPS summed across your entire fleet. This is the "is anything on fire" panel. One glance tells you whether overall traffic is normal.

# PromQL: total inbound PPS across all nodes
sum(rate(node_network_receive_packets_total{device="eth0"}[1m]))

Panel 2: Per-Node PPS Heatmap

A heatmap where each row is a node and color intensity represents PPS. This immediately shows which node is under attack without reading individual graphs. Use Grafana's heatmap panel type with the following query:

# PromQL: per-node PPS for heatmap
rate(node_network_receive_packets_total{device="eth0"}[1m])

Panel 3: Active Incidents Table

A table panel populated from your DDoS tool's API or a webhook-fed database. Columns: node, target IP, attack type, PPS, BPS, duration, mitigation level, status. This gives your NOC a live incident board without switching away from Grafana.

Panel 4: Attack Type Distribution

A pie chart or bar chart showing the breakdown of attack types over the selected time range. This helps you understand your threat profile: are you mostly hit by SYN floods, UDP amplification, or HTTP floods? That knowledge drives infrastructure tuning decisions.

Panel 5: Correlation Overlay

This is the most valuable panel. Overlay PPS (from your DDoS metrics) with application response time and error rate (from your APM or application metrics) on the same time axis. When an attack hits, you see three lines spike together: PPS goes up, response time goes up, and successful requests go down. This visual correlation is the entire reason you integrate DDoS detection into your existing stack.

# PromQL: overlay attack traffic with HTTP error rate
# Panel A (left Y axis): PPS
rate(node_network_receive_packets_total{device="eth0"}[1m])

# Panel B (right Y axis): HTTP 5xx rate
rate(http_requests_total{status=~"5.."}[1m])
  / rate(http_requests_total[1m]) * 100

Correlation: The Real Payoff

The entire point of integration is correlation. Here is what it looks like in practice.

Without integration, your monitoring timeline during an attack looks like this: at 14:32, your APM shows response time spiking from 50 ms to 8 seconds. At 14:33, your error rate monitor fires. At 14:35, someone checks the DDoS dashboard and discovers an attack started at 14:31. The team spends three minutes investigating application issues before realizing the root cause was at the network layer.

With integration, the timeline collapses: at 14:31, a single Grafana dashboard shows PPS spiking, response time spiking, and error rate climbing, all on the same graph. The DDoS alert fires through Alertmanager with a direct link to the correlated dashboard. The on-call engineer sees all three signals simultaneously and knows within seconds that this is a network attack, not an application bug.

This correlation works in reverse too. After an attack is mitigated, you can review the Grafana dashboard to measure exact business impact: how many requests failed, how much latency increased, and how long the degradation lasted. That data feeds into post-incident reports and justifies investment in better DDoS protection.
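
PromQL makes that measurement concrete. Assuming the same http_requests_total metric used in the overlay panel, the total number of failed requests during a 30-minute attack window is a single query:

# Total 5xx responses over the attack window
sum(increase(http_requests_total{status=~"5.."}[30m]))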

How Flowtriq Integrates with Your Stack

Flowtriq is designed to be a data source, not a walled garden. Every detection event, metric, and alert is available through multiple integration points:

  • Webhook output: Configurable webhooks fire on attack detection, escalation, mitigation, and resolution. Send structured JSON to any endpoint: Slack, PagerDuty, OpsGenie, Splunk, ELK, or a custom receiver.
  • Structured alert data: Every webhook payload includes attack type, PPS, BPS, source breakdown, protocol distribution, and mitigation status. No parsing ambiguous text strings. Every field is machine-readable.
  • Metrics API: Pull historical and real-time metrics from Flowtriq's API into custom dashboards. Query PPS, BPS, and incident data per node, per time range, in JSON format that maps directly to Grafana data source plugins.
  • Notification channels: Native integrations with Discord, Slack, email, SMS, PagerDuty, and OpsGenie. Configure per-channel severity filters so critical attacks page your on-call while warnings go to a Slack channel.
  • Grafana annotations: Use Flowtriq webhooks to write attack events as Grafana annotations. Every attack appears as a vertical marker on your existing dashboards, making correlation effortless.

The goal is simple: Flowtriq handles the detection and classification. Your existing stack handles the visualization, alerting, and incident response workflows you have already built. No duplicate tooling, no context switching, no second dashboard.

Ready to unify your monitoring? Flowtriq's free 7-day trial includes full webhook and API access. Connect it to your Prometheus, Grafana, or Datadog instance in minutes and start correlating DDoS events with your application metrics from day one.
