
Why Prometheus for DDoS Metrics

Prometheus has become the default time-series database for infrastructure monitoring. If your team already uses it for CPU, memory, and network interface metrics, adding DDoS telemetry from Flowtriq means your attack data lives in the same query language, the same dashboards, and the same alerting pipeline as everything else. No separate tool, no context switching.

Flowtriq exposes a /metrics endpoint on the agent that serves the native Prometheus exposition format. Every metric carries a node label, and attack metrics add labels for target IP, attack vector, and severity. This means you can slice DDoS data the same way you would any other Prometheus metric: by label selector, by aggregation, by recording rule.

In this guide, we will walk through the complete setup: configuring the Prometheus scrape target, exploring every available metric, building recording rules for efficient long-term queries, writing alert rules that catch attacks before they escalate, and tuning retention for DDoS data that can be bursty and high-cardinality.

Prerequisites

Before you begin, make sure you have:

  • A running Prometheus instance (version 2.40 or newer recommended)
  • At least one Flowtriq agent deployed and reporting traffic data
  • Network connectivity from your Prometheus server to the Flowtriq agent on port 9145 (the default metrics port)

If your Flowtriq agents run behind a firewall, you will need to allow inbound TCP on port 9145 from the Prometheus server. The Prometheus Pushgateway is a possible alternative, but it is designed for short-lived batch jobs rather than long-running agents, so we recommend the pull model for reliability and simplicity.

Step 1: Enable the Metrics Endpoint on the Agent

The Flowtriq agent exposes Prometheus metrics by default on port 9145. You can verify this by curling the endpoint directly from the machine running the agent:

curl http://localhost:9145/metrics

You should see output in Prometheus exposition format with lines like:

# HELP flowtriq_traffic_pps Current packets per second observed by the agent
# TYPE flowtriq_traffic_pps gauge
flowtriq_traffic_pps{node="edge-router-01"} 142389

# HELP flowtriq_traffic_bps Current bits per second observed by the agent
# TYPE flowtriq_traffic_bps gauge
flowtriq_traffic_bps{node="edge-router-01"} 891204352

If the endpoint is not responding, check that the agent configuration has metrics_enabled: true (this is the default). You can also change the port by setting metrics_port in the agent config file:

# /etc/flowtriq/agent.yml
metrics_enabled: true
metrics_port: 9145
metrics_path: /metrics
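
The exposition output shown earlier in this step can also be checked from a script rather than by eye. The following is a minimal sketch of a parser for sample lines, using only the Python standard library. It is deliberately simplified (it does not unescape label values, and it is not a full implementation of the exposition format); `parse_samples` and `SAMPLE` are illustrative names, not part of Flowtriq or Prometheus.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: name="value" (escapes are left as-is).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Sample text copied from the example output in this step.
SAMPLE = """\
# HELP flowtriq_traffic_pps Current packets per second observed by the agent
# TYPE flowtriq_traffic_pps gauge
flowtriq_traffic_pps{node="edge-router-01"} 142389

# HELP flowtriq_traffic_bps Current bits per second observed by the agent
# TYPE flowtriq_traffic_bps gauge
flowtriq_traffic_bps{node="edge-router-01"} 891204352
"""


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text.

    Comment lines (# HELP, # TYPE), blank lines, and lines that do not
    look like a sample are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels, value)
```

In practice you would feed it the body returned by the curl command above. For anything beyond a quick sanity check, a maintained parser is a better choice than a hand-rolled regex.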

Step 2: Add the Scrape Target to Prometheus

Add a new scrape job to your prometheus.yml configuration file. Here is a basic static configuration for three Flowtriq agents:

scrape_configs:
  - job_name: 'flowtriq'
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
      - targets:
          - 'flowtriq-agent-01:9145'
          - 'flowtriq-agent-02:9145'
          - 'flowtriq-agent-03:9145'
        labels:
          environment: 'production'

We recommend a 15-second scrape interval for DDoS metrics. This provides enough granularity to capture attack ramp-up patterns without overloading your Prometheus storage. For high-traffic environments with 10+ agents, you can increase the interval to 30 seconds.

Using Service Discovery

If your agents are deployed in a dynamic environment (Kubernetes, Consul, EC2), use service discovery instead of static targets. Here is an example using Consul service discovery:

scrape_configs:
  - job_name: 'flowtriq'
    scrape_interval: 15s
    consul_sd_configs:
      - server: 'consul.internal:8500'
        services:
          - 'flowtriq-agent'
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: '.*,production,.*'
        action: keep
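
The regex in the relabel rule above has commas on both sides of the tag name for a reason: Prometheus joins the Consul tags with a separator (a comma by default), wraps the result in that separator, and anchors relabel regexes at both ends. The sketch below mimics that behaviour in Python to show which tag lists would be kept. `consul_tags_label` and `keep_target` are hypothetical helper names, and Python's `re` stands in for the RE2 engine Prometheus uses.

```python
import re


def consul_tags_label(tags, separator=","):
    """Build a value shaped like __meta_consul_tags.

    Tags are joined with the separator and also wrapped in it,
    so ["a", "b"] becomes ",a,b,".
    """
    return separator + separator.join(tags) + separator


def keep_target(tags, regex=r".*,production,.*"):
    """Mimic a relabel 'keep' action, which matches the whole label value."""
    return re.fullmatch(regex, consul_tags_label(tags)) is not None


if __name__ == "__main__":
    for tags in (["production", "edge"], ["staging"], ["preproduction"]):
        print(tags, keep_target(tags))
```

Because the tag is matched between separators, a tag such as preproduction does not accidentally pass the filter.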

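For Kubernetes deployments, Flowtriq agents include Prometheus annotations by default, including flowtriq.com/scrape: "true". A kubernetes_sd_configs scrape job can use a relabel rule to keep only targets that carry this annotation. If you are using the Prometheus Operator or kube-prometheus-stack, note that a ServiceMonitor selects Services by label rather than by annotation, so its selector must match the labels on your agent Service.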

Step 3: Available Metrics Reference

Flowtriq exposes the following metrics. All metrics include a node label identifying the reporting agent. Attack-specific metrics include additional labels for target_ip, vector, and severity.

Traffic Metrics (Gauges)

  • flowtriq_traffic_pps: Current packets per second observed by the agent. Labels: node.
  • flowtriq_traffic_bps: Current bits per second observed by the agent. Labels: node.
  • flowtriq_traffic_flows: Number of active flows currently tracked. Labels: node.

Attack Metrics (Gauges)

  • flowtriq_attack_pps: Packets per second attributed to the current attack. Labels: node, target_ip, vector, severity.
  • flowtriq_attack_bps: Bits per second attributed to the current attack. Labels: node, target_ip, vector, severity.
  • flowtriq_active_attacks: Number of attacks currently in progress. Labels: node.

Incident Counters

  • flowtriq_incidents_total: Counter of total incidents detected since agent start. Labels: node, severity.
  • flowtriq_incidents_mitigated_total: Counter of incidents where automated mitigation was triggered. Labels: node, mitigation_type.

Agent Health Metrics

  • flowtriq_agent_uptime_seconds: Seconds since the agent process started. Labels: node.
  • flowtriq_agent_flow_sources: Number of active flow sources (sFlow/NetFlow collectors). Labels: node.
  • flowtriq_agent_last_report_timestamp: Unix timestamp of the last successful report to the Flowtriq API. Labels: node.

All gauge metrics are updated every scrape interval. Counter metrics are monotonically increasing and reset only when the agent restarts. Use rate() or increase() for counter metrics in your queries.
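
To see why raw counter values should not be subtracted directly, here is a minimal sketch of reset-aware counting in Python. It is a simplification of what increase() does: the real function also extrapolates to the edges of the query window, which this sketch omits. `counter_increase` is an illustrative name.

```python
def counter_increase(samples):
    """Sum the growth across consecutive counter samples, handling resets.

    A drop in value is treated as a counter reset (for example an agent
    restart), so the post-reset value counts as growth from zero.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter restarted from zero
    return total


if __name__ == "__main__":
    # Hypothetical flowtriq_incidents_total samples with a restart after 15.
    print(counter_increase([10, 12, 15, 2, 4]))
```

A naive last-minus-first calculation on the same samples would give a negative number, which is why counters are always queried through rate() or increase().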

Step 4: Recording Rules for Efficient Queries

DDoS metrics can be high-cardinality during an attack because each unique combination of target_ip and vector creates a new time series. Recording rules pre-compute aggregations so your dashboards load quickly even when querying weeks of data.

Create a file called flowtriq_recording_rules.yml and add it to your Prometheus rule files configuration:

groups:
  - name: flowtriq_recordings
    interval: 30s
    rules:
      # Total traffic across all nodes
      - record: flowtriq:traffic_pps:sum
        expr: sum(flowtriq_traffic_pps)

      - record: flowtriq:traffic_bps:sum
        expr: sum(flowtriq_traffic_bps)

      # Per-node attack traffic
      - record: flowtriq:attack_pps:sum_by_node
        expr: sum(flowtriq_attack_pps) by (node)

      - record: flowtriq:attack_bps:sum_by_node
        expr: sum(flowtriq_attack_bps) by (node)

      # Incident rate over 5 minutes
      - record: flowtriq:incidents:rate5m
        expr: sum(rate(flowtriq_incidents_total[5m])) by (severity)

      # Active attacks across the fleet
      - record: flowtriq:active_attacks:sum
        expr: sum(flowtriq_active_attacks)

Reference these recording rules in your Grafana dashboards instead of raw queries. The difference in query performance is significant when you have 50+ agents or months of retention.
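
As an illustration of what a rule like sum(flowtriq_attack_bps) by (node) collapses, the sketch below performs the same grouping in Python over a handful of labelled samples. The sample values and IP addresses are made up for the example, and `sum_by` is a hypothetical helper, not a Prometheus API.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Sum sample values grouped by one label, dropping all other labels.

    samples is a list of (labels_dict, value) pairs. A sample without the
    label is grouped under the empty string.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical flowtriq_attack_bps series during a two-vector attack.
    attack_bps = [
        ({"node": "edge-01", "target_ip": "203.0.113.10", "vector": "udp_flood"}, 4e8),
        ({"node": "edge-01", "target_ip": "203.0.113.11", "vector": "syn_flood"}, 3e8),
        ({"node": "edge-02", "target_ip": "203.0.113.10", "vector": "udp_flood"}, 1e8),
    ]
    print(sum_by(attack_bps, "node"))
```

Three input series become two output series here. During a real attack the reduction is much larger, which is where the dashboard speed-up comes from.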

Step 5: Alert Rules for DDoS Detection

While Flowtriq has its own alerting pipeline (Slack, PagerDuty, email, webhooks), some teams prefer to centralize all alerting in Prometheus Alertmanager. Here are production-tested alert rules for common DDoS scenarios:

groups:
  - name: flowtriq_alerts
    rules:
      # Alert when any node has an active attack lasting more than 2 minutes
      - alert: FlowtriqActiveAttack
        expr: flowtriq_active_attacks > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Active DDoS attack on {{ $labels.node }}"
          description: "Node {{ $labels.node }} has {{ $value }} active attack(s) for more than 2 minutes."

      # Alert when attack traffic exceeds 1 Gbps on any node
      - alert: FlowtriqHighBandwidthAttack
        expr: sum(flowtriq_attack_bps) by (node) > 1e9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High bandwidth attack on {{ $labels.node }}"
          description: "Attack traffic on {{ $labels.node }} is {{ humanize $value }}bps."

      # Alert when incident rate spikes (more than 5 new incidents in 5 minutes)
      - alert: FlowtriqIncidentSpike
        expr: sum(increase(flowtriq_incidents_total[5m])) by (node) > 5
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Incident spike on {{ $labels.node }}"
          description: "{{ $value }} incidents detected in the last 5 minutes on {{ $labels.node }}."

      # Alert when a scraped agent stops reporting to the Flowtriq API
      - alert: FlowtriqAgentDown
        expr: time() - flowtriq_agent_last_report_timestamp > 300
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Flowtriq agent {{ $labels.node }} is not reporting"
          description: "Agent on {{ $labels.node }} has not reported to the API in over 5 minutes."

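The FlowtriqAgentDown rule is particularly important. If an agent goes silent during an attack, you lose visibility into that node. Note what the rule actually covers: it fires when an agent that Prometheus can still scrape has stopped reporting to the Flowtriq API. If the agent process itself dies, its series go stale, the expression returns no result, and the rule never fires. To catch that case as well, pair it with an alert on up{job="flowtriq"} == 0, which triggers whenever a scrape fails, regardless of whether an attack is in progress.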
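
The for: clause in the rules above delays firing until the condition has held continuously across evaluations. The sketch below models that behaviour in Python so you can reason about how long a rule takes to fire. It is a simplified model under the assumption of fixed evaluation times, and `first_firing_time` is an illustrative name rather than anything Prometheus exposes.

```python
def first_firing_time(evaluations, for_seconds):
    """Return the timestamp at which an alert first fires, or None.

    evaluations is a time-ordered list of (timestamp_seconds, condition_true).
    The alert becomes pending at the first true evaluation, resets whenever
    the condition is false, and fires once it has been pending for at least
    for_seconds.
    """
    pending_since = None
    for timestamp, condition in evaluations:
        if not condition:
            pending_since = None  # condition cleared, start over
            continue
        if pending_since is None:
            pending_since = timestamp
        if timestamp - pending_since >= for_seconds:
            return timestamp
    return None


if __name__ == "__main__":
    # Evaluations every 30 seconds; the condition briefly clears at t=60.
    evals = [(t, t != 60) for t in range(0, 240, 30)]
    print(first_firing_time(evals, for_seconds=120))
```

The brief dip at t=60 restarts the pending timer, which is the trade-off of a longer for: value: fewer false alarms from short bursts, but later notification for pulsed attacks.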

Step 6: Retention and Storage Tuning

DDoS metrics are bursty. During normal operations, you may have 10 to 20 time series per agent. During a multi-vector attack targeting dozens of IPs, that number can spike to several hundred series per agent because each unique target_ip and vector combination creates a new series.

Here are our recommendations for Prometheus storage configuration:

  • Retention period: 30 days for raw metrics. Use recording rules and remote write to a long-term store (Thanos, Cortex, or VictoriaMetrics) for anything longer.
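  • Chunk encoding: Prometheus picks the sample encoding itself (XOR compression for floats), so there is nothing to configure. Gauges compress best while values are steady; expect a lower compression ratio during attacks, when values swing sharply between scrapes.
  • WAL compression: --storage.tsdb.wal-compression is enabled by default since Prometheus 2.20, so no action is needed unless it has been explicitly disabled. It shrinks the write-ahead log at a small CPU cost, which helps during attack spikes.
  • Series limit: Prometheus has no storage flag that caps the series count. If you are concerned about cardinality explosions, set sample_limit on the flowtriq scrape job and monitor prometheus_tsdb_head_series during incidents. A scrape that exceeds sample_limit is rejected in full, so leave generous headroom above your worst-case attack series count.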

For a fleet of 10 Flowtriq agents with a 15-second scrape interval, expect roughly 500 MB to 2 GB of storage per month depending on attack frequency. Fleets with frequent attacks or many targeted IPs will be on the higher end.
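
For your own fleet size, a back-of-the-envelope estimate can be sketched with the capacity-planning formula from the Prometheus storage documentation: retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes after compression). The function below is a rough sketch under those assumptions. It ignores index, WAL, and series-churn overhead, so treat the result as a floor rather than a replacement for the figures above. `estimate_disk_bytes` is an illustrative name.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Estimate TSDB sample storage for a steady number of active series.

    Uses retention_seconds * samples_per_second * bytes_per_sample, where
    samples_per_second is active_series / scrape_interval_s.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * active_series * bytes_per_sample / scrape_interval_s


if __name__ == "__main__":
    # 10 agents at 20 series each (quiet) versus 500 series each (sustained attack).
    for series_per_agent in (20, 500):
        size = estimate_disk_bytes(10 * series_per_agent, 15, 30)
        print(f"{series_per_agent} series/agent: {size / 1e9:.2f} GB")
```

At a sustained 500 series per agent, the estimate for 10 agents comes to roughly 1.7 GB over 30 days, which lines up with the upper end of the range quoted above.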

Remote Write to Long-Term Storage

For compliance or post-incident forensics, you may need DDoS metrics retained for 6 to 12 months. Use Prometheus remote write to send Flowtriq metrics to a long-term backend:

remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'flowtriq_.*'
        action: keep

The relabel config ensures only Flowtriq metrics are forwarded, keeping your long-term storage costs predictable.

Verifying the Setup

After adding the scrape config and reloading Prometheus, verify that targets are being scraped successfully:

  1. Open the Prometheus web UI at http://your-prometheus:9090/targets
  2. Look for the flowtriq job. All targets should show a green "UP" state.
  3. Run a test query in the expression browser: flowtriq_traffic_pps
  4. Verify that the recording rules are being evaluated: check http://your-prometheus:9090/rules

If a target shows "DOWN", verify network connectivity to port 9145 and check that the agent process is running. The most common issue is a firewall rule blocking the scrape port.
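
The same check can be scripted against the Prometheus HTTP API, which serves target status at /api/v1/targets. The sketch below is one possible approach using only the Python standard library; the function names and the your-prometheus hostname are placeholders, and the job name assumes the flowtriq job configured in Step 2.

```python
import json
import urllib.request


def fetch_targets(base_url):
    """GET /api/v1/targets from a Prometheus server and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload, job="flowtriq"):
    """Return (scrape_url, last_error) for targets of a job that are not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("labels", {}).get("job") != job:
            continue
        if target.get("health") != "up":
            problems.append((target.get("scrapeUrl"), target.get("lastError", "")))
    return problems


if __name__ == "__main__":
    # Placeholder URL: replace with your own Prometheus address.
    payload = fetch_targets("http://your-prometheus:9090")
    for url, error in unhealthy_targets(payload):
        print(f"DOWN {url}: {error}")
```

An empty result means every target in the job reported a healthy last scrape. The lastError field usually distinguishes a refused connection from a timeout, which helps narrow down firewall issues.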

Building Grafana Dashboards from These Metrics

Once your metrics are flowing into Prometheus, the next step is visualization. We publish a pre-built Grafana dashboard that uses the recording rules defined above. You can import it from the Flowtriq documentation or build your own panels using the metric names and labels described in this guide.

Key panels we recommend for a DDoS overview dashboard:

  • Fleet traffic overview: flowtriq:traffic_bps:sum as a time series panel with Gbps unit formatting.
  • Active attacks: flowtriq:active_attacks:sum as a stat panel with color thresholds (green for 0, red for 1+).
  • Incident rate: flowtriq:incidents:rate5m as a bar chart broken down by severity.
  • Per-node attack bandwidth: flowtriq:attack_bps:sum_by_node as a stacked time series to identify which nodes are under heaviest attack.
  • Agent health: A table showing each node's uptime, last report time, and flow source count.

For a detailed walkthrough of the Grafana dashboard setup, see our Grafana DDoS dashboard guide.

Tip: Combine Prometheus metrics with Flowtriq's native Slack alerts and PagerDuty escalation for a complete observability and alerting pipeline. Prometheus gives you the metrics layer; Flowtriq's integrations give you the notification layer.

Prometheus metric export is available on all Flowtriq plans starting at $9.99/node/month. Every agent includes the metrics endpoint at no additional cost. Start your free trial and have DDoS metrics flowing into Prometheus within minutes.
