Why Prometheus for DDoS Metrics
Prometheus has become the default time-series database for infrastructure monitoring. If your team already uses it for CPU, memory, and network interface metrics, adding DDoS telemetry from Flowtriq means your attack data lives in the same query language, the same dashboards, and the same alerting pipeline as everything else. No separate tool, no context switching.
Flowtriq exposes a /metrics endpoint on the agent that speaks the native Prometheus exposition format. Every metric carries a node label, and attack metrics add labels for target IP, attack vector, and severity. This means you can slice and dice DDoS data the same way you would any other Prometheus metric: by label selector, by aggregation, by recording rule.
In this guide, we will walk through the complete setup: configuring the Prometheus scrape target, exploring every available metric, building recording rules for efficient long-term queries, writing alert rules that catch attacks before they escalate, and tuning retention for DDoS data that can be bursty and high-cardinality.
Prerequisites
Before you begin, make sure you have:
- A running Prometheus instance (version 2.40 or newer recommended)
- At least one Flowtriq agent deployed and reporting traffic data
- Network connectivity from your Prometheus server to the Flowtriq agent on port 9145 (the default metrics port)
If your Flowtriq agents run behind a firewall, you will need to allow inbound TCP on port 9145 from the Prometheus server. Alternatively, you can use the Prometheus Pushgateway, but we recommend the pull model for reliability and simplicity.
Step 1: Enable the Metrics Endpoint on the Agent
The Flowtriq agent exposes Prometheus metrics by default on port 9145. You can verify this by curling the endpoint directly from the machine running the agent:
curl http://localhost:9145/metrics
You should see output in Prometheus exposition format with lines like:
# HELP flowtriq_traffic_pps Current packets per second observed by the agent
# TYPE flowtriq_traffic_pps gauge
flowtriq_traffic_pps{node="edge-router-01"} 142389
# HELP flowtriq_traffic_bps Current bits per second observed by the agent
# TYPE flowtriq_traffic_bps gauge
flowtriq_traffic_bps{node="edge-router-01"} 891204352
If the endpoint is not responding, check that the agent configuration has metrics_enabled: true (this is the default). You can also change the port by setting metrics_port in the agent config file:
# /etc/flowtriq/agent.yml
metrics_enabled: true
metrics_port: 9145
metrics_path: /metrics
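If you would rather check the endpoint from a script than eyeball curl output, the sketch below fetches /metrics and parses simple sample lines like the ones shown above. It assumes the default port 9145 and path /metrics from the agent config. It is a smoke-test helper, not a complete parser; the official prometheus_client Python package provides a full one (prometheus_client.parser.text_string_to_metric_families).

```python
import urllib.request


def fetch_metrics(url="http://localhost:9145/metrics", timeout=5):
    """Download the raw exposition text from a Flowtriq agent (default port and path assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_exposition(text):
    """Parse simple Prometheus text-format samples into {(name, labels): value}.

    Minimal sketch: skips # HELP / # TYPE lines and blank lines, and expects
    `name value` or `name{k="v",...} value`. It does not handle escaped quotes,
    commas inside label values, or trailing timestamps.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The value is everything after the last space
        metric, _, value = line.rpartition(" ")
        if "{" in metric:
            name, _, label_str = metric.partition("{")
            pairs = (p.split("=", 1) for p in label_str.rstrip("}").split(",") if p)
            labels = tuple((key, val.strip('"')) for key, val in pairs)
        else:
            name, labels = metric, ()
        samples[(name, labels)] = float(value)
    return samples


# Example (run on the agent host): parse_exposition(fetch_metrics())
```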
Step 2: Add the Scrape Target to Prometheus
Add a new scrape job to your prometheus.yml configuration file. Here is a basic static configuration for a single Flowtriq agent:
scrape_configs:
  - job_name: 'flowtriq'
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
      - targets:
          - 'flowtriq-agent-01:9145'
          - 'flowtriq-agent-02:9145'
          - 'flowtriq-agent-03:9145'
        labels:
          environment: 'production'
We recommend a 15-second scrape interval for DDoS metrics. It is fine-grained enough to capture attack ramp-up patterns without overloading your Prometheus storage. For high-traffic environments with 10 or more agents, you can raise the interval to 30 seconds, which halves the number of samples each series writes.
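To make the 15-second versus 30-second trade-off concrete, here is some back-of-the-envelope arithmetic in Python. The "rate() window of at least 4x the scrape interval" figure is a common rule of thumb (so a window still holds several samples if a scrape is missed), not a Flowtriq requirement.

```python
def samples_in_window(window_s, scrape_interval_s):
    """Number of scrapes of one series that land in a window of window_s seconds."""
    return window_s // scrape_interval_s


def min_rate_window(scrape_interval_s, factor=4):
    """Shortest rate()/increase() window under the 4x-scrape-interval rule of thumb."""
    return factor * scrape_interval_s


for interval in (15, 30):
    print(
        f"{interval}s scrape: "
        f"{samples_in_window(120, interval)} samples in a 2m ramp-up, "
        f"{samples_in_window(86_400, interval)} samples per series per day, "
        f"rate() window of at least {min_rate_window(interval)}s"
    )
```

At 30 seconds, the 5-minute windows used by the recording and alert rules in this guide still contain about ten samples per series.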
Using Service Discovery
If your agents are deployed in a dynamic environment (Kubernetes, Consul, EC2), use service discovery instead of static targets. Here is an example using Consul service discovery:
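scrape_configs:
  - job_name: 'flowtriq'
    scrape_interval: 15s
    consul_sd_configs:
      - server: 'consul.internal:8500'
        services:
          - 'flowtriq-agent'
    relabel_configs:
      # __meta_consul_tags joins the service tags with commas and also
      # adds a leading and trailing comma, e.g. ",production,edge,"
      - source_labels: [__meta_consul_tags]
        regex: '.*,production,.*'
        action: keep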
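For Kubernetes deployments, Flowtriq agents include Prometheus annotations by default, including flowtriq.com/scrape: "true". If you are using the Prometheus Operator or kube-prometheus-stack, keep in mind that a ServiceMonitor selects Services by label, not by annotation, so its selector must match the labels on the Flowtriq agent Service. The annotation is what you key on when you discover agent pods with plain kubernetes_sd_configs and relabeling.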
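If you discover the agent pods directly with kubernetes_sd_configs (role: pod), the annotation reaches relabeling as a sanitized meta label. The helper below shows how that name is derived, assuming the annotation key is exactly flowtriq.com/scrape as described above.

```python
import re


def pod_annotation_meta_label(annotation):
    """Meta label name that Prometheus' Kubernetes pod discovery uses for an annotation.

    Every character outside [a-zA-Z0-9_] in the annotation name becomes "_".
    """
    return "__meta_kubernetes_pod_annotation_" + re.sub(r"[^a-zA-Z0-9_]", "_", annotation)


print(pod_annotation_meta_label("flowtriq.com/scrape"))
# __meta_kubernetes_pod_annotation_flowtriq_com_scrape
```

You would then keep matching pods with a relabel rule that uses that meta label as its source label, regex 'true', and action keep.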
Step 3: Available Metrics Reference
Flowtriq exposes the following metrics. All metrics include a node label identifying the reporting agent. Attack-specific metrics include additional labels for target_ip, vector, and severity.
Traffic Metrics (Gauges)
- flowtriq_traffic_pps: Current packets per second observed by the agent. Labels: node.
- flowtriq_traffic_bps: Current bits per second observed by the agent. Labels: node.
- flowtriq_traffic_flows: Number of active flows currently tracked. Labels: node.
Attack Metrics (Gauges)
- flowtriq_attack_pps: Packets per second attributed to the current attack. Labels: node, target_ip, vector, severity.
- flowtriq_attack_bps: Bits per second attributed to the current attack. Labels: node, target_ip, vector, severity.
- flowtriq_active_attacks: Number of attacks currently in progress. Labels: node.
Incident Counters
- flowtriq_incidents_total: Counter of total incidents detected since agent start. Labels: node, severity.
- flowtriq_incidents_mitigated_total: Counter of incidents where automated mitigation was triggered. Labels: node, mitigation_type.
Agent Health Metrics
- flowtriq_agent_uptime_seconds: Seconds since the agent process started. Labels: node.
- flowtriq_agent_flow_sources: Number of active flow sources (sFlow/NetFlow collectors). Labels: node.
- flowtriq_agent_last_report_timestamp: Unix timestamp of the last successful report to the Flowtriq API. Labels: node.
All gauge metrics are refreshed on every scrape. Counter metrics increase monotonically and reset only when the agent restarts, so use rate() or increase() for counter metrics in your queries; both functions detect and compensate for those resets.
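Here is why raw counter values mislead: when an agent restarts, flowtriq_incidents_total drops back toward zero, and a naive last-minus-first difference can even go negative. The Python sketch below models the reset handling in a simplified way. Prometheus' real increase() also extrapolates to the edges of the query window, so its results can be fractional; the sample values here are invented.

```python
def counter_increase(samples):
    """Total increase of a counter over time-ordered (timestamp, value) samples.

    Any drop in value is treated as a counter reset (e.g. an agent restart),
    so the post-reset value is counted from zero. No extrapolation is done.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def counter_rate(samples):
    """Per-second average rate over the sampled span (no extrapolation)."""
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span if span > 0 else 0.0


# Hypothetical flowtriq_incidents_total scraped every 15s; the agent restarts between t=15 and t=30
incidents = [(0, 10), (15, 15), (30, 3), (45, 8)]
print(counter_increase(incidents))  # 13.0
```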
Step 4: Recording Rules for Efficient Queries
DDoS metrics can be high-cardinality during an attack because each unique combination of target_ip and vector creates a new time series. Recording rules pre-compute aggregations so your dashboards load quickly even when querying weeks of data.
Create a file called flowtriq_recording_rules.yml and list it under rule_files in your prometheus.yml:
groups:
  - name: flowtriq_recordings
    interval: 30s
    rules:
      # Total traffic across all nodes
      - record: flowtriq:traffic_pps:sum
        expr: sum(flowtriq_traffic_pps)
      - record: flowtriq:traffic_bps:sum
        expr: sum(flowtriq_traffic_bps)

      # Per-node attack traffic
      - record: flowtriq:attack_pps:sum_by_node
        expr: sum(flowtriq_attack_pps) by (node)
      - record: flowtriq:attack_bps:sum_by_node
        expr: sum(flowtriq_attack_bps) by (node)

      # Incident rate over 5 minutes
      - record: flowtriq:incidents:rate5m
        expr: sum(rate(flowtriq_incidents_total[5m])) by (severity)

      # Active attacks across the fleet
      - record: flowtriq:active_attacks:sum
        expr: sum(flowtriq_active_attacks)
Reference these recording rules in your Grafana dashboards instead of raw queries. The difference in query performance is significant when you have 50+ agents or months of retention.
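To see why the per-node rules keep dashboards fast, here is a small Python model of what sum(flowtriq_attack_bps) by (node) computes: however many target_ip, vector, and severity combinations appear during an attack, the output has one series per node. The label values below (IPs, vector names, severity levels) are made up for illustration.

```python
from collections import defaultdict


def series(**labels):
    """Build a hashable label set for one time series."""
    return frozenset(labels.items())


def sum_by(series_values, label):
    """Sum current values grouped by one label, like PromQL `sum(...) by (label)`.

    Series missing the label fall into the "" group, as in PromQL.
    """
    totals = defaultdict(float)
    for labels, value in series_values.items():
        totals[dict(labels).get(label, "")] += value
    return dict(totals)


# Hypothetical flowtriq_attack_bps series during a multi-vector attack
attack_bps = {
    series(node="edge-01", target_ip="203.0.113.10", vector="udp_flood", severity="high"): 4e8,
    series(node="edge-01", target_ip="203.0.113.10", vector="syn_flood", severity="high"): 2e8,
    series(node="edge-01", target_ip="203.0.113.11", vector="dns_amplification", severity="medium"): 1e8,
    series(node="edge-02", target_ip="198.51.100.5", vector="udp_flood", severity="critical"): 3e8,
    series(node="edge-02", target_ip="198.51.100.6", vector="ntp_amplification", severity="high"): 5e8,
}

per_node = sum_by(attack_bps, "node")
print(f"{len(attack_bps)} raw series -> {len(per_node)} recorded series: {per_node}")
```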
Step 5: Alert Rules for DDoS Detection
While Flowtriq has its own alerting pipeline (Slack, PagerDuty, email, webhooks), some teams prefer to centralize all alerting in Prometheus Alertmanager. Here are production-tested alert rules for common DDoS scenarios:
groups:
  - name: flowtriq_alerts
    rules:
      # Alert when any node has an active attack lasting more than 2 minutes
      - alert: FlowtriqActiveAttack
        expr: flowtriq_active_attacks > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Active DDoS attack on {{ $labels.node }}"
          description: "Node {{ $labels.node }} has {{ $value }} active attack(s) for more than 2 minutes."

      # Alert when attack traffic exceeds 1 Gbps on any node
      - alert: FlowtriqHighBandwidthAttack
        expr: sum(flowtriq_attack_bps) by (node) > 1e9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High bandwidth attack on {{ $labels.node }}"
          description: "Attack traffic on {{ $labels.node }} is {{ humanize $value }}bps."

      # Alert when incident rate spikes (more than 5 new incidents in 5 minutes)
      - alert: FlowtriqIncidentSpike
        expr: sum(increase(flowtriq_incidents_total[5m])) by (node) > 5
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Incident spike on {{ $labels.node }}"
          description: "{{ $value }} incidents detected in the last 5 minutes on {{ $labels.node }}."

      # Alert when a running agent stops reporting to the Flowtriq API (stale report timestamp)
      - alert: FlowtriqAgentDown
        expr: time() - flowtriq_agent_last_report_timestamp > 300
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Flowtriq agent {{ $labels.node }} is not reporting"
          description: "Agent on {{ $labels.node }} has not reported to the API in over 5 minutes."

      # Alert when Prometheus cannot scrape an agent at all (process down or host unreachable)
      - alert: FlowtriqAgentScrapeFailed
        expr: up{job="flowtriq"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Flowtriq agent {{ $labels.instance }} cannot be scraped"
          description: "Prometheus has failed to scrape {{ $labels.instance }} for more than 2 minutes."
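The FlowtriqAgentDown rule is particularly important: if an agent goes silent during an attack, you lose visibility into that node. Note its limits, though. The expression only returns data while Prometheus can still scrape the agent, so it catches an agent that is running but cannot reach the Flowtriq API. If the agent process dies or the host becomes unreachable, the metric goes stale and this rule stays silent; an alert on up{job="flowtriq"} == 0 covers that case. Together they ensure you are notified within minutes of an agent failure, regardless of whether an attack is in progress.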
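The for: durations in these rules mean a condition must hold across consecutive rule evaluations before Prometheus moves the alert from pending to firing. The Python sketch below is a simplified model of that transition. It assumes evaluations every 15 seconds and ignores resolution handling and other rule options.

```python
def firing_time(evaluations, for_seconds):
    """Return the evaluation timestamp at which an alert would start firing, or None.

    `evaluations` is a time-ordered list of (timestamp_s, condition_true) pairs.
    The alert turns pending on the first true evaluation, resets to inactive on
    any false one, and fires once it has been pending for at least for_seconds.
    """
    pending_since = None
    for ts, condition in evaluations:
        if not condition:
            pending_since = None  # condition cleared: back to inactive
            continue
        if pending_since is None:
            pending_since = ts  # first true evaluation: alert is pending
        if ts - pending_since >= for_seconds:
            return ts
    return None


# An attack condition that stays true, evaluated every 15 seconds
steady = [(t, True) for t in range(0, 300, 15)]
print(firing_time(steady, for_seconds=120))  # 120
```

A single false evaluation restarts the clock, which is why a flapping attack can stay pending longer than the for: value suggests.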
Step 6: Retention and Storage Tuning
DDoS metrics are bursty. During normal operations, you may have 10 to 20 time series per agent. During a multi-vector attack targeting dozens of IPs, that number can spike to several hundred series per agent because each unique target_ip and vector combination creates a new series.
Here are our recommendations for Prometheus storage configuration:
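- Retention period: 30 days for raw metrics (--storage.tsdb.retention.time=30d). Use recording rules and remote write to a long-term store (Thanos, Cortex, or VictoriaMetrics) for anything longer.
- Chunk encoding: Leave the default. Prometheus compresses float samples with Gorilla-style XOR encoding, which works best when consecutive values are similar, so expect gauges that swing sharply during an attack to cost somewhat more bytes per sample than steady baseline traffic.
- WAL compression: Keep --storage.tsdb.wal-compression enabled (the default in recent releases) to reduce disk I/O during attack spikes.
- Series limit: If you are concerned about cardinality explosions, set sample_limit on the flowtriq scrape job and monitor prometheus_tsdb_head_series during incidents. A scrape that exceeds sample_limit is rejected in full, so set the limit well above the series count of your worst observed attack.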
For a fleet of 10 Flowtriq agents with a 15-second scrape interval, expect roughly 500 MB to 2 GB of storage per month depending on attack frequency. Fleets with frequent attacks or many targeted IPs will be on the higher end.
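To adapt that estimate to your own fleet, you can apply the capacity formula from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). The series counts below are illustrative, taken from the per-agent ranges quoted in this section; real usage also includes index, WAL, and churn overhead.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample=2):
    """Approximate local TSDB size: retention_seconds * samples_per_second * bytes_per_sample.

    bytes_per_sample defaults to 2, the top of the 1-2 bytes/sample range;
    index, WAL, and series churn add overhead on top of this figure.
    """
    samples = retention_days * SECONDS_PER_DAY * active_series / scrape_interval_s
    return samples * bytes_per_sample


# Illustrative: 10 agents at ~20 series each (quiet) vs. ~500 each (sustained attacks)
for active in (10 * 20, 10 * 500):
    gb = estimate_disk_bytes(active, scrape_interval_s=15, retention_days=30) / 1e9
    print(f"{active} active series for 30 days: ~{gb:.2f} GB")
```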
Remote Write to Long-Term Storage
For compliance or post-incident forensics, you may need DDoS metrics retained for 6 to 12 months. Use Prometheus remote write to send Flowtriq metrics to a long-term backend:
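remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    write_relabel_configs:
      # Keep raw flowtriq_* metrics and flowtriq:* recording-rule series
      - source_labels: [__name__]
        regex: 'flowtriq(_|:).*'
        action: keep
The relabel config ensures only Flowtriq metrics are forwarded, keeping your long-term storage costs predictable. The regex matches both the raw flowtriq_ metrics and the flowtriq: series produced by the recording rules; a pattern of flowtriq_.* alone would silently drop the recorded aggregations.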
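Relabel regexes in Prometheus are anchored at both ends, so a pattern must match the whole metric name. This Python sketch emulates a keep action on __name__ with re.fullmatch to show which names each pattern forwards; the metric list is a sample drawn from this guide plus one non-Flowtriq metric.

```python
import re


def relabel_keep(metric_names, pattern):
    """Emulate a Prometheus `action: keep` on __name__ (fully anchored match)."""
    regex = re.compile(pattern)
    return [name for name in metric_names if regex.fullmatch(name)]


names = [
    "flowtriq_traffic_bps",
    "flowtriq_incidents_total",
    "flowtriq:traffic_bps:sum",  # recording-rule outputs use colons
    "flowtriq:attack_bps:sum_by_node",
    "node_network_receive_bytes_total",
]

print(relabel_keep(names, "flowtriq_.*"))  # raw metrics only
print(relabel_keep(names, "flowtriq(_|:).*"))  # raw metrics plus recorded series
```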
Verifying the Setup
After adding the scrape config and reloading Prometheus, verify that targets are being scraped successfully:
- Open the Prometheus web UI at http://your-prometheus:9090/targets
- Look for the flowtriq job. All targets should show a green "UP" state.
- Run a test query in the expression browser: flowtriq_traffic_pps
- Verify that the recording rules are being evaluated: check http://your-prometheus:9090/rules
If a target shows "DOWN", verify network connectivity to port 9145 and check that the agent process is running. The most common issue is a firewall rule blocking the scrape port.
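The same target check can be scripted against the Prometheus HTTP API, which is handy after a config rollout. This sketch reads /api/v1/targets and reports any target in the flowtriq job whose health is not "up"; your-prometheus:9090 is the placeholder host used above, and the job name assumes the scrape config from Step 2.

```python
import json
import urllib.request


def fetch_targets(prometheus_url="http://your-prometheus:9090"):
    """Fetch active scrape targets from the Prometheus HTTP API."""
    url = f"{prometheus_url}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload, job="flowtriq"):
    """Return (instance, health, lastError) for targets in `job` that are not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("job") != job:
            continue
        if target.get("health") != "up":
            problems.append(
                (labels.get("instance", "?"), target.get("health"), target.get("lastError", ""))
            )
    return problems


# Example: for row in unhealthy_targets(fetch_targets()): print(row)
```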
Building Grafana Dashboards from These Metrics
Once your metrics are flowing into Prometheus, the next step is visualization. We publish a pre-built Grafana dashboard that uses the recording rules defined above. You can import it from the Flowtriq documentation or build your own panels using the metric names and labels described in this guide.
Key panels we recommend for a DDoS overview dashboard:
- Fleet traffic overview: flowtriq:traffic_bps:sum as a time series panel with Gbps unit formatting.
- Active attacks: flowtriq:active_attacks:sum as a stat panel with color thresholds (green for 0, red for 1+).
- Incident rate: flowtriq:incidents:rate5m as a bar chart broken down by severity.
- Per-node attack bandwidth: flowtriq:attack_bps:sum_by_node as a stacked time series to identify which nodes are under the heaviest attack.
- Agent health: A table showing each node's uptime, last report time, and flow source count.
For a detailed walkthrough of the Grafana dashboard setup, see our Grafana DDoS dashboard guide.
Tip: Combine Prometheus metrics with Flowtriq's native Slack alerts and PagerDuty escalation for a complete observability and alerting pipeline. Prometheus gives you the metrics layer; Flowtriq's integrations give you the notification layer.
Prometheus metric export is available on all Flowtriq plans starting at $9.99/node/month. Every agent includes the metrics endpoint at no additional cost. Start your free trial and have DDoS metrics flowing into Prometheus within minutes.