Use Case
DDoS Protection Built for GPU Cloud & AI Infrastructure
Your GPU nodes are the most expensive servers in your fleet. When a DDoS attack disrupts a training run or inference pipeline, you lose compute hours worth thousands of dollars per minute. Flowtriq detects attacks in under 1 second and auto-mitigates without interrupting GPU workloads.
The Problem
GPU server downtime is the most expensive downtime in your datacenter
A single H100 GPU costs $30,000 to $50,000, and each node holds several of them. When a DDoS attack knocks a node offline mid-training, the job fails wherever it happens to be, say at 67% completion, and has to restart from the last checkpoint. In that scenario the cost is roughly $12,000 in wasted compute, hours of lost progress, and an SLA breach that puts the customer relationship at risk.
Inference APIs are even more time-sensitive. Your customers are running real-time AI applications where every millisecond of latency matters and every second of downtime triggers their own SLA penalties downstream. A volumetric flood targeting your API gateway cascades failures across every customer relying on your endpoints.
Bare-metal GPU servers cannot hide behind CDNs or WAFs. Attackers know this. High-value GPU infrastructure is increasingly targeted for ransom because the cost of downtime is so high that victims are more likely to pay.
The GPU cloud market is still early. Most providers are focused on scaling capacity and have not yet built out their security stack. Traditional DDoS solutions are designed for web applications, not bare-metal compute clusters running CUDA workloads.
09:00:12 UDP flood targeting cluster gateway
09:00:45 Network link saturated at 8Gbps
09:01:02 Training job interrupted at 67%
09:01:30 4 inference APIs timing out
09:02:00 Customer SLA violations triggered
09:08:00 Manual investigation begins
09:22:00 Attack manually mitigated
09:25:00 Training job restarted from checkpoint
Restart cost: ~$12,000 in compute
Inference downtime: 22 minutes
SLA credits owed: 4 customers
How Flowtriq Helps
GPU utilization stays at 98% while attacks are filtered at the kernel
The FTAgent runs on each GPU node and samples kernel network statistics once per second. When traffic crosses a dynamic threshold, the agent classifies the attack and applies nftables rules within that same second. Malicious packets are dropped inside the kernel, before they reach any application socket, let alone the GPU workload.
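As a rough illustration of per-second monitoring, the sketch below derives packet and bit rates from two samples of the Linux /proc/net/dev counters. It is not the FTAgent's code: the class and function names are assumptions made for this example, and a real agent may read different kernel interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    rx_bytes: int
    rx_packets: int


def parse_proc_net_dev(text: str) -> dict[str, Counters]:
    """Parse the text of /proc/net/dev into per-interface receive counters."""
    result = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # In /proc/net/dev, field 0 is received bytes and field 1 is received packets.
        result[name.strip()] = Counters(int(values[0]), int(values[1]))
    return result


def rates(prev: Counters, curr: Counters, interval_s: float) -> tuple[float, float]:
    """Return (packets per second, bits per second) between two samples."""
    pps = (curr.rx_packets - prev.rx_packets) / interval_s
    bps = (curr.rx_bytes - prev.rx_bytes) * 8 / interval_s
    return pps, bps


def read_counters(path: str = "/proc/net/dev") -> dict[str, Counters]:
    """Take one sample from the running kernel (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return parse_proc_net_dev(handle.read())
```

Calling read_counters() once per second and passing consecutive samples for one interface to rates() yields the PPS and BPS figures that a threshold check would consume.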
Training jobs keep running. Inference APIs keep responding. Your customers never notice the attack because it is filtered before it can impact application-layer performance. When the attack subsides, firewall rules are automatically withdrawn.
Your operations team sees every node, every cluster, and every incident in a single dashboard. Prometheus metrics feed your existing Grafana stack. The REST API integrates with Kubernetes and SLURM orchestrators so monitoring is provisioned automatically when new GPU nodes come online.
09:00:01 PPS=94,000 BPS=4.1Gbps THRESHOLD
T+0.1s Incident opened · UDP Flood · 98% confidence
T+0.3s Auto-mitigation · nftables rule applied
T+0.5s Alerts fired · Slack · PagerDuty
GPU utilization: 98% (unaffected)
Training job: running 67% → 68%
Inference APIs: 4/4 responding
09:00:02 PPS=3,410 BPS=128Mbps MITIGATED
09:14:00 Attack subsides · rules withdrawn
Interrupted training jobs: 0
Lost compute: $0
Key Features
Purpose-built for GPU cloud and AI infrastructure
Bare-metal compatible
No VM overhead, no hypervisor dependency. The FTAgent installs directly on bare-metal GPU servers and monitors network traffic at the kernel level. Works on any Linux distribution without requiring virtualization or container runtimes.
Zero GPU impact
The agent uses less than 0.1% CPU and zero GPU resources. Kernel-level packet filtering happens before traffic reaches userspace, so your CUDA and ROCm workloads run at full capacity even during active mitigation. Training jobs and inference pipelines are completely unaffected.
Auto-mitigation
When an attack is detected, Flowtriq's multi-level defense chain activates. Kernel-level firewall rules via nftables drop attack traffic instantly. For larger floods, BGP FlowSpec filters traffic at the network edge. Rules at every level auto-withdraw when the attack ends.
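To make the nftables step concrete, here is a minimal sketch that renders a rate-limiting ruleset for a UDP flood aimed at one port. The table name, the function, and the chosen chain priority are assumptions for illustration; the rules Flowtriq actually generates are not shown in this page and may look quite different.

```python
def render_udp_flood_ruleset(dport: int, max_pps: int, table: str = "ddos_sketch") -> str:
    """Render nftables rules that drop UDP packets to one port above a packet rate."""
    if not 0 < dport < 65536:
        raise ValueError("dport must be a valid port number")
    if max_pps <= 0:
        raise ValueError("max_pps must be positive")
    return (
        f"table inet {table} {{\n"
        f"    chain input {{\n"
        # A negative priority runs this chain before a default filter chain at priority 0.
        f"        type filter hook input priority -150; policy accept;\n"
        f"        udp dport {dport} limit rate over {max_pps}/second drop\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # Only prints the ruleset; loading it requires root and is left to the operator.
    print(render_udp_flood_ruleset(dport=53, max_pps=1000))
```

Written to a file, the output could be loaded with `nft -f <file>` and withdrawn again with `nft delete table inet ddos_sketch`. Keeping the rules in a dedicated table is a design choice of this sketch: it makes withdrawal a single operation that cannot disturb existing firewall rules.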
API endpoint protection
Deploy the agent on inference API gateways to monitor request patterns and auto-mitigate floods targeting your endpoints. Keep response times stable for customers running real-time AI applications where latency spikes trigger downstream SLA violations.
Multi-cluster monitoring
Monitor H100 clusters, A100 pools, and inference fleets from a single dashboard. Group nodes by cluster, region, or customer. Role-based access lets your infrastructure team, customer success, and end users each see exactly what they need.
PCAP forensics
Every incident includes a full packet capture starting from pre-attack traffic. Download PCAPs for forensic analysis, share them with upstream providers, or use them to identify attack patterns targeting your specific infrastructure.
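One common way to capture traffic from before an attack is a fixed-size ring buffer that is always recording. The sketch below shows that idea only; the class name and its methods are hypothetical and do not describe how Flowtriq stores packets.

```python
from collections import deque


class PreTriggerBuffer:
    """Keep the most recent packets so a capture can start before the trigger."""

    def __init__(self, pre_trigger_packets: int):
        # deque with maxlen silently discards the oldest packet when full.
        self._ring = deque(maxlen=pre_trigger_packets)
        self._capture = None  # becomes a list while an incident is open

    def on_packet(self, packet: bytes) -> None:
        if self._capture is not None:
            self._capture.append(packet)
        else:
            self._ring.append(packet)

    def open_incident(self) -> None:
        # Seed the capture with the buffered pre-attack packets.
        if self._capture is None:
            self._capture = list(self._ring)
            self._ring.clear()

    def close_incident(self) -> list[bytes]:
        captured, self._capture = self._capture or [], None
        return captured
```

With a buffer of two packets, feeding a, b, c, opening an incident, and then feeding x produces a capture of b, c, x: the two packets before the trigger plus everything after it.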
Kubernetes & SLURM integration
The REST API and Terraform provider let you provision monitoring automatically when new GPU nodes come online. Integrate with Kubernetes operators or SLURM job schedulers so every node is protected from the moment it joins the cluster.
Prometheus & Grafana metrics
Export per-node traffic metrics to Prometheus for custom Grafana dashboards. Monitor bandwidth utilization, attack frequency, and mitigation effectiveness alongside your existing GPU utilization and job performance metrics.
BGP FlowSpec for upstream filtering
For volumetric attacks that exceed local link capacity, Flowtriq signals your upstream provider via BGP FlowSpec to filter traffic at the network edge. Integrates with Cloudflare Magic Transit, Path.net, and other scrubbing providers.
Flexible alerting & escalation
Route alerts to the right team at the right time. Send Slack notifications for minor incidents and page your NOC for critical attacks. Escalation policies ensure nothing falls through the cracks during off-hours when GPU jobs are running unattended.
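An escalation policy of this kind can be sketched as a small routing function. Everything here is an assumption chosen for illustration: the channel names, the 09:00 to 18:00 business hours, and the 15-minute escalation delay are not Flowtriq defaults.

```python
from datetime import time
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    CRITICAL = "critical"


def channels_for(severity: Severity, now: time, minutes_unacknowledged: int = 0,
                 day_start: time = time(9, 0), day_end: time = time(18, 0),
                 escalate_after: int = 15) -> list[str]:
    """Pick alert channels from severity, time of day, and acknowledgement delay."""
    channels = ["slack"]  # every incident is posted to chat
    off_hours = not (day_start <= now < day_end)
    if severity is Severity.CRITICAL:
        channels.append("pager")  # critical attacks page immediately
    elif off_hours and minutes_unacknowledged >= escalate_after:
        channels.append("pager")  # unattended minor incidents escalate at night
    return channels
```

A minor incident at 10:00 goes to chat only, while the same incident at 02:00 pages the on-call engineer once it has sat unacknowledged for the escalation delay.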
Getting Started
Deploy across your GPU fleet in minutes
Rolling out Flowtriq to your GPU infrastructure takes less time than restarting a single failed training job. Here is how it works from signup to full coverage.
Create your workspace
Sign up at flowtriq.com and create a workspace for your GPU cloud. Add your infrastructure team with admin access. The 7-day free trial starts immediately with no credit card required.
Install the FTAgent on each GPU node
The agent installs with pip install ftagent and runs as a lightweight systemd service. It reads kernel-level network statistics with near-zero CPU overhead and zero GPU impact. Deploy across your fleet with Ansible, Terraform, or your existing provisioning pipeline.
Configure alert channels
Connect Flowtriq to your existing incident response workflow. Send alerts to Slack, Discord, PagerDuty, OpsGenie, email, SMS, or custom webhooks. Set up escalation policies so the right people get notified based on severity and time of day.
Enable auto-mitigation
Define mitigation policies per node or per cluster. Choose which attack types trigger automatic firewall rules, set rate limits, and configure how long rules persist after an attack ends. Start with conservative settings and tune as you see real traffic patterns.
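The settings described in this step can be pictured as a small policy object. The sketch below is a hypothetical model, not Flowtriq's configuration schema: the field names and attack-type strings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MitigationPolicy:
    scope: str                     # hypothetical: a node or cluster name
    auto_mitigate: frozenset[str]  # attack types that trigger firewall rules
    rate_limit_pps: int            # packets per second allowed through
    hold_seconds: int              # how long rules persist after the attack ends

    def should_mitigate(self, attack_type: str) -> bool:
        return attack_type in self.auto_mitigate

    def should_withdraw(self, attack_ended_at: float, now: float) -> bool:
        """True once the hold period after the attack has elapsed."""
        return now - attack_ended_at >= self.hold_seconds


# A conservative starting point: mitigate only one attack type, hold rules 5 minutes.
conservative = MitigationPolicy(
    scope="h100-cluster-a",
    auto_mitigate=frozenset({"udp_flood"}),
    rate_limit_pps=50_000,
    hold_seconds=300,
)
```

Starting with a short list of attack types and a generous rate limit keeps false positives away from production traffic while you observe real patterns.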
Monitor and optimize
Within hours, Flowtriq learns your normal traffic baselines and sets dynamic thresholds automatically. Review the analytics dashboard to understand traffic patterns, tune thresholds for inference API nodes, and export Prometheus metrics to your existing Grafana stack.
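One simple way to build a dynamic threshold is an exponentially weighted moving average of normal traffic multiplied by a safety factor. The sketch below uses that technique as a stand-in; the page does not say which baseline algorithm Flowtriq uses, and the parameter values are assumptions.

```python
class DynamicThreshold:
    """Threshold = max(floor, multiplier * EWMA of normal packets per second)."""

    def __init__(self, alpha: float = 0.1, multiplier: float = 5.0,
                 floor_pps: float = 1000.0):
        self.alpha = alpha
        self.multiplier = multiplier
        self.floor_pps = floor_pps
        self.baseline = None  # learned from the first sample onward

    def threshold(self) -> float:
        if self.baseline is None:
            return self.floor_pps
        return max(self.floor_pps, self.baseline * self.multiplier)

    def observe(self, pps: float) -> bool:
        """Record one per-second sample; return True if it exceeds the threshold."""
        exceeded = self.baseline is not None and pps > self.threshold()
        if not exceeded:
            # Learn only from normal-looking traffic so a flood cannot raise the baseline.
            if self.baseline is None:
                self.baseline = pps
            else:
                self.baseline = self.alpha * pps + (1 - self.alpha) * self.baseline
        return exceeded
```

Excluding over-threshold samples from the average is a deliberate choice in this sketch: otherwise a long attack would gradually teach the baseline that flood traffic is normal.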
By the Numbers
The impact on your GPU cloud operations
Before & After
How Flowtriq transforms your DDoS response
Without Flowtriq
- Attacks detected minutes after they saturate the link
- Training jobs interrupted and must restart from checkpoint
- Inference APIs return timeouts during flood
- Manual investigation to identify attack vector
- Bare-metal servers exposed with no CDN layer
- $12,000+ in wasted compute per interrupted job
- SLA credits owed to multiple customers
With Flowtriq
- Detection in under 1 second per node
- Training jobs continue running uninterrupted
- Inference APIs respond normally during mitigation
- Automatic attack classification with confidence score
- Kernel-level filtering protects bare-metal directly
- $0 in lost compute during attacks
- SLA commitments maintained at 99.99% uptime
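To put the 99.99% figure in context, the arithmetic below converts an uptime target into a downtime budget. Over an assumed 30-day month, 99.99% allows about 4.32 minutes of downtime, so the 22-minute inference outage in the earlier timeline would use that budget roughly five times over. The function names are illustrative.

```python
def downtime_budget_minutes(sla: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over a period."""
    return (1.0 - sla) * days * 24 * 60


def achieved_uptime(downtime_minutes: float, days: int = 30) -> float:
    """Uptime fraction after a given amount of downtime in the period."""
    return 1.0 - downtime_minutes / (days * 24 * 60)


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.9999), 2))  # budget for 99.99% over 30 days
    print(round(achieved_uptime(22) * 100, 3))        # uptime percentage after 22 minutes down
```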
Pricing
Simple per-node pricing. No surprises.
Unlimited team seats included. Monitor 1 GPU node or 1,000 nodes at the same price per node. No bandwidth fees, no overage charges, no contracts. Cancel anytime. Flow sources (sFlow/NetFlow/IPFIX from routers) available from $19/source/month with volume discounts.
Compatibility
Works with your existing GPU stack
The FTAgent runs on any Linux server with kernel 3.10 or later. It supports all major distributions including Ubuntu, Debian, Rocky Linux, and AlmaLinux. Whether you run bare-metal H100 servers, A100 clusters, or mixed GPU fleets, the agent works the same way.
For Kubernetes-orchestrated GPU clusters, deploy the agent as a DaemonSet or install it directly on the host. For SLURM-managed HPC clusters, the agent runs alongside your job scheduler without interfering with GPU job allocation or NCCL communication.
Flowtriq integrates with your existing observability tools. Export incident data via webhooks to your SIEM or ticketing system. Pull Prometheus metrics into Grafana. Use the REST API and Terraform provider to automate provisioning as your fleet scales.
Operating Systems
• Ubuntu 20.04, 22.04, 24.04
• Debian 11, 12
• Rocky Linux 8, 9
• AlmaLinux 8, 9
GPU Platforms
• NVIDIA H100 / A100 / L40S
• NVIDIA CUDA 11.x, 12.x
• AMD Instinct (ROCm)
Orchestration
• Kubernetes + GPU Operator
• SLURM / PBS Pro
• Ray / Anyscale
Firewalls
• iptables / ip6tables
• nftables
• ufw (Uncomplicated Firewall)
FAQ
Common questions from GPU cloud providers
Does the agent impact GPU performance?
No. The FTAgent runs at kernel level monitoring network statistics. It uses less than 0.1% CPU and zero GPU resources. Your CUDA and ROCm workloads are completely unaffected. The agent does not interact with the GPU driver, GPU memory, or any compute APIs.
Can I deploy on bare-metal servers?
Yes. The agent runs on any Linux server with kernel 3.10 or later. Bare-metal GPU servers running Ubuntu, Rocky Linux, or Debian are fully supported. No hypervisor, container runtime, or VM layer is required. Install with pip, configure with a single command, and the agent runs as a systemd service.
How does it protect inference APIs?
Deploy the agent on your API gateway and inference servers. It monitors traffic patterns and auto-mitigates floods targeting your API endpoints. Firewall rules drop malicious traffic at the kernel level before it reaches your inference framework, keeping response times stable for your customers.
Can I integrate with my orchestration layer?
Yes. The REST API and Terraform provider let you provision monitoring automatically when new GPU nodes come online. Prometheus metrics export feeds your existing observability stack. The agent integrates with Kubernetes DaemonSets, SLURM job hooks, and Ray cluster lifecycle events.
White-Label
Use it internally or resell it under your brand.
You don't have to choose. Run Flowtriq as an internal tool for your infrastructure team, or white-label it and offer DDoS protection as a branded service to your GPU cloud customers. Same platform, two business models.
Internal use: Deploy the agent across your GPU fleet at $9.99/node/month. Your team monitors everything from one dashboard. Customers never see it.
White-label: Rebrand the entire platform under your company name for a one-time $200 deposit (applied as billing credit). Custom domain, logo, colors, fonts, and login page. Per-node cost drops to $7.99/node/month. Bill your customers whatever you want.
Your clients log into dashboard.yourcompany.com, see your logo, your colors, and your support contact. No mention of Flowtriq anywhere.
Domain dashboard.yourcompany.com
Logo ✓ Custom uploaded
Colors ✓ Brand primary + accent
Login ✓ Custom heading + text
Branding ✓ All Flowtriq refs removed
Cost $7.99/node/month
Deposit $200 (applied as credit)
Seats Unlimited (no per-user fee)
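The two price points can be compared with a few lines of arithmetic. This sketch assumes the $9.99 internal price is billed per node per month like the $7.99 white-label price, and that the $200 deposit is credited against the first invoice; the page does not spell out how the credit is applied. Amounts are kept in integer cents to avoid rounding errors.

```python
STANDARD_CENTS = 999      # $9.99 per node per month (internal use)
WHITE_LABEL_CENTS = 799   # $7.99 per node per month (white-label)
DEPOSIT_CENTS = 20_000    # $200 one-time deposit, applied as billing credit


def monthly_cost_cents(nodes: int, white_label: bool = False) -> int:
    """Monthly bill in cents for a fleet of the given size."""
    return nodes * (WHITE_LABEL_CENTS if white_label else STANDARD_CENTS)


def first_invoice_cents(nodes: int) -> int:
    """White-label first invoice, assuming the deposit is credited in full."""
    return max(0, monthly_cost_cents(nodes, white_label=True) - DEPOSIT_CENTS)


def monthly_saving_cents(nodes: int) -> int:
    """How much less the white-label plan costs per month at this fleet size."""
    return monthly_cost_cents(nodes) - monthly_cost_cents(nodes, white_label=True)
```

For a 100-node fleet this works out to $999.00 per month at the internal price and $799.00 per month white-labeled, a difference of $200.00 per month.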
Related Use Cases
Flowtriq for infrastructure providers
Schedule a Fit Assessment
30-minute call to discuss your GPU infrastructure and see if Flowtriq is the right fit. No sales pressure.
Book a Call
Get the Implementation Guide
Step-by-step deployment guide tailored to GPU cloud infrastructure. Sent straight to your inbox.