The Collateral Damage Problem
Most people think of DDoS attacks as a one-to-one problem: an attacker targets a server, the server goes down, and that is the end of the blast radius. For attacks that hit a server directly on a dedicated link, that model is roughly correct. But for ISPs, the reality is far worse.
ISP networks are shared infrastructure. Hundreds or thousands of customers share the same upstream transit links, the same edge routers, and the same peering connections. When a DDoS attack targets a single customer but generates enough volume to saturate the transit link that customer sits behind, every other customer sharing that link suffers the consequences.
This is not a theoretical concern. It is one of the most common operational headaches for ISPs of every size. Picture a 10 Gbps transit link serving 500 customers that gets hit with a 12 Gbps volumetric attack targeting one customer's /24. The link saturates. All 500 customers experience packet loss, degraded throughput, and increased latency. Your NOC is flooded with hundreds of calls and tickets, and only one of them mentions a DDoS attack. The rest just say "the internet is slow."
How ISP Networks Work: Peering, Transit, and the Last Mile
To understand why transit link saturation is so damaging, you need to understand the basic topology of an ISP network. There are three layers that matter here.
Transit links
Transit links connect your network to the broader internet via upstream providers. These are the pipes that carry traffic between your network and destinations you do not peer with directly. A regional ISP might have two or three transit providers, each connected via 10G or 100G links. Transit is expensive, so ISPs typically provision just enough capacity for peak traffic plus a reasonable margin. A 10G transit link running at 6 Gbps during peak hours has 4 Gbps of headroom. That sounds like a lot until a 5 Gbps attack arrives and pushes the link to 11 Gbps.
Peering links
Peering connections at internet exchange points (IXPs) carry traffic between your network and other networks you have direct relationships with. Peering is usually settlement-free (no per-Gbps cost), so ISPs route as much traffic as possible over peering. But peering links have the same saturation vulnerability. A DDoS attack sourced from or amplified through a network you peer with can saturate the peering link and degrade service for all traffic flowing over that connection.
Last mile and aggregation
The last mile connects customers to your nearest POP. Aggregation routers combine traffic from hundreds of last-mile connections onto shared uplinks. Even if your transit links have headroom, a saturated aggregation uplink at a POP has the same collateral damage effect on a smaller scale. Every customer connected to that aggregation router loses service.
The bottleneck is always the shared link. Whether it is a transit link, a peering port, or an aggregation uplink, any shared link that saturates takes down every customer behind it. DDoS attackers do not need to target your infrastructure directly. They just need to generate enough traffic to fill a pipe.
Transit Link Saturation Explained
Here is the math that makes transit link saturation so dangerous. Consider a typical regional ISP with a 10G transit link to a Tier 1 provider. During normal peak hours, the link carries 6.5 Gbps of legitimate traffic for all customers combined. That leaves 3.5 Gbps of headroom.
An attacker launches a 12 Gbps UDP amplification attack targeting a single customer's IP address. The attack traffic arrives at the transit provider's edge, heads for the transit link, and immediately saturates it. Together with the 6.5 Gbps of legitimate traffic, 18.5 Gbps is now competing for a link that can only carry 10 Gbps. The remaining 8.5 Gbps has to be dropped.
But here is the critical part: the transit link does not distinguish between attack traffic and legitimate traffic. It does not know which packets are part of the DDoS and which are your customers' Netflix streams, VoIP calls, and business applications. Packets are dropped based on queue depth and scheduling algorithms, not based on whether they are malicious. The result:
Transit Link Saturation: 10G Link Example
──────────────────────────────────────────────────
Legitimate traffic (all customers)       6.5 Gbps
Attack traffic (one customer)           12.0 Gbps
Total arriving at link                  18.5 Gbps
Link capacity                           10.0 Gbps
Dropped traffic                          8.5 Gbps
──────────────────────────────────────────────────
Effective packet loss for ALL traffic: ~46%
Every customer on this link loses nearly half their packets.
At 46% packet loss, TCP connections stall, retransmit, and time out. VoIP calls become unintelligible. Video streams buffer endlessly. DNS queries fail, making it look like "the internet is down" even though the link is technically still up and forwarding traffic. From the customer's perspective, service is effectively dead.
The attack is targeting one customer. All 500 customers behind that transit link are experiencing an outage.
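The arithmetic behind the 10G example can be reproduced in a few lines. This is a minimal sketch, assuming drops fall proportionally on all traffic (real queueing and scheduling will skew the split somewhat); the function name `saturation_loss` and its parameters are illustrative, not taken from any tool.

```python
def saturation_loss(legit_gbps: float, attack_gbps: float, capacity_gbps: float) -> dict:
    """Estimate drops on a saturated link.

    Assumption: the link drops indiscriminately, so legitimate and attack
    traffic lose the same fraction of their packets.
    """
    offered = legit_gbps + attack_gbps
    dropped = max(0.0, offered - capacity_gbps)
    loss_fraction = dropped / offered if offered > 0 else 0.0
    return {
        "offered_gbps": offered,
        "dropped_gbps": dropped,
        "loss_fraction": loss_fraction,
        # Legitimate traffic that still gets through under proportional loss.
        "legit_delivered_gbps": legit_gbps * (1 - loss_fraction),
    }


result = saturation_loss(legit_gbps=6.5, attack_gbps=12.0, capacity_gbps=10.0)
print(f"Dropped: {result['dropped_gbps']:.1f} Gbps")  # Dropped: 8.5 Gbps
print(f"Loss:    {result['loss_fraction']:.0%}")       # Loss:    46%
```

Under the same assumption, only about 3.5 Gbps of the 6.5 Gbps of legitimate traffic survives, which is why every customer on the link feels the attack.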
Why This Is Worse Than Single-Target Attacks
A DDoS attack against a single server with a dedicated link has a blast radius of one. The target goes down, everyone else is fine. Transit link saturation has a blast radius of every customer sharing that link. For a regional ISP, that can mean hundreds or thousands of businesses and residential customers losing service simultaneously.
The operational impact scales accordingly. Instead of one customer opening a ticket, you get hundreds. Instead of one SLA violation, you have dozens of business customers checking their contracts. Instead of a targeted mitigation action, you need to triage a network-wide event that looks like a transit outage to everyone except the one engineer who happens to check the traffic profile on the saturated link.
The financial exposure is significant. Business SLAs typically include credit clauses for outages exceeding 15 or 30 minutes. A single transit saturation event lasting an hour can trigger SLA credits across hundreds of accounts. For an ISP with 200 business customers on a $500/month plan with a 10x credit clause, a one-hour outage could cost $100,000 or more in credits alone, not counting churn from customers who decide the ISP cannot protect them.
The Cascade Effect
Transit link saturation does not always stay contained to a single link. When one transit link saturates, several cascade effects can make the situation worse.
- BGP convergence issues: If the saturated link starts dropping BGP keepalives, the BGP session with your transit provider can flap or go down entirely. When the session drops, routes are withdrawn, and traffic that was flowing over that link needs to reconverge onto your remaining transit links. If those links are already near capacity, the redistributed traffic can push them into saturation too.
- Route flaps: A congested link that intermittently drops BGP keepalives causes route flapping. Each flap triggers BGP updates that propagate across the internet, potentially triggering route dampening at upstream providers. Once your prefixes are dampened, traffic to your network is black-holed even after the original congestion clears.
- ECMP redistribution: If you use equal-cost multi-path (ECMP) routing across multiple transit links, a saturated link that drops its BGP session causes all its traffic to shift to the remaining paths. A two-link ECMP setup where one link saturates means the surviving link now carries double the traffic, which may push it past capacity as well.
- Adjacent link overload: Even without BGP session loss, traffic engineering (TE) or internal routing changes in response to congestion can shift load to adjacent links. If your network was provisioned with the assumption that no single link would saturate, the capacity planning model breaks down the moment one link does.
In the worst case, a DDoS attack targeting a single customer can cascade through BGP reconvergence and load redistribution to degrade service across your entire network, not just the customers behind the originally saturated link.
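The ECMP case is easy to check numerically. The sketch below assumes the failed link's load is spread evenly across the surviving links, which is a simplification of real BGP and ECMP behavior; the link names and the `redistribute` helper are hypothetical.

```python
def redistribute(loads_gbps: dict, capacities_gbps: dict, failed: str) -> dict:
    """Return post-failure utilization (1.0 = 100%) for each surviving link.

    Assumption: traffic from the failed link is split evenly over survivors.
    """
    survivors = [name for name in loads_gbps if name != failed]
    if not survivors:
        raise ValueError("no surviving links to absorb the traffic")
    share = loads_gbps[failed] / len(survivors)
    return {
        name: (loads_gbps[name] + share) / capacities_gbps[name]
        for name in survivors
    }


# Two 10G transit links, each carrying 6 Gbps of legitimate traffic.
loads = {"transit_a": 6.0, "transit_b": 6.0}
capacities = {"transit_a": 10.0, "transit_b": 10.0}

after = redistribute(loads, capacities, failed="transit_a")
for link, util in after.items():
    status = "SATURATED" if util > 1.0 else "ok"
    print(f"{link}: {util:.0%} {status}")  # transit_b: 120% SATURATED
```

Note that this counts only legitimate traffic. If the attack follows the prefix onto the surviving link, the overload is correspondingly worse.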
Detection Challenges for ISPs
Detecting transit link saturation before it causes widespread customer impact is harder than it sounds. The tools most ISPs rely on were not designed for this problem.
- SNMP polling at 5-minute intervals: Most ISPs monitor link utilization via SNMP polls to their NMS (Cacti, LibreNMS, PRTG, Zabbix). With a 5-minute polling interval, detection can lag the start of an attack by up to 5 minutes even in the best case. In practice, it is worse because each poll reports an average over the interval, so a 30-second burst that saturates the link might show up as only a modest utilization increase in the 5-minute average. By the time the next poll shows sustained saturation, customers have been impacted for 5-10 minutes.
- NetFlow sampling at 1:1000: Heavily sampled NetFlow is the standard for traffic analysis at ISP scale, but it has blind spots. A 1:1000 sampling rate on a 10G link processing 1 Mpps means you see approximately 1,000 sampled packets per second. That is enough to detect a sustained 5 Gbps flood, but short bursts (10-30 seconds) may not produce statistically significant samples before they end. And NetFlow export intervals add another 1-5 minutes of latency.
- Per-customer monitoring is expensive: To detect which customer is being targeted before the link saturates, you need per-prefix traffic monitoring on every transit-facing interface. At 4,000 customer prefixes across 200 edge routers, that is up to 800,000 prefix-and-router counters to maintain if every prefix is tracked everywhere. Traditional NetFlow collectors struggle with this cardinality.
- Saturation hides the evidence: Once a link is saturated, NetFlow export packets themselves may be dropped. The very data you need to diagnose the problem is lost because of the problem. SNMP polls may time out for the same reason. You are blind precisely when you most need visibility.
The fundamental problem is time. SNMP and sampled NetFlow give you visibility in minutes. Transit link saturation causes customer impact in seconds. By the time your monitoring detects the saturation, hundreds of customers have already been affected for several minutes.
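To see how a 5-minute average hides a short burst, consider the earlier 10G link with 6.5 Gbps of baseline traffic. The sketch below is a back-of-the-envelope model, not a description of how any particular NMS computes its graphs; the function names are illustrative.

```python
def polled_average_gbps(baseline_gbps: float, burst_gbps: float,
                        burst_s: float, interval_s: float = 300) -> float:
    """Average rate a poller would report over one polling interval.

    burst_gbps is the rate the link actually carried during the burst,
    which is capped at link capacity when the link saturates.
    """
    burst_s = min(burst_s, interval_s)
    quiet_s = interval_s - burst_s
    return (baseline_gbps * quiet_s + burst_gbps * burst_s) / interval_s


def sampled_packets_per_second(link_pps: float, sample_one_in: int) -> float:
    """Packets per second visible to a collector at 1:N sampling."""
    return link_pps / sample_one_in


# A 30-second burst pins a 10G link at 10 Gbps; baseline is 6.5 Gbps.
avg = polled_average_gbps(baseline_gbps=6.5, burst_gbps=10.0, burst_s=30)
print(f"5-minute average: {avg:.2f} Gbps")  # 5-minute average: 6.85 Gbps

# 1 Mpps sampled at 1:1000.
print(sampled_packets_per_second(1_000_000, 1000))  # 1000.0
```

A link that was fully saturated for 30 seconds shows up as 68.5% utilized, comfortably below most alerting thresholds.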
The Null-Route Dilemma
When an ISP finally detects that a DDoS attack is saturating a transit link, the standard response is to null-route (blackhole) the target IP or prefix. This is fast, effective at clearing the congestion, and supported by every router on the planet. It is also exactly what the attacker wanted.
The entire point of many DDoS attacks is to take a target offline. When you null-route the target to save the rest of your network, you are completing the attacker's mission for them. The target customer loses all connectivity. Their website is down, their email is unreachable, their VPN tunnels drop. You have traded widespread degradation for a targeted outage, which is usually the right call, but it is not a solution. It is triage.
For ISPs offering DDoS protection as a service, null-routing is especially problematic. You cannot tell a customer paying for DDoS protection that your mitigation strategy is to take them offline when an attack arrives. They expect you to filter the attack traffic while keeping their legitimate traffic flowing.
The challenge is that null-routing at the ISP edge only stops the traffic from reaching the customer. The attack traffic still crosses the transit link to reach the ISP border router where the null route is applied. To actually relieve transit link saturation, the null route (or RTBH community) needs to propagate upstream to the transit provider's network, so traffic is dropped before it enters your link. This depends on your transit provider supporting RTBH communities and honoring them promptly.
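As a rough illustration of what gets signaled upstream, the sketch below builds the data for a blackhole announcement: a host route for the target, tagged with a blackhole community. It assumes the well-known BLACKHOLE community 65535:666 from RFC 7999; many transit providers use their own community value instead, so check your provider's documentation. The `BlackholeAnnouncement` type and `build_rtbh` helper are hypothetical and do not correspond to any router or BGP daemon API.

```python
import ipaddress
from dataclasses import dataclass

# Well-known BLACKHOLE community defined in RFC 7999.
# Assumption: your upstream honors it; many providers use a custom value.
BLACKHOLE_COMMUNITY = "65535:666"


@dataclass(frozen=True)
class BlackholeAnnouncement:
    prefix: str      # host route for the attacked address
    community: str   # community that asks the upstream to discard traffic


def build_rtbh(target_ip: str, community: str = BLACKHOLE_COMMUNITY) -> BlackholeAnnouncement:
    """Build a host-route blackhole request for a single attacked IP."""
    addr = ipaddress.ip_address(target_ip)  # raises ValueError on bad input
    host_len = 32 if addr.version == 4 else 128
    return BlackholeAnnouncement(prefix=f"{addr}/{host_len}", community=community)


print(build_rtbh("203.0.113.10"))
```

Announcing the most specific route possible limits the outage to the attacked address rather than the customer's whole prefix.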
Carpet Bombing: When the Whole Subnet Is the Target
Transit link saturation becomes even more difficult to handle when attackers use carpet bombing techniques. Instead of concentrating all attack traffic on a single IP, carpet bombing spreads the attack across an entire prefix or multiple prefixes belonging to the same ISP.
In a carpet bombing attack, each individual IP might receive only 5-10 Mbps of attack traffic. That is well below any per-host detection threshold. But spread across a /16 (65,536 IPs), 5 Mbps per IP adds up to over 300 Gbps of aggregate attack traffic. The transit link saturates, but no single destination shows an obvious spike.
This defeats null-routing entirely. You cannot blackhole a /16 containing thousands of active customers. You cannot even identify a single target to blackhole because the attack is intentionally distributed to avoid triggering per-prefix thresholds. The only way to detect carpet bombing is to monitor aggregate traffic at the interface level (total BPS and PPS on the transit-facing port) combined with per-prefix analysis that looks for coordinated low-volume increases across many destinations.
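The combined check described above can be sketched as follows: aggregate per-destination rates by covering prefix, then flag prefixes whose total is high while no individual host stands out. The thresholds, the /24 grouping, and the function name `carpet_bomb_suspects` are assumptions chosen for illustration.

```python
import ipaddress
from collections import defaultdict


def carpet_bomb_suspects(per_ip_mbps: dict, host_limit_mbps: float,
                         prefix_limit_mbps: float, prefix_len: int = 24) -> list:
    """Prefixes whose aggregate rate is high although every host stays low."""
    totals = defaultdict(float)
    peaks = defaultdict(float)
    for ip, mbps in per_ip_mbps.items():
        # strict=False lets us derive the covering prefix from a host address.
        net = str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        totals[net] += mbps
        peaks[net] = max(peaks[net], mbps)
    return sorted(
        net for net in totals
        if totals[net] > prefix_limit_mbps and peaks[net] < host_limit_mbps
    )


# 256 hosts at 5 Mbps each, plus one unrelated busy host elsewhere.
traffic = {f"198.51.100.{i}": 5.0 for i in range(256)}
traffic["203.0.113.7"] = 40.0

print(carpet_bomb_suspects(traffic, host_limit_mbps=50.0, prefix_limit_mbps=500.0))
# ['198.51.100.0/24']

# The /16 example from the text: 65,536 hosts at 5 Mbps each.
print(65_536 * 5 / 1000, "Gbps")  # 327.68 Gbps
```

In practice you would compare against a per-prefix baseline rather than a fixed limit, since normal traffic levels vary widely between customers.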
Detection Strategies That Actually Work
Effective detection of transit link saturation requires monitoring at multiple layers simultaneously. No single data source is sufficient.
- Interface-level PPS/BPS at sub-second intervals: Monitor every transit-facing and peering-facing interface at 1-second or sub-second granularity. Not SNMP polls. Streaming telemetry (gNMI, gRPC) or direct packet counting on the monitoring host. When interface utilization exceeds 80% of capacity, you need to know within seconds, not minutes.
- Per-prefix aggregate monitoring: Track BPS and PPS per destination prefix on every edge interface. This requires more state than interface-level monitoring, but it is the only way to identify which customer is being targeted before you resort to looking at flow data. Flowtriq agents do this natively for every prefix they are configured to monitor.
- Protocol and packet-size distribution: Monitor the ratio of UDP to TCP traffic, the distribution of packet sizes, and source port entropy on each interface. A sudden shift toward large UDP packets from source port 53 (DNS amplification) or source port 11211 (memcached amplification) is a strong early signal of an incoming volumetric attack.
- BGP community-based signaling from customers: Sophisticated customers (hosting providers, enterprises with their own ASN) can signal DDoS conditions to you via BGP communities. When a customer detects an attack on their end, they tag the affected prefix with a pre-agreed community that triggers your mitigation workflow. This provides a customer-initiated early warning system.
- Correlation across multiple links: If you have multiple transit links, compare utilization patterns across them. A DDoS attack typically arrives via one transit provider (the one with the best path to the botnet or reflector pool). If one transit link spikes while others remain stable, that asymmetry is a strong indicator of attack traffic.
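Two of the checks above, the 80% interface threshold and the cross-link asymmetry test, can be sketched from raw counter readings. This assumes 64-bit octet counters read once per second (so counter wrap is ignored) and a known per-link baseline; the 1.5x spike ratio and all names are illustrative assumptions.

```python
def utilization(prev_octets: int, curr_octets: int,
                interval_s: float, capacity_bps: float) -> float:
    """Link utilization (0.0 to 1.0) from two octet-counter readings."""
    bits = (curr_octets - prev_octets) * 8
    return bits / interval_s / capacity_bps


def over_threshold(utils: dict, threshold: float = 0.80) -> list:
    """Links at or above the alert threshold."""
    return sorted(name for name, u in utils.items() if u >= threshold)


def asymmetric_spike(current: dict, baseline: dict, ratio: float = 1.5) -> list:
    """Links that jumped by `ratio` over baseline while at least one did not."""
    spiking = sorted(n for n in current if current[n] >= baseline[n] * ratio)
    # Every link rising together looks more like organic growth than an attack.
    return spiking if 0 < len(spiking) < len(current) else []


TEN_G = 10e9
utils = {
    "transit_a": utilization(0, 1_100_000_000, 1.0, TEN_G),  # 8.8 Gbps
    "transit_b": utilization(0, 525_000_000, 1.0, TEN_G),    # 4.2 Gbps
}
baseline = {"transit_a": 0.60, "transit_b": 0.40}

print(over_threshold(utils))              # ['transit_a']
print(asymmetric_spike(utils, baseline))  # []
print(asymmetric_spike({"transit_a": 0.95, "transit_b": 0.42}, baseline))  # ['transit_a']
```

The second call returns nothing because 88% is still below 1.5 times the 60% baseline, which shows why the threshold and asymmetry checks are worth running side by side.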
Mitigation: Beyond Null-Routing
Once you detect transit link saturation or an impending saturation event, there are several mitigation strategies beyond the null-route hammer.
Upstream scrubbing
Divert traffic for the targeted prefix through a cloud scrubbing provider (Path.net, Voxility, Akamai Prolexic, etc.) by announcing more-specific routes or activating GRE tunnels. The scrubbing provider filters attack traffic and returns clean traffic to your network. This keeps the customer online and reduces load on your transit link. The downside is cost (scrubbing is priced per Gbps) and added latency from the traffic detour.
RTBH at the border with upstream propagation
If you must blackhole the target, propagate the RTBH community to your upstream transit provider so traffic is dropped at their edge, not yours. This clears the congestion on your transit link within seconds. Most Tier 1 and Tier 2 transit providers support RTBH communities. The limitation is that it still takes the target offline.
BGP FlowSpec
FlowSpec lets you install granular traffic filters via BGP that match on protocol, port, packet size, DSCP, and other fields. Instead of blackholing all traffic to a destination, you can drop only UDP traffic from source port 53 with packet size greater than 512 bytes (DNS amplification signature) while allowing all other traffic through. FlowSpec rules propagate to every router in the BGP domain within seconds. If your upstream transit provider supports FlowSpec, the rules can be propagated upstream to filter traffic before it enters your network. See our FlowSpec deep dive for configuration details.
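The match logic of the DNS amplification rule described above can be modeled in a few lines. This sketch only illustrates which packets such a rule would drop; it does not encode a real FlowSpec NLRI or talk to any router, and the `Packet` and `DropRule` types are hypothetical.

```python
from dataclasses import dataclass

UDP = 17  # IANA protocol number for UDP
TCP = 6   # IANA protocol number for TCP


@dataclass(frozen=True)
class Packet:
    protocol: int
    src_port: int
    length: int  # packet length in bytes


@dataclass(frozen=True)
class DropRule:
    protocol: int
    src_port: int
    longer_than: int  # drop only packets strictly longer than this

    def matches(self, pkt: Packet) -> bool:
        return (
            pkt.protocol == self.protocol
            and pkt.src_port == self.src_port
            and pkt.length > self.longer_than
        )


# UDP from source port 53, larger than 512 bytes: the signature from the text.
dns_amplification = DropRule(protocol=UDP, src_port=53, longer_than=512)

packets = [
    Packet(UDP, 53, 1400),   # large DNS response: dropped
    Packet(UDP, 53, 120),    # ordinary DNS reply: allowed
    Packet(TCP, 443, 1400),  # HTTPS: allowed
]
for pkt in packets:
    action = "drop" if dns_amplification.matches(pkt) else "allow"
    print(pkt, action)
```

Because the rule keys on source port and size rather than destination alone, the targeted customer keeps their web, mail, and ordinary DNS traffic while the amplified responses are discarded. Legitimate large DNS responses over UDP would also be caught, so a rule like this is best treated as a temporary measure.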
Traffic engineering and re-routing
If you have multiple transit links, you can use BGP traffic engineering (prepending, MED manipulation, or selective announcements) to shift traffic away from the saturated link onto links with available capacity. This does not filter the attack traffic, but it distributes the load across more capacity, potentially keeping all links below saturation. This works best as a short-term measure while you deploy FlowSpec or activate scrubbing.
How Flowtriq Helps ISPs Catch Saturation Early
Flowtriq was built for exactly this problem. By deploying lightweight agents on edge routers and customer-facing nodes, ISPs get the visibility they need to detect transit link saturation before customers notice.
- Per-second interface monitoring: Flowtriq agents report interface utilization at 1-second granularity. When a transit link starts climbing toward capacity, you know immediately. No waiting for the next 5-minute SNMP poll.
- Per-prefix PPS and BPS tracking: Every agent monitors traffic per destination prefix, so you can identify which customer is being targeted within seconds of an attack starting. This works for both concentrated attacks and carpet bombing patterns, because the agent tracks both per-prefix and aggregate metrics simultaneously.
- Automatic FlowSpec and RTBH triggering: When Flowtriq detects an attack, it can automatically inject FlowSpec rules or RTBH announcements via its BGP adapter. The response fires in seconds, not the minutes it takes a NOC engineer to log into a router and type commands manually.
- Escalation chain: Flowtriq's 4-level auto-escalation starts with local firewall rules, escalates to FlowSpec, then RTBH, then cloud scrubbing. Each level is triggered automatically based on attack volume and type. The ISP defines the thresholds; Flowtriq executes the response.
- No sampling, no blind spots: Flowtriq agents analyze every packet, not a 1:1000 sample. Short-duration bursts, low-volume application-layer attacks, and carpet bombing patterns that are invisible to sampled NetFlow are detected in real time.
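A volume-based escalation ladder of the kind described above might look roughly like this. To be clear, this is not Flowtriq's configuration format or API: the threshold values, names, and the decision to key on volume alone (ignoring attack type) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Level order follows the article: firewall, FlowSpec, RTBH, cloud scrubbing.
LEVELS = ("local_firewall", "flowspec", "rtbh", "cloud_scrubbing")


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical ISP-defined attack volumes (Gbps) that unlock each level."""
    flowspec_gbps: float = 1.0
    rtbh_gbps: float = 5.0
    scrubbing_gbps: float = 10.0


def escalation_level(attack_gbps: float, t: Thresholds = Thresholds()) -> str:
    """Pick the highest level whose threshold the attack volume has reached."""
    if attack_gbps >= t.scrubbing_gbps:
        return LEVELS[3]
    if attack_gbps >= t.rtbh_gbps:
        return LEVELS[2]
    if attack_gbps >= t.flowspec_gbps:
        return LEVELS[1]
    return LEVELS[0]


for gbps in (0.3, 2.0, 7.5, 12.0):
    print(f"{gbps:>5} Gbps -> {escalation_level(gbps)}")
```

The point of expressing the ladder as data is that the ISP can tune thresholds per link or per customer tier without changing the decision logic.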
The Business Case
Transit link saturation is not just a technical problem. It is a business risk that scales with the number of customers behind the affected link.
Consider a single event: a 15 Gbps DDoS attack saturates a 10G transit link for 45 minutes before the NOC detects it (via SNMP) and applies a null route (which takes another 10 minutes to propagate upstream). Total customer impact: 55 minutes.
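- SLA violations: 200 business customers with 99.9% uptime SLAs. A 55-minute outage in a month pushes uptime down to roughly 99.87%, below the 99.9% commitment. At a 10x credit clause on a $500/month average bill, that is up to $100,000 in SLA credits (200 accounts × $500, assuming each credit reaches the full monthly fee).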
- Customer churn: If 5% of affected business customers switch providers after the event, that is 10 customers at $500/month = $60,000 in annual recurring revenue lost.
- NOC labor: 3-4 NOC engineers spending 2 hours on triage, customer communication, and post-incident review. At fully loaded cost, roughly $800-1,200.
- Reputation damage: Social media posts, forum threads, and review site complaints from residential customers. Hard to quantify but real.
A single transit saturation event can cost an ISP $100,000-$160,000 in direct financial impact. Flowtriq's per-node monitoring at $9.99/month ($7.99 on annual billing) can detect the same attack in seconds and trigger automated mitigation before most customers notice anything. For a 50-node deployment covering your transit-facing infrastructure, that is under $5,000 per year on annual billing (50 × $7.99 × 12 ≈ $4,800). The ROI from preventing a single saturation event pays for years of monitoring.
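The figures in this section can be checked with a short calculation. The sketch assumes a 30-day month and, for the SLA line, that each affected account is credited one full monthly fee, which is what the $100,000 figure works out to; actual credit clauses vary by contract. All function names are illustrative.

```python
def uptime_percent(outage_minutes: float, days_in_month: int = 30) -> float:
    """Monthly uptime percentage after a single outage."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (1 - outage_minutes / total_minutes)


def sla_credits(accounts: int, monthly_fee: float, credit_fraction: float = 1.0) -> float:
    """Total credits. Assumption: each account is credited this fraction of its fee."""
    return accounts * monthly_fee * credit_fraction


def annual_churn_loss(accounts: int, churn_rate: float, monthly_fee: float) -> float:
    """Annual recurring revenue lost to customers who leave."""
    return accounts * churn_rate * monthly_fee * 12


def annual_monitoring_cost(nodes: int, per_node_monthly: float) -> float:
    return nodes * per_node_monthly * 12


print(f"Uptime:     {uptime_percent(55):.2f}%")                     # 99.87%
print(f"Credits:    ${sla_credits(200, 500):,.0f}")                 # $100,000
print(f"Churn:      ${annual_churn_loss(200, 0.05, 500):,.0f}")     # $60,000
print(f"Monitoring: ${annual_monitoring_cost(50, 7.99):,.0f}/year") # $4,794/year
```

Changing `credit_fraction` or `churn_rate` to match your own contracts and retention data gives a quick sense of how sensitive the total is to each assumption.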
Ready to protect your transit links? Start a free 7-day trial and deploy Flowtriq agents on your transit-facing edge routers. Per-second visibility, automatic FlowSpec and RTBH mitigation, and per-prefix monitoring across your entire customer base. No contracts, no hardware, no sales calls.