Quad9 kaput

An update from Quad9 on the recent issue noted here:
A routing leak by a peer of one of our upstream providers caused our location in Kinshasa, DRC (site code: FIH) to receive inappropriate volumes of traffic for that anycast destination.

Summary of issue:

Between ~12:15 and ~13:40 UTC, it appears STE MDICIS SAS (AS328552) leaked Quad9 prefixes to Liquid Telecom (AS30844) and onwards, causing unexpected traffic volumes and slow responses for a small number of African, European, and North American networks that chose to listen to that new path announcement.

Detailed report:

STE MDICIS SAS (AS328552) announced Quad9 prefixes, and possibly others, due to what would seem to be an outbound filter problem which caused them to re-announce prefixes heard on their peering session with Quad9 transit provider Packet Clearing House (PCH, AS42). It is not known whether prefixes from other peers of STE MDICIS SAS were also leaked in the same event. STE MDICIS SAS is either a customer or a peer of Liquid Telecom (AS30844) in Africa.
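As a rough illustration of the kind of outbound filter that seems to have failed here, the sketch below encodes the common convention that routes learned from a peer or provider are only re-announced to customers. The names `Rel` and `may_export` are hypothetical and chosen for illustration; this is not any operator's actual configuration, and it ignores routes a network originates itself.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a BGP neighbour, seen from our side."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Return True if re-announcing the route follows the usual export convention.

    Routes learned from a customer may be announced to any neighbour.
    Routes learned from a peer or provider may only be announced to customers.
    """
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


# A route heard on a peering session, then passed to another peer or to a
# provider, is the pattern described in the report above.
leak_to_peer = may_export(Rel.PEER, Rel.PEER)
leak_to_provider = may_export(Rel.PEER, Rel.PROVIDER)
```

Under this convention both `leak_to_peer` and `leak_to_provider` come out as `False`, so the re-announcement would count as a leak whether AS328552 is a peer or a customer of AS30844.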
Liquid Telecom then re-announced those prefixes/routes on their network and to their transit providers, resulting in some networks preferring that route as the shortest AS path. Most notably, it appears that there was misdirected traffic received at that Quad9 location in the DRC from BT (AS2856), Verizon (AS701) and several Orange Telecom (AS3215, AS5511) networks, among others. Packets were delivered to the Quad9 FIH location and were apparently not black-holed or delivered to an incorrect destination. However, the volume received at that site was far higher than typical and arrived across a sub-optimal path, which may have resulted in lost or very delayed DNS responses for the users whose networks chose that path.
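To make the "shortest AS path" preference concrete, here is a minimal sketch of a router choosing between two announcements for the same prefix. The prefix and both AS paths are illustrative assumptions, not the paths actually observed: AS64496 to AS64498 are documentation-reserved ASNs standing in for an unnamed normal path, and AS19281 is used as the Quad9 origin. Real BGP best-path selection weighs other attributes (such as local preference) before path length; this sketch deliberately ignores them.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # ASNs from the announcing neighbour to the origin


def best_route(routes: Sequence[Route]) -> Route:
    """Pick the route with the fewest ASNs in its path (ties: first seen wins)."""
    return min(routes, key=lambda r: len(r.as_path))


QUAD9_AS = 19281

# Hypothetical normal path via placeholder transit networks.
normal = Route("9.9.9.0/24", (64496, 64497, 64498, 42, QUAD9_AS))
# Hypothetical leaked path via AS30844 and AS328552.
leaked = Route("9.9.9.0/24", (30844, 328552, 42, QUAD9_AS))

chosen = best_route([normal, leaked])
```

With these assumed paths the leaked route has four hops against five, so it wins, and traffic for the anycast prefix is drawn towards the site behind the leak.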
Liquid Telecom is still investigating this matter, and we are in communication with them to ensure that this does not occur in the future. They were informed of the problem and rapidly replied to us with a modification to resolve the problem, which we assume to be a filter definition. The originating leak network operator (AS328552) has not yet updated us on their status.
Quad9 signs all of its prefixes with RPKI ROAs, which are meant to prevent incorrect announcements from having wide-reaching implications, though this requires all organizations in the routing path to implement ROA validation. Quad9 also has a wide distribution of anycast nodes (more than 185 locations) to limit these types of outages to smaller sets of affected networks. Quad9 is also expanding the number of upstream transit providers and the geographies of those transit providers to further minimize the radius of damage due to routing leaks like this, which are out of our control. (We will shortly have other upstream transit in South Africa, as an example.)
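For readers unfamiliar with ROA validation, the following is a simplified sketch of route origin validation in the style of RFC 6811, which classifies an announcement as valid, invalid, or not-found. The `ROA` class, the `validate_origin` function, and the single ROA entry are assumptions made for illustration; they are not copied from published RPKI data. Note that this check looks only at the origin AS and the prefix length of an announcement.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ROA:
    prefix: str       # prefix the ROA covers
    max_length: int   # longest prefix length the origin may announce
    origin_as: int    # AS authorised to originate the prefix


def validate_origin(prefix: str, origin_as: int, roas: Iterable[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs for the other address family or that do not cover the prefix.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_as == origin_as and net.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched none: invalid. Otherwise unknown.
    return "invalid" if covered else "not-found"


# Illustrative ROA only (assumed values).
roas = [ROA("9.9.9.0/24", 24, 19281)]

same_origin = validate_origin("9.9.9.0/24", 19281, roas)
wrong_origin = validate_origin("9.9.9.0/24", 64496, roas)
```

Here `same_origin` evaluates to "valid" and `wrong_origin` to "invalid"; a prefix with no covering ROA would come back as "not-found".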
Investigation into the exact cause and characteristics of the pathing problem is ongoing, but there have been no reported incorrect path issues since AS30844 made their changes at ~13:40 UTC.
Please send questions or any further results or observed issues to [email protected]
 