DNS Slowdown Investigation

Overview:

DNS is the “phone book” of the Internet. Without DNS, many of the services that form the backbone of the Internet cannot work. The major outages at Amazon Web Services (AWS), Google, and Facebook in 2021 and 2022, which affected hundreds of millions of users, involved DNS failures.

A web services company is experiencing a slowdown of its services due to a partial outage. It needs to determine whether the issue is network-related (switches, routing protocols) or application-related (HTTP, DNS). By correlating the clients that generate truncated DNS requests with the clients (or services) that are experiencing problems, the provider can determine whether the root cause lies in the network or in the application.

The Problem:

  • When a DNS response is too large for a UDP datagram, the server sets the truncation (TC) flag and the client must repeat the query over TCP. The extra connection setup and round trips add latency and increase the chance of hitting the 11-second DNS timeout.

  • There needs to be a way to get DNS statistics without searching packets for every query type or response code.

  • There is no easy way to find truncated DNS requests and timeouts within a specific time range.

  • Remediation is slow because it is difficult to determine which DNS server is receiving the truncated DNS requests and which clients are connected to it.
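The following minimal Python sketch shows the truncation mechanism concretely. It reads the fixed 12-byte DNS header defined in RFC 1035 and checks the TC bit. It assumes the input is a raw DNS message, such as a UDP payload, with no transport framing. `parse_dns_header` and `must_retry_over_tcp` are hypothetical helper names used only for illustration and are not part of PureInsight.

```python
import struct
from typing import NamedTuple


class DnsHeader(NamedTuple):
    msg_id: int
    is_response: bool
    truncated: bool
    rcode: int
    qdcount: int
    ancount: int


def parse_dns_header(message: bytes) -> DnsHeader:
    """Parse the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    msg_id, flags, qd, an, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return DnsHeader(
        msg_id=msg_id,
        is_response=bool(flags & 0x8000),  # QR bit: 1 = response
        truncated=bool(flags & 0x0200),    # TC bit: message was truncated
        rcode=flags & 0x000F,              # low 4 bits: response code
        qdcount=qd,
        ancount=an,
    )


def must_retry_over_tcp(udp_response: bytes) -> bool:
    """A client that receives a UDP response with TC set should repeat the query over TCP."""
    header = parse_dns_header(udp_response)
    return header.is_response and header.truncated
```

A response with flags `0x8380` (QR, TC, RD, and RA set) makes `must_retry_over_tcp` return `True`. A normal response with flags `0x8180` returns `False`.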

The Solution:

  • The dedicated DNS Reports Dashboard breaks down key DNS statistics, such as truncated requests and errors/timeouts.

  • PureInsight Interactive Search can show the DNS servers and clients involved in truncated DNS requests within a specific time range.

  • Results can be downloaded for use in external tools.
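The sketch below illustrates the kind of aggregation such a dashboard performs. It filters DNS events to a time window, counts truncated responses and response codes, and exports that window as CSV for an external tool. The `DnsEvent` record, its fields, and the function names are assumptions made for illustration. This is not how PureInsight computes its reports.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class DnsEvent:
    """Hypothetical record for one DNS response extracted from a capture."""
    timestamp: float  # seconds since the epoch
    client: str
    server: str
    truncated: bool   # TC flag was set
    rcode: int        # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN, ...


RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def summarize(events, start: float, end: float) -> Counter:
    """Count events, truncated responses, and response codes in [start, end)."""
    window = [e for e in events if start <= e.timestamp < end]
    stats = Counter()
    stats["total"] = len(window)
    stats["truncated"] = sum(e.truncated for e in window)
    for e in window:
        stats[RCODE_NAMES.get(e.rcode, f"RCODE{e.rcode}")] += 1
    return stats


def to_csv(events, start: float, end: float) -> str:
    """Export the events in [start, end) as CSV text for external tools."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["timestamp", "client", "server", "truncated", "rcode"])
    for e in events:
        if start <= e.timestamp < end:
            writer.writerow([e.timestamp, e.client, e.server, int(e.truncated), e.rcode])
    return buf.getvalue()
```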

Workflow:

1. Interactive Search (Search: TCP Port 53)

  • Run a search filtered on TCP port 53. DNS servers listen for requests on port 53 over both UDP and TCP. Restricting the search to TCP isolates the queries that clients repeated over TCP after receiving a truncated UDP response.

  • Interactive Search can categorize captured traffic by IP address, port number, protocol, and regular expression. The results can be visualized as a Nodal Graph that displays details for each node.
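The same TCP port 53 filter can be reproduced outside the product on a downloaded capture. In Wireshark, the display filter `tcp.port == 53` does this. The sketch below is a minimal Python version of that filter. It assumes a classic libpcap file with microsecond timestamps, Ethernet framing, and IPv4 only. It does not handle pcapng, VLAN tags, or IPv6. `read_pcap`, `tcp_port_53`, and `TcpSegment` are hypothetical names.

```python
import socket
import struct
from typing import Iterable, Iterator, NamedTuple


class TcpSegment(NamedTuple):
    ts: float
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    payload: bytes


def read_pcap(data: bytes) -> Iterator[tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic libpcap file (not pcapng)."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet (linktype 1)")
    off = 24  # skip the global header
    while off + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        yield ts_sec + ts_usec / 1e6, data[off:off + incl_len]
        off += incl_len


def tcp_port_53(frames: Iterable[tuple[float, bytes]]) -> Iterator[TcpSegment]:
    """Keep only IPv4/TCP segments whose source or destination port is 53."""
    for ts, frame in frames:
        # EtherType 0x0800 = IPv4; VLAN-tagged and IPv6 frames are skipped.
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
            continue
        ip = frame[14:]
        if ip[9] != 6:  # IP protocol 6 = TCP
            continue
        ihl = (ip[0] & 0x0F) * 4
        (total_len,) = struct.unpack("!H", ip[2:4])
        tcp = ip[ihl:total_len]  # drop any Ethernet padding
        if len(tcp) < 20:
            continue
        sport, dport, seq = struct.unpack("!HHI", tcp[:8])
        if 53 not in (sport, dport):
            continue
        data_off = (tcp[12] >> 4) * 4  # TCP header length in bytes
        yield TcpSegment(ts, socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20]),
                         sport, dport, seq, tcp[data_off:])
```

Usage: `list(tcp_port_53(read_pcap(open("capture.pcap", "rb").read())))`, where `capture.pcap` is a placeholder file name.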

2. DNS Reports

  • Go to DNS Reports and select the Interactive Search Results output file as the input file.

  • DNS Reports list the number of timeouts and errors on the network.

3. Look for Truncated Requests

  • In this capture, the report shows 17 truncated requests and 11 response errors.
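Counts like these can be derived from TCP port 53 traffic because every DNS message sent over TCP is preceded by a two-byte length field (RFC 1035, section 4.2.2). The following sketch splits a reassembled TCP stream into messages and tallies queries, responses, and error responses (non-zero RCODE). It assumes stream reassembly has already been done. The function names are hypothetical. Treating each TCP query as one truncated exchange is an interpretation, not the report's documented definition.

```python
import struct
from collections import Counter


def split_tcp_dns(stream: bytes) -> list[bytes]:
    """Split a reassembled DNS-over-TCP byte stream into individual DNS messages.

    Each message is preceded by a 2-byte, big-endian length field.
    An incomplete trailing message is ignored.
    """
    messages, off = [], 0
    while off + 2 <= len(stream):
        (length,) = struct.unpack("!H", stream[off:off + 2])
        if off + 2 + length > len(stream):
            break  # truncated capture: stop at the incomplete message
        messages.append(stream[off + 2:off + 2 + length])
        off += 2 + length
    return messages


def tally(messages: list[bytes]) -> Counter:
    """Count queries, responses, and error responses (RCODE != 0)."""
    counts = Counter()
    for msg in messages:
        if len(msg) < 12:
            counts["malformed"] += 1
            continue
        (flags,) = struct.unpack("!H", msg[2:4])
        if flags & 0x8000:          # QR = 1: this is a response
            counts["responses"] += 1
            if flags & 0x000F:      # non-zero RCODE, e.g. SERVFAIL
                counts["error_responses"] += 1
        else:
            counts["queries"] += 1
    return counts
```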

4. Check for the DNS Server Node

  • 64.209.17.30 is likely the DNS server because it handles a large share of the traffic.
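The traffic-volume reasoning can be expressed as a ranking. For every flow with port 53 on one side, credit the bytes to the IP address on that side. This is a minimal sketch using a hypothetical `Flow` record. High volume is only a hint; confirming that the host actually listens on port 53 is stronger evidence.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical per-flow summary: endpoints, ports, and bytes carried."""
    src: str
    dst: str
    sport: int
    dport: int
    nbytes: int


def likely_dns_servers(flows: Iterable[Flow], top: int = 3) -> list[tuple[str, int]]:
    """Rank the IPs on the port-53 side of each flow by total bytes exchanged."""
    volume = Counter()
    for f in flows:
        if f.dport == 53:    # client -> server: server is the destination
            volume[f.dst] += f.nbytes
        elif f.sport == 53:  # server -> client: server is the source
            volume[f.src] += f.nbytes
    return volume.most_common(top)
```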

5. Check the DNS Client Nodes

  • In the Nodal Graph, the nodes connected to the DNS server are the clients that sent truncated DNS requests within the selected time range.
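Correlating clients with servers amounts to grouping TCP port 53 queries by server, and then by client, within the chosen time range. The `Query` record and `clients_by_server` below are hypothetical. The sketch assumes the client and server addresses have already been extracted from the capture.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Query(NamedTuple):
    """Hypothetical record for one DNS query seen on TCP port 53."""
    timestamp: float
    client: str
    server: str


def clients_by_server(queries: Iterable[Query], start: float, end: float) -> dict[str, dict[str, int]]:
    """For each DNS server, count queries per client within [start, end)."""
    grouped = defaultdict(lambda: defaultdict(int))
    for q in queries:
        if start <= q.timestamp < end:
            grouped[q.server][q.client] += 1
    # Convert the nested defaultdicts to plain dicts for clean output.
    return {server: dict(clients) for server, clients in grouped.items()}
```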

6. Download Results and Verify that Truncated Packets Are Causing Delay

  • Return to Interactive Search and download the PCAP file from Results. Open the file in Wireshark: it contains many TCP retransmissions.
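Wireshark marks retransmissions itself (display filter `tcp.analysis.retransmission`). The simplified Python heuristic below illustrates the idea behind that check. It treats a data-carrying segment as a retransmission when the segment does not extend past the highest sequence number already seen in the same direction of the same connection. The `Segment` record is hypothetical. Unlike Wireshark's analysis, this sketch ignores sequence-number wraparound, SYN/FIN accounting, and partially overlapping retransmissions.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Hypothetical summary of one TCP segment."""
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    payload_len: int


def find_retransmissions(segments: Iterable[Segment]) -> list[int]:
    """Return the indexes of segments whose data was already sent on the same flow direction."""
    highest_end: dict[tuple, int] = {}  # flow direction -> highest seq + len seen
    retransmitted = []
    for i, s in enumerate(segments):
        if s.payload_len == 0:
            continue  # pure ACKs carry no data to retransmit
        key = (s.src, s.sport, s.dst, s.dport)
        end = s.seq + s.payload_len
        if key in highest_end and end <= highest_end[key]:
            retransmitted.append(i)
        else:
            highest_end[key] = max(end, highest_end.get(key, 0))
    return retransmitted
```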

  • Customers use this workflow because DNS is a frequent cause of network delays. When delays occur, the cause usually lies in either the network or the application. Network engineers can use this workflow to show that the issue lies in the application rather than in the network.