This is the second post in a multi-part series. The previous post is here.
In this installment of Bitsquatting PCAP analysis we will make an educated guess about the prevalence of IPv6 on the Internet, which services DNS is used for, and identify some mysteries in the bitsquatting PCAPs.
All of this information is going to come from just one field: the requested record type of each DNS query.
Background
First, some background on DNS record types. DNS is essentially a distributed hierarchical database. Values are retrieved by specifying a location and a record type. The location is a fully qualified domain name. The record type is one of several defined record types. The most commonly requested record type is A, which means IPv4 address. When you are using IPv4 and translate www.google.com to an IP address, you are retrieving the A record for www.google.com.
The dig command is used to manually query for DNS records. The following command will retrieve the A record for www.google.com:
$ dig +short www.google.com a
173.194.75.99
173.194.75.147
173.194.75.104
173.194.75.103
173.194.75.105
173.194.75.106
The above command says: ask my local name server (usually specified in /etc/resolv.conf) for the A record for www.google.com, and output the result in short form. Note: the IP addresses returned for you will likely be different. Google attempts to direct you to a physically closer server based on the geo-ip location of the requesting DNS server. This is one part of how most content delivery networks work. More on this in a future blog post.
One more common record type is AAAA, which is used to retrieve IPv6 addresses. Why is the record type called AAAA? Because IPv4 addresses are 32 bits wide, and IPv6 addresses are 128 bits wide. If A is 32-bit, then AAAA would be 32+32+32+32=128-bit. Interestingly, there used to be another record type for retrieving IPv6 addresses, A6, that has since been deprecated. Even if you are using IPv4, you can still retrieve the AAAA record of www.google.com:
$ dig +short www.google.com aaaa
2607:f8b0:400c:c01::67
What is DNS used for?
By tallying the frequency of requested record types, we can determine how popular each use of DNS is. The requested record type is specified by the query type field of each DNS request. We can retrieve the query type from each packet using tshark. Let's get a list of all requested record types, and how often each record type was requested:
$ tshark -n -r completelog.pcap -o column.format:'"QTYPE", "%Cus:dns.qry.type"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/all_qtypes.txt
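The counting half of that pipeline works on any text stream, so it can be tried without a PCAP. Below is a minimal sketch; the function name `tally` and the six sample lines are made up for illustration and are not part of the original analysis.

```shell
# Tally how often each line occurs on stdin, most frequent first.
# Case is folded first so that "AAAA" and "aaaa" are counted together.
tally() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
}

# Hypothetical sample input: one record type per line, as tshark would print it.
printf '%s\n' AAAA A aaaa MX A AAAA | tally
```

The first sort groups identical lines so that uniq -c can count them, and the final sort -rn orders the counts numerically, largest first.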
The full record type frequency table is available: (all_qtypes.txt, 408B, text).
The table below shows the top 15 requested record types. Amazingly, the most requested DNS record type is IPv6 address resolution! Considering that other places measure IPv6 DNS traffic at only 15% of web traffic, something is definitely amiss. More on this after the discussion of DNS use.
Rank | Query Count | Record Type |
---|---|---|
1 | 2050660 | aaaa |
2 | 1132372 | a |
3 | 359779 | mx |
4 | 47335 | a6 |
5 | 38404 | any |
6 | 25954 | soa |
7 | 8155 | cname |
8 | 5130 | ns |
9 | 4835 | 513 |
10 | 2622 | txt |
11 | 1149 | srv |
12 | 698 | 1025 |
13 | 232 | 257 |
14 | 144 | ptr |
15 | 141 | spf |
Name resolution is by far the most popular use of DNS. Name resolution is responsible for the first, second, fourth, and seventh most frequently requested record types. Amazingly there is a very high frequency of deprecated A6 records. Can there really be that many old BIND servers out there?
The second most popular use of DNS is for email related services. The third most requested record type is MX, which is used for determining the incoming mail servers for a domain. MX records can be viewed from the command line as well:
$ dig +short gmail.com mx
10 alt1.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
5 gmail-smtp-in.l.google.com.
20 alt2.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.
Along with MX, the other records commonly used for email are TXT (to hold SPF and DKIM data) which is the tenth most frequently requested, and SPF (used for SPF data) which is the fifteenth most frequent.
The fifth, sixth, and eighth most frequently requested record types are all used for DNS infrastructure purposes. The ANY record type simply retrieves all available records, the SOA record type specifies who is the primary source for information about the domain, and the NS type specifies nameservers that can be used to answer queries about the domain.
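To put rough numbers on these three uses, the counts from the top 15 table can be summed per group. This is only a sketch: the function name `group_shares` is hypothetical, the grouping simply follows the discussion above, and the percentages are shares of the top 15 rows rather than of the full all_qtypes.txt file.

```shell
# Sum the query counts per use of DNS and print each group's share.
# Input: "count type" lines. The grouping follows the post's discussion.
group_shares() {
    awk '
        { total += $1 }
        $2 == "aaaa" || $2 == "a" || $2 == "a6" || $2 == "cname" { name += $1 }
        $2 == "mx" || $2 == "txt" || $2 == "spf"                 { mail += $1 }
        $2 == "any" || $2 == "soa" || $2 == "ns"                 { infra += $1 }
        END {
            printf "name_resolution %.1f%%\n", 100 * name / total
            printf "email %.1f%%\n", 100 * mail / total
            printf "infrastructure %.1f%%\n", 100 * infra / total
        }'
}

# The top 15 rows from the record type table (count, then type).
group_shares <<'EOF'
2050660 aaaa
1132372 a
359779 mx
47335 a6
38404 any
25954 soa
8155 cname
5130 ns
4835 513
2622 txt
1149 srv
698 1025
232 257
144 ptr
141 spf
EOF
```

On these rows name resolution comes to about 88% of queries, email to about 10%, and infrastructure to about 2%.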
The next most commonly requested record type, SRV, is used for custom protocol related records. In practice, most SRV queries are used to retrieve information for Jabber/XMPP and other messaging services, including VoIP/Videoconferencing services.
Finally, PTR records are used for reverse DNS lookups. A reverse lookup is performed when you want to map an IP address to a domain name. This is one of the few (maybe the only?) times when you will encounter the .arpa TLD. ARPA originally stood for the Advanced Research Projects Agency, the US Government agency that funded the creation of the Internet. These days .arpa has been backronymed to Address and Routing Parameter Area, and what used to be ARPA is now DARPA.
To request a PTR record for an IPv4 address, the octets of the IP are reversed, and .in-addr.arpa is appended. This is because IP addresses are hierarchical from left to right but DNS is hierarchical from right to left. For example, to see what domain 173.194.75.99 (one of the IPs for www.google.com) corresponds to, we would use the following command:
$ dig +short 99.75.194.173.in-addr.arpa ptr
ve-in-f99.1e100.net.
The returned domain is not www.google.com, but this is due to Google's infrastructure. There is a clever easter egg in the domain: 1e100 means 1.0 × 10^100, which is one googol.
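The octet reversal described above is easy to script. Here is a small sketch; the helper name `ptr_name` is made up for this example, and it assumes a well-formed dotted-quad IPv4 address with no validation.

```shell
# Build the reverse-lookup name for a dotted-quad IPv4 address:
# reverse the four octets and append .in-addr.arpa.
ptr_name() {
    printf '%s\n' "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

ptr_name 173.194.75.99   # prints 99.75.194.173.in-addr.arpa
```

In everyday use dig can do this for you: dig -x 173.194.75.99 builds the same in-addr.arpa name and queries its PTR record.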
What can we learn about the prevalence of IPv6?
Before we jump to conclusions about IPv6, we should remember that there are outliers in the bitsquatting PCAPs. If you recall from the previous post, there were numerous queries for 0mdn.net because that domain was an authoritative name server. Queries for 0mdn.net might be affecting the record type distribution. Let's filter out these queries:
$ tshark -n -r completelog.pcap -R '!(dns.qry.name contains 0mdn.net)' -o column.format:'"QTYPE", "%Cus:dns.qry.type"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/nomdn_qtypes.txt
The full list of record types and their frequencies is available: (nomdn_qtypes.txt, 379B, text).
This command works using the -R option of tshark, which specifies a Wireshark display filter that is applied when reading PCAPs. (Newer tshark releases use -Y for this kind of single-pass display filter.) The filter !(dns.qry.name contains 0mdn.net) matches all packets where the query name field does not contain 0mdn.net. Let's examine the new results:
Rank | Query Count | Record Type |
---|---|---|
1 | 550892 | a |
2 | 509605 | aaaa |
3 | 358926 | mx |
4 | 26829 | any |
5 | 25039 | soa |
6 | 7729 | cname |
7 | 4835 | 513 |
8 | 4728 | ns |
9 | 2597 | txt |
10 | 1148 | srv |
11 | 698 | 1025 |
12 | 232 | 257 |
13 | 222 | a6 |
14 | 143 | ptr |
15 | 138 | spf |
The new table paints a very different picture with regard to IPv6, but there is still a large number of AAAA record requests.
Lesson Learned: There are enough AAAA record requests to indicate IPv6 connectivity is important. If you are attempting to re-do the bitsquatting experiment, have IPv6 connectivity and answer AAAA requests!
What is the nature of IPv6 traffic (AAAA record requests)?
Why were there so many AAAA record requests for the authoritative nameservers, and how do these compare to other domains? Let's use tshark to retrieve all AAAA record requests, along with the domain each request was for:
$ tshark -n -r completelog.pcap -R '(dns.qry.type == AAAA)' -o column.format:'"QTYPE", "%Cus:dns.qry.name"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/AAAA_queries.txt
The full list of AAAA query frequencies is available: (AAAA_queries.txt, 17KB, text).
AAAA Queries | Domain |
---|---|
794921 | ns2.0mdn.net |
774496 | ns1.0mdn.net |
77181 | static.ak.dbcdn.net |
77053 | support.doublechick.net |
66595 | gmaml.com |
58634 | g.mic2osoft.com |
28107 | s0.0mdn.net |
16327 | www.amazgn.com |
13401 | mail.gmaml.com |
6367 | www.micro3oft.com |
5678 | amazgn.com |
4924 | www.mic2osoft.com |
4789 | www.eicrosoft.com |
4578 | pop.gmaml.com |
4346 | static.ak.fbgdn.net |
The two authoritative name servers receive the most AAAA requests, but there are other domains with numerous IPv6 lookups. Maybe these domains are just popular?
Ratio of IPv4 to IPv6 address lookups
The ratio of IPv4 address resolutions to IPv6 address resolutions will show the proportion of IPv6 traffic for each domain. This measurement should be independent of popularity, as it uses ratios instead of absolute numbers. My hypothesis was that the ratios should be approximately the same for all domains, as none of the domains I bitsquatted were IPv6 related. Let's calculate the ratios.
Step 1: Calculate A record frequency
The following command will tabulate the frequency of A record requests for each domain:
$ tshark -n -r completelog.pcap -R '(dns.qry.type == A)' -o column.format:'"QTYPE", "%Cus:dns.qry.name"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/A_queries.txt
The full list of A query frequencies is available: (A_queries.txt, 99KB, text).
Step 2: Massage Data
The following commands will prepare both the A record frequency and AAAA record frequency tables to be joined on the domain name field.
$ sort -f -k2 analysis/A_queries.txt > a_q_for_join.txt
$ sort -f -k2 analysis/AAAA_queries.txt > aaaa_q_for_join.txt
Step 3: Calculate the ratio of A to AAAA record requests
Amazingly, the POSIX standard specifies a relational join command that operates on specially delimited text files. The join command below will join the first file on the second field (-1 2), with the second file also on the second field (-2 2). The second field of both files is the domain name. The output of join is then piped to awk to calculate the ratio of A to AAAA record requests.
$ join -1 2 -2 2 a_q_for_join.txt aaaa_q_for_join.txt | awk '{printf "%d\t%2.2f\t%s\n", $2+$3, $2/$3, $1}' | sort -rn > analysis/ratio_of_a_to_aaaa.txt
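The three steps can be tried end to end on a handful of rows. The sketch below wraps them in a hypothetical helper, `a_to_aaaa_ratio`. The AAAA counts are taken from the AAAA table in this post, while the A counts are derived by subtracting them from the published totals, so treat the sample files as illustrative. It also assumes mktemp is available, and it sorts with LC_ALL=C instead of sort -f so that sort and join agree on ordering.

```shell
# Join two "count domain" files on the domain and print
# total queries, A:AAAA ratio, and domain, largest total first.
a_to_aaaa_ratio() {
    a_sorted=$(mktemp)
    aaaa_sorted=$(mktemp)
    # join requires both inputs to be sorted on the join field (field 2).
    LC_ALL=C sort -k2,2 "$1" > "$a_sorted"
    LC_ALL=C sort -k2,2 "$2" > "$aaaa_sorted"
    # join prints: domain, A count, AAAA count.
    LC_ALL=C join -1 2 -2 2 "$a_sorted" "$aaaa_sorted" |
        awk '{ printf "%d\t%.2f\t%s\n", $2 + $3, $2 / $3, $1 }' |
        sort -rn
    rm -f "$a_sorted" "$aaaa_sorted"
}

# Sample rows: A counts derived as (total - AAAA) from the tables in this post.
tmp=$(mktemp -d)
printf '%s\n' '26613 gmaml.com' '56739 mail.gmaml.com' '37642 www.amazgn.com' > "$tmp/a.txt"
printf '%s\n' '66595 gmaml.com' '13401 mail.gmaml.com' '16327 www.amazgn.com' > "$tmp/aaaa.txt"
a_to_aaaa_ratio "$tmp/a.txt" "$tmp/aaaa.txt"
rm -r "$tmp"
```

One thing to keep in mind: join only emits domains that appear in both files, so a domain that received A queries but no AAAA queries (or the reverse) drops out of the ratio table entirely.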
The full list of A:AAAA ratios is available: (ratio_of_a_to_aaaa.txt, 18KB, text).
Total Query Count | A to AAAA Query Ratio | Domain |
---|---|---|
1095763 | 0.41 | ns1.0mdn.net |
1072642 | 0.35 | ns2.0mdn.net |
93208 | 0.40 | gmaml.com |
80862 | 0.05 | static.ak.dbcdn.net |
77147 | 0.00 | support.doublechick.net |
70140 | 4.23 | mail.gmaml.com |
59500 | 0.01 | g.mic2osoft.com |
53969 | 2.31 | www.amazgn.com |
43270 | 6.62 | amazgn.com |
28694 | 0.02 | s0.0mdn.net |
20575 | 8.63 | micro3oft.com |
13585 | 9.32 | miarosoft.com |
12175 | 0.91 | www.micro3oft.com |
10762 | 1.19 | www.mic2osoft.com |
9032 | 26.62 | u2s.micro3oft.com |
Different domains exhibit wildly different ratios of IPv4 to IPv6 lookups! Some actually have more IPv6 resolutions than IPv4 resolutions. The mystery is, why is this the case?
Conclusion
IPv6 connectivity is important. When removing outliers, there were almost as many IPv6 resolution requests as IPv4 requests. When investigating in more detail, some domains actually receive more IPv6 resolution requests than IPv4 resolution requests. I do not know why. If you have suggestions, please contact me.
Update:
Part 3 is now up, Bitsquatting PCAP Analysis Part 3: Bit-error distribution.