Wednesday, November 14, 2012

Bitsquatting PCAP Analysis Part 2: Query Types, IPv6


This is the second post in a multi-part series. The previous post is here.

In this installment of Bitsquatting PCAP analysis we will make an educated guess about the prevalence of IPv6 on the Internet, which services DNS is used for, and identify some mysteries in the bitsquatting PCAPs.

All of this information is going to come from just one field: the requested record type of each DNS query.

Background


First, some background on DNS record types. DNS is essentially a distributed hierarchical database. Values are retrieved by specifying a location and a record type. The location is a fully qualified domain name. The record type is one of several defined record types. The most commonly requested record type is A, which means IPv4 address. When you are using IPv4 and translate www.google.com to an IP address, you are retrieving the A record for www.google.com.

The dig command is used to manually query for DNS records. The following command will retrieve the A record for www.google.com:

$ dig +short www.google.com a
173.194.75.99
173.194.75.147
173.194.75.104
173.194.75.103
173.194.75.105
173.194.75.106

The above command says: ask my local name server (usually specified in /etc/resolv.conf) for the A record for www.google.com, and output the result in short form. Note: the IP addresses returned for you will likely be different. Google attempts to direct you to a physically closer server based on the GeoIP location of the requesting DNS server. This is one part of how most content delivery networks work. More in a future blog post.

One more common record type is AAAA, which is used to retrieve IPv6 addresses. Why is the record type called AAAA? Because IPv4 addresses are 32 bits wide, and IPv6 addresses are 128 bits wide. If A is 32-bit, then AAAA would be 32+32+32+32=128-bit. Interestingly, there used to be another record type for retrieving IPv6 addresses, A6, which has since been deprecated. Even if you are using IPv4, you can still retrieve the AAAA record of www.google.com:

$ dig +short www.google.com aaaa
2607:f8b0:400c:c01::67

What is DNS used for?


By tallying the frequency of requested record types, we can determine the popularity of DNS uses. The requested record type is specified by the query type field of each DNS request. We can retrieve the query type from each packet using tshark. Let's get a list of all requested record types, and how often each record type was requested:

$ tshark -n -r completelog.pcap  -o column.format:'"QTYPE", "%Cus:dns.qry.type"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/all_qtypes.txt

The full record type frequency table is available:  (all_qtypes.txt, 408B, text).
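The counting half of that pipeline can be tried without a PCAP. The following is a minimal sketch on made-up input; the function name tally_qtypes and the sample values are illustrative assumptions, not part of the original analysis.

```shell
#!/bin/sh
# Tally one-value-per-line input, case-insensitively, most frequent first.
# This is the same tr | sort | uniq -c | sort -rn chain used after tshark.
tally_qtypes() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
}

# Hypothetical sample: six query types in mixed case.
printf '%s\n' AAAA A aaaa MX A AAAA | tally_qtypes
```

On this sample the tally lists aaaa three times, a twice, and mx once. The tr step matters because uniq -c would otherwise count AAAA and aaaa separately.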

The table below shows the top 15 requested record types. Amazingly, the most requested DNS record type is AAAA, used for IPv6 address resolution! Considering that other measurements put IPv6 DNS traffic at only 15% of web traffic, something is definitely amiss. More on this after the discussion of DNS use.

Rank  Query Count  Record Type
1     2050660      aaaa
2     1132372      a
3     359779       mx
4     47335        a6
5     38404        any
6     25954        soa
7     8155         cname
8     5130         ns
9     4835         513
10    2622         txt
11    1149         srv
12    698          1025
13    232          257
14    144          ptr
15    141          spf


Name resolution is by far the most popular use of DNS. Name resolution is responsible for the first, second, fourth, and seventh most frequently requested record types. Amazingly, there is a very high frequency of requests for the deprecated A6 record type. Can there really be that many old BIND servers out there?

The second most popular use of DNS is for email-related services. The third most requested record type is MX, which is used for determining the incoming mail servers for a domain. MX records can be viewed from the command line as well:

$ dig +short gmail.com mx
10 alt1.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
5 gmail-smtp-in.l.google.com.
20 alt2.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.

Along with MX, the other record types commonly used for email are TXT (which holds SPF and DKIM data), the tenth most frequently requested, and SPF (used for SPF data), the fifteenth most frequent.

The fifth, sixth, and eighth most frequently requested record types are all used for DNS infrastructure purposes. The ANY record type simply retrieves all available records, the SOA record type specifies who is the primary source for information about the domain, and the NS type specifies nameservers that can be used to answer queries about the domain.

The next most commonly requested record type, SRV, is used for custom protocol-related records. In practice, most SRV queries are used to retrieve information for Jabber/XMPP and other messaging services, including VoIP and videoconferencing services.

Finally, PTR records are used for reverse DNS lookups. A reverse lookup is performed when you want to map an IP address to a domain name. This is one of the few (maybe the only?) times when you will encounter the .arpa TLD. ARPA originally stood for the Advanced Research Projects Agency, the US Government agency that funded the creation of the Internet. These days .arpa has been backronymed to Address and Routing Parameter Area, and what used to be ARPA is now DARPA.

To request a PTR record for an IPv4 address, the octets of the IP are reversed, and .in-addr.arpa is appended. This is because IP addresses are hierarchical from left to right but DNS is hierarchical from right to left. For example, to see what domain 173.194.75.99 (one of the IPs for www.google.com) corresponds to, we would use the following command:


$ dig +short 99.75.194.173.in-addr.arpa ptr
ve-in-f99.1e100.net.

The returned domain is not www.google.com, but this is due to Google's infrastructure. There is a clever easter egg in the domain: 1e100 means 1.0 × 10^100, which is one googol.
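The octet reversal used for PTR lookups is easy to script. Here is a small sketch; the helper name ptr_name is an assumption for illustration, and it only handles dotted-quad IPv4 input.

```shell
#!/bin/sh
# Build the reverse-lookup name for an IPv4 address:
# reverse the four octets and append .in-addr.arpa.
ptr_name() {
    echo "$1" | awk -F. 'NF == 4 {print $4 "." $3 "." $2 "." $1 ".in-addr.arpa"}'
}

ptr_name 173.194.75.99   # prints 99.75.194.173.in-addr.arpa
```

The result can be passed to dig as in the PTR query above. dig also accepts -x followed by the plain IP address, in which case it constructs the in-addr.arpa name itself.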

What can we learn about the prevalence of IPv6?


Before we jump to conclusions about IPv6, we should remember that there are outliers in the bitsquatting PCAPs. If you recall from the previous post, there were numerous queries for 0mdn.net because that domain was an authoritative name server. Queries for 0mdn.net might be affecting the record type distribution. Let's filter out these queries:

$ tshark -n -r completelog.pcap -R '!(dns.qry.name contains 0mdn.net)' -o column.format:'"QTYPE", "%Cus:dns.qry.type"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/nomdn_qtypes.txt

The full list of record types and their frequencies is available: (nomdn_qtypes.txt, 379B, text).

This command works using the -R option of tshark. The -R option specifies a Wireshark display filter that is applied when reading PCAPs. The filter !(dns.qry.name contains 0mdn.net) will match all packets where the query name field does not contain 0mdn.net. Let's examine the new results:

Rank  Query Count  Record Type
1     550892       a
2     509605       aaaa
3     358926       mx
4     26829        any
5     25039        soa
6     7729         cname
7     4835         513
8     4728         ns
9     2597         txt
10    1148         srv
11    698          1025
12    232          257
13    222          a6
14    143          ptr
15    138          spf

The new table paints a much different picture with regard to IPv6, but there is still a large number of AAAA record requests.
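To put a number on the change, the AAAA and A counts from the two tables can be divided directly. This is a quick sketch; the helper name aaaa_per_a is made up for illustration, and the inputs are the aaaa and a counts reported before and after filtering.

```shell
#!/bin/sh
# Print how many AAAA queries were seen per A query, to three decimals.
aaaa_per_a() {
    awk -v aaaa="$1" -v a="$2" 'BEGIN {printf "%.3f\n", aaaa / a}'
}

aaaa_per_a 2050660 1132372   # all traffic: prints 1.811
aaaa_per_a 509605 550892     # without 0mdn.net: prints 0.925
```

With 0mdn.net included there are nearly two AAAA queries for every A query; with it excluded, AAAA queries fall slightly below A queries.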

Lesson Learned: There are enough AAAA record requests to indicate IPv6 connectivity is important. If you are attempting to re-do the bitsquatting experiment, have IPv6 connectivity and answer AAAA requests!

What is the nature of IPv6 traffic (AAAA record requests)?


Why were there so many AAAA record requests for the authoritative nameservers, and how do these compare to other domains? Let's use tshark to retrieve all AAAA record requests, along with the domain each request was for:

$ tshark -n -r completelog.pcap -R '(dns.qry.type == AAAA)' -o column.format:'"QTYPE", "%Cus:dns.qry.name"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/AAAA_queries.txt

The full list of AAAA query frequencies is available: (AAAA_queries.txt, 17KB, text).

AAAA Queries  Domain
794921        ns2.0mdn.net
774496        ns1.0mdn.net
77181         static.ak.dbcdn.net
77053         support.doublechick.net
66595         gmaml.com
58634         g.mic2osoft.com
28107         s0.0mdn.net
16327         www.amazgn.com
13401         mail.gmaml.com
6367          www.micro3oft.com
5678          amazgn.com
4924          www.mic2osoft.com
4789          www.eicrosoft.com
4578          pop.gmaml.com
4346          static.ak.fbgdn.net


The two authoritative name servers receive the most AAAA requests, but there are other domains with numerous IPv6 lookups. Maybe these domains are just popular?

Ratio of IPv4 to IPv6 address lookups

The ratio of IPv4 address resolutions to IPv6 address resolutions will show the proportion of IPv6 traffic for each domain. This measurement should completely disregard popularity, as it uses ratios instead of absolute numbers. My hypothesis was that the ratios should be approximately the same for all domains, as none of the domains I bitsquatted were IPv6 related. Let's calculate the ratios.

Step 1: Calculate A record frequency

The following command will tabulate the frequency of A record requests for each domain:

$ tshark -n -r completelog.pcap -R '(dns.qry.type == A)' -o column.format:'"QTYPE", "%Cus:dns.qry.name"' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > analysis/A_queries.txt

The full list of A query frequencies is available: (A_queries.txt, 99KB, text).

Step 2: Massage Data

The following commands will prepare both the A record frequency and AAAA record frequency tables to be joined on the domain name field.

$ sort -f -k2 analysis/A_queries.txt  > a_q_for_join.txt
$ sort -f -k2 analysis/AAAA_queries.txt  > aaaa_q_for_join.txt

Step 3: Calculate the ratio of A to AAAA record requests

Amazingly, the POSIX standard specifies a relational join command that operates on specially delimited text files. The join command below will join the first file on the second field (-1 2), with the second file also on the second field (-2 2). The second field of both files is the domain name. The output of join is then piped to awk to calculate the ratio of A to AAAA record requests.

$ join -1 2 -2 2 a_q_for_join.txt aaaa_q_for_join.txt | awk '{printf "%d\t%2.2f\t%s\n", $2+$3, $2/$3, $1}' | sort -rn >analysis/ratio_of_a_to_aaaa.txt

The full list of A:AAAA ratios is available: (ratio_of_a_to_aaaa.txt, 18KB, text).
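The sort, join, and awk steps can be tried end to end on toy data. This is a sketch only: the function name ratio_demo, the example domains, and the counts are invented, and LC_ALL=C is added here as an assumption so that sort and join agree on ordering.

```shell
#!/bin/sh
# Join two "count domain" tables (uniq -c layout) on the domain and
# print: total queries, A-to-AAAA ratio, domain.
ratio_demo() {
    tmp=$(mktemp -d) || return 1
    # Invented A and AAAA query counts.
    printf '%s\n' '  30 b.example' '  10 a.example' '   4 c.example' > "$tmp/a.txt"
    printf '%s\n' '  20 a.example' '   5 b.example' > "$tmp/aaaa.txt"
    # join requires both inputs sorted on the join field (field 2).
    LC_ALL=C sort -k2 "$tmp/a.txt" > "$tmp/a_sorted.txt"
    LC_ALL=C sort -k2 "$tmp/aaaa.txt" > "$tmp/aaaa_sorted.txt"
    # After join: $1 = domain, $2 = A count, $3 = AAAA count.
    LC_ALL=C join -1 2 -2 2 "$tmp/a_sorted.txt" "$tmp/aaaa_sorted.txt" |
        awk '{printf "%d\t%.2f\t%s\n", $2+$3, $2/$3, $1}' | sort -rn
    rm -rf "$tmp"
}

ratio_demo
```

Note that c.example, which has A queries but no AAAA queries, is absent from the output: join only emits domains present in both files, so domains that never received an AAAA query do not appear in the ratio table.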

Total Query Count  A to AAAA Query Ratio  Domain
1095763            0.41                   ns1.0mdn.net
1072642            0.35                   ns2.0mdn.net
93208              0.40                   gmaml.com
80862              0.05                   static.ak.dbcdn.net
77147              0.00                   support.doublechick.net
70140              4.23                   mail.gmaml.com
59500              0.01                   g.mic2osoft.com
53969              2.31                   www.amazgn.com
43270              6.62                   amazgn.com
28694              0.02                   s0.0mdn.net
20575              8.63                   micro3oft.com
13585              9.32                   miarosoft.com
12175              0.91                   www.micro3oft.com
10762              1.19                   www.mic2osoft.com
9032               26.62                  u2s.micro3oft.com

Different domains exhibit wildly different ratios of IPv4 to IPv6 lookups! Some actually have more IPv6 resolutions than IPv4 resolutions. The mystery is, why is this the case?
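Domains with more AAAA than A lookups are the ones whose ratio is below 1, and they can be pulled out of the tab-separated ratio file with a one-line awk filter. The sketch below feeds four rows from the table through a hypothetical helper named more_aaaa_than_a.

```shell
#!/bin/sh
# Read "total<TAB>ratio<TAB>domain" rows and print the domains whose
# A-to-AAAA ratio is below 1, meaning more AAAA than A lookups.
more_aaaa_than_a() {
    awk -F'\t' '$2 < 1 {print $3}'
}

# Four rows taken from the ratio table; printf reuses the format per row.
printf '%s\t%s\t%s\n' \
    1095763 0.41 ns1.0mdn.net \
    70140 4.23 mail.gmaml.com \
    77147 0.00 support.doublechick.net \
    53969 2.31 www.amazgn.com | more_aaaa_than_a
```

On these rows the filter prints ns1.0mdn.net and support.doublechick.net. The same function should work on the full file, for example more_aaaa_than_a < analysis/ratio_of_a_to_aaaa.txt.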

Conclusion


IPv6 connectivity is important. After removing outliers, there were almost as many IPv6 resolution requests as IPv4 requests. On closer investigation, some domains actually receive more IPv6 resolution requests than IPv4 resolution requests. I do not know why. If you have suggestions, please contact me.

Update:
Part 3 is now up, Bitsquatting PCAP Analysis Part 3: Bit-error distribution.
