Friday, November 23, 2012

Bitsquatting PCAP Analysis Part 4: Source Country Distribution

This is part 4 of a multipart series; the previous post is Bitsquatting PCAP Analysis Part 3: Bit-error distribution.

This blog post examines the source country distribution of packets in the bitsquatting PCAPs. To map a source IP address to a physical location, we will use MaxMind's free GeoLite data (available from MaxMind) as the data source and write a quick Python script using pygeoip to do the IP-to-location translation.

IP to Location Translation

First, let's download and decompress the free GeoLite City database provided by MaxMind:

$ wget
$ gunzip GeoLiteCity.dat.gz

Next, we will install pygeoip. Installation procedures for Python packages vary, but it's likely that pygeoip can be installed with setuptools:

# easy_install pygeoip

The pygeoip page on GitHub provides all the usage examples needed to create an IP-to-country script. My script, which reads IPv4 addresses line by line from a file (or stdin) and outputs an "ip:country:city" mapping, is available here:

The example usage:

$ ./ --help
usage: [-h] [-d GEOIPDB] [ipfile]

Show city and country of IP addresses using MaxMind GeoIP Database

positional arguments:
  ipfile      a file from which to read IP addresses (default: stdin)

optional arguments:
  -h, --help  show this help message and exit
  -d GEOIPDB  Path to the GeoIPCity database (default: GeoLiteCity.dat)

$ echo '' | ./
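The script itself is only linked, not shown. As a rough sketch of what such a tool might look like (not the author's code), the Python below reproduces the usage text above with argparse and performs the lookup through pygeoip's GeoIP.record_by_addr(). The function names, the empty-city handling, and printing "<error>" in the country and city fields for unknown addresses are assumptions made for illustration. pygeoip is imported only inside main() so the formatting helpers work with the standard library alone.

```python
"""Sketch of an "ip:country:city" mapper (hypothetical, not the original script).

Assumptions: pygeoip's GeoIP.record_by_addr() does the lookup, and an
address with no database entry is printed with "<error>" in both fields.
"""
import argparse
import sys

ERROR = "<error>"


def format_location(ip, record):
    """Return one "ip:country:city" line for a pygeoip-style record dict."""
    if not record:
        # No database entry for this address.
        return f"{ip}:{ERROR}:{ERROR}"
    country = record.get("country_name") or ERROR
    city = record.get("city") or ""
    return f"{ip}:{country}:{city}"


def map_ips(lines, lookup):
    """Yield a mapping line for every non-blank address in `lines`."""
    for line in lines:
        ip = line.strip()
        if not ip:
            continue
        try:
            record = lookup(ip)
        except (OSError, ValueError):
            # Lookups that raise (e.g. for malformed addresses) are
            # reported the same way as missing entries.
            record = None
        yield format_location(ip, record)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Show city and country of IP addresses using MaxMind GeoIP Database")
    parser.add_argument("-d", dest="geoipdb", default="GeoLiteCity.dat",
                        help="Path to the GeoIPCity database (default: GeoLiteCity.dat)")
    parser.add_argument("ipfile", nargs="?", type=argparse.FileType("r"),
                        default=sys.stdin,
                        help="a file from which to read IP addresses (default: stdin)")
    args = parser.parse_args(argv)

    import pygeoip  # third-party dependency, only needed for real lookups

    gi = pygeoip.GeoIP(args.geoipdb)
    for line in map_ips(args.ipfile, gi.record_by_addr):
        print(line)
```

To run it as a command-line tool, append the usual `if __name__ == "__main__": main()` guard and pipe addresses into it as in the example above.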

Source Address Frequency

The first step in mapping source country frequency is to identify source address frequency. Although the source address frequency is only an intermediate step toward the source country distribution, it is very handy for manually analyzing where queries are coming from.

$ tshark -n -r completelog.pcap -o column.format:'"SOURCE", "%s"' | sort | uniq -c | sort -rn > analysis/ips_all.txt
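The `sort | uniq -c | sort -rn` tail of this pipeline counts identical lines and orders them by count. For readers who would rather post-process tshark's output in Python, a minimal equivalent, assuming one source address per input line, could look like this (the function names are hypothetical):

```python
from collections import Counter


def address_frequency(lines):
    """Count identical source addresses, most frequent first.

    Mirrors `sort | uniq -c | sort -rn`; ties are broken by address string,
    which may not match the tie order GNU sort produces.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_uniq_c(pairs):
    """Render (address, count) pairs in the `uniq -c` layout (count padded to 7 columns)."""
    return [f"{count:7d} {address}" for address, count in pairs]
```

For example, `for line in format_uniq_c(address_frequency(sys.stdin)): print(line)` produces output in the same layout as ips_all.txt, although tied addresses may appear in a different order.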

A read filter (-R) can be applied to get the source IPs with the outliers removed:

$ tshark -n -r completelog.pcap -R '!( contains' -o column.format:'"SOURCE", "%s"' | sort | uniq -c | sort -rn > analysis/ips_nomdn.txt

The results for the frequency of all source IPs (ips_all.txt, 848KB, text) and only IPs not requesting (ips_nomdn.txt, 740KB, text) are available for download.

These intermediate results show how many packets were received from each IP. The list is interesting in its own right. The top few results are an unresponsive IP in Poland, IPs with PTR records pointing to subdomains of (possibly in related to ?), an open-recursive nameserver at a Russian ISP, a resolver for LeaseWeb, and an MTA for WindStream Communications. Feel free to investigate more on your own.
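To follow up on the PTR records of the top talkers, a small sketch like the one below reads the first few addresses from a file such as ips_all.txt (whose lines look like `count address`) and resolves each with the standard library's socket.gethostbyaddr(). The helper names are hypothetical, and the resolver is a parameter only so it can be swapped out for testing:

```python
import socket


def top_addresses(lines, n=10):
    """Return the first n addresses from `uniq -c | sort -rn` output."""
    result = []
    for line in lines:
        if len(result) >= n:
            break
        fields = line.split()
        if len(fields) >= 2:
            result.append(fields[1])  # fields[0] is the packet count
    return result


def reverse_names(addresses, resolve=socket.gethostbyaddr):
    """Map each address to its PTR hostname, or None when the lookup fails."""
    names = {}
    for address in addresses:
        try:
            names[address] = resolve(address)[0]
        except OSError:  # socket.herror and socket.gaierror subclass OSError
            names[address] = None
    return names
```

For instance, `with open("analysis/ips_all.txt") as f: print(reverse_names(top_addresses(f, 5)))` would print the PTR names (or None) for the five busiest sources. Answers depend on live DNS at the time of the query.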

Source Country Frequency

To find the frequency of source countries, each address will be mapped to its origin country. Only unique addresses are counted for the distribution, not the number of packets received from each address. Some shell commands and the script will identify the source countries. In the commands below, gcut, the GNU version of cut, is used since the default cut on Mac OS X cannot handle non-ASCII characters.

$ awk '{print $2}' analysis/ips_all.txt | ./ > analysis/ip_all_location_mapping.txt

$ gcut -f 2 -d ':' analysis/ip_all_location_mapping.txt | sort | uniq -c | sort -rn  > analysis/all_country_frequency.txt

$ awk '{print $2}' analysis/ips_nomdn.txt | ./ > analysis/ip_nomdn_location_mapping.txt

$ gcut -f 2 -d ':' analysis/ip_nomdn_location_mapping.txt | sort | uniq -c | sort -rn > analysis/nomdn_country_frequency.txt
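The `gcut -f 2 -d ':' | sort | uniq -c | sort -rn` step can also be written in Python, whose str type handles non-ASCII country and city names natively. This sketch assumes the mapping files hold one "ip:country:city" line per unique address (true here, since ips_all.txt was already collapsed by uniq -c) and that they are opened with a suitable text encoding; the function names are illustrative:

```python
from collections import Counter


def country_field(line):
    """Return field 2 of an "ip:country:city" line, like `cut -f 2 -d ':'`.

    As with cut (without -s), a line containing no ':' is returned whole.
    """
    parts = line.rstrip("\n").split(":")
    return parts[1] if len(parts) > 1 else parts[0]


def country_frequency(mapping_lines):
    """Count addresses per country, most common first.

    Each input line is one unique address, so the counts are unique
    addresses per country rather than packets.
    """
    counts = Counter(country_field(line) for line in mapping_lines if line.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```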

The all country frequency table (all_country_frequency.txt, 1.5KB, text) and the frequency table sans requests for (nomdn_country_frequency.txt, 1.5KB, text) have very similar distributions; only the magnitude changes. This is easier to see in graph form:

Figure: Number of packets vs. source country (all queries)

Figure: Number of DNS packets vs. source country (excluding the outlier domain)

The <error> field means the MaxMind GeoLite database did not have an entry for the particular IP address.

The large numbers for the US are likely due to the US-centric nature of many of the domains I bitsquatted, such as, and the fact that the US simply has considerably more IP allocations than other countries. The worldwide reach of bitsquatting queries is remarkable: there are queries from 192 of the 250 countries in the MaxMind database.
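To put numbers on the comparison between the two country tables and on the country count, the sketch below parses `uniq -c` output such as all_country_frequency.txt back into counts, converts counts into shares of the total so that tables of different magnitude can be compared, and counts distinct countries while skipping the <error> bin. The function names are hypothetical, and the code assumes the table layout produced by the commands above:

```python
def parse_frequency_table(lines):
    """Parse `uniq -c` output ("  count name") into {name: count}.

    split(None, 1) keeps multi-word names such as "United States" intact.
    """
    table = {}
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        name = fields[1].rstrip("\r\n") if len(fields) > 1 else ""
        table[name] = table.get(name, 0) + int(fields[0])
    return table


def shares(table):
    """Convert raw counts into fractions of the table's total."""
    total = sum(table.values())
    return {name: count / total for name, count in table.items()} if total else {}


def max_share_difference(table_a, table_b):
    """Largest absolute difference in any country's share between two tables."""
    a, b = shares(table_a), shares(table_b)
    return max((abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys()),
               default=0.0)


def known_country_count(table, unknown="<error>"):
    """Number of distinct countries, ignoring addresses with no database entry."""
    return sum(1 for name in table if name != unknown)
```

A max_share_difference() close to zero for all_country_frequency.txt versus nomdn_country_frequency.txt would indicate that the two tables differ mainly in magnitude, and known_country_count() on the full table gives the number of distinct countries observed.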

1 comment:

  1. Great series of articles, thank you. I guess the extracted IP addresses come with the number of times each source started a conversation, not the number of packets.

    Thanks again.