Friday, December 6, 2013

I Hate (General Purpose) Computers

I hate computers. More specifically, general purpose computers. They cause me many hours of frustration, mostly due to malware.

Most people don't need or want the freedom to run the malware of their choice. They need a nice computing appliance with a well-designed GUI that "just works". General computing is important, it just shouldn't be the default option.

I propose appliance-default computers with a big red FTC mandated 'general computing' switch. It would save millions of hours in security and support costs, while protecting consumer freedom.

Anger and Frustration


It all started over Thanksgiving. Once again, it was time to answer family computer questions.

My father asked, "How can I be absolutely sure I don't get infected with CryptoLocker?" He was very concerned. It was on the news, and there was a warning email at work.

Unfortunately, there was nothing more I could tell him. He already does everything right, and could still be infected with CryptoLocker. There's nothing I can do: he has a computer, and it can run malware. Sure, there are precautions, but they're mostly useless.

Malware Precautions: Largely Useless


These (largely useless) precautions to avoid "being a victim" just happened to be on the news as I was drafting this blog post. The news report was about the recent social media password leak.

The precautions:

These precautions try to mask the core issue: malicious code can run on a computer, and there is nothing you can do about it except live in fear of every website and email attachment.

Even when following every single precaution, you could still be infected with malware.

Computers vs. Computing Appliances


The problem is that my father has a computer. A computer is a platform that permits arbitrary code execution. This encompasses pretty much all desktops and laptops.

What he needs is a computing appliance with a large monitor and a keyboard. A computing appliance is a platform that only permits execution of pre-approved code, like iOS or Windows on ARM.

In fact, the vast majority of people only need a computing appliance. They will never, ever develop software. They have no interest in running arbitrary, unapproved applications. The only unapproved code they will ever run is ZeuS or CryptoLocker.

A Computing Compromise


Every time OS vendors try to move in the direction of computing appliances, a vocal minority screams bloody murder. Just look at what happened when Microsoft introduced Secure Boot with Windows 8.

To some extent, these people have a point.

Computing appliances have many faults:

Of these, the last is the most important and can't easily be solved by competition between vendors.

It is important to let those who want to modify their computers and their software do as they see fit. It just shouldn't be the default option.

The best execution of this I've heard of is the Developer Mode switch on Google's Chromebooks. You have to physically flip a switch to allow unrestricted code execution. Additionally, flipping the switch wipes all local data.

It's a beautiful solution: there is no accidental enabling, and it prevents 'evil maid' attacks.

There is, of course, little profit in having a general computing mode in appliances. Most customers wouldn't use it, and it would cost time and effort to maintain. The only purpose would be to protect consumer freedoms.

Which is why computing appliances are a perfect target for government regulation. The FTC can require all computing appliances to ship with a 'general computing' switch to protect consumers from malware and controlling vendors. The millions of hours in saved frustration and tech support would be well worth it.

Saturday, November 9, 2013

A Bit Flip That Killed?

During my bitsquatting research I was amazed at how much critical RAM in a typical PC lacks error correction.

It turns out that ECC is missing from an even more critical device: cars.

Details from the recent Toyota civil settlement show that the drive-by-wire control of Toyota cars was lacking error detection and correcting RAM.

From EDN.com:
Although the investigation focused almost entirely on software, there is at least one HW factor: Toyota claimed the 2005 Camry's main CPU had error detecting and correcting (EDAC) RAM. It didn't. EDAC, or at least parity RAM, is relatively easy and low-cost insurance for safety-critical systems.

I can't fathom why that would ever be the case. The amount of RAM required is relatively small, and the extra cost is inconsequential to the total cost of a car. Oh, and the hardware runs right next to a car engine.
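Parity is the cheapest form of that insurance: one extra bit per word detects any single-bit flip. A minimal sketch of the idea (in JavaScript, purely for illustration):

```javascript
// Even-parity sketch: one extra bit per byte detects any single-bit flip.
function parity(byte) {
    var p = 0;
    for (var i = 0; i < 8; i++) p ^= (byte >> i) & 1;
    return p;
}
var stored = 0x5a;                       // 01011010 -> four ones, parity 0
var storedParity = parity(stored);
var flipped = stored ^ (1 << 3);         // a single bit flips in "RAM"
console.log(parity(flipped) !== storedParity); // true -> the flip is detected
```

Parity only detects; EDAC schemes like SECDED Hamming codes can also correct single-bit errors.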

"We've demonstrated how as little as a single bit flip can cause the driver to lose control of the engine speed in real cars due to software malfunction that is not reliably detected by any fail-safe," Michael Barr, CTO and co-founder of Barr Group, told us in an exclusive interview. Barr served as an expert witness in this case.

Drive-by-wire systems aren't the only critical control systems susceptible to bit-errors. There is some speculation that a bit-error caused a sudden altitude drop in a Qantas A330. Amazingly, airplane software systems did not have to consider single or multiple bit errors until 2010 (see page 222) to achieve certification.

Monday, October 21, 2013

Git and Bit Errors

Finally, a topic to unite my two most popular blog posts: git failures and bitsquatting.

A friend recently pointed me to an amazingly detailed investigation of a corrupted git repository. The cause of the corruption? A single bit flip. To quote the source:

As for the corruption itself, I was lucky that it was indeed a single
byte. In fact, it turned out to be a single bit. The byte 0xc7 was
corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
cosmic ray.
And the aborted attempt to look at the inflated output to see what was
wrong? I could have looked forever and never found it. Here's the diff
between what the corrupted data inflates to, versus the real data:
  -       cp = strtok (arg, "+");
  +       cp = strtok (arg, ".");


It is quite amazing to see evidence of a bit error resulting in a perfectly innocuous, syntactically valid and yet completely erroneous change in a real program and a real codebase.
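As a quick sanity check on the quote above, the two byte values really do differ in exactly one bit:

```javascript
// The corrupted byte from the quote differs from the original in exactly one bit.
var original = 0xc7, corrupted = 0xc5;
var diff = original ^ corrupted;
console.log(diff.toString(2));            // "10": bit 1 was flipped
console.log((diff & (diff - 1)) === 0);   // true: XOR is a power of two => single bit
```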

How many times does this happen without anyone noticing?


Friday, September 13, 2013

Bitsquatting at DEFCON21 and More

I was very excited to see that several researchers are investigating bitsquatting and writing about it. There were two presentations about bitsquatting at DEFCON 21, a presentation at ICANN 47, and a research paper presented at WWW2013.

Jaeson Schultz - DEFCON 21 - Examining the Bitsquatting Attack Surface
Jaeson presented some excellent ways to exploit bitsquatting that I did not think of -- such as using bitsquats in URL delimiters to target otherwise unexploitable domains. As an example taken from the paper, ecampus.phoenix.edu can become ecampus.ph/enix.edu/.
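The delimiter trick works because 'o' (0x6f) and '/' (0x2f) differ by a single bit in ASCII, so one flip turns part of a hostname into a path separator:

```javascript
// 'o' and '/' differ only in bit 6, so a single flip in
// "ecampus.phoenix.edu" yields "ecampus.ph/enix.edu".
var o = 'o'.charCodeAt(0);       // 0x6f
var slash = '/'.charCodeAt(0);   // 0x2f
console.log((o ^ slash).toString(16));           // "40": exactly one bit apart
console.log('ecampus.phoenix.edu'.replace('o', '/')); // "ecampus.ph/enix.edu"
```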

Additionally Jaeson presents a great mitigation that can be implemented at the local level -- Response Policy Zones. From the paper:
An RPZ is a local zone file which allows the DNS resolver to respond to specific DNS requests by saying that the domain name does not exist (NXDOMAIN), or redirecting the user
to a walled garden, or other possibilities. To mitigate the effects of single bit errors for users of a DNS resolver the resolver administrator can create a Response Policy Zone that protects against bitsquats of frequently resolved, or internal-only domain names.  
 

Robert Stucke - DEFCON 21 - DNS Has Been Found To Be Hazardous To Your Health
Robert demonstrated some new vectors for bitsquatting, such as web applications and hosted email providers. Specifically, he bitsquatted gstatic.com (a site that serves static content for Google properties). Not only was he able to return arbitrary content to people using Google's search services, he could also affect web applications, such as feed readers, that rely on correct resolution of gstatic.com. Robert also bitsquatted psmtp.com, a hosted email provider. This allowed him to potentially receive other people's email.

Nigel Roberts - ICANN 47 - Bitsquatting
Nigel (who runs .gg) presented about bitsquatting to ICANN. Hopefully this will result in more research at the ccTLD level. 

Nick Nikiforakis, et al. - WWW2013 - Bitsquatting: Exploiting Bit-flips for Fun, or Profit?
Nick and his coauthors did a measurement study of the prevalence of bitsquatting and what content appears on bitsquatted domains. They identified several that are used for advertising, affiliate programs, and malware distribution. There is also a great graph in the paper where you can see a huge spike in bitsquat domain registrations after my Blackhat presentation :).

Thursday, August 1, 2013

Introducing Binfuzz.js

Tomorrow morning I will be giving a demonstration of Binfuzz.js at Blackhat Arsenal 2013. Please stop by the Arsenal area from 10:00 - 12:30. The slides are already available on the Blackhat website.

The Binfuzz.js page on dinaburg.org is now live, and all the code is uploaded to Github.

What is Binfuzz.js?


Binfuzz.js is a library for fuzzing structured binary data in JavaScript. Structured binary data is data that can be easily represented by one or more C structures. Binfuzz.js uses the definition of a structure to create instances of the structure with invalid or edge-case values. Supported structure features include nested structures, counted arrays, file offset fields, and length fields. The live example uses Binfuzz.js to generate Windows ICO files (a surprisingly complex format) to stress your browser's icon parsing and display code.

Features


Binfuzz.js includes support for:

  • Several predefined elementary types: Int8, Int16, Int32 and Blob.
  • Nested structures
  • Arrays
  • Counter Fields (e.g. field A = number of elements in Array B)
  • Length Fields (e.g. field A = length of Blob B)
  • File Offsets (e.g. field A = how far from the start of the file is Blob B?)
  • Custom population functions (e.g. field A = fieldB.length + fieldC.length)

The ICO fuzzing example includes uses of all of these because I needed them to implement ICO file generation.
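To sketch the idea behind computed fields (a hypothetical illustration with made-up names, not the actual Binfuzz.js API -- see the Github repo for that), a length field can be modeled as a field whose value is derived from another field at generation time:

```javascript
// Hypothetical sketch of a computed length field; names are invented for illustration.
function LengthOf(target) {
    // evaluated at generation time, after the target blob's contents are known
    this.populate = function () { return target.bytes.length; };
}
var blob = { bytes: [0xde, 0xad, 0xbe, 0xef] };
var len = new LengthOf(blob);
console.log(len.populate()); // 4
```

Counter fields and file offsets work the same way: the field's value is a function of the rest of the structure, computed once everything else is laid out.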

Combinatorics


Binfuzz.js calculates the total number of combinations based on how many possible combinations there are for each individual field. It is then possible to generate a structured data instance corresponding to a specific combination number. It is not necessary to generate prior combinations. This way random combinations can be selected when fuzzing run time is limited.
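Conceptually this is mixed-radix decoding: if each field has a fixed number of candidate values, any combination number maps directly to one candidate per field. A sketch of the concept (not the library's actual code):

```javascript
// Sketch: map a combination number directly to one candidate value per field,
// without generating any prior combinations (mixed-radix decoding).
function nthCombination(fields, k) {
    var picks = [];
    for (var i = 0; i < fields.length; i++) {
        var n = fields[i].length;        // number of candidates for field i
        picks.push(fields[i][k % n]);
        k = Math.floor(k / n);           // move on to the next "digit"
    }
    return picks;
}
var fields = [[0, 1], [0x00, 0xff, 0x7f]];  // 2 * 3 = 6 total combinations
console.log(nthCombination(fields, 4));     // [0, 127]
```

Because combination k is computed directly, a time-limited fuzzing run can sample random k values from the full space.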

Why?


The best way to learn is by doing, and I wanted to learn JavaScript. So I decided to create an ICO file fuzzer in JavaScript. I chose ICO files because of favicon.ico, a file browsers automatically request when navigating to a new page. After starting the project, I realized I got a lot more than I bargained for. Icons are a surprisingly complex format that has evolved over time. There are several images in one file, each image has corresponding metadata, there are internal fields that refer to offsets in the file, and the size of the icon data for each image depends on the metadata. All of these interlinked relationships need to be described and processed by Binfuzz.js.

Monday, July 29, 2013

A Travel Story

I travel frequently, and not just to the usual tourist destinations. I've gone to places like Singapore, Japan, Ukraine, and India. This is a story of trying desperately to get home from a recent trip.

Be Aware of Your Surroundings


The first sign of trouble was when the roof started leaking. The storm outside had only been raging for half an hour before the two droplets landed on my head. The buckets and signs on the floor captured most of the leaks, but there were numerous unmarked drips. You don't want this kind of water in your hair or food. Always be aware of your surroundings.

I arrived at my gate to find a delayed flight. At least there was now time to eat, since all the good food options were in the international terminal. Casually I trekked over, scarfed down some ethnic food and began meandering back to the gate.

An Ominous Sign


The blaring alarm noises and the flashing emergency lights of the fire alarm told me something was wrong. Neither the airport staff, nor security, nor the airline staff knew what was happening. Just that there wasn't a fire. Maybe. There was no smoke, no firemen, and everyone was calm. Carefully and slowly, I continued towards the gate. Then the power went out.

It was still daytime, and the sunlight and fire alarm (probably on backup power) provided enough light to get by. The gate was in complete chaos. None of the computers worked, and the staff tried their best to assuage angry passengers. Some were just grumpy, others in tears, but all wanted some answers. There were no answers, no air conditioning, and it was getting hot.

The darkened terminal was like an impoverished refugee camp. Uniformed staff handing out bottles of water to angry men, crying women and screaming children. Mobs of people begged staff for answers. It was dark, hot, loud, and no one knew what was going on. This went on for two hours.

Then the plane arrived, but we couldn't board. The jetbridges were electric and wouldn't extend. Other parts of the airport had power, but the airline couldn't or wouldn't use the working jetbridges. The flight was cancelled, but we weren't rebooked to a new flight because there was no power. We had to call the central reservation office. This was never announced, of course. I happened to overhear another passenger talk with the gate agent.

So I called. After 30 minutes, a man with an accent answered. He sounded legitimately concerned, but all the flights for the day were sold out. I asked if he could rebook me on a competing airline; he typed something on a keyboard and then told me that all flights on all airlines to my destination were sold out. No inventory; the earliest possibility was the next night. Begrudgingly, I agreed. I asked about hotel vouchers. "Of course, the gate agents can print them for you", he replied. The power was still out.

Perseverance


I wasn't about to spend another night in this place. There was still one flight on a competing airline, and it was leaving soon. Sure, I was told there were no seats, but one can't blindly trust a company to do something that lowers profit. And besides, maybe someone wouldn't show up. The competing airline was naturally in the furthest possible terminal from where I was. It was a long walk, but they probably had power.

Turns out there were seats on the flight. There would be a cost, but I would get home, today. Life was also better in this part of the airport. There were no leaks in the roof and the air conditioning was on. Nobody was crying. I sat down near the new gate.

There was another flight at that gate, delayed indefinitely. Must be weather, I figured. That assumption was shattered when I heard two airline employees talking next to the gate entrance. Turns out the plane needed fuel. They sent for a fuel truck, but it arrived without fuel. For the past half hour they had been trying to find either fuel, a new fuel truck, or whoever got them into this boondoggle. No one was answering. After 15 more minutes, they found a new truck. Two hours later my plane arrived.

Don't Count Your Chickens...


As we boarded, I realized there were plenty of seats. The flight was only half full. So much for "no seats available." As we taxied to the runway, I was thankful to finally be out of this wretched place.

My enthusiasm was premature. Literally as we were next in line to depart, our plane was directed to stop. After an hour on the tarmac, the full story unfolded. The original flight path had unexpected weather. We had an approved alternate flight path, but in the time it took us to taxi (about 30 min in the rain at this airport), the alternate flight path also had unexpected weather. It was 30 more minutes before we moved again.

Departure At Last


This time, it was for real. Our wheels lifted off the ground and we ascended into the stormy sky. The plane shook violently as we passed through the rain, wind, and lightning. I clutched my seat and thought about this abhorrent place: the fire alarm, the crying passengers, the hot, dark, sweaty terminal, the helpless staff, the leaky roof and the fuel-less fuel trucks. I just wanted to go home and to never again fly into Philadelphia International Airport.

Sunday, July 21, 2013

Readability Improvements

In preparation for new content relating to my Blackhat 2013 Arsenal presentation, I made some readability improvements to the blog and to dinaburg.org in general.

The layout is wider and the font size for text is now 16 pixels.

Please let me know what you think.

Friday, July 12, 2013

Git Fails On Large Files

Turns out git fails spectacularly when working with large files. I was surprised, but the behavior is pretty well documented. In typical git fashion, there is an obscure error message and an equally obscure command to fix it.

The Problem


A real-life example (with repository names changed):

artem@MBP:~/git$ git clone git@gitlab:has_a_large_file.git
Cloning into 'has_a_large_file'...
Identity added: /Users/artem/.ssh/devkey (/Users/artem/.ssh/devkey)
remote: Counting objects: 6, done.
error: git upload-pack: git-pack-objects died with error.
fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
remote: Compressing objects: 100% (5/5), done.
remote: fatal: Out of memory, malloc failed (tried to allocate 1857915877 bytes)
remote: aborting due to possible repository corruption on the remote side.
fatal: early EOF
fatal: index-pack failed

I pushed the large file without issues, but couldn't pull it again because the remote was dying. The astute reader will notice the remote was running gitlab. The push also broke the gitlab web interface for the repository.

From my Googling, the problem is that the remote side is running out of memory when compressing a large file (read more about git packfiles here). Judging by the error, git attempts to malloc(size_of_large_file) and the malloc fails.

This situation raises conundrums that may only be answered by Master Git:
  • Why was I able to push a large file, but not pull it?
  • Why would one malloc(size_of_large_file) ?
  • What happens when you push a >4GB file to a 32-bit remote?

I was curious enough about the last one to look at the code: it will likely die gracefully (see line 49 of wrapper.c). Integer overflow is likely avoided, but I'd need to read the code much more carefully to be sure.

The Solution


In theory, the solution is to re-pack the remote with a smaller pack size limit. That requires ssh access to the remote repository, which I don't have. So the following fix is untested, and taken from http://www.kevinblake.co.uk/development/git-repack/. The obscure command in question (must be run on the remote):

git repack -a -f -d

Of course, repacking the remote but having non-repacked local repositories around may cause other problems.

Just For Fun


Here is another large file fail:

artem@MBP:~/temp/largerandomfile$ dd if=/dev/urandom of=./random_big_file bs=4096 count=1048577
1048577+0 records in
1048577+0 records out
4294971392 bytes transferred in 437.836959 secs (9809522 bytes/sec)

artem@MBP:~/temp/largerandomfile$ git add random_big_file
artem@MBP:~/temp/largerandomfile$ git commit -m "Added a big random file"
[master 377db57] Added a big random file
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 random_big_file

artem@MBP:~/temp/largerandomfile$ git push origin master
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (2/2), done.
error: RPC failed; result=22, HTTP code = 413 KiB/s
fatal: The remote end hung up unexpectedly
Writing objects: 100% (3/3), 4.00 GiB | 18.74 MiB/s, done.
Total 3 (delta 0), reused 1 (delta 0)
fatal: recursion detected in die handler
Everything up-to-date

Everything up-to-date, indeed.

Sunday, June 16, 2013

OptimizeVM: Fast Windows Guests for VMware

Do you make Windows VMs? Are they slow? OptimizeVM will make them fast(er).

I got tired of re-googling and re-typing the same commands over and over to make my Windows VMs fast, so I collected them on Github: https://github.com/artemdinaburg/OptimizeVM.

OptimizeVM is based on the commands provided in the official VMware View Optimization Guide.

The goal of OptimizeVM is to minimize disk access and remove fancy graphical effects. Certain Windows features, such as Windows Search, System Restore, Windows Updates, and Registry Backup will cause constant background disk access. The disk access makes your VM slow and increases virtual disk size. These features are also unnecessary for VMs that get reverted to a clean snapshot every couple of days.

The script also removes some annoyances like the Action Center, Network Location Wizard, hidden file extensions, and so on.

The disabling of some features, such as Windows Firewall, Windows Defender, and Windows Update do lower the security of your system. If you are very worried, turn them back on. I leave them off since in my workflow VMs get reverted to a clean state every few days.

If you want to speed up your Windows VMs, here are a few more useful links:



Tuesday, April 30, 2013

JavaScript Frustrations and Solutions

Since there's no better way to learn than by doing, I've been teaching myself JavaScript by writing a structured binary data fuzzer. The fuzzer currently generates Windows ICO files, and will soon be released. In the meantime, I wanted to describe some frustrating experiences learning JavaScript and include solutions to them.

Object Orientation in JS is Confusing 


Some of this may be because I am used to class-ical inheritance, but considering the number of JavaScript OOP libraries (e.g. oolib, dejavu, Klass, selfish), I'm not alone.

The first confusing thing is that objects are functions declared via the function keyword and instantiated via the new operator. The overloaded use of function doesn't let you know right away whether the code you are reading is an object or a traditional function. The use of the new operator gives a false impression of class-ical inheritance and has other deficiencies. For instance, until the introduction of Object.create it was impossible to validate arguments to an object's constructor. The deficiency is shown in the following example.

In this motivating example, we want to create an object to encapsulate integers and validate certain properties in the object's constructor. The initial code could look something like this:

function Int(arg) {
    console.log("Int constructor");
    this.name = arg['name'];
    if(this.name === undefined)
    {
        alert('a name is required!');
    }
    this.size = arg['size'];
};
Int.prototype.getName = function() {
    console.log("Int: " + this.name);
};
var i = new Int({'name': 'generic int'});
i.getName();

Running this code would print:

Int constructor
Int: generic int

But now let's say I want to write something to deal specifically with 4-byte integers. The initial code to inherit from the Int object would look similar to the following:

function Int4(arg) {
    arg['size'] = 4;
    Int.call(this, arg);
    console.log("Int4 constructor");
};
Int4.prototype = new Int({});
Int4.prototype.constructor = Int4;
Int4.prototype.getName = function() {
    console.log("Int4: " + this.name);
};
var i4 = new Int4({'name': '4-byte int'});
i4.getName();

This code will alert with 'a name is required!' To set Int4's prototype chain we need to create a new Int object. Arguments to the constructor cannot be validated since they are not known when new Int({}) is called. Luckily this has been fixed by the use of Object.create:

function Int4(arg) {
    arg['size'] = 4;
    Int.call(this, arg);
    console.log("Int4 constructor");
};
Int4.prototype = Object.create(Int.prototype);
Int4.prototype.constructor = Int4;
Int4.prototype.getName = function() {
    console.log("Int4: " + this.name);
};
var i4 = new Int4({'name': '4-byte int'});
i4.getName();

All Functions are Function Objects and all Objects are Associative Arrays.


All functions are actually Function objects, and all objects are associative arrays. There are also Arrays, which are not functions but are associative and are also objects. Sometimes you want Arrays to be arrays, and sometimes you actually want Objects to be arrays. Confused yet?

Scoping Rules and Variable Definition Rules that Lead to Subtle Bugs


Scoping rules are a bit confusing, since there are at least three ways to declare variables: assignment, var, and let. Of course, all of these have different semantics. The biggest problem for me was that creating a variable by assignment adds it to the global scope, but using var will keep it in function scope. And when using identically named variables, a missing var in one function will make that function use the global variable instead of the local one. Using the wrong variable will lead to lots of frustrating errors.

The solution is to always "use strict" to force variable definitions. Of course, doing this globally will break some existing libraries you are using. Such is life.

Type Coercion With the Equality Operator (==)


It's amazing what is considered equal in JavaScript via ==. Instead of restating all these absurdities, I'll just link to someone else who has:
http://javascriptweblog.wordpress.com/2011/02/07/truth-equality-and-javascript/

When I started my project, I didn't realize that Strict Equality (===) existed. It should be used anywhere you would expect == to work. It seems more sane to have == be Strict Equality, and another Coercive Equality operator (something like ~= or ~~), but what is done is done.
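A small sampling of the == surprises, all of which === avoids:

```javascript
// A few of the == coercion surprises; === behaves predictably.
console.log(0 == '');           // true
console.log(0 == '0');          // true
console.log('' == '0');         // false (so == isn't even transitive)
console.log(false == '0');      // true
console.log(null == undefined); // true
console.log(0 === '');          // false
console.log('' === '0');        // false
```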

Problems Modularizing and Importing Code


C/C++ has #include, Python has import, JavaScript has... terrible hacks. There is sadly no standard way to import new code in a .js file, making modularization of your code difficult. I resorted to simply including prerequisite scripts in the HTML where they will be used, but I wish there was a way to include JavaScript from JavaScript.

Browser Compatibility Issues


Not all browsers have Object.create. Not all browsers have console.log in all situations. Not all browsers support "use strict". Turns out every browser is slightly different in a way that will subtly break your code, but of course the main culprit is usually IE.
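For Object.create, a common workaround is a partial shim; this sketch handles only the single-argument case and ignores the property-descriptor argument:

```javascript
// Partial single-argument fallback for environments missing Object.create.
if (typeof Object.create !== 'function') {
    Object.create = function (proto) {
        function F() {}                   // throwaway constructor
        F.prototype = proto;              // chain to the desired prototype
        return new F();                   // empty object inheriting from proto
    };
}
var base = { getName: function () { return 'base'; } };
var child = Object.create(base);
console.log(child.getName());                       // "base"
console.log(Object.getPrototypeOf(child) === base); // true
```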

Wednesday, March 20, 2013

Solution to Printing Blank Pages Problem in Linux

This isn't an overly technical post but I hope it saves someone hours of frustration printing on Linux. 

In my case the problem was a combination of broken generic printer drivers and a bad default value for the "Print Quality" setting. As a word of caution, according to the Anna Karenina Principle, odds are your problem is its own unique snowflake and this won't help you print.

Problem 

  • You are trying to print from Linux. 
  • The printer starts, makes printing noises, but only a blank page (i.e. one with no ink on it) comes out.
  • You verified your printer works by printing from another OS. If you have not, do this. If your printer still prints blanks on Windows/MacOS, you have a printer problem, not a Linux problem.

Solution

The solution is two part; both parts were needed to actually see ink on paper.
  1. Install printer-specific software.

    The drivers that came with CUPS and claimed to support my printer didn't work. For HP printers, you need to sudo apt-get install hplip, and run hp-setup. If you have another brand printer, look here for help.

  2. Change the "Print Quality" setting to something else.

    The setting is in the CUPS web interface. Go to http://localhost:631 (you may need to log in with a local account) -> Administration -> Manage Printers -> Your Printer's Name -> Administration Selection Box, pick "Set Default Options". On that page, change the Print Quality setting to something else. For me Normal Grayscale worked, Normal Color did not.

Try all the Print Quality options. Hopefully one of them prints. Yes, the setting is hard to find and obscure, but hey, at least you didn't have to edit another config file!

My next post may be about trying to get network printer sharing to work between Linux and Mac OS X Mountain Lion, which was its own struggle.

Monday, February 18, 2013

Your Missing Package: When Address Correction Fails

Amazon address correction is wrong for large parts of Chicago. This leads to late and missing packages. This handy map shows areas most affected by address correction failure. To avoid delivery problems always use your full ZIP+4 when placing online orders. You can find the full ZIP+4 for your address via the USPS ZIP Code (TM) Lookup Tool.

I don't mean to pick on Amazon -- this problem has happened with several other retailers. I used Amazon because it was easy to cross-check their address verification with USPS. If you are an online retailer, make sure you have a working address correction system. If Amazon can get it wrong, what makes you think yours works? Bad address correction is costing you customers.

The Problem

Have your Amazon packages ever been late or missing?
Have you ever gotten a "notice left" email but no notice?
Did USPS confirm delivery but there was no package?
Do you only use a 5 digit ZIP code when filling out your address?

You may be a victim of address correction failure. And you are not alone.

Here is how to check:

First, go to "manage addresses" and look at your address on Amazon.
Now, go to the USPS ZIP Code (TM) Lookup Tool and check your address.

If the full 9 digit ZIP Codes do not match, there is a problem. If you live in Chicago, I made a heat map of where verification failures are most likely to occur.

Address Verification Failures

Mailers validate your address prior to shipment to save money on shipping costs. The address validation step is called Delivery Point Validation (DPV), and it requires a complete mailing address including a full 9 digit ZIP Code. Since few people know their full ZIP Codes, a suite of software called Coding Accuracy Support System (CASS) will correct an address into one that can be checked via DPV. The correction step can fail, and "correct" your address to a different building. To find out why, it's time for a quick lesson on DPV, CASS, and ZIP Codes.

Note: I am not an expert on mailing, this information is what I have learned from judicious searching. It may be wrong. If I am, please correct me.

DPV and CASS

Mailers use DPV to ensure an address is deliverable before passing the mail to USPS. In return, they receive discounted postage rates for reducing the work USPS has to do. From The History of Worksharing Discounts and CASS Certified™ Software:

In 1983, the United States Postal Service (USPS) implemented a program that provided mailers a postage discount for sharing the work to prepare the mail for processing. This allowed the USPS to provide more cost-efficient mail processing based on the advance work performed by the mailer in providing high-quality addresses for their mail.

People are notoriously bad typists and spellers, and tend to omit information. Before a delivery point is verified, an address has to go through a Coding Accuracy Support System (CASS) check. The CASS software will fix an address to one that can be validated by DPV. From the Wikipedia page:

The input of:
1 MICROWSOFT
REDMUND WA
Produces the output of:
1 MICROSOFT WAY
REDMOND WA 98052-8300

CASS software has to be certified by the USPS and has to undergo certification testing every two years. The caveat is that CASS validation only checks address matching, not the accuracy of the matched address. From the USPS:

However, CASS processing does not measure the accuracy of ZIP + 4, delivery point, 5-digit ZIP, or carrier route codes in a mailer’s address file.

If the mailer's ZIP+4 database is wrong, CASS can't fix it.
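To make the failure mode concrete, here is a toy sketch of the CASS-then-DPV pipeline described above. All names and data are hypothetical illustrations of mine; real CASS/DPV engines match against USPS-licensed address databases with far more sophisticated logic.

```python
# A tiny, made-up ZIP+4 database: standardized address -> ZIP+4.
ZIP4_DB = {
    ("1 MICROSOFT WAY", "REDMOND", "WA"): "98052-8300",
}

# The set of ZIP+4 delivery points DPV knows about.
DELIVERY_POINTS = set(ZIP4_DB.values())

def cass_standardize(street, city, state):
    """Match a messy input address to the closest database entry.

    A real CASS engine does sophisticated fuzzy matching; comparing
    the street number and state is enough to illustrate the idea.
    """
    for (db_street, db_city, db_state), zip4 in ZIP4_DB.items():
        if street.split()[0] == db_street.split()[0] and state == db_state:
            return db_street, db_city, db_state, zip4
    return None

def dpv_validate(zip4):
    """DPV only asks: is this ZIP+4 a known delivery point?

    It does NOT ask whether the ZIP+4 belongs to the building the
    sender meant -- which is exactly the failure described above.
    """
    return zip4 in DELIVERY_POINTS

# A misspelled input is "corrected" to the database entry...
result = cass_standardize("1 MICROWSOFT", "REDMUND", "WA")
print(result)  # ('1 MICROSOFT WAY', 'REDMOND', 'WA', '98052-8300')
# ...and DPV happily validates whatever ZIP+4 the database holds.
print(dpv_validate(result[3]))  # True
```

The point of the sketch: DPV passes as long as the ZIP+4 exists, so a wrong entry in the mailer's database sails through both steps.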

Why do ZIP+4 Codes matter?

In a city, a ZIP+4 will determine the building or even the floor or group of apartments a piece of mail goes to. From the USPS website (emphasis mine):

The ZIP+4 Code was introduced in 1983. The extra four numbers allow mail to be sorted to a specific group of streets or to a high-rise building. In 1991, two more numbers were added so that mail could be sorted directly to a residence or business. Today, the use of ZIP Codes extends far beyond the mailing industry, and they are a fundamental component in the nation’s 911 emergency system.

If the ZIP+4 Code is wrong, your mail goes to the wrong building, and your mailman might not catch it. Mail with electronic mailing information (i.e., pretty much all packages from online retailers) is automatically sorted and binned by machines. On busy urban routes the mailman doesn't know everyone, and he isn't going to check every single piece of mail. He's going to take the machine-sorted mail bin, deposit it at the address he always does, and move on. If you're lucky, you may get a redelivery notice.

... but Amazon ships via UPS/FedEx?

UPS and FedEx may hand off packages to USPS for final delivery. This is part of the USPS work-share programs, which UPS calls a mailing innovation.


The Address Verification Failure Map

The following map shows differences between ZIP+4 Codes returned by USPS and ZIP+4 Codes corrected by Amazon for 1,857 addresses in the City of Chicago. Green markers indicate a match, blue markers represent ZIP+4 Codes from USPS, and yellow markers represent ZIP+4 Codes from Amazon. A red connecting line associates the USPS and Amazon results for the same address.


There are correction mistakes throughout the City, with the most in the Loop and the area immediately to the north and northwest. This correlates well with the number of large apartment and condo buildings, and hence with the specificity of ZIP+4 Codes.

I chose Chicago addresses because that's where I live. The addresses were a random sample from the City of Chicago business license holders. The City of Chicago has an excellent open data site at https://data.cityofchicago.org/. This research would not have been possible without it.

I sampled 2,000 addresses out of a possible 381,677. Of these, 143 (~7%) addresses were not found -- that is, either USPS or Amazon failed to return a ZIP+4 for the address. There were 519 (~26%) addresses with a different ZIP+4 between USPS and Amazon, and 1,338 (~67%) addresses with the same ZIP+4.
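For the record, the percentages come straight from the raw counts; a few lines of Python reproduce the breakdown:

```python
# Raw counts from the 2,000-address Chicago sample.
total = 2000
counts = {
    "not found": 143,        # no ZIP+4 from USPS or Amazon
    "different ZIP+4": 519,  # USPS and Amazon disagree
    "same ZIP+4": 1338,      # USPS and Amazon agree
}

# Sanity check: the three categories partition the sample.
assert sum(counts.values()) == total

for label, n in counts.items():
    print(f"{label}: {n} (~{round(100 * n / total)}%)")
# not found: 143 (~7%)
# different ZIP+4: 519 (~26%)
# same ZIP+4: 1338 (~67%)
```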

I am making available the addresses used to generate this map.

File            Metadata      Description
zip_diffs.txt   41KB, text    ZIP+4 Differences
zip_equals.txt  100KB, text   ZIP+4 Matches
zip_fails.txt   11KB, text    Failure to get ZIP+4 for an address

My verification scripts would select the first suggested address, or the automatically corrected address if no address was suggested, as given by Amazon. For some streets, the suggested address was very far from the initial input. No human would have selected it, so the most egregious correction errors would likely have been caught. The places where the yellow and blue markers are close together are the most dangerous -- it is likely only a +4 digit difference, which most users (like myself) would never notice.
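A sketch of the comparison behind the marker colors might look like this (the function name and categories are my illustration, not the exact code I ran):

```python
def classify(usps_zip4, amazon_zip4):
    """Classify one address's pair of ZIP+4 results for the map.

    "plus4 mismatch" is the dangerous case: only the last four
    digits differ, and most users would never notice.
    """
    if usps_zip4 is None or amazon_zip4 is None:
        return "not found"
    if usps_zip4 == amazon_zip4:
        return "match"               # green marker
    usps5 = usps_zip4.split("-")[0]
    amazon5 = amazon_zip4.split("-")[0]
    if usps5 == amazon5:
        return "plus4 mismatch"      # blue + yellow, close together
    return "zip5 mismatch"           # blue + yellow, far apart

print(classify("60601-1234", "60601-1234"))  # match
print(classify("60601-1234", "60601-4321"))  # plus4 mismatch
```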

To map ZIP+4 addresses to latitude/longitude and to create the map, I used the MapQuest API. MapQuest may seem like an odd choice, but it had great documentation and examples, and it was the first service I could find with support for mapping a ZIP+4 to latitude/longitude.
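Constructing a request is straightforward. The endpoint and parameter names below follow MapQuest's public geocoding API documentation at the time of writing; "YOUR_KEY" is a placeholder for a real API key.

```python
from urllib.parse import urlencode

def geocode_url(zip4, key="YOUR_KEY"):
    """Build a MapQuest geocoding request URL for a ZIP+4."""
    base = "http://www.mapquestapi.com/geocoding/v1/address"
    return base + "?" + urlencode({"key": key, "location": zip4})

print(geocode_url("60601-1234"))
# http://www.mapquestapi.com/geocoding/v1/address?key=YOUR_KEY&location=60601-1234
```

The JSON response includes a latLng object with the coordinates to plot.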

Backstory

I recently moved to Chicago with only what I could fit in my car, which meant I had to buy a lot of household items. I do most of my shopping online since I hate the crowds, salesmen, and poor selection at brick and mortar stores. This means I buy a lot of stuff on Amazon.

I first became suspicious when I received the following email:

Fool me once, shame on you.
It is impossible to leave an unattended package at my address. The building has 24/7 front desk staff and a dedicated package receiving room. I dutifully filled out the re-delivery form and received my package a few days later. I thought nothing of it until I received this second email:

Fool me twice, shame on me.
Around the same time, my fiancée had several packages (not from Amazon, but other vendors) never arrive, despite USPS confirming delivery. Something was wrong; it was time to investigate.

The addresses on the re-delivered package labels, order confirmation, and amazon.com all seemed correct. The front desk staff hadn't noticed any delivery attempts, and no packages had been left for me.

I was stumped and considered just not shopping online, until I had a thought: USPS re-delivery worked, but the original delivery went to a mystery address. Was there a difference between the USPS address and the Amazon.com address?

Sure enough, there was. The ZIP+4 Code had the wrong +4 digits. Searching online for the ZIP+4 Code from USPS returns only matches for my building's address. Searching for the ZIP+4 Code from Amazon returns only matches for buildings a few numbers down, with no front desk staff.

Mystery solved.

I immediately emailed Amazon about the problem. This was in mid-January. As of February 18th, my address is still corrected to the wrong ZIP+4 Code.

A Bigger Problem

Was my address an isolated case, or was there a more systematic address correction problem?

That is why I made the map. It turns out some areas are more affected than others, and my address is not the only one. I hope that by exposing this publicly I can help others avoid the hassle and headaches of online ordering.

Conclusion

Major vendors, including Amazon, get address correction wrong. In my sample of Chicago business addresses, 26% had a ZIP+4 that did not match the one returned by USPS.

If you are an online retailer, please check your CASS and DPV software. Don't just assume it works: write some scripts to test it yourself. Your customers will thank you. If your customers complain about missing packages, check that their addresses correct properly.

If you buy things online, memorize your ZIP+4 Code and use the full code where you can. If you live in an urban area and a vendor only accepts a 5-digit ZIP, shop somewhere else, because you may never get what you bought.

Saturday, January 5, 2013

The Internet Sign


The Internet. It enhances communication, enables global commerce, and has become an indispensable part of people's daily lives. The Internet disseminates information around the globe and helps bypass censorship in repressive regimes. It is a great force for good and, some have said, has resulted in the largest legal creation of wealth on the planet.

What commemorates the creation of the Internet? There is a plaque at Stanford University. And near a "No Parking" sign outside the former ARPA building in Arlington County, Virginia there is a sign.

The Internet Sign.

I refer to the sign as the Internet Sign to make its significance more obvious; more technically, it is the ARPANET Sign.

The sign is near the corner of Oak St. and Wilson Boulevard in Arlington, Virginia. It is not (yet) visible on Google Street View. The location of the sign is the old ARPA building. ARPA moved to the Wilson Boulevard location from the Pentagon, then as DARPA it moved to 3701 N. Fairfax Dr. DARPA recently moved again, still within Arlington County, to 675 N. Randolph Street.

The following text appears on the sign:

ARPANET
THE ARPANET, A PROJECT OF THE
ADVANCED RESEARCH PROJECTS AGENCY
OF THE DEPARTMENT OF DEFENSE,
DEVELOPED THE TECHNOLOGY THAT
BECAME THE FOUNDATION FOR THE
INTERNET AT THIS SITE FROM 1970 TO
1975. ORIGINALLY INTENDED TO SUPPORT
MILITARY NEEDS, ARPANET TECHNOLOGY
WAS SOON APPLIED TO CIVILIAN USES,
ALLOWING INFORMATION TO BE RAPIDLY
AND WIDELY AVAILABLE. THE INTERNET,
AND SERVICES SUCH AS E-MAIL,
E-COMMERCE AND THE WORLDWIDEWEB,
CONTINUES TO GROW AS THE UNDER-
LYING TECHNOLOGIES EVOLVE. THE
INNOVATIONS INSPIRED BY THE
ARPANET HAVE PROVIDED GREAT
BENEFITS FOR SOCIETY.
ERECTED IN 2008 BY ARLINGTON COUNTY, VIRGINIA 

Below the main text is a smaller plaque with binary digits:

The binary (01000001 01010010 01010000 01000001 01001110 01000101 01010100) spells ARPANET in ASCII.
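For the curious, the decoding is a one-liner: each 8-bit group is an ASCII character code.

```python
# The bit groups from the plaque, decoded as 8-bit ASCII.
bits = "01000001 01010010 01010000 01000001 01001110 01000101 01010100"
word = "".join(chr(int(b, 2)) for b in bits.split())
print(word)  # ARPANET
```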

The Internet Sign wasn't actually erected in 2008; the unveiling ceremony happened in 2011. ARLnow has reasons for the delay:
According to Arlington spokeswoman Diana Sun, the county was unable to get permission from the building owner to put the sign on their property, so they had to go through a lengthy process of getting the sign installed in the public right-of-way (sidewalk). By the time all the pieces were in place, and by the time they could organize a small ceremony at a County Board meeting, it was 2011 — three years later than originally planned.

Which building owner didn't want the sign on their property? A glance at Google Maps shows the adjacent land is used by the US State Department. Why would the State Department refuse to commemorate a tool that has allowed uncensored information to reach the oppressed masses? I imagine there were security concerns about tourists congregating so close to a government building.

While I am sure the State Department's reasons for not hosting the Internet Sign are sound, the result is a rather sad commemoration. Surely there is a more tactful way to acknowledge the creation of the Internet than by a sign on the sidewalk.