The Cloud Saves Big Data For NARA

on April 04, 2012 at 5:24 PM


A cloud environment came to the rescue of the National Archives and Records Administration this week, expanding capacity and speeding up access as part of a contingency plan that any federal agency, from the IRS to the Department of Agriculture, could use when it anticipates heavy demand for large amounts of data.

The incident, triggered by the online release of the 1940 Census records on April 2, underscores the importance and flexibility of cloud computing.

The release was such a smash hit that the site's response time slowed to a snail’s pace within three hours of the launch. The site logged more than 22.5 million hits in those three hours as a frenzied public demanded more and more data. In the first 48 hours, it had over 200 million hits.
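To put those totals in perspective, the short sketch below converts them into average hits per second. It uses only the two figures reported above; because both are stated as minimums ("more than" and "over"), the results are lower bounds on the average rate, and they say nothing about the actual peak. The function name is illustrative only.

```python
def average_hits_per_second(total_hits, hours):
    """Convert a hit total over a period of hours into an average per-second rate."""
    return total_hits / (hours * 3600)

# Figures reported in the article (both are stated as lower bounds).
first_three_hours = average_hits_per_second(22_500_000, 3)
first_48_hours = average_hits_per_second(200_000_000, 48)

print(round(first_three_hours))  # about 2083 hits per second on average
print(round(first_48_hours))     # about 1157 hits per second on average
```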

NARA was suddenly thrust into emergency mode and had to move quickly to improve the response time.

The server never crashed, but the load caused a long queue that resulted in lengthy delays in returning results, NARA CIO Michael Wash told Breaking Gov in an email.

The data was in an Amazon-hosted, Inflection-managed cloud environment, according to Wash. This environment provided the flexibility to scale up computing capacity as access demand grew. It was this capability that allowed the NARA and Inflection team to quickly adjust the environment and deliver high performance.

NARA contracted with Inflection, a web company, to develop the 1940 Census website and used Amazon Web Services to host it. But no one anticipated how high the demand would be on Day 1.

“The access demand far exceeded our expectations, which were already very large,” Wash said. “We modeled the initial load after the experience the United Kingdom had when they released their 1911 census in 2009. We scaled our expected load based on population differences and added a significant safety factor, but even with this intensive planning effort, the realized load was higher than expected.”
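The estimation approach Wash describes (take a reference load, scale it by the population ratio, then multiply by a safety factor) can be sketched in a few lines. This is a minimal illustration, not NARA's actual model: the function name, parameter names, and every input number below are hypothetical placeholders, since the article does not give the real figures.

```python
def estimate_peak_load(reference_peak, reference_population,
                       target_population, safety_factor):
    """Scale a reference peak load by population ratio, then apply a safety factor."""
    if reference_population <= 0:
        raise ValueError("reference_population must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor should be at least 1")
    scaled = reference_peak * (target_population / reference_population)
    return scaled * safety_factor

# Hypothetical placeholder inputs, NOT figures from NARA or the UK release.
estimate = estimate_peak_load(
    reference_peak=100.0,        # peak load seen in the reference release
    reference_population=50.0,   # population behind the reference release
    target_population=150.0,     # population behind the new release
    safety_factor=2.0,           # extra headroom on top of the scaled load
)
print(estimate)  # 600.0
```

As the article shows, even an estimate padded this way can fall short, which is why the ability to add capacity after launch mattered.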

NARA was able to increase capacity by adding computing resources in the Amazon cloud environment, Wash said.

The agency was also helped by extensive improvements made to the archives.gov website in preparation for the 1940 release. The site was moved to a cloud environment, and a content delivery network was added to handle the additional load on the NARA site, he said.

One additional planning step for every federal agency to include is a security review: The site has been reviewed and confirmed to comply with NARA’s IT security requirements, Wash said.