Not since the Truman administration more than 60 years ago has a U.S. president decided to overhaul how the federal government manages its records.

But according to records management experts, newly emerging technologies will likely be needed for agencies to meet the president’s policy goals.

In the meantime, agencies racing to move email and other documents off their own servers and into the cloud could be looking at a train wreck if they don’t take greater measures to identify and segregate documents that must legally be preserved, warns a senior official at the National Archives and Records Administration.

Recognizing that agencies were failing to keep up with the explosive shift from paper to digitally generated information across the federal government, the Obama administration issued a landmark directive a year ago, on Nov. 28, aimed at moving federal record-keeping fully into the digital age.

The directive calls for federal agencies, by the end of the decade, to manage all permanent electronic records in an electronic format, and to start the transition by the end of next year.

“We are the first government archive in the world to demand this,” said NARA’s director of litigation, Jason R. Baron.

The White House also directed the head of every federal department and agency to appoint, by Nov. 15 of this year, a senior agency official (at the assistant secretary level or equivalent) with broad responsibilities and funding to ensure the agency complies with records management regulations and policies.

To the surprise of government skeptics, a majority of agencies completed that step ahead of schedule. More than 100 newly appointed executives and other officials gathered for the first time at a Nov. 28 event held by Archivist of the United States David Ferriero, Chief Records Officer for the Federal Government Paul Wester, Director of Policy Analysis and Enforcement Don Rosen, and the head of the Project Management Office for the directive, Preston Huff.

[Photo: NARA’s Don Rosen, Paul Wester and Preston Huff]

But the hard part is just beginning, as agencies grapple with how to process the tsunami of government data and documents produced annually by federal employees, according to Baron.

The exponential increase in White House email over the past three administrations alone is indicative of the problem agencies are facing.

The Clinton administration, for instance, generated 35 million emails that needed to be archived with NARA. That number grew to 250 million for the Bush administration (including the emails thought to be lost but later recovered). And the Obama administration is expected to generate a billion emails, according to Baron (pictured in inset), speaking at an industry conference in Washington last week.

The problem with segregating official email, which must be kept permanently, from other messages, such as an employee’s vacation request, is that employees – and people in general – are notoriously poor at cataloging content, said Baron. That’s compounded by the fact that “across government, the IT world has decided to save everything.”

“The idea that we could get people to care about complicated retention schedules, and drag and drop documents into folders is ridiculous,” said Baron.

The train wreck Baron warns about is petabytes of information being pushed into the cloud without being properly segregated at the outset, which would leave NARA no choice but to reject it when the time comes to archive it and would force agencies to go back and correct massive amounts of data.

“If we don’t get it right on email,” warned Baron, the problem will only grow more challenging as agencies turn to harvesting social media streams and other documents. As for agencies that have already moved to the cloud, “I don’t know any agency that is managing email acceptably” with regard to archiving policies, he said.

The good news, according to Baron and others, is the emergence of increasingly sophisticated e-discovery and predictive coding software that now makes it possible for documents to be evaluated for context and meaning, using big data analytics.
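To make the idea concrete, here is a minimal sketch of the workflow predictive-coding tools automate: a text classifier learns from a small set of documents a records officer has labeled by hand, then scores the unreviewed backlog so humans can focus on the borderline cases. The documents, labels, and scoring logic below are entirely hypothetical; real e-discovery platforms are far more sophisticated.

```python
# Minimal predictive-coding sketch (toy data; hypothetical labels).
# A classifier trained on a few human-labeled documents scores the rest,
# so high-scoring items can be routed to the permanent-archive queue.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A records officer labels a small "seed set" by hand.
seed_docs = [
    "Attached is the signed interagency agreement on data sharing.",
    "Final rule text for publication in the Federal Register.",
    "Anyone want to grab lunch at noon?",
    "Reminder: I'm on vacation next week, back Monday.",
]
seed_labels = [1, 1, 0, 0]  # 1 = permanent record, 0 = temporary

vectorizer = TfidfVectorizer(stop_words="english")
X_seed = vectorizer.fit_transform(seed_docs)

model = LogisticRegression()
model.fit(X_seed, seed_labels)

# Score the unreviewed backlog; scores near 0.5 would go to human review.
backlog = [
    "Draft memorandum of understanding with the Department of Energy.",
    "The coffee machine on floor 3 is broken again.",
]
scores = model.predict_proba(vectorizer.transform(backlog))[:, 1]
for doc, score in zip(backlog, scores):
    print(f"{score:.2f}  {doc}")
```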

Earlier this year, in fact, the use of such tools was deemed permissible in a court of law for the first time when, in Da Silva Moore v. Publicis Groupe, U.S. Magistrate Judge Andrew Peck of the Southern District of New York declared that computer-assisted review in e-discovery is “acceptable in appropriate cases.”

One example of that technology is a tool produced by Recommind. The technology relies on “probabilistic latent semantic analysis,” a sophisticated algorithm that “pieces together from each document what the document is about, not just based on what words are in the document, but on other documents that it’s related to,” said Howard Sklar, senior counsel at San Francisco-based Recommind.

“The benefit of PLSA,” said Sklar, speaking from his office in New York, “is it doesn’t rely on what words are in the document, so if you do a concept search on food, it will bring up a document that says ‘I love cheese.’”
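Recommind’s PLSA implementation is proprietary, but the general idea can be sketched with the closely related (non-probabilistic) latent semantic analysis available in open-source libraries: documents and queries are mapped into a shared “concept” space built from word co-occurrence, so a search for “food” can surface a cheese document even though the word “food” never appears in it. The corpus and query below are invented for illustration.

```python
# Concept-search sketch using latent semantic analysis (LSA), a
# non-probabilistic cousin of the PLSA technique described above.
# Toy corpus and query; not Recommind's actual method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "I love cheese.",
    "Cheese and bread are my favorite food.",
    "The cafeteria food menu changes weekly.",
    "Server maintenance is scheduled for Saturday night.",
]

# Map documents into a low-dimensional latent "concept" space.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(tfidf)

# Project the query into the same space and rank by similarity.
# Because "cheese" co-occurs with "food" in the corpus, "I love cheese."
# can rank highly for the query even though it lacks the keyword.
query_vector = lsa.transform(vectorizer.transform(["food"]))
similarities = cosine_similarity(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(similarities, corpus), reverse=True):
    print(f"{score:.2f}  {doc}")
```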

Originally developed for law firms to discover relevant information from thousands of documents used in litigation cases, the technology has since been adopted by government agencies – most notably, the Department of Energy – to help categorize vast volumes of data.

“We see a lot more companies worried about information governance than ever before,” said Sklar, who noted that more and more companies are establishing chief data officer positions. The White House directive on records management, however, gave a noticeable push to the demand for such data management tools, he said.

Agencies can expect to see a lot more on the subject of automated tools from NARA in the next few weeks, according to Baron.

NARA is expected to issue a new set of simplified document disposition policies that will include the use of automated tools. The new policy represents an effort to make it easier for agencies and employees to separate permanent documents from temporary ones.