Gus Hunt

The fact that the Department of Defense got its budget cut and the Intelligence Community got its budget increased in the White House’s 2013 budget request of Congress is indicative of more than the need to roll back a decade of military growth. It’s also indicative of a shift in IT focus–and a reflection that DoD’s network-centric focus is being overtaken by the IC’s big data-centric focus.

There are probably many reasons for such a shift. One is the world’s population. The U.S. Census Bureau estimates the world population passed the 7 billion mark this past weekend. The rapidly growing number of people who will eventually have smartphones with multiple sensors (your iPhone has them now for GPS position, etc.) promises a future with massive streams of real-time data that the IC will want to mine, looking for lone-wolf terrorists (who are relatively hard to find but easy to stop), whom I have written about previously.

For companies like Google and Facebook, big data is big business, and for other companies big data is becoming their business as they mine their large swaths of data to improve their services and develop new business activities. The IC may not come out and say it, but it has to love the fact that Facebook will soon have 1/7th of the world’s population using its platform to share what’s going on. Or that Google is almost everyone’s favorite search engine, because these companies can keep track of what people are posting and searching for much more easily than many in government can.

The IC also has to love big data, and the rapid evolution of systems used to ingest and process it, because it helps push the technology wave, as Gus Hunt, CIA chief technology officer (pictured above), described it at the recent Government Big Data Forum.

Hunt said that in every aspect of the CIA’s workflow, from sensors to finished intelligence, massive, multiple, real-time sensor data streams cause bottlenecks on current networks, swamp current storage devices, and overwhelm the current query, analytics, and visualization tools needed to produce finished intelligence.

So he wants to have his cake and eat it too: he wants real-time analytics and visualizations, which he says a few start-ups are trying to achieve. He also wants the Federal Cloud Computing Initiative to add two more services to Platform-, Software-, and Infrastructure-as-a-Service, namely Data-as-a-Service and Security-as-a-Service.

Part of the solution is emerging from Google’s MapReduce, a parallel data processing framework that has an open-source implementation in Apache Hadoop (developed by Doug Cutting, who named it after his son’s toy elephant). Hadoop has in turn been commercialized by Cloudera, so one can store and compute big data at the same time.
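To make the MapReduce idea concrete, here is a minimal single-process sketch of the pattern applied to counting words. It is an illustration only, not Hadoop’s API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines that also hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    # Each document could be mapped on a different machine in a real system.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big business", "data science needs big data"]
    print(word_count(docs))
```

Because each document is mapped independently and each key is reduced independently, the work can be spread over many machines, which is what lets this style of processing scale to very large data sets.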

Amr Awadallah, founder and CTO of Cloudera, calls Apache Hadoop a data operating system, in contrast to Windows and Linux, which are essentially file operating systems (they store and manage the files you create and the files your software applications need). He points out that Apache Hadoop provides the three essential things needed to handle big data: velocity, scalability, and economics.

So the IC, Gus Hunt, Amr Awadallah, and others at the Government Big Data Forum are leading the next technology wave, and they gave us a glimpse of both the technology infrastructure and the business organization, with chief data officers and data scientists, that will be needed to implement and succeed with big data.

More details about what was said can be found at CTOVision and at my wiki document, Data Science Visualizations Past Present and Future.

It is clear to me that the CIA needs big data, on the order of zettabytes (10 to the 21st power bytes), and the ability to find and connect the “terrorist dots” in it. As of 2009, the entire Internet was estimated to contain close to 500 exabytes, which is half a zettabyte.

Recently I have listened to three senior CIA officials — two former and one current — talk about this and the need for data science and data scientists to make sense of it.

Gen. Michael Hayden, former director of the CIA and National Security Agency, and Principal Deputy Director of National Intelligence, and Bob Flores, former chief technology officer at the CIA, spoke about this at the MarkLogic Government Summit; and Gus Hunt, current CTO at CIA, spoke about this at the Amazon Web Services Summit that I wrote about recently.

General Hayden framed the problem as follows. In the Cold War era, it was easy to find the enemy but hard to stop them (e.g., Soviet tanks in East Germany); in the Global War on Terrorism, it is hard to find the terrorist but easy to stop them once they’re found (e.g., the underwear bomber on the airplane). He said we live in an era where the problem is not a failure to share data but the difficulty of processing, with velocity, the sheer volume and variety of data that results from sharing.

He described meeting with former Egyptian President Mubarak before the recent Arab awakening, which was driven by social media and resulted in Mubarak’s overthrow, and then meeting with the President of Twitter, Jack Dorsey, whom he asked: How does it feel to overthrow a government, something the CIA, when Hayden was director, was never able to do?

Hayden also said we need tools to predict the future from social media and data scientists to use them.

I told him about my work with Recorded Future, which was also the subject of a Breaking Gov story.

Bob Flores, former CIA CTO, said that Recorded Future was a new, fantastic technology and that the old model of collect, winnow, and disseminate fails spectacularly in the big data world we live in now. He used the recent movie “Moneyball” as an example of how the new field of baseball analytics called Sabermetrics has shown there is no more rigorous test (of a business plan) than empirical evidence.

He said that in this time of budget cuts and downsizing the cream will rise to the top (those people and organizations that can solve real problems with data) and survive. Flores also agrees with Gen. Hayden that while all budgets are on a downslope (including for defense, intelligence, and cyber), cyber is on the gentlest downslope of all, because it is recognized that limiting the analysis of big data would be equivalent to disarmament in the Cold War era.