Make the rounds with government agency CTOs or at any public sector technology conference these days and two words are likely to emerge in the conversation very quickly: big data.
Whether the discussion is around Hadoop, MapReduce, or other big data tools, the focus to date has been largely on data analytics and how to rapidly and efficiently analyze extremely large and unstructured datasets. Big data analytics is important for government agencies seeking better insight into their data and more informed decisions based on that insight, but analytics represents only the tip of the iceberg in making big data work.
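For readers less familiar with the tools named above, the map/reduce model they implement can be shown in miniature. The sketch below is only a toy, in-memory illustration; frameworks such as Hadoop apply the same two phases across thousands of machines, and the example records are invented for illustration.

```python
# A toy illustration of the map/reduce model: map each record to key/value
# pairs, then reduce by key. In-memory only; shown to convey the shape of
# the computation, not how a production Hadoop job is written.
from collections import defaultdict

def map_phase(records):
    """Emit (word, 1) pairs from unstructured text records."""
    for record in records:
        for word in record.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the counts for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

if __name__ == "__main__":
    logs = ["storm warning issued", "warning lifted", "storm passed"]
    print(reduce_phase(map_phase(logs)))
    # {'storm': 2, 'warning': 2, 'issued': 1, 'lifted': 1, 'passed': 1}
```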
How do you end up with all of that big data in the first place? Ingest, whether of video, documents, photos, machine-generated data, or something else, is the necessary starting point. Then comes searching and indexing, archiving for future use, and often transcoding into other formats and distributing the results. Consider the National Oceanic and Atmospheric Administration (NOAA), which collects enormous amounts of weather data every day. Fast ingest and archiving are essential. NOAA's modeling processes discern weather patterns and generate accurate predictions, and the agency then distributes this information to other agencies, airports, television and radio stations, and hundreds of other locations.
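To make that lifecycle concrete, here is a minimal sketch of the stages as plain functions. The stage names, record format, and "station-042" source are hypothetical placeholders, not a description of NOAA's actual systems.

```python
# Illustrative sketch of the big data lifecycle beyond analytics:
# ingest -> index -> archive -> transcode -> distribute.
# All names and formats here are hypothetical placeholders.

ARCHIVE = []  # stand-in for a real archival store

def ingest(raw):
    """Accept raw input (video, documents, sensor readings, ...) and add metadata."""
    return {"payload": raw, "source": "station-042", "format": "raw"}

def index(record):
    """Tag the record so it can be searched later."""
    record["tags"] = sorted(set(str(record["payload"]).lower().split()))
    return record

def archive(record):
    """Persist the record for future use."""
    ARCHIVE.append(record)
    return record

def transcode(record, target="json"):
    """Convert the record into a distribution-friendly format."""
    record["format"] = target
    return record

def distribute(record, subscribers):
    """Fan the finished product out to downstream consumers."""
    return {name: record for name in subscribers}

obs = transcode(archive(index(ingest("heavy rain expected along the coast"))))
deliveries = distribute(obs, ["airports", "broadcasters", "other agencies"])
print(len(ARCHIVE), "archived; delivered to:", ", ".join(deliveries))
```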
Peeling back the sub-layers of big data becomes even more relevant as funding and resources increase. The $200 million big data initiative unveiled by the White House in March 2012 is a substantive effort to build better tools and techniques to harness the mounting volume of information agency decision makers confront daily. That pool of funds should not be narrowly directed at, say, data analytics alone, because each step in the big data process is vital. If any one of these steps breaks down, the whole system fails.
Dealing with big data requires capabilities not only in big analytics, but also in big bandwidth and big content: the big data "ABCs." Those are the building blocks of an agile data infrastructure. Without the bandwidth to ingest data properly and quickly, the analysis, archiving, and distribution cannot happen on schedule. If the archive is inadequate for the demands placed on it, the organization will lose track of its data. If analysis functions such as searching and indexing are insufficient, the data will never become useful. Finally, if the content cannot be distributed to those who need it when they need it, the whole process is useless. It's all about getting the right information to the right people at the right time to accelerate actionable decisions in mission-critical environments.
As is the case with most emerging technologies, rhetoric often outpaces adoption. A recent survey of more than 150 federal IT professionals, conducted by MeriTalk on behalf of NetApp, highlights enthusiasm within the federal government for leveraging big data to support government mission outcomes, but finds that most agencies lack the storage, power, and personnel to fully benefit from the efficiencies and improved decision making the technology can deliver. "The Big Data Gap" survey reveals that just 60 percent of IT professionals say their agency is analyzing the data it collects, and a modest 40 percent are using data to make strategic decisions. All of this despite the fact that a whopping 96 percent of those surveyed expect their agency's stored data to grow over the next two years, by an average of 64 percent.
The survey results reinforce a simple reality for technology decision makers in government agencies as they size up their big data challenges, figure out where to start, and decide how to solve them: it is critical to develop an approach that extends beyond data analytics and weaves in the other key layers. Here are some of the steps agencies can take:
Identify the optimal starting point – It is essential to determine where a particular system is most likely to break down as data grows. If an organization can address the most problematic stage first, the other problem areas can be tackled as issues emerge. The bottleneck may well be analytics, but it could just as easily be bandwidth, content archiving, or dissemination. The critical point is to know what you are trying to solve before diving in.
Recognize that archiving may be your biggest challenge – For many organizations, data storage is the biggest IT expense, and it is no wonder, given the explosion in data volumes. But here's the bad news: it's only getting worse. The world's data volume is predicted to increase from today's levels by more than a factor of 50 over the next decade. In comparison, new networking technologies will only increase bandwidth by about ten times over the same period (a rough back-of-the-envelope comparison appears after this list). Every agency will need new strategies for acquiring, storing, and analyzing data.
Take advantage of opportunities to consolidate – The National Science Foundation, for example, is encouraging researchers to look to their universities for cloud services, which in turn is driving universities to form research consortiums to share the risk and cost. There are similar opportunities for efficiency in deploying big data technology. The National Archives and Records Administration, as another example, could provide archival services for the entire federal government, rather than having each agency independently responsible for its own records.
Find your data scientists and listen to what they have to say – Data scientists have been around since the beginning of the technology revolution. High-performance computing has long been a leading indicator of the methods and technology that eventually migrate into the enterprise. Fifteen years ago, data scientists were figuring out how to ingest more data and sift through it; now results are expected immediately.
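As promised in the archiving point above, a rough back-of-the-envelope calculation shows why the storage-versus-bandwidth gap matters. The 50x and 10x figures are the decade-long projections cited earlier; the annualized rates below are simply derived from them.

```python
# Rough comparison of the projections cited above: roughly 50x data growth
# versus roughly 10x bandwidth growth over a decade.
data_rate = 50 ** (1 / 10)       # implied annual data growth, about 48% per year
bandwidth_rate = 10 ** (1 / 10)  # implied annual bandwidth growth, about 26% per year

for year in (1, 5, 10):
    gap = (data_rate / bandwidth_rate) ** year
    print(f"Year {year:2d}: stored data has outgrown bandwidth by {gap:.1f}x")
# By year 10 the gap is 5x: moving data lags ever further behind storing it,
# which is why archiving and distribution strategies cannot be afterthoughts.
```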
These certainly aren't small challenges, but they are the building blocks of a truly agile data infrastructure that allows big data to realize its potential across government.
Mark Weber is President, U.S. Public Sector, NetApp.