The White House’s recently launched “Future First” initiative marks a milestone in the federal government’s effort to invigorate the implementation of new technologies. As Federal CIO Steven VanRoekel begins to roll out new initiatives like “Shared Services First,” agencies should ask themselves “What technology will help us better manage systems amidst the current data explosion?”
The answer lies in the ability to handle large volumes of machine-generated data, also known as big data. Agencies need to automate how they manage large volumes of machine data because the growth of data is outpacing human capacity to monitor and understand its relevance.
Machine data is the fastest growing, most complex and most valuable component of big data. This data comes from computers, applications, sensors, mobile devices or anything that is running within an IT infrastructure. The ability to capture and analyze data from multiple sources is the core part of the big data challenge. This is a daunting task for many agencies, made even more complex by recent budget cuts.
A “Big Data First” initiative should revolutionize the way government operates. A successful “Big Data First” policy would require agencies to implement IT architectures that provide access to all IT data in real time. This would push agencies to leverage cutting edge technologies to reduce costs and deal effectively with the recent data explosion, improving both security and operations. However, before they make the transition, agencies need to understand how to get the most value out of a seemingly endless pool of diverse data.
Making Sense of Machine Generated Data
According to recent reports from International Data Corporation (IDC), big data will earn its place as the next “must have” competency in 2012 as the volume of digital content grows to 2.7 zettabytes (ZB), up 48% from 2011. Emerging big data technologies are taking the private sector by storm, and government needs to make sure it’s not left behind. While a number of solution providers are gunning for a piece of the big data pie, agencies need to carefully select an offering that helps them best confront their unique big data challenges head on.
Some emerging open source big data technologies underscore the rapidly growing awareness and interest in solving the big data challenge. However, many of these open source technologies solve only a portion of the problem and increase data management complexity.
The key to handling big data cost effectively is to deploy proven solutions that are architected for that need. One example is MapReduce technology, which our firm offers, but there are others.
These solutions map machine generated data from multiple data sources and leverage real time search technology across information silos, thereby providing agencies with greater situational awareness. Moreover, leveraging a scalable infrastructure automates big data management and reduces long-term costs for data centers.
For example, if an analyst needs to investigate a rogue IP address, he or she needs to retrace the transaction and analyze key transaction flows across many locations, sites and applications. Traditional data management solutions require pre-defined schemas to gain access to each relevant data source, site or user. However, predefining schemas and receiving answers only to previously determined questions is a thing of the past and does not address big data complexities. Agencies need an architecture that can analyze terabytes of data in real time by utilizing an IT search engine that instantly scans any type of machine data within a single dashboard.
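To make the rogue IP scenario concrete, here is a minimal sketch of a schema-free search, assuming three hypothetical log sources whose formats and sample events are invented for illustration. It does not represent any particular product's API. It only shows that raw lines can be searched across silos without first defining a schema for each source.

```python
import re
from collections import defaultdict

# Hypothetical raw events from three unrelated silos (formats invented for illustration).
RAW_EVENTS = {
    "firewall": [
        "2012-03-01T10:15:02Z DENY src=203.0.113.7 dst=10.0.0.5 port=22",
        "2012-03-01T10:15:09Z ALLOW src=198.51.100.4 dst=10.0.0.8 port=443",
    ],
    "web": [
        '203.0.113.7 - - [01/Mar/2012:10:16:40 +0000] "GET /login HTTP/1.1" 401',
    ],
    "app": [
        "2012-03-01 10:17:03 WARN auth failed user=admin client=203.0.113.7",
        "2012-03-01 10:18:00 INFO heartbeat ok",
    ],
}

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits (no range validation).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def search_ip(events_by_source, ip):
    """Return every raw line mentioning `ip`, grouped by source.

    No per-source schema is needed: addresses are extracted from the raw
    text at search time, whatever field name or position they appear in.
    """
    hits = defaultdict(list)
    for source, lines in events_by_source.items():
        for line in lines:
            if ip in IPV4.findall(line):
                hits[source].append(line)
    return dict(hits)


if __name__ == "__main__":
    for source, lines in search_ip(RAW_EVENTS, "203.0.113.7").items():
        for line in lines:
            print(f"[{source}] {line}")
```

A production system would index the data rather than scan it linearly, but the principle is the same: structure is applied when the question is asked, not when the data is stored.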
Continuous Monitoring, Compliance and Big Data
The most significant big data challenges agencies face include understanding the dynamic nature of cyber threats and meeting compliance goals. In order to effectively address security concerns, agencies are looking to continuous monitoring as the best line of defense.
However, complying with continuous monitoring standards like the Federal Information Security Management Act (FISMA) can prove challenging for many agencies because their internal operations models support information silos that separate IT operations from security and compliance functions.
These disparate environments make it nearly impossible to analyze all machine data with a single scalable solution. Big data technologies built on architectures like MapReduce not only reduce complexity by providing real time visibility without reliance on relational databases and schemas, but they also reduce cost.
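For readers unfamiliar with the MapReduce pattern, the sketch below counts events per source address in a single process. It is an illustration only: the function names and the `src=` log format are assumptions, and a real deployment would distribute the map and reduce phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit an (ip, 1) pair for each hypothetical `src=` token in a raw line."""
    for token in line.split():
        if token.startswith("src="):
            yield token[len("src="):], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def count_by_source_ip(lines):
    """Run map, shuffle and reduce in sequence over raw log lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    sample = [
        "DENY src=203.0.113.7 dst=10.0.0.5",
        "ALLOW src=198.51.100.4 dst=10.0.0.8",
        "DENY src=203.0.113.7 dst=10.0.0.9",
    ]
    print(count_by_source_ip(sample))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across machines without a shared relational schema, which is what allows this approach to scale.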
Independent organizations such as the Center for Regulatory Effectiveness (CRE) are taking notice of the benefits of big data technology. The CRE’s report “FISMA Focus at the Center for Regulatory Effectiveness” calls for agencies to adopt a data-driven approach to cybersecurity so that federal IT managers can identify known and unknown cyber threats. This is a step in the right direction, but agencies can do more when it comes to continuous monitoring and compliance.
A “Big Data First” policy would set guidelines and best practices for agencies. The Administration needs to endorse this technological revolution of eliminating unnecessary information silos and recommend “best of breed” solutions to agencies. A “Big Data First” policy would modernize agency data centers and allow them to scale effectively to the increasing volumes of machine generated data.
Tony Ayaz is vice president at Splunk Federal.