Government agencies are flooded with data, and a number of healthcare agencies face particularly challenging obstacles to achieving their missions in a digital, data-interoperable world. This is especially true for regulatory healthcare agencies such as the Food and Drug Administration (FDA).

As outlined in the agency’s “FDA Science and Mission at Risk” report dating back to 2007, the FDA anticipated many of these challenges. The report detailed new data sources coming from new digital sciences including the use of molecular data for medicine (e.g., genomics, proteomics, and pan-omics), wireless healthcare, nanotechnology, medical imaging, telemedicine platforms, electronic health records and more.

In addition, the FDA faces mounting pressure to react and make decisions under increasingly tight time constraints. This includes near real-time response and interactions with other agencies on a global scale to address food safety, imported products and pesticide issues that may present serious health risks.

At the same time, consumer groups and advocates continue to press for rapid new drug reviews to accelerate their access to medical innovations. The public also demands that the risks associated with food, drugs and devices approach “zero.” In response, the agency has wisely undergone a massive reorganization and infrastructure upgrade to prepare for its expanded regulatory responsibilities.

However, FDA scientists still face an overwhelming amount of health data from a multitude of sources. The data are often presented in disparate, non-standardized formats. According to Graham Hughes, healthcare data reached 150 exabytes in 2011. An exabyte is 1 billion gigabytes. For context, by one widely cited estimate, 5 exabytes would hold all the words ever spoken by human beings.

The sheer quantity of health data and the variety of its sources and formats (e.g., imaging, structured databases, free text and even old paper files) make cross-study analyses and application reviews difficult. This in turn slows the response to health crises, the decision-making process and regulatory science research.

Ultimately, this makes communications to patients and consumers slower and less clear.

The FDA is stepping into the Big Data arena with two new projects: Janus and Sentinel. Janus is focused on collecting pre-market assessment data from drugs and medical devices, while Sentinel will focus on post-market surveillance of these products. The smart deployment of these Big Data tools will help the FDA meet its expanding obligations to make quicker decisions and to track the safety of products as patients use them.

To make these Big Data tools work, there is a critical need to link together relevant data from multiple sources, including adverse events, concomitant medication information, patients’ prior medical history, lab data and demographic data.
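As a rough illustration of what linking such sources can involve, the sketch below groups records from several feeds under a shared patient identifier. It is a minimal example under stated assumptions, not a description of how Janus or Sentinel work: the `patient_id` key, the source names, the `link_records` helper and the sample records are all hypothetical.

```python
from collections import defaultdict


def link_records(sources, key="patient_id"):
    """Group records from several sources under a shared key.

    `sources` maps a source name to a list of dict records. The key
    name "patient_id" is a hypothetical choice for this sketch.
    """
    linked = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            patient = record[key]
            # Keep every field from this record except the join key itself.
            fields = {k: v for k, v in record.items() if k != key}
            linked[patient].setdefault(source_name, []).append(fields)
    return dict(linked)


# Invented sample data, for illustration only.
sources = {
    "adverse_events": [{"patient_id": "P001", "event": "rash"}],
    "concomitant_meds": [
        {"patient_id": "P001", "drug": "ibuprofen"},
        {"patient_id": "P002", "drug": "metformin"},
    ],
    "demographics": [{"patient_id": "P001", "age": 54, "sex": "F"}],
}

linked = link_records(sources)
```

After this runs, `linked["P001"]` holds that patient's adverse events, medications and demographics side by side, while `linked["P002"]` holds only a medication record. Real health data rarely shares a clean common identifier across sources, which is a large part of why this linking step is hard in practice.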

In addition, as agencies utilize Big Data to further their missions, it is important to consider and dedicate resources to systems and data architecture.

A solid architecture should include analytical tools to interpret the data, and it should ensure that petabytes or more of data are transferred securely and stored accurately. System architects can then steadily upgrade the high-performance computing hardware to deliver results faster.
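One common way to check that data has been stored accurately after a transfer is to compare cryptographic checksums of the source and the stored copy. The sketch below shows the idea with Python's standard `hashlib` module. The helper names `sha256_of` and `transfer_is_intact` and the temporary files are assumptions made for this example, and the check covers integrity only; securing data in transit (for example, with encryption) is a separate concern.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large files never have to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source_path, stored_path):
    """True when the stored copy has the same digest as the source."""
    return sha256_of(source_path) == sha256_of(stored_path)


# Demonstration with small temporary files standing in for real data.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.dat")
    copy = os.path.join(workdir, "copy.dat")
    damaged = os.path.join(workdir, "damaged.dat")
    payload = b"example study data" * 1000
    # The damaged file is missing its final byte.
    for path, data in [(source, payload), (copy, payload), (damaged, payload[:-1])]:
        with open(path, "wb") as handle:
            handle.write(data)
    good_copy = transfer_is_intact(source, copy)
    bad_copy = transfer_is_intact(source, damaged)
```

Here `good_copy` is true and `bad_copy` is false, since even a single missing byte changes the digest. At petabyte scale the same comparison would typically be run per file or per block rather than over one enormous object.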

The FDA realizes it is at a critical juncture in its history. It faces huge amounts of unstructured data, pressure for quick and flawless decisions, and the need to support new digital sciences. This is a challenge for the FDA, and adopting innovative Big Data best practices will be a great step forward in improving public health and safety.

Roger Foster is a senior director at George Mason University. He has worked on big data problems for scientific computing in fields ranging from large astrophysical data sets to health information technology. He can be reached at [email protected].