“The liberation of government datasets is important in itself, but data are truly powerful when used in the development of informative apps.” So proclaimed Todd Park, Brian Forde and Jo Strang in a recent White House blog post, “Safety Data Jam connects Tech Innovators with Public Safety Officers.”

That safety jam was part of a broader initiative that challenges developers to use new Data.gov Safety “data sets,” which gained fresh exposure this month at the White House’s Safety Datapalooza.

I took a deeper look to see what innovation with government data is possible.

One of the new Data.gov Safety special features is a link to a Federal Register article calling for new data on automotive safety. The other two are links to a SaferBus App, for which I could not readily find the actual data, and to the Data.gov catalog query for Natural Disasters, which appears to be a subset (152 items) of the Data/Tools section with 874 items.

But how many of these items are actual data sets (e.g. tables of data that can be readily used to build apps)?

To answer that question, one must first wade through lots of text at Data.gov that I believe needs to be actual tables. This is a problem that ProQuest and Bernan Press solve with hundreds of thousands of tables and for the forthcoming 2013 Statistical Abstract of the United States, which I wrote about recently. So I built a spreadsheet that contains table versions of my SafetyData.gov Knowledge Base, the Top Ten Tweets, 874 Data.gov Safety data sets, and Data.gov’s 153 Natural Disaster data sets. The results are visualized in a dashboard elsewhere.

The most obvious omission, given all the emphasis on APIs by Federal CTO Todd Park and Federal CIO Steven VanRoekel, is that there is only one API listed in the Data/Tools section for any of the 874 items. The next most obvious result is that there are many KML files (the map data format popularized by Google Earth) organized by state. The bottom line is that there are very few actual “data sets” in table format (CSV, XLS) and many that require data extraction, and more, before they can be used effectively to build apps.
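
The kind of format tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes the catalog has been exported to a CSV file with a column named `format`, which is a hypothetical name, and the sample rows are made up rather than taken from Data.gov.

```python
import csv
import io
from collections import Counter

# Formats counted as ready-to-use tables in this sketch (an assumption
# based on the "CSV, XLS" distinction drawn in the text).
TABULAR_FORMATS = {"CSV", "XLS"}


def tally_formats(catalog_file, format_column="format"):
    """Count catalog entries by file format.

    `format_column` is a hypothetical column name; a real catalog
    export may label it differently.
    """
    counts = Counter()
    for row in csv.DictReader(catalog_file):
        # Normalize case and treat blank values as UNKNOWN.
        fmt = (row.get(format_column) or "").strip().upper() or "UNKNOWN"
        counts[fmt] += 1
    return counts


def tabular_share(counts):
    """Return the fraction of entries whose format is tabular."""
    total = sum(counts.values())
    tabular = sum(n for fmt, n in counts.items() if fmt in TABULAR_FORMATS)
    return tabular / total if total else 0.0


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "title,format\n"
    "Example A,KML\n"
    "Example B,kml\n"
    "Example C,CSV\n"
    "Example D,API\n"
    "Example E,\n"
)
counts = tally_formats(sample)
print(counts.most_common())
print(f"Tabular share: {tabular_share(counts):.0%}")
```

Running the same two functions over a real catalog export would give a quick read on how many items are tables and how many need extraction first.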

One benefit of doing this data mining is that it revealed an interesting and useful collection of data sets for analysis and visualization that citizens can understand and use, namely:
  • FEMA Public Assistance Funded Projects Detail – Open Government Initiative – This program provides supplemental federal disaster grant assistance for debris removal, emergency protective measures, and the repair or restoration of disaster-damaged, publicly owned facilities and the facilities of certain non-profit organizations. This dataset lists all public assistance recipients, designated as applicants in the data, and every funded project.
  • FEMA Hazard Mitigation Program Summary – Open Government Dataset – This program provides grants to states and local governments to implement long-term hazard mitigation measures after a major disaster declaration. Its purpose is to reduce the loss of life and property due to natural disasters and to enable mitigation measures to be implemented during the immediate recovery from a disaster. This dataset lists all HMGP Projects stored in the National Emergency Management Information System (NEMIS).
From these datasets, it was possible to set up summary results, visualized in a dashboard elsewhere, including:
  • FEMA Disaster Declarations – Top 10 States with Disasters: Texas, Oklahoma, Louisiana, Florida, Missouri, Iowa, Illinois, Kentucky, Virginia, and North Carolina
  • FEMA Disaster Declarations by Number of Incident Types: Largest Number-Floods and Severe Storms
  • FEMA Hazard Mitigation Projects – Top Ten States by Total Project Costs: California, Texas, Louisiana, Arkansas, Florida, Iowa, Illinois, North Carolina, and New York (only 9)
  • FEMA Hazard Mitigation Projects – Total Project Costs by Incident Type: Highest Costs-Hurricanes and Severe Storms
  • FEMA Public Assistance Projects (Summary) – Top Ten State Total Federal Share Obligated: California, Texas, Oklahoma, Kansas, Louisiana, Arkansas, Florida, North Carolina, Iowa, and New York
  • FEMA Public Assistance Projects (Summary) – Part 1 Total Federal Share Obligated by Incident Type: Largest Number-Hurricanes
  • FEMA Public Assistance Projects (Summary) – Part 2 Total Federal Share Obligated by Incident Type: Largest Number-Severe Storms
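
Rankings like the "Top Ten" lists above come down to summing a dollar column per state and sorting. The following is a minimal sketch, not the method behind my dashboard: the column names `state` and `project_amount` are hypothetical, and the sample figures are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def top_states(project_file, n=10, state_column="state",
               amount_column="project_amount"):
    """Return the n states with the largest summed amounts.

    Both column names are hypothetical placeholders; the real FEMA
    files may use different headers.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(project_file):
        state = (row.get(state_column) or "").strip()
        raw = (row.get(amount_column) or "").replace(",", "").strip()
        if not state or not raw:
            continue
        try:
            totals[state] += float(raw)
        except ValueError:
            # Skip rows whose amount is not a number.
            continue
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented sample rows, for illustration only.
sample = io.StringIO(
    "state,project_amount\n"
    "Texas,100.0\n"
    "Iowa,40.5\n"
    "Texas,25.0\n"
    "Florida,60\n"
    "Iowa,not available\n"
)
ranking = top_states(sample, n=2)
for state, total in ranking:
    print(f"{state}: {total:,.2f}")
```

Skipping rows with missing or non-numeric amounts is a design choice for this sketch; a careful audit would also count how many rows were dropped.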
So my conclusion for developers after auditing the data sets available on SafetyData.gov is this: one still has to wade through lots of text to find the data, and even then there are not many data tables for building apps. The government will need to collect new data and mine other data sources to help the developer community reach its full potential with government safety data.

I’ll be interested to see how well the data stack up in a similar audit of the energy and education Data.gov data sets expected to be showcased at upcoming datapalooza events next month.