The headline above, from a Commerce Department apps challenge hosted on Challenge.gov, caught my attention, so I decided to take the challenge and develop apps using the 2010 Census Summary File 1 and the American Community Survey (five-year data).
These initial APIs are part of a longer-term effort to open Census Bureau statistics and spur innovation in how they are used. Beta testers are told they can visit to learn about using the API and register for a key. Census expects to finalize beta testing in June and then officially launch the API.
But I soon discovered that, as with many of these programs, things are never quite as simple as they seem.
I joined the Census API Developers Forum to submit questions, provide feedback, and share ideas to help the Census Bureau as it makes data available through APIs, but my request is still awaiting moderation.
I requested a new API key, activated it, and got this message: “Congratulations! Your key has been activated. You may now use it to query the data API. Happy querying!”
I tried the beta APIs and ran into problems using them, so I built my own version of the “Age Finder” tool from the Census App Gallery. That app uses the 2010 Census Summary File 1 API to let users retrieve age data for defined and custom age ranges. I simply scraped its data into a spreadsheet.
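For readers who would rather script that step than scrape by hand, here is a minimal sketch of turning an API reply into spreadsheet rows. It assumes the reply is JSON shaped as an array of arrays whose first row holds the column names; the sample payload, its place names, and its numbers are placeholders for illustration, not real Census figures, and the helper names are my own.

```python
import csv
import io
import json

# Placeholder payload in the assumed shape: header row first, then one row per area.
# The names and values are made up for illustration only.
SAMPLE_RESPONSE = (
    '[["NAME","P0010001","state"],'
    '["Example State A","100","01"],'
    '["Example State B","200","02"]]'
)


def rows_from_response(text):
    """Parse the array-of-arrays JSON into a list of dicts keyed by the header row."""
    table = json.loads(text)
    header, *records = table
    return [dict(zip(header, record)) for record in records]


def to_csv(rows):
    """Serialize the rows as CSV text that any spreadsheet program can open."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(rows_from_response(SAMPLE_RESPONSE)))
```

Keeping every value as a string on the way in is deliberate: codes such as the state identifier lose their leading zeros if they are converted to numbers.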
I looked for actual data under the Data tab and found it under 9 of the 11 sub-tabs. One of the most prominent was the Statistical Abstract, which I have worked with extensively before. See Annual Statistical Abstract 2012 (Excel Spreadsheets) as an example of what Data Driven Documents should look like.
Unfortunately there was this note: The U.S. Census Bureau is terminating the collection of data for the Statistical Compendia program effective October 1, 2011. The Statistical Compendium program is comprised of the Statistical Abstract of the United States and its supplemental products — the State and Metropolitan Area Data Book and the County and City Data Book.
So one of the best sources of Census data for citizens is no longer being produced.
The next easiest Census data to work with seemed to be the USA Quick Facts, but closer inspection showed otherwise. I followed the detailed instructions to merge the data sets, but the process was not straightforward, and it is not something Census should expect either the citizen or the data scientist/data journalist to do.
This falls under my new mission in life to audit many of the new Open Government Data sites in preparation for my presentation at the upcoming 2012 International Open Government Data Conference, July 6-12, 2012.
So I put “my imagination to work and unleashed my creativity” and combined the new Census API, which by itself was not useful (although the Age Finder data was), with the USA Quick Facts, which by itself was not useful either. But when the multiple data sets were parsed and combined, they produced a much better application for both the citizen and the data scientist/journalist.
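The combining step can be sketched in a few lines. This is an illustration under stated assumptions, not the exact procedure I followed: it assumes both data sets carry a county FIPS code in a column I am calling `fips`, and the column names, area names, and values below are hypothetical placeholders.

```python
def normalize_fips(value, width=5):
    """Return the code as a zero-padded string; spreadsheets often drop leading zeros."""
    return str(value).strip().zfill(width)


def merge_on_key(left, right, key="fips"):
    """Join two lists of dicts on a shared key column.

    Returns (merged_rows, unmatched_keys), where unmatched_keys lists the
    keys found in `left` that have no partner in `right`.
    """
    right_index = {normalize_fips(row[key]): row for row in right}
    merged, unmatched = [], []
    for row in left:
        code = normalize_fips(row[key])
        match = right_index.get(code)
        if match is None:
            unmatched.append(code)
            continue
        combined = dict(match)   # start from the right-hand row
        combined.update(row)     # left-hand values win on a name clash
        combined[key] = code     # keep the normalized key
        merged.append(combined)
    return merged, unmatched


# Hypothetical placeholder rows, not real Census values.
age_rows = [
    {"fips": "01001", "age_0_17": 10},
    {"fips": "01003", "age_0_17": 20},
]
quick_facts_rows = [
    {"fips": 1001, "area_name": "Example County A"},
]

if __name__ == "__main__":
    merged, unmatched = merge_on_key(age_rows, quick_facts_rows)
    print(merged)
    print("no match for:", unmatched)
```

Reporting the unmatched keys instead of silently dropping them makes it easy to see where two data sets disagree about which areas exist.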
Even the spreadsheet will save other developers considerable time in pre-conditioning the raw data Census provides. The spreadsheets and dashboard show the results in multiple tabs labeled by data set name and app name.
The lesson in all of this is that making data readily available doesn’t always mean it is readily usable. Agencies should do a better job of vetting what they present to the public to make sure the data they’re offering can be used without requiring extraordinary data science skills.
Now on to auditing the new EPA and FCC APIs, to see whether they can become true data services for developers and make their data a first-class citizen for citizens.