Putting The 2012 IOGDC Data To Work

on July 16, 2012 at 11:00 AM
Last week’s International Open Government Data Conference (IOGDC) offered many worthwhile ideas and insights. Among them was the Best of the Lightning Talks session by Tariq Khokhar, World Bank Open Data Evangelist, and Jeanne Holm, Data.gov Evangelist, which summarized 29 presentations made virtually in the first days of the conference.

I have been compiling and auditing the presentations and materials from throughout the conference, and I found some real deficiencies suggesting that what was presented is not always as advertised. For instance:
  • Jim Hendler’s catalogs do not link to the actual catalogs but to his own internal Web pages, so we are seeing “catalogs of catalogs.” Hendler has said his rationale is to use these intermediate Web pages in case the actual catalogs move or are reformatted. But it is the actual data we want to preserve and work with, not the catalog.
  • The World Bank Catalog contains few APIs (only 25 of 93 items), and those link to a general Developer API page rather than to the specific APIs themselves.
  • The World Bank spreadsheets I have looked at are not well organized for use by the public. Even I, as a data scientist, have to reformat them before I can really use them in my dashboards.
  • The Open Kenya Data Catalog in Socrata is a country-specific example: the catalog does not link directly to the actual data, much of the metadata is missing, and over 35,000 rows of Health Facilities, Schools, and Administrative data carry non-standard geo-referencing information (rather than simple Longitude and Latitude columns) for mapping.
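Turning non-standard geo-referencing into separate Longitude and Latitude columns is usually a small parsing step. The sketch below assumes, hypothetically, that each location value is free text ending in a parenthesized "(latitude, longitude)" pair; the actual Open Kenya fields may use a different layout, and the field name `Location` is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical format: free text ending in "(latitude, longitude)",
# e.g. "Nairobi (-1.2833, 36.8167)". Real catalog fields may differ.
_LAT_LON = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def split_lat_lon(location: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from the trailing pair, or None if absent."""
    match = _LAT_LON.search(location.strip())
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject pairs that fall outside valid coordinate ranges.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def add_lat_lon_columns(rows, field="Location"):
    """Copy each row dict and add Latitude/Longitude columns (None if unparsed)."""
    out = []
    for row in rows:
        parsed = split_lat_lon(row.get(field, ""))
        new_row = dict(row)
        new_row["Latitude"], new_row["Longitude"] = parsed if parsed else (None, None)
        out.append(new_row)
    return out


# Placeholder rows for illustration only.
rows = [
    {"Name": "Facility A", "Location": "Nairobi (-1.2833, 36.8167)"},
    {"Name": "Facility B", "Location": "Mombasa"},
]
mapped = add_lat_lon_columns(rows)
```

Rows that cannot be parsed keep blank coordinates instead of being dropped, so they can be reviewed by hand before mapping.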
One suggestion to the conference organizers is to offer more hands-on exercises or tutorials early in the conference, or even before it. Attendees would then have seen these shortcomings and learned from the originators how to deal with them, and the originators would have seen what they need to improve for users.
I’ve tried to show how the material I compiled from the 2012 IOGDC can be turned into actual data in a spreadsheet, visualized in a dashboard, and written up in a data story for attendees and the public. I used a wiki to turn the 2012 IOGDC content into data with well-defined Web addresses (over 200 of them). I rebuilt the World Bank Data Catalog as a simple spreadsheet and selected a few datasets to reformat and visualize. I did the same with the Open Kenya Data Catalog, rebuilding it and selecting a few data sets to reformat and visualize. The results can be summarized as follows:
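Much of the reformatting is reshaping a "wide" spreadsheet (one column per year) into a "long" table (one row per country, indicator, and year) that dashboards can filter and chart. This is a minimal sketch, assuming a layout of identifier columns followed by year columns, similar to World Bank indicator CSV downloads; the column names and sample values are placeholders, not real data.

```python
import csv
import io

# Assumed identifier columns; the year columns follow them.
ID_FIELDS = ("Country Name", "Country Code", "Indicator Name", "Indicator Code")


def wide_to_long(csv_text, id_fields=ID_FIELDS):
    """Turn rows with one column per year into one record per (row, year).

    Blank cells are skipped rather than stored as empty values.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids = {field: row[field] for field in id_fields}
        for column, cell in row.items():
            # Year columns have all-digit headers; skip ID columns and any
            # overflow cells that DictReader files under the key None.
            if column is None or column in id_fields or not column.strip().isdigit():
                continue
            if cell is None or not cell.strip():
                continue
            records.append({**ids, "Year": int(column), "Value": float(cell)})
    return records


# Placeholder input: the indicator name, code, and value are made up.
sample = (
    "Country Name,Country Code,Indicator Name,Indicator Code,2009,2010\n"
    "Kenya,KEN,Example indicator,EX.IND.1,12.5,\n"
)
records = wide_to_long(sample)
```

The long layout costs more rows but lets one filter (country, indicator, year range) serve every chart.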
  • One can search 1,257 indicators for 246 countries over 52 years: this is big data!
  • One can search five related tables of metadata associated with the indicator data set.
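Searching a long-format indicator table amounts to filtering on country, indicator, and year range, then joining the matching metadata by indicator code. The sketch below illustrates that idea with hypothetical records and a single hypothetical metadata table keyed by indicator code; it is not the structure of the actual World Bank metadata tables.

```python
# Hypothetical long-format records: one row per country, indicator, and year.
# Codes and values are placeholders, not real World Bank figures.
RECORDS = [
    {"Country Code": "KEN", "Indicator Code": "EX.IND.1", "Year": 2008, "Value": 1.0},
    {"Country Code": "KEN", "Indicator Code": "EX.IND.1", "Year": 2010, "Value": 2.0},
    {"Country Code": "TZA", "Indicator Code": "EX.IND.1", "Year": 2010, "Value": 3.0},
]

# Hypothetical metadata table keyed by indicator code.
METADATA = {"EX.IND.1": {"Indicator Name": "Example indicator", "Source": "Example source"}}


def search(records, country=None, indicator=None, years=None):
    """Filter records; None means 'any value' for that dimension.

    years is an inclusive (first, last) tuple.
    """
    results = []
    for rec in records:
        if country is not None and rec["Country Code"] != country:
            continue
        if indicator is not None and rec["Indicator Code"] != indicator:
            continue
        if years is not None and not (years[0] <= rec["Year"] <= years[1]):
            continue
        results.append(rec)
    return results


def with_metadata(records, metadata):
    """Attach the metadata row for each record's indicator (nothing if unknown)."""
    return [{**rec, **metadata.get(rec["Indicator Code"], {})} for rec in records]


# Usage: Kenya only, years 2009 through 2012, with metadata attached.
kenya = with_metadata(search(RECORDS, country="KEN", years=(2009, 2012)), METADATA)
```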
One can then select Kenya and individual indicators, which provide broader context for the following selected Open Kenya Data sets:
  • Poverty Rate, by District
  • Kenya Primary Schools, 2007
  • Health Facilities
  • Census Volume 1 Question 1 Population, Households and Density by Sublocations – 2009
  • County Expenditures by Administration 2002-3 to 2008-9
This way we have connected the global data (World Bank) with local data (Open Kenya Data).
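One simple way to connect a national (World Bank) figure with local (Open Kenya) rows is to attach the national value to each local row and compute the difference. This is a hedged sketch: the field names, indicator code, and numbers below are hypothetical placeholders, and it assumes both sources measure the same quantity in the same units.

```python
def add_national_context(local_rows, national_records, country, indicator, year, field):
    """Attach a national value and each local row's difference from it.

    Returns copies of local_rows; 'National Value' and 'Difference' are None
    when no matching national record exists.
    """
    national = next(
        (rec["Value"] for rec in national_records
         if rec["Country Code"] == country
         and rec["Indicator Code"] == indicator
         and rec["Year"] == year),
        None,
    )
    enriched = []
    for row in local_rows:
        new_row = dict(row)
        new_row["National Value"] = national
        new_row["Difference"] = None if national is None else row[field] - national
        enriched.append(new_row)
    return enriched


# Placeholder values only; not actual Kenyan or World Bank statistics.
districts = [{"District": "District A", "Poverty Rate": 40.0},
             {"District": "District B", "Poverty Rate": 55.0}]
national_rows = [{"Country Code": "KEN", "Indicator Code": "EX.POV", "Year": 2005, "Value": 45.0}]
context = add_national_context(districts, national_rows, "KEN", "EX.POV", 2005, "Poverty Rate")
```

A positive difference marks a district above the national level for that indicator and year.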
The Open Kenya data results show the following:
  • There are 516 types of information (data sets, maps, etc.), classified in 518 categories and topics
  • The five featured data sets are difficult to map because their geocoding is not given as straightforward Longitude and Latitude (as noted above)
  • Lamu County has the highest per capita expenditure in 2009-2010 ($420 per citizen)
  • The Rift Valley Province has the largest number of Health Facilities (1,645).
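Findings like these reduce to two small calculations: expenditure divided by population per county (then taking the maximum), and a count of facility rows per province. The sketch below shows both on made-up placeholder figures; it does not reproduce the actual Open Kenya values, and the field names are assumptions.

```python
from collections import Counter


def per_capita(expenditures, populations):
    """Expenditure divided by population for each county with a nonzero population."""
    return {
        county: amount / populations[county]
        for county, amount in expenditures.items()
        if populations.get(county)  # skips counties with missing or zero population
    }


def highest(values):
    """Return the (key, value) pair with the largest value."""
    key = max(values, key=values.get)
    return key, values[key]


def count_by(rows, field):
    """Count rows per value of field, e.g. health facilities per province."""
    return Counter(row[field] for row in rows)


# Placeholder figures, not the actual Open Kenya Data values.
spending = {"County A": 1000.0, "County B": 900.0}
population = {"County A": 10, "County B": 3}
print(highest(per_capita(spending, population)))  # ('County B', 300.0)

facilities = [{"Province": "P1"}, {"Province": "P1"}, {"Province": "P2"}]
print(count_by(facilities, "Province").most_common(1))  # [('P1', 2)]
```

Per capita figures depend on matching each county's expenditure year to a population figure for a comparable year, so that pairing is worth checking before ranking counties.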
Tweeting at the conference, Peter Speyer (@PeterSpeyer) said: “We are talking a lot about making data available, but need to talk more about putting data to work.”
I hear you, and I have put the 2012 IOGDC data to work right now, with some effort to inventory, clean up, reformat, and integrate the data to tell a story.