I have been attending the 2012 International Open Government Data Conference. The big announcement at the conference is that the number of Open Government data sets in the IOGDS Catalog has surpassed 1 million. Specifically: 1,022,787 datasets from 192 catalogs in 24 languages representing 43 countries and international organizations.

Interestingly, I was not able to obtain the actual dataset to verify that figure, and the announcement speaks only to quantity, which certainly does not compensate for quality.

The interactive workshop Introduction to Open Government Data gave participants the opportunity to learn more about how and why countries share open data and the tangible benefits to countries and citizens that come from open data.

The workshop also addressed the technologies, policies, and processes that make an open data effort successful. The interactive discussions touched on the following topics, which in many ways reflect the state of the open government data community:

  • Understanding the Foundations of Open Data – Having some mandate or directive to do so
  • Making Data Open, Accessible, and Discoverable – Getting people to release their data
  • Creating an Open Data Architecture – Having a platform to access and discover data and build apps
  • Creating an Open Data Ecosystem – Dealing with change management (policies, culture, compliance)
  • Measuring the Benefits – Very difficult to do
  • Summary and Next Steps – Go out and build your own Data.gov
I had the honor of being invited to add to that perspective by presenting my take (slides) on the Department of Commerce App Challenge: Big Data Dashboards as part of the conference’s virtual session on best practices from around the world in putting data to work.

After hearing the first session, I decided to give the conference participants and readers a broader perspective on what Open Data and Data Science Analytics are.

Using the letters in OPEN DATA, I provided the following criteria to document the public benefits, and offered the EPA Envirofacts Warehouse API as an example:

O: Not previously Open to the public. Much of the “Open data” was already available and is just being re-advertised, such as the EPA Envirofacts Warehouse APIs.

P: Serves a Purpose. There is a reason the data was collected that clearly serves a real purpose – e.g. Congressional redistricting. (EPA Envirofacts data, for instance, was Congressionally mandated for protection of human health and welfare.)

E: Educates citizens and politicians to take action with results that provide a valid basis for action. (EPA Envirofacts Web Site has over 2500 Web pages of actionable information.)

N: Made Newsworthy by journalists. Results are communicated objectively and effectively.

D: The plural of Datum. Something given or admitted especially as a basis for reasoning or inference. (EPA has data standards and quality assurance methods for these data.)

A: Actual numbers that a citizen, scientist, statistician, etc. can understand and work with.

T: Transparent. You can see where the data came from, how it was analyzed, and where the results came from, and metadata is provided and combined with the new data APIs.

A: Answers questions posed by the above.

Then I created an example of IOGDS Data Science Analytics for the 2012 IOGDC that included:
  • IOGDC Conference Knowledge Bases
  • IOGDS Catalog Data Sets
  • IOGDS Data Analytics with BI Tools
The first is based on the Conference Agenda, Bios, Tutorial, and Virtual Conference content turned into data.

The second is based on the big announcement at the conference that the number of Open Government data sets in the IOGDS Catalog has surpassed 1 million.

The last is based on joint work on Exploiting Linked Data with Business Intelligence Tools with Kingsley Idehen, CEO, OpenLink Software. Please note that the IOGDS Data Analytic graphics are images, not a live interactive dashboard like the one I have provided.

Hopefully, next year’s conference theme will be the Emergence of Data Science to produce benefits from Open Government Data.

Even before the 2012 IOGDC ends this week, attendees and onlookers have a tutorial on how to create open data products of value to citizens (Digital Agenda for Europe: Data a First Class Citizen), why better data beats more data, and why mature, industrial-strength business intelligence tools are needed for live interactive modeling and visualizations.