When Data.gov first launched, I thought it was just for tabular data sets. Then it expanded to include thousands of geospatial data sets. At the time, I thought it needed a geospatial data viewer, so I created one that worked with both the tabular and geospatial data sets.

Data.gov’s caretakers must then have decided that there was still not enough geospatial data at Data.gov, because they recently announced that Geospatial One Stop (GOS), the national geospatial catalog, has migrated to Data.gov. In essence, the government’s non-spatial and spatial data catalogs have been integrated. All that remains is to add the tools and people to work with all of that data.

Data.gov realized that and added Socrata and the Geospatial Platform. They have also added communities to work with all that data (Energy, Health, Law, Open Data, Semantic Web, and Restore the Gulf), but what I do not see is those communities actually working with all these data.

So I looked around for a way to do some match-making here and recalled that I participated in an excellent GeoData 2011 Workshop with lots of data scientists (like Professor Peter Fox, one of the co-chairs, pictured above) and people from other disciplines who really want to work collaboratively to deliver results for decision-makers and the public.

So I say to Data.gov: if you build/aggregate it, they may not come, unless you include them in the actual building so they feel part of it from the start and it is something they can actually use.

I learned that lesson again recently when I participated in the EarthCube Charrette and saw how collaboration (building trusted and value-driven relationships between data scientists) has to precede building things by aggregation, as Data.gov has done.

The GeoData 2011 Workshop focused on three aspects that add real value and sustainability to working with both scientific and government data:

  • Data Life Cycle: The data life cycle is a term coined to represent the entire process of data management. It starts with concept study and data collection, but importantly has no end, as data is continually repurposed, creating new data products that may be processed, distributed, discovered, analyzed and archived (see Data Life Cycle). Fully supporting the different steps in the life cycle puts demands on metadata, standards, tools, and people.
  • Data Citation: Data citation is the process of uniquely identifying datasets in a manner that can be indexed and sustained over time.
  • Data Integration: Data integration occurs when the Data Discovery and Data Analysis steps of the data life cycle combine data from multiple sources, for example by pulling data from a variety of distributed, heterogeneous sources to address complex issues such as climate.
I have learned a lot about data science from Professor Peter Fox’s graduate class, and I admire the way he has tried to help multiple communities of practice get to data science and the data life cycle. I think that Data.gov managers and our federal chief technology officer, Aneesh Chopra, who said recently that “government needs data science and data scientists”, should seek his advice and leadership in this.