The latest data table at Data.gov featuring an interactive snapshot of the government’s progress in consolidating data centers is nice to look at (“eye candy” as some might say). But there are two big problems with it:

  • First, it is not real data that can be copied directly into a spreadsheet and reused (try highlighting it and copying to a spreadsheet – it fails); and
  • Second, when you do download the spreadsheet from the Socrata interface, it has to be reformatted before the data can be mapped, because the “Data Center Location” column is not formatted properly. Among other issues, the latitude and longitude need to be in separate columns and free of text.
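
The reformatting described in the second point can be sketched in a few lines of Python. This is only a minimal sketch, assuming each “Data Center Location” cell holds free text followed by a parenthesized "(latitude, longitude)" pair; the actual export layout may differ, and the function name `split_location` and the sample value are hypothetical.

```python
import re

# Assumption: coordinates appear as "(lat, lon)" somewhere in the cell text.
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def split_location(cell):
    """Return (text, latitude, longitude); coordinates are None when absent."""
    if cell is None:
        return "", None, None
    match = COORD_RE.search(cell)
    if match is None:
        # Location text only (or an empty cell): nothing to map.
        return cell.strip(), None, None
    # Remove the coordinate pair from the text and convert it to numbers.
    text = (cell[:match.start()] + cell[match.end():]).strip()
    return text, float(match.group(1)), float(match.group(2))


# Hypothetical sample value, not taken from the actual data set.
sample = "Washington, DC (38.8951, -77.0364)"
print(split_location(sample))
```

Rows that come back with `None` coordinates are the ones that cannot be placed on a map without further lookup.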

Perhaps more importantly, the table still does not deliver a result that the public and decision makers can use without some additional work.

I have done a good deal of that work for two previous stories with details elsewhere. Over that time, the number of data centers listed in the table has grown:

  • 6/18/2011: “2010-2011” – 137 data centers (first story)
  • 7/21/2011: “2010-2012” – 373 data centers (second story)
  • 1/12/2012: “2010-2012” – 525 data centers (current data set)

I reviewed the current data table, and it shows:

  • 525 rows in the table
  • 158 without locations altogether
  • 33 without longitude and latitude

In addition, it shows:

  • 149 data centers closed between the initiative kickoff (2/26/2010) and the report (11/15/2011)
  • 66 to be closed between 11/15/2011 and 12/31/2011
  • 310 to be closed between 1/1/2012 and 12/31/2012
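
As a quick sanity check, the three closure counts above can be added up and compared with the number of rows in the table. The snippet below is only an illustrative sketch using the figures quoted in this post; the dictionary labels are my own shorthand.

```python
# Closure counts as listed above (labels are shorthand, not column names).
closures = {
    "closed 2/26/2010 to 11/15/2011": 149,
    "to be closed 11/15/2011 to 12/31/2011": 66,
    "to be closed 1/1/2012 to 12/31/2012": 310,
}

rows_in_table = 525  # row count of the current data set

total = sum(closures.values())
print(total, total == rows_in_table)
```

The three categories account for every row in the table, so each listed data center falls into exactly one closure period.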

It appears that additional information about data centers continues to be released for the same or different years, but the data still suffers from two important gaps: missing locations and no cost savings data.

There is a real disconnect between this table and a statement in the recent GSA Office of Citizen Service and Innovation Technologies 2011 Annual Report, which claims:

“Data Center Consolidation savings by the end of 2015 are expected to be $3 billion, based on analysis of information provided in October, which shows that agencies plan to close 472 data centers by the end of next year [do they mean 2012 or 2013?].”

Note that 472 is yet another number, different from the 525 in the most recent data set.

It would also be nice to see a column giving the cost savings for each data center, so citizens can see the individual closures and savings in their own locations.

So I say this is progress in accountability to taxpayers and transparency in reporting, but it still does not give us real data that can be readily used to support decisions and understanding, either by me as a data scientist or by the readers I am working for.