data.gov

GSA has now launched the Digital Services Innovation Center, a key piece of the White House’s new digital government strategy released in late May. The strategy was designed to ensure federal agencies use emerging technologies to serve the American people as effectively as possible through improved web services and mobile applications.

Over the next 10 months, the center is charged with meeting a number of specific digital strategy milestones to deliver digital services and government information anywhere, anytime and on any device.
(This article originally appeared as a blog on GSA.gov.)
The Center will engage agencies across government by serving as a virtual hub to accelerate innovative digital services. Initial efforts are underway to establish shared solutions and training that support infrastructure and content needs across the federal government, and to identify and provide performance and customer satisfaction measurement tools that improve service delivery. Keep reading →

Last week’s International Open Government Data Conference offered a lot of worthy ideas and insights. Among them was the Best of the Lightning Talks by Tariq Khokhar, World Bank Open Data Evangelist, and Jeanne Holm, Data.gov Evangelist, which summarized 29 different presentations made virtually during the initial days of the conference.

I have been compiling and auditing the presentations and materials from the conference, however, and found some real deficiencies suggesting that what was presented is not always as advertised. For instance: Keep reading →

World Bank Group President Dr. Jim Yong Kim today challenged governments and institutions to support a growing international movement to harness data in an effort to foster greater economic prosperity.

Speaking at an international conference on government data, in one of his first public appearances since arriving at the World Bank in Washington on July 2, Kim pointed to the bank’s own efforts to make its vast catalog of data available to the public, and to a growing community of data harvesting groups, as a way of addressing some of the world’s pressing economic issues. Keep reading →

Dr. John P. Holdren, Assistant to the President for Science and Technology and Director of the White House Office of Science and Technology Policy, and Minister Shri Vilasrao Deshmukh from the India Ministry of Science, Technology and Earth Sciences, led the second U.S.-India Joint Commission Meeting on Science and Technology Cooperation on Monday in Washington, D.C.

“I am happy to report that the bilateral relations have increased measurably in the areas of maritime, agriculture and biodiversity, basic and applied sciences, advanced telecommunications, energy and commercialization of new technologies,” Holdren (pictured above at the World Science Festival last month) said at the State Department’s George Marshall Center.

Deshmukh added: “We look forward to the recommendations of the group on basic and applied sciences, health and medical sciences; and atmospheric sciences.”

The principal accomplishments announced were:

  • Establishment of the Monsoon Desk at the National Oceanic and Atmospheric Administration (NOAA)
  • Presentation by the Endowment Board of certificates to three grant winners from the first round: Sorin Grama and Sam White, co-founders of Promethean Power Systems (US), and Rustom Irani, Managing Director, Icelings (India)
  • Announcement by Chris Vein, White House Deputy Chief Technology Officer, of the Third Country Open Government Platform Partnerships (OGPL), and brief demonstration by Marion Royal (US GSA) and Samir Mitra (India PM Advisor’s Office)
The Open Government Platform (OGPL) is a bilateral effort to promote transparency and greater citizen engagement by making government data, documents, tools and processes publicly available.

The idea is that making this information available on an open-source platform in machine-readable format will allow developers, analysts, the media, and academics an opportunity to develop new applications and insights, which will ultimately give citizens more information to facilitate better decisions.
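
To make that concrete, here is a minimal sketch of what machine-readable data buys a developer: a published CSV can be pulled straight into a program and aggregated, with no retyping from a report. The URL and the "amount" column are illustrative assumptions, not a real Data.gov resource.

```python
# Minimal sketch of consuming a machine-readable open-data resource.
# The URL and the "amount" column name are hypothetical.
import csv
import io
import urllib.request

DATASET_URL = "https://example.gov/open-data/spending.csv"  # illustrative

with urllib.request.urlopen(DATASET_URL) as resp:
    rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

# Because the format is machine-readable, aggregation is one line of code.
total = sum(float(row["amount"]) for row in rows)
print(f"{len(rows)} records, total amount: {total:,.2f}")
```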

I have expressed reservations about this in a previous story (Data.gov Goes To India – But It Still Needs More Work) because, in essence, technology people and data people speak different languages: to me it is not about what one does to the data (technology), but what one does with the data (science, statistics, and visualizations).

Here is the situation: Data.gov was not built on open source software, but it needed to be if anyone else was going to use it – especially poor third world countries that cannot afford their own developers or commercial software. Data.gov therefore needed a first-rate team of developers who could convert old, complex software code into new, simpler-to-use open source code. Enter the Government of India’s National Informatics Centre, which produced an open source version of Data.gov that was made available on the third anniversary of Data.gov (May 2012). The open source product, called the Open Government Platform (OGPL), can be downloaded and evaluated by any national government, or any state or local entity, as a path toward making their data open and transparent. Today Samir Mitra (India PM Advisor’s Office) announced that Rwanda will be the first third world country to use the OGPL.

Now this OGPL is based on Drupal, an open source platform already used by Data.gov.uk (see my Data.gov.uk – What’s Not to Like) and many others, including the new HealthData.gov launched by US Federal CTO Todd Park at his Health Datapalooza last week.

So where does this all leave us? We have Todd Park, the federal CTO, already using Drupal for his new HealthData.gov and announcing a series of developer challenges over the next year to build it out. We have Chris Vein, the Deputy Federal CTO, announcing that India has developed an open source version of Data.gov based on Drupal, which Data.gov will upgrade to and Rwanda will use. So we are converging on a platform that delivers the first of the three things we need: the Data Catalog, the Actual Data, and the Data Results.

To illustrate my point, I took the challenge that Todd Park gave me at last week’s Health Datapalooza and made the new HealthData.gov do all three things in one portal where one sees the Data Catalog, the Actual Data, and the Data Results. This implements the 7 challenges that Todd Park announced to further develop HealthData.gov over the next year and my recommendations at the recent Data.gov Developer Community meeting. It is also an example of Building a Digital Government by Example.

So, I say enough with putting old wine (Data.gov) in new bottles (Open Source Drupal), and on to the real needs of citizens everywhere, namely to go from Data Catalog, to Actual Data, to Data Results, so they can use it to make informed decisions.
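
To make the Data Catalog, Actual Data, Data Results progression concrete, here is a hedged sketch of all three stages in one short script. The catalog endpoint follows the CKAN-style package_search convention; the endpoint URL, the query, and the "State" column name are illustrative assumptions, not confirmed Data.gov details.

```python
# Sketch of the three stages argued for above:
# Data Catalog -> Actual Data -> Data Results.
import csv
import io
import json
import urllib.parse
import urllib.request
from collections import Counter

# 1. Data Catalog: search a CKAN-style catalog API for datasets on a topic.
CATALOG = "https://catalog.example.gov/api/3/action/package_search"  # illustrative
query = urllib.parse.urlencode({"q": "data center consolidation", "rows": 1})
with urllib.request.urlopen(f"{CATALOG}?{query}") as resp:
    dataset = json.load(resp)["result"]["results"][0]

# 2. Actual Data: follow the catalog entry to a machine-readable resource.
csv_url = next(r["url"] for r in dataset["resources"]
               if r.get("format", "").upper() == "CSV")
with urllib.request.urlopen(csv_url) as resp:
    rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

# 3. Data Results: reduce the raw rows to something a citizen can act on.
print(Counter(row["State"] for row in rows).most_common(5))
```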


The latest data table at Data.gov featuring an interactive snapshot of the government’s progress in consolidating data centers is nice to look at (“eye candy” as some might say). But there are two big problems with it:

  • First, it is not real data that can be copied directly into a spreadsheet and reused (try highlighting it and copying it into a spreadsheet – it fails); and
  • Second, when you do download the spreadsheet from the Socrata interface, it has to be reformatted before the data can be mapped, because the “Data Center Location” column is not formatted properly (among other issues, the latitude and longitude need to be in separate columns, without text); see the cleanup sketch after this list.
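
Here is that cleanup sketch: a hedged example of splitting the combined location column into numeric latitude and longitude columns. The sample format, with the coordinates in parentheses, is an assumption; the real Socrata export may differ.

```python
# Hedged sketch of the reformatting step: split the combined
# "Data Center Location" column into numeric latitude/longitude columns.
# Assumed sample format: "City, ST (38.89, -77.03)".
import re
import pandas as pd

df = pd.read_csv("data_center_consolidation.csv")  # the downloaded export

COORDS = re.compile(r"\((-?\d+\.?\d*),\s*(-?\d+\.?\d*)\)")

def split_location(value):
    """Return (lat, lon) as floats, or (None, None) if no coordinates."""
    match = COORDS.search(str(value))
    if not match:
        return None, None
    return float(match.group(1)), float(match.group(2))

coords = df["Data Center Location"].apply(split_location)
df["latitude"] = coords.str[0]
df["longitude"] = coords.str[1]

# Rows without usable coordinates can now be counted or filtered out
# before mapping.
print(df["latitude"].isna().sum(), "rows lack usable coordinates")
```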

Perhaps more importantly, the table still does not deliver a result that the public and decision makers can use without some additional work.

I have done a good deal of that work for two previous stories with details elsewhere. Over that time, the number of data centers listed in the table has grown:

  • 6/18/2011: “2010-2011” – 137 data centers (first story)
  • 7/21/2011: “2010-2012” – 373 data centers (second story)
  • 1/12/2012: “2010-2012” – 525 data centers (current data set)
I reviewed the current data table myself (a sketch of this audit follows the lists below) and it shows:

  • 525 rows in the table
  • 158 without locations altogether
  • 33 without longitude and latitude

In addition it shows:

  • 149 data centers closed between the initiative kickoff (2/26/2010) and the report date (11/15/2011)
  • 66 to be closed between 11/15/2011 and 12/31/2011
  • 310 to be closed between 1/1/2012 and 12/31/2012
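
Here is the audit sketch promised above, a hedged example of how counts like these can be reproduced from the downloaded CSV. The column names are assumptions about the export's layout, not confirmed Socrata field names.

```python
# Hedged sketch of the audit behind the counts above.
import pandas as pd

df = pd.read_csv("data_center_consolidation.csv")
closing = pd.to_datetime(df["Closing Date"], errors="coerce")

print(len(df), "rows in the table")
print(df["Data Center Location"].isna().sum(), "without locations")
print(closing.between("2010-02-26", "2011-11-15").sum(),
      "closed between kickoff and the 11/15/2011 report")
print(closing.between("2011-11-16", "2011-12-31").sum(),
      "to be closed in the rest of 2011")
print(closing.between("2012-01-01", "2012-12-31").sum(),
      "to be closed during 2012")
```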

It appears that additional information about data centers continues to be released for the same or different years, but the data continues to suffer from two important gaps: missing locations and no cost savings data.

There is a real disconnect between this table and a statement in the recent GSA Office of Citizen Services and Innovative Technologies 2011 Annual Report, which claims:

“Data Center Consolidation savings by the end of 2015 are expected to be $3 billion, based on analysis of information provided in October, which shows that agencies plan to close 472 data centers by the end of next year (do they mean 2012 or 2013?).”

Note that 472 is yet another number different from 525 in the most recent data set.

And it would be nice to see a column for the cost savings by data center, so citizens can see the individual closures and savings in their own locations.

So I say this is progress in accountability to taxpayers and transparency in reporting, but it still does not give us real data that I, as a data scientist, and the readers I am working for can readily use to support decisions and understanding.

I recently was involved in a discussion debating the successes of Open Government.


Some of the individuals in the discussion felt the success of the Open Government initiative was the creation of Data.gov, but I disagreed, saying that it is only a data catalog and that even the featured data sets are difficult to use and understand.

What we really need are data apps and data stories that the public and decision-makers can use to justify funding and claim success.

Of the 25 most popular apps at Data.gov, most were XML feeds; only 5 were Excel files that I could easily turn into data apps. Keep reading →
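
For those feeds, a small script can at least flatten the XML into spreadsheet-style rows. This is a hedged sketch assuming a generic RSS-style structure with item, title, and link elements, not any specific Data.gov product.

```python
# Hedged sketch: flatten a generic RSS-style XML feed into a CSV table.
import csv
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.gov/feed.xml"  # illustrative

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.parse(resp).getroot()

with open("feed_as_table.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["title", "link"])
    for item in root.iter("item"):
        writer.writerow([item.findtext("title"), item.findtext("link")])
```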


This week at the SemTech Biz DC Conference, Jim Hendler, advisor to Data.gov, explained the history of the “friendly competition” between the US Data.gov and the UK’s Data.gov.uk, and said that the latter had about 6,000 data sets that were in better shape than the former’s. So I decided to take another look and was very impressed.

Hendler also said that the UK Government has designed and made great use of standard Web address practices in their linked data and moved even further ahead of the US in open data with creation of the Open Data Institute. Keep reading →
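
For readers who want to try the UK’s linked data themselves, here is a hedged sketch of a query sent over the standard SPARQL 1.1 HTTP protocol. The endpoint URL and the use of Dublin Core titles are illustrative assumptions, not a documented Data.gov.uk service.

```python
# Hedged sketch: query a linked-data SPARQL endpoint over standard HTTP.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://example.data.gov.uk/sparql"  # illustrative
QUERY = """
SELECT ?dataset ?title WHERE {
  ?dataset <http://purl.org/dc/terms/title> ?title .
} LIMIT 5
"""

params = urllib.parse.urlencode({"query": QUERY})
req = urllib.request.Request(
    f"{ENDPOINT}?{params}",
    headers={"Accept": "application/sparql-results+json"})
with urllib.request.urlopen(req) as resp:
    for binding in json.load(resp)["results"]["bindings"]:
        print(binding["dataset"]["value"], "-", binding["title"]["value"])
```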

When Data.gov first launched, I thought it just was for tabular data sets. Then it expanded to include thousands of geospatial data sets. At the time, I thought it needed a geospatial data viewer so I created one that worked with both the tabular and geospatial data sets. Keep reading →

Someone suggested I review the new IBM Center for The Business of Government report on Use of Dashboards in Government by Sukumar Ganapati, Florida International University, pointing out one irony off the bat: there aren’t a lot of dashboard illustrations in the report itself. So I first decided to create a dashboard of the PDF report in my social knowledgebase, use it to analyze the report, and reference all of my dashboard work relating to most of its examples.

The report lists the following 11 dashboards (with links to my 7 recreated dashboards added): Keep reading →


I was recently asked to present my Linked Open Data work to the Data.gov Semantic Web and Linked Open Data Team.

One of the examples I presented was work being done by The New York Times to catalog its headings and topics. It represents a best-practice example of what government agencies could and should do, and I wanted to share it with you, our readers, to help you understand the value of doing this with high-quality data sets.

For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, they began to publish this vocabulary using a methodology known as linked open data (illustrated above). The New York Times also uses approximately 30,000 tags to power their Times Topics Pages.

It is their intention to publish all of these tags as linked open data. Linked open data enables all of us to use the NY Times data and other data. In the illustration above, each circle represents a source of linked data and the other sources of data it is linked (related) to.
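
Here is a minimal sketch of what following one of those links means in practice: dereference a subject-heading URI, parse the RDF document it returns, and list the owl:sameAs links that connect it to other sources (the lines between circles in the illustration). The URI below is illustrative, not a real Times identifier.

```python
# Hedged sketch: dereference a linked-open-data URI and list its
# owl:sameAs links to other data sources.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

heading = URIRef("http://data.nytimes.com/example-heading")  # illustrative
graph = Graph()
graph.parse(heading)  # fetches and parses the RDF document at that URI

for _, _, linked in graph.triples((heading, OWL.sameAs, None)):
    print("linked to:", linked)
```

With government vocabularies published the same way, an agency’s subject headings could be cross-linked to other public data sources just as the Times headings are.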

I have published both NY Times data sets as linked open data in Spotfire, a software tool that captures data in convenient ways, so readers can more readily browse, search, and download these invaluable data sets! The Spotfire chart is published to the cloud, as is the documentation for this story in the MindTouch Technical Communication Suite.


Please give me your feedback on this data chart and suggestions for future data charts and stories! bniemann@cox.net
