After my recent SafetyData.gov review (“Long On Text, Short On Data Tables”), I resolved to review the new Energy Data.gov beta web site and check some of the claims presented at the White House’s Energy Datapalooza held earlier this month.

I decided the best place to start was the Energy Datapalooza fact sheet because it did not contain any links to actual energy data. Energy Data.gov says “data and insight are combined to facilitate public discussion and awareness of our Nation’s energy activities.”

So I tried to match the facts to the substance using a knowledge base found elsewhere. Here’s what the fact sheet contained – and what I found:

Administration Announcements

  • New Application Programming Interfaces (APIs): 4 – (My comment: I only found three in the Presidential Innovation Fellows Blog and they were not APIs – see below.)
  • New Data for Entrepreneurs and Innovators: 2 – (My comment: Energy.Data.Gov with more than 900 data sets – not really – see below, and 20 new datasets from DOE – actually only 19 that require closer inspection.)
  • New Events and Challenges: 2 – (My comment: At Apps for Energy there are 9 winners from 56 submissions with no new challenges.)
  • New Green Button Integration – (My question: Where do I find how to do this? See below for the work it requires.)
Private Sector Commitments
  • Green Button Commitments: Future promises
  • New Consumer Data Protections: Future promises
The three “APIs”:

API for electricity generation, consumption and retail sales from the Energy Information Administration – (My comment: You download the data as Excel.)

API for the “Find and Compare Cars” data on the FuelEconomy.gov site – (My comment: You download the data as a ZIP of CSV.)

API for biomass data from the “Billion Ton Report” – (My comment: You register and wait for approval to access the data, but it is not an API. Two days later, I received access, then downloaded and unzipped a file that gave me 1.1 GB of DAT files with no specific structure. Not what I want to do!)

These are not APIs, although the DoE Energy Information Administration does have APIs in beta testing. So how can this post-Energy Datapalooza blog statement be true: “One of the most exciting announcements for the web technology community was the official unveiling of three new government APIs that provide rapid access to raw and frequently updated data.”?

So on to the next claim: “DOE announced that the number of datasets available in Energy.Data.Gov – a central discovery engine for federal government datasets, data visualization tools, mobile apps, and more – has doubled in less than three months. It now contains more than 900 federal datasets and technologies that support a growing open data ecosystem.”

As a data scientist, I did the same thing I did to audit the new Safety.Data.gov data sets: I rebuilt the Energy Data.gov catalog as linked data with faceted search so I could provide statistics and visualizations in a dashboard shown elsewhere.

Some facts of interest for the “908 federal data sets and technologies”: Only 561 of the 908 are CSV or XLS. Only 132 of those CSV and Excel files are from DoE (DoE, DoE EIA, DoE NREL, and DoE OSTI). Many of the titles and descriptions do not seem to relate to energy data sets and technologies at all! My favorite example was “U.S. Bell and Chile Pepper Statistics”.
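For readers who want to repeat this kind of tally on their own export of the catalog, here is a minimal sketch. The record layout (title, agency, format) and the sample rows are my assumptions for illustration only; they are not real Energy Data.gov records, and the real catalog export may use different field names.

```python
from collections import Counter

# Hypothetical catalog rows as (title, agency, format).
# These are made-up placeholders, NOT real Energy Data.gov entries.
SAMPLE_CATALOG = [
    ("Sample dataset A", "DoE EIA", "XLS"),
    ("Sample dataset B", "DoE NREL", "CSV"),
    ("Sample dataset C", "USDA", "CSV"),
    ("Sample dataset D", "EPA", "XML"),
    ("Sample dataset E", "DoE OSTI", "PDF"),
]

# Formats counted as directly usable tables, and the agency labels
# treated as "DoE" (the same four groups named in the text).
TABULAR_FORMATS = {"CSV", "XLS"}
DOE_AGENCIES = {"DoE", "DoE EIA", "DoE NREL", "DoE OSTI"}


def summarize_catalog(records):
    """Count records overall, by format, tabular, and tabular-from-DoE."""
    by_format = Counter(fmt.upper() for _, _, fmt in records)
    tabular = [r for r in records if r[2].upper() in TABULAR_FORMATS]
    doe_tabular = [r for r in tabular if r[1] in DOE_AGENCIES]
    return {
        "total": len(records),
        "tabular": len(tabular),
        "doe_tabular": len(doe_tabular),
        "by_format": dict(by_format),
    }


summary = summarize_catalog(SAMPLE_CATALOG)
print(summary)
```

Run against the full catalog, the same three counts would correspond to the 908, 561, and 132 figures above.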

Next I looked more closely at three real data sets to see if they were usable and useful as follows:

  • Federal Government Energy Use Trends: Total energy use and petroleum use went down during 1975-2007, while use of other forms of energy (natural gas, coal, steam, and other) remained nearly level.
  • Energy Generation: EPA’s Emissions and Generation Resource Integrated Database (eGRID) shows nearly 5500 power plants in the United States, with their annual CO2 emissions generally increasing with plant capacity. The one notable exception is the Grand Coulee Dam in Washington State, which at 6809 megawatts of capacity (the largest by far) generates zero CO2 emissions, as do nuclear power plants.
  • Energy Consumption Green Button: This is the new initiative to help consumers see the details of their home energy consumption, but the problem is they have to be able to download and import an XML file into Excel, visualize it and decide what action to take. This is where app developers need to provide tools to simplify this and make the options to reduce energy consumption clear to the average citizen.
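To give a sense of the work the Green Button item involves, here is a minimal sketch of reading interval readings out of a Green Button style XML file without going through Excel. I am assuming the file follows the ESPI layout (an Atom feed containing `IntervalReading` elements in the `http://naesb.org/espi` namespace); the sample XML and its values are invented for illustration, and the units of `value` depend on the file's reading-type metadata, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Assumed ESPI namespace used by Green Button files.
ESPI_NS = "http://naesb.org/espi"
NS = {"espi": ESPI_NS}

# Invented sample data: three hourly readings. Not a real customer file.
SAMPLE_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content>
      <IntervalBlock xmlns="http://naesb.org/espi">
        <IntervalReading>
          <timePeriod><duration>3600</duration><start>1349049600</start></timePeriod>
          <value>450</value>
        </IntervalReading>
        <IntervalReading>
          <timePeriod><duration>3600</duration><start>1349053200</start></timePeriod>
          <value>620</value>
        </IntervalReading>
        <IntervalReading>
          <timePeriod><duration>3600</duration><start>1349056800</start></timePeriod>
          <value>380</value>
        </IntervalReading>
      </IntervalBlock>
    </content>
  </entry>
</feed>"""


def read_intervals(xml_text):
    """Return a list of dicts with start (epoch seconds), duration, and raw value."""
    root = ET.fromstring(xml_text)
    readings = []
    for node in root.iter("{%s}IntervalReading" % ESPI_NS):
        readings.append({
            "start": int(node.findtext("espi:timePeriod/espi:start", namespaces=NS)),
            "duration": int(node.findtext("espi:timePeriod/espi:duration", namespaces=NS)),
            "value": int(node.findtext("espi:value", namespaces=NS)),
        })
    return readings


readings = read_intervals(SAMPLE_XML)
total = sum(r["value"] for r in readings)
peak = max(readings, key=lambda r: r["value"])
print("intervals:", len(readings), "total:", total, "peak at:", peak["start"])
```

Even this small step (total use and the peak interval) is more than the average citizen will do by hand, which is why the tooling matters.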
Finally, I examined the 183 Tweets and commented on some:

US CIO @StevenVDC: Have now doubled the number of #EnergyData sets on @USDataGov. “Over 900” now. #opendata #egov

Yes, but not all are “energy data sets”

DepAdmin Bob Perciasepe @EPAgov has 40 datasets available on http://data.gov #opendata #energydata


Actually 92

EPA DepAdmin Bob Perciasepe We have a lot of data, but we can’t do this (exploit) ourselves. #energydata #opendata

Why not?

RT @digiphile: .@RiggsKubiak wants to build a “LinkedIn for buildings” at @HonestBuildings: http://bit.ly/GHaFaQ #opendata #energydata


Good idea!

#Lucid CEO #MichaelMurray describes his work to leverage building #EnergyData to create information dashboards.

Yes, I did that

This is huge. RT @aolgov: More about new energy open data portal at http://bit.ly/U10hmb #energydata #opendata #digitialgov #innovation


.@DataMarket to launch new #energydata vertical. New interview with CEO @hjalli: http://bit.ly/R78Y8D #strataconf #opendata

Good idea, but it needs to go a step further to deliver results like I did here!

So there are some usable and useful data sets if one goes looking and has the tools and experience to work with them.

However, this also reminds me of an earlier story, “GSA’s Energy Usage Analysis System Illustrates Data.gov’s Limitations”, where I had difficulty finding the metadata and any useful results from this large data set featured at Data.gov.

At the Health Datapalooza, Todd Park agreed with my comments on the need to audit new Data.gov community web sites to see if they delivered on their claims and could produce innovation.

So my fact checking of the new Energy Data.gov and the Energy Datapalooza found that they come up short on their claims. More work needs to be done to back up the fact sheet with real data and to demonstrate the value of this activity to decision makers and to the citizens who are paying for it.