Big data can mean a lot of things to different federal agencies. To the Department of Energy, big data not only means managing an information sharing network to promote big science, but also making the results of that research available to the public.
This information can be blended together in a variety of ways, depending on the end users’ needs, explained Robert Bectel, CTO and senior policy advisor at the DOE’s Office of Energy Efficiency and Renewable Energy (EERE). Speaking at a recent federal IT event, he said that as one of the department’s technology evangelists, his goal is to make sure that taxpayers get the most out of their money by allowing federal workers to do the most on the job.
“I want to mash that big data and put it into the hands of people when they want it, where they want it,” he said at the Lowering the Cost of Government with IT Summit last week in Washington, D.C.
Part of Bectel’s job is to visit DOE laboratories, learn about the technologies being developed there and then promote them.
“The taxpayers paid for it. They don’t even know it was made. Let’s bring it to those people. It’s big data,” he said.
One of the ultimate goals of big data is to be able to combine and share information on the fly. The Pacific Northwest National Laboratory is working on a system called the Precision Information Environment, which will allow users such as first responders to share and blend data in real time. Presenting a video produced by the laboratory that demonstrates a variety of information sharing technologies, Bectel noted that all of the technology shown already exists, even if only in laboratory prototype form.
The DOE is also involved in a variety of big data efforts focused on public access. The EERE has a site called OpenEI.org. It was built as a Wikimedia community to acquire, curate and consume data, Bectel said. He described the site as an “electromagnetic rail gun” that allows him to collect data and shoot it out to other organizations’ web sites. The goal of the site is to provide collected data sets and other content to make other organizations’ web sites better, he said.
Another site is BioEnergyKDF.net, which is run by the Oak Ridge National Laboratory and contains 4,000 sets of bioenergy-related data. The goal of this site is to consume and mash up massive amounts of information and to perform analysis that enables users to do their jobs better. Like OpenEI, the BioEnergyKDF site is also meant to enable collaboration, Bectel said.
The DOE is also looking into ways to use social media more efficiently. One effort involves tracking reactions to social media. Bectel noted that the DOE produces volumes of social media commentary and messages, but they are essentially fired into the Internet without any means of tracking them.
The EERE is about to deploy a system that will map in real time any public social media feed that is being used by the DOE. The capability, known as sentiment analysis, will cover any campaign or effort by the department, and results can be tracked through any type of search query. The user can define search criteria, apply smart tags and come up with near-real-time measurements of any queried subject matter. For example, it will allow department users to look up a query, put it in a widget available within the DOE firewall and perform real-time analysis for the department’s marketing and communications staff. The goal of this capability is to allow government to do its job “better, faster, cheaper,” by being able to track and react to public sentiment in near real time, he said.
The DOE is developing a visual patent search tool. Bectel noted that when he became CTO, he inherited a technology transfer enterprise that had 27 different technology transfer web sites across the DOE enterprise. “That’s a nightmare,” he said. Out of this jumble, he found one site using a very powerful back end search engine.
One problem was that the keyword search capability for the DOE’s patent database was inadequate to locate many technologies through a general search, Bectel said. Working with the goal of making patents “sexy,” his staff collaborated with the Pacific Northwest National Laboratory to build a search taxonomy that can seek documents out by their inventors, laboratories or the ratio of their intellectual property to all the other domains that have been created. This process applies to 16,000 patents and was coded for natural language processing.
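As a rough illustration of what searching by taxonomy facets rather than keywords can look like, the sketch below filters a handful of records by inventor and laboratory. It is not the DOE’s tool: the `Patent` record, the `search` function and the sample entries are all hypothetical, invented here only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patent:
    # Hypothetical record layout; the real database schema is not described.
    title: str
    inventors: tuple
    laboratory: str


def search(patents, inventor=None, laboratory=None):
    """Return the patents that match every facet the caller supplied."""
    results = []
    for patent in patents:
        if inventor is not None and inventor not in patent.inventors:
            continue
        if laboratory is not None and patent.laboratory != laboratory:
            continue
        results.append(patent)
    return results


# Placeholder records, invented for illustration only.
SAMPLE = [
    Patent("Example patent A", ("Inventor One",), "Lab X"),
    Patent("Example patent B", ("Inventor One", "Inventor Two"), "Lab Y"),
    Patent("Example patent C", ("Inventor Three",), "Lab X"),
]

if __name__ == "__main__":
    for hit in search(SAMPLE, laboratory="Lab X"):
        print(hit.title)
```

Leaving a facet unset simply skips that filter, so the same function can serve a broad browse by laboratory or a narrow lookup that combines inventor and laboratory.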
The next step is to put software into this search taxonomy. Bectel noted that federal agencies create software tools every day, but few commercial groups know where to get this software or how to license and use it. His goal is to make this software accessible to citizens.
After software, Bectel plans to load projects into the database and make them searchable through natural language taxonomy. This would allow visual search capabilities on this data. Once data, software, projects and patents are available, this information will be combined to create an energy data ontology.
“I want to make it possible for Harriet Homeowner to superscientist Nobel laureates to come in and find what they’re looking for,” he said.