Brand Niemann

 

Posts by Brand Niemann

The Combating Terrorism Center (CTC) recently released a cool analysis of the Osama bin Laden letters, performed by Recorded Future. Basically, Recorded Future took the raw documents and ran them through a unique instance of its platform so that all of its visualization tools could be used against the 175 pages of letters that were released.


At a recent big data conference I realized that volume and value really matter if you are going to work with big data effectively.

Dr. John P. Holdren, Assistant to the President for Science and Technology and Director of the White House Office of Science and Technology Policy, and Minister Shri Vilasrao Deshmukh from the India Ministry of Science, Technology and Earth Sciences, led the second U.S.-India Joint Commission Meeting on Science and Technology Cooperation on Monday in Washington, D.C.

“I am happy to report that the bilateral relations have increased measurably in the areas of maritime, agriculture and biodiversity, basic and applied sciences, advanced telecommunications, energy and commercialization of new technologies,” Holdren (pictured above at the World Science Festival last month) said at the State Department’s George Marshall Center.

Deshmukh added: “We look forward to the recommendations of the group on basic and applied sciences, health and medical sciences; and atmospheric sciences.”

The principal accomplishments announced were:

  • Establishment of the Monsoon Desk at the National Oceanic and Atmospheric Administration (NOAA)
  • Presentation by the Endowment Board of certificates to three grant winners from the first round: Sorin Grama and Sam White, co-founders of Promethean Power Systems (US), and Rustom Irani, Managing Director, Icelings (India)
  • Announcement by Chris Vein, White House Deputy Chief Technology Officer, of the Third Country Open Government Platform Partnerships (OGPL), and brief demonstration by Marion Royal (US GSA) and Samir Mitra (India PM Advisor’s Office)
The Open Government Platform (OGPL) is a bilateral effort to promote transparency and greater citizen engagement by making government data, documents, tools and processes publicly available.

The idea is that making this information available on an open-source platform in machine-readable format will allow developers, analysts, the media, and academics an opportunity to develop new applications and insights, which will ultimately give citizens more information to facilitate better decisions.
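To make “machine-readable” concrete, a developer can pull a dataset straight off such a platform and start computing with it in a few lines. The sketch below is an illustration of the pattern only; the catalog URL and dataset are hypothetical placeholders, not actual OGPL or Data.gov endpoints.

    # Minimal sketch: fetch a machine-readable dataset from an open-data
    # catalog and compute a simple summary. The URL is a hypothetical
    # placeholder; substitute a real CSV resource from the catalog you use.
    import csv
    import io
    import urllib.request

    DATASET_CSV_URL = "https://catalog.example.gov/dataset/hospital-charges.csv"  # hypothetical

    with urllib.request.urlopen(DATASET_CSV_URL) as resp:
        text = resp.read().decode("utf-8")

    rows = list(csv.DictReader(io.StringIO(text)))
    print(f"{len(rows)} records")
    if rows:
        print("columns:", ", ".join(rows[0].keys()))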

I have expressed reservations about this in a previous story (Data.gov Goes To India – But It Still Needs More Work) because, in essence, technology people and data people speak different languages: to me it is not about what one does to the data (technology), but what one does with the data (science, statistics, and visualizations).

Here is the situation: Data.gov was not built on open source software, but needed to be if anyone else was going to use it – especially poor third world countries that cannot afford their own developers or commercial software. But Data.gov needed a first-rate team of developers who could convert old, complex software code into simpler, easier-to-use open source code. Enter the Government of India’s National Informatics Centre, which produced an open source version of Data.gov that was made available on the third anniversary of Data.gov (May 2012). The open source product, called the Open Government Platform (OGPL), can be downloaded and evaluated by any national government or state or local entity as a path toward making their data open and transparent. Today Samir Mitra (India PM Advisor’s Office) announced that Rwanda will be the first third world country to use the OGPL.

Now this OGPL is based on Drupal, an open source platform already used by Data.gov.uk (see my Data.gov.uk – What’s Not to Like) and many others, including the new HealthData.gov launched by US Federal CTO Todd Park at his Health Datapalooza last week.

So where does this all leave us? We have Todd Park, the federal CTO, already using Drupal for his new HealthData.gov and announcing a series of developer challenges over the next year to build it out. We have Chris Vein, the Deputy Federal CTO, announcing that India has developed an open source version of Data.gov based on Drupal that Data.gov will upgrade to and Rwanda will use. So now we are converging on a platform that does the first of three things that we need: Data Catalog, Actual Data, and Data Results.

To illustrate my point, I took the challenge that Todd Park gave me at last week’s Health Datapalooza and made the new HealthData.gov do all three things in one portal, where one sees the Data Catalog, the Actual Data, and the Data Results. This addresses the seven challenges that Todd Park announced to further develop HealthData.gov over the next year and my recommendations at the recent Data.gov Developer Community meeting. It is also an example of Building a Digital Government by Example.

So, I say enough with putting old wine (Data.gov) in new bottles (open source Drupal), and on to the real needs of citizens everywhere: to go from Data Catalog, to Actual Data, to Data Results, so citizens can use the data to make informed decisions.

Some of the “Rock Stars” of Health Innovation at the “Health Datapalooza”, more formally known as the Health Data Initiative Forum III, have been in town this week for the June 5-6 event at the Washington, DC, Convention Center.

Special guest Jon Bon Jovi, the world-famous musician, appeared in connection with the Project REACH Mobile App Challenge. A number of the U.S. government’s leading proponents of innovation were also on the bill, not all of whom would have thought of themselves as rock stars before this event, including Todd Park, U.S. chief technology officer; Kathleen Sebelius, secretary of the Department of Health and Human Services; W. Scott Gould, deputy secretary of the Department of Veterans Affairs; as well as Mitch Kapor, partner, Kapor Capital, and Bill Frist, physician and former U.S. Senate majority leader.

Gould in particular was there to talk about Project REACH (Real-Time Electronic Access for Caregivers and the Homeless). Simply put, the goal of Project REACH is to provide a free, broadly accessible app that delivers real-time or near real-time information on where someone can find a bed, a place to eat, or medical services.


During the recent Big Health Data Palooza Tweet Up, Todd Park, the nation’s new Federal chief technology officer, tweeted: “Librarians becoming the new data liberators – check out what the NLM is doing.”

So I did, to see whether I could readily use the data that the National Library of Medicine (NLM) makes available.

What I found, though, is a problem that continues to plague many agency sites and their offerings of data to the public: a collection of application programming interfaces (APIs) that makes it harder than it should be to get to the data.


Specifically, what I first found on NLM’s site was a table of three columns by 21 rows linking me to lots of technical information for developers to get the data. I was expecting a Web interface to the actual data. While the APIs provide direct, high-level access to data contained in databases, the user still has to do some programming to do things such as combine multiple data sources into new applications known as mashups.

I did just that, by creating a dashboard that shows, for instance, the work required to mash up the RxNorm and RxTerms APIs, along with the documentation and the actual data, so that a non-developer, like our readers, might use this information more readily.
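For readers curious about what that mashup work actually involves, here is a minimal sketch in Python. It assumes the RxNav-hosted REST endpoints for the RxNorm and RxTerms APIs; the exact paths and response fields reflect my reading of NLM’s documentation and should be verified before relying on them.

    # Sketch of the "mashup" programming the NLM APIs require:
    # look up a drug's RxNorm concept identifier (RxCUI), then pull the
    # corresponding RxTerms information for that concept.
    # Endpoint paths are assumptions to verify against NLM's RxNav docs.
    import json
    import urllib.request

    BASE = "https://rxnav.nlm.nih.gov/REST"

    def get_json(url):
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # Step 1: RxNorm API - find the RxCUI for a drug name.
    drug = "lipitor"
    rxcui = get_json(f"{BASE}/rxcui.json?name={drug}")["idGroup"]["rxnormId"][0]

    # Step 2: RxTerms API - fetch the display details for that RxCUI.
    terms = get_json(f"{BASE}/RxTerms/rxcui/{rxcui}/allinfo.json")

    print(f"{drug} -> RxCUI {rxcui}")
    print(json.dumps(terms, indent=2)[:500])

Even this small example requires knowing two endpoint layouts and two response structures, which is exactly the barrier a non-developer faces.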

Betsy Humphrey, Deputy Director of NLM, recently hosted a “Showcase of NLM APIs” to provide a high-level introduction to eight of NLM’s APIs, where she said:

“Todd Park, our current Federal CTO, has been known to say that the NLM was into open data before it was cool, and we are proud of the fact that for more than four decades we have actually been making information that we collect, organize, and curate available for use by system developers to develop additional value-added products that extend the usability and value of what we do here at NLM. We encourage you to make use of these APIs and create innovative and wonderful products from them, and we hope to hear from many of you that attempt to use them.”

But as described in the “Showcase of NLM APIs,” the APIs are fairly old utilities with a very simple interface: you post a URL to NLM’s services and get back a response. NLM holds about 600 million records and serves about 60 million requests, or roughly 0.5 terabytes of data, per day. This is a “big data” operation, but mostly for programmers.
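In its simplest form, that request/response pattern looks like the sketch below, shown against NLM’s PubMed E-utilities service. The esearch endpoint is a real NLM service, but treat the specific parameters as assumptions to check against the E-utilities documentation.

    # The "post a URL, get back a response" pattern at its simplest:
    # one GET request to an NLM service, one XML document back.
    import urllib.request

    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
           "?db=pubmed&term=health+data&retmax=5")

    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8"))  # XML list of matching PubMed IDs

    # For scale: 0.5 terabytes served across ~60 million requests per day
    # averages out to roughly 8-9 kilobytes per response.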

So after considerable effort, I concluded that NLM has interesting data, but it needs more work to package it for broader consumption by non-programmers.

As I reported previously, NLM’s Semantic Medline, which does not use an API but delivers the actual data and visualizations of it, is considered their “killer app,” though it is not well known yet. I have had a great experience with it so far, and work in progress will hopefully make it better known.

Recognizing the importance of small businesses to the government IT community, ACT-IAC sponsored the 6th annual Small Business Conference, called ConnectSB: Accelerate and Achieve, earlier this week. The event focused on the unique needs and benefits of small businesses and also sought to promote the value of small businesses to the government and to large corporations seeking small business partners.


A frequent theme of government conferences like these is innovation, which has come to mean doing more (work) with less (federal employees).

But I decided that what this conference should actually be called is “doing more (work) with more (talent).”

The idea for the suggestion came during the federal agency workshops portion of the conference, where attendees got to hear from three of nine leading agencies. I elected to listen to the Department of Health and Human Services, the Department of Homeland Security’s ICE, and the Department of Veterans Affairs. During those sessions, individuals from the CIO office, the program office, the acquisition office, and the small business office formed a panel that presented their experiences and then answered questions.


COMMENTARY: Yesterday Todd Park, the Federal CTO, used Twitter to answer questions about “big data.” Well, sort of: while the session reportedly generated 413 tweets, reaching an audience of 3.5 million, I counted only 131 actual questions, only 9 actual answers, and 7 retweets – so it really was a big data event with small results, like so many these days.


The highlights of Todd Park’s responses, in my opinion, were:

  • Librarians becoming the new data liberators – check out what NLM is doing
  • Great places for health/data startups to go: Health Challenges, HDI Forum and Code Fests
  • Key 2 do: make data liquid + accessible for beneficial use while rigorously protecting privacy. This is doable
To me, the most penetrating question he received was this: “How can small companies get ready to harness big data? It seems to be a big boys playground.”

Ironic, given that the point of all this health data activity by Todd Park and his predecessor, Aneesh Chopra, was to release lots of government data (big and small) to foster innovation, investment and job growth.


I just recently attended a meeting about something that many people are beginning to hear about but which most people do not understand — Ontology for Big Systems.

Even the words describing the outcome of the meeting are misleading: Summit and Communique. A summit is usually a global meeting of world leaders, and a communique is usually a short statement for the public and press. This year’s communique is 11 pages long and has grown from previous years: 2006 – 1, 2007 – 4, 2008 – 8.4, 2009 – 8.4, 2010 – 8.6, and 2011 – 8 pages.


Ontology has two definitions — one from philosophy and another from computer science. I won’t even bother you with their definitions, because the Intelligence Community prefers to use the term Knowledge Base instead to describe a collection with a very large number of documents that can be analyzed and searched “for more needles in a bigger haystack.”

First, Todd Park, the former Department of Health and Human Services chief technology officer, bet on health data in a big way, got his upcoming Health Datapalooza, and then became our new Federal CTO.

Then Gus Hunt, the CIA CTO, bet on big data for the Intelligence Community and got its budget increased by Congress, reflecting a governmental shift in IT priorities from a Defense Department-style network-centric focus toward the IC’s big data-centric focus.

Now the Defense Department is in the big data game, with its big bet to the tune of $250 million announced Thursday at the White House Office of Science and Technology Policy’s Big Data Research and Development Initiative.

The assistant secretary of Defense, in a letter released yesterday, said: “We intend to change the game and plan to be the first to leverage big data across the full scope of military operations in new and unconventional ways.”

Five other agencies present at the AAAS Auditorium event are contributing much smaller (or undisclosed) amounts, as follows:

  • National Science Foundation: $10 million, plus several smaller grants
  • DARPA: $25 million annually for four years
  • National Institutes of Health: No money, but the world’s largest set of data on human genetic variation freely available
  • Department of Energy: $25 million
  • USGS: New grants for unspecified amounts
But where does this new initiative leave us?

I think it leaves us with a disconnected federal big data program between the science and intelligence communities with the former considerably behind the latter.

The report, “Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology,” prepared by the President’s Council of Advisors on Science and Technology (PCAST), said: “Every federal agency needs to have a ‘big data’ strategy.”

I did not hear that today, either from every agency or across all the agencies. The recent 2012 Big Data Government Forum provided a much more comprehensive view of best practices around Big Data technology, trends, and issues from senior government executives, data scientists, and vendors.

As Professor Jim Hendler, RPI Computer Scientist, commented during the meeting: “Computer scientists like us have to move to the social science side of things to really do big data.”

This new White House Initiative needs Todd Park’s entrepreneurial spirit, Gus Hunt’s experience, and DoD’s new money, spent in a coordinated way with the IC and civilian agencies to make big data across the federal government a reality.


Two of the leaders of the Federal Cloud Computing Initiative, Dawn Leaf, NIST Senior Executive for Cloud Computing, and David McClure, GSA Associate Administrator of the Office of Citizen Services and Innovative Technologies, said building trust between providers and consumers of cloud computing is a top priority.

Speaking at the Quarterly Meeting of the Cloud Standards Customer Council Summit this week, they addressed members from consumer companies around the world who recognize that a common interoperable platform for the cloud is essential to meet corporate needs today and tomorrow.
