COMMENTARY: Yesterday Todd Park, Federal CTO, used Twitter to answer questions about “big data”. Well, sort of: while it reportedly generated 413 tweets, reaching an audience of 3.5 million, I counted only 131 actual questions, only 9 actual answers, and 7 retweets. So it really was a big data event with small results, like so many these days.
The highlights of Todd Park’s responses, in my opinion, were:
- Librarians becoming the new data liberators – check out what NLM is doing
- Great places for health/data startups to go: Health Challenges, HDI Forum and Code Fests
- Key 2 do: make data liquid + accessible for beneficial use while rigorously protecting privacy. This is doable
Ironic, given that the point of all this health data activity by Todd Park and his predecessor, Aneesh Chopra, was to release lots of government data (big and small) to foster innovation investment and job growth.
I asked him a related question: “Can App Challenges (developers) really handle big data?”
I did not get an answer from him. But I know what it is. They can’t. An example is the recent Department of Commerce App Challenge.
I had to search through three levels of Web pages to find where to download the Geographic Names Information System (GNIS) data, and then found it was “big data” (2,221,269 rows by 20 columns) requiring “a big data in memory tool” to visualize.
The Patent (USPTO) data is definitely not realistic for developers and small companies. The Patent Green Book documentation is a 50 MB, 667-page PDF file, and the recent bibliographic data covers approximately 4,000 patent grants per week in zip files of approximately 5 MB per week. Even Google is struggling to make these gigabytes of weekly data useful.
The DoC App Challenge objective is this: “We’re challenging developers to look for innovative ways to utilize DOC and other publically available data to help businesses identify opportunities, grow, enhance productivity and create jobs.”
So I suggest Todd Park write a White House blog post and talk at the upcoming HDI Forum about the answers to all the good questions he received on Twitter, and especially about how the DoC App Challenge with big data is going to foster innovation investment and job growth by developers and small companies.