During the recent Big Health Data Palooza Tweet Up, Todd Park, the nation’s new Federal chief technology officer, tweeted: “Librarians becoming the new data liberators – check out what the NLM is doing.”

So I did, to see whether I could readily use the data that the National Library of Medicine (NLM) makes available.

What I found, though, is a problem that continues to plague many agency sites and their public data offerings: a collection of Application Programming Interfaces (APIs) that makes it harder than it should be to get to the data.

Specifically, what I first found on NLM’s site was a table, three columns by 21 rows, linking to lots of technical information for developers. I was expecting a Web interface to the actual data. While an API provides direct, high-level access to data held in databases, the user still has to do some programming to do things such as combine multiple data sources into new applications known as mashups.

I did just that, creating a dashboard that shows the work required to mash up the RxNorm and RxTerms APIs, for instance, along with the documentation and the actual data, so that non-developers, like our readers, might use this information more readily.
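To give a feel for the work involved, here is a minimal sketch of the join at the heart of such a mashup. The two payloads below are hardcoded stand-ins shaped like responses from NLM's public RxNav endpoints (rxnav.nlm.nih.gov); the drug, strength, and identifier values are illustrative assumptions, not live data, and the `build_dashboard_row` helper is my own.

```python
import json

# Stand-in for a RxNorm response mapping a drug name to its concept ID (RxCUI).
rxnorm_response = json.loads("""
{"idGroup": {"name": "aspirin", "rxnormId": ["1191"]}}
""")

# Stand-in for a RxTerms response with prescriber-friendly drug properties.
rxterms_response = json.loads("""
{"rxtermsProperties": {"displayName": "Aspirin (Oral Pill)",
                       "strength": "81 mg",
                       "rxnormDoseForm": "Oral Tablet"}}
""")

def build_dashboard_row(rxnorm, rxterms):
    """Join the two API payloads into one flat record for a dashboard."""
    props = rxterms["rxtermsProperties"]
    return {
        "rxcui": rxnorm["idGroup"]["rxnormId"][0],
        "name": rxnorm["idGroup"]["name"],
        "display_name": props["displayName"],
        "strength": props["strength"],
        "dose_form": props["rxnormDoseForm"],
    }

row = build_dashboard_row(rxnorm_response, rxterms_response)
print(row)
```

Even this toy version shows why the table of developer links is a barrier: a non-programmer cannot get to this joined view without someone first writing glue code like the above.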

Betsy Humphrey, Deputy Director of NLM, recently hosted a “Showcase of NLM APIs” to provide a high-level introduction to eight of NLM’s APIs, where she said:

“Todd Park, our current Federal CTO, has been known to say that the NLM was into open data before it was cool and we are proud of the fact that for more than four decades we have actually been making information that we collect, organize, and curate available for use by system developers to develop additional value-added products that extend the usability and value of what we do here at NLM. We encourage you to make use of these APIs and create innovative and wonderful products from them and we hope to hear from many of you that attempt to use them.”

But as described in the “Showcase of NLM APIs,” the APIs are fairly old utilities with a very simple interface: you post a URL to the service and get back a response. NLM holds about 600 million records and handles about 60 million requests per day, serving roughly 0.5 terabytes of data daily. This is a “big data” operation, but mostly for programmers.
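The “post a URL, get back a response” pattern can be sketched in a few lines. The endpoint below is NLM's public RxNorm lookup service as I understand it; the helper function and drug name are my own illustrative choices.

```python
from urllib.parse import urlencode

BASE = "https://rxnav.nlm.nih.gov/REST"

def rxnorm_lookup_url(drug_name):
    # The entire query fits in the URL; the service replies with JSON.
    return f"{BASE}/rxcui.json?{urlencode({'name': drug_name})}"

url = rxnorm_lookup_url("aspirin")
print(url)
# Fetching it is one more line, e.g.:
#   body = urllib.request.urlopen(url).read()
```

Simple as the interface is, everything after the response arrives (parsing, joining, displaying) is still left to the programmer, which is exactly the gap non-developers run into.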

So after considerable effort, I concluded that NLM has interesting data, but more work is needed to package it for broader consumption by non-programmers.

As I reported previously, NLM’s Semantic Medline, which does not use an API but delivers the actual data and visualizations of it, is considered their “killer app,” though it is not yet widely known. I have had a great experience with it so far, and work in progress will hopefully raise its profile.