Big data science visualizations have evolved from the use of proprietary data (past), to difficult-to-obtain-and-use big data (present), to the hope that business, finance, media, and government big data will be more readily available and useable in the future.
That future appears a ways off, however, given my experience with several recent projects and judging from some of the presentations at the just-concluded O’Reilly Strata Conference.
The Strata Conference is billed as the home of data science, bringing together practitioners, researchers, IT leaders, and entrepreneurs to discuss big data, Hadoop, analytics, visualization, and data markets.
I was especially interested in learning more about public domain data sets and how far they’ve evolved in their ability to be used.
I found this year’s conference had expanded dramatically, with the total number of presentations essentially doubling in the past year, reflecting the surge of interest in the conference theme, “Putting Data to Work.” The data science track, for instance, nearly doubled, from 15 sessions on data at the prior conference to 26 sessions.
Here’s a quick snapshot comparing the number and type of presentations:
Third Conference (February 2012) – 141 total presentations:
- Data Science (26)
- Business & Industry (14)
- Visualization & Interface (13)
- Keynotes (16)
- Hadoop & Big Data: Applied (9)
- Domain Data (6)
- Deep Data (9)
- JumpStart (13)
- Hadoop & Big Data: Tech (11)
- Policy and Privacy (4)
- Sponsored Sessions (20)
Second Conference (September 2011) – 62 total presentations:
- Business (9)
- Data (15)
- In Practice (1)
- Interface (6)
- Keynote (15)
- Sponsored Sessions (11)
- Policy & Ethics (3)
- Real-time (2)
First Conference (February 2011) – 70 total presentations:
- Executive Summit (10)
- Business, Data, and Interfaces (1)
- Disruption & Opportunity (7)
- Interfaces (10)
- Practitioner (22)
- Real World (7)
- The Data Business (3)
- Other (10)
However, the most recent conference was still mostly about methodology, technology, and the promise of applications, except for the $3 million Heritage Health Data Prize.
The next conference will be held in connection with Hadoop World to reflect the considerable interest in Hadoop & Big Data and will focus on the big data issues that are shaping the business, finance, media, and government worlds.
Hadoop includes the MapReduce programming model, on which Google built its empire, and is explained in more detail for the “C-Suite” elsewhere.
Many experts believe Hadoop will subsume many of the data warehousing tasks presently done by traditional relational systems. That would be a huge shift in how IT applications are engineered.
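To make the MapReduce idea concrete, here is a minimal single-process sketch of its map, shuffle, and reduce phases applied to a word count, the canonical example. This is an illustration of the programming model only, not Hadoop itself; in Hadoop the phases run distributed across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each word's values into a single count."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["Big data is big", "data science puts data to work"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["data"])  # "data" appears 3 times across the two documents
```

The appeal of the model is that the map and reduce functions are independent per key, so the framework can spread them across thousands of machines without the programmer managing the parallelism.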
Another example of big data sets can be found in “Places & Spaces: Mapping Science,” an exhibit that started in 2005 and is currently in its 8th iteration this year with “Science Maps for Kids.”
I entered the 7th iteration (2011), “Science Maps as Visual Interfaces to Digital Libraries,” last year because the subject interested me as a data scientist and digital library builder.
It is an impressive effort that has led to the book Atlas of Science, but I found that most of the maps were based on proprietary data, so it is essentially impossible to get those “big data” sets to work in your own visualization tools.
This is certainly not what the Panton Principles for Open Data in Science and the White House Open Government Data Initiative are about.
Unfortunately, the best of those data visualization projects also relied mostly on very big and difficult-to-obtain data.
So I’ve concluded that while big data science visualizations have evolved from the use of proprietary data (past) to difficult-to-obtain-and-use big data (present), we still have quite a ways to go.
My recommendation, as I have discussed in an article elsewhere: big data sets should be “parsed and served up small.”
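One way to read “parsed and served up small” in practice is simple pagination: rather than handing users an entire unwieldy data set, a publisher parses it once and serves fixed-size slices on demand. The sketch below is purely hypothetical; the function name and page size are my own illustration, not from any particular tool.

```python
def serve_small(records, page, page_size=100):
    """Return one small, manageable slice of a large data set.

    page is zero-based; the last page may be shorter than page_size.
    """
    start = page * page_size
    return records[start:start + page_size]

# Stand-in for a big data set that would overwhelm most visualization tools.
big_data = list(range(1_000_000))

chunk = serve_small(big_data, page=3, page_size=100)
print(len(chunk), chunk[0])  # 100 300
```

A consumer can then pull only the pages their visualization tool actually needs, which is exactly the kind of usability that proprietary or monolithic big data sets currently lack.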