Managing any large project is a challenge. Imagine managing one that involves 15 different groups, spread among multiple university labs around the country, and deals with massive amounts of information.

This is the challenge facing the National Human Genome Research Institute, a part of the National Institutes of Health, which launched the third iteration of the ENCyclopedia Of DNA Elements (ENCODE) project early in October.

ENCODE’s goal is to expand the catalog of “functional” elements in the human genome. Those elements control how genes are expressed. For the fields of health and medicine, the catalog will ultimately help professionals better understand the genes, functional elements and “noncoding” information in our DNA.

Orchestrating the work across so many groups to accomplish that goal is a project in itself.

“The [seven] data production groups have multiple sub-groups, so coordinating them is even more complex than it appears,” said Elise Feingold. She and Mike Pazin, both scientist-administrators, are ENCODE’s program directors.

The previous ENCODE round recently published “one major integrated paper and 29 other affiliated papers from 442 consortium members at 32 institutions,” Feingold said. “The research groups had previously published more than 150 papers. This hints at the magnitude of the project.”

Besides the seven data production groups, which perform the fundamental research, six groups will develop new computational and statistical methods and software to analyze the researchers’ findings.

Another group will coordinate the research data and help disseminate it. Still another group will work with the researchers to perform integrated analyses of the data. The previous ENCODE round produced approximately 30 terabytes of genomic data. This new round is expected to generate at least double that amount.

To manage this consortium, “we use a number of different tools,” Feingold said. NIH research awards “are all cooperative agreements, so there is a large involvement by the program staff.” Because many of the people involved in this latest incarnation of ENCODE also participated in previous efforts, “most people know and trust each other,” she said.

Even with all the virtual communication, a few real meetings occur each year. “There is no substitute for face-to-face meetings, but once you know and trust each other, you can go to telecons,” Feingold said. “We have many, many telecons.”

A Wiki internal to the project “has 741 pages, with different sections at different technological levels and on different topics, including coordination and management, data releases and standards, data coordination and other resources,” Feingold said.

ENCODE also makes use of a Facebook page and a Twitter feed. “We are trying to tap into social media to appeal to younger people,” Feingold said. “We do as much as we can to share things virtually.” Researchers in the greater scientific community can access the genomic data at ENCODEProject.org.

“We follow the progress of the individual groups and set milestones,” said Pazin. “We have quarterly reports. We look for bottlenecks and how to work through them. Sometimes they are due to legitimate scientific reasons and sometimes they happen because people are so busy.”

When problems occur, Pazin and Feingold step in to help, but they don’t dictate solutions. “It’s much more successful when it comes from the group,” Feingold said.

But perhaps the larger question is, why perform this research through a 15-group, multi-lab consortium instead of through independent labs?

The answer speaks volumes about the way organizations are trying to advance in the age of big data.

“For researchers, the ENCODE project is an incredible data set which no one lab could have generated in a reasonable fashion,” said Dr. Kelly A. Frazer, Chief of Genome Information Sciences at the School of Medicine at the University of California-San Diego.

“Without this coordinated activity, it would have taken the average lab an incredibly long time to generate the data.”

Scientists can start their research with the ENCODE data instead of having to generate it themselves. Thus, this data’s availability “can cut a year off [research] time in many cases, and in other cases, it can cut even more time,” Frazer said.

“This … incredibly rich data set … will allow us to better understand the role human genetic variation plays in susceptibility to various diseases.”
