Computers can’t simulate the Earth’s ever-changing climate in real time, the interaction of the human heart with each of thousands of different drugs, or the tiniest details of a nuclear weapon’s detonation.

But that could soon change.

Federal officials want technology that can tackle complex scientific and national security problems at levels beyond the reach of the world’s fastest supercomputer, the Sequoia machine built by IBM for the National Nuclear Security Administration.

To that end, the Department of Energy recently launched the two-year, $62 million FastForward project, which will likely be the first of several short-term projects with the same mission. That mission is to develop a computer that will operate 1000 times faster than Sequoia, which can perform 16 quadrillion calculations per second, or 16 petaflops (a quadrillion is 10¹⁵, or 1 followed by 15 zeros).

If FastForward succeeds, the resulting supercomputer will perform in exaflops, which means quintillions of calculations per second (a quintillion is 10¹⁸, or 1 followed by 18 zeros). This is one million times as fast as your laptop or desktop computer. Eventually, FastForward could speed up personal computers, too, but for now, that’s not its focus.
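
To put those prefixes side by side, here is the rough arithmetic implied by the figures above (the personal-computer figure of roughly 10¹² calculations per second is an assumption consistent with the million-fold comparison, not a number from DOE):

```latex
\begin{align*}
\text{Sequoia (petascale):}\quad & 16 \times 10^{15} \ \text{calculations per second} \\
\text{Exascale target:}\quad & \approx 1000 \times 16 \times 10^{15} = 1.6 \times 10^{19} \ \text{calculations per second} \\
\text{Personal computer:}\quad & \approx 10^{12} \ \text{calculations per second} \\
\text{One exaflop vs. a PC:}\quad & 10^{18} \div 10^{12} = 10^{6}, \ \text{or one million times as fast}
\end{align*}
```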

In mid-July, DOE’s National Nuclear Security Administration and Office of Science awarded several tens of millions of dollars in FastForward contracts for developing processors, next-generation memory chips, ultra-fast input/output technology and programming models. Work on these elements must start now to produce exascale computers in the desired 2020 timeframe.

Achieving exascale computing would be easy, as groundbreaking computing projects go, if power were free. “Exascale computing is 1000 times as fast as petascale computing,” said Michael Garland, senior manager of programming systems and application research at Nvidia, which won a $12 million FastForward contract for work on processors and programming models. “Yet the total amount of energy we can provide to exascale computers is only about five times what we can provide to a petascale computer. So energy is driving the architecture and thus the programming.”

DOE wants its exascale computer to consume no more than 20 megawatts when operating. Sequoia consumes about five megawatts.
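
A rough back-of-the-envelope estimate using those figures shows why power dominates the design: the energy spent on each calculation has to drop sharply.

```latex
\begin{align*}
\text{Sequoia today:}\quad & \frac{5 \times 10^{6} \ \text{W}}{1.6 \times 10^{16} \ \text{calculations/s}} \approx 3 \times 10^{-10} \ \text{J} \approx 300 \ \text{picojoules per calculation} \\[4pt]
\text{Exascale budget:}\quad & \frac{2 \times 10^{7} \ \text{W}}{10^{18} \ \text{calculations/s}} = 2 \times 10^{-11} \ \text{J} = 20 \ \text{picojoules per calculation}
\end{align*}
```

That is roughly a 15-fold cut in energy per calculation just to reach one exaflop inside 20 megawatts, and a machine a full 1000 times faster than Sequoia would need a far deeper cut.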

Many challenges must be met to develop hardware for an exascale computer, given the desired energy consumption level. It may be, however, that designing and writing the programs for such a computer will be the most difficult task of all.

“The main problem is the incredible complexity of the computers,” said James Sexton, IBM’s research program director for the Computational Science Center at IBM Watson Research Center. “A laptop has two to four cores [a core is one central processing unit] with a single computing engine. It does one task sequentially. It’s easy to design.”

By comparison, Sequoia has 1.6 million cores. “You must figure out how to think through a problem so that all the cores can collaborate,” Sexton said. “The real challenge is how do I build an application to do what I want and take advantage of all 1.6 million cores.”

Each of Sequoia’s 1.6 million cores can manage four hardware “threads” at once, each thread working on its own piece of a problem. Thus, the supercomputer runs more than six million threads simultaneously, i.e., in parallel.
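
For readers who program, the basic idea of many threads attacking pieces of one problem in parallel can be sketched in a few lines of C with OpenMP. This is a toy illustration on an ordinary multicore machine, not code from Sequoia or FastForward; the grid size and the per-cell arithmetic are invented for the example.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000   /* one million cells of a toy simulation grid */

int main(void) {
    static double cell[N];

    /* Each thread is handed a slice of the grid, much as a supercomputer
     * hands slices of a climate or physics problem to its millions of
     * hardware threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        cell[i] = i * 0.5;          /* stand-in for real physics work */
    }

    printf("Updated %d cells using up to %d threads\n", N, omp_get_max_threads());
    return 0;
}
```

Compiled with, say, gcc -fopenmp, the loop iterations are divided among the available cores automatically; the hard part on a real supercomputer is that no such automatic split exists across 1.6 million cores.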

“We need a half-billion to four billion hardware threads to achieve the level of [exascale] capability that we want,” said Bronis de Supinski, co-leader of the Advanced Simulation and Computing program’s Application Development Environment and Performance Team at Lawrence Livermore National Laboratory’s Center for Applied Scientific Computing.

Not only do supercomputer cores process many millions or, in the case of a future exascale supercomputer, billions of pieces of a problem in parallel, but they also do it on different types of processors. “In the past, computers were homogeneous with one type of processor,” Garland said. “Modern computers are heterogeneous with graphical processors and central processing units. Most programming systems are not designed for this.”
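
One way programmers target such mixed hardware today is directive-based offload, for example OpenMP’s target construct, sketched below. This is only an illustration of the heterogeneous style, not one of the FastForward programming models, and the data and arithmetic are placeholders.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];

    /* With a compiler that supports OpenMP offloading, this loop can run on
     * a graphics processor; without one, it simply runs on the CPU cores.
     * The same source code has to serve both kinds of processor. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:N])
    for (int i = 0; i < N; i++) {
        x[i] = i * 2.0;             /* placeholder work */
    }

    printf("x[42] = %.1f\n", x[42]);
    return 0;
}
```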

Breaking up a problem into many parallel pieces may be hugely difficult, but that is only the start. Even if you are able to split a problem up so that all the computer’s cores can work on it in parallel, at some point, pieces of the problem must communicate with other pieces to continue calculating. This becomes an enormous coordination task.

If you designate a 10-km-square area of the Earth’s surface as a single unit, then “assign 100 units to each processor, how do the cores interact?” Sexton asked. If one processor must perform more calculations than another because some of its units are experiencing thunderstorms, “then it takes longer, and the other processors are sitting there. Part of the software problem is making it as efficient as possible.”

A situation where processors are waiting on others can “ripple through the machine,” said de Supinski.
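
The waiting that Sexton and de Supinski describe is easy to reproduce in miniature. The MPI sketch below is a toy, not climate code: each process does its local work, one process pretends its units contain thunderstorms and takes longer, and the others measure how long they sit at the synchronization point.

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Pretend rank 0's 100 units include thunderstorms: extra work. */
    int extra = (rank == 0) ? 2 : 0;
    sleep(1 + extra);                    /* local computation on my units */

    double t = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);         /* everyone waits for the slowest rank */
    t = MPI_Wtime() - t;

    printf("rank %d of %d waited %.2f seconds at the barrier\n", rank, nranks, t);

    MPI_Finalize();
    return 0;
}
```

Run on even four processes, every rank except the slow one reports about two seconds of idle time; at the scale of millions of cores, that idle time is the ripple de Supinski describes.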

Another coordination problem involves the data that processors must call from memory to perform calculations. “All the data a desktop computer needs is present or is copied locally from a cloud,” Garland said. “In a supercomputer, the data set is huge. It is carved up and distributed through all the different corners of this machine. So getting the data for a processor is a problem.”

That’s because, “as systems grow, for physical reasons, the amount of memory and the interconnection bandwidth doesn’t grow as fast as the number of cores,” said de Supinski. In other words, the bigger the system, the longer it takes for processors to get information from a memory chip.
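
A concrete version of “getting the data for a processor” is the ghost-cell, or halo, exchange used in distributed simulations. The MPI sketch below is again a toy: each process owns a slice of a one-dimensional grid and must fetch one boundary value from each neighbor before it can compute. Real codes move far larger blocks in three dimensions, and the array contents here are placeholders.

```c
#include <mpi.h>
#include <stdio.h>

#define LOCAL 1000   /* cells owned by this process (toy size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double u[LOCAL + 2];                       /* u[0] and u[LOCAL+1] are ghost cells */
    for (int i = 0; i < LOCAL + 2; i++)
        u[i] = -1.0;                           /* mark everything "unknown" */
    for (int i = 1; i <= LOCAL; i++)
        u[i] = rank;                           /* owned cells carry this rank's number */

    int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int right = (rank == nranks - 1) ? MPI_PROC_NULL : rank + 1;

    /* Send my first owned cell to the left neighbor while receiving my right
     * ghost cell from the right neighbor, then the mirror-image exchange. */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[LOCAL], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d: left ghost = %.0f, right ghost = %.0f\n",
           rank, u[0], u[LOCAL + 1]);

    MPI_Finalize();
    return 0;
}
```

Every one of those exchanges crosses the machine’s interconnect, which is exactly the resource de Supinski notes is not keeping pace with the number of cores.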

A third supercomputer coordination issue is that as the machine addresses a problem such as ever-evolving climate change, it is “simulating different kinds of physics at the same time, such as fluid flow [air movement] and photons [from sunlight],” said de Supinski. “Both are dependent on the same data, but the calculations are totally different.”

Resilience is yet one more challenge for supercomputer software development. “As we add more and more devices, it becomes more likely that something will fail,” de Supinski said. “So the software must cope with more failures.”
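
One long-standing way software copes with failures is checkpoint and restart: save the state of the calculation to disk at intervals, and after a crash reload the newest checkpoint rather than starting over. The C sketch below is a single-process toy with an invented file name, state size, and update rule; exascale resilience research goes well beyond this, but it shows the basic mechanism.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 1000               /* toy simulation state */
#define STEPS 100
#define EVERY 10             /* checkpoint interval, in steps */

/* Write the next step to run and the current state to disk. */
static void checkpoint(const double *state, int next_step) {
    FILE *f = fopen("checkpoint.bin", "wb");
    if (!f) { perror("checkpoint"); exit(1); }
    fwrite(&next_step, sizeof next_step, 1, f);
    fwrite(state, sizeof(double), N, f);
    fclose(f);
}

/* Reload the newest checkpoint, if any; return the step to resume from. */
static int restore(double *state) {
    FILE *f = fopen("checkpoint.bin", "rb");
    int step = 0;
    if (f) {
        if (fread(&step, sizeof step, 1, f) != 1 ||
            fread(state, sizeof(double), N, f) != (size_t)N) {
            step = 0;                          /* unreadable checkpoint: start fresh */
            for (int i = 0; i < N; i++) state[i] = 0.0;
        }
        fclose(f);
    }
    return step;
}

int main(void) {
    double state[N] = {0};
    int start = restore(state);

    for (int step = start; step < STEPS; step++) {
        for (int i = 0; i < N; i++)
            state[i] += 0.001;                 /* stand-in for real computation */
        if (step % EVERY == 0)
            checkpoint(state, step + 1);       /* record the next step to run */
    }
    printf("finished %d steps (resumed from step %d)\n", STEPS, start);
    return 0;
}
```

Real machines layer far more sophisticated schemes on top of this idea, since saving the state of millions of cores is itself expensive.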

FastForward’s primary goals include solving these software problems. Co-design, in which hardware and software engineers work together to design and build the computer, will help. So will making the job easier for software developers. “We need tools for programmers to solve these problems and make sure their solutions can be used elsewhere,” said Garland. “So we must create the tools so the programmers can do this.”

When the problems are solved, and exascale computers become a reality, “we can do much more sophisticated analysis of a complex problem,” Sexton said. “Instead of looking at one heart drug, we can test and validate 1000. We can simulate 1000 prototypes of combustion [engines] in a month vs. building them, which is what we must do now. This gets us the most optimized design. This is where we can actually do research. Exascale computing will get us to that.”