[Published originally in the May 2007 edition of Computing Research News, Vol. 19/No. 3]
Scientific Computing at the Forefront – Los Alamos National Laboratory
By Bill Feiereisen, Chief Technologist
Large laboratories like Los Alamos (LANL) provide the opportunity to apply
high performance computing (HPC) to science problems at a scale scarcely matched
elsewhere. But perhaps more importantly, they are charged with answering the
questions posed by “missions,” the major responsibilities entrusted to each
laboratory.
Designing and Maintaining the Nation’s Nuclear Weapons
Los Alamos National Laboratory is the birthplace of nuclear weapons. Their care and feeding has been the mission for 63 years. During this time the mission has grown to encompass the basic science of high-energy-density physics and the issue of nuclear non-proliferation. Los Alamos is one of the originators of HPC for modeling and simulation. Our computing history begins at the end of World War II with Monte-Carlo methods on the ENIAC at the University of Pennsylvania, and ranges unbroken to the present-day quest for petaflops (10^15 floating-point operations per second) in parallel computing.
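The Monte-Carlo idea pioneered in those early days is simple: answer a deterministic question by random sampling. A minimal illustrative sketch (the classic textbook example of estimating π, not any weapons code) looks like this:

```python
import random

def estimate_pi(samples, seed=0):
    """Estimate pi by sampling points in the unit square and
    counting the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

The estimate's error shrinks like one over the square root of the sample count, which is why such methods hunger for ever-larger machines.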
Los Alamos, Livermore and Sandia have the responsibility to certify the quality of these weapons each year and to remedy any problems found. However, since 1992 there have been no nuclear tests, so the task has fallen heavily to computation. This drove the founding of the Advanced Simulation and Computing (ASC) program that has propelled so many of the developments in high performance computing over the last ten years. Three-dimensional, time-dependent simulations of the complex physics in a nuclear weapon have become reliable enough to support engineering judgments.
Simulation of our Public Infrastructure – What Happens When Bad Stuff Happens?
Agent-based simulations are now widely used to model the response of people and their infrastructure to threats and disasters. Much of this work is concentrated within the National Infrastructure Simulation and Analysis Center (NISAC), which builds detailed models for most of the seventeen infrastructure sectors as defined by the Department of Homeland Security (DHS). As hurricanes Katrina and Rita bore down on the Gulf Coast in 2005, LANL was asked to predict the effects on infrastructure sectors, including telecommunications, electric power, natural gas, and water. This helped decision-makers rank-order the importance of infrastructure assets such as telecom switches. Daily updates to DHS and the White House stressed our supercomputing resources, which had to deliver results within tight response times.
NISAC's epidemic modeling group has also “unleashed virtual plagues” in simulations of real cities to see how social networks spread disease. This can help in the fight against epidemics.
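The agent-based idea can be sketched in a few lines: each agent carries a disease state and meets a handful of contacts per day. This toy susceptible-infected-recovered (SIR) model with random mixing is far cruder than NISAC's city-scale models, and every parameter value below is illustrative:

```python
import random

def simulate_sir(n_agents=200, contacts=4, p_transmit=0.1,
                 recovery_days=5, days=60, seed=1):
    """Toy agent-based SIR epidemic: each agent meets a few random
    contacts per day; infection spreads with a fixed probability.
    Returns the number of infected agents at the end of each day."""
    rng = random.Random(seed)
    # 0 = susceptible, >0 = days of infection remaining, -1 = recovered
    state = [0] * n_agents
    state[0] = recovery_days          # patient zero
    history = []
    for _ in range(days):
        infected = [i for i, s in enumerate(state) if s > 0]
        for i in infected:
            for _ in range(contacts):
                j = rng.randrange(n_agents)
                if state[j] == 0 and rng.random() < p_transmit:
                    state[j] = recovery_days + 1  # +1: ticks down below
        for i in range(n_agents):
            if state[i] > 0:
                state[i] -= 1
                if state[i] == 0:
                    state[i] = -1     # recovered and immune
        history.append(sum(1 for s in state if s > 0))
    return history
```

Real infrastructure models replace the random mixing with detailed synthetic populations, road networks, and activity schedules, which is what pushes them onto supercomputers.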
Questions of Climate Change Must Be Simulated – We Can’t Experiment
Modeling of the global climate is more important than ever. With the attention now paid to greenhouse gases and their interaction with the biosphere, modeling holds the promise of understanding the system and perhaps of guiding what to do.
Los Alamos collaborates with sister labs in building the national climate model and contributes the ocean and sea ice components. The “eddy-resolving” ocean models provide, at 10km resolution, the most detailed simulations of the global ocean circulation yet.
Los Alamos develops the ocean ecosystem and trace gas components, while collaborators at Oak Ridge, Livermore, Pacific Northwest and the National Center for Atmospheric Research supply the atmospheric chemistry and land biogeochemistry. These next-generation models incorporate the carbon cycle and actual emissions, leading to predictions of the ability of land and ocean ecosystems to sequester carbon.
Computing Scopes Out Possible Vaccines for HIV
Los Alamos maintains the HIV sequence database for the country. HIV continually evolves away from possible vaccines. It is a moving target and easily keeps ahead of experimental attempts to develop vaccines. Bette Korber, Tanmoy Bhattacharya and their colleagues exploit computational techniques to watch for common features in the HIV genome that are conserved from mutation to mutation, giving experimentalists a head start in constructing a vaccine. The combinatorial possibilities are overwhelming, demanding the use of our supercomputers.
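The core idea of hunting for conserved features can be illustrated with a crude sketch: scan the columns of a sequence alignment and flag positions where one residue dominates. This is only a toy proxy for the far more sophisticated statistical methods the real work uses:

```python
from collections import Counter

def conserved_columns(alignment, threshold=0.9):
    """Find columns of a sequence alignment where a single residue
    dominates -- a crude proxy for the conserved features vaccine
    designers look for. `alignment` is a list of equal-length strings."""
    length = len(alignment[0])
    conserved = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        residue, n = counts.most_common(1)[0]
        if n / len(alignment) >= threshold:
            conserved.append((col, residue))
    return conserved
```

For example, `conserved_columns(["ACGT", "ACGA", "ACTT"], threshold=1.0)` flags only the first two columns, where every sequence agrees.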
Defense Against Radiological Attacks - Smuggled Nuclear Weapons? Dirty Bombs?
Detecting the radiation signature of nuclear materials is a tough job. Quickly distinguishing a real danger from medical isotopes, or from the myriad other uses of radioactive material, demands knowledge of radiation transport. LANL has some of the world’s experts in radiation transport, who apply their knowledge to model what detectors should see. Real-time response requires on-demand supercomputing.
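At its simplest, radiation transport can again be treated by Monte Carlo: track individual particles through material, drawing the distance to each interaction from an exponential distribution. The one-dimensional, absorption-only sketch below is a teaching example, not a production transport code, and its parameters are arbitrary:

```python
import random

def transmission_fraction(mu, thickness, n_photons=100000, seed=0):
    """Monte Carlo estimate of the fraction of photons that pass
    through a slab, assuming pure absorption with attenuation
    coefficient `mu` (per unit length). A toy 1-D transport model."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_photons):
        # distance to first interaction: exponential with mean 1/mu
        free_path = rng.expovariate(mu)
        if free_path > thickness:
            passed += 1
    return passed / n_photons
```

For pure absorption the answer is known analytically, exp(-mu * thickness), which makes this a handy sanity check; real detector modeling adds scattering, energy dependence, and three-dimensional geometry, which is where the supercomputers come in.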
Astrophysics and Cosmology – Where is that Dark Energy?
Supercomputing at immense scale has become one of the central tools of cosmology. The mass function of dark matter halos is an indicator for dark energy, the enigmatic concept posited to explain the observed acceleration of the expansion of the universe. The mass function describes the probability of finding an object of a given mass per unit volume of the Universe and can only be determined accurately via numerical simulation. Mike Warren and his colleagues performed a series of 16 different billion-particle simulations and produced the most accurate determination of the mass function to date. Overall, these simulations required over 4 × 10^18 floating-point operations (4 exaflops!).
Lattice QCD – Physics Beyond the Standard Model?
Quantum ChromoDynamics (QCD) is the reigning theory of quarks and gluons, the elementary particles that constitute nuclear matter. Standard analytic methods fail when calculating particle masses and decays, but numerical lattice simulations have already reached 5 to 10 percent accuracy. With the advent of petascale computing, we anticipate first-principles results with 1 percent accuracy, providing hints of new physics beyond the standard model.
These Applications Drive Our Computer Science
There are many more exciting science missions at LANL. Modeling and simulation have driven much of our computer science investment. Although we have played a major role with the computer companies in defining hardware architectures, we have also invested heavily in software research.
Science Appliance – Making the Sysadmin’s Life Easier
One of the challenges in managing modern parallel supercomputers has been their efficient system administration. Processor and parts counts have increased greatly, mean time between failures has decreased, and the sysadmin’s task of administering thousands of process spaces has become daunting. Starting as a research project in 1999, Ron Minnich and his colleagues developed the “Science Appliance” suite that attacks these problems in several ways.
Combined with the open source LinuxBIOS it allows the booting of thousands of nodes in seconds rather than minutes or hours. Science Appliance has now become production software on some of our HPC clusters. LinuxBIOS has been chosen for the “One Laptop per Child” project and has now appeared on over a million machines throughout the world. 
OpenMPI, MPI-IO and Data Storage – Handling Those Vast Amounts of Data
Although there has been much work on parallel languages, MPI (the Message Passing Interface) is still the workhorse of most scientific parallel computing. Los Alamos developed an implementation of MPI (LA-MPI) for two special needs in very large clusters—scalable performance and reliability in the face of hardware failure. LA-MPI merged with three other implementations of MPI into the open source OpenMPI, which is now widely distributed and has proven its performance. It recently powered Sandia National Laboratories’ Thunderbird cluster to number 6 on the Top500 list.
Big simulations mean big data, but that data is only useful if you can get it out of the supercomputer, store it and analyze it. With Sandia, LANL has been very active in funding enhancements to the parallel I/O standard MPI-IO for the ASC program, and leads the High End Computing Interagency Working Group (HECIWG) on File Systems, I/O, and Storage. This is the technical advisory group that coordinates the research and development investments of all participating high end computing agencies, including DOD, NSF, DOE, NASA, and others.
Performance Modeling – How Fast Do These Computers Really Go?
The Performance and Architecture Laboratory (PAL) develops end-to-end models of the entire computing system, from applications through system software to the hardware itself. Performance models developed by PAL for a wide workload and supercomputer spectrum are the tools of choice for performance analysis, system design, system and application optimization, and accurate performance prediction for current and future applications and systems. The practical impact of this work is significant, given the cost of developing application software and architectures at this scale. PAL continues to apply these techniques for LANL and plays a central role in performance modeling for much of the HPC community.
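The spirit of such end-to-end modeling can be conveyed with a deliberately minimal analytic sketch: runtime as computation time plus communication time. This is not PAL's actual methodology—their models are far richer—and every parameter below is illustrative:

```python
def predicted_runtime(flops, bytes_moved, messages,
                      peak_flops, bandwidth, latency):
    """Minimal analytic performance model: runtime = compute time
    plus communication time (per-message latency plus bytes over
    bandwidth). All inputs are illustrative, not measurements of
    any real system."""
    t_compute = flops / peak_flops
    t_comm = messages * latency + bytes_moved / bandwidth
    return t_compute + t_comm
```

Even a model this simple shows why the balance between a machine's arithmetic speed, network latency, and bandwidth—not peak flops alone—determines how fast an application actually runs.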
Scientific Visualization – Picture That Instead of Numbers
Visualization, ranging in scale from the desktop to large immersive three-dimensional CAVEs, has been absolutely necessary for coping with the vast amounts of data generated by supercomputers. Over many years LANL has contributed much to research in visualization, but has also built some of the most impressive facilities on Earth. The Data Visualization Corridor built for the ASC program is a complement to the supercomputing facilities, and has provided visual insights into complex calculations not available any other way.
Petaflops? – How Do We Get There?
Petaflops is an artificial computing-speed milestone along the road, but is an indication of how far we have come. The entire world of HPC is rushing towards real sustained petaflops speed, but there are immense hardware and software challenges. Commercially available processors are going through a sea change to become either “many-core” processors or hybrid processors. LANL is exploring hybrid computing and is working with IBM to build “Roadrunner,” a petaflops-level supercomputer based on the Cell processor! This is a game chip, right? But look more closely and you’ll find that Cells have vector-like processor units and promise immense gains in computing speed.
Where Are We Going?
I’ve described only a small fraction of scientific computing at Los Alamos. Our heritage is based on simulation and HPC, but perhaps even more important now is the analysis of Big Data acquired from all sources, not just simulation. The explosion of sequenced genetic data, new high-energy experimental facilities, comprehensive astronomical surveys, and the vast unwashed mass of data on the web have led us to place new emphasis outside of traditional HPC. Much of this data is no longer homogeneous, is not represented by floating point numbers, and is distributed all over the world. You can imagine the challenges that we need to meet to adapt to these changes in computing, but our 63-year history in scientific computing gives me confidence that we are up to the task.
Bill Feiereisen is Chief Technologist at Los Alamos National Laboratory.
Copyright © 2007 Computing Research Association. All Rights Reserved. Questions? E-mail: email@example.com.