[Published originally in the May 2002 edition of Computing Research News, pp. 6, 15.]
Adventures in Computational Grids
By Pamela Walatka
Sometimes one supercomputer is not enough. Your local supercomputers may be busy or not configured for your job, or you may not have any supercomputers at all. You might be trying to simulate worldwide weather changes, which requires more compute power than you could get from any one machine; or you might be collecting microbiological samples on an island and need to examine them with a special microscope located on the other side of the continent. These are the times when you need a computational grid.
A computational grid is a network of geographically and often organizationally distributed resources, including computers, instruments, and data. A user has single sign-on access to all the resources on a grid. These resources may be managed by diverse organizations in widespread locations and shared by researchers from many different institutions. Grid users can establish their identities by getting authentication certificates, obtain accounts on the grid's computational resources, and then use those resources, which are often scattered across a continent or beyond, from their desktops.
The idea of computational grids has taken shape over the past few years. The key concepts are described in a book edited by Ian Foster and Carl Kesselman, The Grid: Blueprint for a Future Computing Infrastructure (Morgan Kaufmann, 1999). Foster and Kesselman initiated the Globus project to develop enabling software for grids: infrastructure, services, and application toolkits. Steve Tuecke, Lee Liming, and dozens of other developers are responsible for its continued improvement. (See www.globus.org for more information.) The ambitious Globus project provides free, open-source software for a number of developing grids. While other organizations are developing alternative software, none are as widely used as the Globus Toolkit.
From Dreamware to Testbeds
Computational grids have progressed from dreamware, through trials (and tribulations), to persistent testbeds with production capabilities. Numerous grid development and operation efforts are now underway. Universities without supercomputers have strung together clusters of workstations to provide their researchers with high-performance resources. Businesses have begun to explore the possibilities of grid computing; for example, Irving Wladawsky-Berger, Vice President, Technology and Strategy, IBM Server Group, calls grid computing the "key to advancing e-business into the future and the next step in the evolution of the Internet towards a true computing platform." (See "IBM: Linux To Seed Next Generation Of Grid Computing," CRN, January 30, 2002, and http://www.globus.org/about/news/HPCwire/ibmGridsTransform12-7-01.html.) Wladawsky-Berger predicts that grid computing, like supercomputing before it, will provide a vast infrastructure for e-business (http://www-1.ibm.com/servers/events/grid.html). Great Britain is developing a grid, and Euroglobus is underway (see http://www.euroglobus.unile.it/). The Global Grid Forum (http://www.gridforum.org/) meets semiannually to consolidate standards for grids.
Eventually there may be one global grid. Currently the three most substantial grids, in terms of compute cycles already provided, persistence, and scope of resources, are:
The National Science Foundation PACI Program (http://www.interact.nsf.gov/cise/descriptions.nsf/pd/paci?OpenDocument) sponsors two grids: the National Computational Science Alliance, led by the National Center for Supercomputing Applications (NCSA) in Urbana-Champaign, Illinois, and the National Partnership for Advanced Computational Infrastructure (NPACI), at the San Diego Supercomputer Center (SDSC) in San Diego, California. These grids provide interconnected high-performance computing systems and powerful instruments, and they have pioneered the development, application, and testing of grid infrastructure. For example, Randy Butler's group at NCSA provides Grid-in-a-Box (http://www.ncsa.uiuc.edu/TechFocus/Projects/NCSA/Grid-in-a-Box.html), which simplifies the grid-building process for Linux systems. Also at NCSA, Doru Marcusiu manages collaborative grid infrastructure development with NASA's IPG and the San Diego part of NPACI. In San Diego, Mary Thomas, Keith Thompson, Steve Mock, and many others have developed the HotPage, a portal that provides interactive access to their grid.
NASA created the Information Power Grid (IPG) (http://www.ipg.nasa.gov/), which connects several NASA centers. IPG makes available supercomputers, high-end scientific instruments, and terabyte-scale datasets. A web portal, Launchpad (http://www.ipg.nasa.gov/launchpad/servlet/launchpad), lets IPG users submit jobs to "batch" compute engines located at NASA centers across the United States, execute commands on these resources, transfer files between two systems, obtain the status of systems and jobs, and modify their environment.
The IPG team has pioneered grid development in the areas of automated parameter studies, grid services, system status, data mining, Globus security, performance monitoring, benchmarks, documentation, system availability, and testing. Isaac Lopez heads the IPG team at NASA Glenn Research Center. At NASA Ames Research Center, the IPG team is led by Tony Lisotta, with supervision from Bill Johnston, Arsi Vaziri, and Tom Hinke; support from team leads Mary Hultquist, Warren Smith, and George Myers; and contributions from staff members and collaborators.
The IPG and PACI teams frequently cooperate to develop new capabilities for grids, and both teams help the Globus team with new solutions to grid problems. Grid development and deployment are based on cooperation across great distances and between diverse organizational groups.
Real Work Example
Here is just one example of how grids foster collaboration among these diverse groups. Recently, Tiffany Moisan, a NASA research scientist at the NASA Goddard/Wallops Flight Facility, collected microbiological samples in the tidewaters around Wallops Island, off the coast of Virginia. To see the samples at the level of detail her research required, she needed a high-performance microscope located at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego (UCSD). She sent the samples to San Diego, then used NPACI's Telescience Grid (http://www.npaci.edu/envision/v16.2/telescience.html) and NASA's IPG to view and control the output of the microscope from her desk on Wallops Island. Besides viewing the samples through the high-performance microscope, she could move the platform holding the samples, located across the continent, and adjust the microscope, all from Wallops Island. The microscope produces huge sets of image data; in this case the data was stored using a Storage Resource Broker (SRB) on NASA's IPG, and Moisan was able to run algorithms on the data while watching the results in real time.
A new addition to the grid community, the Distributed Terascale Facility (DTF) Project, is being built by NSF's PACI. Research institutions NCSA, SDSC, Argonne, and Caltech will work in conjunction with IBM, Intel Corporation, Qwest Communications, Myricom, Sun Microsystems, and Oracle Corporation. The DTF is expected to perform 11.6 trillion calculations per second and store more than 450 trillion bytes of data, with a comprehensive infrastructure called the "TeraGrid" linking computers, visualization systems, and data at the four sites through a 40-billion-bits-per-second optical network.
The British Government, through the Office of Science and Technology and with the help of IBM, is building the National Grid for collaborative scientific research in a wide spectrum of disciplines.
The buzz at the February 2002 Global Grid Forum (www.gridforum.org) was the Open Grid Services Architecture (OGSA) proposed by Ian Foster, Carl Kesselman, Jeffrey Nick, and Steve Tuecke. (Their paper, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," is available at http://www.globus.org/research/papers.html#OGSA.) An interesting aspect of the paper is that Foster, Kesselman, and Tuecke, longtime spokesmen for Globus and grids, are joined here by Jeffrey Nick of IBM. More information is available at zdnet.com.com/2100-1105-839265.html.
Difficulties and Opportunities
As grids grow, grid communities will be looking for scientists and computer engineers to help build grids or to use grids for research. The NASA Advanced Supercomputing (NAS) Division at NASA Ames Research Center is a focal point for the joint university and government creation of NASA's IPG. To discuss possible opportunities for internships or collaborative research, contact Arsi Vaziri (email@example.com), IPG Deputy Project Manager.
Grid development is not for the faint of heart. As Rob Fixmer wrote in eWeek (January 11, 2002), "Grid architecture, hailed in many circles as the next great evolutionary step in computer technology, is a simple concept that becomes very complex in its implementation." Imagine the difficulty of getting hundreds of brilliant developers, each with their own idea of how things should be, to cooperate enough to make geographically dispersed resources and organizationally diverse policies all work together. Almost all of these developers work anonymously; the level of cooperation obscures any clear picture of who did what. This article has attempted to name a few of the grid pioneers, but for each person mentioned, there are 20 people making significant anonymous contributions.
The security issues alone could have prevented grids from ever coming to fruition. But brave and cooperative people have prevailed, and many grids actually work. Still, grid computing is difficult. Trying to get something, anything, to work on a grid can make a person swear and throw things at the wall. Building grids is hard work, as challenging and daunting as building the Transcontinental Railroad across the United States in the 1800s, and it requires an extraordinary level of courage and cooperation. Nevertheless, while grid software may be difficult to use, it is still easier to carry out a geographically distributed task with the software than without it.
In a recent discussion of the perils on the path to fully functioning computational grids, an IPG staff member was asked whether the difficulties could be overcome. She replied, "We're going for it!"
Pamela P. Walatka, a technical writer under contract to NASA, is a member of the IPG team, and the author of the Globus Quick Start Guide.