Version 6: July 11, 1999
Research challenges abound in the field of software systems - fundamental challenges that will change the world if they are successfully met. The opportunities truly are unprecedented: as dramatic as the progress in information technology has been in the past few decades, there is every reason to believe that the best is yet to come. This document represents a community effort to describe some of the long-range research challenges in software systems, in the spirit of the PITAC report. The inspiration for this document, and the majority of the specific ideas, come from Jim Gray's 1999 Turing Award lecture and paper (http://research.microsoft.com/~Gray). David Notkin provided special assistance with challenges related to the engineering of software. Drafts have been circulated to the members of the NRC Computer Science and Telecommunications Board, the members of PITAC, the members of the NSF CISE Advisory Committee, and others, many of whom made contributions. Ed Lazowska is the editor; send comments to lazowska@cs.washington.edu.
Software Challenges related to large-scale systems
Systems are called upon to scale for many reasons. Our aspirations grow - we envision applications that require much greater processing capability (climate modeling, biology), or much greater distribution and interaction (e-commerce, multiple "agents" interacting with one another to serve a single human "master"). Processing capabilities continue to grow exponentially, and we seize this opportunity to gain functionality by shifting complexity into software - the systems in our lives are increasingly composed of fast, general purpose hardware platforms that gain their "personality" through complex software.
Scalability involves not only increasingly complex national-scale and international-scale systems, but increasingly complex systems in the home, in the automobile, and in other human-scale environments. Twenty years ago, a careful count would have revealed 100 electric motors in the typical home. Today, it would reveal 100 microprocessors. Within a few years, there will be 1000 microprocessors in the typical home, and they will be interconnected and interacting with one another.
True scalability requires automatic system management, extreme fault tolerance, high performance, large-scale distribution, automatic parallelism, and the "hot" (online) maintenance of hardware and software, among other things. We are making progress, but there's still a huge amount of work to be done.
To an ever-increasing extent, the day-to-day activities of ordinary people depend on software. Appliances such as televisions and telephones are trouble-free: they are easy to use, they mostly work, and if they don't, you can quickly obtain a replacement that behaves just like the original. Computers and systems of computers aren't like this. They need to be: the manager/owner of a system should only have to establish goals, policies, and budgets. The system would do the rest - by exploiting networked resources, it would be self-diagnosing, self-healing, and self-evolving. This is an issue that affects all systems, from personal digital appliances to the largest scale systems.
The President's Commission on Critical Infrastructure Protection recently concluded that 60% of the nation's "critical infrastructure" has information technology at its heart: the air traffic control system, the electric power grid, the telecommunications infrastructure, even significant elements of our water supply and ground-based transportation systems.
This software is fragile: not only is it vulnerable to attack, it is vulnerable to collapse due to its inherent complexity. Technologies to build dependable systems - systems that are secure (they only serve authorized users; service cannot be denied by unauthorized users; information cannot be stolen), safe, reliable, and available - are inadequate. We must solve this problem both at the single-system level, and at the distributed system level.
Most traditional consumer devices - from televisions to automobiles - are "natural" to use. We have been much less successful at creating "usable interfaces" for digital devices - whether for personal computers or for VCRs. For digital systems to truly achieve their potential as "human enablers," breakthroughs are required. Otherwise, our interactions with these systems will continue to be frustrating, and they will fall short of their potential. In truth, we should be able to do far better with digital systems than with "unintelligent" consumer devices - interfaces that are "natural" to different users, that evolve as the user grows more sophisticated, that tailor themselves to the user's habits.
It is painfully clear that software innovations are not uniformly available in our culture. As one study put it, the underclass is "losing ground, bit by bit." Economics is not entirely to blame. Sometimes we build software whose design fails to be useful across cultural contexts. Every design has someone in mind. As we design systems that are more embedded in the fabric of life, we need design approaches that are more sensitive to life's diversity. We need to pioneer software design techniques that give voice to that diversity.
The world is increasingly connected - not just physically and electronically, but in terms of our interdependencies. Individuals want, and need, to extend their presence. Without traveling, observers must be able to experience a situation as thoroughly as if they were actually present, and participants must be able to interact with the remote environment as well.
First, we must understand what is needed in order to make telepresence a good substitute for "being there." Do we require all of the senses, at extreme resolutions, or are compromises (perhaps task-dependent) possible? Then we must achieve this goal. Human control must not be a requirement - robotic sensing and monitoring of activity is essential.
Recently, computer systems have advanced to the point where they read for the blind, hear for the deaf, and type for the impaired. But we are still a long way from computers that see, hear, and speak as well as a fully-able person. And the potential is even greater than this: computer systems that offer "advanced prostheses" for all - that truly augment our memory, our vision, our communication.
As an ever-increasing amount of data about our lives goes online, the systems that provide this data must be engineered to respect our privacy - protect us from being imitated, exposed, profiled, or otherwise violated.
Similarly, we must be able to tailor the computer systems that we use to respect our personal values regarding content, security, and other attributes. This involves interfaces and technologies that allow us to express and implement our values.
Individuals experience far more than they can recall, and computer systems (and manual filing systems) store far more than their users can effectively retrieve. As a result, enormous volumes of data are effectively "write-only." This situation - call it a problem or an opportunity - will only increase with time.
We must invent the "Memex" that Vannevar Bush foresaw - a computer system that can remember all that a person hears or sees, and quickly return any item upon request.
The personal Memex (above) remembers what an individual experiences, and retrieves without understanding or digesting. Each of us, though, is inundated every day with far more information than we can possibly absorb in its raw form. Further, the storehouse of the world's information is largely inaccessible to us.
The most precious resource is human attention. To respect this resource, we must invent computer systems that, given a corpus of text (or of sounds, or of images), can search, analyze, summarize, and answer questions about it as quickly and precisely as a human expert in the field - and with an equivalent sensitivity to the difficulty of posing accurate queries in uncertain territory. (Guessing accurately what the inquirer meant to ask is one of the greatest skills of a good reference librarian or teacher. "If you mean X, then the answer is Y. But perhaps you meant Z?")
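For contrast, the sketch below shows roughly where today's baseline sits: ranking documents by weighted term overlap. The corpus, query, and scoring are invented for illustration; the point is that such a system retrieves, but it neither summarizes nor asks the clarifying question a reference librarian would.

    # A toy term-weighted search over a tiny corpus. The corpus and query
    # are invented; a real "digital librarian" would also summarize,
    # answer questions, and ask "perhaps you meant Z?"
    import math
    from collections import Counter

    corpus = {
        "doc1": "the memex stores everything a person hears or sees",
        "doc2": "digital libraries provide curation and archiving of knowledge",
        "doc3": "telepresence lets observers experience a remote situation",
    }
    doc_terms = {name: Counter(text.lower().split()) for name, text in corpus.items()}

    def idf(term):
        containing = sum(1 for terms in doc_terms.values() if term in terms)
        return math.log((1 + len(corpus)) / (1 + containing))

    def score(query, terms):
        return sum(terms[t] * idf(t) for t in query.lower().split())

    query = "archiving knowledge"
    ranked = sorted(doc_terms, key=lambda name: score(query, doc_terms[name]), reverse=True)
    print(ranked)   # doc2 ranks first on term overlap alone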
The "digital librarian" described above is a challenging extension of today's computer systems. A true "digital library" is a challenging extension of today's library, or at least of the traditional role of the library. The library embodies an enduring intellectual construct: the provision, curation, and archiving of knowledge and information. Today's "digital libraries" are poor shadows of their analog ancestors in nearly every intellectual respect. There is much to be done.
We have made remarkable progress towards the goal of "intelligence amplification": computers have helped with the proofs of several significant theorems, have defeated the world chess champion, and are used in the conceptualization, simulation, manufacturing, testing, and evaluation of products of every kind.
We are still a long way, however, from "machine intelligence" - from computer systems that form new concepts, that learn and adapt, that are capable of rational decision-making, that "reason" in ways that seem "human" to us most of the time - computer systems to which we can delegate tasks that require significant independence. Software should be sophisticated enough that we cannot easily distinguish whether we are being assisted by a software agent or a human agent.
As with all of these challenges, this is not simply a matter of computational horsepower. Fundamentally new approaches are required.
We have made progress in data mining, but far more remains to be accomplished. We need digital assistants that can extract meaning from complex datasets, ranging from multi-terabyte astronomical and seismic data to market-basket analysis. These assistants should be able to detect patterns and make new (or suggest alternative) inferences, as well as digest and summarize.
This challenge would be subsumed by a complete solution to #8, since "tables of numbers" are a small part of the corpus of human knowledge implied there. The specific challenge here is to analyze enormous volumes of data to find trends and anomalies and to help people visualize, navigate, and truly understand the data.
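A minimal sketch of one small piece of this challenge appears below: flagging anomalies in a numeric series with a robust median-based rule. The readings and the threshold are illustrative assumptions; the open problems are doing this at terabyte scale, across heterogeneous data, and explaining the findings to a person.

    # Flag points far from the median relative to the typical deviation.
    # The readings and the factor of 5 are illustrative assumptions.
    from statistics import median

    readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2, 10.1, 9.7, 10.0]

    center = median(readings)
    spread = median([abs(x - center) for x in readings])   # median absolute deviation

    anomalies = [(i, x) for i, x in enumerate(readings) if abs(x - center) > 5 * spread]
    print(anomalies)   # -> [(5, 42.0)]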
This research assistant should open exploration to more people. For example, it should be possible for teachers and students to have ready access to "real" data for exploration, appropriate to their context. Similarly, when everyday people have problems, for example those related to environmental impact or zoning, national or regional data should be available for use. Large data sets should be viewed as national resources that, because of good software design, are open to more people with fewer intermediaries.
Current and emerging high-performance computing systems pose enormous software challenges - architectural advances continue to outstrip our ability to manage parallelism and to deliver large fractions of peak hardware performance. Moreover, the complexity of emerging parallel and distributed computing systems (computational grids) will require new approaches to programming languages, programming models, optimizing compilers, performance and debugging tools, operating system and resource management, flexible network support, adaptive parallel file systems, tertiary storage data staging, indexing and data correlation systems, high-fidelity visualization systems, new metaphors for data representation, intentional specification systems, and multidisciplinary collaboration support.
Three key challenges include scalability and programming systems (research that attacks the lack of performance portability across terascale systems, and the high intellectual cost of creating and optimizing high-performance applications and support software); intelligent data manipulation (research addressing the problems of data access, staging, coordination, and transfer, among heterogeneous physically distributed archives); and visualization and collaboration (research confronting the challenges of data display and representation on enormous scales).
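A one-function calculation (Amdahl's law, with an assumed 5% serial fraction) illustrates why delivering a large fraction of peak performance is so hard without new programming systems:

    # Amdahl's law: speedup on p processors when a fraction of the work
    # is inherently serial. The 5% serial fraction is an illustrative assumption.
    def speedup(p, serial_fraction):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

    for p in (16, 256, 4096):
        print(p, "processors:", round(speedup(p, 0.05), 1), "x speedup")
    # Even 5% serial work caps 4096 processors at about 20x - under 1% of peak.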
A "perpetual software crisis" arises from the fact that we continually move complexity into software. We have, to first approximation, no useful models, data, or tools that allow us to predict or trade off "hard properties" (such as performance), "soft properties" (such as ease of change), and "economic properties" (such development cost and time-to-market). We are building software in the dark and cannot continue to do so. Research that significantly improves our ability to understand and manage these diverse tradeoffs will enable both the disciplined development of individual software systems and also the creation of market niches in software.
Software is at the heart of almost all organizations - either to allow the organization to run efficiently, or as a product of the organization, or both. Because software is so central, it is under enormous pressure to change extensively and rapidly, often even before it meets the requirements it was originally planned to satisfy. Some of the demand for change arises from other technological change (for instance, in the underlying hardware, operating systems, protocols, component models, or environment). Additional demand for change is imposed by the ever-increasing needs of users and the ever-increasing competition in the marketplace.
For these and other reasons, it is necessary to consider virtually all software systems as "legacy" systems in which the need to accommodate change is part of the requirement, from the moment of conception through retirement - "from womb to tomb." We need to develop approaches and techniques that allow teams of developers to analyze, understand, reason about, and manipulate large programs written in real - and usually multiple - languages in an effective and predictable way.
Note that software development is only one example of a field in which we must learn how to use computers to amplify the productivity of groups of individuals working on a common task.
Building "correct" software - software that satisfies its requirements - has been the central goal of the research community for years. However, more software fails because it does not satisfy the needs of its users than because it isn't implemented correctly (particularly because subsequent adaptation of the software to meet user needs increases fragility and cost).
Identifying the real needs of users is difficult, both for systems in which the specific users are known (such as contract software for the military) and for systems in which a large and ill-defined population of users is intended (such as desktop software for the masses). Significant research is needed to increase our ability to build the "right" software systems, ensuring of course that those systems are also cost-effective to evolve as the needs of the users change.
Tools to support software development are almost exclusively general-purpose: for example, they may be intended to support the design of programs serving any purpose, or the construction of programs written in any language; they may be intended to help find errors in all programs written in a given language, or to manage arbitrary versions of systems.
In many cases, however, tools that aid in specific, expensive, frequently-encountered software development activities could significantly improve the speed and quality of these activities. For example, consider the value of tools intended to help port applications from one operating system to another, or of tools that help generate software for specific tasks (a spreadsheet is in fact a crude example of the latter). We need research that will yield task-oriented software tools, including tools for the rapid and cost-effective development of task-oriented software tools.
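To make "task-oriented" concrete, here is a deliberately tiny sketch of such a tool - a generator that turns a record description into a class, so that routine boilerplate need not be written by hand. The schema format and the output are invented for this illustration.

    # A toy task-oriented tool: generate a Python class from a record schema.
    # The schema is invented; a real generator would also emit validation,
    # persistence, and documentation.
    schema = {"name": "Customer", "fields": [("name", "str"), ("balance", "float")]}

    def generate_class(schema):
        lines = [f"class {schema['name']}:"]
        args = ", ".join(f"{field}: {ftype}" for field, ftype in schema["fields"])
        lines.append(f"    def __init__(self, {args}):")
        for field, _ in schema["fields"]:
            lines.append(f"        self.{field} = {field}")
        return "\n".join(lines)

    print(generate_class(schema))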
The demand for software far exceeds the nation's ability to produce it. Because of our rapidly increasing expectations for software, this problem will not go away, nor is it likely to be fully addressed by improved technologies for building software as we currently build it (e.g., better languages, better type systems, better source code control systems, better environments).
While task-oriented software tools are one approach, more radical approaches are needed. We must invent techniques for "automatic" or "high-level" programming. First, we must invent new approaches for specifying the desired behavior of computer systems: today's specifications are difficult to write, difficult to read, and often incomplete. Second, we must invent new approaches for generating software from specifications: today, humans translate the specification to the software in an expensive and error-prone process. An "automatic programming" system would reason about the application, interact with the designer, and generate the prototype system.
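The gap between specification and implementation can be made concrete with a small sketch (the property and the hand-written program below are invented for illustration): it is comparatively easy to state what "sorted" means and to check a program against that statement; the research challenge is the reverse direction, deriving the program from the statement.

    # The "what": a declarative specification of sorting.
    # The "how": a hand-written program checked against it. "Automatic
    # programming" would derive the second from the first.
    from collections import Counter
    import random

    def satisfies_sort_spec(inp, out):
        non_decreasing = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
        return non_decreasing and Counter(out) == Counter(inp)

    def insertion_sort(xs):
        result = []
        for x in xs:
            i = len(result)
            while i > 0 and result[i - 1] > x:
                i -= 1
            result.insert(i, x)
        return result

    data = [random.randint(0, 99) for _ in range(20)]
    print(satisfies_sort_spec(data, insertion_sort(data)))   # -> True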
Attributions
Large-scale systems
1: Jim Gray #1
2: Jim Gray #9
3: Jim Gray #10, 11
Intelligent human-centered systems
4: Don Norman
5: Louis Gomez
6: Jim Gray #8
7: Jim Gray #3, 4, 5
8: Steve Wallach
9: Jim Gray #6
10: Jim Gray #7
11: Sid Karin
12: Jim Gray #2
Scientific exploration
13: Henry Kelly
14: Dan Reed
Engineering of software systems
15: David Notkin
16: David Notkin, Ray Ozzie
17: David Notkin
18: David Notkin, Susan Graham
19: Jim Gray #12