C.5 Infrastructure for Applications Development
Principal Authors:
Ken Kennedy and Paul Leach
Additional Contributors:
W. Richards Adrion, Dona L. Crawford, Jack Dongarra, Steve Evans, Geoffrey
Fox, Dennis Gannon, Bill Janssen, S. Lennart Johnsson, Barry M. Leiner, Kevin
Lewis, Mark Linton, Mark Miller, Paulette M. Quist, Joel Saltz, William L.
Scherlis, Michael Schwartz, Alfred Z. Spector, David Steensgard, Rick L.
Stevens and Willy Zwaenepoel
1. Introduction
The goal of the infrastructure for applications development is to enable rapid
development of network applications at lower cost than is currently possible.
The overall approach is to provide functionality and solutions to problems that
are common to many applications either 1) in a form that can be used by
applications developers or 2) in the form of reusable objects that provide
services.
Applications developed for the National Information Infrastructure will differ
from conventional applications in a number of ways: They will typically be
distributed over heterogeneous computer systems and operate concurrently; they
must scale to handle very large numbers of users, both with respect to
processing power and storage; they must be continuously available and
upgradable without becoming unavailable; they must maintain persistent states
reliably and with integrity in the face of node and network failures; and they
must have some way of locating and accessing a broad spectrum of network
resources. These problems arise in nearly all distributed applications, so an
infrastructure that helps solve some or all of them would benefit nearly every
application developer.
To address these issues, we have developed a high-level view, or "application
architecture," and have illustrated the soundness of this view by looking at
some important application examples. In order to prioritize the possible
projects, we have also considered the question of what makes sense in terms of
government investment in the NII. Finally, we have proposed a number of
research and development projects as possibilities for government investment.
2. A Plan for Providing an Applications Development Infrastructure
2.1 Application Development Model
Before proposing an infrastructure for applications development, one must have
a model of the applications being developed and the process by which they are
developed. We take as our base model one of "distributed objects," primarily
for purposes of having a vocabulary for discussion. In this framework, the
application development process can be described at three levels:
- Provision of primitives for implementing objects, invoking them, storing
them, letting them communicate securely, replicating them, etc. We call
implementations of objects classes.
- Creation of low-level (reusable) classes, both at the system level and
application level, which can function as software components.
- Assembling software components into applications, with little or no
programming.
To assemble applications from components, a repository of such components must
exist, along with descriptions of the salient features of the components it
holds, so that appropriate components can be chosen. Such descriptions go
beyond current notions of a repository, which usually limit themselves to the
interfaces supported by components, and extend to matters such as: the
performance of the components; their reliability; the license terms and
conditions for their use; and third-party evaluations and certifications of
their correctness and capabilities.
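To make this concrete, the sketch below (in Python) shows one possible shape
for such a repository entry. Every field name is an illustrative assumption,
not a proposed standard.

    # Illustrative repository entry describing a component beyond its
    # interface alone. All field names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class ComponentDescriptor:
        name: str
        version: str
        interfaces: list                # interface names the component supports
        throughput_ops_per_sec: float   # measured performance
        mtbf_hours: float               # reliability estimate
        license_terms: str              # e.g., "per-use", "site", "royalty-free"
        certifications: list = field(default_factory=list)  # third-party attestations

    entry = ComponentDescriptor(
        name="ImageCompressor",
        version="2.1",
        interfaces=["Compressor"],
        throughput_ops_per_sec=250.0,
        mtbf_hours=8760.0,
        license_terms="per-use",
        certifications=["lossless mode verified by an independent lab"],
    )

A construction tool could then filter and rank entries on these fields rather
than on interface signatures alone.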
Once applications have been created, they need to be released into general use,
later maintained and updated, and appropriately licensed and paid for.
2.2 Example Objects
This working group proposes a software model where NII capabilities are built
up from one or more coarse-grain objects or services. The methodology can be
applied to:
- The careful construction of critical NII applications (e.g. health care,
financial services).
- Rapid prototyping and distributed R&D.
The next two sections elaborate on these two types of applications.
2.2.1 Objects for Financial Services and Health Care
Here, we examine two representative NII applications to illustrate and verify
this approach. These applications are quite rich and illustrate several
possible services. However, our choice does not imply that these are the two
most important applications, and they do not by any means cover all possible
services. We chose these two applications--financial services and health
care--from the workshop briefings, and we recommend that detailed studies be
conducted to refine (through deep examination of each application) and broaden
(across many more applications) our illustrative discussion. In what follows,
we show how these applications can be built up from component objects; a small
illustrative sketch of such a composition appears after the list.
User Interface (client)
- Financial services: Easy-to-use customer interface for integrated
banking and financial services.
- Health care: Portable medical records with user interfaces that
allow easy interaction with individuals having varying degrees of medical
sophistication.
User Interface (service objects)
- Financial services: Interface for bankers to databases.
- Health care: Standardized interface to portable record for text;
image data from various modalities including MRI, CT, PET, ultrasound; EKG, EEG
readings; lab tests, etc.
Security Monitor (network police)
- Financial services: Security checks in finance areas.
- Health care: Security checks in health care areas.
Outcome Modeling (security model)
- Financial services: Model and forecast vulnerability of system to
accidental and deliberate abuse; model and forecast behavior of system under
various load scenarios.
- Health care: Critique a patient case management strategy through
the analysis of collections of matched case histories. Obtain customized
assessment of treatment benefits, risks.
Real-time Synchronization Object, Distributed Database Object, Authorization
Object
- Financial services: Real-time processing of financial records with
authorization.
- Health care: Real-time processing of medical records (emergency
medical care) with authorization; real-time processing of medical sensor data
(e.g. monitoring patient status during anesthesia and in ICU).
Encryption/Decryption Object, Authentication Object, Tamper and Copy Proofing
and Testing Object
- Financial services: Make secure and examine documents,
certificates, receipts, signatures.
- Health care: Access control for medical records. Patients, medical
professionals, researchers and hospital administrators each may be allowed
access to an appropriate subset of a patient's medical records. Authentication
and authorization objects will document rights to prescribe medications and to
deliver various categories of medical treatment.
Digital Signature Object, Biometrics Service
- Financial services: Identify customers and bankers.
- Health care: Identify patients and medical providers.
Real-Time Simple Anomaly Detection Object
- Financial services: Compare single and multiple transactions with
templates to detect possible fraud.
- Health care: Examine single and multiple transactions to flag
careless errors (incorrect medication dose, allergy to prescribed medication
and information from medical records that would reveal reasons not to pursue a
particular treatment plan). Transactions can also be scanned to detect waste
(multiple identical lab tests during a short period of time during which there
is no clinical reason to expect results to have changed, diagnostic tests for
which there is no medical indication, etc.).
Data and Knowledge Mining
- Financial services: Examine multiple transactions to detect
possible fraud (complex anomaly detection) and to identify patterns of market
and consumer behavior (market segmentation).
- Health care: Extensive statistical analysis of medical records to
carry out epidemiological research and to evaluate costs and benefits of
different treatment strategies. Analysis of medical databases to uncover
patterns of fraud.
Simulation Object, Visualization Object
- Financial services: Compute financial models and visualize.
- Health care: Compute and visualize biomedical simulations (e.g.,
simulation of molecular dynamics used in drug discovery, hemodynamic
simulations, etc.).
Image Compression Object with Size/Fidelity Trade-offs
- Financial services: Transmit and store images of financial paper
records.
- Health care: Digital image data from various modalities (MRI, CT,
PET, etc.) will be stored in a distributed manner. Because patients are
geographically mobile, images need to be accessible on very short notice at
arbitrary sites.
Contract Object
- Financial services: Implement legal computer contract.
- Health care: Provide informed consent for treatment.
Digital Cash Object
- Financial services: Implement electronic checks, credit and
bartering.
- Health care: Verify details of health insurance coverage.
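As a purely illustrative sketch of how such coarse-grain objects might be
composed, consider (in Python) a transaction path that chains an authorization
object with a simple template-based anomaly detector. Every class, method and
threshold here is a hypothetical stand-in for a real NII service object.

    # Hypothetical composition of two coarse-grain service objects into a
    # transaction-processing path. Names and limits are illustrative only.
    class AuthorizationObject:
        def __init__(self, permitted_users):
            self.permitted = set(permitted_users)

        def authorize(self, user):
            return user in self.permitted

    class AnomalyDetector:
        """Compares a transaction against a simple template (an amount limit)."""
        def __init__(self, max_amount):
            self.max_amount = max_amount

        def flag(self, transaction):
            return transaction["amount"] > self.max_amount

    def process(transaction, authorizer, detector):
        if not authorizer.authorize(transaction["user"]):
            return "rejected: not authorized"
        if detector.flag(transaction):
            return "held: flagged for review"
        return "accepted"

    authorizer = AuthorizationObject(permitted_users=["teller-17"])
    detector = AnomalyDetector(max_amount=10000)
    print(process({"user": "teller-17", "amount": 25000}, authorizer, detector))
    # -> held: flagged for review

The same pattern applies to the health care column: swap the amount template
for a medication-dose or allergy check.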
2.2.2 Rapid Prototyping and Collaborative Research and Development
Rapid software system prototyping and experimentation is an important class of
application development activities. Examples include a university professor
experimenting with new network software; a quickly assembled team attacking a
timely problem (such as the hantavirus outbreak that afflicted residents of
the Southwestern United States in the summer of 1993); and informal
experiments performed by independent software developers.
Rapid prototyping is a key method used by experimental researchers and software
developers in testing new concepts and tools. Rapid prototyping and informal
collaboration have led to breakthroughs in personal computer software
development, epidemiology and other diverse problems. By enabling developers to
test their software without undue risk to the NII, we could help stimulate many
of the important advances needed for the software components of the NII. These
advances will likely have important economic and research impacts.
While conventional application development strives for implementation
completeness (e.g., secure, scalable financial transaction processing), the
goal of rapid prototyping is to permit experimentation with incomplete software
systems. For example, in developing a new network protocol, it may be expedient
to focus on scaling problems while ignoring security issues. Yet, developers of
such software must safeguard themselves and the network at large from
accidental problems that might arise from the incomplete nature of their
applications. For example, errant software should not be able to generate
excessive network traffic or accidentally violate the security perimeter of the
network. The issue here is not enforcing general security measures (covered in
Section C.9), but rather giving assurances to experimenters and application
developers that their tests will not cause problems.
To support safe, rapid prototyping and NII-based testing, research is needed
into methods and tools that allow programmers to develop and test software
while restricting the scope of NII problems that can arise from their software.
Examples might include high-level object services or software libraries that
restrict the scope of network operations, test for various types of runtime
errors or otherwise protect prototype tests from such problems.
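One might imagine, for instance, a thin library wrapper that caps the traffic
a prototype can generate. The Python sketch below is an assumption-laden
illustration, not a real NII facility: it enforces a byte budget before
delegating to the underlying send operation.

    # Illustrative sketch of a library that restricts the scope of network
    # operations during prototyping; the budget and interface are assumptions.
    class BudgetedSocket:
        """Wraps a socket-like object and refuses to exceed a total byte
        budget, shielding the network at large from a runaway prototype."""
        def __init__(self, sock, byte_budget):
            self._sock = sock
            self._remaining = byte_budget

        def send(self, data):
            if len(data) > self._remaining:
                raise RuntimeError("prototype exceeded its network budget")
            self._remaining -= len(data)
            return self._sock.send(data)

    # Usage: wrap a real socket before handing it to experimental code.
    #   import socket
    #   raw = socket.create_connection(("example.org", 80))
    #   guarded = BudgetedSocket(raw, byte_budget=64 * 1024)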
3. Research and Development Recommendations
We found significant research issues in the following areas:
- Object system foundations.
- Application construction tools.
- Repositories/object brokers.
- License and payment mechanisms.
- Release and maintenance.
We also propose several different testbeds to accelerate NII applications
development.
3.1 Object System Foundations
Goals
To extend existing and proposed distributed object system foundations to
technologies that scale to the NII. To develop new abstractions to cope with
new problems and challenges introduced by the NII.
Issues
Many of the issues of implementing distributed objects are well understood and
actively under research and development in industry, both in individual
corporations and industry consortia such as the Object Management Group and the
Open Software Foundation. Of the less understood issues, many are covered in
other sections of this report, particularly security, communications and
interoperability. We see the following issues as uniquely relevant to
infrastructure for applications development for the NII:
- Primitives for replication and caching of objects on an NII scale need to
be created. Distributed naming and directory services would be very important
to users of these primitives. The technology underlying current and near-future
directory services--such as the Internet Domain Name Service, X.500 or OSF/DCE
Directory Services--will not scale to the NII, which might require 10,000
copies of a "zone" with fault-tolerant updates appropriate to the data. Today's
replication algorithms can probably be stretched to 10,000 copies, but only if
there are no consistency requirements between the data items (so-called "weak"
consistency); they will handle only tens of copies if there are consistency
requirements ("strong" consistency). Consistency levels between "weak" and
"strong," heretofore unknown, might be required in order to get both consistent
data and high replication factors. (A toy illustration of weak consistency
appears at the end of this section.) One way to scale processing power in step
with the number of clients is to cache information at the client machines and
run computations there whenever possible. This will require algorithms for
cache consistency across a much larger number of clients than is currently
possible. Here again, new notions of consistency may need to be developed.
- Primitives for managing persistent storage that are appropriate for the
replication and caching primitives above are likely to differ from currently
known ones such as atomic transactions, and will need to accompany them in
order to allow NII applications that maintain persistent state to scale.
- New models for distributed applications that go beyond the current ones of
message passing, RPC, replication and atomic transactions, especially if they
can address several of these issues simultaneously with a single model instead
of piecemeal, would be highly desirable. Current and near-future technology
requires the application developer to use many different techniques to build a
high-quality distributed application, making it a difficult task.
- Primitives for classes supporting large collaborative applications--e.g.,
an interactive-video town meeting with hundreds of people. Current interactive
collaborative applications only handle a few people, and we don't even
understand the issues that will arise when the NII offers the possibility of
much larger interactive collaborations. There are also issues in transparency:
It may be possible to create primitives so that application developers do not
need to do anything special to allow the applications to be used
collaboratively.
- Definition of interfaces that classes can support to allow them to be
assembled by application construction tools (which are discussed below).
- Server load-balancing brokers that can cope with thousands or more
alternative service providers, each with varying performance, cost, reliability
and trustworthiness factors. No existing or near-term technology can cope with
this scale or heterogeneity.
- Support for distributed debugging of real-time multimedia objects. While
remote debuggers currently exist and distributed debuggers are in development,
much research remains to be done to make debugging of real-time objects
feasible.
- Compiler and runtime technology that automatically decides parallel and
distributed resource allocation. With current and near-term future technology,
this must be done by hand, and is very difficult to do.
- Tools for interactive visualization and tuning of performance of
distributed applications. With current and near-term future technology, this is
very difficult even for centralized applications, and almost nothing exists
today for distributed applications.
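To illustrate the flavor of "weak" consistency referred to above, the
following Python sketch propagates updates among replicas lazily and reconciles
them by timestamp (last-writer-wins). It is a toy under stated assumptions, not
a proposal for an NII-scale algorithm.

    # Toy last-writer-wins replication, illustrating "weak" consistency:
    # replicas may disagree temporarily but converge when they exchange state.
    import time

    class Replica:
        def __init__(self):
            self.store = {}  # key -> (timestamp, value)

        def write(self, key, value):
            self.store[key] = (time.time(), value)

        def read(self, key):
            timestamp, value = self.store[key]
            return value

        def merge_from(self, other):
            # Keep whichever copy of each item was written most recently.
            for key, (timestamp, value) in other.store.items():
                if key not in self.store or timestamp > self.store[key][0]:
                    self.store[key] = (timestamp, value)

    a, b = Replica(), Replica()
    a.write("quote", "57 1/4")
    b.merge_from(a)  # b converges to a's latest value
    assert b.read("quote") == "57 1/4"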
3.2 Application Construction Tools
Goals
The primary goal of this category of research is to catalyze the development of
tool suites that facilitate the rapid construction of NII application software,
by composition of existing objects. The term "objects" in this context refers
to relatively heavy-weight or complex objects, also known as "components,"
designed for generality and re-use.[1] The goal of NII
application construction tools should be contrasted with the emphasis of
conventional development tools, which focus on the implementation of efficient
low-level ("primitive") objects. There is a closer match to current-generation
rapid prototyping tools, except that these typically do not address large-scale
distributed contexts such as the NII. Secondary goals should be to encourage
smooth integration of high-level construction tools with existing development
environments and to expand the accessibility of tools beyond the traditional
programmer community.
Issues
The creation of the sort of tools envisioned here raises complex technical and
policy issues. There needs to be a common object model, in the form of standard
interfaces or protocols for obtaining object services, or at least an
architecture within which diverse object models can co-exist. Interoperability
of the model with schemes currently under development in the community (OSI,
CORBA, OLE 2.0, OpenDoc) must be considered.
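Whatever form the common model takes, its essential requirement is a uniform
way to request a service without knowing its implementation. The Python
fragment below sketches that idea; the class names and single-entry-point
convention are assumptions for illustration, not any existing standard.

    # Hypothetical minimal common object model: every service object exposes
    # a uniform invocation interface, so construction tools can wire
    # components together without knowing their internals.
    from abc import ABC, abstractmethod

    class ServiceObject(ABC):
        @abstractmethod
        def invoke(self, operation, **arguments):
            """Perform a named operation; the uniform entry point that makes
            components interchangeable to an assembly tool."""

    class EchoService(ServiceObject):
        def invoke(self, operation, **arguments):
            if operation == "echo":
                return arguments.get("text", "")
            raise ValueError("unsupported operation: " + operation)

    print(EchoService().invoke("echo", text="hello"))  # -> hello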
There may be conflicting constraints. For example, the constraints of
encapsulation (for improved modularity) and proprietary interfaces (for
fostering healthy competition) tend to conflict with the constraint of
encouraging commonality through open interfaces. Publishing "internal
interfaces," in a spirit of openness, can also thwart release/configuration
mechanisms.[2]
The opportunities to provide unique services on the NII lead to complexities
for application developers that have not been addressed in current-generation
tools. Applications may be highly distributed and invoke processes residing on
heterogeneous devices. As a concrete example, a component residing on a
hand-held "personal digital assistant" (PDA) might wirelessly transmit an
information retrieval agent to a desktop machine, which in turn is connected to
the NII backbone. The required data might be represented procedurally, in the
form of a component to be run on a supercomputer at a university or government
research center. The application writer must be able to specify these aspects
of the distributed processing, and yet be shielded from irrelevant aspects,
such as routine load balancing between available supercomputers adequate to
execute that component. Similarly, network-based collaborative applications and
remote execution capabilities will require novel support in the tool
environment.
The intended user community for such tools is an important consideration.
Whereas current-generation tools are primarily suitable for use by professional
programmers, the NII represents an opportunity to capture the expertise of
professionals in fields as diverse as art, music, education and medicine, who
should not be distracted from their disciplines by idiosyncratic, low-level
programming considerations. Visual programming paradigms such as models of data
flow wiring seem attractive, but will they offer sufficient power to fully reap
the benefits of the NII, or will textual representations, such as scripting
languages, still be required? Can a single tool environment support a
multidisciplinary project team, ranging from graphic artists to video producers
to cognitive scientists to instructional designers to C++ software engineers?
To what extent will domain-specific tools be required to facilitate "direct
manipulation authoring" (as opposed to "programming") by subject matter
experts?
Tool support for portable delivery of applications is another important issue.
Unless one resorts to requiring a standard "look and feel" across multiple
vendors--an area where healthy competition may still result in significant
innovation--there arises a need for abstraction in user interface (UI) design.
An ideal tool would offer abstract APIs for UI "widgets," be capable of
emulating the look and feel of multiple delivery platforms during development,
and then support "extrusion" of platform-specific code that preserves the
essential semantics of the original UI. Some capability for "fine-tuning" on
specific platforms may be needed, for example, when special capabilities such
as movies are deployed on less capable devices.
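A minimal sketch of the abstraction in question follows: an abstract button
API rendered by interchangeable platform backends, written in Python. The class
names and the two "platforms" are invented for illustration.

    # Illustrative abstract UI API: the application describes a widget once,
    # and platform-specific backends "extrude" native representations.
    from abc import ABC, abstractmethod

    class ButtonBackend(ABC):
        @abstractmethod
        def render(self, label):
            ...

    class DesktopBackend(ButtonBackend):  # hypothetical desktop platform
        def render(self, label):
            return "[Desktop button: " + label + "]"

    class PDABackend(ButtonBackend):      # hypothetical hand-held platform
        def render(self, label):
            return "<" + label + ">"      # simpler widget for a small screen

    def build_ui(backend):
        # The application-level description is platform-independent.
        return backend.render("Submit Payment")

    print(build_ui(DesktopBackend()))  # emulate one look and feel...
    print(build_ui(PDABackend()))      # ...then another, from the same source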
Finally, the increasing importance of multimedia in applications raises new
challenges. The current Internet, for example, is strongly text-biased,
although new developments such as MIME are changing this situation.
Current-generation multimedia tools are too specialized and difficult to use.
They are especially weak in real-time support, such as playback of movies over
networks. Advances in networking technologies such as ATM will help, but
meanwhile, application writers must have tool support for managing multimedia
assets in a wide-area network environment.
3.3 Repositories/Object Brokers
A repository is a database of information about software components that can be
reused within other software objects to facilitate software development. An
object broker is an agent that can provide value-added services that help
locate other objects.
Software brokers/repositories share many issues with generic digital
libraries. Issues such as access, uniform naming, and terms and conditions of
use are fundamentally the same across all domains of digitally
represented intellectual property. While many of the systems will be similar,
this discussion will attempt to focus on what is different between software
components and other digitally represented intellectual property.
Goal
Development of an infrastructure for storing, accessing and retrieving software
objects for incorporation into other works.
Issues
- Specification: There need to be techniques/tools for describing the
functional, temporal, performance, reliability and other characteristics of
programs. This information is needed for application construction tools to
select appropriate low-level components to assemble into higher-level
components. Current languages (such as OMG IDL or DCE IDL) for the definition
of network services capture little beyond the interfaces supported by a
component; all else is specified in natural language, which cannot today
support automatic selection of components based on their specifications. (A
sketch of such selection appears at the end of this list.)
- Terms/authorization: Software components will largely be used by
incorporation into other systems; thus, repository systems must address
licensing, use and per-use issues, as well as derivative works. Special
attention must be paid
to terms with respect to stability of the product and changes to the
specification or implementation of the product (see License and Payment
Mechanisms section).
- Access method: Fundamentally, access will be very similar to that of other
digital libraries. However, given the high probability of per-use fees, special
attention to metrics of access is necessary. Here, sanitization of
results--providing market information to the developer while maintaining client
privacy--is of special interest.
- Multiplicity of repositories: There will not be a single repository for
all uses. This brings up issues of commonality of interfaces to the
repositories, search across repositories, and commonality of terms and
conditions.
- Scale: Software repositories will likely contain huge numbers of
components. Each component will likely have several to hundreds of versions as
well as host and OS variations. Repositories may exceed 10^12 entries over time
and could credibly have up to 10^10 transactions/day or more (two to 10 times
the daily load of airline systems).
- Evaluation and certification: Value-added services for evaluation and
certification of repository contents may be present in this environment.
(Certifications are evaluations for which some party takes some contractual
responsibility for their correctness.) Research issues include automatic and
semi-automatic evaluation of implementations against specifications. Practical
considerations include definition of common terms for levels and types of
evaluations and certifications. Certification may also be non-functional (e.g.,
certify that this code does not infringe on a particular copyright).
- Brokerage functions: The complexity of searching for software components
and of evaluating specifications and certification results makes it impractical
for individual developers to search on their own. Much of the
search will be aided by generic information access capabilities (discussed in
the Information Access section of this report), but some will be specific to
software components. Brokerages for specific application domains will arise,
and the repository system must enable and encourage the implementation of
value-added brokers.
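The sketch below (in Python) illustrates the kind of automatic selection that
machine-readable specifications would enable, per the Specification item above.
The fields, figures and fee model are all invented for illustration.

    # Hypothetical machine-readable specifications enabling automatic
    # component selection; field names and values are assumptions.
    specs = [
        {"name": "SortFast", "interface": "Sorter",
         "throughput_ops_per_sec": 9000, "mtbf_hours": 2000, "fee_per_use": 0.02},
        {"name": "SortSure", "interface": "Sorter",
         "throughput_ops_per_sec": 4000, "mtbf_hours": 20000, "fee_per_use": 0.05},
    ]

    def select(specs, interface, min_throughput, min_mtbf):
        # Pick the cheapest component meeting both functional and
        # non-functional requirements -- the step that natural-language
        # specifications cannot support automatically.
        candidates = [s for s in specs
                      if s["interface"] == interface
                      and s["throughput_ops_per_sec"] >= min_throughput
                      and s["mtbf_hours"] >= min_mtbf]
        return min(candidates, key=lambda s: s["fee_per_use"], default=None)

    print(select(specs, "Sorter", min_throughput=3000, min_mtbf=10000))
    # -> SortSure: the only candidate meeting the reliability requirement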
3.4 License and Payment Mechanisms
Goal
To achieve widespread and rapid dissemination of NII applications by
establishing a common framework for license and payment.
Issues
Many NII applications will be built from a variety of software components that
may have differing authorship and license requirements. This will require
flexible and automatic mechanisms for licensing and payment. Some users may
wish to pay for software at the time of acquisition; others may wish to pay on
a per-use basis. Within a single NII application, some software components may
be used more often than others, and payment may need to be made on a pro-rated,
per-component basis. The research and development challenges for this area
include:
- Support for a plurality of payment methods.
- Integration of freeware with proprietary software.
- Automated negotiation and review of license mechanisms.
- User-driven implementation of license requirements.
- Support for market-based software pricing (foundations for an electronic
software market).
- Support for electronic payment (electronic commerce).
- Support for electronic procurement of NII software products.
- Creation of licensing-agent surrogates.
- Scalability and portability of licenses.
- Standard payment and exchange interfaces.
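The pro-rated, per-component payment idea can be made concrete: meter each
component's invocations and compute what is owed to each rights holder. The
Python sketch below is a toy; the metering hook and fee schedule are assumed.

    # Toy pro-rated payment: the application meters per-component use and
    # splits fees among rights holders. Fee values are invented.
    from collections import Counter

    fee_per_use = {"Compressor": 0.01, "Visualizer": 0.05}  # assumed rates
    usage = Counter()

    def invoke(component):
        usage[component] += 1  # metering hook at each invocation

    for _ in range(100):
        invoke("Compressor")
    invoke("Visualizer")

    amounts_owed = {c: n * fee_per_use[c] for c, n in usage.items()}
    print(amounts_owed)  # e.g. {'Compressor': 1.0, 'Visualizer': 0.05}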
Rationale
The primary impact of resolving the license and payment issue will be the rapid
and sustained growth of the NII software industry and the development of a
vigorous market in network-based applications. New licensing and payment
concepts are needed that both scale and handle the complexity of emerging
applications. Once appropriate mechanisms are developed, federal support may be
needed to ensure commonality in implementation.
3.5 Release and maintenance
Goals
The NII application infrastructure needs to support a diversity of software
distribution mechanisms. These mechanisms involve the release, update and
maintenance of runnable software. The NII-specific issues revolve around the
expectation that the information superhighway will never be taken down for
maintenance, and that current users of software being updated will not tolerate
service interruptions for what they view as administrative activities. The
research issues are focused upon identifying the necessary common
infrastructure mechanisms and policies.
Issues
- Release propagation: What are the common software distribution mechanisms
necessary to support a diversity of smooth delivery channels?
- Software identification: What are minimal common mechanisms necessary for
establishing the identity of running software? A real-world analogy is that of
the need for highway vehicle registration. If software is responsible for doing
some damage on the highway, its owner must be identifiable. This issue may be
different from establishing the "driver" of a particular vehicle.
- Bug reporting: For "public health" reasons, it is important that common
ways exist for reporting problems with software vehicles cruising on, or
crashed upon, the information highway.
- Field debugging: Are there mechanisms that should be part of the
application infrastructure that allow field debugging? The research issue has
to do with whether such mechanisms can be made secure.
- Replacement of running software: What would be the constraints and
mechanisms necessary to support the replacement of running software while it
is in use? Given a negative result in this area, what are the technologies or
conventions necessary to support field updating of components without
interrupting service? (A sketch of one such mechanism appears at the end of
this list.)
- Update proliferation: Are there mechanisms, required or possible, to track
down existing versions of software that should be updated? There are privacy
issues here as well as other pragmatic engineering/business issues. There may
be "public health" policy issues here that may override other concerns.
- Obsolescence: What are the mechanisms required for identifying obsolete
programs that can be safely removed from a certain distribution point without
denying service to an existing persistent object?
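As mentioned under the replacement item above, one conventional technique is
to reach components through a level of indirection so an update can rebind a
name without interrupting callers. The Python sketch below illustrates the
idea; the registry interface is an assumption, not a proposed mechanism.

    # Illustrative hot swap: clients call components through a registry, so
    # a field update can replace an implementation without downtime.
    class Registry:
        def __init__(self):
            self._impl = {}

        def publish(self, name, implementation):
            self._impl[name] = implementation  # rebinding is a single step

        def call(self, name, *args):
            return self._impl[name](*args)     # clients never hold the impl

    registry = Registry()
    registry.publish("greet", lambda who: "Hello, " + who)
    print(registry.call("greet", "NII user"))  # old version in service

    registry.publish("greet", lambda who: "Good day, " + who)  # field update
    print(registry.call("greet", "NII user"))  # new version, no interruption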
3.6 Testbeds
Goals
The testbeds we envision in the context of the NII should serve to:
- Validate hardware technology.
- Develop and validate software technology.
- Develop and validate databases.
- Develop and validate innovative use of information, i.e., new
applications.
- Build user confidence.
Issues
The hardware technology for the NII includes not only computers of all sizes,
from personal computers to massively parallel computers, and various
communication technologies, but also various means of human interface
technologies and information storage and retrieval technologies.
Software, which is the embodiment of applications, must be developed in an
environment sufficiently close to the intended final one, both for speed of
development and for an accurate representation of the operating environment.
New applications are likely to require concurrent use of multiple databases as
well as extensions to existing databases and new access and update mechanisms.
Early development phases are likely to require systems separate from production
systems for security and stability reasons.
Though testbeds will be established for well-defined application and/or
hardware development projects with limited access, it is desirable that access
to testbeds, even in early stages, not be so restrictive as to stifle
innovative use of the NII.
The reasons testbeds are needed for the purposes listed above are essentially
the same as in any other activity, namely to:
- Lower the liability of participants and providers.
- Lower the demands for security and protection from unwanted use of data
and systems.
- Lower the demands on robustness and stability of the software and hardware
typical of prototyping.
We see a need for three types of testbeds:
- Small testbeds for a specific application with a limited number of parties
involved.
- Medium testbeds where integration of two or more applications is carried
out and the number of participants and users is substantially enlarged.
- Large testbeds employing the NII itself for production-level testing,
verification for a substantial number of users, as well as verification of use
in a rich application environment.
Medium and, in particular, large testbeds must be accessible to a large number
of users, many of whom may represent small businesses and independent
entrepreneurs. This requirement places stringent demands on safeguards against
inadvertent misuse of new applications with respect to resource use, data
integrity, system stability and interference with other applications on the
NII. Safeguards against both unintended misuse and malicious use must be highly
developed.
Testbeds for NII applications are likely to be multifaceted and distributed.
Testbeds are likely to require substantial resources for integration and
operation. It is likely that several aspects of providing, operating and using
testbeds will result in viable businesses, such as systems integration; systems
operation and maintenance; and training, assistance and consulting services.
Testbeds are expected to be established through partnerships between:
- One or more government agencies.
- Prime application developers/providers.
- Resource providers:
- Computation
- Data storage
- Communication
- Databases
- Data gathering
- Software
- Integration
Examples of government agencies that we envision would assume the prime
responsibilities for testbeds are: NIH, for a health care testbed; EPA, for an
environmental testbed; NSF, for a technology testbed; ARPA, for a global grid
testbed; the Treasury, for a financial services testbed; NASA, for a
manufacturing testbed; and DOE, for an energy management testbed. These
testbeds could be linked in a second stage and integrated into the NII itself
in a
third stage. Examples of prime application developers/providers are, for
instance, the Mayo Clinic in the health area and Citibank and Fidelity
Investments in the financial services area. Data gathering or collection may
represent a significant part of the resources, such as CAT scanners and MRI
equipment in the health sector.
[1] It is assumed that such objects and their descriptions,
including meta-information, would be stored in one or more network repositories
that can be accessed through a variety of search, directory services and
indexing schemes. The detailed design issues for such repositories are
addressed in another subsection. The information retrieval issues involved in
locating objects within the repository are being addressed in another section.
Likewise, issues of intellectual property compensation, metering and licensing
are considered elsewhere in this report.
[2] The problems of configuration management and robust
software release mechanisms are examined in another section.