C.5 Infrastructure for Applications Development
Principal Authors:
Ken Kennedy and Paul Leach
Additional Contributors:
W. Richards Adrion, Dona L. Crawford, Jack Dongarra, Steve Evans, Geoffrey
Fox, Dennis Gannon, Bill Janssen, S. Lennart Johnsson, Barry M. Leiner, Kevin
Lewis, Mark Linton, Mark Miller, Paulette M. Quist, Joel Saltz, William L.
Scherlis, Michael Schwartz, Alfred Z. Spector, David Steensgard, Rick L.
Stevens and Willy Zwaenepoel
1. Introduction
The goal of the infrastructure for applications development is to enable rapid
development of network applications at lower cost than is currently possible.
The overall approach is to provide functionality and solutions to problems that
are common to many applications either 1) in a form that can be used by
applications developers or 2) in the form of reusable objects that provide
services.
Applications developed for the National Information Infrastructure will differ
from conventional applications in a number of ways: They will typically be
distributed over heterogeneous computer systems and operate concurrently; they
must scale to handle very large numbers of users, both with respect to
processing power and storage; they must be continuously available and
upgradable without becoming unavailable; they must maintain persistent states
reliably and with integrity in the face of node and network failures; and they
must have some way of locating and accessing a broad spectrum of network
resources. These problems arise in nearly all distributed applications, so an
infrastructure that helps solve some or all of them would benefit nearly every
application developer.
To address these issues, we have developed a high-level view, or "application
architecture," and have illustrated the soundness of this view by looking at
some important application examples. In order to prioritize the possible
projects, we have also considered the question of what makes sense in terms of
government investment in the NII. Finally, we have proposed a number of
research and development projects as possibilities for government investment.
2. A Plan for Providing an Applications Development Infrastructure
2.1 Application Development Model
Before proposing an infrastructure for applications development, one must have
a model of the applications being developed and the process by which they are
developed. We take as our base model one of "distributed objects," primarily
for purposes of having a vocabulary for discussion. In this framework, the
application development process can be described at three levels:
- Provision of primitives for implementing objects, invoking them, storing
them, letting them communicate securely, replicating them, etc. We call
implementations of objects classes.
- Creation of low-level (reusable) classes, both at the system level and
application level, which can function as software components.
- Assembling software components into applications, with little or no
programming.
To assemble applications from components, a repository of such components must
exist, along with descriptions of the salient features of the components it
holds, so that appropriate components can be chosen. Such descriptions go
beyond current notions of a repository, which usually limit themselves to the
interfaces supported by components, and extend to matters such as: the
performance of the components; their reliability; the license terms and
conditions for their use; and third-party evaluations and certifications of
their correctness and capabilities.
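To make this concrete, the sketch below (in Python) shows one possible shape
for such a repository entry. Every field name is an illustrative assumption,
not a proposed standard.

    # Illustrative repository entry describing a component beyond its
    # interface alone. All field names are hypothetical.
    from dataclasses import dataclass, field

    @dataclass
    class ComponentDescriptor:
        name: str
        version: str
        interfaces: list                # interface names the component supports
        throughput_ops_per_sec: float   # measured performance
        mtbf_hours: float               # reliability estimate
        license_terms: str              # e.g., "per-use", "site", "royalty-free"
        certifications: list = field(default_factory=list)  # third-party attestations

    entry = ComponentDescriptor(
        name="ImageCompressor",
        version="2.1",
        interfaces=["Compressor"],
        throughput_ops_per_sec=250.0,
        mtbf_hours=8760.0,
        license_terms="per-use",
        certifications=["lossless mode verified by an independent lab"],
    )

A construction tool could then filter and rank entries on these fields rather
than on interface signatures alone.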
Once applications have been created, they need to be released into general use,
later maintained and updated, and appropriately licensed and paid for.
2.2 Example Objects
This working group proposes a software model where NII capabilities are built
up from one or more coarse-grain objects or services. The methodology can be
applied to:
- The careful construction of critical NII applications (e.g. health care,
financial services).
- Rapid prototyping and distributed R&D.
The next two sections elaborate on these two types of applications.
2.2.1 Objects for Financial Services and Health Care
Here, we examine two representative NII applications to illustrate and verify
this approach. These applications are quite rich and illustrate several
possible services. However, our choice does not imply that these are the two
most important applications, and they do not by any means cover all possible
services. We chose these two applications--financial services and health
care--from the workshop briefings, and we recommend that detailed studies be
conducted to refine (through deep examination of each application) and broaden
(across many more applications) our illustrative discussion. In what follows,
we show how these applications can be built up from component objects; a small
illustrative sketch of such a composition appears after the list.
User Interface (client)
- Financial services: Easy-to-use customer interface for integrated
banking and financial services.
- Health care: Portable medical records with user interfaces that
allow easy interaction with individuals having varying degrees of medical
sophistication.
User Interface (service objects)
- Financial services: Interface for bankers to databases.
- Health care: Standardized interface to portable record for text;
image data from various modalities including MRI, CT, PET, ultrasound; EKG, EEG
readings; lab tests, etc.
Security Monitor (network police)
- Financial services: Security checks in finance areas.
- Health care: Security checks in health care areas.
Outcome Modeling (security model)
- Financial services: Model and forecast vulnerability of system to
accidental and deliberate abuse; model and forecast behavior of system under
various load scenarios.
- Health care: Critique a patient case management strategy through
the analysis of collections of matched case histories. Obtain customized
assessment of treatment benefits, risks.
Real-time Synchronization Object, Distributed Database Object, Authorization
Object
- Financial services: Real-time processing of financial records with
authorization.
- Health care: Real-time processing of medical records (emergency
medical care) with authorization; real-time processing of medical sensor data
(e.g. monitoring patient status during anesthesia and in ICU).
Encryption/Decryption Object, Authentication Object, Tamper and Copy Proofing
and Testing Object
- Financial services: Make secure and examine documents,
certificates, receipts, signatures.
- Health care: Access control for medical records. Patients, medical
professionals, researchers and hospital administrators each may be allowed
access to an appropriate subset of a patient's medical records. Authentication
and authorization objects will document rights to prescribe medications and to
deliver various categories of medical treatment.
Digital Signature Object, Biometrics Service
- Financial services: Identify customers and bankers.
- Health care: Identify patients and medical providers.
Real-Time Simple Anomaly Detection Object
- Financial services: Compare single and multiple transactions with
templates to detect possible fraud.
- Health care: Examine single and multiple transactions to flag
careless errors (incorrect medication dose, allergy to prescribed medication
and information from medical records that would reveal reasons not to pursue a
particular treatment plan). Transactions can also be scanned to detect waste
(multiple identical lab tests during a short period of time during which there
is no clinical reason to expect results to have changed, diagnostic tests for
which there is no medical indication, etc.).
Data and Knowledge Mining
- Financial services: Examine multiple transactions to detect
possible fraud (complex anomaly detection) and to identify patterns of market
and consumer behavior (market segmentation).
- Health care: Extensive statistical analysis of medical records to
carry out epidemiological research and to evaluate costs and benefits of
different treatment strategies. Analysis of medical databases to uncover
patterns of fraud.
Simulation Object, Visualization Object
- Financial services: Compute financial models and visualize.
- Health care: Compute and visualize biomedical simulations (e.g.,
simulation of molecular dynamics used in drug discovery, hemodynamic
simulations, etc.).
Image Compression Object with Size/Fidelity Trade-offs
- Financial services: Transmit and store images of financial paper
records.
- Health care: Digital image data from various modalities (MRI, CT,
PET, etc.) will be stored in a distributed manner. Because patients are
geographically mobile, images need to be accessible on very short notice at
arbitrary sites.
Contract Object
- Financial services: Implement legal computer contract.
- Health care: Provide informed consent for treatment.
Digital Cash Object
- Financial services: Implement electronic checks, credit and
bartering.
- Health care: Verify details of health insurance coverage.
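As a purely illustrative sketch of how such coarse-grain objects might be
composed, consider (in Python) a transaction path that chains an authorization
object with a simple template-based anomaly detector. Every class, method and
threshold here is a hypothetical stand-in for a real NII service object.

    # Hypothetical composition of two coarse-grain service objects into a
    # transaction-processing path. Names and limits are illustrative only.
    class AuthorizationObject:
        def __init__(self, permitted_users):
            self.permitted = set(permitted_users)

        def authorize(self, user):
            return user in self.permitted

    class AnomalyDetector:
        """Compares a transaction against a simple template (an amount limit)."""
        def __init__(self, max_amount):
            self.max_amount = max_amount

        def flag(self, transaction):
            return transaction["amount"] > self.max_amount

    def process(transaction, authorizer, detector):
        if not authorizer.authorize(transaction["user"]):
            return "rejected: not authorized"
        if detector.flag(transaction):
            return "held: flagged for review"
        return "accepted"

    authorizer = AuthorizationObject(permitted_users=["teller-17"])
    detector = AnomalyDetector(max_amount=10000)
    print(process({"user": "teller-17", "amount": 25000}, authorizer, detector))
    # -> held: flagged for review

The same pattern applies to the health care column: swap the amount template
for a medication-dose or allergy check.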
2.2.2 Rapid Prototyping and Collaborative Research and Development
Rapid software system prototyping and experimentation is an important class of
application development activities. Examples include a university professor
experimenting with new network software; a quickly assembled team attacking a
timely problem (such as the hantavirus outbreak that afflicted residents of
the Southwestern United States in the summer of 1993); and informal
experiments performed by independent software developers.
Rapid prototyping is a key method used by experimental researchers and software
developers in testing new concepts and tools. Rapid prototyping and informal
collaboration have led to breakthroughs in personal computer software
development, epidemiology and other diverse problems. By enabling developers to
test their software without undue risk to the NII, we could help stimulate many
of the important advances needed for the software components of the NII. These
advances will likely have important economic and research impacts.
While conventional application development strives for implementation
completeness (e.g., secure, scalable financial transaction processing), the
goal of rapid prototyping is to permit experimentation with incomplete software
systems. For example, in developing a new network protocol, it may be expedient
to focus on scaling problems while ignoring security issues. Yet, developers of
such software must safeguard themselves and the network at large from
accidental problems that might arise from the incomplete nature of their
applications. For example, errant software should not be able to generate
excessive network traffic or accidentally violate the security perimeter of the
network. The issue here is not enforcing general security measures (covered in
Section C.9), but rather giving assurances to experimenters and application
developers that their tests will not cause problems.
To support safe, rapid prototyping and NII-based testing, research is needed
into methods and tools that allow programmers to develop and test software
while restricting the scope of NII problems that can arise from their software.
Examples might include high-level object services or software libraries that
restrict the scope of network operations, test for various types of runtime
errors or otherwise protect prototype tests from such problems.
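One might imagine, for instance, a thin library wrapper that caps the traffic
a prototype can generate. The Python sketch below is an assumption-laden
illustration, not a real NII facility: it enforces a byte budget before
delegating to the underlying send operation.

    # Illustrative sketch of a library that restricts the scope of network
    # operations during prototyping; the budget and interface are assumptions.
    class BudgetedSocket:
        """Wraps a socket-like object and refuses to exceed a total byte
        budget, shielding the network at large from a runaway prototype."""
        def __init__(self, sock, byte_budget):
            self._sock = sock
            self._remaining = byte_budget

        def send(self, data):
            if len(data) > self._remaining:
                raise RuntimeError("prototype exceeded its network budget")
            self._remaining -= len(data)
            return self._sock.send(data)

    # Usage: wrap a real socket before handing it to experimental code.
    #   import socket
    #   raw = socket.create_connection(("example.org", 80))
    #   guarded = BudgetedSocket(raw, byte_budget=64 * 1024)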
3. Research and Development Recommendations
We found significant research issues in the following areas:
- Object system foundations.
- Application construction tools.
- Repositories/object brokers.
- License and payment mechanisms.
- Release and maintenance.
We also propose several different testbeds to accelerate NII applications
development.
3.1 Object System Foundations
Goals
To extend existing and proposed distributed object system foundations to
technologies that scale to the NII. To develop new abstractions to cope with
new problems and challenges introduced by the NII.
Issues
Many of the issues of implementing distributed objects are well understood and
actively under research and development in industry, both in individual
corporations and industry consortia such as the Object Management Group and the
Open Software Foundation. Of the less understood issues, many are covered in
other sections of this report, particularly security, communications and
interoperability. We see the following issues as uniquely relevant to
infrastructure for applications development for the NII:
- Primitives for replication and caching of objects on an NII scale need to
be created. Distributed naming and directory services would be very important
to users of these primitives. The technology underlying current and near-future
directory services--such as the Internet Domain Name Service, X.500 or OSF/DCE
Directory Services--will not scale to the NII, which might require 10,000
copies of a "zone" with fault-tolerant updates appropriate to the data. Today's
replication algorithms can probably be stretched to 10,000 copies, but only if
there are no consistency requirements between the data items (so-called "weak"
consistency); they will handle only tens of copies if there are consistency
requirements ("strong" consistency). Consistency levels between "weak" and
"strong," heretofore unknown, might be required in order to get both consistent
data and high replication factors. (A toy illustration of weak consistency
appears at the end of this section.) One way to scale processing power in step
with the number of clients is to cache information at the client machines and
run computations there whenever possible. This will require algorithms for
cache consistency across a much larger number of clients than is currently
possible. Here again, new notions of consistency may need to be developed.
- Primitives for managing persistent storage that are appropriate for the
replication and caching primitives above are likely to differ from currently
known ones such as atomic transactions, and will need to accompany them in
order to allow NII applications that maintain persistent state to scale.
- New models for distributed applications that go beyond the current ones of
message passing, RPC, replication and atomic transactions, especially if they
can address several of these issues simultaneously with a single model instead
of piecemeal, would be highly desirable. Current and near-future technology
requires the application developer to use many different techniques to build a
high-quality distributed application, making it a difficult task.
- Primitives for classes supporting large collaborative applications--e.g.,
an interactive-video town meeting with hundreds of people. Current interactive
collaborative applications only handle a few people, and we don't even
understand the issues that will arise when the NII offers the possibility of
much larger interactive collaborations. There are also issues in transparency:
It may be possible to create primitives so that application developers do not
need to do anything special to allow the applications to be used
collaboratively.
- Definition of interfaces that classes can support to allow them to be
assembled by application construction tools (which are discussed below).
- Server load-balancing brokers that can cope with thousands or more
alternative service providers, each with varying performance, cost, reliability
and trustworthiness factors. No existing or near-term technology can cope with
this scale or heterogeneity.
- Support for distributed debugging of real-time multimedia objects. While
remote debuggers currently exist and distributed debuggers are in development,
much research remains to be done to make debugging of real-time objects
feasible.
- Compiler and runtime technology that automatically decides parallel and
distributed resource allocation. With current and near-term future technology,
this must be done by hand, and is very difficult to do.
- Tools for interactive visualization and tuning of performance of
distributed applications. With current and near-term future technology, this is
very difficult even for centralized applications, and almost nothing exists
today for distributed applications.
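To illustrate the flavor of "weak" consistency referred to above, the
following Python sketch propagates updates among replicas lazily and reconciles
them by timestamp (last-writer-wins). It is a toy under stated assumptions, not
a proposal for an NII-scale algorithm.

    # Toy last-writer-wins replication, illustrating "weak" consistency:
    # replicas may disagree temporarily but converge when they exchange state.
    import time

    class Replica:
        def __init__(self):
            self.store = {}  # key -> (timestamp, value)

        def write(self, key, value):
            self.store[key] = (time.time(), value)

        def read(self, key):
            timestamp, value = self.store[key]
            return value

        def merge_from(self, other):
            # Keep whichever copy of each item was written most recently.
            for key, (timestamp, value) in other.store.items():
                if key not in self.store or timestamp > self.store[key][0]:
                    self.store[key] = (timestamp, value)

    a, b = Replica(), Replica()
    a.write("quote", "57 1/4")
    b.merge_from(a)  # b converges to a's latest value
    assert b.read("quote") == "57 1/4"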
3.2 Application Construction Tools
Goals
The primary goal of this category of research is to catalyze the development of
tool suites that facilitate the rapid construction of NII application software,
by composition of existing objects. The term "objects" in this context refers
to relatively heavy-weight or complex objects, also known as "components,"
designed for generality and re-use.[1] The goal of NII
application construction tools should be contrasted with the emphasis of
conventional development tools, which focus on the implementation of efficient
low-level ("primitive") objects. There is a closer match to current-generation
rapid prototyping tools, except that these typically do not address large-scale
distributed contexts such as the NII. Secondary goals should be to encourage
smooth integration of high-level construction tools with existing development
environments and to expand the accessibility of tools beyond the traditional
programmer community.
Issues
The creation of the sort of tools envisioned here raises complex technical and
policy issues. There needs to be a common object model, in the form of standard
interfaces or protocols for obtaining object services, or at least an
architecture within which diverse object models can co-exist. Interoperability
of the model with schemes currently under development in the community (OSI,
CORBA, OLE 2.0, OpenDoc) must be considered.
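Whatever form the common model takes, its essential requirement is a uniform
way to request a service without knowing its implementation. The Python
fragment below sketches that idea; the class names and single-entry-point
convention are assumptions for illustration, not any existing standard.

    # Hypothetical minimal common object model: every service object exposes
    # a uniform invocation interface, so construction tools can wire
    # components together without knowing their internals.
    from abc import ABC, abstractmethod

    class ServiceObject(ABC):
        @abstractmethod
        def invoke(self, operation, **arguments):
            """Perform a named operation; the uniform entry point that makes
            components interchangeable to an assembly tool."""

    class EchoService(ServiceObject):
        def invoke(self, operation, **arguments):
            if operation == "echo":
                return arguments.get("text", "")
            raise ValueError("unsupported operation: " + operation)

    print(EchoService().invoke("echo", text="hello"))  # -> hello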
There may be conflicting constraints. For example, the constraints of
encapsulation (for improved modularity) and proprietary interfaces (for
fostering healthy competition) tend to conflict with the constraint of
encouraging commonality through open interfaces. Publishing "internal
interfaces," in a spirit of openness, can also thwart release/configuration
mechanisms.[2]
The opportunities to provide unique services on the NII lead to complexities
for application developers that have not been addressed in current-generation
tools. Applications may be highly distributed and invoke processes residing on
heterogeneous devices. As a concrete example, a component residing on a
hand-held "personal digital assistant" (PDA) might wirelessly transmit an
information retrieval agent to a desktop machine, which in turn is connected to
the NII backbone. The required data might be represented procedurally, in the
form of a component to be run on a supercomputer at a university or government
research center. The application writer must be able to specify these aspects
of the distributed processing, and yet be shielded from irrelevant aspects,
such as routine load balancing between available supercomputers adequate to
execute that component. Similarly, network-based collaborative applications and
remote execution capabilities will require novel support in the tool
environment.
The intended user community for such tools is an important consideration.
Whereas current-generation tools are primarily suitable for use by professional
programmers, the NII represents an opportunity to capture the expertise of
professionals in fields as diverse as art, music, education and medicine, who
should not be distracted from their disciplines by idiosyncratic, low-level
programming considerations. Visual programming paradigms such as models of data
flow wiring seem attractive, but will they offer sufficient power to fully reap
the benefits of the NII, or will textual representations, such as scripting
languages, still be required? Can a single tool environment support a
multidisciplinary project team, ranging from graphic artists to video producers
to cognitive scientists to instructional designers to C++ software engineers?
To what extent will domain-specific tools be required to facilitate "direct
manipulation authoring" (as opposed to "programming") by subject matter
experts?
Tool support for portable delivery of applications is another important issue.
Unless one resorts to requiring a standard "look and feel" across multiple
vendors--an area where healthy competition may still result in significant
innovation--there arises a need for abstraction in user interface (UI) design.
An ideal tool would offer abstract APIs for UI "widgets," be capable of
emulating the look and feel of multiple delivery platforms during development,
and then support "extrusion" of platform-specific code that preserves the
essential semantics of the original UI. Some capability for "fine-tuning" on
specific platforms may be needed, for example, when special capabilities such
as movies are deployed on less capable devices.
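A minimal sketch of the abstraction in question follows: an abstract button
API rendered by interchangeable platform backends, written in Python. The class
names and the two "platforms" are invented for illustration.

    # Illustrative abstract UI API: the application describes a widget once,
    # and platform-specific backends "extrude" native representations.
    from abc import ABC, abstractmethod

    class ButtonBackend(ABC):
        @abstractmethod
        def render(self, label):
            ...

    class DesktopBackend(ButtonBackend):  # hypothetical desktop platform
        def render(self, label):
            return "[Desktop button: " + label + "]"

    class PDABackend(ButtonBackend):      # hypothetical hand-held platform
        def render(self, label):
            return "<" + label + ">"      # simpler widget for a small screen

    def build_ui(backend):
        # The application-level description is platform-independent.
        return backend.render("Submit Payment")

    print(build_ui(DesktopBackend()))  # emulate one look and feel...
    print(build_ui(PDABackend()))      # ...then another, from the same source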
Finally, the increasing importance of multimedia in applications raises new
challenges. The current Internet, for example, is strongly text-biased,
although new developments such as MIME are changing this situation.
Current-generation multimedia tools are too specialized and difficult to use.
They are especially weak in real-time support, such as playback of movies over
networks. Advances in networking technologies such as ATM will help, but
meanwhile, application writers must have tool support for managing multimedia
assets in a wide-area network environment.
3.3 Repositories/Object Brokers
A repository is a database of information about software components that can be
reused within other software objects to facilitate software development. An
object broker is an agent that can provide value-added services that help
locate other objects.
Software brokers/repositories share many issues with generic digital
libraries. Issues such as access, uniform naming, and terms and conditions of
use are fundamentally the same across all domains of digitally
represented intellectual property. While many of the systems will be similar,
this discussion will attempt to focus on what is different between software
components and other digitally represented intellectual property.
Goal
Development of an infrastructure for storing, accessing and retrieving software
objects for incorporation into other works.
Issues
- Specification: There need to be techniques/tools for describing the
functional, temporal, performance, reliability and other characteristics of
programs. This information is needed for application construction tools to
select appropriate low-level components to assemble into higher-level
components. Current languages (such as OMG IDL or DCE IDL) for the definition
of network services capture little beyond the interfaces supported by a
component; all else is specified in natural language, which cannot today
support automatic selection of components based on their specifications. (A
sketch of such selection appears at the end of this list.)
- Terms/authorization: Software components will largely be used by
incorporation into other systems; thus, repository systems must address
licensing, use and per-use issues, as well as derivative works. Special
attention must be paid
to terms with respect to stability of the product and changes to the
specification or implementation of the product (see License and Payment
Mechanisms section).
- Access method: Fundamentally, access will be very similar to that of other
digital libraries. However, given the high probability of per-use fees, special
attention to metrics of access is necessary. Here, sanitization of
results--providing market information to the developer while maintaining client
privacy--is of special interest.
- Multiplicity of repositories: There will not be a single repository for
all uses. This brings up issues of commonality of interfaces to the
repositories, search across repositories, and commonality of terms and
conditions.
- Scale: Software repositories will likely contain huge numbers of
components. Each component will likely have several to hundreds of versions as
well as host and OS variations. Repositories may exceed 10^12 entries over time
and could credibly have up to 10^10 transactions/day or more (two to 10 times
the daily load of airline systems).
- Evaluation and certification: Value-added services for evaluation and
certification of repository contents may be present in this environment.
(Certifications are evaluations for which some party takes some contractual
responsibility for their correctness.) Research issues include automatic and
semi-automatic evaluation of implementations against specifications. Practical
considerations include definition of common terms for levels and types of
evaluations and certifications. Certification may also be non-functional (e.g.,
certify that this code does not infringe on a particular copyright).
- Brokerage functions: The complexity of searching for software components
and of evaluating specifications and certification results makes it impractical
for individual developers to search on their own. Much of the
search will be aided by generic information access capabilities (discussed in
the Information Access section of this report), but some will be specific to
software components. Brokerages for specific application domains will arise,
and the repository system must enable and encourage the implementation of
value-added brokers.
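The sketch below (in Python) illustrates the kind of automatic selection that
machine-readable specifications would enable, per the Specification item above.
The fields, figures and fee model are all invented for illustration.

    # Hypothetical machine-readable specifications enabling automatic
    # component selection; field names and values are assumptions.
    specs = [
        {"name": "SortFast", "interface": "Sorter",
         "throughput_ops_per_sec": 9000, "mtbf_hours": 2000, "fee_per_use": 0.02},
        {"name": "SortSure", "interface": "Sorter",
         "throughput_ops_per_sec": 4000, "mtbf_hours": 20000, "fee_per_use": 0.05},
    ]

    def select(specs, interface, min_throughput, min_mtbf):
        # Pick the cheapest component meeting both functional and
        # non-functional requirements -- the step that natural-language
        # specifications cannot support automatically.
        candidates = [s for s in specs
                      if s["interface"] == interface
                      and s["throughput_ops_per_sec"] >= min_throughput
                      and s["mtbf_hours"] >= min_mtbf]
        return min(candidates, key=lambda s: s["fee_per_use"], default=None)

    print(select(specs, "Sorter", min_throughput=3000, min_mtbf=10000))
    # -> SortSure: the only candidate meeting the reliability requirement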
3.4 License and Payment Mechanisms
Goal
To achieve widespread and rapid dissemination of NII applications by
establishing a common framework for license and payment.
Issues
Many NII applications will be built from a variety of software components that
may have differing authorship and license requirements. This will require
flexible and automatic mechanisms for licensing and payment. Some users may
wish to pay for software at the time of acquisition; others may wish to pay on
a per-use basis. Within a single NII application, some software components may
be used more often than others, and payment may need to be made on a pro-rated,
per-component basis. The research and development challenges for this area
include:
- Support for a plurality of payment methods.
- Integration of freeware with proprietary software.
- Automated negotiation and review of license mechanisms.
- User-driven implementation of license requirements.
- Support for market-based software pricing (foundations for an electronic
software market).
- Support for electronic payment (electronic commerce).
- Support for electronic procurement of NII software products.
- Creation of licensing-agent surrogates.
- Scalability and portability of licenses.
- Standard payment and exchange interfaces.
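The pro-rated, per-component payment idea can be made concrete: meter each
component's invocations and compute what is owed to each rights holder. The
Python sketch below is a toy; the metering hook and fee schedule are assumed.

    # Toy pro-rated payment: the application meters per-component use and
    # splits fees among rights holders. Fee values are invented.
    from collections import Counter

    fee_per_use = {"Compressor": 0.01, "Visualizer": 0.05}  # assumed rates
    usage = Counter()

    def invoke(component):
        usage[component] += 1  # metering hook at each invocation

    for _ in range(100):
        invoke("Compressor")
    invoke("Visualizer")

    amounts_owed = {c: n * fee_per_use[c] for c, n in usage.items()}
    print(amounts_owed)  # e.g. {'Compressor': 1.0, 'Visualizer': 0.05}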
Rationale
The primary impact of resolving the license and payment issue will be the rapid
and sustained growth of the NII software industry and the development of a
vigorous market in network-based applications. New licensing and payment
concepts are needed that both scale and handle the complexity of emerging
applications. Once appropriate mechanisms are developed, federal support may be
needed to ensure commonality in implementation.
3.5 Release and maintenance
Goals
The NII application infrastructure needs to support a diversity of software
distribution mechanisms. These mechanisms involve the release, update and
maintenance of runnable software. The NII-specific issues revolve around the
expectation that the information superhighway will never be taken down for
maintenance, and that current users of software being updated will not tolerate
service interruptions for what they view as administrative activities. The
research issues are focused upon identifying the necessary common
infrastructure mechanisms and policies.
Issues
- Release propagation: What are the common software distribution mechanisms
necessary to support a diversity of smooth delivery channels?
- Software identification: What are minimal common mechanisms necessary for
establishing the identity of running software? A real-world analogy is that of
the need for highway vehicle registration. If software is responsible for doing
some damage on the highway, its owner must be identifiable. This issue may be
different from establishing the "driver" of a particular vehicle.
- Bug reporting: For "public health" reasons, it is important that common
ways exist for reporting problems with software vehicles cruising on, or
crashed upon, the information highway.
- Field debugging: Are there mechanisms that should be part of the
application infrastructure that allow field debugging? The research issue has
to do with whether such mechanisms can be made secure.
- Replacement of running software: What would be the constraints and
mechanisms necessary to support the replacement of running software while it
is in use? Given a negative result in this area, what are the technologies or
conventions necessary to support field updating of components without
interrupting service? (A sketch of one such mechanism appears at the end of
this list.)
- Update proliferation: Are there mechanisms, required or possible, to track
down existing versions of software that should be updated? There are privacy
issues here as well as other pragmatic engineering/business issues. There may
be "public health" policy issues here that may override other concerns.
- Obsolescence: What are the mechanisms required for identifying obsolete
programs that can be safely removed from a certain distribution point without
denying service to an existing persistent object?
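As mentioned under the replacement item above, one conventional technique is
to reach components through a level of indirection so an update can rebind a
name without interrupting callers. The Python sketch below illustrates the
idea; the registry interface is an assumption, not a proposed mechanism.

    # Illustrative hot swap: clients call components through a registry, so
    # a field update can replace an implementation without downtime.
    class Registry:
        def __init__(self):
            self._impl = {}

        def publish(self, name, implementation):
            self._impl[name] = implementation  # rebinding is a single step

        def call(self, name, *args):
            return self._impl[name](*args)     # clients never hold the impl

    registry = Registry()
    registry.publish("greet", lambda who: "Hello, " + who)
    print(registry.call("greet", "NII user"))  # old version in service

    registry.publish("greet", lambda who: "Good day, " + who)  # field update
    print(registry.call("greet", "NII user"))  # new version, no interruption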
3.6 Testbeds
Goals
The testbeds we envision in the context of the NII should serve to:
- Validate hardware technology.
- Develop and validate software technology.
- Develop and validate databases.
- Develop and validate innovative use of information, i.e., new
applications.
- Build user confidence.
Issues
The hardware technology for the NII includes not only computers of all sizes,
from personal computers to massively parallel computers, and various
communication technologies, but also various means of human interface
technologies and information storage and retrieval technologies.
Software, which is the embodiment of applications, must be developed in an
environment sufficiently close to the intended final one, both for speed of
development and for an accurate representation of the operating environment.
New applications are likely to require concurrent use of multiple databases as
well as extensions to existing databases and new access and update mechanisms.
Early development phases are likely to require systems separate from production
systems for security and stability reasons.
Though testbeds will be established for well-defined application and/or
hardware development projects with limited access, it is desirable that access
to testbeds, even in early stages, not be so restrictive as to stifle
innovative use of the NII.
The reasons testbeds are needed for the purposes listed above are essentially
the same as in any other activity, namely to:
- Lower the liability of participants and providers.
- Lower the demands for security and protection from unwanted use of data
and systems.
- Lower the demands on robustness and stability of the software and hardware
typical of prototyping.
We see a need for three types of testbeds:
- Small testbeds for a specific application with a limited number of parties
involved.
- Medium testbeds where integration of two or more applications is carried
out and the number of participants and users is substantially enlarged.
- Large testbeds employing the NII itself for production-level testing,
verification for a substantial number of users, as well as verification of use
in a rich application environment.
Medium and, in particular, large testbeds must be accessible to a large number
of users, many of whom may represent small businesses and independent
entrepreneurs. This requirement places stringent demands on safeguards against
inadvertent misuse of new applications with respect to resource use, data
integrity, system stability and interference with other applications on the
NII. Safeguards against both unintended misuse and malicious use must be highly
developed.
Testbeds for NII applications are likely to be multifaceted and distributed.
Testbeds are likely to require substantial resources for integration and
operation. It is likely that several aspects of providing, operating and using
testbeds will result in viable businesses, such as systems integration; systems
operation and maintenance; and training, assistance and consulting services.
Testbeds are expected to be established through partnerships between:
- One or more government agencies.
- Prime application developers/providers.
- Resource providers:
- Computation
- Data storage
- Communication
- Databases
- Data gathering
- Software
- Integration
Examples of government agencies that we envision would assume the prime
responsibilities for testbeds are: NIH, for a health care testbed; EPA, for an
environmental testbed; NSF, for a technology testbed; ARPA, for a global grid
testbed; the Treasury, for a financial services testbed; NASA, for a
manufacturing testbed; and DOE, for an energy management testbed. These
testbeds could be linked in a second stage and integrated into the NII itself
in a
third stage. Examples of prime application developers/providers are, for
instance, the Mayo Clinic in the health area and Citibank and Fidelity
Investments in the financial services area. Data gathering or collection may
represent a significant part of the resources, such as CAT scanners and MRI
equipment in the health sector.
[1] It is assumed that such objects and their descriptions,
including meta-information, would be stored in one or more network repositories
that can be accessed through a variety of search, directory services and
indexing schemes. The detailed design issues for such repositories are
addressed in another subsection. The information retrieval issues involved in
locating objects within the repository are being addressed in another section.
Likewise, issues of intellectual property compensation, metering and licensing
are considered elsewhere in this report.
[2] The problems of configuration management and robust
software release mechanisms are examined in another section.