The following text was scanned from a National Research Council study titled Research-Doctorate Programs in the United States: Continuity and Change (Appendix G, pp. 143-146).

Data Sources Utilized in Profiles of Participating Programs

The data sets developed for the Study of Research-Doctorate Programs in the United States provide a helpful source of information about the faculty and students who participate in research-doctorate programs and about the institutions sponsoring the programs. The data describing institutional characteristics were obtained from several national databases, as well as from independently generated data sets that supply descriptive information about the programs, such as lists of faculty members who participate in the programs. However, as mentioned in Appendix E, gaps may occur in the information owing to response patterns to each of the data sources described below. The following sections describe the origins and composition of the data sets.


A generally accepted measure for judging the productivity and quality of a research program is the publication record of the faculty. The measure includes both a count of papers published in reviewed journals and monographs printed by recognized publishers, and the impact of those publications on the research in the area as measured by a citation analysis. The Institute for Scientific Information (ISI) maintains a computer file consisting of bibliographic records of papers indexed in the ISI citation indexes: the Science Citation Index, Social Science Citation Index, and Arts and Humanities Citation Index. With this file ISI maintains a citation index which identifies for each publication the other publications on its file that have been cited in the article. By matching the names on ISI's file with the program faculty it is possible to calculate a publication measure and a citation measure. This measure is considered of little value in the Arts and Humanities fields, due to the ISI concentration of papers in journals and monographs, and is only used for the 30 fields outside the Arts and Humanities.

For the Study of Research-Doctorate Programs in the United States, ISI extracted a raw data file which contains over 4.5 million publication records for the period 1981 to 1992. Each record contains bibliographic information about the publication, the last name and initials of the author(s), the address(es) of the author(s), and identifying codes that link the publication to a citation file. For the Research-Doctorate Study a computer match was made between the ISI records and the publications of faculty participating in the 3,634 programs. The matching process was done in several stages and under successively stronger criteria. First the last name of each author on the ISI file was matched to the last name on the faculty list, independent of the faculty member's program and institution. Next the Zip Codes of the addresses on the ISI file and the institution of the faculty member were matched using a criterion for area Zip Codes that allowed authors to use an address near their home institution. Finally each journal was assigned an area identifier to be matched against the field of the program faculty. The result of this matching was the identification of approximately 1 million publications that could be credited to the program faculty in the study. The number of publications for each faculty member was then tabulated and the number of citations counted from the citation file. The data for the purpose of this study were aggregated to the program level, but are still available at the individual level for detailed analysis.
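The staged match described above can be illustrated with a minimal sketch. It is not the procedure ISI or the study staff actually ran: the record layouts (`Faculty`, `Publication`), the function names, and the rule that two Zip Codes belong to the same area when their first three digits agree are all assumptions made for illustration, since the text does not specify the file formats or the area criterion.

```python
from collections import namedtuple

# Hypothetical record layouts; the real ISI and faculty file formats are not given in the text.
Faculty = namedtuple("Faculty", "last_name zip_code field")
Publication = namedtuple("Publication", "author_last zip_code journal_field citations")


def zip_area(zip_code, digits=3):
    """Assumed area rule: Zip Codes sharing their first `digits` digits count as one area."""
    return zip_code[:digits]


def credit_publications(faculty, publications):
    """Credit each publication to the faculty members that survive all three stages."""
    by_last_name = {}
    for member in faculty:
        by_last_name.setdefault(member.last_name.lower(), []).append(member)

    credited = {member: [] for member in faculty}
    for pub in publications:
        # Stage 1: last name only, independent of program and institution.
        candidates = by_last_name.get(pub.author_last.lower(), [])
        # Stage 2: author address must fall in the same Zip Code area as the institution.
        candidates = [m for m in candidates if zip_area(m.zip_code) == zip_area(pub.zip_code)]
        # Stage 3: the journal's area identifier must agree with the faculty member's field.
        candidates = [m for m in candidates if m.field == pub.journal_field]
        for member in candidates:
            credited[member].append(pub)
    return credited


def tabulate(credited):
    """Return {faculty member: (publication count, citation count)}."""
    return {m: (len(pubs), sum(p.citations for p in pubs)) for m, pubs in credited.items()}


# Made-up example records, for illustration only.
faculty = [
    Faculty("Smith", "94305", "Physics"),
    Faculty("Smith", "02139", "Physics"),
]
publications = [
    Publication("Smith", "94301", "Physics", 12),   # nearby address, same field: credited
    Publication("Smith", "94305", "Chemistry", 3),  # field mismatch: dropped at stage 3
    Publication("Smith", "10027", "Physics", 5),    # distant address: dropped at stage 2
]

summary = tabulate(credit_publications(faculty, publications))
for member, (n_pubs, n_cites) in summary.items():
    print(member.last_name, member.zip_code, n_pubs, n_cites)
```

Applying the filters in this order means the cheap last-name lookup narrows the candidates before the address and field tests run. A publication that survives all three stages for more than one faculty member is credited to each of them in this sketch; the text does not say how the study resolved such ties.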


One measure of the research activity for a program is the amount of federal research support that can be attributed to its program faculty. By matching the names of principal investigators on federal grants with the faculty for each program, it is possible to calculate measures such as the amount of grant support and the proportion of the faculty receiving support. While this analysis of research support does not take into consideration funds from private foundations and industry, it does provide information that can be used to compare one program with another.

The primary federal agencies that provide research support for faculty at U.S. universities are the National Science Foundation, the National Institutes of Health, the Department of Defense, the National Endowment for the Humanities, the Department of Energy, the Department of Agriculture, and the National Aeronautics and Space Administration. Analyzing funding for the agencies in the population was complicated by the inclusion of support for instrumentation and facilities as part of a research award. This type of funding, if not specified by a particular category, could not be identified without an examination of each award. Thus the study used the overall funding figure, assuming that development funds of this type have an overall positive benefit to the research program and provide some indication of the research level of the program.

Data files, mainly in computer form, were obtained from each of the above agencies. The amount of grant support for a program was obtained by matching the names of the principal investigator and co-principal investigator, when available, against the program faculty file. The match was made on a last name-first initial basis by computer, using institutional names and a research field to identify faculty for the large NSF and NIH files. For the other files a preliminary match was made using last name and first initial only and a hand match was then done using the institution and area of research. The data available from these matches include the amount of each award, the duration of the award, and the agency supporting the research.
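As a rough illustration of the last name-first initial match and the two measures it supports (total grant support and the proportion of faculty receiving support), consider the sketch below. The "Last, First" name format, the `investigators` and `amount` keys, and the function names are assumptions; the agency file layouts are not described in the text, and the institution and research-field checks are omitted.

```python
def name_key(full_name):
    """Reduce a 'Last, First' name to a (last name, first initial) key."""
    last, _, first = full_name.partition(",")
    return last.strip().lower(), first.strip()[:1].lower()


def program_support(faculty_names, awards):
    """Return (total award amount, share of faculty supported) for one program."""
    faculty_keys = {name_key(name) for name in faculty_names}
    supported_keys = set()
    total = 0
    for award in awards:
        investigator_keys = {name_key(name) for name in award["investigators"]}
        matched = investigator_keys & faculty_keys
        if matched:
            # Count each award once, even when several program faculty are named on it.
            total += award["amount"]
            supported_keys |= matched
    n_supported = sum(1 for name in faculty_names if name_key(name) in supported_keys)
    share = n_supported / len(faculty_names) if faculty_names else 0.0
    return total, share


# Made-up example data, for illustration only.
faculty_names = ["Curie, Marie", "Bohr, Niels", "Noether, Emmy", "Turing, Alan"]
awards = [
    {"investigators": ["Curie, M."], "amount": 120000},
    {"investigators": ["Bohr, N.", "Curie, M."], "amount": 80000},
    {"investigators": ["Planck, M."], "amount": 50000},  # not on the faculty list
]

total, share = program_support(faculty_names, awards)
print(total, share)
```

A key built from last name and first initial alone cannot tell apart two people such as "Smith, John" and "Smith, Jane", which is presumably why the study added institution and research field to the computer match and used a hand match for the smaller files.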


The Survey of Earned Doctorates (SED) has collected basic statistics from the universe of doctorate recipients in the United States since the 1920s. Since 1958, SED has been conducted by the National Academy of Sciences/National Research Council and is currently supported by five federal agencies: the National Science Foundation, the U.S. Department of Education/National Center for Education Statistics, the National Endowment for the Humanities, the Department of Agriculture, and the National Institutes of Health.

Administered annually, SED produces national-level data. The survey form contains 25 questions that obtain information on sex, race/ethnicity, marital status, citizenship, disabilities, dependents, specialty field of doctorate, educational institutions attended, time spent in completion of doctorate, financial support, educational debt, postgraduation plans, and educational attainment of parents of each person who receives a doctorate. These data were compiled and placed in a file called the Doctorate Records File (DRF).

The survey universe is a complete census of all regionally accredited universities in the United States and its territories that confer research doctorates. In 1992, there were 366 such institutions. Approximately 95 percent or more of the annual cohorts of doctoral recipients respond to the survey. Response rates are further delineated by science and/or engineering (S&E) field. In 1991, these varied from 91 percent to 98 percent across science and engineering fields.

To use the SED data for the Research-Doctorate Study, a crosswalk was developed between the 41 fields in the study and the SED Specialties List, the doctorate fields into which graduates classify themselves. Using this crosswalk the number of graduates and the characteristics of the graduates for each of the 3,634 programs in the study were determined. One problem with this procedure is the possibility of incorrect identification of degree field by the graduate. However, this survey is considered to be the most accurate of its type, and the level of this error is small and needs to be taken into consideration only when the number of responses in a particular cell is small. While some characteristics of the graduates, such as the number of portable fellowships held by program graduates, were considered to be particularly interesting for the study, the data were considered to be unreliable and were not used. In general the data file in the DRF is extensive and only a fraction of these data were used in the research-doctorate analysis.
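A crosswalk of this kind is in effect a lookup table from self-reported specialty to study field. The sketch below shows the idea under stated assumptions: the specialty names, field names, and institutions are made up, and the actual SED Specialties List and the study's 41-field taxonomy are not reproduced here.

```python
from collections import Counter

# Hypothetical crosswalk entries; the real SED Specialties List is far longer.
CROSSWALK = {
    "Algebra": "Mathematics",
    "Topology": "Mathematics",
    "Particle Physics": "Physics",
}


def graduates_per_program(records, crosswalk):
    """Count graduates per (institution, study field); also count unmapped records."""
    counts = Counter()
    unmapped = 0
    for institution, specialty in records:
        field = crosswalk.get(specialty)
        if field is None:
            # Specialty falls outside the study's fields and is left out of the tabulation.
            unmapped += 1
            continue
        counts[(institution, field)] += 1
    return counts, unmapped


# Made-up example records: (doctorate institution, self-reported specialty).
records = [
    ("University A", "Algebra"),
    ("University A", "Topology"),
    ("University A", "Particle Physics"),
    ("University B", "Algebra"),
    ("University B", "Folklore"),
]

counts, unmapped = graduates_per_program(records, CROSSWALK)
for (institution, field), n in sorted(counts.items()):
    print(institution, field, n)
print("unmapped:", unmapped)
```

Because the specialty is self-reported, a misclassified graduate lands in the wrong cell; as the text notes, this matters mainly where a cell holds only a few graduates.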


The accomplishments of an individual faculty member are often recognized by independent groups, such as foundations and governmental agencies which award competitive fellowships or peer-reviewed research grants, and the national academies and other honorific organizations which confer membership on the basis of academic distinction. One measure of the quality of a program's faculty can be derived by identifying the number of faculty members who have received such awards. In future studies of this kind, it should be possible to assemble lists of the major fellowships, residencies, and academic honors pertinent to each of the five broad subject areas, and to derive one indicator of faculty quality from such lists. Owing to constraints of both time and resources, the authors of the present study have not been able to make comprehensive use of such data. However, since the Arts and Humanities Citation Index has been judged inadequate to provide a fair measure of scholarly productivity in these fields, in this study data on Honors and Awards have been preferred instead, and so have been substituted for scholarly publication as one measure of quality in the Humanities.

Awards, Fellowships and Honors Organization List:

  • Nobel Prize
  • John Simon Guggenheim Fellowship
  • Fulbright Awards
  • American Council of Learned Societies Fellowships
  • Huntington Library Research Fellowships
  • American School of Classical Studies in Athens Fellowships
  • Residency at the Center for Advanced Study in the Behavioral Sciences
  • Residency at the National Humanities Center
  • Residency at the Getty Center for Humanities and Arts
  • American Academy of Arts and Sciences
  • MacArthur Awards
  • Alexander von Humboldt Fellowship
  • National Endowment for the Humanities Fellowships
  • American Antiquarian Society Fellowships
  • Newberry Library Fellowships
  • Folger Library Post Doctoral Fellowships
  • Residency at the Institute for Advanced Study
  • Residency at the Center for Advanced Study in the Visual Arts
  • Residency at the Woodrow Wilson Center for Scholars
  • American Philosophical Society
  • American Academy in Rome


Three data sources were used to generate institutional statistics on research activity: library data from the professional library associations, R&D expenditures from the National Science Foundation, and fall undergraduate and graduate enrollment data from the U.S. Department of Education.

Association of Research Libraries and Association of College and Research Libraries

The Association of Research Libraries (ARL) collects data on an annual basis from its 113 member libraries. Ninety-five of these are institutions in the U.S. with graduate programs. Data are collected on volumes held (excluding microfilm, uncatalogued government material, and audio-visual materials), current serials held, volumes added and withdrawn, and microfilm and government documents. ARL also collects data on the size of the library staff, the expenditure for staff and materials, the number of interlibrary loans, and the size of the faculty and the student body. These data are made available through an annual report, ARL Statistics. In addition to the data collected by ARL on its member institutions, a survey is conducted by the Association of College and Research Libraries (ACRL) on an additional 110 U.S. libraries. The Research-Doctorate Study used the data from ARL and ACRL, supplemented by Department of Education data for institutions not surveyed by the library organizations, to measure the size of institutional libraries through the number of volumes and serials in each library, and its level of activity and support services through the amount of annual expenditures.

National Science Foundation

NSF conducts an annual survey of research and development expenditures at a sample of 459 institutions of higher education in the United States. In 1992 the sample was generated from the universe of 595 schools that grant a graduate science or engineering degree and/or had at least $50,000 in separately budgeted R&D expenditures. This is a very carefully conducted survey with attention given to the recordkeeping process at the institution to ensure consistency from one year to the next. Several items on R&D expenditures are contained in the survey, including allocation of amounts to major subdisciplines and purchases of research equipment. The two main data elements used for the Research-Doctorate Study are the total R&D expenditure and the Federal R&D expenditure. Other data elements are included in the institutional file and can be used by researchers to compare institutional and program characteristics.

U.S. Department of Education

The Integrated Postsecondary Education Data System (IPEDS) consists of several integrated components that obtain information on types of postsecondary institutions, student participants, programs offered and completed, and the human and financial resources involved in the delivery of postsecondary education. The IPEDS Fall Enrollment Survey replaces and extends the previous Higher Education General Information Survey, "Fall Enrollment and Compliance Report of Institutions of Higher Education." The IPEDS Fall Enrollment Survey has two versions, designated EF1 and EF2, which are administered to a census of accredited institutions offering degrees at the bachelor's level and above (EF1 survey) and all two-year institutions (EF2 survey). These surveys are conducted annually and are completed by institutional administrators who provide total enrollments and broad field enrollments. For the Research-Doctorate Study the institutional enrollment data were used to compare undergraduate and graduate enrollment as a measure of graduate activity.

Copyright © 2004 Computing Research Association. All Rights Reserved.