LOC Workshop on Etexts | Page 5

of the transcribed text--some 135,000 documents, available
for research during the decades while the perfect or print version is
completed. Members of the American Memory team and the staff of
NAL's Text Digitization Program (see below) also outlined a middle
ground concerning searchable texts. In the case of American Memory,
contractors produce texts with about 99-percent accuracy that serve as
"browse" or "reference" versions of written or printed originals. End
users who need faithful copies or perfect renditions must refer to
accompanying sets of digital facsimile images or consult copies of the
originals in a nearby library or archive. American Memory staff argued
that the high cost of producing 100-percent accurate copies would
prevent LC from offering access to large parts of its collections.
THE MACHINE-READABLE TEXT: METHODS OF CONVERSION
Although the Workshop did not include a systematic examination of
the methods for converting texts from paper (or from facsimile images)
into machine-readable form, various speakers touched
upon this matter. For example, WEIBEL reported that OCLC has
experimented with a merging of multiple optical character recognition
systems that will reduce errors from an unacceptable rate of 5
characters out of every 1,000 to an unacceptable rate of 2 characters out
of every 1,000.
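To illustrate the kind of merging WEIBEL described, the following is a minimal sketch in Python. It is not OCLC's actual method, which the Workshop did not detail. It assumes the outputs of several OCR systems have already been aligned to equal length, and it simply takes a majority vote at each character position. The function names vote_characters and errors_per_thousand are hypothetical.

```python
from collections import Counter


def vote_characters(outputs):
    """Merge pre-aligned, equal-length OCR outputs by per-position majority vote.

    Assumption: every string in `outputs` covers the same text and has the
    same length. Ties go to the character seen first, i.e. the earliest
    engine in the list.
    """
    if not outputs:
        raise ValueError("need at least one OCR output")
    length = len(outputs[0])
    if any(len(text) != length for text in outputs):
        raise ValueError("this sketch assumes pre-aligned, equal-length outputs")
    merged = []
    for chars in zip(*outputs):
        # Pick the most frequent character at this position.
        merged.append(Counter(chars).most_common(1)[0][0])
    return "".join(merged)


def errors_per_thousand(text, truth):
    """Character errors per 1,000 characters, measured against a verified text."""
    if len(text) != len(truth) or not truth:
        raise ValueError("texts must be non-empty and of equal length")
    wrong = sum(a != b for a, b in zip(text, truth))
    return 1000 * wrong / len(truth)


if __name__ == "__main__":
    engines = ["optical", "optica1", "0ptical"]  # made-up sample outputs
    print(vote_characters(engines))
```

Real OCR outputs differ in length when characters are dropped or inserted, so a practical system would need an alignment step before any voting could take place.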
Pamela ANDRE presented an overview of NAL's Text Digitization
Program and Judith ZIDAR discussed the technical details. ZIDAR
explained how NAL purchased hardware and software capable of
performing optical character recognition (OCR) and text conversion
and used its own staff to convert texts. The process, ZIDAR said,
required extensive editing, and project staff found themselves
considering alternatives, including rekeying and/or creating abstracts or
summaries of texts. NAL reckoned costs at $7 per page. By way of
contrast, Ricky ERWAY explained that American Memory had decided
from the start to contract out conversion to external service bureaus.
The criteria used to select these contractors were cost and quality of
results, as opposed to methods of conversion. ERWAY noted that
historical documents or books often do not lend themselves to OCR.
Bound materials represent a special problem. In her experience, quality
control--inspecting incoming materials, counting errors in
samples--was the most time-consuming aspect of contracting out
conversion. ERWAY reckoned American Memory's costs at $4 per
page, but cautioned that fewer cost-elements had been included than in
NAL's figure.
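The sample-based error counting ERWAY described can be sketched as follows. This is an illustration under stated assumptions, not American Memory's documented procedure. It supposes that each sampled passage has a verified reference transcription, counts errors as character-level edit distance, and accepts a delivery when the sampled accuracy meets a threshold (0.99 here, echoing the 99-percent figure cited earlier for American Memory's contractors). All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions
    needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sample_accuracy(samples):
    """Estimate accuracy from (delivered_text, verified_text) pairs."""
    total_chars = sum(len(verified) for _, verified in samples)
    if total_chars == 0:
        raise ValueError("samples contain no verified text")
    total_errors = sum(edit_distance(delivered, verified)
                       for delivered, verified in samples)
    return 1 - total_errors / total_chars


def accept_batch(samples, threshold=0.99):
    """Accept a delivery if sampled accuracy meets the assumed threshold."""
    return sample_accuracy(samples) >= threshold
```

Because only a sample is inspected, the result is an estimate; how many passages to sample, and how to treat borderline batches, are choices the Workshop summary does not specify.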
OPTIONS FOR DISSEMINATION
The topic of dissemination proper emerged at various points during the
Workshop. At the session devoted to national and international
computer networks, LYNCH, Howard BESSER, Ronald LARSEN, and
Edwin BROWNRIGG highlighted the virtues of Internet today and of
the network that will evolve from Internet. Listeners could discern in
these narratives a vision of an information democracy in which millions
of citizens freely find and use what they need. LYNCH noted that a
lack of standards inhibits disseminating multimedia on the network, a
topic also discussed by BESSER. LARSEN addressed the issues of
network scalability and modularity and commented upon the difficulty
of anticipating the effects of growth in orders of magnitude.
BROWNRIGG talked about the ability of packet radio to provide
certain links in a network without the need for wiring. However, the
presenters also called attention to the shortcomings and incongruities of
present-day computer networks. For example: 1) Network use is
growing dramatically, but much network traffic consists of personal
communication (E-mail). 2) Large bodies of information are available,
but a user's ability to search across their entirety is limited. 3) There are
significant resources for science and technology, but few network
sources provide content in the humanities. 4) Machine-readable texts
are commonplace, but the capability of the system to deal with images
(let alone other media formats) lags behind. A glimpse of a multimedia
future for networks, however, was provided by Maria LEBRON in her
overview of the Online Journal of Current Clinical Trials (OJCCT), and
the process of scholarly publishing on-line.
The contrasting form of the CD-ROM disk was never systematically
analyzed, but attendees could glean an impression from several of the
show-and-tell presentations. The Perseus and American Memory
examples demonstrated recently published disks, while the descriptions
of the IBYCUS version of the Papers of George Washington and
Chadwyck-Healey's Patrologia Latina Database (PLD) told of disks to
come. According to Eric CALALUCA, PLD's principal focus has been
on converting Jacques-Paul Migne's definitive collection of Latin texts
to machine-readable form. Although everyone could share the network
advocates' enthusiasm for an on-line future, the possibility of rolling up
one's sleeves for a session with a CD-ROM containing both textual
materials and a powerful retrieval engine made the disk seem an
appealing vessel indeed. The overall discussion suggested that the
transition from CD-ROM to on-line networked access may prove far
slower and more difficult than has been anticipated.
WHO ARE THE USERS AND WHAT DO THEY DO?
Although concerned with the technicalities of production, the
Workshop never lost sight