ERPANET Case Study: Project Gutenberg

of knowledge, art, music and culture.

Regulatory Environment
Project Gutenberg must adhere to U.S. laws involving operation as a not-for-profit corporation. However, these regulations are not sector specific. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make available over the Internet. However, once a publication has been verified as being in the public domain, there are no other legal restrictions affecting Project Gutenberg.
Preservation Activity
Policies and Strategies
Project Gutenberg scans literary works and employs OCR technology to create eBooks. In some cases, eBooks are typed in by hand. The eBooks are then edited by a team of volunteer proof-readers. There are procedures and guidelines available online for volunteers to consult when scanning and editing texts for Project Gutenberg to ensure that all eBooks follow a standard format. Once the eBook has been produced, it is uploaded to two main servers. The eBook is made accessible via the official Project Gutenberg website and the Internet Archive site and on over thirty mirror sites around the world. As there are no access or distribution issues, Project Gutenberg encourages users to save copies of the eBooks to CD or DVD.
Project Gutenberg believes that generating a multitude of copies (those stored on the main servers, those held on local servers through mirror sites, and those downloaded to CD and DVD) will ensure that the bit stream of each literary work is preserved for access. This embodies the philosophy of the LOCKSS strategy. LOCKSS 'uses the caching technology of the web to collect pages of journals as they are published, allowing libraries to take physical custody of selected electronic titles they purchase' (11). LOCKSS was inspired by the words of Thomas Jefferson, who said "let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." (12)
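The value of holding many copies can be illustrated with a minimal sketch. It is not the LOCKSS protocol and not a procedure described by Project Gutenberg; it only assumes that several holders keep the same file and that a copy which disagrees with the majority is the one most likely to be damaged. The function names and the sample data are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one copy's bit stream."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest differs from the majority digest."""
    digests = {name: digest(data) for name, data in copies.items()}
    # The most common digest is treated as the reference (an assumption).
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical copies of one eBook held in three places.
copies = {
    "main-server": b"Alice was beginning to get very tired",
    "mirror-site": b"Alice was beginning to get very tired",
    "local-cd": b"Alice was beginning to get very tirxd",  # one corrupted byte
}
damaged = find_damaged_copies(copies)
```

With three or more independent copies, a single damaged copy can be identified and replaced from any of the others, which is the practical benefit of the "multiplication of copies" described above.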
Selection
Project Gutenberg aims to make digitised versions of popular literature and reference materials in the public domain freely accessible to the general public. Once copyright expires, publications can be freely replicated and distributed. Many of these works are out of print. By digitising out-of-print works, Project Gutenberg feels that it is saving the publications from 'obscurity and ultimate oblivion' (13). The texts can be classified into three categories: light literature (such as Alice in Wonderland), heavy literature (such as Shakespeare and Dante) and references (such as Roget's Thesaurus). Mathematical and scientific works, including the Human Genome, are also made available. There are no real restrictions on what Project Gutenberg will make accessible. As long as the material is in the public domain, it can be digitised and submitted to Project Gutenberg. However, Project Gutenberg aims to benefit the widest possible audience and therefore prioritises the digitisation of popular literature and reference materials over extremely specialised works. Project Gutenberg already has texts in over 31 languages and is especially keen to increase its multilingual holdings.
Preservation
Project Gutenberg already has numerous plain text files that are 20-30 years old. In that time, many file formats have come and gone, while plain text is still readable on virtually all computers. The use of plain text will also help to insure against future obsolescence. All Project Gutenberg eBooks are created as plain ASCII text files. This means that people with 'Apples and Ataris all the way to the old homebrew Z80 computers' (14), as well as Mac and UNIX users, are all able to read the text files. Any open format can be submitted, but the Project Gutenberg team will also generate a plain ASCII (15) text file. Project Gutenberg encourages users to create new formats from the plain text files to suit their individual needs. Once the eBook has been generated and edited by volunteers, it is uploaded to two main servers. The first is the Project Gutenberg site itself and the other is the Internet Archive site. From this point, mirror sites can download the files and store them on their own servers, creating redundant copies.
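As an illustration of what "plain ASCII" means in practice, the following is a minimal sketch of a check that a volunteer might run on a text before submission. It is not a tool described in the case study; the function name and sample text are hypothetical, and the only assumption is that plain ASCII means every character has a code point of 127 or lower.

```python
def non_ascii_positions(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each character outside 7-bit ASCII."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # outside the 7-bit ASCII range
                found.append((line_no, col_no, ch))
    return found


# Hypothetical sample: the second line contains an accented character.
sample = "Plain text is readable.\nA caf\u00e9 is not plain ASCII."
problems = non_ascii_positions(sample)
```

An empty result means the text can be saved as a plain ASCII file without loss; any reported position shows a character that would need to be transliterated or kept in another format.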
Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10,001st plain text file created, it will be named 10001.txt. Project Gutenberg will accept as many open file formats as volunteers are willing to submit, but will also generate a plain text version. Additional versions in other formats carry the same file name but with different file extensions (e.g., html, pdf, xml). Each eBook has its own subdirectory that contains all versions of the eBook.
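The naming convention can be sketched as follows. The file name and extensions follow the description above, but the name of the subdirectory is an assumption made for illustration (here it is simply the eBook number); the case study does not state how subdirectories are named. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def ebook_paths(ebook_number: int, extensions=("txt",)) -> list[str]:
    """Build relative paths for every version of one eBook."""
    if ebook_number < 1:
        raise ValueError("eBook numbers start at 1")
    # Assumption: the per-eBook subdirectory is named after the eBook number.
    subdir = PurePosixPath(str(ebook_number))
    # Every version shares the eBook number as its file name.
    return [str(subdir / f"{ebook_number}.{ext}") for ext in extensions]


paths = ebook_paths(10001, ("txt", "html", "pdf", "xml"))
```

Because the eBook number is the only variable part of the name, any version of a text can be located from its number and the desired format alone.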
Project Gutenberg have volunteers representing a wide range of sectors (cultural heritage, government and higher education). Through these affiliations, they keep up to date with digital preservation developments. Project Gutenberg staff have
