DESIRE Information Gateways Handbook
2.4. Cataloguing

In this chapter...
 
  • describing Internet resources: cataloguing and metadata approaches
  • metadata formats and content rules
  • types of information needed by an information gateway
  • developing cataloguing guidelines for a gateway
  • cataloguing interfaces and maintenance
Introduction
 

The role of cataloguing rules or guidelines is to specify how the content of a metadata format is entered. Once a metadata format has been chosen, a gateway should therefore consider how this metadata is to be entered into its database and prepare a set of cataloguing rules.

One of the key roles of Internet subject gateways is the creation of descriptive metadata about networked resources, which can be used as a basis for searching and browsing the gateway. These descriptions can also help gateway users to decide whether resources are really what they need, potentially saving them a considerable amount of time browsing through the limited amounts of information available elsewhere on the Internet (Sha, 1995, p. 467). The provision of these descriptions, the activity generally known as 'cataloguing', is therefore one of the most important (and time-consuming) tasks of any information gateway.


Background
 

Cataloguing can be defined as the creation of surrogate records which can be used to facilitate the identification, location, access and use of resources (Levy, 1995). These descriptions are usually created in accordance with certain standards (cataloguing rules and metadata formats) and will often include additional features such as classification, subject analysis and authority control (Dillon and Jul, 1996, p. 198; Bryant, 1980). These tools and standards were originally developed for the cataloguing and indexing of traditional - mostly printed - collections. However, many of them have been revised to take account of resources based on newer technologies. Recent developments include:

1. ISBD(ER). In 1997, the IFLA Universal Bibliographic Control and International MARC Programme (UBCIM) published ISBD(ER), a revision of ISBD(CF) ('Computer Files') that covers both online and offline 'Electronic Resources' (ISBD(ER), 1997; Sandberg-Fox and Byrum, 1998).

E X A M P L E

Web page description according to ISBD(ER)

Southampton Oceanography Centre [Electronic resource]. - Electronic interactive multimedia. -- [Southampton] : University of Southampton, Southampton Oceanography Centre, cop. 199?.
Mode of access: World Wide Web. URL: http://www.soc.soton.ac.uk/.
Title from title screen.
Summary: An introduction to the services provided by the Southampton Oceanography Centre - a joint venture between the University of Southampton and the Natural Environment Research Council. Includes information on internal departments and divisions, and the National Oceanographic Library.


2. USMARC 856 field - 'Electronic Location and Access'. This field enables the encoding of enough information to locate and retrieve networked resources, including a URL (Network Development and MARC Standards Office, 1997). Field 856 has also been implemented in other 'flavours' of MARC, such as UNIMARC (Holt, 1998).
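To make the structure of field 856 more concrete, the following Python sketch builds an 856 field in the display form commonly used in MARC documentation, with `$` standing for the subfield delimiter. It assumes only a subset of the first-indicator (access method) values: 0 email, 1 FTP, 2 remote login (Telnet), 4 HTTP, and 7 for a method named in subfield $2. The second indicator records the relationship (0 resource, 1 version, 2 related resource), $u holds the URL and $z a public note. The function name `field_856` is hypothetical, and the LC guidelines (Network Development and MARC Standards Office, 1997) should be checked before relying on any of these values.

```python
from typing import Optional
from urllib.parse import urlsplit

# First indicator of field 856: how the resource is accessed (a subset).
ACCESS_METHOD = {
    "mailto": "0",  # Email
    "ftp": "1",     # FTP
    "telnet": "2",  # Remote login (Telnet)
    "http": "4",    # HTTP
}


def field_856(url: str, relationship: str = "0", note: Optional[str] = None) -> str:
    """Return an 856 field as tag, indicators and subfields.

    relationship is the second indicator: '0' resource, '1' version of
    the resource, '2' related resource.
    """
    scheme = urlsplit(url).scheme.lower()
    # Schemes not listed above get indicator '7' and are named in subfield $2.
    ind1 = ACCESS_METHOD.get(scheme, "7")
    subfields = [f"$u {url}"]
    if ind1 == "7":
        subfields.append(f"$2 {scheme}")
    if note:
        subfields.append(f"$z {note}")  # public note
    return f"856 {ind1}{relationship} " + " ".join(subfields)
```

For the Southampton Oceanography Centre example above, `field_856("http://www.soc.soton.ac.uk/")` gives `856 40 $u http://www.soc.soton.ac.uk/`.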

The use of the MARC formats for describing Internet resources has been extensively tested in North America, particularly through the work of a series of OCLC projects.

E X A M P L E

OCLC Internet projects

  • The OCLC Internet Resources project (1991-92), which resulted in the proposal for the USMARC 856 field (Dillon, et al., 1994).
  • The OCLC Internet Cataloging (InterCat) project (1994-96) to test the use of the USMARC format (including the 856 field) and AACR2 cataloguing rules for describing Internet resources.
    InterCat: http://purl.org/net/intercat
  • The Cooperative Online Resource Catalog (CORC) project (1998-). The project is exploring the co-operative creation and sharing of metadata by libraries. At the centre of CORC will be a catalogue containing Internet resource descriptions from a variety of sources. The project is also investigating automated methods for subject assignment, authority control and the conversion of metadata formats.
    CORC: http://www.oclc.org/oclc/research/projects/corc/index.htm

Information gateways build upon these practices, but have a particular focus on developing cataloguing practices and technologies that are designed specifically to manage Internet resources, taking into account the unique features of these resources.

Gateways tend to opt for more flexible and less formal cataloguing solutions, using less complex metadata formats like Dublin Core. This is largely because such formats are flexible and can respond quickly to new developments in the ever-changing Internet environment. Simpler formats also help gateways to cope with the volatility of Internet resources, one of the key challenges in Internet cataloguing: as resources change, their associated records become out of date and require frequent updating. Information gateways have therefore sought to develop relatively simple technologies and cataloguing procedures that provide adequate descriptions but also support the high level of maintenance that is required.

As Clifford Lynch (1997, p. 44) has commented, if the Internet is to continue to thrive as a new means of communication, 'something very much like traditional library services will be needed to organize, access and preserve networked information'. The same article also suggests that combining 'the skills of the librarian and the computer scientist may help organize the anarchy of the Internet'.


Cataloguing issues for information gateways
 

Information gateways, like libraries, need tools that facilitate the identification, location, access and use of resources; they have therefore developed (or adapted) tools for the descriptive cataloguing and indexing of Internet resources. In this, information gateways have the distinct advantage of being able to build upon the century and a half of cataloguing experience accumulated by libraries and other organisations. Information gateways need to work on the following:

  • metadata formats
  • types of descriptive information required
  • content standards and cataloguing rules
  • cataloguing tools and interfaces
  • catalogue maintenance

Metadata formats

Firstly, it must be noted that cataloguing issues are, to some extent, bound up with the decisions that information gateways need to make about metadata formats.

Cross reference
Metadata formats

That said, the use of a particular metadata format does not necessarily determine the adoption of any particular description standard or set of cataloguing rules. Formats such as Dublin Core, MARC or ROADS templates are merely frameworks into which data can be entered and from which it can be retrieved. The role of cataloguing rules or guidelines is to specify how the content of a format is entered. For this reason, once a metadata format has been chosen, a gateway should consider how this metadata is to be entered into its database and prepare a set of cataloguing rules.

Types of descriptive information required by an information gateway

During the cataloguing process for an information gateway, a resource will first be identified and selected and then described in some standardised way. Typically, a description will record a variety of different types of information:

  1. Bibliographic-type descriptive information. This should include information primarily taken from the resource itself, including its title, its location (usually a URL) and the persons and organisations responsible for its content.
  2. Subject information. This would include any terms added from subject schemes, such as classification codes, terms from thesauri and subject heading lists as well as any keywords added by a cataloguer. More information can be found in the chapter on Classification.
  3. Administrative metadata. This includes any other information that may be useful to the management of the subject gateway. This may include information on individuals who selected or catalogued a given resource, the date that a catalogue record was created (or updated) and the dates when selected resources need to be reviewed.
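These three kinds of information can be kept apart in a gateway's internal record structure. The following Python sketch groups them with dataclasses and flattens a record into simple 'Name: value' lines. All class, field and attribute names are hypothetical, chosen for illustration rather than taken from Dublin Core, MARC or the ROADS templates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Descriptive:
    """Bibliographic-type information, taken mainly from the resource itself."""
    title: str
    url: str
    responsible: List[str] = field(default_factory=list)  # persons/organisations


@dataclass
class Subject:
    """Terms from subject schemes plus any cataloguer-supplied keywords."""
    classification: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


@dataclass
class Administrative:
    """Information for managing the gateway rather than describing the resource."""
    catalogued_by: str
    record_created: str  # ISO 8601 date
    review_date: str     # when the resource should next be checked


@dataclass
class ResourceRecord:
    descriptive: Descriptive
    subject: Subject
    admin: Administrative

    def attribute_lines(self) -> List[str]:
        """Flatten the record into 'Name: value' lines (hypothetical names)."""
        d, s, a = self.descriptive, self.subject, self.admin
        lines = [f"Title: {d.title}", f"URL: {d.url}"]
        lines += [f"Responsible: {name}" for name in d.responsible]
        lines += [f"Classification: {code}" for code in s.classification]
        if s.keywords:
            lines.append("Keywords: " + ", ".join(s.keywords))
        lines += [
            f"Catalogued-By: {a.catalogued_by}",
            f"Record-Created: {a.record_created}",
            f"Review-Date: {a.review_date}",
        ]
        return lines
```

Keeping the administrative group separate makes it easy to leave it out of the public display while still using it for maintenance.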

Choosing content standards and developing cataloguing rules

Once a metadata format has been adopted and decisions have been taken on the information that resource descriptions need to contain, it is time to start preparing cataloguing rules or guidelines. Such guidelines can be as detailed (or as brief) as a particular gateway requires. In most cases there will be no need for rules as comprehensive as those in AACR2, but cataloguing guidelines should usually contain the following:

  • a list of all possible data elements
  • a brief explanation of what particular information each element is supposed to hold
  • an explanation of how information should be entered into this element (the rule)
  • some guidelines on the use of formats for dates, language codes, etc.
  • notes of (and links to) external standards used, e.g. classification schemes, name authorities

Once developed, these guidelines can be distributed to those people who will be responsible for providing resource descriptions for the gateway.

E X A M P L E

ROADS Cataloguing Guidelines

The ROADS project has developed some cataloguing guidelines for the two most commonly used ROADS template types (SERVICE and DOCUMENT) which can be used as a framework for the development of cataloguing rules for new or existing information gateways (Day, 1998). These guidelines were adapted from existing practice (notably from guidelines developed by ADAM (Bradshaw, 1997) and SOSIG) and could be used as the basis for other gateways, whether based on ROADS tools or not.


Many of the decisions that need to be made relate to the particular formats that need to be used for things like dates, language codes or names.

Date formats

Dates are important parts of content metadata. As well as recording when a resource was created or last modified, dates are also used to record administrative data about the metadata itself. For this reason, dates need to be entered in an agreed format so that they can be processed automatically by software. The main date formats currently in use are ISO 8601:1988, as recommended for use in Dublin Core descriptions (Wolf and Wicksteed, 1997), and the modified RFC 822 format used by ROADS templates (Deutsch, et al., 1994, p. 14):

  • ISO 8601:1988:
    1998-06-01
  • RFC 822 (as modified by RFC 1123):
    01 Jun 1998 12:00:00 GMT
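The practical difference between the two formats shows up as soon as software has to compare dates. This Python sketch, assuming Python 3.7 or later, parses both forms with the standard library (`date.fromisoformat` and `email.utils.parsedate_to_datetime`) and normalises RFC 822 dates to ISO 8601. The helper names and sample values are hypothetical.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def parse_iso_date(value: str) -> date:
    """Parse an ISO 8601 calendar date such as '1998-06-01'."""
    return date.fromisoformat(value)


def parse_rfc822_date(value: str) -> date:
    """Parse an RFC 822/RFC 1123 date-time such as '01 Jun 1998 12:00:00 GMT'
    and keep only the calendar date. A leading day name ('Mon, ...') is
    optional in RFC 822 and is accepted too."""
    return parsedate_to_datetime(value).date()


def rfc822_to_iso(value: str) -> str:
    """Normalise an RFC 822 date-time to an ISO 8601 date string."""
    return parse_rfc822_date(value).isoformat()


iso_values = ["1998-06-01", "1997-12-31", "1998-01-15"]
rfc_values = ["01 Jun 1998 12:00:00 GMT", "31 Dec 1997 12:00:00 GMT",
              "15 Jan 1998 12:00:00 GMT"]

# ISO 8601 strings sort chronologically as plain text ...
print(sorted(iso_values))
# ... RFC 822 strings must be parsed before they can be put in date order.
print(sorted(rfc_values, key=parse_rfc822_date))
```

Sorting `rfc_values` as plain text would put 01 Jun 1998 before 31 Dec 1997, which is one reason ISO 8601 suits automatic processing so well.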

Language codes

Resource descriptions tend to include an element recording the language of the intellectual content of a resource. Gateways could (and some do) record these by using the names of languages in full, e.g.:

  • Language: Portuguese
  • Language: Deutsch

However, natural language may not be the best way of recording this information. It would be difficult (if not impossible) for machines to tell that, for example, the words 'Welsh' and 'Cymraeg' refer to the same language, or that the terms 'English' and 'Old English' refer to quite different ones. For these reasons, a number of standardised language codes have been proposed, usually based on either two or three letters (e.g. ISO 639-1:1988, RFC 1766). The best current candidate is the three-letter ('Alpha-3') code list of ISO 639-2:1998, which contains more than 460 codes (Byrum, 1999):

  • ISO 639-2:1998
    Language: eng
    Language: enm
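A cataloguing tool can map the names that cataloguers type onto codes. The sketch below assumes a small, hand-picked subset of ISO 639-2 (a real gateway would load the full list) and a hypothetical table of language names. ISO 639-2 gives some languages two codes, a bibliographic (B) and a terminology (T) code, e.g. `wel`/`cym` for Welsh and `ger`/`deu` for German, so a gateway also has to decide which of the two its guidelines prescribe; this sketch assumes the bibliographic codes.

```python
# Illustrative subset of ISO 639-2 bibliographic (B) codes.
ISO_639_2_B = {
    "eng": "English",
    "enm": "English, Middle (1100-1500)",
    "ang": "English, Old (ca. 450-1100)",
    "ger": "German",
    "por": "Portuguese",
    "wel": "Welsh",
}

# Terminology (T) codes that differ from the B codes in the subset above.
T_TO_B = {"deu": "ger", "cym": "wel"}

# Hypothetical lookup from names a cataloguer might type (in English or in
# the language itself) to the preferred code.
NAME_TO_CODE = {
    "english": "eng",
    "middle english": "enm",
    "old english": "ang",
    "german": "ger",
    "deutsch": "ger",
    "portuguese": "por",
    "português": "por",
    "welsh": "wel",
    "cymraeg": "wel",
}


def normalise_language(value: str) -> str:
    """Return the ISO 639-2 (B) code for a code or language name.

    Raises ValueError so the interface can ask the cataloguer to check it.
    """
    key = value.strip().lower()
    if key in ISO_639_2_B:
        return key
    if key in T_TO_B:
        return T_TO_B[key]
    if key in NAME_TO_CODE:
        return NAME_TO_CODE[key]
    raise ValueError(f"unrecognised language: {value!r}")
```

With this table, 'Welsh', 'Cymraeg' and `cym` all resolve to `wel`, while 'English' and 'Old English' stay distinct (`eng` and `ang`).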

Name formats and authority files

Names are one of the more problematic areas on which information gateway cataloguing rules need to make decisions about content. There are, in general, two main ways in which personal names can be ordered:

  • Direct order:
    Author-Name-v1: Conrad Russell
    Author-Name-v1: R. Po-chia Hsia
  • Inverted order:
    Author-Name-v1: Russell, Conrad
    Author-Name-v1: Hsia, R. Po-chia

However, there are many variations within each of these approaches, so rules are needed to deal with things like titles, pseudonyms and hyphenation. These can be extremely complex: the rules for 'headings for persons' in AACR2 (1988 rev.), for example, take up 54 pages, and similar rules for corporate bodies take up 41 pages. In addition, there will sometimes be a requirement to distinguish between two persons (or organisations) with the same name. Rules like AACR2 usually achieve this by adding more information to the name itself, e.g. dates of birth and death and titles, with appropriate punctuation:

Author-Name-v1: Hsia, R. Po-chia, 1955-
Author-Name-v1: Newman, J. H. (John Henry), 1801-1890
Admin-Name-v1: University of Southampton
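Simple inversion can be automated, but only for the easy cases. This hypothetical Python helper treats the last word of a direct-order name as the surname and optionally appends dates in the AACR2 style shown above. It is a sketch of the idea, not an implementation of AACR2.

```python
from typing import Optional


def invert_name(direct: str, dates: Optional[str] = None) -> str:
    """Turn a direct-order name ('Forename(s) Surname') into inverted order.

    Naive rule: the last word is taken to be the surname. Compound surnames,
    prefixes, titles and names whose surname comes first are NOT handled.
    """
    parts = direct.split()
    if len(parts) < 2:
        heading = direct.strip()  # single names are left as they are
    else:
        heading = f"{parts[-1]}, {' '.join(parts[:-1])}"
    # AACR2-style qualifier to distinguish people with the same name.
    if dates:
        heading = f"{heading}, {dates}"
    return heading
```

It reproduces `Hsia, R. Po-chia, 1955-` from `invert_name("R. Po-chia Hsia", "1955-")`, but turns 'Gabriel García Márquez' into 'Márquez, Gabriel García' rather than 'García Márquez, Gabriel', which is exactly the kind of case the fuller rules exist to handle.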

Libraries have considerable experience of dealing with names in catalogues, as can be attested by the extremely full treatment of name entries in codes such as AACR2. The sharing of bibliographic records between institutions has additionally led to the foundation of authoritative lists of names (i.e. verified access points) with cross-references, known as name authority files.
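In software terms, an authority file with cross-references is a mapping from every variant form of a name to a single authorised heading. The Python sketch below assumes a tiny in-memory file: the authorised headings are the examples given earlier in this section, while the variant forms are invented for illustration. Lookups are case-insensitive, and an unknown name raises `KeyError` so that a cataloguer can check it and, if appropriate, add a new entry.

```python
from typing import Dict, List

# Authorised heading -> variant forms ('see' references). The headings are
# examples from this section; the variants are invented for illustration.
AUTHORITY_FILE: Dict[str, List[str]] = {
    "Newman, J. H. (John Henry), 1801-1890": [
        "Newman, John Henry",
        "John Henry Newman",
        "Newman, J. H.",
    ],
    "Hsia, R. Po-chia, 1955-": [
        "Hsia, R. Po-chia",
        "R. Po-chia Hsia",
    ],
}


def build_index(authority: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every authorised and variant form (case-folded) to its heading."""
    index: Dict[str, str] = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[form.casefold()] = heading
    return index


INDEX = build_index(AUTHORITY_FILE)


def authorised_heading(name: str) -> str:
    """Return the authorised heading for name.

    Raises KeyError for names not in the file, so they can be checked
    and, if appropriate, added as new entries.
    """
    return INDEX[name.strip().casefold()]
```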

A number of name authority lists exist, mostly produced by national libraries or national bibliographic agencies, for example:

  • Library of Congress Name Authority File (LCNAF) - used by the majority of US libraries
  • British Library Name Authority File - originally created for the British National Bibliography (BNB) but also now used in the British Library's own catalogues
  • German-based name authority files include the Gemeinsame Körperschaftsdatei (GKD) for corporate body names and the Personennamendatei (PND) for personal names (Münnich, 1996)

At the present time name authority data tends to be national in origin, based on a variety of national formats and made available in a wide variety of ways, not always in electronic form. As one response to this problem, the AUTHOR project, funded by the Commission of the European Communities (DG XIII) as part of Computerised Bibliographic Record Actions (CoBRA), has investigated the feasibility of the international exchange of name authority data (Zillhardt and Bourdon, 1998).

If information gateways want to implement name authorities, the most logical place to start would be with the relevant national file, possibly supplemented by reference to LCNAF.

Authority files can also be used for things like geographical names or subjects. Indeed, the Library of Congress Subject Headings (LCSH) are probably the best example of a library-originated subject authority file.

Subject information

Subject information, in the form of keywords, classification scheme codes, subject heading terms and so on, forms an important part of the resource descriptions provided by information gateways. It can support the gateway's search system or, in the case of classification codes or terms from a subject hierarchy, form part of its browse structure. As Vizine-Goetz (1998, p. 93) has said, the 'knowledge structures that form traditional classification schemes hold great potential for improving resource description and discovery on the Internet and for organising electronic document collections'. More information on these issues can be found in the chapter on Classification.

Any cataloguing guidelines developed for information gateways need to contain information on the selected (or adapted) subject schemes and documentation will be required so that terms from these schemes can be added at the cataloguing stage. This may require reference to the published scheme itself or a link to the selected part being implemented. So, for example, a gateway based on a limited implementation of the 21st edition of the Dewey Decimal Classification (DDC21) will need at least a list of all of the classification codes in use and their meaning. More detailed implementations may require the use of the published DDC21 manuals and the employment of suitably trained staff.
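For a limited implementation, the gateway's list of codes can also drive validation and the browse structure. The sketch below assumes a hypothetical five-class extract with captions (any real list should be checked against the published DDC21 schedules) and uses the hierarchical notation of three-digit DDC numbers to find broader classes, e.g. 550 under 500. It does not handle numbers with decimal extensions.

```python
from typing import List, Optional

# Hypothetical extract of the DDC21 classes used by a gateway.
GATEWAY_DDC = {
    "000": "Generalities",
    "020": "Library and information sciences",
    "300": "Social sciences",
    "500": "Natural sciences and mathematics",
    "550": "Earth sciences",
}


def unknown_codes(codes: List[str]) -> List[str]:
    """Return any codes a cataloguer entered that the gateway does not use."""
    return [c for c in codes if c not in GATEWAY_DDC]


def broader(code: str) -> Optional[str]:
    """Next broader class in the gateway's list, or None at the top.

    Uses the notation of three-digit DDC numbers: a section such as 551
    falls under its division (550), a division under its main class (500).
    """
    if code.endswith("00"):
        return None
    parent = code[0] + "00" if code.endswith("0") else code[:2] + "0"
    return parent if parent in GATEWAY_DDC else None


def browse_path(code: str) -> List[str]:
    """Breadcrumb from the main class down to code, for a browse page."""
    if code not in GATEWAY_DDC:
        raise ValueError(f"{code} is not in the gateway's DDC list")
    path = []
    current: Optional[str] = code
    while current is not None:
        path.append(f"{current} {GATEWAY_DDC[current]}")
        current = broader(current)
    return path[::-1]
```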

Cross reference
Subject indexing and classification

Cataloguing tools and interfaces

The creation of Internet resource descriptions for information gateways will largely take place via an interface or cataloguing tool. With some metadata formats it may be possible to create resource descriptions using text editors (e.g. for ROADS templates) or Web-based tools (e.g. DC-dot for Dublin Core in HTML and RDF).
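Embedding Dublin Core in HTML essentially means writing one `<meta>` element per element value, which is the kind of output a tool such as DC-dot produces. The following Python sketch assumes the `DC.element` naming convention with a `schema.DC` link pointing at the Dublin Core element set namespace (`http://purl.org/dc/elements/1.1/`); the function name and record layout are hypothetical, and the output of any real tool may differ in detail.

```python
from html import escape
from typing import Dict, List

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record: Dict[str, List[str]]) -> str:
    """Render a Dublin Core record as HTML <link> and <meta> elements.

    record maps DC element names (e.g. 'title', 'language') to one or more
    values; each value becomes its own <meta> element.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, values in record.items():
        for value in values:
            # escape() also converts quotes, so values cannot break the attribute.
            lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)
```

For example, `dc_meta_tags({"title": ["Southampton Oceanography Centre"], "language": ["eng"]})` yields the link element followed by one `<meta>` element for each of the two values.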

Ideally, however, information gateways need cataloguing interfaces that can be adapted for their particular needs, which contain, for example, the subject schemes used by that particular gateway as its default and including some help in the form of cataloguing rules and examples. In principle, it should be possible to embed most of the cataloguing rules developed for an information gateway inside the cataloguing interface. It should also be able to automatically validate certain elements (e.g. language codes or dates) before adding records to the database and to add certain administrative metadata.
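As a sketch of what embedding rules in the interface might look like, the Python code below keeps a hypothetical rule table (element name, the guidance shown to the cataloguer and a check), refuses to store a record until every check passes, and then stamps it with administrative metadata. The element names, the small language-code list and the one-year review interval are all assumptions made for the example.

```python
import re
from datetime import date, timedelta
from typing import Callable, Dict, List, Optional, Tuple

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
LANGUAGE_CODES = {"eng", "fre", "ger", "por", "wel"}  # illustrative subset


def is_iso_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. 1998-02-30
        return False
    return True


# Hypothetical rule table: element -> (guidance shown to cataloguer, check).
RULES: Dict[str, Tuple[str, Callable[[str], bool]]] = {
    "Title": ("Title as it appears on the resource.", lambda v: bool(v.strip())),
    "URL": ("Full URL of the resource.", lambda v: v.startswith(("http://", "ftp://"))),
    "Language": ("ISO 639-2 three-letter code.", lambda v: v in LANGUAGE_CODES),
    "Date-Modified": ("ISO 8601 date, YYYY-MM-DD.", is_iso_date),
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element, (guidance, check) in RULES.items():
        value = record.get(element)
        if value is None:
            problems.append(f"{element}: missing ({guidance})")
        elif not check(value):
            problems.append(f"{element}: {value!r} breaks the rule ({guidance})")
    return problems


def prepare_for_database(record: Dict[str, str], cataloguer: str,
                         today: Optional[date] = None) -> Dict[str, str]:
    """Validate a record, then add administrative metadata before storing it."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    today = today or date.today()
    stamped = dict(record)
    stamped["Catalogued-By"] = cataloguer
    stamped["Record-Created"] = today.isoformat()
    # Assumed policy: review the resource roughly a year after cataloguing.
    stamped["Review-Date"] = (today + timedelta(days=365)).isoformat()
    return stamped
```

Keeping the guidance text next to each check means the same table can be shown to cataloguers as help and used by the interface for validation.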

Developing a cataloguing interface, however, is a time-consuming and specialised task which is influenced by the choice of underlying software tools and metadata formats. The ROADS toolkit, for example, comes with a template editor that can be used for creating resource descriptions, but in most cases this would need to be customised by adding guidelines on the use of subject schemes and other matters. Other metadata formats may have their own creation tools; for example, most MARC records could be created using a proprietary library cataloguing interface.

Cross reference
User interface implementation

Catalogue maintenance

Another important factor that needs to be considered is the ongoing maintenance of the information gateway database. One of the characteristics of Internet information is that it is subject to rapid (and unadvertised) change. The content of Web pages can be frequently updated (not always for the better), their virtual locations (usually in the form of URLs) can change, and even domain names can expire or pass to another, sometimes inappropriate, organisation. For these reasons, keeping resource descriptions up to date is a considerable task for any information gateway. This will, in part, require the use of automated tools like link-checkers, but may also entail periodic checking of information content (possibly based on 'expiry-date' administrative metadata or random sampling). In any case, resource descriptions will need to be periodically updated (or removed), and any cataloguing tools will need to facilitate this.
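The selection of records for periodic checking can be automated from administrative metadata. This Python sketch assumes each record carries a hypothetical `Review-Date` element in ISO 8601 form; it lists records that are due, draws a reproducible random sample for spot checks and includes a minimal link check using the standard `urllib.request` module. A production link-checker would also need to record where redirects lead, cope with servers that reject `HEAD` requests and limit how fast it polls each site.

```python
import random
import urllib.error
import urllib.request
from datetime import date
from typing import Dict, List, Optional

Record = Dict[str, str]


def due_for_review(records: List[Record], today: date) -> List[Record]:
    """Records whose 'Review-Date' (YYYY-MM-DD) is today or earlier."""
    return [r for r in records if date.fromisoformat(r["Review-Date"]) <= today]


def spot_check_sample(records: List[Record], k: int,
                      seed: Optional[int] = None) -> List[Record]:
    """Pick up to k records at random for a manual check of their content."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return rng.sample(records, min(k, len(records)))


def check_link(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it is unreachable.

    urlopen follows redirects, so a moved page may still report 200.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 404, 410, 500 and so on
        return err.code
    except OSError:  # URLError, DNS failures and timeouts
        return None
```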

Cross reference
Collection management


Conclusions
 

As we have seen, the creation and maintenance of resource descriptions (or cataloguing) is an important part of the role of any information gateway. Gateways, therefore, need to consider in detail any cataloguing requirements that they have. This will mean decisions being made on:

  • content standards - these need to be developed, whether based on Internet cataloguing guidelines such as those produced by the ROADS project or on implementations of existing descriptive standards like ISBD(ER)
  • subject schemes - important for any browse interface to the gateway and for subject searching
  • cataloguing interfaces - to ease the creation of surrogate records by gateway staff or others
  • database maintenance issues - to ensure that the gateway's database is as up to date as possible

All of these decisions (and their associated activity) will require the input of specialised staff and a considerable commitment of time to produce (or adapt) cataloguing guidelines, to implement a suitable cataloguing interface and to train those people who will carry out the cataloguing itself. A growing number of gateways already have experience of doing these things, so new gateways would be well advised to build on this experience before developing new solutions.


Glossary
 

AACR2 - Anglo American Cataloguing Rules, 2nd edition
ADAM - Art, Design, Architecture & Media information gateway
BNB - British National Bibliography
CoBRA - Computerised Bibliographic Record Actions
CORC - OCLC Cooperative Online Resource Catalog project
DDC21 - Dewey Decimal Classification, 21st edition
GKD - Gemeinsame Körperschaftsdatei
IFLA - International Federation of Library Associations and Institutions
InterCat - OCLC Internet Cataloging project
ISBD - International Standard Bibliographic Description
ISBD(CF) - International Standard Bibliographic Description for Computer Files
ISBD(ER) - International Standard Bibliographic Description for Electronic Resources
ISO - International Organisation for Standardization
LCNAF - Library of Congress Name Authority File
LCSH - Library of Congress Subject Headings
MARC - Machine-Readable Cataloguing
OCLC - Online Computer Library Center
PND - Personennamendatei
RDF - Resource Description Framework
RFC - IETF Request for Comments
ROADS - Resource Organisation and Discovery in Subject-based services
SOSIG - Social Science Information Gateway
UBCIM - IFLA Universal Bibliographic Control and International MARC Programme
UNIMARC - Universal MARC format


References
 

CORC, http://www.oclc.org/oclc/research/projects/corc/index.htm

Intercat, http://purl.org/net/intercat

H. Alvestrand, RFC 1766, Tags for the identification of languages (Internet Engineering Task Force, Network Working Group, March 1995).
ftp://ftp.isi.edu/in-notes//rfc1766.txt

R. Braden, ed., RFC 1123, Requirements for Internet hosts - application and support (Internet Engineering Task Force, Network Working Group, October 1989).
ftp://ftp.isi.edu/in-notes//rfc1123.txt

R. Bradshaw, Cataloguing rules for the ADAM database: a procedural manual (ADAM, the Art, Design, Architecture & Media Information Gateway, 1997).
http://www.adam.ac.uk/adam/reports/cat/

P. Bryant, 'Progress in documentation: the catalogue', Journal of Documentation 36 (2) (1980), 133-163.

J. D. Byrum, 'ISO 639-1 and ISO 639-2: international standards for language codes. ISO 15924: international standard for names of scripts', 65th IFLA Council and General Conference, Bangkok, Thailand, 20-28 August 1999.
http://www.ifla.org/IV/ifla65/papers/099-155e.htm

D. H. Crocker (rev.), RFC 822, Standard for the format of ARPA Internet text messages (Internet Engineering Task Force, 13 August 1982).
ftp://ftp.isi.edu/in-notes//rfc822.txt

M. Day, ROADS cataloguing guidelines (Bath: UKOLN, the UK Office for Library and Information Networking, 1998).
http://www.ukoln.ac.uk/roads/cataloguing/cataloguing-rules.html

P. Deutsch, A. Emtage, M. Koster & M. Stumpf, Publishing information on the Internet with Anonymous FTP (Internet Engineering Task Force Internet Draft, September 1994).
http://info.webcrawler.com/mak/projects/iafa/iafa.txt

M. Dillon & E. Jul, 'Cataloging Internet resources: the convergence of libraries and Internet resources', Cataloging & Classification Quarterly 22 (3/4) (1996), 197-238.

M. Dillon, E. Jul, M. Burge & C. Hickey, 'The OCLC Internet Resources Project: Toward Providing Library Services for Computer-Mediated Communication' in A. P. Bishop (ed.), Emerging communities: integrating networked information into library services (Urbana-Champaign, Ill.: University of Illinois at Urbana Champaign, Graduate School of Library and Information Science, 1994), 54-69.

M. Gorman & P. W. Winkler (eds.), Anglo-American Cataloguing Rules, 2nd ed. (Ottawa: Canadian Library Association; London: Library Association Publishing; Chicago, Ill.: American Library Association, 1988).

Guidelines for the Use of Field 856 (Network Development and MARC Standards Office, Washington, D.C.: Library of Congress, 1997).
http://lcweb.loc.gov/marc/856guide.html

B. Holt, 'Presentation of UNIMARC on the Web: new fields including the one for electronic resources', 64th IFLA General Conference, Amsterdam, Netherlands, 16-21 August 1998.
http://www.ifla.org/IV/ifla64/110-161e.htm

ISBD(ER) International Standard Bibliographic Description for Electronic Resources: revised from the ISBD(CF): International Standard Bibliographic Description for Computer Files (UBCIM publications, New Series, 17. Munich: Saur, 1997).

ISO 639-1:1988, Code for the representation of names of languages (Geneva: International Organisation for Standardization, 1988).

ISO 639-2:1998, Codes for the representation of names of languages - Part 2: Alpha-3 code (Geneva: International Organisation for Standardization, 1998).

ISO 8601:1988, Data elements and interchange formats - Information interchange - Representation of dates and times (Geneva: International Organisation for Standardization, 1988).

E. Jul, InterCat year-end statistics (E-mail to OCLC Internet Cataloging project list INTERCAT@oclc.org, 4 January 1999).

D. M. Levy, 'Cataloguing in the digital order', Digital Libraries '95: The Second Annual Conference on the Theory and Practice of Digital Libraries, Texas A & M University, Austin, Texas, USA, 11-13 June 1995.
http://csdl.tamu.edu/DL95/papers/levy/levy.html

C. Lynch, 'Searching the Internet', Scientific American 276 (3) (March 1997), 52-56.

M. Münnich, 'German authority work and control', Authority Control in the 21st Century, Online Computer Library Center (OCLC), Dublin, Ohio, 31 March-1 April 1996.
http://www.oclc.org/oclc/man/authconf/muennich.htm

N. B. Olson (ed.), Cataloging Internet resources: a manual and practical guide, 2nd ed. (Dublin, Ohio: OCLC Online Computer Library Center, 1997).
http://www.purl.org/oclc/cataloging-internet

A. Sandberg-Fox & J. D. Byrum, 'From ISBD(CF) to ISBD(ER): process, policy, and provisions', Library Resources and Technical Services 42 (2) (1998), 89-101.

V. T. Sha, 'Cataloguing Internet resources: the library approach', The Electronic Library 13 (5) (1995), 467-476.

D. Vizine-Goetz, 'OCLC investigates using classification tools to organize Internet data', in P. A. Cochrane & E. H. Johnson (eds.), Visualizing subject access for 21st century information sources (Urbana-Champaign, Ill.: University of Illinois at Urbana Champaign, Graduate School of Library and Information Science, 1998), 93-105.

M. Wolf & C. Wicksteed, Date and Time Formats (submission to World Wide Web Consortium (W3C), 15 September 1997).
http://www.w3.org/TR/NOTE-datetime-970915

S. Zillhardt & F. Bourdon, AUTHOR project: final report (Paris: Bibliothèque nationale de France, 5 June 1998).
http://www.bl.uk/information/author.pdf


Credits
 

Chapter author: Michael Day

With contributions from: Emma Place


Last updated: 20 April 2000
© 1999-2000 DESIRE