2.4. Cataloguing
Introduction
The role of cataloguing rules or guidelines is to specify how the content of a metadata format is entered. Once a metadata format has been chosen, consideration should then be given to how this metadata should be entered into the information gateway database and a set of cataloguing rules prepared. One of the key roles of Internet subject gateways is the creation of descriptive metadata about networked resources which can be used as a basis for searching and browsing the gateway. These descriptions can also help gateway users to identify whether the resources are really what they need, potentially saving them a considerable amount of time browsing through the vast amounts of information available elsewhere on the Internet (Sha, 1995, p. 467). Therefore, one of the most important (and time-consuming) activities for a subject gateway will be the provision of these descriptions. This activity is generally known as 'cataloguing' and is one of the key tasks of any information gateway.
Background
Cataloguing can be defined as the creation of surrogate records which can be used to facilitate the identification, location, access and use of resources (Levy, 1995). These descriptions are usually created in accordance with certain standards (cataloguing rules and metadata formats) and will often include additional features such as classification, subject analysis and authority control (Dillon and Jul, 1996, p. 198; Bryant, 1980). These tools and standards were originally developed for the cataloguing and indexing of traditional - mostly printed - collections. However, many of them have been revised to take account of resources based on newer technologies. Recent developments include:
1. ISBD(ER). In 1997, the IFLA Universal Bibliographic Control and International MARC Programme (UBCIM) published ISBD(ER), a revision of ISBD(CF) for 'Computer Files' that covers both online and offline 'Electronic Resources' (ISBD(ER), 1997; Sandberg-Fox and Byrum, 1998).
2. USMARC 856 field - 'Electronic Location and Access'. The use of this field enables the encoding of enough information to locate and retrieve networked resources, including a URL (Network Development and MARC Standards Office, 1997). Field 856 has been implemented in other 'flavours' of MARC such as UNIMARC (Holt, 1998). The use of the MARC formats for describing Internet resources has been extensively tested in North America, particularly through the work of a series of OCLC projects.
Information gateways build upon these practices, but have a particular focus on developing cataloguing practices and technologies that are designed specifically to manage Internet resources, taking into account the unique features of these resources. Gateways tend to opt for more flexible and less formal cataloguing solutions, using less complex metadata formats like Dublin Core. This is largely because these formats are flexible and can respond quickly to new developments in the ever-changing Internet environment. It also helps gateways to cope with the volatility of Internet resources, one of the key challenges in Internet cataloguing: as resources change, their associated records become out of date and require frequent updating. Information gateways have sought to develop relatively simple technologies and cataloguing procedures, which provide adequate descriptions but which also support the high level of maintenance that is required. As Clifford Lynch (1997, p. 44) has commented, if the Internet is to continue to thrive as a new means of communication, 'something very much like traditional library services will be needed to organize, access and preserve networked information'. The same article also comments that combining 'the skills of the librarian and the computer scientist may help organize the anarchy of the Internet'.
Cataloguing issues for information gateways
Information gateways, like libraries, need tools that facilitate the identification, location, access and use of resources; they have therefore developed (or adapted) tools that can be used for the descriptive cataloguing of Internet resources and their indexing. In this, information gateways have the distinct advantage that they can build upon the past century and a half of experience which libraries and other organisations have of the task of cataloguing. Information gateways need to work on the following:
Metadata formats
Firstly, it must be noted that cataloguing issues are to some extent related to the decisions that information gateways need to make about metadata formats. That said, the use of a particular metadata format does not necessarily determine the adoption of any particular description standard or set of cataloguing rules. Formats such as Dublin Core, MARC or ROADS templates are merely frameworks into which data can be entered and by which it can be retrieved. The role of cataloguing rules or guidelines is to specify how the content of this format is entered. For this reason, once a metadata format has been chosen, consideration should then be given to how this metadata should be entered into the information gateway database and a set of cataloguing rules prepared.
Types of descriptive information required by an information gateway
During the cataloguing process for an information gateway, a resource will first be identified and selected and then described in some standardised way. Typically, a description will record a variety of different types of information:
Choosing content standards and developing cataloguing rules
Once a metadata format has been adopted and decisions have been taken on the particular information that resource descriptions need to contain, it is time to start the preparation of cataloguing rules or guidelines. Such guidelines can be as detailed (or not) as a particular gateway requires. In most cases, there will not be a requirement to develop rules as comprehensive as those in AACR2, for example, but cataloguing guidelines should often contain the following things:
Once developed, these guidelines can be distributed to those people who will be responsible for providing resource descriptions for the gateway.
Many of the decisions that need to be made relate to the particular formats that need to be used for things like dates, language codes or names.
Date formats
Dates tend to be important parts of content metadata. As well as being used to record the time when a resource was created or last modified, dates are also used to record administrative data about the metadata itself. For this reason, dates need to be entered in some agreed format so that they can be automatically processed by software. The main date formats currently in use are ISO 8601:1988 - as recommended for use in Dublin Core descriptions (Wolf and Wicksteed, 1997) - and the modified RFC 822 format used by ROADS templates (Deutsch, et al., 1994, p. 14):
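As an illustration of why an agreed date format matters for automatic processing, the following minimal Python sketch checks the two styles of date. It is not taken from the ROADS or Dublin Core documentation: the function names are hypothetical, the ISO 8601 check accepts only complete calendar dates (YYYY-MM-DD) and not the reduced forms the standard also allows, and the RFC 822-style check relies on the general-purpose parser in Python's standard `email.utils` module, which may be more lenient than a particular gateway's rules.

```python
import re
from datetime import date, datetime
from email.utils import parsedate_to_datetime

# Only complete calendar dates are accepted in this sketch.
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso8601_date(value: str) -> date:
    """Parse a complete ISO 8601 calendar date such as '2000-04-20'."""
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"not a YYYY-MM-DD date: {value!r}")
    # fromisoformat also rejects impossible dates such as month 13.
    return date.fromisoformat(value)


def parse_rfc822_date(value: str) -> datetime:
    """Parse an RFC 822-style date with a four-digit year."""
    return parsedate_to_datetime(value)


def is_valid_date(value: str, parser) -> bool:
    """Return True if the given parser accepts the value."""
    try:
        parser(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable RFC 822 dates.
        return False
    return True
```

A cataloguing interface could call `is_valid_date` on each date element before a record is saved, rejecting free-text entries such as '20 April 2000' when the gateway has agreed on ISO 8601.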
Language codes
Resource descriptions tend to include an element recording the language of the intellectual content of a resource. Gateways could (and some do) record these by using the names of languages in full, e.g.:
However, natural language may not be the best way of recording this information. It would be difficult (if not impossible) for machines to be able to tell that, for example, the words 'Welsh' and 'Cymraeg' refer to the same language, or that the terms 'English' and 'Old English' refer to quite different ones. For these reasons, a number of standardised language codes have been proposed, usually based on either two or three letters (e.g. ISO 639-1:1988, RFC 1766). The best current candidate for language codes is the three-letter (known as 'Alpha-3') code ISO 639-2:1998 with more than 460 codes (Byrum, 1999):
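The point about 'Welsh' and 'Cymraeg' can be sketched in a few lines of Python. The two tables below are small illustrative subsets written for this example, not the full ISO 639-2 list, and the name-to-code table is a hypothetical local aid that a gateway would have to compile itself. Note that ISO 639-2 gives Welsh two codes; the terminology code 'cym' is used here, while the bibliographic code is 'wel'.

```python
# Illustrative subset only; a real gateway would load the full ISO 639-2 table.
ISO_639_2 = {
    "eng": "English",
    "ang": "English, Old (ca. 450-1100)",
    "cym": "Welsh",
}

# Hypothetical local table of names a cataloguer might type, mapped to codes.
NAME_TO_CODE = {
    "english": "eng",
    "old english": "ang",
    "welsh": "cym",
    "cymraeg": "cym",
}


def normalise_language(value: str) -> str:
    """Return the three-letter code for a language code or known name."""
    key = value.strip().lower()
    if key in ISO_639_2:
        return key
    if key in NAME_TO_CODE:
        return NAME_TO_CODE[key]
    raise ValueError(f"unrecognised language: {value!r}")
```

With codes stored in the record, software can tell that 'Welsh' and 'Cymraeg' name the same language and that 'English' and 'Old English' do not, without having to interpret natural language.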
Name formats and authority files
Names are one of the more problematic areas of content on which information gateway cataloguing rules have to make decisions. There are (in general) two main ways in which personal names can be ordered:
However, there are a number of variations that exist within each of these ways. There is a need for rules that deal with things like titles, pseudonyms and hyphenation. These can be extremely complex. Rules concerning 'headings for persons' in AACR2 (1988 rev.), for example, take up 54 pages. Similar rules for corporate bodies take up 41 pages. In addition, in some cases there will be a requirement to be able to distinguish between two persons (or organisations) with the same name. Rules like AACR2 usually achieve this by adding more information to the name itself, e.g. dates of birth and death and titles, with appropriate punctuation:
Author-Name-v1: Hsia, R. Po-chia, 1955-
Libraries have considerable experience of dealing with names in catalogues, as can be attested by the extremely full treatment of name entries in codes such as AACR2. The sharing of bibliographic records between institutions has additionally led to the foundation of authoritative lists of names (i.e. verified access points) with cross-references, known as name authority files. A number of name authority lists exist, mostly produced by national libraries or national bibliographic agencies, for example:
At the present time name authority data tends to be national in origin, based on a variety of national formats and made available in a wide variety of ways, not always in electronic form. As one response to this problem, the AUTHOR project, funded by the Commission of the European Communities (DG XIII) as part of Computerised Bibliographic Record Actions (CoBRA), has investigated the feasibility of the international exchange of name authority data (Zillhardt and Bourdon, 1998). If information gateways want to implement name authorities, the most logical place to start would be with the relevant national file, possibly supplemented by reference to LCNAF. Authority files can also be used for things like geographical names or subjects. Indeed, the Library of Congress Subject Headings (LCSH) are probably the best example of a library-originated subject authority file.
Subject information
Subject information, in the form of keywords, classification scheme codes, subject heading terms and so on, forms an important part of the resource descriptions provided by information gateways. Subject information can form the basis of part of the search system, or - in the case of classification codes or terms from a subject hierarchy - can form part of the gateway's browse structure. As Vizine-Goetz (1998, p. 93) has said, the 'knowledge structures that form traditional classification schemes hold great potential for improving resource description and discovery on the Internet and for organising electronic document collections'. More information on these issues can be found in the chapter on Classification.
Any cataloguing guidelines developed for information gateways need to contain information on the selected (or adapted) subject schemes and documentation will be required so that terms from these schemes can be added at the cataloguing stage. This may require reference to the published scheme itself or a link to the selected part being implemented. So, for example, a gateway based on a limited implementation of the 21st edition of the Dewey Decimal Classification (DDC21) will need at least a list of all of the classification codes in use and their meaning. More detailed implementations may require the use of the published DDC21 manuals and the employment of suitably trained staff.
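A 'list of all of the classification codes in use and their meaning' can be as simple as a lookup table that the cataloguing tool consults. The Python sketch below assumes a hypothetical gateway that uses only four Dewey classes; the captions are paraphrased for illustration and are not quoted from the published DDC21 schedules, and the function names are invented for this example.

```python
# Hypothetical list of the classification codes one gateway has chosen to use.
# Captions are paraphrased for illustration, not taken from the DDC21 schedules.
DDC_CODES_IN_USE = {
    "020": "Library and information sciences",
    "330": "Economics",
    "510": "Mathematics",
    "610": "Medicine and health",
}


def unknown_subject_codes(codes):
    """Return the codes that are not in the gateway's list, in input order."""
    return [code for code in codes if code not in DDC_CODES_IN_USE]


def captions_for(codes):
    """Return the caption for each code, refusing codes not in use."""
    unknown = unknown_subject_codes(codes)
    if unknown:
        raise ValueError(f"codes not in use by this gateway: {unknown}")
    return [DDC_CODES_IN_USE[code] for code in codes]
```

The same table could also drive the gateway's browse structure, since each code in use corresponds to one browsable subject section.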
Cataloguing tools and interfaces
The creation of Internet resource descriptions for information gateways will largely take place via an interface or cataloguing tool. With some metadata formats it may be possible to create resource descriptions using text editors (e.g. for ROADS templates) or Web based tools (e.g. DC-dot for Dublin Core in HTML and RDF). Ideally, however, information gateways need cataloguing interfaces that can be adapted for their particular needs, which contain, for example, the subject schemes used by that particular gateway as its default and which include some help in the form of cataloguing rules and examples. In principle, it should be possible to embed most of the cataloguing rules developed for an information gateway inside the cataloguing interface. The interface should also be able to automatically validate certain elements (e.g. language codes or dates) before adding records to the database and to add certain administrative metadata. Developing a catalogue interface, however, is a time-consuming and specialised task which is influenced by the choice of underlying software tools and metadata formats. The ROADS toolkit, for example, comes with a template editor which can be used for creating resource descriptions but this would in most cases require some customisation by the addition of guidelines for the use of subject schemes and other guidelines. Other metadata formats may have their own creation tools; for example, most MARC formats could be created using a proprietary library-based cataloguing interface.
Catalogue maintenance
Another important factor that needs to be considered is the ongoing maintenance of the information gateway database. One of the characteristics of Internet information is that it is subject to rapid (and unadvertised) change. The content of Web pages can be frequently updated (not always for the better), their virtual locations (usually in the form of URLs) can change, and even domain names can expire or move to another - sometimes inappropriate - organisation. For these reasons, a considerable task for any information gateway is keeping its resource descriptions up to date. This will, in part, require the use of automated tools like link-checkers, but may also entail some periodic checking of information content (possibly based on 'expiry-date' administrative metadata or random sampling). In any case, resource descriptions will need to be periodically updated (or removed) and any cataloguing tools will need to facilitate this.
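The combination of 'expiry-date' checking and random sampling described above might be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Record` class, its `review_by` field and the function name are hypothetical, no real gateway's record structure is implied, and link checking itself (which needs network access) is left out.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    url: str
    review_by: date  # hypothetical 'expiry-date' administrative metadata


def records_to_review(records, today, sample_size=0, rng=None):
    """Select every expired record plus a random sample of the others."""
    rng = rng or random.Random()
    expired = [r for r in records if r.review_by <= today]
    current = [r for r in records if r.review_by > today]
    # Never ask for a larger sample than there are unexpired records.
    sampled = rng.sample(current, min(sample_size, len(current)))
    return expired + sampled
```

Run periodically, a selection like this gives cataloguers a manageable work list: every record whose review date has passed, plus a spot check on records that are nominally still current.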
Conclusions
As we have seen, the creation and maintenance of resource descriptions (or cataloguing) is an important part of the role of any information gateway. Gateways, therefore, need to consider in detail any cataloguing requirements that they have. This will mean decisions being made on:
All of these decisions (and their associated activity) will require the input of specialised staff and a considerable commitment of time to produce (or adapt) cataloguing guidelines, to implement a suitable cataloguing interface and to train those people who will carry out the cataloguing task itself. Of course, there is a growing number of gateways with experience of doing these things, so new gateways would be advised to build on this experience before developing new solutions.
Glossary
AACR2 - Anglo-American Cataloguing Rules, 2nd edition
References
CORC, http://www.oclc.org/oclc/research/projects/corc/index.htm
Intercat, http://purl.org/net/intercat
H. Alvestrand, RFC 1766, Tags for the identification of languages (Internet Engineering Task Force, Network Working Group, March 1995).
R. Braden, ed., RFC 1123, Requirements for Internet hosts - application and support (Internet Engineering Task Force, Network Working Group, October 1989).
R. Bradshaw, Cataloguing rules for the ADAM database: a procedural manual (ADAM, the Art, Design, Architecture & Media Information Gateway, 1997).
P. Bryant, 'Progress in documentation: the catalogue', Journal of Documentation 36 (2) (1980), 133-163.
J. D. Byrum, 'ISO 639-1 and ISO 639-2: international standards for language codes. ISO 15924: international standard for names of scripts', 65th IFLA Council and General Conference, Bangkok, Thailand, 20-28 August 1999.
D. H. Crocker (rev.), RFC 822, Standard for the format of ARPA Internet text messages (Internet Engineering Task Force, 13 August 1982).
M. Day, ROADS cataloguing guidelines (Bath: UKOLN: The UK Office for Library and Information Networking, 1998).
P. Deutsch, A. Emtage, M. Koster & M. Stumpf, Publishing information on the Internet with Anonymous FTP (Internet Engineering Task Force Internet Draft, September 1994).
M. Dillon & E. Jul, 'Cataloging Internet resources: the convergence of libraries and Internet resources', Cataloging & Classification Quarterly 22 (3/4) (1996), 197-238.
M. Dillon, E. Jul, M. Burge & C. Hickey, 'The OCLC Internet Resources Project: Toward Providing Library Services for Computer-Mediated Communication' in A. P. Bishop (ed.), Emerging communities: integrating networked information into library services (Urbana-Champaign, Ill.: University of Illinois at Urbana Champaign, Graduate School of Library and Information Science, 1994), 54-69.
M. Gorman & P. W. Winkler (eds.), Anglo-American Cataloguing Rules, 2nd ed. (Ottawa: Canadian Library Association; London: Library Association Publishing; Chicago, Ill.: American Library Association, 1988).
Guidelines for the Use of Field 856 (Network Development and MARC Standards Office, Washington, D.C.: Library of Congress, 1997).
B. Holt, 'Presentation of UNIMARC on the Web: new fields including the one for electronic resources', 64th IFLA General Conference, Amsterdam, Netherlands, 16-21 August 1998.
ISBD(ER) International Standard Bibliographic Description for Electronic Resources: revised from the ISBD(CF): International Standard Bibliographic Description for Computer Files (UBCIM publications, New Series, 17. Munich: Saur, 1997).
ISO 639-1:1988, Code for the representation of names of languages (Geneva: International Organisation for Standardization, 1988).
ISO 639-2:1998, Codes for the representation of names of languages - Part 2: Alpha-3 code (Geneva: International Organisation for Standardization, 1998).
ISO 8601:1988, Data elements and interchange formats - Information interchange - Representation of dates and times (Geneva: International Organisation for Standardization, 1988).
E. Jul, InterCat year-end statistics (E-mail to OCLC Internet Cataloging project list INTERCAT@oclc.org, 4 January 1999).
D. M. Levy, 'Cataloguing in the digital order', Digital Libraries '95: The Second Annual Conference on the Theory and Practice of Digital Libraries, Texas A & M University, Austin, Texas, USA, 11-13 June 1995.
C. Lynch, 'Searching the Internet', Scientific American 276 (3) (March 1997), 52-56.
M. Münnich, 'German authority work and control', Authority Control in the 21st Century, Online Computer Library Center (OCLC), Dublin, Ohio, 31 March-1 April 1996.
N. B. Olson (ed.), Cataloging Internet resources: a manual and practical guide, 2nd ed. (Dublin, Ohio: OCLC Online Computer Library Center, 1997).
A. Sandberg-Fox & J. D. Byrum, 'From ISBD(CF) to ISBD(ER): process, policy, and provisions', Library Resources and Technical Services 42 (2) (1998), 89-101.
V. T. Sha, 'Cataloguing Internet resources: the library approach', The Electronic Library 13 (5) (1995), 467-476.
D. Vizine-Goetz, 'OCLC investigates using classification tools to organize Internet data', in P. A. Cochrane & E. H. Johnson (eds.), Visualizing subject access for 21st century information sources (Urbana-Champaign, Ill.: University of Illinois at Urbana Champaign, Graduate School of Library and Information Science, 1998), 93-105.
M. Wolf & C. Wicksteed, Date and Time Formats (submission to World Wide Web Consortium (W3C), 15 September 1997).
S. Zillhardt & F. Bourdon, AUTHOR project: final report (Paris: Bibliothèque nationale de France, 5 June 1998).
Credits
Chapter author: Michael Day
Last updated: 20 April 2000
© 1999-2000 DESIRE