Newsletter No. 53 (December 2008)

Black clouds and silver linings: a rare chance to catalogue a backlog

Renata Osborne
Manager, Menzies Precinct
Australian National University Library

In 2007, the East Asia section of the Australian National University (ANU) Library undertook a major project to catalogue an estimated 25,000 volumes of hitherto uncatalogued Chinese language books. This cataloguing backlog was the result of a period of very active (and inventive) acquisition under the guidance of former ANU East Asia librarians YS Chan and Susan Prentice in the mid-1970s and early 1980s. The rate of buying far outpaced the capacity of the cataloguing staff to create full bibliographic records, an understandable situation given the lack of sources for copy-cataloguing and the labour-intensive nature of East Asian manual cataloguing before the age of online catalogue records containing CJK scripts and down-line loading. This was especially the case for Chinese imprints of that period given the geo-political situation of the time and the challenges in acquiring library resources from China.

Faced with this growing backlog, the Library took the innovative decision in the mid-1980s to place the uncatalogued books on open shelves and make them accessible to the public, a decision that was welcomed by users. The backlog of thousands of volumes was divided into 8 subject categories and given the collective name China New Book Collection before being made available to the public. While the name China New Book Collection was appropriate in the early years, over time the adjective "new" became a subject of irony and comment. An attempt was made in the late 1990s to 'rectify' the name. The China New Book Collection was closed off and renamed the China Book Collection, and a new sequence was started of genuinely "new", that is, recently purchased, books. It was the older section of the uncatalogued works, the renamed China Book Collection, which was catalogued in the 2007 project.

The reason, or opportunity, for cataloguing the China Book Collection came from quite a different direction than difficulties with nomenclature. In 2006/07 the ANU Library undertook a Collection Relocation Project to move some 28,000 metres of material from campus libraries to an off-site storage facility (the Library Print Repository) in order to relieve massive overcrowding in library buildings and meet the need to provide more library user space on campus. The RG Menzies Building, home of the East Asian collections, was one of the more crowded buildings with a shelf-occupancy of over 91%.

It was in this context that the China Book Collection was identified as a possible candidate for relocation to the Library Print Repository, on the proviso that the collection was catalogued online before being taken off-site. The Collection Relocation Project was sufficiently funded by the University to accommodate the cost of this cataloguing exercise, and represented a windfall opportunity which, possibly, would not have presented itself again for a very long time.

In taking the decision to relocate the China Book Collection, the Library felt that the advantages of cataloguing this collection and locating it off-site outweighed the disadvantages of keeping the collection on-site but with no catalogue access, and with little prospect of being catalogued in the foreseeable future. The actual card catalogue was already a relic. This elegant and evocative piece of furniture was being utilised by a shrinking band of people, primarily East Asia library staff or older scholars schooled in the complexities of the card catalogue. Changed library patron habits meant a growing and decided preference for online resources, including library catalogues, and this preference resulted in the card catalogue being ignored, even when patrons knew there was material in the card catalogue they would not find in the online version.

There were specific CJK factors too that militated against use of the card catalogue, chief among them being the Wade-Giles romanisation system which was virtually unknown to the vast majority of library users. Even if the Wade-Giles system were not a barrier, the card catalogue records for the China Book Collection were very basic acquisition-level records that only offered romanised title access.

By contrast, full or even low-level MARC records would provide multiple access points, in both romanisation and Chinese characters, and would extend bibliographic access to this collection to users outside the Menzies Building. If off-site storage for these works is viewed by some as a black cloud, then surely, improved bibliographic access can be seen as a silver lining.

The Project

The timeframe for this project was extremely tight given the Library's overall schedules for the relocation to off-site storage. We had approximately 4 months to recruit staff and catalogue an estimated 25,000 volumes. The tight timeframe also dictated the processes adopted for this exercise.

In order to meet our deadline, we employed a team of eight staff. Fortunately, the timing of the project coincided with the end of the academic year, when students were free from class and searching for summer jobs. We were able to recruit eight graduates and post-graduates who had the requisite Chinese language skills, biaozhun Mandarin and excellent pinyin. None had library work experience. The team-of-eight proved to be quick learners and developed into a dedicated and reliable unit. They were given basic training in use of the Millennium system, MARC format, and copy-cataloguing processes and were issued with a Pinyin/Wade-Giles conversion table. Within a couple of weeks the team was achieving a record-entry rate of 11 bibliographic records and 14 holdings records per hour, peaking at 14 bibliographic and 17 holdings records per hour. Their output exceeded our expectations.

At the end of the four months it was necessary to extend the project by another ten weeks (but with reduced personnel) in order to complete it. As with other projects of this scale we are, to this day, continuing to discover small blocks of unconverted material or cataloguing inconsistencies that need attention and this "sweeping-up" work is still taking place.

The Process

Given the tight timeframe, we decided to use the China Book Collection shelf-list as the basis for copy-cataloguing and to avoid recourse to the physical item as much as possible. Not only would working from the physical item have been extremely time-consuming, it would also have required a degree of cataloguing skills that our newly recruited team of novice copy-cataloguers did not possess. Fortunately, for the most part, the acquisition slips in the shelf-list provided sufficient information for copy-cataloguing, namely main author, title and publisher (all in Chinese characters), title repeated in pinyin, date of publication, and, very importantly, pagination.

A hierarchy of cataloguing sources was established based on anticipated hit-rates for copy records. OCLC WorldCat was at the top of the hierarchy, followed by a group of catalogues that included Hong Kong university libraries, Libraries Australia and the Library of Congress catalogue. If no copy was found in any of these catalogues, the information on the acquisition slips was entered into the library system. After some weeks, as individual strengths and preferences emerged among the eight project staff, a degree of specialisation was introduced into the workflow. Some staff concentrated on searching OCLC, while others searched the second group of catalogues, and yet others concentrated on keying records from scratch.
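The search order described above amounts to a simple fall-through lookup. The sketch below is only an illustration of that logic, not the workflow actually run in the library system: the function names, the dictionary-based slip and the stand-in search functions are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical types: a slip holds the data from a shelf-list acquisition slip,
# and a source is any lookup that returns a copy record or None.
Slip = dict
Record = dict
Source = Callable[[Slip], Optional[Record]]


def find_record(slip: Slip, tiers: list[tuple[str, list[Source]]]) -> tuple[str, Record]:
    """Try each tier of catalogues in order; fall back to keying from the slip."""
    for tier_name, sources in tiers:
        for lookup in sources:
            record = lookup(slip)
            if record is not None:
                return tier_name, record
    # No copy found in any catalogue: enter the slip's own data as a brief record.
    return "keyed in-house", dict(slip)


# Stand-in lookups for illustration only; real searches would query each catalogue.
def search_worldcat(slip: Slip) -> Optional[Record]:
    if slip.get("in_worldcat"):
        return {"title": slip["title"], "source": "OCLC WorldCat"}
    return None


def search_second_group(slip: Slip) -> Optional[Record]:
    if slip.get("in_second_group"):
        return {"title": slip["title"], "source": "second group"}
    return None


TIERS = [
    ("OCLC WorldCat", [search_worldcat]),
    ("second group", [search_second_group]),
]
```

Keeping the tiers as an ordered list mirrors the specialisation that emerged among the staff: each tier can be worked by a different person, and only slips that fall through every tier reach the in-house keying step.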

Apart from the occasional need to consult the physical item to verify catalogue data, there was very little handling of books. Any cross-checking against the physical item was carried out by permanent staff members who had the necessary cataloguing expertise to verify bibliographic data. Similarly, other problems such as duplicate holdings, duplicate records or scattered series holdings, were also referred to permanent staff members to resolve.

We decided early not to classify the books but to retain them on the shelves in accession number order. Classifying the books would have meant an unthinkable amount of relabelling and interfiling. In addition, although the Library Print Repository was open to browsing by ANU users, we did not expect a high demand for on-site browsing and therefore the benefit of a classified shelf order would not have been sufficiently utilised. The China Book Collection therefore was shelved in accession number order, within the original eight subject categories.

Unfortunately, one drawback in relation to the eight subject categories could not be remedied in the time we had available for this project. The great majority of the China Book shelf-list cards did not specify the subject category to which an item was assigned. This information was not added to the shelf-list when the eight subject categories were created, and the project staff did not have time to do so retrospectively. As a result, this information is also missing from the online records. The lack of this data may present some retrieval difficulties for staff working in the Library Print Repository.


The China Book Project promised several interesting and much-anticipated outcomes for the Library besides improving bibliographic access to the collection. We looked forward to discovering the precise size of the collection and gaining a better indication of its content. We also hoped, through the copy-cataloguing process of checking major East Asian catalogues, to obtain an indication of the uniqueness (or otherwise) of our holdings. The prospect of the latter was of special interest since we expected a portion of the China Book Collection to contain Chinese publications from the 1970s and early 1980s, a period when purchasing books from China was restricted.

It goes without saying that any analysis of content based purely on catalogue data in a large-scale project such as this one will produce very general statistical results with a degree of error. Nevertheless, even this ordinary information can give a useful indication of the content within a collection that was not previously available.

Below are some statistics of the China Book Collection obtained as a result of the cataloguing project.

1. Size of Collection

We discovered that there were more items in the China Book Collection than previously estimated. The table below shows the number of books in each of the 8 subject categories. Unfortunately, we have no subject category information for 87% of the books in the collection because this information is missing from the catalogue record.

Table 1. Subject categories within the China Book Collection

Subject Categories No. of Titles No. of Volumes
A: Classics, Philosophy, Religion 160 165
B: History (Pre-1900) and Geography 410 473
C: Modern History (1900-1949) 297 354
D: Social Sciences 403 467
E: Contemporary China (1949- ) 1066 1079
F: Language & Literature (Pre 20th century) 196 276
G: Language & Literature (since 1900) 340 351
H: Art, Science, Technology & Others 230 247
A to H* 21,272 23,383
TOTAL 24,374 26,795
* A-H denotes unspecified subject section

2. Imprint

The statistics on "country of publication" were drawn from data in the 008 MARC field. A relatively small number of records did not contain this information. As expected, the majority of publications are from the People's Republic of China (PRC). There were surprisingly few Hong Kong publications, and I have included the 150 Hong Kong publications in the figures for the PRC.

Table 2. Analysis based on Country of Publication

Country of Publication No. of Titles %
China -- pre 1949 877 3.6
China -- PRC (including Hong Kong) 17,462 71.6
Taiwan 4,706 19.3
Unknown (no data available) 1,178 4.8
Others (including Singapore, Malaysia) 151 0.7
Total 24,374 100
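A tally like Table 2 can be sketched from the 008 field, where character positions 15-17 hold the place-of-publication code and positions 07-10 hold Date 1. The code below is a minimal illustration and not the method actually used for the project. The category names follow Table 2, but splitting "cc" (China) records on a Date 1 earlier than 1949 is an assumption, and Hong Kong imprints, which the article counts with the PRC, would need their code added to the PRC branch.

```python
from collections import Counter


def country_category(field_008: str) -> str:
    """Classify one record from its 008 field (place at 15-17, Date 1 at 07-10)."""
    if len(field_008) < 18:
        return "Unknown"
    place = field_008[15:18].strip()
    date1 = field_008[7:11]
    # Blank, "xx" (no place/unknown) and "|||" (not coded) carry no usable data.
    if place in ("", "xx", "|||"):
        return "Unknown"
    if place == "cc":  # MARC country code for China
        # Assumption: imprints dated before 1949 are treated as pre-PRC;
        # records with a non-numeric Date 1 fall through to the PRC category.
        if date1.isdigit() and int(date1) < 1949:
            return "China -- pre 1949"
        return "China -- PRC"
    if place == "ch":  # MARC country code for China (Republic : 1949- )
        return "Taiwan"
    return "Others"


def tally(fields_008) -> Counter:
    """Count records per category for an iterable of 008 field strings."""
    return Counter(country_category(f) for f in fields_008)
```

Passing the 008 strings of all records to `tally` gives per-category counts that can then be expressed as percentages of the total.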

An examination of imprint date within PRC publications (taken from the 260 MARC field) was also carried out. The date categories in the table below correspond broadly to political events in China's recent history. As seen from the table, the bulk of acquisitions fall within the post-Cultural Revolution period of 1977-1989 (58.9%), with a fairly small number falling within the Cultural Revolution period of 1966-76 (8.7%). It was a surprise to discover there were 42 Qing dynasty publications in this collection including three works published in the 18th century.

Table 3.  Analysis based on date of publication (PRC imprints)

Publication Date Range No. of titles Grouped no. of titles %
Pre-1911 42 42
1911-1949 2,039 2,039 11.7
1950-1959 1,085 1,559 (1950-1965) 8.9
1960-1965 474
1966-1976 1,521 1,521 8.7
1977-1979 2,717 10,279 (1977-1989) 58.9
1980-1989 7,562
1990- * 2,022 2,022 11.6
TOTAL 17,462

* most of the post 1990 publications are not in this collection, hence the low number of titles recorded

3. Subject Analysis

The most difficult and ultimately unsatisfactory analysis of this collection is that of subject coverage. The two data elements from which this information can be gleaned are the Library of Congress (LC) classification and Library of Congress (LC) subject headings assigned to the works.

Of the 21,262 downloaded records, 17,471 (82%) records contained LC subject headings. This is a high proportion of records and represents a major improvement to content discoverability.

LC classification (based on data in MARC field 050) was available for 11,560 records (54%). LC classification, unless attached to holdings records, does not directly benefit the users, but it does allow one method of analysing subject content.

A review of LC classifications found in the records reveals that the bulk of titles relate to literature or history. The table below gives a further breakdown of the works found within those two subject areas.

Unfortunately, since only 54% of the downloaded records contain LC classification information, this analysis applies to approximately half of the China Book Collection only. However, even this narrow glimpse into the collection highlights some interesting features, such as the strength in literary works of post-1949 authors (which account for 28% of total works within the literature class), or the strength in local histories (24% of total works within the history class).

Table 4. Analysis of subject matter based on LC call number

LC Classification No. of Titles %
B      Religion 482 4.2
DS    History 2,964 25.6
         Details within DS  'History'    
         -  early to Pre-Ming dynasty 442  
         -  Ming to Qing 260  
         -  Republic to 1949 500  
         -  1949+ 460  
         -   Local Histories 700  
         -   Ethnology, minorities 200  
H      Social Sciences 1,006 8.7
M-N  Music and Art 604 5.2
PL-PZ   Language & Literature 4,173 36.1
            Details within PL-PZ 'Language & Literature'    
         -  Chinese language, dialects 450  
         -  Chinese literature: history & criticism (includes all periods and forms) 1,130  
         -  Literary forms (drama, poetry, fables, etc.) 300  
         -  Juvenile literature 100  
         -  Individual authors (early - end of Ming dynasty) 320  
         -  Individual authors (Qing) 150  
         -  Individual authors (Republican) 360  
         -  Individual authors (1949+) 1,170  
Q-T   Science & Technology 691 6.0
Others 1,640 14.2
TOTAL 11,560  

A very high percentage of downloaded records (82%) contained subject headings, which could have provided a wealth of detail on subject coverage. However, an analysis of LC subject terms would have been extremely difficult because of the multitude of subject terms used, and therefore such an analysis was not attempted.

4. Source of Catalogue Records

OCLC was the main source of catalogue records and accounted for 79% (19,294) of all newly added records. The second group of databases accounted for another 9% (2,245) while in-house keying accounted for 11.6% (2,835) of all newly added records. A large number of the records had to be keyed into the catalogue because there was insufficient bibliographic information on the shelf list cards to verify the publication. In many instances there was only a romanised title entry on the shelf list card, and given our tight project time-frame we were unable to check the books themselves. Hence, very simple entries were made of approximately 500 works for which we could not adequately find copy records.

Table 5.  Source of copy-catalogue records

Source of copy No. of Titles %
OCLC 19,294 79.2
Other databases * 2,245 9.2
No copy - keyed in-house 2,835 11.6
            (very brief title only entries)  (500)  
TOTAL 24,374 100
*Includes: Libraries Australia, Library of Congress, selected catalogues
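The percentages in Table 5 follow directly from the title counts. As a quick check, the short sketch below recomputes them; the function name `shares` is only an illustrative choice.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Title counts taken from Table 5.
TABLE_5 = {
    "OCLC": 19_294,
    "Other databases": 2_245,
    "No copy - keyed in-house": 2_835,
}

for source, pct in shares(TABLE_5).items():
    print(f"{source}: {pct}%")
```

The same function can be applied to the counts in the other tables to confirm their percentage columns.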

From the start, one of the more keenly anticipated outcomes from this project was to discover what proportion of titles in the China Book Collection was not duplicated in other major East Asian libraries outside the PRC. Although it is not possible to conclude that the absence of catalogue records indicates a lack of holdings, this figure will nevertheless provide a general impression of the number of Chinese publications held at ANU Library that are not widely held elsewhere.

Using this somewhat imperfect premise, it is possible to say that there are approximately 2,000-2,800 titles in the China Book Collection that are not widely available in libraries outside the PRC. It is this group of titles that warrants closer examination, and unfortunately, it is this group for which we have the least information, a situation that can only be remedied by another cataloguing project.

Of the 2,835 records keyed in-house, 35% (996 titles) were published in the period 1977-1989, and 15.7% (445 titles) were published in the Cultural Revolution period of 1966-76. Seventeen titles were pre-1900 publications.


The China Book Collection was the last major block of materials in the Library's East Asia collections that was not represented in the online catalogue. The completion of the China Book Collection cataloguing project means that approximately 98% of the ANU Library's East Asia collection is now discoverable through the catalogue. This represents a major achievement, and one brought about through a seemingly unrelated event, the introduction of off-site storage.

Although the decision to relocate CJK material off-site was not popular with many East Asia scholars in our user community who were concerned at the loss of browsing, there is already evidence to show that the China Book Collection is still being used. Books from this collection have been requested and borrowed from the Library Print Repository through the book retrieval service. It will be a while however before more meaningful usage statistics are available.

The results of the bibliometric analyses by and large confirmed our previous "guesstimates" regarding size, date and place of publication, and general subject coverage of this collection. However, the cataloguing project provided quantitative data and additional details of the collection, especially in terms of subject matter, publication dates and the duplication (or lack) of holdings with other libraries. The one major drawback is the lack of information on the very books that are not duplicated elsewhere, and which may be the very works of most interest to some researchers on contemporary China.

Nevertheless, the value of adding over 24,000 full catalogue records to the online system, 72% of which offer subject access, is considerable. There is now bibliographic access to a large collection of books on China not previously available. The trade-off between convenient on-site browsability for a relatively small number of local users versus online searchability for a large group of remote users is one that should pay dividends in an age where seeking information through online sources is fast becoming the established and preferred norm for researchers.
