Wednesday, July 22, 2009

25,000 new words to search


The Library blog generally focuses on items in the Wellcome Library’s holdings, mentioning their catalogue descriptions in passing: it rarely features the unglamorous process of how those catalogue descriptions are created. Two new additions to the catalogue of archives and manuscripts, however, highlight this process and between them present over 25,000 more words to act as targets for readers searching the database.

When the Wellcome Library switched to describing its archives and manuscripts in a database, there were many printed and typed hard-copy catalogues to convert: descriptions of 424 collections of twentieth-century papers and of over 8,000 manuscripts. A dedicated project converted the manuscript descriptions and the majority (c.80%) of the modern papers, and since then Library staff have been working quietly behind the scenes on the remaining catalogues. We have recently passed 400 converted catalogues and currently stand near the 95% mark.

A recently converted catalogue, the papers of the British Medical Association (SA/BMA), demonstrates the process and the gain to the reader. The basic text of the old catalogue is transferred to a spreadsheet, with one row per database entry and each column corresponding to a database field. This transfer can be done by retyping or by cut and paste, depending on the nature of the old catalogue. Once in the spreadsheet, the text is manipulated: new fields such as reproduction conditions and language are added, and HTML tags are inserted into the old text so that it will display correctly when viewed on the internet (paragraph breaks and italicised text are the commonest additions). The references by which readers have previously known the material will not always allow the database to build the records into the hierarchical “tree” used in the archive catalogue, so they have to be examined and a new, slightly different version created; this version sits in the database, invisible to the reader, and generates the “tree”. Finally, all this data is loaded into the database.
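For readers curious about the mechanics, here is a minimal sketch in Python of the spreadsheet stage described above. It is not the Library's actual tooling: the column names (RefNo, Title, Description), the sample rows, the default values for the new fields, the use of a blank line for a paragraph break and _underscores_ for italics, and the rule for deriving the hidden “tree” reference are all hypothetical assumptions made for illustration.

```python
import csv
import html
import io
import re

# Hypothetical sample rows, NOT the real SA/BMA catalogue: one row per
# database entry, one column per database field. A blank line inside a
# description marks a paragraph break; _underscores_ mark italic text.
SPREADSHEET = """RefNo,Title,Description
SA/BMA,Example collection,Top-level description.
SA/BMA/C,Example series,Series description.
SA/BMA/C.1,Example file,"Example description.

See also _Example Journal_."
"""


def to_html(text):
    """Escape the old text, wrap each paragraph in <p> and italics in <i>."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    tagged = []
    for para in paragraphs:
        escaped = html.escape(para)
        escaped = re.sub(r"_(.+?)_", r"<i>\1</i>", escaped)
        tagged.append(f"<p>{escaped}</p>")
    return "".join(tagged)


def tree_ref(old_ref):
    """Hypothetical rule: treat '.' as a level separator, like '/'."""
    return old_ref.replace(".", "/")


def parent_of(ref):
    """Everything before the last '/', or None at the top level."""
    return ref.rsplit("/", 1)[0] if "/" in ref else None


def convert(sheet_text, language="English", reproduction="Standard conditions"):
    """Turn spreadsheet rows into records ready to load into a database."""
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    known = {tree_ref(row["RefNo"]) for row in rows}
    records = []
    for row in rows:
        ref = tree_ref(row["RefNo"])
        parent = parent_of(ref)
        records.append({
            "RefNo": row["RefNo"],      # reference readers already know
            "TreeRef": ref,             # hidden reference that builds the tree
            "Parent": parent if parent in known else None,
            "Title": row["Title"],
            "Description": to_html(row["Description"]),
            "Language": language,                    # new field (hypothetical default)
            "ReproductionConditions": reproduction,  # new field (hypothetical default)
        })
    return records


for record in convert(SPREADSHEET):
    print(record["TreeRef"], "<-", record["Parent"])
```

Keeping the familiar reference alongside a separate hidden one means readers can go on citing material as before, while the database relies on its own version to place each record in the tree.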

Some of our old catalogues contained not merely lists of the items held but also detailed indexing – most notably of correspondents, where collections contained a large number of letters. These indexes, too, can be converted to database form and made available to researchers. A recent example is the correspondence of Carlos Paton Blacker (1895-1975) (PP/CPB), psychiatrist and secretary of the Eugenics Society. In the course of his long and varied life, Blacker generated enough correspondence for the index to run to 39 pages. By moving the index to a spreadsheet and sorting its entries into groups there, we have broken this bulk up into nine more manageable sets of data, one for each of the major sections of the collection.
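The sifting of an index into sections can be sketched in a few lines of Python. This assumes, hypothetically, that each index entry pairs a correspondent's name with a reference whose third component names the section of the collection (for example PP/CPB/B.3); the names and references below are placeholders, not entries from the real Blacker index.

```python
from collections import defaultdict

# Placeholder index entries, NOT taken from the real Blacker index:
# (correspondent, hypothetical reference within PP/CPB).
INDEX = [
    ("Correspondent C", "PP/CPB/B.7"),
    ("Correspondent A", "PP/CPB/B.3"),
    ("Correspondent B", "PP/CPB/C.12"),
]


def section_of(ref):
    """Hypothetical rule: the section is the third component, before any '.'.

    For example, "PP/CPB/B.3" -> "B".
    """
    return ref.split("/")[2].split(".")[0]


def split_index(entries):
    """Sift one long index into a separate, alphabetised list per section."""
    groups = defaultdict(list)
    for name, ref in entries:
        groups[section_of(ref)].append((name, ref))
    return {section: sorted(items) for section, items in sorted(groups.items())}


for section, items in split_index(INDEX).items():
    print(section, [name for name, _ in items])
```

Each resulting group can then be loaded against its own section of the catalogue rather than as one long list.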

For researchers, the net result is 25,000 more words in the archives database: words of all varieties, expected and unexpected, whose common factor is that all can act as targets for searches. As an example, the previous basic entry for the BMA papers on the database said that they included subject files on medical issues arising in particular places or with reference to particular diseases, but the database now allows one to locate, should one wish, the files relating to Llwynipia in the Glamorgan coalfield and the health problems there; Major Stevens’s controversial “Umckaloabo”, an alleged cancer cure; or the intriguingly named Overbeck Rejuvenator. Blacker’s correspondents can similarly now be located, from (taking the letter B as a sample) the poet Edmund Blunden to Boxing magazine and Lieutenant A.D.A. “Fido” Balfour.

These, of course, are just samples from the long-running process behind the scenes that populates the archive database. Other recently converted collections include the papers of the sex education pioneer Marie Stopes (PP/MCS), the medical historian and tuberculosis specialist Walter Pagel (PP/PAG), and the Research Defence Society (SA/RDS). Each is now fully searchable online in a way that is likely to bring to light new ideas for research, new and strange product names, and individuals you never knew were in correspondence with each other.

The illustration shows work in progress on the British Medical Association catalogue.