General advice on searching databases

Selecting an appropriate database

When searching for bibliographic information, selection of an appropriate database is probably the most important requisite for success. Each database has its specific quantitative and content selection criteria. No single database in the world covers all published literature (and certainly not all 'grey' literature, such as internal reports).
When considering a database the following criteria might help you evaluate its appropriateness for your specific research topic:

  • To what extent does the database cover your topic? E.g. it would not make much sense to search for exhaustive information on tuberculosis in the Ebola and Marburg Virus Disease Literature database.
  • How extensive is the database? E.g. the number of records, the number of source items, the number of years covered. When searching for the immunological aspects of malaria, Medline or CAB Health are far more appropriate than the highly selective Tropical Endemic Diseases Control database.
  • How up-to-date is the database? E.g. last update, update frequency, usual backlog. Recently published articles may feature in CCOD, but will generally not be included in Medline until a few months later.
  • What types of publications are included? E.g. Medline features journal articles only; Tropical Endemic Diseases Control, Health Care in Developing Countries and other databases produced by the ITG Library also contain references to books, book chapters, abstracts and unpublished documents or reports.
  • How much overlap is there between the database and your library? This criterion could be useful when, instead of an exhaustive list, you are merely looking for a limited number of relevant publications which are immediately available.

Most answers to these questions can be found in the database descriptions:

  • Catalogs produced by the ITG library
  • Databases produced by the ITG library
  • International databases subscribed to by the ITG library

Searching for journal titles

Retrieving journal titles may pose some problems:

  • Some databases provide only journal abbreviations, while others offer full titles or both. If only the abbreviation is available, it may be difficult to find the original article. If only the full title is provided, it may be difficult to find the correct abbreviation when using the article in a reference list. A list with abbreviations and full titles of Medline journals is available in the library. The printed periodicals catalogue also provides 'standard' journal abbreviations. Abbreviations for journals missing in both lists can be constructed using the analogy principle.
  • Some systems allow for truncation, while others will only accept the literally correct journal name format. In the latter case any misspelling may ruin your retrieval session.

Therefore, alphabetical indexes are especially useful for searching journal titles. Many systems have field-specific indexes, so all available journal names can be viewed. If these are not available, a general (free-text) index probably will be.

Searching for authors

Retrieving author names may also pose some problems:

  • It may not always be clear to searchers, journal editors or database producers what part of a multi-part name should be considered as the major constituent. Therefore different variants of the same author name may occur within the database. Some databases (e.g. CAB Health) systematically provide permuted versions of multi-part author names.
  • Initials pose a comparable problem. No matter how certain you may be of an author's name, many authors do not stick to one format. Chances are that "Robert A. Smith" has published the paper you are looking for as "Smith-R" or "Smith-RAT" (unfortunately, no index will alert you to papers signed "Bob Smith"). Therefore, alphabetical indexes are especially useful for identifying author names. Many systems have field-specific indexes, so all available author names can be viewed. If these are not available, a general (free-text) index probably will be.
  • Some systems allow for truncation (e.g. only last name, without initials), while others will only accept the literal format of the author name as it is entered in the database. Of course, omitting initials will often yield far too many results, so again, alphabetical indexes should be used.
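
As an illustration of why browsing an alphabetical index helps, the following minimal Python sketch looks up a truncated author name in a sorted index. The index entries and the function name browse_index are invented for this example; real retrieval systems have their own index formats and truncation symbols.

```python
from bisect import bisect_left

# Hypothetical alphabetical author index in "Surname-Initials" format.
AUTHOR_INDEX = sorted([
    "Smith-R", "Smith-RA", "Smith-RAT", "Smith-S", "Smithers-J", "Smit-R",
])

def browse_index(index, prefix):
    """Return all entries of a sorted index that start with `prefix`
    (i.e. right-hand truncation)."""
    start = bisect_left(index, prefix)  # first entry >= prefix
    hits = []
    for entry in index[start:]:
        if not entry.startswith(prefix):
            break  # entries sharing a prefix are contiguous in a sorted list
        hits.append(entry)
    return hits

# With one initial, all variants of the same author show up together.
print(browse_index(AUTHOR_INDEX, "Smith-R"))
# Without initials, unrelated authors are included as well.
print(browse_index(AUTHOR_INDEX, "Smith"))
```

In this toy index the first lookup returns the three "Smith-R..." variants, while the second also returns "Smith-S" and "Smithers-J", which is the over-retrieval described above.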

Searching for subjects

Subject information can be found mainly in the "title" and "keywords" fields. Therefore it may be worthwhile to use both "keyword searching" and "free-text" retrieval. Generally the first one will give the better results, but for exhaustive results the combination of both is required.

Using keywords:

Theoretically, this should be the most precise method, as keywords are part of a controlled vocabulary. They are assigned in a consistent fashion, after careful consideration of the document described. Ideally there is a hierarchical thesaurus with powerful "explode" capabilities. There are, however, several disadvantages:

  • Each database may have its own specific keyword system.
  • Indexers may not always be consistent and occasionally make mistakes.
  • Some systems automatically create or map entries to extra keywords, e.g. based on words in the abstracts or the bibliographies. While these may generally be helpful, in many cases they will also yield superfluous or faulty hits.
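
The "explode" capability of a hierarchical thesaurus can be pictured with a small Python sketch. The thesaurus fragment below is hypothetical and only loosely resembles a real keyword system; the function name explode is likewise an assumption made for this illustration.

```python
# Hypothetical fragment of a hierarchical thesaurus: term -> narrower terms.
NARROWER = {
    "parasitic diseases": ["protozoan infections", "helminthiasis"],
    "protozoan infections": ["malaria", "trypanosomiasis"],
    "malaria": ["malaria, cerebral", "malaria, falciparum"],
}

def explode(term, narrower=NARROWER):
    """Return the term plus all of its narrower terms, at every level."""
    found = [term]
    for child in narrower.get(term, []):
        found.extend(explode(child, narrower))
    return found

# Searching the exploded list retrieves records indexed under any narrower term.
print(explode("protozoan infections"))
```

A search on the exploded term list would be run as an OR combination of all returned keywords, so that records indexed only under a narrower term are not missed.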

Searching free-text

Free-text retrieval is often understood as searching for just any word that is present anywhere in the database (e.g. in the "author address" field). Other systems limit this option to a few text-based fields like "title" and "abstract" (and sometimes "keywords"). It is obvious that this difference will influence retrieval results, as in the first case you may be overwhelmed by excessive hits while in the second you may not be able to retrieve information from certain fields (e.g. original language title).

There are two major disadvantages to 'free-text' searching:

Low recall or sensitivity
Using natural language phrases may yield useful results, but these may be only a fraction of what is available in the database, because indexer and searcher have different points of view:

  • Synonyms may be used.
  • Higher or lower level terms may be used.
  • Singular forms may be used instead of plural ones, and vice versa. E.g. in Medline, using the term "children" will yield thousands of records, while "child" is a more appropriate and still more productive search term.
  • Terms in a different language may be used. E.g. there are dozens of records featuring the French or Dutch term "tuberculose" (but these hits are limited to "original title" and, rather surprisingly, "author address"), while "tuberculosis" yields almost fifty times as many hits.
  • Multiple-word concepts may be found only accidentally, in the literal format in which they were entered, while using the correct keyword entry is far more efficient and relevant. E.g. the free-text search "cancer treatment" may find only a fraction of what the thesaurus combination "explode neoplasms / drug therapy, therapy" yields.
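
The recall problems listed above can be reproduced on a toy scale. The sketch below is only an illustration: the four records and the hand-made variant table are invented, and a real database would rely on its own thesaurus rather than on such a table.

```python
import re

# Hypothetical miniature database: record id -> free text.
RECORDS = {
    1: "Tuberculosis in children under five",
    2: "La tuberculose chez l'enfant",
    3: "Child mortality and TB control",
    4: "Malaria vector control",
}

# Hand-made variant table (an assumption for this sketch, not a real thesaurus).
VARIANTS = {
    "tuberculosis": {"tuberculosis", "tuberculose", "tb"},
    "child": {"child", "children", "enfant"},
}

def words(text):
    """Split a text into a set of lower-case words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(term, expand=False):
    """Return ids of records containing the term, or any listed variant if expand=True."""
    wanted = VARIANTS.get(term, {term}) if expand else {term}
    return sorted(rid for rid, text in RECORDS.items() if words(text) & wanted)

print(search("tuberculosis"))               # literal free-text search
print(search("tuberculosis", expand=True))  # synonyms and other languages ORed in
```

Here the literal search finds a single record, whereas the expanded search finds three; the difference is the recall lost by relying on one natural-language term.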

Low relevance or specificity
Conversely, free-text searching will often generate many extra hits when compared to keyword searching. Yet a large proportion of these may not be sufficiently relevant, as is clear from some examples yielded by a free-text search on the word "malaria". Hits may show little relevance, e.g. when the word is part of a long (implied) list of diseases, as in "... ranging from malaria and tuberculosis to Ebola fever and AIDS...", or when it is mentioned while another concept is in focus, as in "tuberculosis was the second (after malaria) leading cause ...". Whether actual hit terms are relevant or not also depends on the angle of research. The explicit absence of a disease, as in "new cases of malaria were not observed" or "the patient was treated symptomatically after exclusion of malaria", may constitute useful intelligence. Papers investigating the intricacies of the "malaria parasite" or the "malaria vector" may have little clinical relevance but in a broader sense do indeed belong to the domain of malaria. But "the simian malaria parasite", "two rodent malaria species" or "a potential malaria vector" are generally not what one expects when searching for information on human malaria, and "babesiosis is a malaria-like illness ..." is a genuine miss. And to end with a classic, using "AIDS" when searching for the "acquired immune deficiency syndrome" results in dozens of "hearing aids", "audiovisual aids", "diagnostic aids", "teaching aids", etc., and the verbal form "aids" in the meaning of "helps".
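
The "AIDS" example can be mimicked with a few invented titles. This Python sketch assumes a system that can distinguish upper and lower case, which many retrieval systems cannot; it merely shows how many irrelevant hits a naive word match produces.

```python
import re

# Invented titles for illustration only.
TITLES = [
    "AIDS and tuberculosis in sub-Saharan Africa",
    "Hearing aids for elderly patients",
    "Audiovisual aids in health education",
    "Rapid testing aids the diagnosis of malaria",
    "Paediatric AIDS: clinical features",
]

def naive_hits(titles):
    """Case-insensitive whole-word match, as in a typical free-text search."""
    return [t for t in titles if re.search(r"\baids\b", t, re.IGNORECASE)]

def stricter_hits(titles):
    """Case-sensitive match on the upper-case acronym only (assumes case is preserved)."""
    return [t for t in titles if re.search(r"\bAIDS\b", t)]

all_hits = naive_hits(TITLES)
relevant = stricter_hits(TITLES)
print(len(all_hits), "hits, of which", len(relevant), "concern the syndrome")
```

In this sample only two of the five free-text hits are relevant. A controlled keyword for the syndrome avoids the problem altogether, which is why keyword searching is generally the more specific method.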

Some concluding remarks on subject searching:

Don't focus too strongly on predetermined concepts.
One should not expect the specific words one wants to find to always feature literally. Often a different spelling or a synonym is used instead. Therefore, use the alphabetic indexes or structured thesauri when available. Also, take into account that almost all database software and keyword systems are available in English only.

Don't get too specific too soon.
The results you find may be disappointing because your search formulations are too specific: a subject may well be covered in a publication, even though this is not explicitly clear from the "title" or the "keyword" fields. E.g. most books on infectious diseases do contain a chapter on malaria, yet this word will probably not feature in the bibliographical record of such a book. Similarly, it is not wise to start by combining all the concepts you have in mind. If just one of them is missing from a record, the search may fail. Therefore it may be more rewarding to start searching for the most specific concept, and add extra concepts only when results are sufficiently numerous.
Try to bear these (human) limitations in mind and do not get discouraged too quickly. Use your imagination: even with the ubiquitous World Wide Web the perfect universal retrieval system has not been invented yet.
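
The advice to start with the most specific concept and to add further concepts only while results remain numerous can be sketched as follows. The record sets, the concept names and the threshold of five records are all assumptions chosen for this illustration.

```python
# Hypothetical search results: concept -> set of record ids containing it.
POSTINGS = {
    "chloroquine resistance": {3, 7, 8, 12, 15, 21},
    "children": {2, 3, 7, 9, 12, 30},
    "Kenya": {7, 33},
}

def narrow_stepwise(concepts, postings, enough=5):
    """Combine concepts with AND one at a time, most specific first,
    and stop as soon as the result set is small enough to browse."""
    results = set(postings[concepts[0]])
    used = [concepts[0]]
    for concept in concepts[1:]:
        if len(results) <= enough:
            break  # few enough records: do not narrow any further
        results &= postings[concept]
        used.append(concept)
    return used, results

used, results = narrow_stepwise(
    ["chloroquine resistance", "children", "Kenya"], POSTINGS
)
print(used, sorted(results))
```

In this example the search stops after two concepts with three records left. Combining all three concepts at once would have left a single record and hidden two others that may well be relevant.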