Cleaned-up OCR

117,022 ALTO XML files at page level

120,903 image files

METS metadata files at item level

22,504,344 words

Covers years 1850-1950


This dataset forms the first half of the Medical History of British India collection, which is itself part of the broader India Papers collection held by the Library. A Medical History of British India consists of official publications on disease, public health and medical research between circa 1850 and 1950, ranging from short reports to multi-volume histories. These are historical sources for a period that saw medicine move from a humoral tradition to a biochemical one grounded in laboratory science. They document important breakthroughs in bacteriology and parasitology and the development of vaccines in a colonial context.

The vivid detail in these reports makes them a treasure trove of regional histories, including accounts of how disease affected the social fabric of small villages and towns. The accompanying detailed topographical maps and extensive statistics also provide valuable data. In addition, the reports reveal the development of surveillance systems and the official response to epidemic emergencies within a colonial context, offering vital insights into the role of government and the operation of colonial power. Many of the breakthroughs and advances in the treatment and prevention of communicable diseases are represented in the collection, and a number of the titles are either about or by prominent figures in the field (for example, Sir Ronald Ross, W M Haffkine and S R Christophers). The reports also name individuals attending medical schools, hospitals, the Indian Civil Service and other institutions.

For more about the collection, visit the Library’s A Medical History of British India website.


Rights information

Public domain

This collection is free of known copyright restrictions. For details, visit the Library’s copyright page.


Download the data

Trial the data

Download a sample of the dataset for initial evaluation.

File contents: 1 plain text readme file; 832 ALTO XML files; 1 METS file; 832 image files.

File size: 15.5 MB compressed (26.92 MB uncompressed)

All the data

File contents: 1 plain text readme file; 1 CSV inventory file; 117,022 ALTO XML files; 468 METS files; 120,903 image files.

File size: 10.3 GB compressed (17.5 GB uncompressed)

Caution: large dataset
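The ALTO XML files record the OCR text for each page word by word. The following is a minimal sketch of how a page's text might be pulled out with Python's standard library. It assumes standard ALTO markup, in which each TextLine element contains String elements that carry the recognised word in a CONTENT attribute. The ALTO version (and so the XML namespace) used in this dataset is not stated here, so the code ignores namespaces. The function name alto_to_lines is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.loc.gov/standards/alto/ns-v3#}'."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(source) -> list[str]:
    """Return one string per ALTO TextLine (hypothetical helper).

    `source` may be a file path or a binary file object. Assumes standard
    ALTO structure: TextLine elements whose String children hold each word
    in a CONTENT attribute. SP (space) and HYP (hyphen) elements are skipped,
    so words hyphenated across line ends stay split.
    """
    root = ET.parse(source).getroot()
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Collect the recognised words on this line, in document order.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(w for w in words if w))
    return lines
```

To process a whole folder of pages, you could loop over something like pathlib.Path("alto").glob("*.xml"), where the folder name is hypothetical and depends on how the download unpacks. If you only need the running text, the "Just the text" option below may be simpler.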

Just the text

File contents: 1 plain text readme file; 1 CSV inventory file; 468 plain text files.

File size: 34.7 MB compressed (119.48 MB uncompressed)