Date published: 28/2/20

The Global Digitised Dataset Network (GDD Network) was an AHRC-funded research collaboration from 2019-2020 investigating the feasibility of creating a global catalogue of digitised texts. This would enable people to search and find texts, and access them for reading, digital scholarship, collections analysis, and more.

This is an aggregated dataset of digitised records created by the Global Digitisation Dataset project in 2019, an AHRC-funded project under the UK/US Digital Scholarship in Cultural Institutions networking fund. The records come from the project’s members: HathiTrust, the National Library of Scotland, the British Library and the National Library of Wales. Each record in the dataset contains limited bibliographic metadata, along with a link to the item. The dataset was created as a proof of concept, merging records of digitised texts from different organisations.

Related links

Visit the project website: GDD Network website

Screenshot of prototype search for a global dataset of digitised texts

Rights information

This data collection is licensed under a CC-BY 4.0 license.


Download the data

Trial the data

Download a sample of the dataset for initial evaluation.

File contents: 2 plain text readme files and 1 TSV file.

File size: 1.32 MB compressed (3.17 MB uncompressed)

All the data

File contents: 2 plain text readme files and 1 TSV file.

File size: 1.04 GB compressed (5.58 GB uncompressed)
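
Because the full TSV file is several gigabytes uncompressed, it may be easier to stream it than to load it into memory in one go. The following is a minimal sketch in Python, not an official loader. It assumes the file is UTF-8 encoded and that its first line is a header row; neither is stated on this page, so check the readme files first. The function names are hypothetical.

```python
import csv
from itertools import islice


def preview_tsv(path, n_rows=5):
    """Return (header, first n_rows data rows) without reading the whole file."""
    # Assumption: UTF-8 encoding and a header on the first line.
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quotation marks in titles as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


def count_rows(path):
    """Count data rows (excluding the assumed header) by streaming the file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)
```

Running `preview_tsv` on the sample download is a quick way to see which columns are present before working with the full file.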


Cite the data

DOI: https://doi.org/10.34812/fda4-5336

Dataset creator: HathiTrust, National Library of Scotland, British Library, National Library of Wales

Dataset publisher: HathiTrust, National Library of Scotland, British Library, National Library of Wales

Publication year: 2020

Suggested citation: National Library of Scotland. Aggregated dataset of digitised texts from the GDD project, 2020. https://doi.org/10.34812/fda4-5336