Internet Archive


The Internet Archive is a 501(c)(3) non-profit founded to build an Internet library that offers researchers, historians, and scholars permanent access to historical collections that exist in digital format. Founded in 1996 and located in the Presidio of San Francisco, the Archive has been receiving data donations from Alexa Internet and others. In late 1999, the organization began expanding to include more well-rounded collections. The Internet Archive's collections now include texts, audio, moving images, and software as well as archived web pages.

IA Resources of Particular Interest to the DID Challenge:

1) The Prelinger Film Archives - Public Collection of 2,139 Films

This free public archive is a subset of the approximately 60,000-item collection of ephemeral films assembled by Rick Prelinger, a filmmaker and historian.

Ephemeral films are defined as advertising, educational, industrial, and amateur films; examples include the famous “Duck and Cover” nuclear safety education film and other extraordinary items of social, artistic, and historical significance. As a whole, the Prelinger collection currently contains over 10% of the total production of ephemeral films between 1927 and 1987, and it may be the most complete and varied collection in existence of films from these poorly preserved genres.

A tag cloud of the public collection to which the Archive will provide researcher access is viewable at the following URL:

The Archive believes that this collection may be of interest to sociologists, filmmakers, scholars of marketing and advertising, and, of course, historians, for the study of themes in public opinion, politics, and popular culture. Moreover, it offers a substantial set of digital video imagery that can serve as a testbed for tool development in video search, face recognition, and other image-based technologies.

Rights: This collection is being made available for study and reuse under a Creative Commons license. Details of the rights granted for this collection are available at the following URL:

2) Canadian Libraries Collection - 161,732 Digitized Books

This collection contains digitized books, the vast majority of which were contributed by the University of Toronto Libraries. Major themes include Canadian history and regional history, medical history, religion, military history, Greek classics (Tufts University/Perseus Digital Library), government documents, and certain special collections (e.g., the Cardinal Newman collection). The collection can be viewed and searched at the following URL:

The Archive believes this collection may be of interest to scholars of history and sociology, among other disciplines, and to developers of tools for OCR, multilingual translation, name and place recognition, and natural language processing.

Both collections are fully searchable using the text boxes on their home pages.


For technical questions (e.g., bulk downloads), please contact Hank Bromley or Alexis Rossi. For questions about usage rights, please contact