The Arts & Humanities Research Council (AHRC): Each year the AHRC provides approximately £112 million from the Government to support research and postgraduate study in the arts and humanities, from languages and law, archaeology and English literature to design and creative and performing arts. In any one year, the AHRC makes approximately 700 research awards and around 1,300 postgraduate awards. Awards are made after a rigorous peer review process, to ensure that only applications of the highest quality are funded. The quality and range of research supported by this investment of public funds not only provides social and cultural benefits but also contributes to the economic success of the UK. 
The Canada Foundation for Innovation (CFI): Created by the Government of Canada in 1997, the Canada Foundation for Innovation (CFI) strives to build our nation’s capacity to undertake world-class research and technology development to benefit Canadians. Thanks to CFI investment in state-of-the-art facilities and equipment, universities, colleges, research hospitals and non-profit research institutions are attracting and retaining the world’s top talent, training the next generation of researchers, supporting private-sector innovation and creating high-quality jobs that strengthen Canada’s position in today’s knowledge economy.
The Economic and Social Research Council (ESRC) is the UK's largest organisation for funding research on economic and social issues. It supports independent, high quality research which has an impact on business, the public sector and the third sector. The ESRC’s total budget for 2011/12 is £203 million. At any one time the ESRC supports over 4,000 researchers and postgraduate students in academic institutions and independent research institutes.
The Institute of Museum and Library Services (IMLS) is the primary source of federal support for the nation’s 123,000 libraries and 17,500 museums. Our mission is to inspire libraries and museums to advance innovation, lifelong learning, and cultural and civic engagement. Our grant making, policy development, and research help libraries and museums deliver valuable services that make it possible for communities and individuals to thrive. To learn more, follow us on Facebook and Twitter.
Jisc offers digital services for UK education and research. The charity does this to achieve its vision for the UK to be the most digitally advanced education and research nation in the world. Working together across the higher education, further education and skills sectors, Jisc provides trusted advice and support, reduces sector costs across shared network, digital content, IT services and procurement negotiations, and ensures the sector stays ahead of the game with research and development for the future.
The National Endowment for the Humanities (NEH). Created in 1965 as an independent federal agency, the NEH supports learning in history, literature, philosophy, and other areas of the humanities. NEH grants enrich classroom learning, create and preserve knowledge, and bring ideas to life through public television, radio, new technologies, museum exhibitions, and programs in libraries and other community places.
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2009, its budget is $9.5 billion, which includes $3.0 billion provided through the American Recovery and Reinvestment Act. NSF funds reach all 50 states through grants to over 1,900 universities and institutions. Each year, NSF receives about 44,400 competitive requests for funding, and makes over 11,500 new funding awards. NSF also awards over $400 million in professional and service contracts yearly.
The Natural Sciences and Engineering Research Council (NSERC). NSERC aims to make Canada a country of discoverers and innovators for the benefit of all Canadians. The agency supports university students in their advanced studies, promotes and supports discovery research, and fosters innovation by encouraging Canadian companies to participate and invest in postsecondary research projects. NSERC researchers are on the vanguard of science, building on Canada’s long tradition of scientific excellence.
The Netherlands Organisation for Scientific Research (NWO) funds thousands of top researchers at universities and institutes and steers the course of Dutch science by means of subsidies and research programmes.
The Social Sciences and Humanities Research Council (SSHRC) is the Canadian federal agency that promotes and supports postsecondary-based research and training in the humanities and social sciences. Through its programs, SSHRC works to develop talented leaders for all sectors of society, helps generate insights about people, ideas and behaviour, and builds connections within and beyond academia that will build a better future for Canada and the world. Its work helps build the understanding and knowledge that better equip Canadians to make informed decisions about their future and long-term prosperity.

Announcing the Winners of Round Three (2013)
January 15, 2014—Today, ten international research funders from four countries jointly announced the winners of the third Digging into Data Challenge, a competition to develop new insights, tools and skills in innovative humanities and social science research using large-scale data analysis.
Fourteen teams representing Canada, the Netherlands, the United Kingdom, and the United States will receive grants to investigate how computational techniques can be applied to “big data,” changing the nature of humanities and social sciences research. Each team represents collaborations among scholars, scientists, and information professionals from leading universities and libraries in Europe and North America.
The first round of the Digging into Data Challenge was held in 2009 and the second in 2011. Previous Digging into Data research projects have received international attention. For the current round, there are ten sponsoring funders and a total of fourteen funded projects.
The list of awardees is below.
List of 2013 Award Recipients

Automating Data Extraction from Chinese Texts

(Principal Investigators: Peter K. Bol, Harvard University, US; Hilde De Weerdt, King's College London, UK)

Abstract: The Automating Data Extraction from Chinese Texts Project aims to provide humanists and social scientists with a means of transforming 2200 years of Chinese texts into structured data. The project will fully develop an open-source platform that allows its users to apply sophisticated text-mining techniques, hitherto the domain of information scientists, to a wide variety of historical and literary texts. Users interested in biographical data, for example, will be able to tag and extract personal names, dates, place names, official titles and postings, kinship ties, and other social relationships. The platform will be tested against 2000 local histories spanning an 800-year period and 19,000 letters and 500 notebooks dating from the seventh through the thirteenth century. Data extracted from the sample repositories will be used to enrich text-mining applications and will also be made available in English and Chinese for research through open-access online databases and data archives.


Cleaning, Organizing, and Uniting Linguistic Databases (the COULD project)

(Principal Investigators: Maria Polinsky, Harvard University, US; Alan Bale, Concordia University, CAN)

Abstract: The COULD project has five goals. (1) It seeks to transfer existing linguistic data from a variety of different formats into a universal format that will allow linguists to combine and share information, not only with other linguists but also with the public at large. (2) The project will build applications that automatically correct errors, draw attention to inconsistencies, and fill gaps in the data. (3) These automated mechanisms will provide new tools to detect patterns that are not obvious when looking at smaller databases. (4) The project seeks to make the vast amounts of linguistic data, currently only being used by researchers, available to second language learners by developing search algorithms that facilitate lesson creation. (5) The project will make data collection easier and thus make language preservation and documentation less dependent on experts. Communities trying to revive endangered languages will benefit directly from this project.


Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics

(Principal Investigators: Robert Morrissey, University of Chicago, US; Min Chen, University of Oxford, UK)

Abstract: Recent scholarship has demonstrated that the various practices associated with Early Modern “commonplacing” (the extraction and organization of quotations and other passages for later recall and reuse) were highly effective strategies for dealing with the perceived "information overload" of the period. But the 18th century was also a crucial moment in the modern construction of a new sense of self-identity. Our goal is to examine this paradigm shift in 18th-century culture from the perspective of commonplaces and their textual and historical deployment in the contexts of collecting, reading, writing, classifying, and learning. These practices allowed individuals to master a collective literary culture through the art of commonplacing, a nexus of intertextual activities that we aim to explore through the concerted application of sequence alignment algorithms for shared passage detection and large-scale visual analytics on the largest collection of 18th-century works ever assembled.


Digging Archaeology Data: Image Search and Markup (DADAISM)

(Principal Investigators: Maarten de Rijke, University of Amsterdam, NL; Helen Petrie, University of York, UK; Mark Eramian, University of Saskatchewan, CAN)

Abstract: Teams from the UK, Canada and the Netherlands will investigate how we can use interactive systems design in conjunction with image processing and text mining techniques to help archaeologists find, organise and analyse the thousands of image and document resources available to them for answering archaeology research questions.


Digging into Linked Parliamentary Data

(Principal Investigators: Maarten Marx, University of Amsterdam, NL; Jane Winters, University of London, UK; Christopher Cochrane, University of Toronto Scarborough, CAN)

Abstract: This project brings together political scientists, historians and computational linguists, from Canada, The Netherlands and the UK, to enable large-scale analysis of the proceedings of three parliaments, from c.1800 to the present day. This data reflects any event of significance over the past 200 years, and will be enhanced during the course of the project to shed light on developments across different nations, cultures and systems of political representation. The project will deliver a common, and extensible, format for encoding parliamentary proceedings; a joint, linked dataset covering all three jurisdictions; a range of tools to facilitate the longitudinal study of parliamentary data; and a series of case studies to test and inform the chosen methodology.


Digging into signs: Developing standard annotation practices for cross-linguistic quantitative analysis of sign language data

(Principal Investigators: Onno Crasborn, Radboud University Nijmegen, NL; Kearsy Cormier, University College London, UK)

Abstract: This project will develop cross-linguistic annotation protocols for exploring the content of sign language video datasets. The key progress lies in a) standardised lemmatisation protocols for lexicalised signs, and b) protocols for annotating partly-lexical and non-lexical (including gestural) elements. The project will demonstrate its approach using corpora of British Sign Language (BSL) and Sign Language of the Netherlands (NGT). Linguistic corpora – i.e. large, representative samples of naturalistic language use – are one of the richest types of resource for studying language structure and use. The new annotation protocols and resulting corpora will enable users to really dig into the content of the existing video data and will enable cross-linguistic research with sign language corpora. The project thus goes far beyond the current state of the art with online sign language corpus data, which restricts searches to a few key background details about participants via metadata.


Field Mapping: An Archival Protocol for Social Science Research Findings

(Principal Investigators: Frank Bosco, Virginia Commonwealth University, US; Piers Steel, University of Calgary, CAN)

Abstract: In this project, psychology and management scholars from the United States and Canada will collaborate with an expert in online research and classification methods to devise a web application that will (i) enable the encoding of millions of individual findings in a multidisciplinary social science research domain, (ii) facilitate complex analyses, and (iii) provide open access to members of the scholarly community and the general public. Our project provides protocols for the extraction and classification of research findings into a semantic taxonomy. The foundation of this taxonomy will change how researchers search for and analyze findings from big data. We will develop efficient algorithms to access and analyze research findings. This will lead us to our eventual goal -- a comprehensive repository of findings from social science research that is updated continuously and responds to dynamic queries.


Global Currents: Cultures of Literary Networks, 1050-1900

(Principal Investigators: Elaine Treharne, Stanford University, US; Lambert Schomaker, Groningen University, NL; Andrew Piper, McGill University, CAN)

Abstract: This project undertakes the cross-cultural study of literary networks in a global context, ranging from post-classical Islamic philosophy to the European Enlightenment. Integrating new image-processing techniques with social network analysis, we examine how different cultural epochs are characterized by unique networks of intellectual exchange. Research on "world literature" has become a central area of inquiry today within the humanities, and yet so far data-driven approaches have largely been absent from the field. Our combined approach of visual language processing and network modeling allows us to study the non-western and pre-print textual heritages so far resistant to large-scale data analysis as well as develop a new model of global comparative literature that preserves a sense of the world’s cultural differences.


Legal Structures

(Principal Investigators: Adam Badawi, Washington University School of Law, US; Rens Bod, University of Amsterdam, NL)

Abstract: This project takes a radically novel approach to the problem of measuring and visualizing differences among legal systems: it focuses on machine coding of internal references in codes and laws. Internal referencing is an inherent characteristic of codes. Already the Code of Hammurabi, almost 3800 years ago, was structured as a numbered list of laws with at least one cross-reference. The intuition behind this approach is that fundamental differences among legal systems manifest themselves in the structure of the texts and can be detected, parameterized, and visualized using computerized algorithms. For instance, the French Civil Code—based on a deductive ideal of legal thought—has fewer internal references than the hundred-year younger German Civil Code—influenced by the idea that law finds its legitimacy in the history of a country rather than on natural principles and hence is less organically structured. We will use this procedure to analyze the world’s codes.


Mining Biodiversity

(Principal Investigators: William Ulate Rodriguez, Missouri Botanical Garden, US; Sophia Ananiadou, University of Manchester, UK; Anatoliy Gruzd, Dalhousie University, CAN)

Abstract: The Mining Biodiversity project aims to transform the Biodiversity Heritage Library (BHL) into a next-generation social digital library resource to facilitate the study and discussion (via social media integration) of legacy science documents on biodiversity by a worldwide community and to raise awareness of the changes in biodiversity over time in the general public. The project will integrate novel text mining methods, visualisation, crowdsourcing and social media into the BHL to provide a semantic search system.


MIning Relationships Among variables in large datasets from CompLEx systems (MIRACLE)

(Principal Investigators: C. Michael Barton, Arizona State University, US; Tatiana Filatova, University of Twente, NL; Terence P. Dawson, University of Dundee, UK; Dawn Cassandra Parker, University of Waterloo, CAN)

Abstract: Social scientists have used agent-based models (ABMs) to explore the interaction and feedbacks among social agents and their environments. The bottom-up structure of ABMs enables simulation and investigation of complex systems and their emergent behavior with a high level of detail; however, the stochastic nature and potential combinations of parameters of such models create large non-linear multidimensional “big data,” which are difficult to analyze using traditional statistical methods. Our proposed project seeks to address this challenge by developing algorithms and web-based analysis and visualization tools that provide automated means of discovering complex relationships among variables. The tools will enable modelers to easily manage, analyze, visualize, and compare their output data, and will provide stakeholders, policy makers and the general public with intuitive web interfaces to explore, interact with and provide feedback on otherwise difficult-to-understand models.


Project Arclight: Analytics for the Study of 20th Century Media

(Principal Investigators: Eric Hoyt, University of Wisconsin-Madison, US; Charles Acland, Concordia University, CAN)

Abstract: Commercial media companies have embraced computational analytics to study discussions of media content across social media data streams. Data mining companies identify actors and TV shows that are “trending” in global popularity, along with more granular analyses of regional tastes, social networks, and discourse. We propose to apply a similar methodology toward the study of film and media history. Project Arclight will create a web-based tool that enables the study of 20th century American media through comparisons across time and space. The Arclight tool will be built using several popular open source technologies, including Ruby on Rails, JavaScript, and Solr. The tool will analyze roughly two million pages of public domain publications derived from two repositories: the Media History Digital Library (which uses the Internet Archive’s scanning, hosting, and preservation services) and the Library of Congress Chronicling America collection.


Resurrecting Early Christian Lives: Digging in Papyri in a Digital Age

(Principal Investigators: Philip Sellew, University of Minnesota, US; Dirk Obbink, Oxford University, UK)

Abstract: Our team proposes to study papyrus documents from Egypt found in trash heaps: scraps giving us rich evidence of human activity in the ancient Mediterranean. They allow us to retrieve lost poetry, new gospels, and everyday writings: letters, contracts, census returns, homilies, recipes. Half a million fragments await study in the Oxyrhynchus collection alone. Building on data from our crowd-sourcing transcriptions of this material in Greek, we will study a range of papyri relevant to early Christianity. We will develop a transcription tool for Coptic, the late version of Egyptian used by Christians. We will complete a web-based interface to allow scholars to edit the results of the transcriptions; these tools allow us to look in detail at complex networks of identity and authority and examine how Christians saw their new religion as part of their other identities (Greek, Egyptian, Roman, merchant, monk). Our tools and our results will be made available to other developers and scholars.


Trees and Tweets: Mining Billions to Understand Human Migration and Regional Linguistic Variation

(Principal Investigators: Diansheng Guo, University of South Carolina, US; Jack Grieve, Aston University, UK)

Abstract: The proposed research aims to analyze contemporary Twitter data for the UK and USA for regional variation in linguistic forms and link the patterns of variation with migration in both countries. Our goal is to understand how linguistic variation is shaped by migration in both the past and present. Two sorts of “big data” will be collected, cleaned, and analyzed for spatial patterns: tweets will be used to document regional linguistic variation and family trees to describe the large-scale migration patterns that might explain this variation. By analyzing successive tweets by the same individuals, we will also have a record of their mobility, which we will relate to linguistic variation in the tweets.



