Update: The conference is now over! Thank you very much for your interest. Papers from the conference ("Final White Papers") are now available. Click on each project link below to find the paper.
Please join us in Washington, DC on June 9 - 10 for the Round One Digging into Data Challenge Conference. Principal investigators and respondents will discuss eight cutting-edge projects that are studying text, music, images, and spoken word using advanced computing. Registration information is below.
These projects were funded during the 2009 round of the Digging into Data Challenge, an international grant competition sponsored by leading research agencies in Canada, the United States, and the United Kingdom.
In addition to principal investigators and respondents, we will also have three keynotes:
- Tony Hey, Corporate Vice President, External Research, Microsoft Research
- Tom Jenkins, Executive Chairman and Chief Strategy Officer, Open Text
- Erez Lieberman-Aiden & JB Michel, Harvard University, lead authors of the Science paper “Quantitative Analysis of Culture Using Millions of Digitized Books” and developers of the Google Ngrams tool.
Projects that are Presenting
Eight projects will be featured at the conference, covering a variety of media and disciplines:
- Digging into Image Data to Answer Authorship Related Questions
This project will pursue research using advanced computational techniques to explore humanities themes related to the authorship of large collections of cultural heritage materials, namely 15th century manuscripts, 17th and 18th century maps, and 19th and 20th century quilts.
- Digging into the Enlightenment: Mapping the Republic of Letters
This project will focus on a body of 53,000 18th-century letters, and analyze the degree to which the effects of the Enlightenment can be observed in the letters of people of various occupations.
- Mining a Year of Speech
This project focuses on large-scale data analysis of audio -- specifically the spoken word. It will create tools to enable rapid and flexible access to over 9,000 hours of spoken audio files, containing a wide variety of speech, drawn from some of the leading British and American spoken word corpora, allowing for new kinds of linguistic analysis.
- Structural Analysis of Large Amounts of Music Information
SALAMI (Structural Analysis of Large Amounts of Music Information) is an innovative and ambitious computational musicology project. To date, musical analysis has been conducted by individuals and on a small scale. Our computational approach, combined with the huge volume of data now available from such sources as the Internet Archive, will a) deliver a very substantive corpus of musical analyses in a common framework for use by music scholars, students and beyond; and, b) establish a methodology and tooling which will enable others to add to this in the future and to broaden the application of the techniques we establish. A resource of SALAMI’s magnitude empowers musicologists to approach their work in a new and different way, starting with the data, and to ask research questions that have not been possible before.
- Harvesting Speech Datasets for Linguistic Research on the Web
This project will harvest audio and transcribed data from podcasts, news broadcasts, public and educational lectures and other sources to create a massive corpus of speech. Tools will then be developed to analyze the different uses of prosody (rhythm, stress and intonation) within spoken communication.
- Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent
This project will create an intellectual exemplar for the role of data mining in an important historical discipline – the history of crime – and illustrate how the tools of digital humanities can be used to wrest new knowledge from one of the largest humanities data sets currently available: the Old Bailey Online.
- Towards Dynamic Variorum Editions
This project will create a framework to produce "dynamic variorum" editions of classics texts that enable the reader to automatically link not only to variant editions but also to relevant citations, quotations, people, and places that are found in a digital library of over one million primary and secondary source texts.
- Railroads and the Making of Modern America—Tools for Spatio-Temporal Correlation, Analysis, and Visualization
This project will integrate a vast collection of textual, geographical and numerical data to allow for the visual presentation of the railroads and their impact on society over time, concentrating initially on the Great Plains and Northeast United States.
Biographies of Principal Investigators & Other Speakers
Tony Hey, Corporate Vice President, External Research, Microsoft Research. As corporate vice president of Microsoft Research Connections, Tony Hey is responsible for worldwide external research (ER) collaboration in Microsoft Research. He leads the company's efforts to build long-term public-private partnerships with global scientific and engineering communities, spanning broad reach and in-depth engagements with academic and research institutions, related government agencies, and industry partners. His responsibilities also include working with internal Microsoft groups to build future technologies and products that will transform computing for scientific and engineering research. Hey manages the U.S.-based external research group for North and South America, and the multidisciplinary eScience Research Group. He also has dotted-line management responsibility for Microsoft Research's ER teams in Asia, Europe, and India. Before joining Microsoft, Hey served as director of the U.K.'s e-Science Initiative, managing the government's efforts to provide scientists and researchers with access to key computing technologies. Before leading this initiative, Hey worked as head of the School of Electronics and Computer Science, and dean of Engineering and Applied Science at the University of Southampton, where he helped build the department into one of the most respected computer science research institutions in England.
Tom Jenkins, Executive Chairman and Chief Strategy Officer, Open Text. P. Thomas Jenkins is Executive Chairman and Chief Strategy Officer for Open Text™ of Waterloo, Ontario, Canada, a US$1 Billion enterprise software firm and the largest software company in Canada. Mr. Jenkins has served as a Director of Open Text since 1994 and as its Chairman since 1998. From 1994 to 2005, Mr. Jenkins was President and Chief Executive Officer. From 2005 to present, Mr. Jenkins has been Executive Chairman and Chief Strategy Officer of Open Text. In addition to his Open Text responsibilities, Mr. Jenkins is the Chair of the Government of Canada’s Research and Development Policy Review Panel which will report in October 2011 and is tasked with reviewing the $7 Billion of federal public spending on research to assist the Canadian economy in becoming more innovative.
Erez Lieberman-Aiden, Harvard University. Erez Lieberman Aiden is a fellow at the Harvard Society of Fellows and Visiting Faculty at Google. His research spans many disciplines and has won numerous awards, including recognition for one of the top 20 "Biotech Breakthroughs that will Change Medicine", by Popular Mechanics; the Lemelson-MIT prize for the best student inventor at MIT; the American Physical Society's Award for the Best Doctoral Dissertation in Biological Physics; and membership in Technology Review's 2009 TR35, recognizing the top 35 innovators under 35. His last three papers - two with JB Michel - have all appeared on the cover of Nature and Science.
JB Michel, Harvard University. Jean-Baptiste Michel is FQEB Fellow at Harvard and Visiting Faculty at Google. With Erez Lieberman Aiden, he founded the Cultural Observatory at Harvard, where their team develops quantitative approaches to the humanities and social sciences. Jean-Baptiste is an Engineer of Ecole Polytechnique, and received an MS in Applied Math and a PhD in Systems Biology from Harvard.
Peter Baskerville, University of Alberta, respondent to Railroads and the Making of Modern America--Tools for Spatio-Temporal Correlation, Analysis, and Visualization. Peter Baskerville holds the Chair in Modern Western Canadian History at the University of Alberta and is cross-appointed in History and Classics and Humanities Computing. He is overseeing the Canadian Century Research Infrastructure at the University of Alberta. He is principal investigator of a Canada Foundation for Innovation funded project titled The Last Best West: The Alberta Land Settlement Infrastructure Project.
Jennifer Cole, University of Illinois, Urbana-Champaign, respondent to Harvesting Speech Datasets for Linguistic Research on the Web. She is a Professor in the Department of Linguistics at the University of Illinois and a faculty affiliate in the Department of Computer Science. Cole received her Ph.D. in Linguistics from the Massachusetts Institute of Technology in 1987, and taught at Yale University (1987-1989) prior to joining U of I in 1990. Her research interests relate to spoken language and include phonology, phonetics, human speech processing, and computational linguistics.
Cynthia Damon, University of Pennsylvania, respondent to Towards Dynamic Variorum Editions. Cynthia Damon is Professor of Classical Studies at the University of Pennsylvania. She is the author of The Mask of the Parasite (1997), a commentary on Tacitus, Histories 1 (2003), and, with Will Batstone, Caesar’s Civil War (2006). Current projects are a text of Caesar’s Bellum civile, a translation of Tacitus’ Annals, and work on the reception of Pliny.
David Huron, Ohio State University, respondent to Structural Analysis of Large Amounts of Music Information. David Huron is Arts and Humanities Distinguished Professor in Music and Cognitive Science at the Ohio State University. Dr. Huron heads the OSU Cognitive and Systematic Musicology Laboratory. His research emphasizes music-induced emotion, computational musicology, and comparative ethnomusicology. In addition to laboratory-based research, his activities have also involved field studies among various cultures in Micronesia.
Dan Jurafsky, Stanford University, respondent to Mining a Year of Speech. Dan Jurafsky is Professor in the Department of Linguistics, and by courtesy in the Department of Computer Science, at Stanford University. From 1996-2003 he was on the faculty of the University of Colorado, Boulder. Dan received a B.A. in Linguistics in 1983 and a Ph.D. in Computer Science in 1992, both from the University of California at Berkeley, and was a postdoc 1992-1995 at the International Computer Science Institute. He is the recipient of a 2002 MacArthur Fellowship, and is the co-author with Jim Martin of the widely-used textbook "Speech and Language Processing". He has research interests throughout computational linguistics; recent topics include the induction of meaning, machine translation, the role of probability in human language processing, the application of natural language processing to social science topics including social psychology and the sociology of science, and the linguistics of food. Dan was born in Yonkers, New York, and grew up in Los Altos, California.
Stephen Nichols, Johns Hopkins University, respondent to Digging into the Enlightenment: Mapping the Republic of Letters. Stephen G. Nichols, a medievalist, is James M. Beall Professor Emeritus of French & Humanities, and Research Professor at Johns Hopkins. He received the MLA’s James Russell Lowell Prize for Romanesque Signs: Early Medieval Narrative and Iconography, and his The New Philology was honored by the Council of Learned Journals. He holds an honorary Docteur ès Lettres from the University of Geneva, and was decorated Officier de l’Ordre des Arts et Lettres by the French government. The Alexander von Humboldt Foundation awarded him its coveted Research Prize in 2008.
Steve Ramsay, University of Nebraska-Lincoln, respondent to Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent. Stephen Ramsay is an Associate Professor of English and a Fellow at the Center for Digital Research in the Humanities at the University of Nebraska-Lincoln. He designs and builds text technologies for humanist scholars, and has lectured widely on subjects related to literary theory and software design for the humanities. His book, *Reading Machines: Toward an Algorithmic Criticism*, will be published by University of Illinois Press later this year.
Xin Wei Sha, Concordia University, respondent to Digging into Image Data to Answer Authorship Related Questions. Sha Xin Wei, Ph.D., is Canada Research Chair in media arts and sciences, and Associate Professor of Fine Arts and Computer Science at Concordia University in Montréal, Canada. He directs the Topological Media Lab, a studio-laboratory for the study of gesture and materiality from computational and phenomenological perspectives. His graduate courses combine critical studies of computation and technology with studio work in responsive environments and live events. Sha’s major art research works include the TGarden responsive environments, Hubbub speech-sensitive urban surfaces, Membrane calligraphic video, and Softwear gestural sound instruments, and most recently kinetic sculpture and low resolution displays responding to movement and gesture.
Principal Investigators from Eight Winning Projects
Peter Ainsworth, University of Sheffield, Digging into Image Data to Answer Authorship Related Questions. After two years at the Université de Bourgogne in Dijon, I joined the University of Manchester in 1972 as a Lecturer, later becoming Senior Lecturer and Head of Department. In 1996 I went to a Chair of French at the University of Liverpool where I was Director of the Humanities Graduate School and Head of French and of the School of Modern Languages. In January 2001 I came to my Chair at Sheffield. Between 2003 and 2005 I was Director of Research for the Arts and Humanities Division. In September 2007 I was appointed Head of French. On 1st November 2009 I retired from the University and received the title Emeritus Professor of French.
Peter Bajcsy, University of Illinois at Urbana-Champaign, Digging into Image Data to Answer Authorship Related Questions. Peter Bajcsy received his Master of Science in Electrical Engineering from the University of Pennsylvania and his Doctorate in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. His research draws from the sub-fields of computer science such as image processing, pattern recognition, machine learning, data mining and artificial intelligence. Peter is currently employed in multiple positions at the University of Illinois: as the Associate Director for Data Analytics and Pattern Recognition at the Institute for Computing in Humanities, Arts and Social Sciences (ICHASS), as Adjunct Assistant Professor in the Electrical and Computer Engineering and Computer Sciences Departments, and as a Senior Research Scientist in Image Spatial Data Analysis (ISDA) at the National Center for Supercomputing Applications (NCSA).
Dan Cohen, George Mason University, Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent. Daniel J. Cohen is an Associate Professor in the Department of History and Art History at George Mason University and the Director of the Center for History and New Media, where he has worked on projects ranging from digital archives (The September 11 Digital Archive) to scholarly software (Zotero). He is the co-author of Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web (University of Pennsylvania Press, 2005), and author of Equations from God: Pure Mathematics and Victorian Faith (Johns Hopkins University Press, 2007) and the forthcoming The Ivory Tower and the Open Web (University of Michigan Press).
John Coleman, University of Oxford, Mining a Year of Speech. John Coleman is Professor of Phonetics and Director of the Phonetics Laboratory, University of Oxford. His research background is in speech synthesis and acoustic phonetics, and he has conducted, managed or collaborated in several corpus-based speech research projects, including IViE, a corpus of speech recordings covering intonational variation in British English dialects.
Greg Crane, Tufts University, Towards Dynamic Variorum Editions. Gregory Crane's interests are twofold. On the one hand, he has published on a wide range of ancient Greek authors (including articles on Greek drama and Hellenistic poetry and a book on the Odyssey). Much of his traditional scholarly work has been devoted to Thucydides; his book The Blinded Eye: Thucydides and the New Written Word appeared from Rowman and Littlefield in 1996; his second Thucydides book (The Ancient Simplicity: Thucydides and the Limits of Political Realism) was published by the University of California Press in 1998. At the same time, he has a long-standing interest in the relationship between the humanities and rapidly developing digital technology. He began this side of his work as a graduate student at Harvard when the Classics Department purchased its first TLG authors on magnetic tape in the summer of 1982. He developed a Unix-based full text retrieval system for the TLG that was widely used in North America and Europe in the middle 1980s. He also helped establish a typesetting consortium to facilitate scholarly publishing. Since 1985 he has been engaged in planning and development of the Perseus Project, which he directs as the Editor-in-Chief. Besides supervising the Perseus Project as a whole, he has been primarily responsible for the development of the morphological analysis system which provides many of the links within the Perseus database.
John Darlington, Imperial College, London, Towards Dynamic Variorum Editions. John Darlington is Professor in the Department of Computing at Imperial College London and head of the Social Computing Group based there. Professor Darlington’s long-term interests have been in the development of methods to assist the application of computational methods in as wide a variety of fields as possible. The increasing availability of Cloud computing resources and rich Internet data sources now provides opportunities for the development and deployment of innovative Internet applications and services that could have a major impact on all areas of science, the humanities and society.
David De Roure, University of Southampton, Structural Analysis of Large Amounts of Music Information. Dave De Roure is a Professor of e-Research in the Oxford e-Research Centre and National Strategic Director for Digital Social Research. Closely involved in the UK e-Science programme, his projects draw on Web 2.0, Semantic Web, workflow and pervasive computing technologies and he focuses on the co-evolution of digital technologies and research methods in and between multiple disciplines. These include digital humanities (computational musicology), social sciences (social statistics), chemistry (smart labs), bioinformatics (in silico experimentation) and environmental science (sensor networks). Many of his projects involve the intersection of the physical world with the digital world and the construction of 'datascopes' which take us from signal to understanding. He has an extensive background in Web and Linked Data, runs the myExperiment.org social website and is a Web Science champion for the Web Science Trust where he focuses on Web Science and the 'Internet of Things'.
Stephen Downie, University of Illinois at Urbana-Champaign, Structural Analysis of Large Amounts of Music Information. J. Stephen Downie is an Associate Professor at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign (UIUC). He is Director of the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL). He is Principal Investigator on the Networked Environment for Music Analysis (NEMA) project. He has been very active in the establishment of the Music Information Retrieval (MIR) community through his ongoing work with the International Society for Music Information Retrieval (ISMIR) conferences and now serves as ISMIR's President. He holds a BA (Music Theory and Composition) along with a Master's and a PhD in Library and Information Science, all earned at the University of Western Ontario, London, Canada.
Dan Edelstein, Stanford University, Digging into the Enlightenment: Mapping the Republic of Letters. Dan Edelstein is an associate professor of French and, by courtesy, History at Stanford University. He is the author of books on the French Revolution and the Enlightenment, and the editor of the online journal, Republics of Letters.
Ichiro Fujinaga, McGill University, Structural Analysis of Large Amounts of Music Information. Ichiro Fujinaga is an Associate Professor and the Chair of the Music Technology Area at the Schulich School of Music at McGill University. He has Bachelor's degrees in Music/Percussion and Mathematics from the University of Alberta, and a Master's degree in Music Theory and a Ph.D. in Music Technology from McGill University.
Richard Healey, University of Portsmouth, Railroads and the Making of Modern America--Tools for Spatio-Temporal Correlation, Analysis, and Visualization. Richard Healey is a Professor of Geography and his research interests include Historical GIS - use of GIS and visualisation methods for the development and analysis of large spatio-temporal databases of regional economic development; Internet-based historical GIS data resources; simulation modelling of regional development as well as Historical regional dynamics - economic development of the North-East United States 1850-1900 with particular reference to the coal mining, iron, oil and railroad industries.
Tim Hitchcock, University of Hertfordshire, Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent. Tim Hitchcock is Professor of 18th century History at the University of Hertfordshire. He has published ten books on the history of poverty, sexuality and masculinity; and with Robert Shoemaker is responsible for the creation of three substantial internet resources: Old Bailey Online (www.oldbaileyonline.org), London Lives, 1690-1800 (www.LondonLives.org), and Connected Histories (www.connectedhistories.org).
Mark Liberman, University of Pennsylvania, Mining a Year of Speech. Mark Liberman works in the department of linguistics/computer science. His research interests include the phonology and phonetics of lexical tone, and its relationship to intonation; gestural, prosodic, morphological and syntactic ways of marking focus, and their use in discourse; formal models for linguistic annotation; and information retrieval and information extraction from text.
Dean Rehberger, Michigan State University, Digging into Image Data to Answer Authorship Related Questions. Rehberger received his PhD from the University of Utah in a double-degree program in Rhetorical Theory and American Studies. He is the Director of MATRIX, the Center for Humane Arts, Letters, and Social Sciences Online. Dean has been teaching with technology for over a decade. He specializes in using online technologies and developing educational resources for the World Wide Web.
Bruce Robertson, Mount Allison University, Towards Dynamic Variorum Editions. Bruce Robertson is a professor of classics and directs Heml, the Historical Event Markup and Linking Project, centered at the Dept. of Classics, Mount Allison University. It hosts projects pertaining to generalized visualization and markup tools for history, as well as web-based technologies for language learning and study.
Geoffrey Rockwell, University of Alberta, Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent. Dr. Geoffrey Rockwell is a Professor of Philosophy and Humanities Computing at the University of Alberta. He was the project leader of the Canada Foundation for Innovation Text Analysis Portal for Research project and is currently the Director of the Canadian Institute for Research Computing in the Arts. He has published and presented papers in the area of philosophical dialogue, textual visualization and analysis, humanities computing, instructional technology, computer games and multimedia. For more check out his site, www.geoffreyrockwell.com, or his blog theoreti.ca.
Mats Rooth, Cornell University, Harvesting Speech Datasets for Linguistic Research on the Web. I do research in two areas, computational linguistics and natural language semantics. I have worked extensively on mixed symbolic/probabilistic models of syntax and the lexicon, on contrastive intonation (what is called focus), and on related phenomena such as ellipsis and presupposition. In addition to these, I am currently working on finite state optimality theory and web harvesting of intonational data.
Stéfan Sinclair, McMaster University, Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent. Stéfan Sinclair is an Associate Professor in the Department of Communication Studies & Multimedia at McMaster University and is the Director of the Sherman Centre for Digital Scholarship. His research focuses primarily on the design, development and theorization of tools for the digital humanities, especially for text analysis and visualization. Sinclair has led or been involved in the development of resources such as Voyeur Tools, TAPoR, MONK, and BonPatron.
William Thomas, University of Nebraska-Lincoln, Railroads and the Making of Modern America--Tools for Spatio-Temporal Correlation, Analysis, and Visualization. William G. Thomas, III teaches U.S. history and specializes in the Civil War, the U.S. South, slavery, and digital history. He is currently the Chair of the Department of History at the University of Nebraska-Lincoln and has served as the John and Catherine Angle Professor in the Humanities at Nebraska since 2005. He earned his B.A. in History at Trinity College in Connecticut and his M.A. and Ph.D. in History at the University of Virginia.
Michael Wagner, McGill University, Harvesting Speech Datasets for Linguistic Research on the Web. Michael Wagner finished his Ph.D. in linguistics at MIT in 2005. He is an Assistant Professor of Experimental Linguistics and Canada Research Chair for Speech and Language Processing at McGill University. His main research area is speech prosody and its relation to syntax and semantics, as well as its role in sentence processing.
Chris Weaver, University of Oklahoma, Digging into the Enlightenment: Mapping the Republic of Letters. Chris Weaver is an Assistant Professor in the School of Computer Science and Associate Director of the Center for Spatial Analysis at the University of Oklahoma. He holds a B.S. in Chemistry and Mathematics from Michigan State University and an M.S. and Ph.D. in Computer Science from the University of Wisconsin-Madison. He was a post-doctoral Research Associate with the GeoVISTA Center in the Department of Geography at Penn State, where he helped to found the North-East Visualization and Analytics Center. His research in visual analytics focuses on highly interactive user interfaces for exploring and analyzing multidimensional information, with special attention to open-ended methodological support for scholarship in the digital humanities.
Held at NEH Headquarters, the Old Post Office, Washington, DC, Room M-09 (Directions to the NEH)
- June 9: 1:00pm - 6:30pm
- June 10: 9:00am - 6:00pm
Full Agenda (145.88 KB) [PDF. Last Updated 31 May]