Events

“Premodern China in the Age of AI” by Dr. Donald Sturgeon

Date:

24 Apr 2024 - 26 Apr 2024

Time:

3:00-4:30pm

Venue:

Digital Scholarship Lab, University Library

Speaker(s):

Dr. Donald Sturgeon

Biography of Speaker:
  • Assistant Professor, Department of Computer Science, Durham University
  • Creator and administrator of the Chinese Text Project, a major online collaborative digital library project for premodern Chinese texts
  • Research interests: issues of language, mind and knowledge in classical Chinese thought, and the application of digital methods to the study of premodern Chinese literature and language
Event Details:

#1 Workshop | 24 April 2024 (Wed)
Computational Approaches to Textual Similarity

The digitization of texts at an enormous scale, together with ever more powerful computer technology, presents excellent opportunities for identifying textual similarities automatically. Focusing primarily on classical Chinese examples, the hands-on workshop will explore practical ways of identifying, summarizing, and visualizing a variety of types of textual similarity in historical materials.

No technical background is assumed.

#2 Seminar | 26 April 2024 (Fri)
Premodern China in the Age of AI: Opportunities and Challenges

The seminar will:
– introduce the ongoing work on building and using NLP models in the Chinese Text Project (https://ctext.org), a digital library of premodern Chinese texts
– introduce the deep learning models used in the Chinese Text Project to perform a variety of tasks, such as automated punctuation of unpunctuated texts, automated annotation of named historical entities in transcribed texts, and automated OCR post-correction

Synopsis of Lecture:

Abstract of “24 Apr | Computational approaches to textual similarity”

Textual similarity – encompassing a variety of phenomena including direct quotation, unattributed copying with rewording or embellishment, allusion, and distinctly similar word usage – has long been of interest to textual scholars in many domains for a variety of reasons. Often these similarities are non-trivial to uncover, but once identified can provide valuable evidence for hypotheses about textual transmission histories and authorship – particularly important where these are complex or disputed.

The digitization of texts at an enormous scale, together with ever more powerful computer technology, presents excellent opportunities for identifying textual similarities automatically at scales that would otherwise be impossible. Focusing primarily on classical Chinese examples, this interactive, hands-on workshop will give an overview of some of the most commonly used approaches, and introduce practical ways of identifying, summarizing, and visualizing a variety of types of textual similarity in historical materials. No technical background is assumed, and all necessary materials will be provided.

Abstract of “26 Apr | Premodern China in the age of AI: opportunities and challenges”

Recent years have seen astonishing progress in artificial intelligence through the use of deep neural networks, including in the field of Natural Language Processing (NLP). These developments make possible new types of computational assistance for working with premodern texts in systems such as digital libraries, as well as offering potential new methods for some types of literary scholarship.

This seminar introduces ongoing work on building and using NLP models in the context of a large and widely used digital library of premodern Chinese texts, the Chinese Text Project (https://ctext.org), and the challenges encountered during this process. The project trains deep learning models on a large collection of data obtained through a combination of rule-based automation, linking of external resources, and crowdsourced editing, and deploys them to directly augment the digital library with computer-generated data in a practically useful and sustainable way.

These models cover a range of tasks, many of which would previously have been considered impractical without human intervention, including: automated punctuation of unpunctuated texts; automated annotation of named historical entities in transcribed texts; automated post-correction of transcription errors in OCR-generated text; and, lastly, more speculative and research-oriented tasks such as the application of deep learning to the chronological and authorial attribution of historical texts.