Digital Text Analysis Working Group

The advent of the computer has given rise to a considerable number of digital approaches to, and areas of research on, many aspects of texts: computational linguistics, natural language processing (NLP), text mining, and stylometrics, to name but a few. The Digital Text Analysis Working Group brings together interdisciplinary researchers fortnightly at the GCDH to discuss and better understand these new approaches. In a typical session, one participant presents a tool or method from their own research. This is followed by intensive discussion, focusing particularly on how that tool or method could be used in the other participants' research. The goal is to enhance textual scholarship on the Göttingen Research Campus by introducing and further developing digital methods.

Computational linguistics focuses on designing algorithms and formal descriptions that accurately represent, e.g., the morphological and syntactic structures of languages so that these can be processed computationally. Speech recognition, machine translation, parsing, part-of-speech tagging, and computational approaches to historical linguistics can all be grouped under computational linguistics.

Natural language processing (NLP) is an application of computational linguistics. It typically uses symbolic/logical methods as well as statistical and machine-learning algorithms to extract meaningful information from “natural language,” i.e., naturally occurring written or spoken texts. It shares several tasks with computational linguistics, such as parsing and machine translation, but also addresses tasks such as named entity recognition (NER) and sentiment analysis.

Text mining seeks to extract information from texts.  It focuses on such tasks as text categorization and clustering, sentiment analysis, and document summarization.

Stylometrics uses strategies from the three approaches mentioned above to identify stylistic features that can support, e.g., authorship attribution, the identification of forgeries, the temporal classification of documents, and the identification of translations and translation style.

An excellent introduction to several tools for digital text analysis that are accessible even to beginners, yet powerful enough to produce genuine research results, can be found on the wiki "Literatur Rechnen".

For any questions about the working group's programme or about participating in it, please contact Matt Munson.


21 March 2014, Heyne Haus, Seminarraum 1

7 March 2014, Heyne Haus, Seminarraum 1

21 February 2014, Dr. J. Berenike Herrmann, "Towards a conceptualization of 'style' in digital text studies"
Heyne Haus, Seminarraum 1

7 February 2014, Sarah Bärtschi, "Combining distant and close reading to represent, measure and analyze the corpus of Alexander von Humboldt."
Heyne Haus, Seminarraum 1

31 January 2014, Prof. Dr. Fotis Jannidis - CANCELLED
Heyne Haus, Seminarraum 1

10 January 2014, Dr. Marco Büchler - CANCELLED
Heyne Haus, Seminarraum 1

13 December 2013, Dr. Kepa Rodriguez, "Multilingual and Semantic Search and Indexing in the EHRI Project"
Heyne Haus, Seminarraum 1

6 December 2013, CANCELLED because of the Herrenhausen Conference
Heyne Haus, Seminarraum 2

15 November 2013, Dr. J. Berenike Herrmann, "What Do Kafka and Heidi Have in Common?"
Heyne Haus, Seminarraum 2

18 October 2013, Matthew Munson, "Tracking Semantic Drift in the Biblical Corpus"
Heyne Haus, Seminarraum 1

10 September 2013 - CANCELLED because of the DARIAH-DE Konsortialtreffen
Location: Heyne Haus, Seminarraum 1

3 September 2013 - Dr. J. Berenike Herrmann
Title: "Computing Kafka. What corpus-stylistic measures can tell us about Kafka’s prose"
Location: Heyne Haus, Seminarraum 2

13 August 2013 - CANCELLED
Location: Heyne Haus, Seminarraum 2

30 July 2013 - Dr. Christof Schöch
Working Title: Cross-Genre Text Classification
Location: Heyne Haus, Seminarraum 1

16 July 2013 - Prof. Hugh Craig
Title: "Novelty carries it away": Collective change in the language of English drama from the 1590s to the 1610s

2 July 2013 - Annette Geßner
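Title: GERTRUDE und "Weltliteratur im Zitat" - Biblisches in Schillers Drama "Die Räuber" [GERTRUDE and "Weltliteratur im Zitat": Biblical Material in Schiller's Drama "Die Räuber" (The Robbers)]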

18 June 2013 - Prof. Abdelhadi Soudi
Title: Standard Arabic-to-Moroccan Sign Language Machine Translation

21 May 2013 - Kepa Rodriguez
Title: Information Retrieval and the Design of a Help Desk System
Location: Heyne Haus, Seminarraum 1

14 May 2013 - Matt Munson
Title: Automatically Detecting Semantic Drift using Collocation Analysis
Location: Heyne Haus, Seminarraum 1

18 March 2013 - Norbert Ankenbauer
Title: Paesi novamente retrovati - Newe unbekanthe landte: Erfahrungen bei der digitalen Edition von Entdeckerberichten aus dem 16. Jahrhundert [Experiences with the Digital Edition of Sixteenth-Century Explorers' Reports]

4 February 2013 - Gabriel Viehhauser
Title: "Using Phylogenetic Analysis Methods to Produce Digital Editions"

21 January 2013 - Jörg Wettlaufer and Sree Ganesh Thotempudi
Title: "Named Entity Recognition in Historical Corpora. Lessons Learned so Far..."

7 January 2013 - Susanne Friese
Title: "Computer-assisted qualitative data analysis with ATLAS.ti"

17 December 2012 - Annette Geßner
Working Title: GERTRUDE, ETraces, and Textual Reuse

10 December 2012 - Berenike Herrmann
Title: "Doing Digital Text Analysis with “Literary” Students. Clash of Cultures or New Beginnings?"

5 November 2012 - Burkhard Morgenstern
Title: "Reconstructing Phylogenetic Trees: Part II"

29 October 2012 - Burkhard Morgenstern
Title: "Reconstructing Phylogenetic Trees: Part I"

15 October 2012 - Matthew Munson
Title: "Ancient Translational Style: What Can Stylometry Tell Us"

5 and 12 September 2012 - Tobias Blanke
Title: "How Computers Understand Texts: Stuff You Need to Know..."

15 August 2012 - Fotis Jannidis
Title: "Eine korpusbasierte Geschichte des deutschsprachigen Romans" [A Corpus-Based History of the German-Language Novel]

11 July 2012 - Kepa Rodriguez
Title: "Comparison of Named-Entity-Extraction Tools for Raw OCR Text"

27 June 2012 - Mathias Göbel
Title: "Quantitative Analysis of Unreliable Narration"

13 May 2012 - Donald Brenneis
Title: "Writing, Reading, and Recognition in the Emergent Academy"

2 May 2012 - Kepa Rodriguez
Title: "Active Annotation of Corpora"