Digital Text Analysis Working Group
Completed
The emergence of the computer has given rise to a considerable number of digital approaches to, and areas of research on, many aspects of texts: computational linguistics, natural language processing (NLP), text mining, and stylometrics, to name but a few. The Digital Text Analysis Working Group brings together interdisciplinary researchers every two weeks at the GCDH to discuss and better understand these new approaches. In a typical session, one participant presents a tool or method from their own research. This is followed by an intensive discussion of how that tool or method could be used in the other participants' research. The goal is to enhance textual scholarship on the Göttingen Research Campus by introducing and further developing digital methods.
Computational linguistics focuses on designing algorithms and formal descriptions that can accurately represent, e.g., the morphological and syntactic structures of languages so that they can be processed computationally. Speech recognition, machine translation, parsing, part-of-speech tagging, and computational historical linguistics can all be classed under computational linguistics.
Natural language processing (NLP) is an application of computational linguistics. It typically uses symbolic (logical) methods and statistical methods, including machine learning algorithms, to extract meaningful information from "natural language," i.e., naturally occurring (non-experimental) written or spoken texts. It shares several tasks with computational linguistics, such as parsing and machine translation, but also covers tasks such as named entity recognition (NER) and sentiment analysis.
Text mining seeks to extract information from texts. It focuses on such tasks as text categorization and clustering, sentiment analysis, and document summarization.
Stylometrics uses strategies from the three approaches mentioned above to identify stylistic elements that can support, e.g., authorship attribution, the identification of forgeries, the temporal classification of documents, and the identification of translations and translation style.
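As an illustration of the authorship-attribution use case, the following Python sketch implements Burrows's Delta, a widely used stylometric distance. It was chosen for this page as an example and is not necessarily a method used by the working group. Delta compares texts by the z-scored relative frequencies of the corpus's most frequent words. The function names, the simple regular-expression tokenizer, and the default of 30 most frequent words are illustrative assumptions, not a reference implementation.

```python
from collections import Counter
import math
import re


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer (an assumption): lowercase, keep alphabetic runs only.
    return re.findall(r"[a-zäöüß]+", text.lower())


def relative_frequencies(tokens: list[str], vocabulary: list[str]) -> list[float]:
    # Share of each vocabulary word among all tokens of one text.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocabulary]


def burrows_delta(corpus: dict[str, str], n_mfw: int = 30) -> dict[tuple[str, str], float]:
    """Return Delta for every ordered pair of texts in `corpus` (name -> text)."""
    tokenized = {name: tokenize(text) for name, text in corpus.items()}

    # Feature set: the n most frequent words across the whole corpus.
    overall = Counter(tok for toks in tokenized.values() for tok in toks)
    vocabulary = [w for w, _ in overall.most_common(n_mfw)]
    if not vocabulary:
        raise ValueError("corpus contains no word tokens")

    profiles = {name: relative_frequencies(toks, vocabulary) for name, toks in tokenized.items()}
    names = list(profiles)
    k = len(vocabulary)

    # Mean and (population) standard deviation of each word's frequency across texts.
    means = [sum(profiles[n][i] for n in names) / len(names) for i in range(k)]
    stds = [
        math.sqrt(sum((profiles[n][i] - means[i]) ** 2 for n in names) / len(names))
        for i in range(k)
    ]

    # z-scores; a word with identical frequency in every text contributes 0.
    z = {
        n: [(profiles[n][i] - means[i]) / stds[i] if stds[i] > 0 else 0.0 for i in range(k)]
        for n in names
    }

    # Delta = mean absolute difference of z-scores over the feature words.
    return {
        (a, b): sum(abs(x - y) for x, y in zip(z[a], z[b])) / k
        for a in names
        for b in names
    }


if __name__ == "__main__":
    toy_corpus = {
        "text_a": "the cat sat on the mat and the dog sat too",
        "text_b": "a bird flew over the house and a cat watched",
        "unknown": "the cat sat on a mat while the dog slept",
    }
    deltas = burrows_delta(toy_corpus, n_mfw=10)
    for candidate in ("text_a", "text_b"):
        print(candidate, round(deltas[("unknown", candidate)], 3))
```

A smaller Delta means more similar word-frequency profiles. In attribution, a text of unknown authorship would tentatively be assigned to the candidate with the lowest Delta. Real studies use far longer texts and usually many more frequent words than this toy example.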
An excellent introduction to several digital text analysis tools that are accessible even to beginners, yet powerful enough to produce genuine research results, can be found on the wiki "Literatur Rechnen".
Meetings
21 March 2014, Heyne Haus, Seminarraum 1
7 March 2014, Heyne Haus, Seminarraum 1
21 February 2014, Dr. J. Berenike Herrmann, "Towards a conceptualization of 'style' in digital text studies"
 Heyne Haus, Seminarraum 1
7 February 2014, Sarah Bärtschi, "Combining distant and close reading to represent, measure and analyze the corpus of Alexander von Humboldt."
 Heyne Haus, Seminarraum 1
31 January 2014, Prof. Dr. Fotis Jannidis - CANCELLED
 Heyne Haus, Seminarraum 1
10 January 2014, Dr. Marco Büchler - CANCELLED
 Heyne Haus, Seminarraum 1
13 December 2013, Dr. Kepa Rodriguez, "Multilingual and Semantic Search and Indexing in the EHRI Project"
 Heyne Haus, Seminarraum 1
6 December 2013, CANCELLED because of the Herrenhausen Conference
 Heyne Haus, Seminarraum 2
15 November 2013, Dr. J. Berenike Herrmann, "What Do Kafka and Heidi Have in Common?"
 Heyne Haus, Seminarraum 2
 RESCHEDULED!
18 October 2013, Matthew Munson, "Tracking Semantic Drift in the Biblical Corpus"
 Heyne Haus, Seminarraum 1
10 September 2013 - CANCELLED because of the DARIAH-DE consortium meeting (Konsortialtreffen)
 Location: Heyne Haus, Seminarraum 1
3 September 2013 - Dr. J. Berenike Herrmann
 Title: "Computing Kafka. What corpus-stylistic measures can tell us about Kafka’s prose"
 Location: Heyne Haus, Seminarraum 2
13 August 2013 - CANCELLED
 Location: Heyne Haus, Seminarraum 2
30 July 2013 - Dr. Christof Schöch
 Working Title: Cross-Genre Text Classification
 Location: Heyne Haus, Seminarraum 1
16 July 2013 - Prof. Hugh Craig
 Title: "Novelty carries it away": Collective change in the language of English drama from the 1590s to the 1610s
2 July 2013 - Annette Geßner
 Title:
21 May 2013 - Kepa Rodriguez
 Title: Information Retrieval and the Design of a Help Desk System
 Location: Heyne Haus, Seminarraum 1
14 May 2013 - Matt Munson
 Title: Automatically Detecting Semantic Drift using Collocation Analysis
 Location: Heyne Haus, Seminarraum 1
18 March 2013 - Norbert Ankenbauer
 Title: Paesi novamente retrovati - Newe unbekanthe landte: Erfahrungen bei der digitalen Edition von Entdeckerberichten aus dem 16. Jahrhundert [Experiences with the digital edition of sixteenth-century explorers' reports]
4 February 2013 - Gabriel Viehhauser
 Title: "Using Phylogenetic Analysis Methods to Produce Digital Editions"
21 January 2013 - Jörg Wettlaufer and Sree Ganesh Thotempudi
 Title: "Named Entity Recognition in Historical Corpora. Lessons Learned so Far..."
7 January 2013 - Susanne Friese
 Title: "Computer-assisted qualitative data analysis with ATLAS.ti"
17 December 2012 - Annette Geßner
 Working Title: GERTRUDE, ETraces, and Textual Reuse
10 December 2012 - Berenike Herrmann
 Title: "Doing Digital Text Analysis with “Literary” Students. Clash of Cultures or New Beginnings?"
5 November 2012 - Burkhard Morgenstern
 Title: "Reconstructing Phylogenetic Trees: Part II"
29 October 2012 - Burkhard Morgenstern
 Title: "Reconstructing Phylogenetic Trees: Part I"
15 October 2012 - Matthew Munson
 Title: "Ancient Translational Style: What Can Stylometry Tell Us"
5 and 12 September 2012 - Tobias Blanke
 Title: "How Computers Understand Texts: Stuff You Need to Know..."
15 August 2012 – Fotis Jannidis
 Title: "Eine korpusbasierte Geschichte des deutschsprachigen Romans" [A corpus-based history of the German-language novel]
11 July 2012 – Kepa Rodriguez
 Title: "Comparison of Named-Entity-Extraction Tools for Raw OCR Text"
27 June 2012 – Mathias Göbel
 Title: "Quantitative Analysis of Unreliable Narration"
13 May 2012 - Donald Brenneis
 Title: "Writing, Reading, and Recognition in the Emergent Academy"
2 May 2012 - Kepa Rodriguez
 Title: "Active Annotation of Corpora"