Language Preservation: A Case Study in Collecting and Digitizing Machine-Tractable Language Data

Jim Cowie, Steve Helmreich, Ron Zacharski

Abstract


In this paper we describe a process for collecting and digitizing machine-tractable resources for lesser-studied languages. We illustrate this process with examples from Guarani (an indigenous language of Paraguay), Chechen, and other languages. By ‘machine-tractable’ we mean that, in addition to being readable by people, the resource can also be processed by a computational tool. Our goal in acquiring these resources is to use them for quick ramp-up machine translation. In related work, Nirenburg et al. developed an elicitation system that guided non-expert language informants through questions about the ecology, inflectional morphology, and syntax of their language and also led them through a lexicon development task. This information was then used to automatically generate a transfer machine translation system. Our approach replaces this rigid, guided process with a more free-form acquisition of general resources, which experts could then use to create a machine translation system.


