Detection of Errors and Correction
in Corpus Annotation

Introduction

The success of data-driven approaches and stochastic modeling in computational linguistic research and applications is rooted in the availability of electronic natural language corpora. Despite the central role that annotated corpora play in this work, the question of how errors in corpus annotation can be detected and corrected has received little attention. The DECCA project is designed to address this gap by exploring an error detection and correction method with potential applicability to a wide range of corpus annotations.

DECCA is funded by the National Science Foundation as NSF project IIS 0623837.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).