Detection of Errors and Correction in Corpus Annotation

Detecting Errors in Discontinuous Structural Annotation

Markus Dickinson and Walt Detmar Meurers

Proceedings of ACL'05.

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. This is significant since the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free word order languages to semantic and discourse annotation.

In this paper, we discuss how the variation n-gram error detection approach (Dickinson and Meurers, 2003) can be extended to discontinuous structural annotation. We exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the German TIGER corpus (Brants et al., 2002).

