Detection of Errors and Correction
in Corpus Annotation

A Simple Method for Tagset Comparison

Markus Dickinson and Charles Jochim

Proceedings of the 6th Language Resources and Evaluation Conference (LREC 2008). Marrakech, Morocco.

Based on the idea that local contexts predict the same basic category across a language, we develop a simple method for comparing tagsets across corpora. The principal differences between tagsets are evidenced by variation in categories in one corpus in the same contexts where another corpus exhibits only a single tag. Such mismatches highlight differences in the definitions of tags, which are crucial when porting technology from one annotation scheme to another.
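
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions made for illustration: a "local context" is taken to be the pair of words immediately before and after a token, the function names context_tags and tagset_mismatches are hypothetical, and the two toy corpora and their tag labels are invented.

```python
from collections import Counter, defaultdict


def context_tags(corpus):
    """Map each (previous word, next word) context to a Counter of the
    tags observed on the token between them.

    Assumption: one word on either side counts as the local context.
    """
    table = defaultdict(Counter)
    for sentence in corpus:
        # Skip the first and last token: they lack a two-sided context.
        for i in range(1, len(sentence) - 1):
            prev_word = sentence[i - 1][0].lower()
            next_word = sentence[i + 1][0].lower()
            table[(prev_word, next_word)][sentence[i][1]] += 1
    return table


def tagset_mismatches(corpus_a, corpus_b):
    """Return contexts where corpus_a shows a single tag but corpus_b varies.

    The result maps each such context to (tag in A, sorted tags in B).
    """
    tags_a = context_tags(corpus_a)
    tags_b = context_tags(corpus_b)
    mismatches = {}
    # Only contexts attested in both corpora can be compared.
    for context in tags_a.keys() & tags_b.keys():
        if len(tags_a[context]) == 1 and len(tags_b[context]) > 1:
            single_tag = next(iter(tags_a[context]))
            mismatches[context] = (single_tag, sorted(tags_b[context]))
    return mismatches


# Invented toy data: corpus A uses one tag (IN) where corpus B
# distinguishes two (IN and CS) in the same context.
corpus_a = [
    [("She", "PRP"), ("left", "VBD"), ("before", "IN"), ("the", "DT"), ("meeting", "NN")],
    [("He", "PRP"), ("left", "VBD"), ("after", "IN"), ("the", "DT"), ("bell", "NN"), ("rang", "VBD")],
]
corpus_b = [
    [("She", "PPS"), ("left", "VBD"), ("before", "IN"), ("the", "AT"), ("meeting", "NN")],
    [("He", "PPS"), ("left", "VBD"), ("after", "CS"), ("the", "AT"), ("bell", "NN"), ("rang", "VBD")],
]

result = tagset_mismatches(corpus_a, corpus_b)
print(result)
```

On this toy data the only context flagged is ("left", "the"), where corpus A has a single tag and corpus B has two. Running the comparison in the opposite direction returns nothing, which suggests that corpus B draws a distinction that corpus A does not.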

