Detection of Errors and Correction in Corpus Annotation

From Detecting Errors to Automatically Correcting Them

Markus Dickinson

Proceedings of EACL'06.

Faced with the problem of annotation errors in part-of-speech (POS) annotated corpora, we develop a method for automatically correcting such errors. Building on top of a successful error detection method, we first try correcting a corpus using two off-the-shelf POS taggers, based on the idea that they enforce consistency; with this, we find some improvement. After some discussion of the tagging process, we alter the tagging model to better account for problematic tagging distinctions. This modification results in significantly improved performance, reducing the error rate of the corpus.

Electronically available file formats:

Paper: .pdf (63K)

Bibtex entry:

@InProceedings{dickinson:06,
  author =       {Markus Dickinson},
  title =        {From Detecting Errors to Automatically Correcting Them},
  booktitle =    {Proceedings of the 11th Conference of the European
                  Chapter of the Association for Computational Linguistics
                  (EACL-06)},
  address =      {Trento, Italy},
  pages =        {265--272},
  year =         {2006},
  url =          {http://www9.georgetown.edu/faculty/mad87/papers/dickinson-06.html}
}
