Detection of Errors and Correction
in Corpus Annotation
Software and Resources
The DECCA code is now on GitHub. (See older versions and updates below.)
Newer/related software:
- This paper from Depling 2017 reimplements and extends parts of the dependency detection approach with a nicer web interface for evaluating the results:
Marie-Catherine de Marneffe, Matias Grioni, Jenna Kanerva and Filip Ginter. Assessing the Annotation Consistency of the Universal Dependencies Corpora. In Proceedings of the Fourth International Conference on Dependency Linguistics, Depling 2017, pp. 108-115, 2017.
The code for the main components:
- Dependency error detection: https://github.com/matgrioni/conll-inconsistency
- Multiple-choice annotation capability for UD trees: https://github.com/fginter/flask_shelve
- The RIDGES project has published modified DECCA tools used for semi-automatic POS correction:
- decca-pos-reduce.py: modified version of decca-pos.py that excludes shorter variation n-grams that appear in longer variation n-grams
- nonfringe.py: filters fringe n-grams from decca-pos output (see README)
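As a rough illustration of the reduction step that decca-pos-reduce.py is described as performing, the following Python sketch drops any n-gram that occurs as a contiguous sub-sequence of a longer n-gram in the same list. It is not taken from the RIDGES code: the function names, the tuple representation of n-grams, and the sample data are all assumptions made for illustration, and the actual script may define containment differently.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: an n-gram is a tuple of word tokens.
Ngram = Tuple[str, ...]


def contains(longer: Ngram, shorter: Ngram) -> bool:
    """Return True if `shorter` occurs as a contiguous run inside `longer`."""
    n, m = len(longer), len(shorter)
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def reduce_ngrams(ngrams: Sequence[Ngram]) -> List[Ngram]:
    """Keep only n-grams that are not contained in a strictly longer n-gram."""
    unique = list(dict.fromkeys(ngrams))  # drop duplicates, preserve order
    kept = []
    for candidate in unique:
        covered = any(
            len(other) > len(candidate) and contains(other, candidate)
            for other in unique
        )
        if not covered:
            kept.append(candidate)
    return kept


# Made-up sample data, not real DECCA output.
variation_ngrams = [
    ("to", "run"),
    ("want", "to", "run"),
    ("the", "run"),
]
reduced = reduce_ngrams(variation_ngrams)
print(reduced)
```

Here ("to", "run") is dropped because it appears inside ("want", "to", "run"), while ("the", "run") is kept because no longer n-gram in the list contains it. The pairwise comparison is quadratic in the number of n-grams, which is acceptable for a sketch but may need indexing for large outputs.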
Original DECCA software available for download:
- decca on github (9/25/2017)
- decca-0.3.tar.gz (10/18/2010)
- Decca-XML (11/15/2007)
- CoNLL2Decca.py, Malt2Decca.xsl (11/15/2007)
- decca-0.2.tar.gz (11/02/2006)
- decca-0.1.tar.gz (10/02/2006)
We welcome all feedback at decca@ling.osu.edu.