Detection of Errors and Correction
in Corpus Annotation

Decca-XML

In working with a range of dependency corpora that make different assumptions about dependency structures, we found that we needed a format in which a word can have more than one head, or no head at all. The Malt-XML format is a simple and useful XML format for dependency treebanks, so we modified and extended the Malt-XML schema to add the features we needed.
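To make the requirement concrete, here is a minimal Python sketch of an in-memory representation in which a word carries a list of heads that may be empty or hold several entries. It is not the Decca-XML schema itself: the class name `Word`, its fields, the helper functions, and the example sentence with its relation labels are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    """One word of a sentence (hypothetical structure, not the Decca-XML schema)."""
    id: int
    form: str
    # Each entry is (head_id, relation_label). An empty list means "no head";
    # more than one entry means the word has several heads.
    heads: List[Tuple[int, str]] = field(default_factory=list)


def headless(words):
    """Return the ids of words that have no head at all."""
    return [w.id for w in words if not w.heads]


def multi_headed(words):
    """Return the ids of words that have more than one head."""
    return [w.id for w in words if len(w.heads) > 1]


# Invented example: "Kim" depends on both verbs, "wants" has no head.
sentence = [
    Word(1, "Kim", heads=[(2, "SUBJ"), (4, "SUBJ")]),
    Word(2, "wants"),
    Word(3, "to", heads=[(4, "AUX")]),
    Word(4, "leave", heads=[(2, "XCOMP")]),
]

print(headless(sentence))      # ids of words without a head
print(multi_headed(sentence))  # ids of words with several heads
```

Storing heads as a list rather than a single attribute is what lets one structure cover corpora with single-headed, multi-headed, and headless words alike.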

The main differences between Malt-XML and Decca-XML:

We provide the Decca-XML schema and scripts to convert Malt-XML and the CoNLL-X shared task format to Decca-XML:

Decca-XML.xsd Decca-XML schema
Malt2Decca.xsl XSLT conversion from Malt-XML to Decca-XML
CoNLL2Decca.py Python script to convert CoNLL-X shared task tabular format to Decca-XML
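For readers unfamiliar with the CoNLL-X tabular format, the following sketch shows how one sentence might be read into a structure that keeps a list of heads per word. It is an illustration only and does not reproduce CoNLL2Decca.py: the function name `parse_conll_sentence` and the output layout are hypothetical. It relies on the standard CoNLL-X column order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), in which a HEAD of 0 points to the artificial root. Treating an underscore in the HEAD column as "no head" is an assumption of this sketch.

```python
def parse_conll_sentence(block):
    """Parse one CoNLL-X sentence (one token per line, tab-separated columns).

    Returns a list of dicts with keys "id", "form", and "heads", where
    "heads" is a list of (head_id, relation) pairs. Hypothetical helper,
    not part of the distributed scripts.
    """
    tokens = []
    for line in block.strip("\n").splitlines():
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        tok_id, form, head, deprel = cols[0], cols[1], cols[6], cols[7]
        # Assumption: "_" in the HEAD column means the word has no head.
        heads = [] if head == "_" else [(int(head), deprel)]
        tokens.append({"id": int(tok_id), "form": form, "heads": heads})
    return tokens


# Invented two-token sentence in CoNLL-X layout.
sample = (
    "1\tKim\t_\tN\tN\t_\t2\tSUBJ\t_\t_\n"
    "2\tleft\t_\tV\tV\t_\t0\tROOT\t_\t_\n"
)

for token in parse_conll_sentence(sample):
    print(token)
```

Because CoNLL-X allows only one HEAD value per token, each parsed word here has at most one head; a format with multiple heads per word is only needed once such data is merged with, or converted from, richer annotation schemes.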

06/15/2011: updated to fix an XML character encoding bug