Xlint - an Error-Tolerant XML Parser
What is Xlint?
According to the Extensible Markup Language (XML) specification, violations of well-formedness constraints are fatal errors: once a fatal error is detected, the XML processor must not continue normal processing. Because of this requirement, conventional XML parsers stop as soon as they detect the first well-formedness error. Before an XML document can be used for any purpose, all of its well-formedness errors must be removed. With a conventional parser, this means rerunning the parser after each fix in order to find the next error, which is very inefficient for a huge XML document with many errors (some of which may be systematic errors caused by improper global replacements). A desirable parser would never stop at an error; instead, it would report all the well-formedness errors during or after a complete parse of the XML document. Xlint is such an error-tolerant XML parser, designed to facilitate error removal in large XML documents. Xlint is implemented in Perl.
How Does Xlint Work?
Based on the XML document structure, we classify the errors in a non-well-formed XML document into the following two types:
(1) Syntax errors, including:
Syntax error for the xml declaration;
Expect the attribute for version info;
Invalid version number assignment;
Invalid encoding name assignment;
Invalid assignment for standalone document declaration;
The specified attribute was not expected at this location;
Syntax error for the tag (such as missing the end bracket);
Duplicate DocType declaration;
Syntax error for the comment;
Syntax error for the processing instruction;
Expect white space.
(2) Structural errors, including:
Missing the start tag;
Missing the end tag.
Xlint handles the first type of error in recursive-descent fashion, parsing each construct according to the XML grammar. It handles the second type of error by maintaining a tag stack and explicitly manipulating the tags encountered in the XML document. For more details, please refer to the Xlint documentation.
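The tag-stack idea for structural errors can be illustrated with a small sketch. It is written in Python for brevity and is not taken from Xlint's Perl source. The function name check_structure, the simplified tag pattern, and the recovery rule (an end tag with no open match is reported as a missing start tag; start tags skipped over while closing an outer element are reported as missing end tags) are all assumptions made for illustration.

```python
import re

# Simplified pattern for start, end, and empty-element tags (an assumption:
# it ignores comments, CDATA sections, and '>' inside attribute values).
TAG_RE = re.compile(r"<(/?)([A-Za-z_:][\w.:\-]*)[^<>]*?(/?)>")

def check_structure(text):
    """Collect (offset, message) structural errors without stopping at the first."""
    errors = []
    stack = []  # (name, offset) of start tags still waiting for an end tag
    for m in TAG_RE.finditer(text):
        closing, name, empty = m.group(1), m.group(2), m.group(3)
        if not closing:
            if not empty:  # an empty-element tag such as <br/> needs no end tag
                stack.append((name, m.start()))
            continue
        if name not in [open_name for open_name, _ in stack]:
            # No open element with this name: report and keep parsing.
            errors.append((m.start(), f"Missing the start tag for </{name}>"))
            continue
        # Pop back to the matching start tag; anything popped on the way
        # was opened but never closed.
        while stack:
            open_name, open_pos = stack.pop()
            if open_name == name:
                break
            errors.append((open_pos, f"Missing the end tag for <{open_name}>"))
    # Whatever is left on the stack at end of input was never closed.
    for open_name, open_pos in stack:
        errors.append((open_pos, f"Missing the end tag for <{open_name}>"))
    return sorted(errors)

if __name__ == "__main__":
    for offset, message in check_structure("<a><b><c></b></d></a><e>"):
        print(offset, message)
```

Because each error is recorded and the scan continues, a single pass over the document yields the complete list of structural problems, which is the behavior the error-tolerant design aims for.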
How Do I Use Xlint?
Xlint is run with the command:
perl xlint.pl <file_name> [-v |-v <number_of_chars>]
You must supply the “file_name” parameter, which is the absolute or relative file name of the XML document to be parsed. After “file_name”, you can add one of the optional parameters “-v” or “-v number_of_chars”:
-v: Verbose mode with the default context length. A context of 30 characters around the error position is displayed.
-v number_of_chars: Verbose mode with a given context length. The length of the error context is set to number_of_chars.
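The error context shown in verbose mode can be pictured with the following Python sketch. It is only an illustration, not Xlint's actual code: the function name error_context is hypothetical, and it assumes the context window is roughly centered on the error position, which the description above does not specify.

```python
def error_context(text, position, length=30):
    """Return up to `length` characters of `text` around `position`.

    Assumption: the window is centered on the error position and is
    clipped at the start and end of the document.
    """
    half = length // 2
    start = max(0, position - half)
    end = min(len(text), start + length)
    # Flatten newlines so the context prints on a single line.
    return text[start:end].replace("\n", " ")

if __name__ == "__main__":
    doc = '<?xml version="1.0"?>\n<root><item>text</itm></root>\n'
    print(error_context(doc, doc.index("</itm>")))      # default length of 30
    print(error_context(doc, doc.index("</itm>"), 12))  # user-supplied length
```

The default of 30 mirrors the documented default context length, and the third argument plays the role of number_of_chars.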
For questions or comments, please contact Yuhui Jin (firstname.lastname@example.org) or Juan Fernando Arguello (email@example.com). Last modified May 27th, 2015.