HTML parsing library based on the WHATWG Web Applications 1.0 "HTML5"
specification. The parser is designed to work with all existing
flavors of HTML and implements well-defined error recovery that has been
specified through analysis of the behavior of modern desktop web browsers.
html5lib currently allows parsing to both a custom "simpletree" format
and to an ElementTree, if available. Future releases will include
support for at least one DOM implementation, and it is possible to
implement custom treebuilders, although the API should not yet be
considered stable.
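
As a rough illustration of parsing to an ElementTree, the sketch below
assumes a current html5lib release that exposes html5lib.parse() with the
treebuilder and namespaceHTMLElements keyword arguments; the API of this
first release may differ. The helper name parse_to_etree is hypothetical
and is not part of the library. If html5lib is not installed, the helper
simply returns None.

```python
import xml.etree.ElementTree as ET

try:
    import html5lib  # assumption: a current html5lib release is installed
except ImportError:
    html5lib = None


def parse_to_etree(markup):
    """Parse markup into an ElementTree element (hypothetical helper).

    Returns None when html5lib is not available.
    """
    if html5lib is None:
        return None
    # treebuilder="etree" selects the ElementTree tree format;
    # namespaceHTMLElements=False keeps tag names free of the XHTML namespace.
    return html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)


# Deliberately malformed input: no <html>, <head> or <body>, and unclosed tags.
root = parse_to_etree("<p>Unclosed paragraph<b>bold")
if root is not None:
    print(ET.tostring(root, encoding="unicode"))
```

The input is intentionally broken so that the error recovery described
above is visible in the serialized tree.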
This is the first release of html5lib and it is considered alpha-quality
software. However, it ships with over 230 passing unit tests covering
most of the specified behavior. Bugs should be reported on the issue
Error handling does not yet conform to the specification; not all errors
are reported and the error messages are not informative.
More information about the project, including documentation and details
on getting involved, is available on the project page: