Validator.nu HTML Parser 1.0.7 Released
There is now a new release of the Validator.nu HTML Parser. Change highlights:
- Adds optional support for heuristic encoding sniffing using the ICU4J sniffer, jchardet or both.
- Adds support for rewinding and reparsing when the parser becomes confident about the character encoding and the tentative encoding turns out to have been wrong.
- Performs encoding name matching per spec instead of using the JDK mechanism.
- Implements spec changes up until just before SVG and MathML support. (Those will merit 1.1 or something.)
- Warning: If you have your own token handler (unlikely), note that the semantics of the doctype token have changed.
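
To illustrate the rewind-and-reparse idea from the list above, here is a minimal, self-contained sketch. It is not the parser's actual code or API: the class `RewindSketch`, its `decode` method, and the simplified `<meta charset>` pattern are all hypothetical. It also resolves encoding names through the JDK's `Charset` lookup purely for brevity, which is exactly what this release stops doing in favor of spec-defined matching.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Validator.nu HTML Parser.
final class RewindSketch {

    // Simplified stand-in for the spec's meta prescan (assumption: the real
    // algorithm handles many more cases than this pattern does).
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s+charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_.:-]*)",
            Pattern.CASE_INSENSITIVE);

    static final class Result {
        final String text;       // decoded document text
        final Charset charset;   // encoding finally used
        final boolean reparsed;  // true if the input was rewound and decoded again

        Result(String text, Charset charset, boolean reparsed) {
            this.text = text;
            this.charset = charset;
            this.reparsed = reparsed;
        }
    }

    // Decode with the tentative encoding first. If the document declares a
    // different, supported encoding, rewind to the start of the buffered
    // bytes and decode again with the declared encoding.
    static Result decode(byte[] input, Charset tentative) {
        String firstPass = new String(input, tentative);
        Matcher m = META_CHARSET.matcher(firstPass);
        if (m.find()) {
            String label = m.group(1);
            // JDK lookup used for brevity; the release matches names per spec.
            if (Charset.isSupported(label)) {
                Charset declared = Charset.forName(label);
                if (!declared.equals(tentative)) {
                    // Tentative guess was wrong: reparse from the beginning.
                    return new Result(new String(input, declared), declared, true);
                }
            }
        }
        return new Result(firstPass, tentative, false);
    }
}
```

In this sketch the whole input is held in memory, so "rewinding" is just decoding the same byte array again. A streaming parser would instead need to buffer the bytes it has already consumed until it is confident about the encoding.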