Comparison of HTML parsers
Parsing HTML is an automated task performed by HTML parsers, which serve two main purposes:
- HTML traversal: offering an interface through which programmers can easily access and modify the HTML source. Canonical example: DOM parsers.
- HTML cleaning: fixing invalid HTML and improving the layout and indentation of the resulting markup. Canonical example: HTML Tidy.
- * Date of the latest release with significant changes.
- ** Sanitizes (generates standards-compliant web pages, reduces spam, etc.) and cleans (strips surplus presentational tags, removes XSS code, etc.) HTML code.
- *** Updates HTML 4.x to XHTML or HTML5, converting deprecated tags (e.g. CENTER) to valid ones (e.g. DIV with style="text-align:center;").
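The two purposes described above, traversal and cleaning, can be illustrated with a minimal sketch built on `html.parser` from the Python standard library. It is not the approach of any parser compared in this article: the class `CenterToDiv` and the function `convert` are hypothetical names introduced here for illustration. The sketch collects link targets while walking the markup (traversal) and rewrites the deprecated CENTER element as a DIV with `style="text-align:center;"` (cleaning). It assumes well-formed input and does not repair invalid HTML as HTML Tidy does; comments, doctypes and self-closing syntax are not preserved.

```python
from html import escape
from html.parser import HTMLParser


class CenterToDiv(HTMLParser):
    """Hypothetical helper: re-emit markup, replacing <center> with a styled <div>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the rewritten markup
        self.links = []  # href values collected during traversal

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <CENTER> and <center> are handled alike.
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)
        if tag == "center":
            # Cleaning step: deprecated element becomes a styled DIV
            # (any attributes on the original element are dropped).
            self.out.append('<div style="text-align:center;">')
            return
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append("</div>" if tag == "center" else f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; escape them again.
        self.out.append(escape(data, quote=False))


def convert(markup):
    """Return (rewritten markup, list of link targets) for an HTML string."""
    parser = CenterToDiv()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out), parser.links


html_out, links = convert('<CENTER><a href="x.html">Hi</a></CENTER>')
print(html_out)
print(links)
```

An event-driven parser such as this one keeps no document tree in memory; DOM parsers instead build the full tree first, which makes arbitrary modification easier at the cost of memory.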