[4suite] HTML -> XHTML
Nicolas Chauvat
nico at logilab.com
Fri Sep 29 11:17:06 MDT 2000
Hello,
I'm trying to turn any HTML web page into a nicely laid out XHTML tree
living in memory, then to do XPath queries on it.
-------------------
I've been trying to use html-tidy on the HTML file first, then parse it
using Sax2, but I get parse errors if I have something like
<SCRIPT>
if (1<2) document.write('<B>Hello</B>');
</SCRIPT>
in the HTML code. It looks like there is no solution except writing my
own parser inheriting from Sax2.
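A minimal sketch of the failure, assuming Python's standard xml.sax module
stands in for the Sax2 reader. The DOC string and the try_parse helper are
hypothetical names used only for illustration. The bare "<" inside the
script body is what an XML parser rejects:

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical stand-in for the tidied page; only the SCRIPT body matters.
DOC = b"""<html><body><script>
if (1<2) document.write('<B>Hello</B>');
</script></body></html>"""


def try_parse(data):
    """Return None if data is well-formed XML, else the parser's message."""
    try:
        # A do-nothing handler is enough: we only care about well-formedness.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None


# The "<" in "1<2" starts what the parser takes for a tag, so parsing fails.
error = try_parse(DOC)
```

Here error holds the parser's complaint instead of None, which matches the
parse errors described above.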
I tried to have a look at the xml.dom.html.* classes, but got lost
somewhere in the /usr/doc/4Suite-0.9.1/*/test_suite/* directories...
-----------------
What's the preferred way to do this?
--
Nicolas Chauvat
http://www.logilab.com - "Mais où est donc Ornicar ?" - LOGILAB, Paris (France)