[4suite] character encoding bug
Uche Ogbuji
uche.ogbuji at fourthought.com
Fri Sep 22 15:35:06 MDT 2000
Alexandre Fayolle wrote:
>
> We are experiencing severe problems with iso-8859-1 encoding.
>
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import Print
> text="""<?xml version='1.0' encoding='iso-8859-1'?>
> <element>?????</element>"""
>
> d=Sax2.FromXml(text)
> Print(d)
>
> gives:
> <element> éèêëïîöôùü</element>
If you don't specify an output encoding, you get UTF-8. Let me see if
that's documented. Hmm. Not only is it not documented, but it looks as
if we missed a bit: there is currently no way to ask for another output
encoding.
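To see why default UTF-8 output can look garbled, here is a minimal Python 3 sketch (standard library only, not 4Suite). It shows what happens when UTF-8 bytes are displayed by a terminal or editor that assumes ISO-8859-1: each accented character is two bytes in UTF-8 and so shows up as two Latin-1 characters. The three sample characters are an illustrative choice.

```python
text = "\u00e9\u00e8\u00ea"  # the characters é, è, ê

# What a serializer writes when the output encoding is UTF-8:
# each of these characters takes two bytes.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 3 6

# What a terminal or editor assuming ISO-8859-1 displays for those bytes.
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(as_latin1 == "Ã©Ã¨Ãª")  # True

# Decoding with the matching codec recovers the original text.
print(utf8_bytes.decode("utf-8") == text)  # True
```

Nothing is lost in the UTF-8 output; it only looks wrong when read with the wrong codec.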
OK. In the coming pre-release, you'll be able to specify your output
encoding:
>>> from xml.dom.ext.reader import Sax2
>>> from xml.dom.ext import Print
>>> text="""<?xml version='1.0' encoding='iso-8859-1'?>
... <element>?????</element>"""
>>>
>>> d=Sax2.FromXml(text)
>>> Print(d)
<element>à éèêëïîöôùü</element>
>>> Print(d, encoding='ISO-8859-1')
<element>?????</element>
>>>
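For readers on a current Python, the same choice of output encoding can be sketched with the standard library's `xml.dom.minidom` rather than 4Suite's `Print`. This is an analogue under that assumption, not the 4Suite API: `toxml(encoding=...)` returns bytes in the requested encoding. The sample characters are illustrative.

```python
from xml.dom import minidom

# Same kind of input as the transcript: declared *and* encoded as ISO-8859-1.
source = "<?xml version='1.0' encoding='iso-8859-1'?>\n<element>\u00e9\u00e8\u00ea</element>"
doc = minidom.parseString(source.encode("iso-8859-1"))
root = doc.documentElement

# With an explicit encoding, toxml() returns bytes in that encoding.
utf8_out = root.toxml(encoding="utf-8")
latin1_out = root.toxml(encoding="iso-8859-1")

print(utf8_out)    # b'<element>\xc3\xa9\xc3\xa8\xc3\xaa</element>'
print(latin1_out)  # b'<element>\xe9\xe8\xea</element>'
```

Calling `toxml(encoding=...)` on the Document node instead of the element also writes an XML declaration naming the chosen encoding.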
Note that we can't do this automatically for you because, as we mentioned
in a recent message, SAX does not pass us the input encoding information.
We always get UTF-8.
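The SAX limitation can be checked with Python 3's standard `xml.sax`. This is a modern stand-in: in the PyXML/4Suite stack of the time, strings arrived as UTF-8 bytes rather than Unicode. A `ContentHandler` receives already-decoded character data and has no callback for the XML declaration, so two documents that declare different encodings look identical to it. `TextCollector` and `collected_text` are hypothetical names used only for this sketch.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects character data; ContentHandler has no XML-declaration callback."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content is already decoded text; the source encoding is not passed in.
        self.chunks.append(content)


def collected_text(raw: bytes) -> str:
    handler = TextCollector()
    xml.sax.parseString(raw, handler)
    return "".join(handler.chunks)


body = "<element>\u00e9\u00e8\u00ea</element>"
latin1_doc = ("<?xml version='1.0' encoding='iso-8859-1'?>" + body).encode("iso-8859-1")
utf8_doc = ("<?xml version='1.0' encoding='utf-8'?>" + body).encode("utf-8")

# Both inputs yield identical text, so the handler cannot tell
# which encoding the source document declared.
print(collected_text(latin1_doc) == collected_text(utf8_doc))  # True
```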
> Funny thing is that when outputting HTML with an XSL transformation,
> everything gets escaped, and 'é', for instance, becomes an escaped reference.
Hmm. I can't reproduce this. Try running ce_20000527.py in
4XSLT/test_suite/borrowed under the doc directory (it also uses e-acute)
and let me know what output you get. Any differences you find between
that test and the case that fails will be helpful.
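For reference, "escaping" a non-ASCII character in XML or HTML output usually means writing it as a numeric character reference. The sketch below uses Python 3's `xmlcharrefreplace` error handler, which a serializer can fall back on when the target encoding cannot represent a character; it is not 4XSLT's serializer, and whether 4XSLT's HTML output behaves this way is an assumption. The sample word is illustrative.

```python
import html

text = "caf\u00e9"  # "café"

# ASCII cannot represent é, so it is written as a numeric character reference.
escaped = text.encode("ascii", errors="xmlcharrefreplace")
print(escaped)  # b'caf&#233;'

# ISO-8859-1 can represent é directly, so no escaping is needed.
print(text.encode("iso-8859-1", errors="xmlcharrefreplace"))  # b'caf\xe9'

# An XML or HTML parser turns the reference back into the same character.
print(html.unescape(escaped.decode("ascii")) == text)  # True
```

Either form displays the same character in a browser; the reference just survives transports that are not 8-bit clean.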
Thanks.
--
Uche Ogbuji Principal Consultant
uche.ogbuji at fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python