[4suite] character encoding bug

Uche Ogbuji uche.ogbuji at fourthought.com
Fri Sep 22 15:35:06 MDT 2000


Alexandre Fayolle wrote:
> 
> We are experiencing severe problems with iso-8859-1 encoding.
> 
> from xml.dom.ext.reader import Sax2
> from xml.dom.ext import Print
> text="""<?xml version='1.0' encoding='iso-8859-1'?>
> <element>àéèêëïîöôùü</element>"""
> 
> d=Sax2.FromXml(text)
> Print(d)
> 
> gives:
> <element>Ã Ã©Ã¨ÃªÃ«Ã¯Ã®Ã¶Ã´Ã¹Ã¼</element>


If you don't specify an output encoding, you get UTF-8.  Let me see if
that's documented.  Hmm.  Not only is it not documented, but it looks as
if we missed a bit: there is no way yet to ask Print for a different
encoding.
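The symptom in the quoted example is what UTF-8 bytes look like when something reads them as ISO-8859-1.  A minimal sketch in modern Python 3 (an illustration using only the standard library, not 4Suite code) reproduces it for the same characters:

```python
# Sketch, Python 3 standard library only (not 4Suite): what UTF-8 output
# looks like when a terminal or editor reads it as ISO-8859-1 (Latin-1).
original = "àéèêëïîöôùü"

utf8_bytes = original.encode("utf-8")        # the default output encoding
misread = utf8_bytes.decode("iso-8859-1")    # each byte becomes one character

print(len(original), len(utf8_bytes))  # 11 22: every accented letter is 2 bytes in UTF-8
print(misread)                         # 'Ã' followed by another character, per letter

# Decoding with the codec that actually produced the bytes recovers the text.
assert utf8_bytes.decode("utf-8") == original
```

Every letter turns into 'Ã' plus a second character because all of them lie in the U+00C0 to U+00FF range, whose two-byte UTF-8 encodings all start with 0xC3, and 0xC3 is 'Ã' in ISO-8859-1.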

OK.  In the coming pre-release, you'll be able to specify your output
encoding:


>>> from xml.dom.ext.reader import Sax2
>>> from xml.dom.ext import Print
>>> text="""<?xml version='1.0' encoding='iso-8859-1'?>
... <element>àéèêëïîöôùü</element>"""
>>> 
>>> d=Sax2.FromXml(text)
>>> Print(d)
<element>Ã Ã©Ã¨ÃªÃ«Ã¯Ã®Ã¶Ã´Ã¹Ã¼</element>>>> 
>>> Print(d, encoding='ISO-8859-1')
<element>àéèêëïîöôùü</element>>>> 
>>> 
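For readers on a current Python, a rough analogy (not the 4Suite API) is the standard library's xml.dom.minidom, whose toxml() accepts an encoding argument and then returns bytes.  The sketch assumes the same document as above, stored on disk as ISO-8859-1:

```python
from xml.dom import minidom

# The example document as raw ISO-8859-1 bytes, matching its declaration.
source = (
    "<?xml version='1.0' encoding='iso-8859-1'?>\n"
    "<element>àéèêëïîöôùü</element>"
).encode("iso-8859-1")

doc = minidom.parseString(source)   # expat decodes the bytes internally
element = doc.documentElement

# Same tree, two output encodings; the caller has to pick one explicitly.
utf8_out = element.toxml(encoding="utf-8")         # analogous to Print(d)
latin1_out = element.toxml(encoding="iso-8859-1")  # analogous to Print(d, encoding='ISO-8859-1')

print(utf8_out)
print(latin1_out)
```

Both byte strings hold the same characters; only the second one displays correctly in an ISO-8859-1 terminal.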


Note that we can't do this automatically for you because, as we mentioned
in a recent message, SAX doesn't pass the input encoding on to us.
We always get UTF-8.
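The same limitation is visible in Python 3's own xml.sax: no ContentHandler callback reports the encoding declared in the XML header, so the handler only ever sees decoded text.  A minimal sketch, where TextCollector and the choice of output_encoding are hypothetical and stand in for whatever the application decides:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect character data; no callback here ever receives 'iso-8859-1'."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # In Python 3, `content` is an already-decoded str.
        self.chunks.append(content)


source = (
    "<?xml version='1.0' encoding='iso-8859-1'?>\n"
    "<element>àéèêëïîöôùü</element>"
).encode("iso-8859-1")

handler = TextCollector()
xml.sax.parseString(source, handler)
text = "".join(handler.chunks)

# The declared input encoding was consumed by the parser, so the output
# encoding has to come from the caller (hypothetical application setting).
output_encoding = "iso-8859-1"
encoded = text.encode(output_encoding)
print(encoded)
```

This is why an explicit encoding argument on the output side is the practical fix: the information simply isn't available downstream of the parser.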

> Funny thing is that when outputting HTML with an XSL transformation,
> everything gets escaped, and 'é' becomes '&Atilde;&copy;', for instance.

Hmm.  I can't reproduce this.  Try running ce_20000527.py in the doc
directory's 4XSLT/test_suite/borrowed subdirectory (it also uses e-acute)
and let me know what you see.  Any difference you can spot between that
test and your failing case will help.
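One hypothesis worth checking, not a diagnosis: '&Atilde;&copy;' is exactly what results if the two UTF-8 bytes of 'é' are read as two ISO-8859-1 characters and each is then escaped as an HTML named entity.  The sketch below uses Python 3's html.entities table; escape_non_ascii is a hypothetical helper, not part of 4XSLT:

```python
from html.entities import codepoint2name


def escape_non_ascii(text):
    """Replace each non-ASCII character with its HTML named entity, else a numeric reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


# Correct path: 'é' is treated as a single character.
print(escape_non_ascii("é"))        # &eacute;

# Suspected path: UTF-8 bytes of 'é' misread as two Latin-1 characters.
misread = "é".encode("utf-8").decode("iso-8859-1")
print(escape_non_ascii(misread))    # &Atilde;&copy;
```

If your output matches the second line, the text is being decoded as ISO-8859-1 somewhere between the parser and the HTML writer.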

Thanks.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji at fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


