[4suite] Printers (cued)

Uche Ogbuji uche.ogbuji at fourthought.com
Tue Sep 19 13:15:19 MDT 2000


Alexandre Fayolle wrote:

> We have another problem with printers, but maybe it's something much
> deeper and parsers are involved. We know that Sax2.FromXml eats XML
> comments. This is a bit annoying since a FromXml followed by a PrettyPrint
> will lose comments in a file. However, we can cope with it, because the
> real XML content of the file is somehow preserved. Once you know that, you
> can take care of what you do and avoid comments in 'sensible' files.

Unfortunately, there's not much we can do about this.  Pyexpat does not
provide comments.  I think there was some debate about this on the
xml-sig, but I forget the outcome.
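
For readers who want to see the effect in isolation, here is a minimal
sketch. It uses the xml.sax module from the current Python standard
library as a stand-in for Sax2.FromXml (an assumption; this is not 4Suite
code), and the TextCollector class and the sample document are
hypothetical. A plain ContentHandler has no callback for comments, so a
tree built from these events alone has no comment to print back.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records every event it is given."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


# Hypothetical sample document containing one comment.
SOURCE = b"<doc><!-- a note --><item>value</item></doc>"

collector = TextCollector()
xml.sax.parseString(SOURCE, collector)

# The comment text never reaches the content handler.
comment_seen = any("a note" in value for _, value in collector.events)
print(collector.events)
print("comment reported:", comment_seen)
```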

> Something more annoying are the xml header and doctype of the file.

The lack of an XML declaration is a problem with the DOM itself.  Here,
for instance, is an excerpt from a message by Joe Kesselman, a DOM WG
member:

"The question of how to handle the XML Declaration was also passed off to
Level 3; there isn't agreement yet re whether it's a separate node, a set
of additional fields on the Document node, or something else entirely. I
_think_ the "Load/Save" chapter has that on their issues list, if not,
Level 3 Core should address it."

And in fact, the L3 working draft, which emerged just a few weeks ago,
does address this, in the form of additional attributes on the Document
node, but we don't plan to implement L3 until at least the next release.
I guess in the interim one could define a special node type that holds
the data and hack Printer.py to make it emit the declaration, but I must
be frank: I'm not sure we'll be the ones to do this.  It will be enough
work just dealing with DOM L3.

As for the doctype, you're right: that should be printed out.  I'm
working on that now.
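
One way to tell whether a doctype is missing from the tree or merely
skipped by the printer is to look at the Document's doctype attribute
before serializing.  The sketch below does this with xml.dom.minidom from
the current Python standard library, used here only as a stand-in for
4DOM (an assumption); the ADDRBOOK root name and the addr_book.dtd system
identifier are hypothetical.

```python
from xml.dom import minidom

# Hypothetical document with a doctype declaration.
SOURCE = '<!DOCTYPE ADDRBOOK SYSTEM "addr_book.dtd"><ADDRBOOK/>'

doc = minidom.parseString(SOURCE)

# Step 1: is the doctype present in the DOM tree at all?
doctype = doc.doctype
in_tree = doctype is not None

# Step 2: does the serializer write it back out?
serialized = doc.toxml()
in_output = "<!DOCTYPE" in serialized

print("doctype in tree:  ", in_tree)
print("doctype in output:", in_output)
```

If the first check succeeds and the second fails, the parser is fine and
the fix belongs in the printer.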

> One of
> my co-workers signaled me that when a file with a doctype is FromXML'ed
> and Print'ed back, the doctype declaration is lost. I have reproduced it,
> but I have not had time to investigate further to know if the doctype is
> present in the DOM tree but not Printed, or if the Sax parser garbles it
> as it does with comments. This is a more serious issue, for which we shall
> have to find a workaround, probably by writing a wrapper around
> (Pretty)Print.
> 
> Another minor issue has to do with (Pretty)Printing document fragments:
> >>> PrettyPrint(d.documentElement.childNodes)
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
>   File "/home/alf/xmlSig/repository/xml/xml/dom/ext/__init__.py", line
> 206, in PrettyPrint
>     if root.ownerDocument.isHtml():
>   File "/home/alf/xmlSig/repository/xml/xml/dom/NodeList.py", line 109, in
> __getattr__
>     return getattr(NodeList, name)
> AttributeError: ownerDocument

childNodes returns a node-list, not a document fragment.  Printer is
only designed to print nodes (including docfrags).  To print a nodelist,
just do:

    print d.documentElement.childNodes
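
The distinction between a node and a node-list can be sketched outside
4Suite as well.  The example below uses xml.dom.minidom from the current
Python standard library as a stand-in (an assumption; 4DOM's NodeList is
a different class), with a hypothetical two-entry document.  The list
itself has no ownerDocument, which is the AttributeError in the traceback
above, while each node inside it does and can be serialized one by one.

```python
from xml.dom import minidom

# Hypothetical sample document.
doc = minidom.parseString("<BOOK><ENTRY ID='pa'/><ENTRY ID='en'/></BOOK>")
children = doc.documentElement.childNodes

# A node-list is a container, not a node: it has no ownerDocument.
list_has_owner = hasattr(children, "ownerDocument")

# Every node in the list does know its owner document.
nodes_have_owner = all(n.ownerDocument is doc for n in children)

# Serialize the list by serializing each node in turn.
serialized = "".join(n.toxml() for n in children)

print(list_has_owner, nodes_have_owner)
print(serialized)
```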

Now, you can also do:

    d = Sax2.FromXml(source_1)
    df = d.createDocumentFragment()
    for n in d.documentElement.childNodes:
        df.appendChild(n.cloneNode(1))
    if len(df.childNodes) != len(d.documentElement.childNodes):
        tester.error('Docfrag append error')
    if df.childNodes.length != d.documentElement.childNodes.length:
        tester.error('Docfrag append error')
    stream = cStringIO.StringIO()
    PrettyPrint(df, stream=stream)
    result = stream.getvalue()

and result should be:

        <ENTRY ID='pa'>
                <NAME>Pieter Aaron</NAME>
                <ADDRESS>404 Error Way</ADDRESS>
                <PHONENUM DESC='Work'>404-555-1234</PHONENUM>
                <PHONENUM DESC='Fax'>404-555-4321</PHONENUM>
                <PHONENUM DESC='Pager'>404-555-5555</PHONENUM>
                <EMAIL>pieter.aaron at inter.net</EMAIL>
        </ENTRY>
        <ENTRY-LINK xmlns:xlink='http://www.w3.org/XML/XLink/0.9' xlink:href='addr_book2.xml' xlink:link='simple'/>
        <ENTRY ID='en'>
                <NAME>Emeka Ndubuisi</NAME>
                <ADDRESS>42 Spam Blvd</ADDRESS>
                <PHONENUM DESC='Work'>767-555-7676</PHONENUM>
                <PHONENUM DESC='Fax'>767-555-7642</PHONENUM>
                <PHONENUM DESC='Pager'>800-SKY-PAGEx767676</PHONENUM>
                <EMAIL>endubuisi at spamtron.com</EMAIL>
        </ENTRY>
        <ENTRY ID='vz'>
                <NAME>Vasia Zhugenev</NAME>
                <ADDRESS>2000 Disaster Plaza</ADDRESS>
                <PHONENUM DESC='Work'>000-987-6543</PHONENUM>
                <PHONENUM DESC='Cell'>000-000-0000</PHONENUM>
                <EMAIL>vxz at magog.ru</EMAIL>
        </ENTRY>


> However, in PrintVisitor, a visitNodeList() method is provided. It's just
> forgotten in visit()

visitNodeList is not meant to be called as the top-level invocation to
Print.  I suppose the docs are lacking here.

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji at fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


