[4suite] Printers (cued)
Uche Ogbuji
uche.ogbuji at fourthought.com
Tue Sep 19 13:15:19 MDT 2000
Alexandre Fayolle wrote:
> We have another problem with printers, but maybe it's something much
> deeper and parsers are involved. We know that Sax2.FromXml eats XML
> comments. This is a bit annoying since a FromXml followed by a PrettyPrint
> will lose comments in a file. However, we can cope with it, because the
> real XML content of the file is somehow preserved. Once you know that, you
> can take care of what you do and avoid comments in 'sensible' files.
Unfortunately, there's not much we can do about this. Pyexpat does not
provide comments. I think there was some debate about this on the
xml-sig, but I forget the outcome.
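
To check whether a given reader keeps or drops comments, one option is to count the comment nodes in the tree it builds. Here is a minimal sketch of that check. It uses the standard library's xml.dom.minidom purely as a stand-in parser (an assumption; it is not Sax2.FromXml and may behave differently), and count_comments is a hypothetical helper name.

```python
from xml.dom import Node, minidom


def count_comments(node):
    """Return the number of comment nodes at or below `node`."""
    total = 1 if node.nodeType == Node.COMMENT_NODE else 0
    for child in node.childNodes:
        total += count_comments(child)
    return total


# minidom is only a stand-in here; substitute the reader you want to check.
doc = minidom.parseString("<a><!-- keep me --><b/><!-- me too --></a>")
print(count_comments(doc))
```

A count of zero for a source that does contain comments would tell you the reader is discarding them before they ever reach the DOM.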
> Something more annoying are the xml header and doctype of the file.
The lack of an XML declaration is a problem with the DOM. Here, for
instance, is an excerpt from a message by Joe Kesselman, a DOM WG member.
"The question of how to handle the XML Declaration was also passed off to
Level 3; there isn't agreement yet re whether it's a separate node, a set
of additional fields on the Document node, or something else entirely. I
_think_ the "Load/Save" chapter has that on their issues list, if not,
Level 3 Core should address it."
And in fact, the L3 working draft, which emerged just a few weeks ago,
does address this, in the form of additional attributes on Document, but
we don't plan to implement L3 until at least the next release. I guess
in the interim one could define a special node type that holds the data
and hack Printer.py to make it return the declaration, but I must be
frank: I'm not sure we'll be the ones to do this. It will be enough
work just dealing with DOM L3.
As for the doctype, you're right: that should be printed out. I'm
working on that now.
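
In the meantime, a wrapper of the kind you mention could write the prolog itself and then hand the root element to the printer. The following is only a rough sketch under stated assumptions: it uses the standard library's xml.dom.minidom as a stand-in for the 4DOM reader and printer, print_with_prolog is a hypothetical name, the encoding is supplied by the caller because the DOM does not record it, and any internal DTD subset is ignored.

```python
import io
from xml.dom import minidom


def print_with_prolog(doc, stream, encoding="UTF-8"):
    """Write an XML declaration and doctype, then the root element."""
    # The declaration is not stored in the DOM, so it is rebuilt by hand.
    stream.write('<?xml version="1.0" encoding="%s"?>\n' % encoding)
    dt = doc.doctype
    if dt is not None:
        if dt.publicId:
            ext_id = ' PUBLIC "%s" "%s"' % (dt.publicId, dt.systemId)
        elif dt.systemId:
            ext_id = ' SYSTEM "%s"' % dt.systemId
        else:
            ext_id = ""
        # Note: an internal subset, if any, is not reproduced here.
        stream.write("<!DOCTYPE %s%s>\n" % (dt.name, ext_id))
    # Stand-in for calling the real printer on the document element.
    stream.write(doc.documentElement.toxml())
    stream.write("\n")


# Made-up sample document, for illustration only.
source = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc><item/></doc>'
out = io.StringIO()
print_with_prolog(minidom.parseString(source), out)
print(out.getvalue())
```

This only helps if the reader actually keeps the doctype in the tree; if doc.doctype comes back empty, the wrapper would have to be given the doctype data from somewhere else.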
> One of
> my co-workers signaled me that when a file with a doctype is FromXML'ed
> and Print'ed back, the doctype declaration is lost. I have reproduced it,
> but I have not had time to investigate further to know if the doctype is
> present in the DOM tree but not Printed, or if the Sax parser garbles it
> as it does with comments. This is a more serious issue, for which we shall
> have to find a workaround, probably by writing a wrapper around
> (Pretty)Print.
>
> Another minor issue has to do with (Pretty)Printing document fragments:
> >>> PrettyPrint(d.documentElement.childNodes)
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
>   File "/home/alf/xmlSig/repository/xml/xml/dom/ext/__init__.py", line 206, in PrettyPrint
>     if root.ownerDocument.isHtml():
>   File "/home/alf/xmlSig/repository/xml/xml/dom/NodeList.py", line 109, in __getattr__
>     return getattr(NodeList, name)
> AttributeError: ownerDocument
childNodes returns a node-list, not a document fragment. Printer is
only designed to print nodes (including docfrags). To print a nodelist,
just do:
    print d.documentElement.childNodes
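Now, you can also do:
    import cStringIO
    from xml.dom.ext import PrettyPrint
    from xml.dom.ext.reader import Sax2  # assumed import path

    # source_1 (the XML text) and tester (an error reporter) are
    # defined elsewhere.
    d = Sax2.FromXml(source_1)
    df = d.createDocumentFragment()
    # Deep-copy each child of the root element into the fragment.
    for n in d.documentElement.childNodes:
        df.appendChild(n.cloneNode(1))
    if len(df.childNodes) != len(d.documentElement.childNodes):
        tester.error('Docfrag append error')
    # Same check again, through the DOM length attribute.
    if df.childNodes.length != d.documentElement.childNodes.length:
        tester.error('Docfrag append error')
    stream = cStringIO.StringIO()
    PrettyPrint(df, stream=stream)
    result = stream.getvalue()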
and result should be:
<ENTRY ID='pa'>
<NAME>Pieter Aaron</NAME>
<ADDRESS>404 Error Way</ADDRESS>
<PHONENUM DESC='Work'>404-555-1234</PHONENUM>
<PHONENUM DESC='Fax'>404-555-4321</PHONENUM>
<PHONENUM DESC='Pager'>404-555-5555</PHONENUM>
<EMAIL>pieter.aaron at inter.net</EMAIL>
</ENTRY>
<ENTRY-LINK xmlns:xlink='http://www.w3.org/XML/XLink/0.9'
xlink:href='addr_book2.xml' xlink:link='simple'/>
<ENTRY ID='en'>
<NAME>Emeka Ndubuisi</NAME>
<ADDRESS>42 Spam Blvd</ADDRESS>
<PHONENUM DESC='Work'>767-555-7676</PHONENUM>
<PHONENUM DESC='Fax'>767-555-7642</PHONENUM>
<PHONENUM DESC='Pager'>800-SKY-PAGEx767676</PHONENUM>
<EMAIL>endubuisi at spamtron.com</EMAIL>
</ENTRY>
<ENTRY ID='vz'>
<NAME>Vasia Zhugenev</NAME>
<ADDRESS>2000 Disaster Plaza</ADDRESS>
<PHONENUM DESC='Work'>000-987-6543</PHONENUM>
<PHONENUM DESC='Cell'>000-000-0000</PHONENUM>
<EMAIL>vxz at magog.ru</EMAIL>
</ENTRY>
> However, in PrintVisitor, a visitNodeList() method is provided. It's just
> forgotten in visit()
visitNodeList is not meant to be called as the top-level invocation to
Print. I suppose the docs are lacking here.
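
If you would rather have a single call that takes a node list, a thin helper that loops over the list and prints each node in turn is probably the simplest route. This is only a sketch: it uses the standard library's xml.dom.minidom and its writexml method as a stand-in for a call such as PrettyPrint(node, stream=stream), and print_nodes is a hypothetical name, not part of 4DOM.

```python
import io
from xml.dom import minidom


def print_nodes(nodes, stream):
    """Write each node of a node list to `stream`, one after another."""
    for node in nodes:
        # Stand-in for handing each node to the real printer.
        node.writexml(stream)


doc = minidom.parseString("<doc><a>1</a><b/></doc>")
out = io.StringIO()
print_nodes(doc.documentElement.childNodes, out)
print(out.getvalue())
```

Keeping the loop outside the printer means the printer itself still only ever sees nodes, which is what it was designed for.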
--
Uche Ogbuji Principal Consultant
uche.ogbuji at fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python