[4suite] amara: Writing drops DOCTYPE
Uche Ogbuji
uche at ogbuji.net
Tue Aug 21 11:40:36 MDT 2007
Brendon Costa wrote:
> This is a question about amara (
> http://uche.ogbuji.net/tech/4suite/amara/ ), which said to send
> questions to this list. Anyhow, i have been using amara for a while now
> in my own small python script that is given an (docbook) XML document
> and will update certain elements in it. It does this by looking for
> elements with certain id's and then runs some commands from which the
> stdout and stderr of the command are then used as the value of the xml
> element.
>
> Anyhow just running some simple code:
>
> doc = amara.parse(sys.argv[1])
> print doc.xml()
>
> and feeding the script a refentry docbook xml file, i find that the
> <!DOCTYPE element at the top of the xml file is lost in the resulting
> output.
>
> An example for the start of a docbook refentry looks like follows:
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
> <refentry id="edoc">
> ....
> </refentry>
>
>
> After running through amara, i get:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <refentry id="edoc">
> ...
> </refentry>
>
> Is there anything i am missing or is this a bug in amara?
>
No, it's not a bug. It's pretty standard XML behavior, coming from
the fact that most XML systems work on some data model of the XML
rather than on the literal character stream.
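
To see the same effect outside Amara, here is a minimal sketch using the Python standard library's xml.etree.ElementTree (not Amara's API, and written for Python 3). The names SOURCE, root and output are just for illustration. The parser builds a tree from the document, and serializing that tree gives back only what the tree holds, so the doctype does not come out the other side:

```python
import xml.etree.ElementTree as ET

# The same DocBook refentry as in the question, as bytes so the
# encoding declaration is honored by the parser.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id="edoc">
....
</refentry>"""

# Parsing builds an element tree (the data model); the doctype
# declaration is not kept as part of that tree.
root = ET.fromstring(SOURCE)

# Serializing walks the tree, so only elements, attributes and text
# are written back out.
output = ET.tostring(root, encoding="unicode")

has_doctype_in = b"<!DOCTYPE" in SOURCE
has_doctype_out = "<!DOCTYPE" in output
print(has_doctype_in, has_doctype_out)
```

The element and its id attribute survive the round trip; the doctype is the part that has to be supplied again at serialization time.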
Doctypes are what we call a serialization concern, and we do support
serialization control. You can tell Amara to emit the doctypes as follows:
>>> import amara
>>> XML = """<?xml version="1.0" encoding="UTF-8"?>
... <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
... "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
... <refentry id="edoc">
... ....
... </refentry>"""
>>> doc = amara.parse(XML)
>>> print doc.xml(doctypeSystem=u"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
...               doctypePublic=u"-//OASIS//DTD DocBook XML V4.1.2//EN",
...               indent=u"yes")
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id="edoc">
....
</refentry>
>>>
But I did find a bug. The above violates DRY. You should be able to do:
print doc.xml(doctypeSystem=doc.xml_sysid, doctypePublic=doc.xml_pubid,
              indent=u"yes")
But Amara is not setting up those data members correctly. I'll look into
that.
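
Until doc.xml_sysid and doc.xml_pubid are populated, one possible workaround is to read the identifiers from the source document yourself instead of typing them twice. This is only a sketch, using the standard library's xml.parsers.expat (Python 3) rather than anything in Amara; read_doctype and SOURCE are hypothetical names I made up for the example:

```python
import xml.parsers.expat

SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id="edoc">
....
</refentry>"""


def read_doctype(xml_bytes):
    """Return the doctype name, public ID and system ID of a document.

    Hypothetical helper: returns an empty dict if the document has no
    doctype declaration.
    """
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once, when it reaches the <!DOCTYPE ...> line.
        found.update(name=name, system_id=system_id, public_id=public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return found


ids = read_doctype(SOURCE)
print(ids["public_id"])
print(ids["system_id"])
```

The two values could then be passed as doctypePublic and doctypeSystem in the doc.xml() call shown earlier, so the identifiers are written in only one place.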
--
Uche Ogbuji http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/