[4suite] Announcing PyXPath 1.0

Martin v. Loewis martin at loewis.home.cs.tu-berlin.de
Mon Dec 11 16:24:52 MST 2000


After recent discussions on removing lex and yacc from 4XPath, I got
interested in writing an XPath parser in 100% pure Python, using
available parser generators.

The first result of this research is attached below. It hasn't been
tested much, but it does recognize the LocationPath expressions that
are given as examples in the XPath spec.

The parser is based on YAPPS. Since YAPPS generates LL(1) parsers, I
had to rewrite parts of the XPath grammar to make it LL(1).
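
To illustrate the kind of rewriting involved (this is a sketch, not the
grammar from the attachment): the XPath spec defines OrExpr
left-recursively, which an LL(1) parser cannot handle directly. The
usual fix is to turn the recursion into iteration. The hand-written
Python below shows the rewritten shape; the function names and the
token-list representation are hypothetical, operands are reduced to
plain names, and the tokens are assumed to be already classified.

```python
# Hypothetical illustration: left recursion rewritten as iteration.
# Spec form (left-recursive, not LL(1)):
#     OrExpr  ::= AndExpr | OrExpr 'or' AndExpr
# Rewritten form (LL(1)):
#     OrExpr  ::= AndExpr ('or' AndExpr)*

def parse_or_expr(tokens, pos=0):
    """Parse AndExpr ('or' AndExpr)*; return (tree, next_pos)."""
    left, pos = parse_and_expr(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "or":
        right, pos = parse_and_expr(tokens, pos + 1)
        left = ("or", left, right)  # keeps 'or' left-associative
    return left, pos

def parse_and_expr(tokens, pos):
    """Parse Operand ('and' Operand)*; return (tree, next_pos)."""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "and":
        right, pos = parse_operand(tokens, pos + 1)
        left = ("and", left, right)
    return left, pos

def parse_operand(tokens, pos):
    """Stand-in for the rest of the expression grammar: one name."""
    if pos >= len(tokens) or tokens[pos] in ("or", "and"):
        raise SyntaxError("operand expected at position %d" % pos)
    return tokens[pos], pos + 1
```

Building the tree inside the loop preserves the left-associative
structure that the original left-recursive rule expresses.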

I found that the generated scanner class of YAPPS is not usable for
XPath: there are a number of context-sensitive aspects in the XPath
lexical structure that make the straightforward longest-match approach
of YAPPS unsuitable.

In particular, a regex lexer cannot distinguish between an NCName and
a FunctionName, and may decide to return an OperatorName in places
where it shouldn't. I tried resolving the former problem by only
having NCName as a token, but that caused a conflict in the LL(1)
parsing algorithm, which could not tell whether an expression was
going to be a FunctionCall (that would require looking ahead to the
LPAREN).
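
For reference, the disambiguation rules in section 3.7 of the XPath 1.0
spec can be sketched as a scanner that looks at the previous token and
at the text following a name. This is a minimal sketch, not the scanner
from the attachment: the names TOKEN_RE and tokenize are hypothetical,
NCName is approximated as an ASCII name, and prefixed names and
variable references are left out.

```python
import re

# Hypothetical sketch of the XPath 1.0 lexical rules (spec section 3.7).
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<NUMBER>\d+(?:\.\d*)?|\.\d+)
      | (?P<LITERAL>"[^"]*"|'[^']*')
      | (?P<PUNCT>::|\.\.|//|!=|<=|>=|[()\[\]@,./|+\-=<>*])
      | (?P<NAME>[A-Za-z_][A-Za-z0-9_.\-]*)
    )""", re.VERBOSE)

OPERATOR_NAMES = {"and", "or", "mod", "div"}
NODE_TYPES = {"comment", "text", "processing-instruction", "node"}
OPERATOR_PUNCT = {"/", "//", "|", "+", "-", "=", "!=", "<", "<=", ">", ">="}
# Directly after one of these, a name or "*" is never an operator.
NO_OPERATOR_AFTER = {"@", "::", "(", "[", ","}

def tokenize(expr):
    """Return a list of (kind, text) pairs for an XPath expression."""
    expr = expr.strip()
    tokens = []
    pos = 0
    prev_kind = prev_text = None
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise SyntaxError("unexpected input at offset %d" % pos)
        group, text = m.lastgroup, m.group(m.lastgroup)
        pos = m.end()
        rest = expr[pos:].lstrip()  # lookahead past whitespace
        # Rule 1: there is a preceding token and it does not rule out
        # an operator in this position.
        operator_allowed = (prev_kind is not None
                            and prev_kind != "Operator"
                            and prev_text not in NO_OPERATOR_AFTER)
        if group == "NAME":
            if operator_allowed:
                if text not in OPERATOR_NAMES:
                    raise SyntaxError("operator expected, got %r" % text)
                kind = "Operator"
            elif rest.startswith("::"):
                kind = "AxisName"        # rule 3
            elif rest.startswith("("):   # rule 2
                kind = "NodeType" if text in NODE_TYPES else "FunctionName"
            else:
                kind = "NameTest"
        elif group == "PUNCT":
            if text == "*":
                kind = "Operator" if operator_allowed else "NameTest"
            elif text in OPERATOR_PUNCT:
                kind = "Operator"
            else:
                kind = "Punct"
        else:
            kind = group.capitalize()    # "Number" or "Literal"
        tokens.append((kind, text))
        prev_kind, prev_text = kind, text
    return tokens
```

With this sketch, tokenize("div div div") classifies the first and
third "div" as NameTest and the middle one as Operator, which a plain
longest-match regex scanner cannot do. Because the scanner itself
decides between FunctionName and NameTest, the parser no longer needs
to look ahead to the LPAREN.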

I haven't done any performance measurements with this grammar
yet. Also, it returns some ad-hoc data structure as the parse tree. If
there is interest, I will try to have it generate 4XPath data
structures; I'd probably need help from a 4Suite expert here.

I have tested that the parser can handle a Unicode string. The
definition of an NCName needs further work, since it does not yet
reflect the set of characters that count as letters in XML (or
whatever else is allowed in NCNames).
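
As a rough stopgap, an NCName can be approximated with the Unicode-aware
character classes of Python's re module. This is only a sketch, written
for a current Python where str patterns are Unicode by default; the
names NCNAME_RE and is_ncname are hypothetical, and the pattern does not
reproduce the Letter, Digit, CombiningChar, and Extender classes of the
XML spec exactly.

```python
import re

# Approximation only: a letter-like character or underscore, followed
# by any number of word characters, dots, or hyphens. No colon, since
# an NCName is a name without a namespace prefix.
NCNAME_RE = re.compile(r"[^\W\d][\w.\-]*")

def is_ncname(text):
    """Return True if text looks like an NCName under this approximation."""
    return NCNAME_RE.fullmatch(text) is not None
```

For example, is_ncname("größe") and is_ncname("processing-instruction")
are accepted, while is_ncname("2nd") and is_ncname("a:b") are rejected.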

Regards,
Martin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PyXPath.tgz
Type: application/octet-stream
Size: 15931 bytes
Desc: not available
Url : http://lists.fourthought.com/pipermail/4suite/attachments/20001212/4ed435ec/PyXPath.obj