Merging (Was: Re: [Xpath-ng] FIXPath (XPath NG core) straw man draft posted)

Uche Ogbuji uche.ogbuji at fourthought.com
Mon Jan 6 14:14:04 MST 2003


> Hi Uche,
> 
> >> I think that this means that I think that the current definition of
> >> merge in the strawman is right.
> >
> > OK.  I think I've been convinced of that now, except for the operand 
> > restrictions.
> 
> Can you explain what you mean by "the operand restrictions"?

The original draft said:

"FIXPath defines a merge operation, defined on any two lists conprising only 
nodes in document order. The behavior of a merge between two lists eitheor of 
which contain non-node items, or any deviation in document order in their 
nodes is undefined. A merge operation combines the two lists into one, sorts 
the combined list by document order, and removes duplicate nodes."

Joe pointed out that the restriction that the two operand lists already be 
sorted in document order is too harsh.  I think he's right.

Another issue: should operand lists to merge be allowed to have data items?  
I'd say "no" because I think it would be too strange and herd to implement 
efficiently.  We'd have to define a relative sort order between the various 
value types and all that.


> >> I think that we also need to have a non-sorting,
> >> non-de-duplicating, non-flattening operator that performs a mapping
> >> operation. For example:
> >>
> >>   ancestor::* => child::*
> >> 
> >> would return:
> >> 
> >>   (c, (b, d, e), a)
> >> 
> >> or possibly:
> >> 
> >>   (a, (b, d, e), c)
> >
> > Interesting thought.  Could this be done by a function?:
> >
> > compose(ancestor::*, child::*)
> 
> Only if we introduce a 'function' value. For example, the above would
> currently call the compose() function with arguments being the
> ancestors of the current node and the children of the current node.
> What you're after is something like:
> 
>   compose(ancestor::*, function('child::*'))
> 
> where the function() function creates a function value from a string.

Not quite what I had in mind, but I realize I was not clear.  WHat I had in 
mind was more brain bending :-)

compose(funct1, funct2, ..., functN, argument)

Which would basically compute functN(...funct2(funct1(argument)))

And if omitted, as in the case above, argument would be just the context node.

So using your notation:

compose(function('ancestor::*'), function('child::*'))

i.e. true function composition.

Of course this is probably overkill for FIXPath core.


> > I'd like to minimize new syntax if possible.
> 
> I'd like to minimize new data types if possible, at least in the core.
> I definitely think we should have a functional module that includes a
> function data type and methods for generating function values, but I
> really don't think we want to do that in the core.

Minimizing new data types is a reasonable goal.  But I wonder whather there 
should not be a callable-object data type in the core.  I think this would be 
a far more general facility, and much easier to build upon in higher-order 
modules than magic iteration syntax.

But by all means, let's compare examples to be sure all positions are clear...


> > In fact, possible discussion point: how about a principle that we
> > will only add new syntax if it significantly helps improve
> > extensibility.
> >
> > I'd rather not add new syntax just to have neat new core features.
> 
> I personally believe that there are two extra expressions that we
> should include in the core: a conditional expression and a
> mapping/looping expression. I view these as much more important (for
> the core) than many/most of the XPath and EXSLT functions and the
> inclusion of non-abbreviated axes.

I strongly agree that such a *capability* is important in the core.  My worry 
with your approach is that I think we'll be stuck with a relatively weak 
facility in the core which overlaps much more general facilities in add-on 
modules.  I'd like to avoid such overlap.  But perhaps there is a tradeoff to 
be had...

> The principle that I've been applying when considering whether
> something should be in the core or in a separate module is that I
> think it should be possible for a parser written based on the FIXPath
> core to be able to parse any FIXPath expression, even if it can't
> evaluate that expression (because it includes extension functions/axes
> etc. that the implementation doesn't understand). Do you think this is
> a reasonable goal?

Absolutely.  In fact, I think a statement to this effect should be added to 
the core.  However, I don't think I see such a goal weighing in one way or 
another in the looping/conditional-operator versus functional-primimitives 
choice.


> > I think the functional approach I suggest is more general, and thus
> > much more powerful (I would guess David R agrees ;-) )
> 
> That's certainly true, but I think that you'll agree that:
> 
>   (1, 2, 3) => . + 3

I find this weird

> or:
> 
>   for (1, 2, 3) return . + 3

More readable.

> is much more readable/usable than:
> 
>   map(function('. + 3'), (1, 2, 3))
> 
> or:
> 
>   map(f($x): $x + 1, (1, 2, 3))
>   
> We should be careful to balance generality and power with usability.

True.  A good example is Python, I think.

Python's list comprehensions are similar to your proposal:

l = [1,2,3]
[ i+1 for i in l ]

returns [2,3,4]

Python *also* supports a functional primitives approach to this:

map(lambda i: i+1, l)

returns the same thing.

I'll admit that though lambdas and the other functional primitives in Python 
are *far* more powerful than list comprehensions, that I and many other Python 
users prefer to use list comprehensions for the cases where they do happen to 
be adequate (which is pretty often).  This is mostly because they do seem more 
readable.

However, note one important thing, which is the main reason that I have a lot 
of trouble with your proposal: both list comprehensions and lambdas in Python 
allow multiple argument bindings:

l = [1,2,3]
m = [10, 11, 12]

[ i + j for i in l for j in m ]

result: [11, 12, 13, 12, 13, 14, 13, 14, 15]

and even:

[ i + j for (i,j) in zip(l, m) ]
result: [11, 13, 15]

where zip(l, m) computes [[1, 10], [2, 11], [3, 12]]

I think the fact that your approach (using ".") is limited to a single 
function argument makes it too weak.

I could probably be happy with a compromise such as:

for i in (1, 2, 3): $i + 3

Important things about my compromise:

*  use explicit argument bindings, which I think makes for a more useful and 
readable solution

*  avoid "return", which I think will make most people think of the end of 
control flow of a function, rather than "iterate to the next list item"

Any other syntactic ideas?


> > However, I think both such facility and a compose function should be
> > in separate modules than the core. David probably has most of the
> > underpinnings of such a module in FXPath.
> 
> As I've explained above, I agree that we want this full functional
> power within a separate module. But I think that we should provide a
> core set of that useful functionality in the FIXPath core.

I would *really* like for there not to be so much overlap with such different 
syntax between what goes in the core and what goes in a higher-order module.  
But I'm sensing that might be inevitable.  If so, then I'd at least like to at 
increase the generality of what's in the core more towards the 80/20 point 
that I believe is typical.


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
A Python & XML Companion - http://www.xml.com/pub/a/2002/12/11/py-xml.html
XML class warfare - http://www.adtmag.com/article.asp?id=6965
MusicBrainz  metadata - http://www-106.ibm.com/developerworks/xml/library/x-thi
nk14.html





More information about the Xpath-ng mailing list