Merging (Was: Re: [Xpath-ng] FIXPath (XPath NG core) straw man
draft posted)
Uche Ogbuji
uche.ogbuji at fourthought.com
Mon Jan 6 14:14:04 MST 2003
> Hi Uche,
>
> >> I think that this means that I think that the current definition of
> >> merge in the strawman is right.
> >
> > OK. I think I've been convinced of that now, except for the operand
> > restrictions.
>
> Can you explain what you mean by "the operand restrictions"?
The original draft said:
"FIXPath defines a merge operation, defined on any two lists conprising only
nodes in document order. The behavior of a merge between two lists eitheor of
which contain non-node items, or any deviation in document order in their
nodes is undefined. A merge operation combines the two lists into one, sorts
the combined list by document order, and removes duplicate nodes."
Joe pointed out that the restriction that the two operand lists already be
sorted in document order is too harsh. I think he's right.
Another issue: should operand lists to merge be allowed to have data items?
I'd say "no" because I think it would be too strange and herd to implement
efficiently. We'd have to define a relative sort order between the various
value types and all that.
> >> I think that we also need to have a non-sorting,
> >> non-de-duplicating, non-flattening operator that performs a mapping
> >> operation. For example:
> >>
> >> ancestor::* => child::*
> >>
> >> would return:
> >>
> >> (c, (b, d, e), a)
> >>
> >> or possibly:
> >>
> >> (a, (b, d, e), c)
> >
> > Interesting thought. Could this be done by a function?:
> >
> > compose(ancestor::*, child::*)
>
> Only if we introduce a 'function' value. For example, the above would
> currently call the compose() function with arguments being the
> ancestors of the current node and the children of the current node.
> What you're after is something like:
>
> compose(ancestor::*, function('child::*'))
>
> where the function() function creates a function value from a string.
Not quite what I had in mind, but I realize I was not clear. WHat I had in
mind was more brain bending :-)
compose(funct1, funct2, ..., functN, argument)
Which would basically compute functN(...funct2(funct1(argument)))
And if omitted, as in the case above, argument would be just the context node.
So using your notation:
compose(function('ancestor::*'), function('child::*'))
i.e. true function composition.
Of course this is probably overkill for FIXPath core.
> > I'd like to minimize new syntax if possible.
>
> I'd like to minimize new data types if possible, at least in the core.
> I definitely think we should have a functional module that includes a
> function data type and methods for generating function values, but I
> really don't think we want to do that in the core.
Minimizing new data types is a reasonable goal. But I wonder whather there
should not be a callable-object data type in the core. I think this would be
a far more general facility, and much easier to build upon in higher-order
modules than magic iteration syntax.
But by all means, let's compare examples to be sure all positions are clear...
> > In fact, possible discussion point: how about a principle that we
> > will only add new syntax if it significantly helps improve
> > extensibility.
> >
> > I'd rather not add new syntax just to have neat new core features.
>
> I personally believe that there are two extra expressions that we
> should include in the core: a conditional expression and a
> mapping/looping expression. I view these as much more important (for
> the core) than many/most of the XPath and EXSLT functions and the
> inclusion of non-abbreviated axes.
I strongly agree that such a *capability* is important in the core. My worry
with your approach is that I think we'll be stuck with a relatively weak
facility in the core which overlaps much more general facilities in add-on
modules. I'd like to avoid such overlap. But perhaps there is a tradeoff to
be had...
> The principle that I've been applying when considering whether
> something should be in the core or in a separate module is that I
> think it should be possible for a parser written based on the FIXPath
> core to be able to parse any FIXPath expression, even if it can't
> evaluate that expression (because it includes extension functions/axes
> etc. that the implementation doesn't understand). Do you think this is
> a reasonable goal?
Absolutely. In fact, I think a statement to this effect should be added to
the core. However, I don't think I see such a goal weighing in one way or
another in the looping/conditional-operator versus functional-primimitives
choice.
> > I think the functional approach I suggest is more general, and thus
> > much more powerful (I would guess David R agrees ;-) )
>
> That's certainly true, but I think that you'll agree that:
>
> (1, 2, 3) => . + 3
I find this weird
> or:
>
> for (1, 2, 3) return . + 3
More readable.
> is much more readable/usable than:
>
> map(function('. + 3'), (1, 2, 3))
>
> or:
>
> map(f($x): $x + 1, (1, 2, 3))
>
> We should be careful to balance generality and power with usability.
True. A good example is Python, I think.
Python's list comprehensions are similar to your proposal:
l = [1,2,3]
[ i+1 for i in l ]
returns [2,3,4]
Python *also* supports a functional primitives approach to this:
map(lambda i: i+1, l)
returns the same thing.
I'll admit that though lambdas and the other functional primitives in Python
are *far* more powerful than list comprehensions, that I and many other Python
users prefer to use list comprehensions for the cases where they do happen to
be adequate (which is pretty often). This is mostly because they do seem more
readable.
However, note one important thing, which is the main reason that I have a lot
of trouble with your proposal: both list comprehensions and lambdas in Python
allow multiple argument bindings:
l = [1,2,3]
m = [10, 11, 12]
[ i + j for i in l for j in m ]
result: [11, 12, 13, 12, 13, 14, 13, 14, 15]
and even:
[ i + j for (i,j) in zip(l, m) ]
result: [11, 13, 15]
where zip(l, m) computes [[1, 10], [2, 11], [3, 12]]
I think the fact that your approach (using ".") is limited to a single
function argument makes it too weak.
I could probably be happy with a compromise such as:
for i in (1, 2, 3): $i + 3
Important things about my compromise:
* use explicit argument bindings, which I think makes for a more useful and
readable solution
* avoid "return", which I think will make most people think of the end of
control flow of a function, rather than "iterate to the next list item"
Any other syntactic ideas?
> > However, I think both such facility and a compose function should be
> > in separate modules than the core. David probably has most of the
> > underpinnings of such a module in FXPath.
>
> As I've explained above, I agree that we want this full functional
> power within a separate module. But I think that we should provide a
> core set of that useful functionality in the FIXPath core.
I would *really* like for there not to be so much overlap with such different
syntax between what goes in the core and what goes in a higher-order module.
But I'm sensing that might be inevitable. If so, then I'd at least like to at
increase the generality of what's in the core more towards the 80/20 point
that I believe is typical.
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
A Python & XML Companion - http://www.xml.com/pub/a/2002/12/11/py-xml.html
XML class warfare - http://www.adtmag.com/article.asp?id=6965
MusicBrainz metadata - http://www-106.ibm.com/developerworks/xml/library/x-thi
nk14.html
More information about the Xpath-ng
mailing list