Merging (Was: Re: [Xpath-ng] FIXPath (XPath NG core) straw man draft posted)
Jeni Tennison
jeni at jenitennison.com
Tue Jan 7 04:21:04 MST 2003
Hi Uche,
> Joe pointed out that the restriction that the two operand lists
> already be sorted in document order is too harsh. I think he's
> right.
Yep, I agree.
> Another issue: should operand lists to merge be allowed to have data
> items? I'd say "no" because I think it would be too strange and herd
> to implement efficiently. We'd have to define a relative sort order
> between the various value types and all that.
Or not. It depends on how merge is defined. We could say that merge
does three operations:
1. flattens the two operands, such that the lists do not themselves
contain lists
2. combines the resulting lists *by node identity* such that the
lists do not contain duplicate nodes
3. sorts the resulting list *by document order* such that the nodes
in the list appear in document order
Say that the operator for node identity were ==, such that if $n and
$m are nodes then $n == $m is true iff $n and $m are the same
node. We have three choices about how this operator would be used for
atoms:
a. == is true iff the two values are the same value (such that
3 == 3 is true)
b. == is never true for two atoms since they don't have node identity
c. == raises an error if either argument is an atom
Similarly, say that the operator for determining whether one node was
before another in document order were <<, such that if $n and $m are
nodes then $n << $m is true iff $n precedes $m in document order. We
again have three choices about how this operator would be used for
atoms:
a. << is true iff $n is less than $m in sorting order (where sorting
order is alphabetical for strings, numeric for numbers and false
comes before true for booleans, with implicit casts for
comparisons across data types)
b. << is never true for two atoms since they don't have document
order
c. << raises an error if either argument is an atom
If we chose b for both then there'd be no requirement for defining
sort orders. It would basically say that:
(1, 2, 3) merge (3, 4, 5)
would produce:
(1, 2, 3, 3, 4, 5)
and if $n and $m were nodes, with $m after $n in document order, then:
(1, $n, 3) merge (3, $n, $m, 5)
could be:
(1, $n, 3, 3, $m, 5)
or:
($n, $m, 1, 3, 3, 5)
depending on how we defined the sorting.
What I'm saying is that there *are* other options open to us here that
avoid the issue. I think that we could probably do with some use cases
to work out what we want to be able to do and what's going to be most
useful to enable us to do it.
> Not quite what I had in mind, but I realize I was not clear. WHat I had in
> mind was more brain bending :-)
>
> compose(funct1, funct2, ..., functN, argument)
>
> Which would basically compute functN(...funct2(funct1(argument)))
>
> And if omitted, as in the case above, argument would be just the context node.
>
> So using your notation:
>
> compose(function('ancestor::*'), function('child::*'))
>
> i.e. true function composition.
>
> Of course this is probably overkill for FIXPath core.
Yes.
>> > I'd like to minimize new syntax if possible.
>>
>> I'd like to minimize new data types if possible, at least in the
>> core. I definitely think we should have a functional module that
>> includes a function data type and methods for generating function
>> values, but I really don't think we want to do that in the core.
>
> Minimizing new data types is a reasonable goal. But I wonder whather
> there should not be a callable-object data type in the core. I think
> this would be a far more general facility, and much easier to build
> upon in higher-order modules than magic iteration syntax.
I guess that I would be happy with that. The reason that I've steered
towards saying that callable objects should be in a separate module
rather than the core is that I imagined they were (a) hard to
implement, (b) overkill for the small implementations such as
XPath-in-XML Schema or XPath-in-XForms, (c) hard for users to get a
grip on.
What's your assessment of the implementation problems? Would small
implementations be able to support callable objects?
>> The principle that I've been applying when considering whether
>> something should be in the core or in a separate module is that I
>> think it should be possible for a parser written based on the
>> FIXPath core to be able to parse any FIXPath expression, even if it
>> can't evaluate that expression (because it includes extension
>> functions/axes etc. that the implementation doesn't understand). Do
>> you think this is a reasonable goal?
>
> Absolutely. In fact, I think a statement to this effect should be
> added to the core. However, I don't think I see such a goal weighing
> in one way or another in the looping/conditional-operator versus
> functional-primimitives choice.
Hmm... well, a looping operator, even if it's infix, works in a
different way from a normal operator. For example, compare:
(1, 2, 3) + (. + 3)
with:
(1, 2, 3) map (. + 3)
In the former, the operands are both evaluated and then summed. In the
latter the second 'operand' is actually an expression to be applied to
each of the values returned by the first 'operand'. This is the
reason that / can't be viewed as a normal operator either.
My thinking was that therefore we couldn't just have a separate module
that would define the map operator because the use of that operator
changes the interpretation of the RHS 'operand'. Of course if the same
module defined a 'callable object' data type and said that the map
operator's second operand must be of that type, then you could have:
(1, 2, 3) map function('. + 3')
or whatever, but I still argue that this isn't as clear and usable as
including the expression directly.
Similarly with a conditional expression. Is there a way to make
FIXPath core generic enough such that any XPath processor, even if it
can't understand the contents, will know how to parse something like:
condition ? true : false
I don't think that there is, and as you know an if() function can't
support the functionality that we expect from a conditional statement.
Mind you, perhaps these are only the tip of the iceberg. David R.
seems keen on making FIXPath a fully-blown programming language,
including namespace declarations, function definitions etc. etc. which
will definitely require more in the way of keyword expressions.
Perhaps we can create a generic syntax for extension expressions,
maybe along the lines of:
(#keyword .... )
such that modules could introduce things like:
(#if condition then true else false)
and:
(#for $i in (1, 2, 3): . + 3)
> True. A good example is Python, I think.
>
> Python's list comprehensions are similar to your proposal:
>
> l = [1,2,3]
> [ i+1 for i in l ]
>
> returns [2,3,4]
>
> Python *also* supports a functional primitives approach to this:
>
> map(lambda i: i+1, l)
>
> returns the same thing.
>
> I'll admit that though lambdas and the other functional primitives
> in Python are *far* more powerful than list comprehensions, that I
> and many other Python users prefer to use list comprehensions for
> the cases where they do happen to be adequate (which is pretty
> often). This is mostly because they do seem more readable.
Right, that supports my gut feeling.
> However, note one important thing, which is the main reason that I
> have a lot of trouble with your proposal: both list comprehensions
> and lambdas in Python allow multiple argument bindings:
[snip]
> I could probably be happy with a compromise such as:
>
> for i in (1, 2, 3): $i + 3
>
> Important things about my compromise:
>
> * use explicit argument bindings, which I think makes for a more useful and
> readable solution
>
> * avoid "return", which I think will make most people think of the end of
> control flow of a function, rather than "iterate to the next list item"
This is pretty much where XPath 2.0 is at the moment (except that it
has 'return' rather than a colon and the variable binding syntax is
slightly different). I have a big problem with this, namely that I
really don't think that variable binding should be in FIXPath core. To
me, variable binding is XSLT/XQuery/FIXPath++ territory -- something
that you need if you start using FIXPath as a self-contained
programming language, but that you really don't want if you're just
after a small expression language.
I also note that having tried to use the for expression in XPath 2.0,
I always make the mistake of using . rather than the bound variable
and thinking that position() is going to give me something useful. I
know that others have this experience as well. This argues against
having variable binding in a for expression, or at least *also* having
a syntax for simple looping where the context item is used instead of
an explicit variable binding.
Here's another suggestion: as I discussed above, with the right
definitions of how atoms are compared for identity and sorted, perhaps
we could reuse / here. If atoms are never identity equal and are
counted as equal in terms of document order, then:
(1, 2, 3) / (. + 3)
would provide the simple mapping operation that I'm keen on (the only
difference being that the result list is flattened).
>> As I've explained above, I agree that we want this full functional
>> power within a separate module. But I think that we should provide
>> a core set of that useful functionality in the FIXPath core.
>
> I would *really* like for there not to be so much overlap with such
> different syntax between what goes in the core and what goes in a
> higher-order module. But I'm sensing that might be inevitable. If
> so, then I'd at least like to at increase the generality of what's
> in the core more towards the 80/20 point that I believe is typical.
I think that we should be aiming for the core to:
1. put in place the fundamental syntax that FIXPath processors must
be able to parse
2. address 80% of the requirements of the *small* uses of XPath,
namely XML Schema and XForms
I don't think that FIXPath core should attempt to address 80% of XSLT
users' requirements, for example. I'm very happy to not have a
looping/mapping construct in the core if the fundamental syntax will
allow one to be added in a separate module.
With that in mind, I'd suggest stripping out all the axes aside from
child, attribute, descendant-or-self and parent (i.e. only keeping
those that are used in abbreviated syntax). I'd suggest keeping out
all the EXSLT functions from the core, and indeed stripping out some
of the existing functions. Basically, I think that FIXPath core should
take things away from XPath 1.0 rather than just adding to it... But I
should put this in another thread...
Cheers,
Jeni
---
Jeni Tennison
http://www.jenitennison.com/
More information about the Xpath-ng
mailing list