[exslt] Generic Templates -- long
Dimitre Novatchev
dnovatchev at yahoo.com
Sat May 26 02:53:48 MDT 2001
Jim Fuller wrote:
> hello dimitre,
>
> can u give the list an overview of the main tenets of authoring generic =
> functions, just a short kick-off to start the debate.
>
> i am interested in seeing how they work, but am slightly lost, either =
> due to my own stupidity or random photons shooting through the medulla =
> stem.
Hi Jim,
I'm replying to you and hope that now you see that generic templates are easy to
understand.
The ***template reference*** is the main concept in the submitted generic templates.
I'm neither going to explain here what generic programming is -- see Musser
(http://www.cs.rpi.edu/~musser/gp/), nor what is polymorphism -- see e.g. Stroustup,
"The C++ Programming Language, third edition".
Coplien gives a good treatment of both topics in his book "Advanced C++ Programming
Styles and Idioms" -- in particular in the chapters: 4. Inheritance; 5.
Object-Oriented Programming; and 7. Reuse and Objects -- especially
7.4.Parameterized Types, or Templates.
"Generic Programming" is defined by Stroustup as expressing algorithms independently
of representation details -- doing this affordably and without logical contortions.
This is governed by the following programming paradigm:
"Decide which algorithms you want;
parameterize them so that they work for
a variety of suitable types and data structures".
Many authors agree that parameterisation is a kind of polymorphism ("parametric
polymorphism", Coplien; compile-time polymorphism (as opposed to using virtual
functions or run-time polymorphism), Stroustup).
While at present XSLT 1.0 and 1.1 lack powerful pre-processing capabilities and this
makes achieving "compile-time polymorphism" extremely difficult or impossible, there
are mechanisms for achieving some limited form of run-time polymorphism.
The possibility to import stylesheets (xsl:import), override some of their specific
templates and/or instantiate the overriden templates (xsl:apply-imports) was
designed to support this form of polymorphism.
However, there exist some serious problems in using this mechanism (Jeni Tennison,
"Reliance on import precedence considered dangerous"
http://sources.redhat.com/ml/xsl-list/2001-02/msg00996.html,
and David Carlisle, http://sources.redhat.com/ml/xsl-list/2001-02/msg01003.html).
As Jeni puts it "When you import the same utility stylesheet (D) into two
stylesheets
(B and C) that you then import into a main stylesheet (A), any
templates, attribute sets, global variables or parameters (and so on
for a few other things) in the first stylesheet that's imported (B)
that override the ones in the utility stylesheet (D) will no longer
work as they would if the stylesheet (B) were run standalone."
This is due to the very static way (import precedence) of resolving ambiguities when
deciding which template to select for instantiation in XSLT.
C++ presents a good example of an efficient and successful implementation of
run-time polymorphism through virtual functions via vtables that contain
pointers/references to functions.
XSLT does not have an equivalent for pointer to a function (template) -- at least
there are no obvious ***dynamic*** ways to reference a template. In accordance with
this situation some of the best experts on the xslt-list were recommending to use
xsl:choose in order to implement dynamic (based on condition, on the value of a
variable,... etc) call of a template.
Using xsl:choose to implement dynamic template instantiation has the same drawbacks
as using a "switch" statement in C/C++. To quote Coplien:
void ringPhones(Telephone *phoneArray[])
{
for (Telephone *p = phoneArray; p; p++) {
switch (p->phoneType()) {
case Telephone::POTS:
((POTSPhone *)p)->ring(); break;
case Telephone::ISDN:
((ISDNPhone *)p)->ring(); break;
case Telephone::OPERATOR:
((OPERATORPhone *)p)->ring(); break;
case Telephone::OTHER:
default:
error(. . . .);
}
}
}
"When run-time operator selection is managed manually at the source level, as with
the switch statement in ringPhones, program evolution becomes tedious and
cumbersome. Even though the code does not need to be rewritten if we add a ring
operator to POSTPhone, it needs to be recompiled under most C++ implementations.
Consider, too, what it would take to add a new phone class: ringPhones would have to
be modified as well as ***all*** similar functions.
The type discriminant enumeration in the base class would need a new element, and
the coder of the new class would need to take care to initialize its type field in
all its constructors.
Everything touched by these changes would have to be recompiled and retested."
Stroustrup also comments on these drawbacks:
"use of a type field is an error-prone technique that leads to maintenance problems.
The problems increase in severity as the size of the program increases because the
use of a type field causes a violation of the ideals of modularity and data hiding.
Each function using a type field must know about the representation and other
details of the implementation of every class derived from the one containing the
type field. It also seems that the existence of any common data accessible from
every derived class, such as a type field, tempts people to add more such data.
The common base thus becomes the repository of all kinds of 'useful information'.
This, in turn, gets the implementation of the base and the derived classes
intertwined
in ways that are most undesirable. For clean design and simpler maintenance we
want to keep separate issues separate and avoid mutual dependencies."
Needless to say, the same kind of drawbacks is evident in the xsl:choose style of
"dynamic" template instantiation.
Why do we have to use the xsl:choose style of "dynamic" template instantiation?
The main reason is that there isn't a direct XSLT equivalent to a C/C++ function
pointer.
By definition template names must be "QName"s, which means that xsl:call-template
cannot use a string variable/parameter as the value of the "name" attribute of
xsl:call-template.
Therefore, if there are only a fixed, known in advance possible values for the
parameter containing the name of the template to be instantiated, this instantiation
can be implemented via xsl:choose with a xsl:when testing for each of the possible
values.
Were a template-reference mechanism announced today, even tomorrow many people would
come with a better way to dynamically instantiate templates -- because the
implementation of virtual functions and run-time polymorphism is well-known.
In my submissions to EXSLT I am using just one such mechanism to implement a
template reference.
This works with templates that are specifically defined to match only a uniquely
defined type of node. The uniqueness of the node-type is guaranteed by defining a
node in a namespace with a unique namespace-uri. The uniqueness of every such
namespace-uri can be guaranteed by using an automated UID-generating tool.
For example, let us define in a xsl:stylesheet a namespace:
(1):
xmlns:unique123="my-Unique-Template-UID-123"
and a node that is a child of xsl:stylesheet:
(2):
<unique123:node/>
For convenience let us put this node in a variable:
(3):
<xsl:variable name="myTemplateReference" select="document('')/*/unique123:*[1]"/>
Let's have a template defined somewhere as:
(4):
<xsl:template match="*[namespace-uri()='my-Unique-Template-UID-123']">
<!-- any parameters -->
<!-- template body -->
</xsl:template>
This is a single and the only template we define to match nodes from the
"my-Unique-Template-UID-123" namespace.
Then the following will instantiate exactly the template above (we haven't told
the namespace-uri to anybody else and nobody knows about it -- it is unique):
(5):
<xsl:apply-templates select="$myTemplateReference">
<!-- any necessary arguments -->
<xsl:apply-templates>
The last and most important thing: the above template instantiation (5) may be
performed in an external stylesheet, possibly written by some unknown programmer.
This guy doesn't know us and wrote his stylesheet years ago, then made it part of a
general stylesheet library. He doesn't care what is the actual value of
$myTemplateReference, what is the template that will be instantiated and even to
some extent what this template is going to do and how. $myTemplateReference is
passed to him as a parameter (node) and he just uses whatever this parameter
contains for the instantiaion of whatever template someone wanted to be
instantiated.
The only thing he cares about is that the template has the parameters he's supplying
and will produce some credible results (of predefined type/semantics).
Because he cares about that, as part of the documentation of his
stylesheet/templates, he has documented the interface of any user-supplied template
to be referenced through the $myTemplateReference parameter. This is a contract
between this author and any authors of a user-supplied template to be treated in
this way.
This is directly analogous to the way virtual functions are used in OOPL. There are
several very good results here:
the long xsl:choose is no longer needed and is replaced by just (5).
regardless of how many different programmers write their own templates to be
instantiated dynamically by this template, it will never need to be updated. In
all cases it will continue to perform correctly.
no additional maintenance costs are required for every new use of the template.
the template can be distributed to everyone -- nobody needs to know how this
template works and to edit/retest it to accomodate for his new templates that
must be instantiated.
the template may be distributed in compiled/encrypted form (we have just the
example of the SUN XSLT compiler now) and the source code need not be made
available.
the original template may be used in unlimited number of different cases for
processing the original author wasn't even aware of.
It is straightforward to conclude that these results are especially beneficial to
the design and development of a general function/template XSLT libraries.
Even without an OO approach we can benefit from using dynamic template instantiation
through a template reference. For example, forget that max() can be a method of any
class derived from an "ordered set" base class, but still instantiate dynamically a
template that implements the "greater-than" relation on the particular type of
node-set passed to the max() function.
The result is still immediately awarding -- for example we can find the node with
the "maximum" value for nodes that have numeric values as easy as finding the
"maximum" node for nodes representing triangles where maximum is defined as the sum
of the lengths of the three sides (perimeter), or as the space of the triangle, or
any other types of nodes and definition of "greater-than" that we cannot even
imagine at this moment.
All of this -- using a single "max" template and only passing to it as parameter a
different reference to a template that implements the "greater-than" relation in
each particular case.
Can we do this without our dynamic instantiation technique? One possible approach is
to generate a new node-set, containing the "numeric values" (perimeter, space,
etc.)of the nodes that must be compared. Then pass this node-set to the "standard"
max() function that operates only on numbers.
There are some drawbacks that are obvious:
(a) need to allocate and use additional space for the new node-set -- this is
proportional to the count() of the original node-set.
(b) need to calculate the "value" -- it is possible that some ordered sets may not
have a natural "numeric value" representation of their nodes.
Huh, maybe we can still avoid the dynamic template instantiation technique by
passing as a second argument to max() an expression that calculates the "numerical
value" of every node?
Well, we still have to pass an additional parameter to max(). Internally it still
will invoke a function/template to calculate the value. What is quite embarassing is
that this internal function/template may not work in some cases for a number of
reasons -- then there's nothing we can do about it -- neither will we be able to
debug the code of max() (most probably it will not be provided) nor will we be able
to extend its power.
OK, for simple cases passing a numerical-value evaluating expression may work well.
Certainly for perimeter, what about for space of triangles? Hmm... we might need
a sqrt() function here... Maybe we would not have this function available, or there
would be limitations which functions may only be used in an expression? What about
much more complicated cases, in which it would be very difficult to evaluate the
numerical value in a single expression? Then we would once again need to be able to
indicate an external, user-defined (by us) function/template that must be
instantiated to produce the numerical value.
Also drawback (b) is still there.
This simple example illustrates well the limitations and shortcomings of a design
decision not to use the full power of dynamic template instantiation for making
"standard" functions as generic as appropriate.
Another frequently asked question is: "Will the XSLT programmers be disconvenienced
by having to use a generic max(), which is more complicated than the 'standard'
max()?"
The answer is: No. We can define and implement the generic max() function/template
in such a way, so that there is a suitable default value for the second parameter --
if the function is called with only the first argument, then the standard ">"
relation on numbers will be used.
Most importantly, any "standard" function can be generalised in this way, so that
the "standard" behaviour will be achieved simply by continuing to use the "standard"
function-call syntax.
A related question is: Will the implementation of a generic function be less
efficient than that of its "standard" counterpart?
The answer is: No, using default values for parameters we can determine a "standard
usage" of a generic function and in this case provide the same efficient old
implementation. When the generic function is used with its full syntax, then it will
be more efficient than having to evaluate a xsl:choose with many xsl:when children.
Not to mention the gains in maintainability, reusability, reliability and
convenience.
In the next example I'll show how I convert a standard max() template implementation
to a generic one that uses dynamic template instantiation through a template
reference passed as parameter.
Then, finally, I'll describe the submitted functions. While in all of the sumitted
functions the generalised operation is "comparison", we should keep in mind that a
huge variety of other useful functions do exist -- imagine a generic repeater()
function that "does something" N times or while/until a condition is true? This
"does something" may be potentially any template (e.g. creating user interface
(html) elements of specific kind, or drawing lines, or...)
Now, here's a template that takes as parameter a nodeset of nodes, whose value is
numeric, and returns the maximum numeric value:
<xsl:template name="get-max">
<xsl:param name="nodes" />
<xsl:choose>
<xsl:when test="count($nodes) = 1">
<xsl:value-of select="$nodes[1]" />
</xsl:when>
<xsl:when test="count($nodes) > 1">
<xsl:variable name="max-of-rest">
<xsl:call-template name="get-max">
<xsl:with-param name="nodes"
select="$nodes[position()!=1]" />
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$nodes[1] > $max-of-rest">
<xsl:value-of select="$nodes[1]" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$max-of-rest" />
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
NaN
<!-- empty node-set -->
</xsl:otherwise>
</xsl:choose>
</xsl:template>
We use it like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:import href="get-max.xsl"
<xsl:template match="/">
<xsl:call-template name="get-max">
<xsl:with-param name="nodes" select="/numbers/num" />
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
We decide to make the "get-max" template generic, by changing it to implement
a parameterised "greater-than" relation.
If the template-reference to a template that implements the "greater-than"
relation is passed to "get-max" as a parameter named "gtRef",
then we must replace the hardwired comparison:
<xsl:when test="$nodes[1] > $max-of-rest">
<xsl:value-of select="$nodes[1]" />
</xsl:when>
with
<xsl:variable name="isGreater">
<xsl:apply-templates select="$gtRef">
<xsl:with-param name="n1" select="$nodes[1]"/>
<xsl:with-param name="n2" select="$max-of-rest"/>
</xsl:apply-templates>
</xsl:variable>
<xsl:choose>
<xsl:when test="$isGreater = 'true'">
<xsl:value-of select="$nodes[1]" />
</xsl:when>
We must also define the $gtRef parameter for the "get-max" template:
<xsl:template name="get-max">
<xsl:param name="nodes" />
<xsl:param name="gtRef" select="/.." />
Because "get-max" calls itself recursively, we also must update this call
with the new parameter added:
<xsl:variable name="max-of-rest">
<xsl:call-template name="get-max">
<xsl:with-param name="nodes" select="$nodes[position()!=1]" />
<xsl:with-param name="gtRef" select="$gtRef" />
</xsl:call-template>
</xsl:variable>
And this is everything that was necessary to convert the "get-max" template
to a generic one!
As we can see, the algorithm is essentially left untouched. Doing this is simple and
can be performed in an almost mechanical way.
How does a programmer use the generic "get-max" template?
Just a few changes to the "standard" way are necessary:
1. Define a template that implements the "greater-than" relation.
This template must match a unique node:
<xsl:template match="*[namespace-uri()='gtRef1']">
The parameters must be the first and second node to be compared:
<xsl:param name="n1"/>
<xsl:param name="n2"/>
In this simple example we just return the numerical comparison:
<xsl:value-of select="$n1 > $n2"/>
2. Define the "template reference" to the template above.
We add a namespace definition to the stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:gtRef1="gtRef1">
Then we define a single node in this namespace and a convenience variable to
refer to it:
<gtRef1:node/>
<xsl:variable name="gtRef1" select="document('')/*/gtRef1:*[1]"/>
3. Now we call "get-max" in the following way:
<xsl:call-template name="get-max">
<xsl:with-param name="nodes" select="/numbers/num" />
<xsl:with-param name="gtRef" select="$gtRef1" />
</xsl:call-template>
And now the fun begins -- we can define as many different "greater-than"
implementations
and call "get-max" with any of them (many times) during a single transformation.
Finally, after the ideas are well understood, let me describe the submitted
functions:
1. max() -- the same as already described. Has a convenient default for the second
argument, which allows it to be used on numbers as the non-generic max()
2. min() -- almost identical to max -- only the meaning of the provided template-
reference now is that it is a reference to a template implementing a "less-than"
relation (left entirely to the user to define).
For both min() and max() a minimum/maximum node is returned, as in some cases
the "value" of a node may not be easy to express.
3. binSearch() -- this is a generic implementation of the traditional binary search
algorithm.
It returns the position of a node in a sorted node-set, whose value is equal
to the value of the node passed as a search argument. If no such node is found in
the sorted node-set, then -1 is returned.
Another parameter is a template reference to a template implementing
a "comparison" and returning distinct values for "greater-than", "equal"
and "less-than".
The sorted node-set is "sorted" using the same "greater-than" relation as
implemented by the same referenced template.
This function may be combined with the sort() function to organise nodes in an
efficient way for multiple search operations.
A modified version of binSearch() is used internally in the implementation of the
sort() function. In the case when the value of the search node is not equal to
the values of any of the sorted nodes, then this modified binSearch() returns the
position the search node would have in the sorted node-set, if we put it there
and removed the node with the lowest value.
In my opinion this modified binSearch() -- I call it binSearchReplace() is also a
good candidate for a standard function/template library.
4. sort() -- this is a generic implementation of an efficient (O(Log2(k)*N)) partial
sort algorithm. Briefly described, it produces the positions of the "k highest"
nodes in a node-set. As a special case, when k equals the number of nodes in the
node-set, we get a full sort with complexity O(Log2(N)*N).
The parameters are the node-set to be sorted, the number k for the k highest
nodes' positions to be calculated, a template reference to a template
implementing a "greater-than" relation to be used in the sort.
The node-list produced as result of the sort() is a string containing a sequence
of fixed-length positions of the sorted nodes.
The sort() function will be convenient and better alternative to xsl:sort in
situations where
(1) The values to be sorted are difficult to express in a single expression or in
a sequense of xsl:sort elements, and/or
(2) Nodes cannot be easily mapped to numbers or strings, but a "greater-than"
relation can be implemented.
(3) To solve efficiently the frequent problem of "How can I get the newest 5
articles out of the total 100 available?"
In finishing, let me once again point out that these four proposed functions are
only a few of the large number of functions whose generic implementation will add
additional value for the users.
Cheers,
Dimitre Novatchev.
__________________________________________________
Do You Yahoo!?
Yahoo! Auctions - buy the things you want at great prices
http://auctions.yahoo.com/
More information about the exslt
mailing list