[4suite-dev] true RFC 2396 relative URI ref resolution

Mike Brown mike at skew.org
Wed Nov 27 05:58:04 MST 2002


I have checked an absolutize() function into Ft/Lib/Uri.py. I believe it
provides fully RFC 2396-compliant resolution of relative URI references to
absolute form. The docstring includes an example of how it is meant to be
used.

It has not been optimized for speed, nor has it been tested anywhere within
4Suite. However, if the performance cost is not too great, the intent is to
use it in places where we currently call urllib.basejoin() or
urlparse.urljoin(), neither of which is fully sufficient for correct
resolution of URI references.

Among the deficiencies of basejoin() and urljoin():
    1. urllib.basejoin() badly mishandles the '' (empty) reference.
    2. Both omit an empty authority component from 'file' URIs.
    3. Both fail to distinguish between the base and the document.

To illustrate the third point, consider references like '' or '#frag1'. These
are references to the "current" document, *regardless* of the base URI
provided for resolving relative references. The caller is not supposed to look
any further (e.g., it is not to re-fetch the resource). absolutize() returns
None in this situation to make things easier on the caller, although the
caller must still handle any fragment itself.
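
To make the base/document confusion concrete, here is a small sketch using
urllib.parse.urljoin, the Python 3 descendant of urlparse.urljoin(). That
substitution is an assumption for the sake of a runnable example (the 2002-era
function may differ in details), and the URIs are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URI; the document containing the references may
# live somewhere else entirely.
base = 'http://host/path/to/file.txt'

# urljoin() only knows about the base, so references to the current
# document come back looking like references to the base resource.
empty_ref = urljoin(base, '')
frag_ref = urljoin(base, '#frag1')

print(empty_ref)  # http://host/path/to/file.txt
print(frag_ref)   # http://host/path/to/file.txt#frag1
```

Nothing in either result tells the caller that the reference was really to the
document it already has in hand.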

The same situation arises when a reference, once merged with the base and
with any fragment ignored, resolves to the document URI. The function
therefore needs to know the document URI. If the base and document URIs are
the same (they usually are), you can provide just the document URI, and the
function will assume that the base is the same.
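
The behavior described above can be sketched roughly as follows. This is not
the code in Ft/Lib/Uri.py: absolutize_sketch is a hypothetical name, the
argument order (reference, document URI, optional base URI) is inferred from
the example call in this message, and the merging is delegated to Python 3's
urllib.parse.urljoin, which follows the later RFC 3986 rules rather than
RFC 2396. Keeping the fragment on a non-None result is also an assumption.

```python
from urllib.parse import urldefrag, urljoin


def absolutize_sketch(ref, doc_uri, base_uri=None):
    """Resolve ref; return None if it points at the current document.

    Hypothetical stand-in for illustration, not Ft.Lib.Uri.absolutize().
    """
    # If no separate base is given, assume the base is the document URI.
    if base_uri is None:
        base_uri = doc_uri

    # An empty or fragment-only reference is a reference to the current
    # document, whatever the base happens to be.
    if urldefrag(ref)[0] == '':
        return None

    # Merge with the base, then compare against the document URI with
    # fragments ignored on both sides.
    resolved = urljoin(base_uri, ref)
    if urldefrag(resolved)[0] == urldefrag(doc_uri)[0]:
        return None
    return resolved


print(absolutize_sketch('../../doc.xml', 'http://host/doc.xml',
                        'http://host/path/to/file.txt'))  # None
print(absolutize_sketch('#frag1', 'http://host/doc.xml'))  # None
print(absolutize_sketch('other.xml', 'http://host/doc.xml'))
```

The last call resolves to a different resource, so the sketch hands back the
absolute URI instead of None.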

The function passes the tests in RFC 2396 appendix C, along with a few I made
up, such as absolutize('../../doc.xml', 'http://host/doc.xml',
'http://host/path/to/file.txt'), which returns None because the reference
resolves to the document URI. I'll see about getting a test suite checked in
later.

-Mike


