Alph XPointer Support
Draft J87
The Alph Project: <http://alph.io>
This document indexed at: <http://alph.io/index.html>
©2019 Adam C. Moore (LÆMEUR) <mailto:adam@laemeur.com>
For addressing fragments of XML/HTML documents in Alph, we use the element() and range() XPointer schemes. These are some notes about our implementation.
element()
The element() scheme provides a simple method for addressing nodes within an XML document via a child sequence. The element selector looks like this:
element({child sequence})
A child sequence is a slash-separated list of integers, each integer representing a one-based index of the child nodes of a containing node. For example, a child sequence of:
/1/2/3
Reads as: the third child node of the second child node of the first child node of the root node. It's very simple. That's why we like it.
The leading slash may be omitted.
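As a rough illustration of the syntax above (not Alph's actual code), a child sequence can be turned into a list of one-based indexes with a few lines of Python. The function name parse_child_sequence is hypothetical, and rejecting zero or non-numeric steps is an assumption about how strict a parser ought to be.

```python
def parse_child_sequence(text):
    """Turn '/1/2/3' or '1/2/3' into [1, 2, 3]."""
    # The leading slash is optional, so drop it if present.
    if text.startswith("/"):
        text = text[1:]
    steps = []
    for part in text.split("/"):
        # Assumption: every step must be a positive (one-based) ASCII integer.
        if not (part.isascii() and part.isdigit()) or int(part) < 1:
            raise ValueError(f"invalid child sequence step: {part!r}")
        steps.append(int(part))
    return steps


print(parse_child_sequence("/1/2/3"))  # [1, 2, 3]
print(parse_child_sequence("1/2/3"))   # [1, 2, 3]
```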
Alternatively, the first item of a child sequence may be a barename pointer. For example, if a document contains an element with the ID "foo_element", the following child sequence:
foo_element/1/2
Reads as: the second child node of the first child node of foo_element.
IMPORTANT: The element() scheme specification[1] uses the term "element" throughout — however, in a non-normative appendix to the specification for the xpointer() scheme ("On Points and Ranges"[2]), Element nodes AND Text nodes are shown as the children in a child sequence. This is far more useful, and is how we've implemented element() in Alph.
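To make the Element-and-Text counting concrete, here is a minimal sketch using Python's standard xml.dom.minidom. It is one possible reading of the rules above, not the Alph implementation. The helper names (addressable_children, find_by_id, resolve) and the sample document are made up for illustration. Skipping comments and processing instructions, and looking up barenames via a plain "id" attribute, are both assumptions.

```python
from xml.dom import Node, minidom


def addressable_children(node):
    # Assumption: only Element and Text nodes are counted; comments and
    # processing instructions are skipped.
    return [child for child in node.childNodes
            if child.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE)]


def find_by_id(node, wanted):
    # Depth-first search for the first element whose "id" attribute matches.
    for child in node.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        if child.getAttribute("id") == wanted:
            return child
        found = find_by_id(child, wanted)
        if found is not None:
            return found
    return None


def resolve(doc, pointer):
    # Accepts "[barename/]child-sequence" or a lone barename.
    path = pointer[1:] if pointer.startswith("/") else pointer
    if not path:
        raise ValueError("empty pointer")
    parts = path.split("/")
    node = doc
    if not parts[0].isdigit():
        node = find_by_id(doc, parts.pop(0))
        if node is None:
            raise LookupError("no element with that ID")
    for part in parts:
        children = addressable_children(node)
        index = int(part)  # one-based index into the counted children
        if not 1 <= index <= len(children):
            raise LookupError(f"no child {index} here")
        node = children[index - 1]
    return node


SAMPLE = minidom.parseString(
    '<doc><p id="foo_element"><b>bold<i>x</i></b>tail</p></doc>'
)
print(resolve(SAMPLE, "/1/1/2").data)              # tail  (a Text node)
print(resolve(SAMPLE, "foo_element/1/2").tagName)  # i     (an Element node)
```

Note that in the second lookup the Text node "bold" counts as child 1 of the b element, which is why child 2 is the i element.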
range()
The range() scheme appears in the aforementioned appendix to the xpointer() spec[2], but it's arguably the most useful thing in that document as it directly correlates to a DOM Range. The basic structure is:
range(start_pointer, end_pointer)
Start/end pointers have the structure:
[barename/]child-sequence[.offset]
or:
barename[.offset]
That is, a barename alone is fine; a child-sequence alone is fine; a barename followed by a child sequence is fine — these work exactly as described for the element() scheme.
With the range() scheme, a pointer gains an optional offset component. If you're familiar with DOM Ranges, you already know how the offset works. The offset is a zero-based positional index into the children of the last node specified by the pointer (the barename, or the end of the child sequence). If that node is a Text node, the offset points in between the codepoints of that node's text; if it is an Element node, the offset points in between the child nodes of that element. See "On Points and Ranges"[2] for a more verbose explanation.
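The "in-between" nature of an offset is easier to see with a small sketch. The Python below is illustrative only: split_pointer and neighbours are hypothetical helpers, and treating a trailing ".digits" as the offset is an assumption (it would misread an ID that itself ends in a dot followed by digits). For an Element container this sketch counts every child node, as a DOM Range does.

```python
from xml.dom import Node, minidom


def split_pointer(pointer):
    """Split 'path.offset' into (path, offset); offset is None when absent."""
    path, dot, tail = pointer.rpartition(".")
    # Assumption: a trailing ".<digits>" is always the offset.
    if dot and tail.isdigit():
        return path, int(tail)
    return pointer, None


def neighbours(container, offset):
    """Return (before, after): what lies on each side of a zero-based offset."""
    if container.nodeType == Node.TEXT_NODE:
        items = container.data               # offset falls between codepoints
    else:
        items = list(container.childNodes)   # offset falls between child nodes
    if not 0 <= offset <= len(items):
        raise IndexError("offset out of range")
    return items[:offset], items[offset:]


print(split_pointer("foo/1/2.3"))  # ('foo/1/2', 3)
print(split_pointer("foo"))        # ('foo', None)

p = minidom.parseString('<p id="foo">Hello<b>big</b>world</p>').documentElement
print(neighbours(p.firstChild, 3))  # ('Hel', 'lo')
before, after = neighbours(p, 1)
print(len(before), after[0].tagName)  # 1 b
```

An offset of 0 sits before the first child or codepoint, and an offset equal to the length sits after the last one, so a container with n items has n + 1 valid offsets.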
Here are some examples of how we're implementing range():
http://host/path/file#range(1/2, foo)
This specifies a range beginning before the second child node of the first child node of the root node, enclosing every node between it and the end of the element named "foo".
http://host/path/file#range(foo/1/2, bar)
This specifies a range beginning before the second child node of the first child node of the element named "foo", enclosing every node between it and the end of the element named "bar".
http://host/path/file#range(foo/1/2.3, bar)
This specifies a range starting before the fourth child (which may be another node or, in a text node, a character/codepoint) of the second child node of the first child node of the element named "foo", enclosing every node between it and the end of the element named "bar".
Simple, right?
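For readers who want to experiment, the three example URLs above can be pulled apart with a short Python sketch. This is not how Alph parses them. parse_pointer and parse_range are hypothetical names, the dictionary layout is invented for illustration, and the sketch assumes unquoted barenames with no commas or trailing ".digits" in them.

```python
import re
from urllib.parse import urlsplit


def parse_pointer(text):
    """Parse '[barename/]child-sequence[.offset]' or 'barename[.offset]'."""
    path, dot, tail = text.rpartition(".")
    offset = int(tail) if dot and tail.isdigit() else None
    if offset is None:
        path = text
    if path.startswith("/"):
        path = path[1:]
    parts = path.split("/")
    if parts[0] == "":
        raise ValueError("empty pointer")
    # A non-numeric first item is taken to be a barename.
    barename = None if parts[0].isdigit() else parts.pop(0)
    steps = [int(part) for part in parts]  # ValueError on a bad step
    return {"barename": barename, "steps": steps, "offset": offset}


def parse_range(url):
    """Return (start, end) pointers from a URL ending in '#range(a, b)'."""
    fragment = urlsplit(url).fragment
    match = re.fullmatch(r"range\((.*),(.*)\)", fragment)
    if match is None:
        raise ValueError("not a two-pointer range() fragment")
    start, end = (parse_pointer(part.strip()) for part in match.groups())
    return start, end


start, end = parse_range("http://host/path/file#range(foo/1/2.3, bar)")
print(start)  # {'barename': 'foo', 'steps': [1, 2], 'offset': 3}
print(end)    # {'barename': 'bar', 'steps': [], 'offset': None}
```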
NOTE: At the moment, we are supporting barenames wrapped in double or single quotes. Ex.:
http://host/path/file#range("foo bar"/1/2/3)
This was done in response to the fact that the HTML5 spec allows spaces in id values. However, this may be dropped in the future.
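One way a parser might peel a quoted barename off the front of a pointer is sketched below in Python. This is a guess at the general idea, not the Alph code. The function name split_barename is hypothetical, and the sketch assumes any .offset suffix on an unquoted pointer has already been removed.

```python
import re

# A leading single- or double-quoted barename, followed by the rest.
QUOTED = re.compile(r"""^(?P<q>["'])(?P<name>.*?)(?P=q)(?P<rest>.*)$""")


def split_barename(pointer):
    """Return (barename, rest); barename is None when the pointer has none."""
    match = QUOTED.match(pointer)
    if match:
        return match.group("name"), match.group("rest")
    head, slash, tail = pointer.partition("/")
    # An empty or numeric first item means there is no barename.
    if head == "" or head.isdigit():
        return None, pointer
    return head, slash + tail


print(split_barename('"foo bar"/1/2/3'))  # ('foo bar', '/1/2/3')
print(split_barename("foo/1/2"))          # ('foo', '/1/2')
print(split_barename("/1/2"))             # (None, '/1/2')
```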
REFERENCES
[1] XPointer element() Scheme <https://www.w3.org/TR/xptr-element/>
[2] On Points and Ranges <https://www.w3.org/TR/xptr-xpointer/#appA>