WHY ANNOTATING THE WEB IS HARD

And Why A Xanalogical Approach Would Help

LINKING on the Web goes like this: when an author prepares a document, they select parts of the text that they want to be anchors: clickable words or phrases that, when interacted with, will cause a reader's browser to summon some other document, or jump to another part of the current document. Anchor tags, little invisible tokens of information, are inserted into the document's text. They tell the browser to display that part of the document a little differently, and where to "jump" to when a reader clicks on it.

ANNOTATION on the Web doesn't have a single, uniform implementation. Annotation, as an activity, isn't too different from the activity of linking that I just described: a reader selects a part of some text that they want to mark and make notes about, and then... well, what exactly happens after that depends on what kind of annotation system or tool you've elected to use. But if we think about this activity in networked document terms, the reader is selecting parts of a text that they want to link their own text to.

Traditionally, annotation was accomplished by printing out the Web page to be annotated, and then writing on it with a pen, pencil, highlighter, or what-have-you. This is still a good way to do it.

In the third decade of the 21st century, the print-out has given way to the screenshot, its digital analog. The process remains the same: create an image of the text you want to annotate, and then make marks on the image – with your finger.

There are, additionally, Web annotation applications and services that will allow you to make a digital copy of the Web page so that you can insert your own anchor tags into it. Of course, this approach has always been an option, since the first days of the Web: you can download Web pages and edit them, putting your own links and inline annotations wherever you want. The new twist on this is that annotation services will provide you a nice-looking interface for this activity, and they let you save copies of Web pages along with your annotations into their cloud storage.

You'll notice that all of these methods require the creation of an editable copy of the document to be annotated. Those of us who are really into the idea of interconnected electronic literature find this to be anathema to the entire enterprise of something like a World Wide Web. What's the point of interlinked electronic literature if you have to, essentially, Xerox the documents that you want to annotate? Why can't we just leave the original documents where they are and make annotation documents that are linked by reference to the specific parts of the document we want to annotate?

WEB ANNOTATION

There have been efforts to make this possible. In fact, it's been an ongoing area of discussion and development for more than thirty years.

HTML is the document format on the Web. I love it. Logically, it's a semantic matryoshka with text and multimedia content stuffed in the spaces between the dolls. As a document format, it can work really well. But the way that it's used on the contemporary Web makes annotating it difficult.

In order to link an annotation document to a specific portion of a Web page, we need to be able to address the contents of that Web page in a very precise way. We can do this with child sequences, or ranges delimited by child sequences — but before we get into that, let's just have a look at a very simple HTML document and see how these things are put together.

<html><body><p>Hello, world!</p></body></html>

This is a one-line HTML document that contains a single paragraph with the text "Hello, world!" The bits with the angle brackets are element tags. These define the logical structure of the document. Most tags come in pairs. An opening tag is just the element name in angle brackets: <element>. A closing tag has a slash in front of the element name: </element>. Everything in the document between a matching pair of tags is logically contained within that element.

We could think of this one-line example as three nested bowls with a bit of soup (text) in the middle:

<html><body><p>Hello, world!</p></body></html>
 │     │     │ ╰───────────╯  │   │      │
 │     │     ╰────────────────╯   │      │
 │     ╰──────────────────────────╯      │
 ╰───────────────────────────────────────╯

Or, we could think of this as a tree of ancestors, with the <html> element as the root. If we draw this as a tree, well, it doesn't look like a tree, because it doesn't branch:

 <html>
   │
 <body>
   │
  <p>
   │
  #Hello, world!

HTML documents can always be graphed as tree structures, and there is always a single root element: the <html> element. The root element can have any number of children, and because we're using graph terminology here, those children are called nodes. There are a few different kinds of node in HTML documents, and we have two types in this one: element nodes (the <html>, <body>, and <p> elements, which we'll wrap in angle brackets just to be clear that these are elements) and text nodes, which are customarily prefixed with a hash mark. Element nodes can have other element nodes as children, or text nodes (or comment nodes; elements can also carry attributes, but we won't discuss those right now). Text nodes cannot have any children. They're the end of the line.
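
If you'd like to see that tree come out of a program rather than out of my text editor, here is a minimal sketch in Python. It's an illustration under a couple of assumptions: I'm using Python's built-in xml.dom.minidom parser, which only works here because our one-line example happens to be well-formed XML (real-world HTML would need a proper HTML parser), and outline is just a helper name I made up.

```python
from xml.dom import minidom, Node

def outline(node, depth=0):
    """Return one line per node, indented by its depth in the tree.

    Hypothetical helper: element nodes are written in angle brackets,
    text nodes are prefixed with a hash mark.
    """
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + f"<{node.tagName}>")
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == Node.TEXT_NODE:
        # Text nodes have no children, so the recursion stops here.
        lines.append("  " * depth + "#" + node.data)
    return lines

# Assumption: the example is well-formed, so an XML parser will accept it.
doc = minidom.parseString("<html><body><p>Hello, world!</p></body></html>")
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the printed result corresponds to one step down the tree, with the text node at the bottom.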

Let's make a slightly more complex document with two paragraphs:

<html><body><p>Hello, world!</p><p>Goodbye, world!</p></body></html>
 │     │     │ ╰───────────╯  │  │ ╰────────────╯  │   │      │
 │     │     ╰────────────────╯  ╰─────────────────╯   │      │
 │     ╰───────────────────────────────────────────────╯      │
 ╰────────────────────────────────────────────────────────────╯
 

Here I've used the nested-bowls diagram again to show how the <body> element now has two child nodes: a couple of <p> elements. The tree graph for this document now looks like this:

             <html>
               │ 
             <body>
           ┌───┴───┐
          <p>     <p>
           │       │
#Hello, world!   #Goodbye, world!

We can now start talking about child sequences.

If we want a reference to, say, the second paragraph in this document, we can simply write out a list of numbers, where each number represents a child of the preceding node, starting from the root. This will describe a path through the tree that ends at the element we're interested in.

For this document, a child sequence of 1/2 points at the second paragraph. The 1 is for the <body> element, which is the first and only child of the root node (the <html> element), and the 2 is for the <body> element's second child, which is the second <p> element in the document.
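
Following a child sequence is a short loop: start at the root, and for each number, step to that child. Here's a rough sketch, again assuming Python's built-in xml.dom.minidom parser (fine for this well-formed example, not for HTML in the wild). The resolve function is a hypothetical name of my own, not part of any standard.

```python
from xml.dom import minidom

def resolve(root, child_sequence):
    """Follow a 1-based child sequence such as "1/2" down from the root.

    Hypothetical helper for illustration. Every child node counts,
    including text nodes, which matches the numbering used here.
    """
    node = root
    for step in child_sequence.split("/"):
        node = node.childNodes[int(step) - 1]  # 1-based -> 0-based
    return node

# Assumption: the example is well-formed, so an XML parser will accept it.
doc = minidom.parseString(
    "<html><body><p>Hello, world!</p><p>Goodbye, world!</p></body></html>"
)
target = resolve(doc.documentElement, "1/2")
print(target.toxml())
```

The loop never looks at element names at all. It only counts children, which is exactly why a child sequence is so compact.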

Easy, right?

But what if we wanted a reference just to the word "Goodbye"? For that, we first need to add one more level of precision to our child sequence: an offset, to set a pointer before (or after) a particular codepoint (character) inside of a text node. If we have two of these pointers, we can define a range of material that we want to address.

So, let's expand our child sequence a little more: 1/2 points at the second <p> element, while 1/2/1 points at the text node inside of the second <p> element. We can now add an offset value to indicate the point between the characters of the text node that we want to use as the start of our range.

1/2/1.0 points to the very beginning of the text node inside the second paragraph; the point-zero means that we want a pointer at the zero-index, or the point right before the 'G' in 'Goodbye'.

1/2/1.7 points to the point between the 'e' and the ',' in that text node.

These pointers combined define a range of content in this form: range(start_pointer, end_pointer).

For our example, range(1/2/1.0, 1/2/1.7) points at the word "Goodbye" in the second paragraph.
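
To check the arithmetic, here is a small sketch that resolves both pointers and slices out the text between them. The same caveats apply: I'm assuming Python's built-in xml.dom.minidom parser because the example is well-formed, the names resolve_pointer and extract are hypothetical, and this version only handles a range that begins and ends in the same text node.

```python
from xml.dom import minidom

def resolve_pointer(root, pointer):
    """Split a pointer like "1/2/1.7" into (node, offset).

    Hypothetical helper: the part before the dot is a 1-based child
    sequence, the part after it is a 0-based character offset.
    """
    path, _, offset = pointer.partition(".")
    node = root
    for step in path.split("/"):
        node = node.childNodes[int(step) - 1]
    return node, int(offset)

def extract(root, start_pointer, end_pointer):
    """Return the text between two pointers into the same text node."""
    start_node, start = resolve_pointer(root, start_pointer)
    end_node, end = resolve_pointer(root, end_pointer)
    if start_node is not end_node:
        raise ValueError("this sketch only handles ranges inside one text node")
    # Python strings are indexed by code point, like the offsets here.
    return start_node.data[start:end]

# Assumption: the example is well-formed, so an XML parser will accept it.
doc = minidom.parseString(
    "<html><body><p>Hello, world!</p><p>Goodbye, world!</p></body></html>"
)
selected = extract(doc.documentElement, "1/2/1.0", "1/2/1.7")
print(selected)
```

A range that spans several nodes would need to walk the tree between the two pointers, which this sketch deliberately leaves out.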

And that's that! We've solved it, right? If we can precisely point to individual characters in an HTML document, then by providing two of these pointers we can select an arbitrary portion of the document to reference with an annotation.