Documents are not storage containers for media. Rather, they specify by way of reference what media is to be presented to the user, and how it is to be presented[1].
The simplest form of a document would be nothing more than a list of media fragments to be displayed in sequence. In the Xanadu literature you'll see the terms EDL (Edit Decision List, a film-making term), content-list, or xanadoc used for this fundamental form of document.
The text, image, and audiovisual contents referred to in the document's content-list are loaded from the network and assembled on-screen into the specified configuration. This import and assembly of stable media fragments on-the-fly is called TRANSCLUSION.
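To make the idea concrete, here is a minimal sketch of a text-only content-list and its assembly. Everything in it is an assumption made for illustration: the `Span` shape (source address, start offset, length), the example.org addresses, and the in-memory `SOURCES` table standing in for the network. It is not Alph's or Xanadu's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One entry in a content-list: a reference to a fragment of stable media."""
    source: str  # address of the source media (hypothetical URL)
    start: int   # offset of the first character of the fragment
    length: int  # number of characters in the fragment


# Stand-in for the network: in a real system these would be fetched remotely.
SOURCES = {
    "https://example.org/a.txt": "A document is a list of references.",
    "https://example.org/b.txt": "The media lives on the network.",
}


def fetch(source: str) -> str:
    """Return the full text stored at a source address (assumed helper)."""
    return SOURCES[source]


def transclude(content_list: list[Span]) -> str:
    """Assemble the document by pulling each referenced fragment, in order."""
    return "".join(
        fetch(span.source)[span.start:span.start + span.length]
        for span in content_list
    )


# The document itself holds no text, only references into the sources.
edl = [
    Span("https://example.org/a.txt", 0, 23),
    Span("https://example.org/b.txt", 3, 6),
    Span("https://example.org/a.txt", 23, 12),
]

document_text = transclude(edl)
print(document_text)  # A document is a list of media references.
```

Note that the content-list stores only addresses and offsets; the words shown to the reader come entirely from the two source texts.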
Transclusion should be a familiar concept to anyone who knows how images and audiovisual media are dealt with in Web pages. HTML documents don't contain images or audio/video streams (yes, technically this is possible with the use of data: URIs); rather, they specify that those resources should be retrieved and placed into the document layout.
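The Web analogy can be checked directly: the sketch below walks a small HTML fragment with Python's standard `html.parser` module and lists the media it refers to. The page content and file names are invented for the example, and only `src` attributes on a few media tags are considered.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Collect the addresses of media that an HTML page refers to."""

    MEDIA_TAGS = {"img", "audio", "video", "source"}

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag in self.MEDIA_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.references.append(value)


# A made-up page: the markup names the media but does not contain it.
page = (
    '<p>A caption.</p>'
    '<img src="https://example.org/cat.png">'
    '<video src="clip.webm"></video>'
)

collector = ReferenceCollector()
collector.feed(page)
references = collector.references
print(references)  # ['https://example.org/cat.png', 'clip.webm']
```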
Transclusion of TEXT is central to the utility of xanalogical documents.
1. In Alph, I'm using HTML/CSS for documents, and this means that both the appearance of documents and their semantic structure are specified in the document markup. This runs contrary to how Ted has suggested that styling and visual presentation be handled in Xanadu. His view is that content-lists would provide the sequence of media to be presented, and then LINKS would point at portions of the document and specify how those portions would be styled. That is a can of worms that I didn't want to open. As my aim from the get-go was for a system that interoperated with the existing Web in a harmonious way, and as HTML/CSS are mature, feature-rich, and ubiquitous, it made sense for Alph-flavoured xanalogical hypertext to roll formatting and presentation into the document layer of the model.