The Web Is Worse Is Better

Recently, on a whim, I bought Hypertext and Hypermedia (1990), by Jakob Nielsen, the author of a number of books on user interface and web design. The contents are just what the title indicates, but what interested me is that this is a book about hypertext published before the web existed. It does not even mention the web as being among the existing hypertext systems at the time.

“Could you get any more obsolete?” you are thinking. “What are you going to read next, a book about making flint tools by hand?”

It is true that a book like this holds little interest for the general reading public, and so I cannot recommend it to anyone other than the rare individual interested in the history of hypertext. I personally saw it as a window into the pre-web world, when a technology that is now as familiar to us as asphalt, and about as exciting, was strange and new and held the promise of new ways of working, making, and living, somewhat in the same way that virtual realities and artificial intelligences do for us today.

Reading Nielsen’s book inspired me to read some of the earlier, foundational writing about hypertext, specifically Vannevar Bush’s 1945 essay As We May Think, widely credited as the first conceptualization of a hypertext system, and Ted Nelson’s 1981 Literary Machines, in which he describes his vision of the Xanadu hypertext system.

Both men were visionaries, although Bush, as far as I know, did not pursue his idea for the “memex,” while Nelson has been trying to make Xanadu a reality for decades, with mixed results. While the Xanadu system has so far remained mostly a dream, like the one from which Coleridge is supposed to have received the inspiration for Kubla Khan, Nelson has nevertheless made major contributions to our understanding of hypertext, including coining the terms “hypertext” and “hypermedia”.


Bush was writing not only pre-internet, but in a time before anything existed like what we would now call a computer, and it is impressive how he synthesized this vision of the future from the technological advances he observed in his own time: photography, microfilm, the nascent business of calculating by machine. While the physical embodiment of his envisioned system has become obsolete, he was amazingly prescient in anticipating the future state of technology and information, foreseeing hypertext and large information databases.

The big idea of the memex is “associative indexing”. Bush observes that the human mind works by association, and computers can facilitate that association through their ability to store and follow references, rather than simply manipulate numbers. With this new conceptualization, the computer evolves from a number crunching machine to a store of human knowledge, paving the way for computers to become much more than exceptionally fast calculators.

While Bush never got a chance to see the influence of his ideas take shape in the form of Tim Berners-Lee’s World Wide Web, Nelson did, and he has voiced the opinion—common among those who have thought about hypertext and what it is capable of—that the WWW is a shadow of what a hypertext system could be. A hypertext system should, in his view, allow any part of a document to be linked to any part of any other document, with versioning so that links are not broken by changes to the source or destination documents.

A primary point made in disparagement of the WWW is that it supports only one-way links, as opposed to two-way links, which let you both jump to a referenced node (I’ll use the more general term “node” rather than “document” from here on) and jump from that node back to any node linking to it.

I think this sense of the inferiority of the WWW, as compared to the hypertext systems that preceded and followed it, displays a certain myopia that may be common among system designers and technical people in general, and I would like to make the case that what makes the WWW appear to be a dumbed-down version of hypertext is exactly what has made it so successful in practice.


A link in HTML is essentially a GOTO, a branch from here to there. And of course, GOTO is the original piece of technology that was “considered harmful.” GOTOs have given way to “structured programming,” what we now simply call “programming.” This was a move in the right direction, certainly, and I’m glad that today we are not just jumping wherever we please with our program execution flow. But the more structured approach to execution still requires the looser GOTO as a building block.
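
To make that concrete, here is a toy sketch in Python (the little machine and its instruction names are invented purely for illustration): a simple countdown loop, written the way a compiler might lower it, as nothing but conditional and unconditional jumps.

    # A toy machine whose only control-flow tools are jumps.
    # The program below is the structured loop
    #     n = 3
    #     while n > 0:
    #         n -= 1
    #     print(n)
    # expressed with explicit GOTOs.
    program = [
        ("SET", "n", 3),
        ("JUMP_IF_ZERO", "n", 4),   # loop test: exit when n == 0
        ("DEC", "n"),               # loop body
        ("JUMP", 1),                # unconditional GOTO back to the test
        ("PRINT", "n"),
        ("HALT",),
    ]

    def run(program):
        env, pc = {}, 0
        while True:
            op, *args = program[pc]
            if op == "SET":
                env[args[0]] = args[1]
                pc += 1
            elif op == "DEC":
                env[args[0]] -= 1
                pc += 1
            elif op == "JUMP":            # the raw GOTO
                pc = args[0]
            elif op == "JUMP_IF_ZERO":    # GOTO guarded by a condition
                pc = args[1] if env[args[0]] == 0 else pc + 1
            elif op == "PRINT":
                print(args[0], "=", env[args[0]])
                pc += 1
            elif op == "HALT":
                return

    run(program)   # prints: n = 0

The loop is still there underneath; structure is a discipline imposed on top of the jump, not a replacement for it.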

Going from one-way links to two-way links sounds simple enough, requiring the system to keep track of two pieces of information instead of one. In fact, it would require a very different system from the web we know. Going from one- to two-way means moving from links described in-band, within the node itself, to a separate database that stores each link’s source and destination nodes, independent of both.
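
To make the difference concrete, here is a minimal sketch in Python; the names and structure are my own invention, not a description of any real system. A one-way link is just data carried inside the node, the way an HTML page carries its href attributes, while a two-way link needs a separate index that records both directions and has to be kept in sync with every node that adds or removes a link.

    # One-way links: each node carries its outgoing references in-band.
    # Nothing outside the node needs to know about them.
    nodes = {
        "blog/post": {"text": "see the spec", "links": ["w3.org/spec"]},
        "w3.org/spec": {"text": "the spec itself", "links": []},
    }

    # Two-way links: a separate store records every (source, destination)
    # pair so it can be queried in both directions.
    class LinkIndex:
        def __init__(self):
            self.forward = {}    # source -> set of destinations
            self.backward = {}   # destination -> set of sources

        def add(self, source, destination):
            self.forward.setdefault(source, set()).add(destination)
            self.backward.setdefault(destination, set()).add(source)

        def links_from(self, node):
            return self.forward.get(node, set())

        def links_to(self, node):
            return self.backward.get(node, set())

    index = LinkIndex()
    index.add("blog/post", "w3.org/spec")
    print(index.links_to("w3.org/spec"))   # {'blog/post'}

Every page that adds or removes a link has to tell something like LinkIndex about it, and every reader who wants to follow a link backwards has to be able to query it. That bookkeeping is precisely what the web declines to do.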

There are innumerable database systems that support websites, but no database is required for the web itself to work, at least in terms of the ability to link to other nodes. A web that required a database, or a set of distributed databases, to store all of the links between all documents would be a very different web from the one we have today.

(The web does, of course, depend on one particular distributed database, the Domain Name System. But while the database system for hierarchically naming sites has scaled to the size of the world-wide internet, a database that has to handle every node and every link between nodes is a problem several orders of magnitude larger.)

Databases are complex technologies, requiring someone who knows how to install, use, and maintain them. They require knowledge, skill, planning, and hours of work put into their operation, just like any technology of significant size and complexity.

In the two-way web, each change to a link pair would require an update to the database, which would then have to be propagated across the entire system. The limitations of database technologies would then become the limitations of the web. With the web as it is, by contrast, the addition or removal of a link is done without a second thought, and requires no special permission or authorization from an authoritative body.

This does mean that links can become broken at any time, and it can certainly be frustrating to follow a link only to land on a 404 error page. In many cases, the Wayback Machine comes to the rescue, but there is no guarantee that what you are looking for will have been cached anywhere. So we deal with broken links, or links to nodes that do not contain the content that was meant to be referenced when the link was created.


The web we have is an interesting case of emergence rather than design. Of course, someone did design the web initially, that person being Tim Berners-Lee. But the WWW emerged from its beginnings at CERN and spread all over the world because even though it was initially conceived of as a resource for researchers, it is not limited to research papers. You can link from a blog to an encyclopedia to a tweet to a restaurant’s website to whatever else there is out there on the information superhighway. The web is agnostic as to where you are going and where you are coming from.

The actual web we have is more like a forest than a city. Like a tree that grows wherever it can find space and sunlight, websites pop up wherever someone has something to put out there and can get together the resources for hosting costs and developers if needed. You can certainly build websites that serve a specific purpose and a specific user base and require authentication to access. But that is all built upon the general-purpose web, the web that does not recognize anything but links from here to there.

(Of course, in 2023 much activity on the web has moved to a few specific social media sites, but it is still the case that millions of websites exist, even if most traffic goes to a select few.)

This haphazardness is naturally offensive to the sensibilities of the engineers responsible for building large systems, who are looking for consistency and coherency in their designs. But there is something to the haphazard, uncontrolled way that the web grows and mutates.

If a link is broken, that is unfortunate, but a web that relies on a database of links becomes unusable as soon as that database goes down or becomes unresponsive. No one has to take a call at 3 AM when a node is removed from the web; no one gets an urgent email on a Saturday afternoon when a link breaks.

The WWW was successful because it set the bar low, something difficult for us in the computing industry to swallow. It means that links can break, yes, but it also means that things can move ahead without anyone’s approval; if a link breaks, the world does not stop. And while I do agree that Cool URIs don’t change, the fact that a change in the structure of the web incurs little cost is likely what allowed it to grow into the vast expanse that it has.


When I was finishing up my BA in English, I decided to write my thesis paper on Don Quixote. Generally considered the first instance of the modern novel, it is a classic work of literature that deals with universal human themes. What drew me in was how it portrays the dichotomy of the realist vs. the idealist, the conflicts between human aspirations and practical realities, exemplified in the characters of the eponymous knight and his sidekick Sancho Panza.

In the software industry, we have captured this idea in the phrase “worse is better,” first used in an article by Richard Gabriel in which he contrasts the style of design developed at MIT by people working on the ITS operating system and using Lisp, what we could call the do-the-right-thing style, with that employed by C and Unix developers, which he dubbed the worse-is-better style. It is Gabriel’s judgement that worse-is-better is better. “Better” is always relative: better at doing what, compared to what? In this case, better means more likely to survive. There are not many ITS systems in use these days, as far as I am aware, but Unix (in the form of Linux) and C are everywhere, which is as much vindication of Gabriel’s judgement as you could want.

Gabriel frames the differences between the two styles in terms of differing attitudes toward four characteristics of system development:

  • Simplicity
  • Correctness
  • Consistency
  • Completeness

In the do-the-right-thing style, all of these characteristics are important, but simplicity of implementation may be sacrificed to any other concern. Conversely, in the worse-is-better approach, any other concern may be trumped by simplicity.

The example he gives as indicative of the difference between the two styles is how ITS and Unix handle a system call that is interrupted partway through. The do-the-right-thing approach is for the operating system to back out of the call and resume it so that the interruption is invisible to the user program, whereas the worse-is-better approach is simply to return an error code and let the user program handle the situation however it sees fit.

We can think of the difference between the ideal systems dreamed of by hypertext purists and the web—which has not only survived, but has become the largest software platform that has ever existed, despite not having been designed as a software platform—as analogous to engineering the do-the-right-thing way vs. the worse-is-better way. The right thing for a hypertext system to do is to prevent broken links, while the worse-is-better way is to let the hypertext author deal with them.

And while we can lament that the right thing may not survive the harsh environment of real world experience, this vale of tears in which good software is driven out by bad, we can respect that which manages to survive and flourish. The web is worse than the ideal hypertext system, which is what makes it better.

Gabriel goes as far as to label Unix a virus. Viruses are successful if they are good at replicating, and replication in the context of the adoption of a technology is what we otherwise call scalability. Unix and C were easy to port to many different platforms, and so they proliferated. The lesson here is that simplicity scales. Scalability is often a concern when building a system, but scaling up to the size of almost world-wide ubiquity takes a very special architecture, one where simplicity is taken to an extreme.


Erik Naggum, a Norwegian programmer whose writings are mainly accessible, at least to the English-speaking world, in the form of an archive of posts to the Usenet group comp.lang.lisp, compiled by Lisp enthusiast Zach Beane, was firmly in the do-the-right-thing camp. He is also one of those who have criticized HTML as a sorry impersonation of a hypertext system. This post in particular puts forward a number of arguments for the web’s inferiority to the ideal hypertext system. Naggum had a deserved reputation for fiery rhetoric, and his denunciation of the WWW is typical of his position on anything he felt strongly about.

His argument is, again, that the WWW is too simple to capture the rich set of possible interrelations between texts. This is certainly true. GOTO links cannot capture any of the nuances of relationships between texts. Web links give you no way to, for example, annotate a link as a clarification, a refutation, or a definition of the linked text. What you can do is simply write down your intentions: “click here for a clarification/refutation/definition”. It is up to the author to indicate these relationships and the reader to understand them. But it is this lack of formal structure that allows linking to be done with such little overhead, allowing anything to be linked to anything that is addressable by a URL without regard for the type of relationship.


While I do think that the simplicity of the web is a source of strength, it is not hard to imagine that more sophisticated hypertext systems could be desirable for specific applications. Just as a reliable Transmission Control Protocol can be built on the simpler, unreliable Internet Protocol, and more sophisticated program control structures can be built from the humble GOTO, so can more sophisticated hypertext systems be built on the web. For example, there is Twine, a hypertext system built specifically for writing interactive fiction games.

To imagine another example, eBook readers allow you to add highlights and annotations to a string of text; why not the ability to add your own links? How about tagged links, so that you can specify the type of link you added (clarification, refutation, definition), with a highlight color corresponding to each tag? You could even overlap multiple links on a string of text, switching between the available tags so that you get different views of a text based on the types of links you want to see.
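
As a rough sketch of what such a reader-side link store might look like (every name here is hypothetical, invented for illustration rather than taken from any real e-reader API):

    from dataclasses import dataclass

    # One highlight color per tag; the tags themselves are just strings.
    TAG_COLORS = {"clarification": "yellow", "refutation": "red", "definition": "blue"}

    @dataclass
    class TaggedLink:
        start: int     # character offsets of the anchor text in the book
        end: int
        target: str    # URL or location of the linked node
        tag: str       # "clarification", "refutation", "definition", ...

    class Annotations:
        def __init__(self):
            self.links = []

        def add(self, start, end, target, tag):
            # Overlapping ranges are allowed; views are filtered by tag.
            self.links.append(TaggedLink(start, end, target, tag))

        def view(self, tag):
            """Return only the links of one type, e.g. to render one highlight color."""
            return [link for link in self.links if link.tag == tag]

    notes = Annotations()
    notes.add(120, 180, "https://example.org/term", "definition")
    notes.add(150, 400, "https://example.org/reply", "refutation")
    print([link.target for link in notes.view("definition")])

The point is how little machinery is needed: the typed, overlapping links live in the reader's own layer, while underneath, each target is still nothing more than a plain one-way URL.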

I think system designers, the do-the-right-thing types, often miss that extreme simplicity can itself be a desirable feature of a system. As a developer, it is difficult to go against your instincts and allow worse to be better, even in cases where worse is indeed better (because worse is, often enough, actually worse).


I don’t wish to be a technological Pangloss and assert that the web as it is is the best of all possible hypertext systems. At no point in this discussion have we even touched on all of the problems that have arisen from the web, the platform that now carries much of our social and political interaction. And like bacterial growth, where the most successful strain creates an inhospitable environment for every other kind, any proposed alternative to the WWW will have a very hard time making headway in a WWW-dominated world.

The thesis put forward here is a humble one: the web is successful because it is simple, which allowed it to scale, which allowed it to grow and to survive, all despite the certainty of many hypertext pundits that a simple hypertext system could have no value.

Certainly, it has taken many decades for the web to develop to its current level of sophistication, such that it no longer requires an array of hacks to do something as simple as a three-column layout. But the fact that it is still here, still evolving rapidly, and serving as the platform for practically everything, is an example for system designers of what simplicity can accomplish.