Rethinking Web Structure: The Path to a Semantic Web

Introduction

Since the dawn of the World Wide Web in the 1990s, the primary purpose of web pages has been to present information for human consumption. The core technology, HTML, provides limited structure: enough to mark up a paragraph or emphasize a word, but not much more. CSS adds visual presentation on top of that, such as rendering paragraphs in tiny gray sans-serif text (which may alienate older readers), but it says nothing about what the content is. That is the extent of structure on the web as we know it.

Source: www.joelonsoftware.com

Consider a simple mention of a book on a web page:

Goodnight Moon by Margaret Wise Brown
Illustrated by Clement Hurd
Harper & Brothers, 1947
ISBN 0-06-443017-0

A naive computer program reading this would not recognize that a book is being referenced. On the original page, the only hint is the bold formatting of the title, which tells a program how the text looks, not what it is. This lack of machine-readable meaning is a fundamental limitation.

The Semantic Web Vision

As early as 1999, Tim Berners-Lee, the inventor of the web, described a Semantic Web in which computers could analyze all the data on the web (content, links, and transactions) so that intelligent agents could handle trade, bureaucracy, and everyday life. He wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”
— Weaving The Web, 1999

To realize this vision, web publishers would need to add explicit structure to their content, describing it with a vocabulary such as schema.org and encoding it in a format such as RDF or JSON-LD. For example, the markup might declare outright: "hey! here's a book!", along with details such as its author, illustrator, and ISBN.
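As a rough illustration, here is a minimal Python sketch of what such markup could look like for the Goodnight Moon example. It builds a schema.org `Book` description as a JSON-LD object and wraps it in a `<script type="application/ld+json">` tag, a common way of embedding JSON-LD in a page. The property names (`name`, `author`, `illustrator`, `publisher`, `datePublished`, `isbn`) come from the schema.org vocabulary. The helper names `book_jsonld` and `as_script_tag` are hypothetical and exist only for this example.

```python
import json


def book_jsonld(title, author, illustrator, publisher, year, isbn):
    """Return a schema.org Book description as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",  # the explicit "hey! here's a book!"
        "name": title,
        # People and organizations are typed nodes, not bare strings,
        # so a program can tell an author's name from any other text.
        "author": {"@type": "Person", "name": author},
        "illustrator": {"@type": "Person", "name": illustrator},
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": str(year),
        "isbn": isbn,
    }


def as_script_tag(data):
    """Serialize the JSON-LD inside a script tag for embedding in an HTML page (hypothetical helper)."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


book = book_jsonld(
    title="Goodnight Moon",
    author="Margaret Wise Brown",
    illustrator="Clement Hurd",
    publisher="Harper & Brothers",
    year=1947,
    isbn="0-06-443017-0",
)
print(as_script_tag(book))

# A consuming program no longer has to guess from formatting:
# it can parse the data and check the declared type directly.
parsed = json.loads(json.dumps(book))
if parsed["@type"] == "Book":
    print(parsed["name"], "by", parsed["author"]["name"])
```

The human-readable text stays in ordinary HTML. The script block is pure annotation: browsers do not display it, but a crawler or other program can read it. This sketch does no escaping, so a production page would also need to guard against field values that contain `</script>`.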

The Challenge of Adoption

Despite its potential, semantic markup is difficult and time-consuming to implement. After a writer has crafted a well-written, human-readable article, the extra effort of adding machine-readable annotations feels like homework. Without immediate feedback from automated systems that consume the data, motivation dwindles. Consequently, semantic markup is still rare on the web, and little progress has been made since 1999.

Why It Matters

This stagnation is unfortunate, because progress increasingly depends on making information accessible not only to people but also to AI assistants and other computer programs. Structured data lets search engines display rich snippets, helps virtual assistants answer questions accurately, and powers many automated workflows. Overcoming the adoption barrier is therefore critical.

A Key Insight

The key insight is that people will add semantic markup only if doing so is simple and rewarding. If the process becomes frictionless, whether through authoring tools that generate markup automatically or through immediate, visible benefits, adoption can accelerate. The Block Protocol is one attempt to lower this barrier: it offers a standardized way to embed structured blocks of content directly into web pages.

Looking Forward

As the web evolves, the gap between human-readable and machine-readable content must be bridged. Efforts like the Block Protocol, combined with growing awareness, may finally make Tim Berners-Lee’s dream a reality. The key is making semantic markup as natural as writing in HTML itself.

In conclusion, the journey from the simple structure of the 1990s web to a fully semantic web is ongoing. With the right tools and incentives, we can unlock the power of structured data for everyone.
