I have a theory.
While Web 2.0 is generally well-defined and understood, Web 3.0 is still being defined and speculated upon every day. Most folks agree, however, that a Web 3.0 web site has certain characteristics that set it apart from all web sites of the past. Among them (not everyone agrees on this one, but many do) is general support for open identity and API standards such as OpenID. According to Wikipedia:
Nova Spivack defines Web 3.0 as the third decade of the Web (2010–2020) during which he suggests several major complementary technology trends will reach new levels of maturity simultaneously including:
- transformation of the Web from a network of separately siloed applications and content repositories to a more seamless and interoperable whole.
- ubiquitous connectivity, broadband adoption, mobile Internet access and mobile devices;
- network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing;
- open technologies, open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons, Open Data License);
- open identity, OpenID, open reputation, roaming portable identity and personal data;
- the intelligent web, Semantic Web technologies such as RDF, OWL, SWRL, SPARQL, GRDDL, semantic application platforms, and statement-based datastores;
- distributed databases, the "World Wide Database" (enabled by Semantic Web technologies); and
- intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents.
But the simplest and most essential of the Web 3.0 characteristics is the old Web 2.0 ideal: that a Web 2.0/3.0 web site be built around a "semantic web", whereby a spider or other "robotic" application can make as much sense of a web site just by looking around as a human can today.
This is not my theory; it is fact, in the sense that these are the popular opinions and speculations about what "Web 3.0" actually is and how it will be achieved. My theory, rather, is one perspective on how to achieve a few basics of the Web 2.0+/3.0 ideals in a web site. It goes something like this:
Objective: Isolate content from UI, advertising, and other fluff. This enables SEO opportunities and facilitates semantic web mechanisms by giving "robots" a truly clean document right out of the oven. It also gives developers and content authors workflows that are consistently and entirely isolated from the merged production rendering and the end-user experience.
Theory: An SEO-rich semantic web site can be achieved when the valuable content associated with a URL is returned by that URL (with strict markup and microformatting, and with isolated XML aggregation such as RSS/Atom), and all navigation, advertising, and other non-essential "content" is late-injected in script.
This theory boils down to just a few rules. These are Web 2.0 rules that used to be ideals, but a Web 3.0 site would require ALL of them to be carried out without any fail points.
SEO and aggregation know neither style nor script - use this to your advantage!
This is the crux of my theory. CSS designers have enjoyed this fact for years: you can dump ugly, plain-vanilla, raw content on a page and convert it into an absolutely beautiful, breathtaking document without changing any markup at all beyond a .css file reference. But unlike CSS experts, script gurus tend to be far from consistent. Most search engines and content aggregators don't bother with dynamic AJAX calls and document.write's. Granted, some search engines are getting smarter (smart enough to execute document.write's), but it will still be years before they support runtime content injection when indexing a web page; by the time they start doing that, they might as well start aggregating Flex "MXML" and Silverlight "XAML" instances while they're at it (yeah, right).
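The rule that most spiders skip script can be sketched concretely. Below is a minimal, hypothetical "naive spider" in JavaScript: it extracts text roughly the way a non-script-executing indexer would, stripping script blocks rather than running them. The function name and sample pages are illustrative, not any real engine's behavior.

```javascript
// A naive "spider": strip script blocks (a non-JS indexer skips them,
// it does not run them), then strip remaining tags to get indexable text.
function naiveSpiderText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, '') // scripts are ignored, not executed
    .replace(/<[^>]+>/g, ' ')                   // drop the remaining markup
    .replace(/\s+/g, ' ')
    .trim();
}

// The same "content" delivered two ways: inline markup vs. document.write.
const inlinePage =
  '<h1>Widget review</h1><p>This widget is excellent.</p>';

const scriptedPage =
  '<h1>Widget review</h1>' +
  '<script>document.write("<p>This widget is excellent.</p>");</script>';

console.log(naiveSpiderText(inlinePage));   // "Widget review This widget is excellent."
console.log(naiveSpiderText(scriptedPage)); // "Widget review"
```

The body text simply does not exist for the scripted page as far as this spider is concerned - which is exactly why meat content must ship inline and only litter content should be injected.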
But instead of treating this as "something to keep in mind", what would happen if web developers built everything on a page around that rule, by breaking it down into several rules:
- Keep in mind that SEO (search engine optimization), aggregation-friendliness, and the semantic web all share similar requirements from a web site:
- SEO-friendly data is good data
- URL path - Use relevant keywords, use slugs, and prefer hyphens (-) over underscores or spaces; spaces end up looking like %20's in the URL, and underscores may confuse the spiders / aggregators.
- HTML <title> and <meta> keyword tags
- Use of <h1>, <h2>, <h3>, <h4>, <h5>, <h6> tags and the like is absolutely critical. Never just use formatted <div>'s.
- Always put the real content at the top of the document since the bottom part could be truncated.
- Never, never, NEVER spam any page on your site to increase SEO. It doesn't work anymore and will get you blacklisted.
- Never detect and customize the web page experience on behalf of the search engine such that the search engine sees something different than the user would see. Also blacklistable.
- Non-content data in the document will be perceived as content if not treated with isolation.
- In planning, draw a line between "meat" content--your article paragraphs, your metadata, your title, your copyright line, etc--and everything else--your top-nav links, your left-nav sidebar, your advertisements, your widgets and buttons, etc.
- Label everything else as "litter content", because to a "robot" such as a spider or aggregator that is exactly what it is. A robot has no use for advertisements or a sidebar beyond its sitemap hyperlinks. Using the term "litter content" regularly during design and development cycles is worthwhile because it forces you, the web developer, to keep runtime isolation in mind in everything you do.
- Treat a single web page as a multi-tier, multi-component application, not as generated HTML output. Specifically,
- "Meat/SEO content" should be generated prior to client consumption, in either flat HTML files or as dynamically server-generated HTML (i.e. PHP or ASPX output). From the very first character to the very last line of "meat content", this file should be completely clean, rid of non-SEO content.
- What the client receives should be NOTHING except for real, raw "meat/SEO content", along with what the client would perceive to be tiny bits of safely ignorable litter markup. Litter markup (tags) in itself is not litter content.
- This litter markup would consist of all the attribute-decorated, XHTML-compliant placeholder tags and script blocks needed for the client to perform a secondary fetch or otherwise isolated insertion of litter content at runtime.
- The litter content can be AJAX-fetched, or it can be appended to the tail end of the document; runtime script then "injects" the content into its placeholder regions. For example, using jQuery, a script block at the bottom of the page can move tail-end litter content into a placeholder region with: $("#placeholderID").append($("#litter_contentID"));
- Microformat everything. Anything that cannot be meaningfully microformatted is "litter content" and should itself be microformatted as ignorable. If no microformat standard exists for a particular region of content on a web page, look harder; failing that, invent your own.
- Supply XML equivalents for all types of content. This goes back to Web 2.0 ideals and never went away: provide RSS/Atom aggregation support for every form of content, whether media galleries or e-commerce product listings.
- Use proper cache controls on EVERY component of the page, including script links. In doing so, much of the potential "popping" of content, both meat and litter, can be minimized or altogether avoided. Theoretically.
- The litter content can modify the display characteristics of the meat content, but never the other way around. HTML injected by script as litter content should be viewed as transforming content into something usable, exactly the way CSS has already transformed content into something beautiful.
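The placeholder-and-inject pattern in the rules above can be sketched outside the browser. In a real page the merge would happen via DOM calls (for instance jQuery's .append()); this hedged, pure-string version models the same merge so it can run anywhere. All names here (injectLitter, the sidebar id, the sample markup) are illustrative, not a prescribed API.

```javascript
// Sketch of late injection: the server ships clean "meat" HTML containing
// empty placeholder tags; litter content (nav, ads, widgets) is merged in
// separately at runtime. Modeled as a string transform for clarity.
function injectLitter(meatHtml, litterById) {
  // Replace each empty <div id="..."></div> placeholder with the same div
  // wrapping its litter fragment; everything else passes through untouched.
  return meatHtml.replace(
    /<div id="([^"]+)"><\/div>/g,
    (match, id) => (id in litterById
      ? `<div id="${id}">${litterById[id]}</div>`
      : match)
  );
}

// What a spider sees: nothing but meat content and ignorable placeholders.
const meat =
  '<h1>Article title</h1><p>The actual content.</p>' +
  '<div id="sidebar"></div>';

// What the client fetches (or finds tail-appended) and injects at runtime.
const litter = { sidebar: '<ul><li>Ad</li><li>Nav link</li></ul>' };

console.log(injectLitter(meat, litter));
```

The point of the sketch is the asymmetry: the meat document is complete and indexable on its own, while the litter only ever exists merged-in on the client, after the spiders have already gone home.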
Here is a mind map that demonstrates these rules for e-commerce, CMS, and social web sites:
A new lightweight way to look at semantic web production
There are two very important questions that must be considered with regard to this overall theory.
- Does the user see any ugly "popping" of litter content, or is the page otherwise slow to load? While such "popping" can be unavoidable on the first hit to the site, subsequent eye sores such as this should be avoided if possible.
- Can the developer workflow and productivity sustain and ideally benefit from this redefinition of what a web page consists of?
For the most part, the answers are likely favorable, depending entirely on the inventiveness of the web developer and the tricks they conceive.
These are rules worth building around. Assuming the theory is correct, it could be one of several bases for the way we (or at least I) would want to build web sites for the next decade.