(A bit of a departure from my ironically complicated KissML idea today)
ManifestML is concerned with the parts of HTML that compose and reference external assets on the page. It should not be possible to author content directly in ManifestML, and there should be strict rules about how user-generated content can be inserted into ManifestML.
ManifestML has the following parts:
the doctype and XML declarations (if necessary)
the <html>, <title>, <meta> and <body> tags
XML namespaces (if needed)
the HTML5 AppCache manifest reference
link (stylesheets, rss feeds, alternate versions)
script (but only the src attributes, script shouldn't be allowed inline)
the A tag
body (for containing img and A elements)
textnodes with whitespace only, outside of A elements or Object elements.
canvas tag, with ID, and alternate content within. (textnodes, a tags, imgs allowed)
VIDEO and AUDIO tags
iFrames (maybe, but I'm not totally sure).
id attributes required for all elements.
Tags should appear in the order that the browser should load them, not necessarily in semantic order. This follows from my previous Google Plus post about aesthetic website loading. With a manifest file, it is easier to manage the way a page loads.
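To make the load-order idea concrete, here is a rough Python sketch that reads a ManifestML-style document and lists its external assets in the order the tags appear. The names ASSET_ATTRS, LoadOrder and load_order are made up for illustration, and the tag-to-attribute mapping is my assumption about which attribute carries the asset URL.

```python
from html.parser import HTMLParser

# Assumed mapping from tag name to the attribute holding its asset URL.
ASSET_ATTRS = {
    "link": "href",
    "script": "src",
    "img": "src",
    "video": "src",
    "audio": "src",
    "iframe": "src",
}


class LoadOrder(HTMLParser):
    """Collects (id, url) pairs in the order the tags appear."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url_attr = ASSET_ATTRS.get(tag)
        if url_attr and attrs.get(url_attr):
            self.assets.append((attrs.get("id"), attrs[url_attr]))


def load_order(document):
    """Return the external assets of a document in document order."""
    parser = LoadOrder()
    parser.feed(document)
    parser.close()
    return parser.assets
```

Reordering the tags in the document reorders this list, which is the whole point: the document itself is the loading plan.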
NOT ALLOWED in ManifestML:
event handler attributes (like onclick, onload)
inline CSS style
freeform text, except in an IMG alt attribute or an A tag, or as an alternate content description inside a canvas, object, embed, video or audio tag.
anything else not explicitly mentioned.
All ManifestML documents should be valid HTML5, HTML4, or XHTML 1.0 (not 1.1) documents. A validator program should be written to enforce the content restrictions of this subset, à la JSLint. A properly written ManifestML document may closely resemble the HTML5 App Cache manifest format.
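A validator along these lines could start out very small. The following is only a sketch of a few of the rules above (allowed tags, required ids, no inline style, no event handlers, no inline script, no freeform text), written with Python's standard html.parser. The names ManifestLint and lint are hypothetical, and two details are my own assumptions rather than part of the proposal: that <head> is allowed, and that <title> may contain text. It does not check HTML validity itself.

```python
from html.parser import HTMLParser

# Tags from the ManifestML parts list; "head" is an assumption.
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "body", "link", "script", "a",
    "img", "canvas", "object", "embed", "video", "audio", "iframe",
}
# Tags that may contain text; "title" is an assumption.
TEXT_CONTAINERS = {"title", "a", "canvas", "object", "video", "audio"}
# Tags with no end tag, so they are never pushed on the stack.
VOID_TAGS = {"meta", "link", "img", "embed"}


class ManifestLint(HTMLParser):
    def __init__(self):
        super().__init__()
        self.errors = []
        self.stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag not in ALLOWED_TAGS:
            self.errors.append(f"<{tag}> is not part of ManifestML")
        if "id" not in names:
            self.errors.append(f"<{tag}> has no id attribute")
        if "style" in names:
            self.errors.append(f"<{tag}> has an inline style")
        for name in sorted(names):
            if name.startswith("on"):
                self.errors.append(f"<{tag}> has event handler {name}")
        if tag == "script" and "src" not in names:
            self.errors.append("<script> has no src attribute")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not data.strip():
            return  # whitespace-only text nodes are allowed
        if "script" in self.stack:
            self.errors.append("inline script content")
        elif not TEXT_CONTAINERS.intersection(self.stack):
            self.errors.append(f"freeform text: {data.strip()!r}")


def lint(document):
    """Return a list of rule violations (empty if none were found)."""
    parser = ManifestLint()
    parser.feed(document)
    parser.close()
    return parser.errors
```

Calling lint() on a document returns a plain list of messages, so it could be wired into a build step the same way JSLint usually is.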
Things that are *not* in SemanticML :
Anything in ManifestML (including doctype, head, title, meta, namespaces, style, link, and IMG)
ID attributes (only class attributes, and id references in URL fragment identifiers, are allowed).
Inline Style attributes.
Things that /are/ in SemanticML:
<A>, and a restricted form of <IMG> whose src is same-origin only, or carries a fragment identifier (referencing an img tag by its #id in a ManifestML file).
tag soup and random garbage, as long as SemanticML can be kept in a secure sandbox that disallows anything except the pure /content/ and /semantic/ parts of HTML.
Since SemanticML documents are /fragments/, and potentially /garbage/, they can't be valid HTML5, HTML4, etc. But they should have the following two properties: they can be concatenated, or wrapped in a div, with no change in their appearance or semantics; and there is a clear strategy for reformatting them to close all unclosed tags, so that they cannot leak out into the larger documents they are composed into. Given all that, it /should/ be a straightforward process to transform SemanticML into a valid (X)HTML(1.0|4|5) document.
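As one possible reformatting strategy, here is a hedged Python sketch that rebalances a fragment: it closes whatever is left open, drops stray end tags that would otherwise close an element in the surrounding document, and strips id, style and event handler attributes. The names Rebalancer and rebalance are invented for this example, and the void-tag list is an assumption. It deliberately does not filter which tags are allowed (a <script> would pass straight through), so it is not a sandbox on its own.

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of tags that never take an end tag.
VOID_TAGS = {"img", "br", "hr"}


class Rebalancer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []    # pieces of the rewritten fragment
        self.stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        rendered = ""
        for name, value in attrs:
            # Drop id, inline style, and event handler attributes.
            if name in ("id", "style") or name.startswith("on"):
                continue
            rendered += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.out.append("<{}{}>".format(tag, rendered))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: drop it so it cannot close an outer element
        while True:
            open_tag = self.stack.pop()
            self.out.append("</{}>".format(open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def rebalance(fragment):
    """Return the fragment with every opened tag closed."""
    parser = Rebalancer()
    parser.feed(fragment)
    parser.close()
    while parser.stack:
        parser.out.append("</{}>".format(parser.stack.pop()))
    return "".join(parser.out)
```

Because each rebalanced fragment opens and closes only its own tags, two of them can be joined with plain string concatenation, or wrapped in a div, without one bleeding into the other.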
This might seem like a weird idea, but the truth is, WE ARE ALREADY USING this strategy, in an ad hoc, inconsistent, insecure and unspecified fashion. My proposal is that we formalise it and form a consistent style around it.