It seems I’m having difficulty getting across why I’m working on KISSML. Here is a simplified list of the attributes I want from my dream markup language:
- Topologically compatible with JSON, YAML and native programming language constructs (HTML and XML are not, and have serious structural impedance mismatches, discussed in my JSON/XML/relational series of posts)
- Capable of parsing most, if not all, existing HTML/XML code (like a tag soup parser)
- Fixes these annoyances I have experienced again and again with HTML/XML markup:
  - Entities being wrongly encoded, not encoded, not decoded, or double decoded
  - Entities not being encoded at all, causing validity errors
  - Browsers detecting the wrong encoding, which turns apostrophes and other characters into jumbled messes of pseudorandom characters
  - XML parsing being too strict, and breaking completely on the slightest deviation
  - HTML parsing being too liberal, and allowing all sorts of garbage through
  - HTML fragments not being considered valid documents, despite being a necessary type of data to store, retrieve, reconstitute and concatenate in multiple ways
  - The existence of elements that the HTML standard requires to appear only once within a document, which causes problems with concatenation and templating. This in particular requires a server-side program to actually *parse* the markup and use expensive DOM methods to produce correct output.
  - The existence of <script> and <style> elements in HTML markup that lead to serious security holes
- As a bonus, discourage the typographically incorrect use of inch marks and foot marks as if they were quote marks and apostrophes. (This is my graphic designer side talking.)
- Maps to a memory structure that is easy and efficient to traverse and modify in code.
- Provides some intelligence with regard to whitespace and control codes, particularly the mess of incompatible platform-specific line endings.
- As simple as possible, but no simpler. Easy to learn, easy to parse.
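To make the "topologically compatible with JSON" and "easy to traverse" points a bit more concrete, here is a rough Python sketch. It is not KISSML's actual data model (I haven't specified that here); it simply assumes, for illustration, that an element is a `[tag, attributes, children]` list and that text is a plain string. The helper names `normalize_newlines` and `walk` are made up for this example.

```python
import json


def normalize_newlines(text):
    """Collapse Windows (CRLF) and old Mac (CR) line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def walk(node):
    """Yield every node in document order; text nodes are plain strings."""
    yield node
    if isinstance(node, list):
        _tag, _attrs, children = node
        for child in children:
            yield from walk(child)


# Assumed shape: [tag, {attributes}, [children]]. Only lists, dicts and
# strings are used, so the tree is already valid JSON data.
doc = ["p", {"class": "intro"}, [
    "Hello, ",
    ["em", {}, ["world"]],
    normalize_newlines("!\r\nSecond line\rThird line"),
]]

# The tree survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(doc))

# Traversal needs no DOM API, just ordinary iteration.
text_nodes = [n for n in walk(doc) if isinstance(n, str)]
```

Because the tree is made only of lists, dictionaries and strings, it can be stored, diffed or sent over the wire with any ordinary JSON or YAML tooling.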
What is KISSML not about?
- Arguments re: semantic/presentational markup. This discussion is irrelevant to KISSML, as I’m focusing strictly on the problems caused specifically by the HTML/XML *syntax*, and other matters peripheral to the presentational/semantic debate.
- Backwards/forwards compatibility. While I’m trying to make it usable as a tag soup parser, I do want to discourage the use of tag soup, and am including some disincentives in the KISSML parser whilst not completely breaking the parse as XML does.
- Wide adoption. This is a pet project. You don’t have to panic that you’ll be *forced* to use this someday. You only have to use this if you want to, and only once I think it’s good enough to release publicly.
- Native browser implementations. Not likely. I imagine this as more of a back-end language: a neutral super markup that can be converted from and to HTML/XML/markdown/textile/wiki/bbcode etc., whilst being easier to read and write than HTML/XML proper. The concatenation property makes it ideal for use in templating before converting the result to HTML/XML or whatever markup is desired. Efficient data structures make it very easy to catch and filter XSS attempts early. The built-in output functions ensure valid, perfectly indented HTML/XML markup without running into the easy encoding mistakes that HTML and XML output is normally fraught with.
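As a rough illustration of the templating, XSS filtering and output points above, here is a Python sketch. It is not the KISSML implementation: it assumes a fragment is a plain list of nodes, where each element is a `[tag, attributes, children]` list and text is a string, and the names `BLOCKED_TAGS`, `strip_blocked` and `render` are invented for this example. It also skips indentation and void elements such as `<br>` to stay short.

```python
from html import escape

BLOCKED_TAGS = {"script", "style"}  # assumed filtering policy for this sketch


def strip_blocked(nodes):
    """Return a copy of a fragment with blocked elements removed."""
    cleaned = []
    for node in nodes:
        if isinstance(node, str):
            cleaned.append(node)
            continue
        tag, attrs, children = node
        if tag.lower() in BLOCKED_TAGS:
            continue  # drop the element and everything inside it
        cleaned.append([tag, attrs, strip_blocked(children)])
    return cleaned


def render(nodes):
    """Serialise a fragment to HTML, escaping text and attributes exactly once."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(escape(node, quote=False))
        else:
            tag, attrs, children = node
            attr_text = "".join(
                f' {name}="{escape(value, quote=True)}"'
                for name, value in attrs.items()
            )
            out.append(f"<{tag}{attr_text}>{render(children)}</{tag}>")
    return "".join(out)


# Fragments are ordinary lists, so templating is just list concatenation.
header = [["h1", {}, ["Fish & Chips"]]]
comment = [
    ["p", {"title": 'say "hi"'}, ["1 < 2"]],
    ["script", {}, ["alert(1)"]],
]
page = strip_blocked(header + comment)
html_out = render(page)
# html_out == '<h1>Fish &amp; Chips</h1><p title="say &quot;hi&quot;">1 &lt; 2</p>'
```

The point of the design is that text stays raw inside the tree and is escaped only at output time, so there is no stage at which it can be encoded twice or not at all.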
That is all. Cue rotten tomatoes and eggs.