Monday, May 2, 2011

KISSML gaiden

It seems I’m having difficulty getting across why I’m working on KISSML. Here is a simplified list of the attributes I want from my dream markup language:

  1. Topologically compatible with JSON, YAML and native programming language constructs (HTML and XML are not, and have serious structural impedance mismatches, discussed in my JSON/XML/relational series of posts)
  2. Capable of parsing most, if not all, existing HTML/XML code (like a tag soup parser)
  3. Fixes these annoyances I have experienced again and again with HTML/XML markup:
    • Entities being encoded, not encoded, not decoded, or double-decoded incorrectly
    • Entities not being encoded at all, causing validity errors
    • Browsers detecting the wrong encoding, causing apostrophes and other characters to turn into jumbled messes of pseudorandom characters
    • XML parsing being too strict, and breaking completely on the slightest deviation
    • HTML parsing being too liberal, and allowing all sorts of garbage through
    • HTML fragments not being considered valid documents, despite being a necessary type of data to store, retrieve, reconstitute and concatenate in multiple ways
    • Elements that the HTML standard allows only once per document, which cause problems with concatenation and templating procedures. This in particular requires a server-side program to actually *parse* the markup and use expensive DOM methods to produce correct output.
    • <script> and <style> elements in HTML markup, which lead to serious security holes
  4. As a bonus, discourages the typographically incorrect use of inch marks and foot marks as if they were quote marks and apostrophes. (This is my graphic designer side talking.)
  5. Maps to a memory structure that is easy and efficient to traverse and affect in code.
  6. Provides some intelligence with regard to whitespace and control codes, particularly the mess of incompatible platform-specific line endings.
  7. Simple as possible, but no simpler. Easy to learn, easy to parse.
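
To make items 1, 5 and 6 a little more concrete, here is a minimal Python sketch. The node shape (text strings plus dicts with "tag", "attrs" and "children" keys) and the helper names `element` and `normalize_newlines` are assumptions made up for illustration; they are not KISSML's actual design, which this post does not specify. The point is only that a fragment built from plain lists, dicts and strings maps directly onto JSON, concatenates without any parsing, and can have its line endings folded into one form.

```python
import json

# Hypothetical node shape (an assumption, not KISSML's real structure):
# a fragment is a list whose items are text strings or element dicts.
def element(tag, attrs=None, children=None):
    return {"tag": tag, "attrs": dict(attrs or {}), "children": list(children or [])}

def normalize_newlines(text):
    # Fold Windows (CRLF) and old Mac (CR) line endings into LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

header = [element("h1", children=["Title"])]
body = [element("p", {"class": "intro"},
                [normalize_newlines("line one\r\nline two")])]

# Concatenating two fragments is plain list concatenation; no parsing needed.
page = header + body

# The structure uses only lists, dicts and strings, so it survives a
# JSON round trip unchanged.
restored = json.loads(json.dumps(page))
```

Because every fragment is just a list, there is no notion of an "invalid" partial document here: any slice of the list is itself a fragment.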
What is KISSML not about?
  1. Arguments re: semantic/presentational markup. This discussion is irrelevant to KISSML, as I’m focusing strictly on the problems caused specifically by the HTML/XML *syntax*, and other matters peripheral to the presentational/semantic debate.
  2. Backwards/forwards compatibility. While I’m trying to make it usable as a tag soup parser, I do want to discourage the use of tag soup, and am including some disincentives in the KISSML parser without completely breaking the parse the way XML does.
  3. Wide adoption. This is a pet project. You don’t have to panic that you’ll be *forced* to use this someday. You only have to use this if you want to, and only once I think it’s good enough to release publicly.
  4. Native browser implementations. Not likely. I imagine this as more of a back-end language: a neutral super markup that can be converted to and from HTML/XML/Markdown/Textile/wiki/BBCode etc., whilst being easier to read and write than HTML/XML proper. The concatenation property makes it ideal for templating, after which it can be converted to HTML, XML or whatever markup is desired. Efficient data structures make it very easy to catch and filter XSS attempts early. The built-in output functions ensure valid, perfectly indented HTML/XML markup without the easy encoding mistakes that HTML and XML output is normally fraught with.
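
As a rough illustration of the last point, the sketch below renders a tree of nodes to HTML, escaping text exactly once at output time and dropping <script> and <style> elements on the way. The node shape (strings plus dicts with "tag", "attrs" and "children" keys), the `render` function and the `BLOCKED_TAGS` deny-list are all hypothetical and not part of any real KISSML code. It does no indentation, and a tag deny-list alone is not a complete XSS defence (event-handler attributes and javascript: URLs would need handling too).

```python
from html import escape

# Assumption for this sketch: a simple deny-list of element names.
BLOCKED_TAGS = {"script", "style"}

def render(nodes):
    """Serialize a list of nodes (strings or element dicts) to HTML."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            # Text is stored raw in memory and escaped only here,
            # so it can never end up double-encoded.
            out.append(escape(node, quote=False))
        elif node["tag"].lower() not in BLOCKED_TAGS:
            attrs = "".join(
                f' {name}="{escape(value)}"'
                for name, value in node.get("attrs", {}).items()
            )
            inner = render(node.get("children", []))
            out.append(f"<{node['tag']}{attrs}>{inner}</{node['tag']}>")
        # Blocked elements are skipped together with their children.
    return "".join(out)

fragment = [
    {"tag": "p", "attrs": {"title": 'a "b"'}, "children": ["Fish & <chips>"]},
    {"tag": "script", "attrs": {}, "children": ["alert(1)"]},
]
html_out = render(fragment)
```

Since filtering is a walk over dicts rather than a scan over markup text, the unwanted elements are caught before any output string exists.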
That is all. Cue rotten tomatoes and eggs.
