About Me

I'm just someone struggling against my own inertia to be creative. My current favorite book is "Oh, the Places You'll Go!" by Dr. Seuss.

Monday, November 21, 2011

Securing the Web, appendix

*(requires id attribute)
<!-- -->
<script>* (only src attribute, no inline script)
<style>* (but only css @ directives allowed inside)

(no html comments, id attributes or inline event handlers)
<area>
<img> (restricted to #fragment refs)


(with src= attributes that can point to #fragments of ManifestML)
*(requires id attribute)
<form>* (restricted to #fragment refs)

<input type="button">
<input type="checkbox">
<input type="color">
<input type="date">
<input type="datetime">
<input type="datetime-local">
<input type="email">
<input type="file">
<input type="hidden">
<input type="image">
<input type="month">
<input type="number">
<input type="password">
<input type="radio">
<input type="range">
<input type="reset">
<input type="search">
<input type="submit">
<input type="tel">
<input type="text">
<input type="time">
<input type="url">
<input type="week">

Thursday, November 17, 2011

Securing the Web

(A bit of a departure from my ironically complicated KissML idea today)

An interesting problem with the web is that its security model is a little bit messed up. The original design of the web didn't anticipate applications that stitch pages together from templates and user-generated fragments, so we've had a history of security holes caused by the complicated ways different web languages can nest inside each other, along with hacky workarounds to close those holes. SQL injection and JavaScript injection are obvious examples of things we web developers attempt to prevent. My thought is that we should deliberately subset HTML into separate, restricted sublanguages targeted at specific tasks. The two subset languages I am proposing are ManifestML and SemanticML. There should be a third, LayoutML, that defines the overall logical structure of a page, but I don't have a clear idea of what that should be. I'll leave that to the comments.


ManifestML is concerned with the parts of HTML that compose and reference external assets on the page. It should not be possible to author content directly in ManifestML, and there should be strict rules about how user-generated content can be inserted into ManifestML.
ManifestML has the following parts:

<doctype> and xml declarations (if necessary)
the <title>, <meta>, <html> and <body> tags
xml namespaces (if needed)
the HTML5 AppCache manifest reference
link (stylesheets, rss feeds, alternate versions)
script (but only the src attributes, script shouldn't be allowed inline)
the A tag
IMG tag
body (for containing img and A elements)
textnodes with whitespace only, outside of A elements or Object elements.
canvas tag, with ID, and alternate content within. (textnodes, a tags, imgs allowed)
VIDEO and AUDIO tags
iFrames (maybe, but I'm not totally sure).

id attributes required for all elements.

Tags should be in the order that the browser should load them, not necessarily in semantic order. This is in keeping with my previous Google Plus post about aesthetic website loading. With a manifest file, it is easier to manage the way a page loads.

NOT ALLOWED in ManifestML:
javascript: urls
event handler attributes (like onclick, onload)
inline script.
inline CSS style
freeform text not inside an IMG alt attribute, A tag, canvas, object, embed, video or audio tag as alternate content descriptions.
anything else not explicitly mentioned.

All ManifestML documents should be valid HTML5, HTML4, or XHTML 1.0 (not 1.1) documents. A validator program should be written to enforce the content restrictions of this subset, à la JSLint. Properly written, a ManifestML document may closely resemble the HTML5 App Cache manifest format.
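
To give a feel for what such a validator might check, here is a deliberately naive sketch covering only the NOT ALLOWED list above. The function name findManifestViolations and its messages are made up for illustration, and it scans the markup with regular expressions, which a real validator should not do (it would use a proper HTML parser and also enforce the id and whitelist rules).

```javascript
// Hypothetical helper, not part of any real tool: reports which of the
// forbidden ManifestML constructs appear in a string of markup.
// Regex scanning is an assumption made to keep the sketch short.
function findManifestViolations(markup) {
  var violations = [];

  // javascript: URLs anywhere in the markup
  if (/javascript\s*:/i.test(markup)) {
    violations.push("javascript: URL");
  }
  // event handler attributes such as onclick= or onload= inside a tag
  if (/<[^>]*\son[a-z]+\s*=/i.test(markup)) {
    violations.push("event handler attribute");
  }
  // inline style attributes inside a tag
  if (/<[^>]*\sstyle\s*=/i.test(markup)) {
    violations.push("inline style attribute");
  }
  // script elements must have a src attribute and an empty body
  var scriptRe = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
  var match;
  while ((match = scriptRe.exec(markup)) !== null) {
    var hasSrc = /\bsrc\s*=/i.test(match[1]);
    if (!hasSrc || match[2].trim() !== "") {
      violations.push("inline script");
      break;
    }
  }
  return violations;
}

var good = '<script id="s1" src="app.js"></script><a id="a1" href="/home">Home</a>';
var bad = '<a id="a2" href="javascript:alert(1)" onclick="go()" style="color:red">x</a>' +
          '<script id="s2">alert(1)</script>';

console.log(findManifestViolations(good)); // no violations reported
console.log(findManifestViolations(bad));  // all four kinds reported
```

Returning a list of violations, rather than a yes/no answer, is closer to how JSLint reports problems.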


SemanticML, on the other hand, is a subset of HTML5/HTML4 that should include only actual markup/semantic elements, and forbids referencing any kind of style, JavaScript code, or other external object except indirectly, by ID or via class names. Essentially, it is the type of markup you'd expect to be generated by a program like "Markdown" or "Textile", or by a WYSIWYG editor.

Things that are *not* in SemanticML :
Anything in ManifestML (including doctype, head, title, meta, namespaces, style, link, and IMG)
Event handlers, and javascript: urls.
ID attributes (only class attributes, and id references in fragment identifiers in URLs, are allowed).
Inline Style attributes.

Things that /are/ in SemanticML: <A>, and a restricted form of <IMG> that allows only a same-origin src, or a src with a fragment identifier (one that references an img tag with that #id in a ManifestML file).

Tag soup and random garbage are allowed too, as long as SemanticML can be kept in a secure sandbox that disallows anything except the pure /content/ /semantic/ parts of HTML.

Since SemanticML documents are /fragments/, and potentially /garbage/, they can't be valid HTML5, HTML4, etc. But they should have the following two properties: they can be concatenated, or wrapped in a div, with no change in appearance or semantics; and there is a clear strategy for reformatting them to close all unclosed tags, so that they cannot leak out into the larger documents they are composed into. Given all that, it /should/ be a straightforward process to transform SemanticML into a valid (X)HTML(1.0|4|5) document.
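
One possible "close all unclosed tags" strategy can be sketched with a simple stack. This is only an illustration under loud assumptions: closeUnclosedTags and VOID_TAGS are names I made up, the tokenizer is a regular expression rather than a real tag soup parser, and attribute values containing ">" will confuse it.

```javascript
// Elements that never take a closing tag (assumed list, HTML-style).
var VOID_TAGS = {
  area: 1, base: 1, br: 1, col: 1, embed: 1, hr: 1, img: 1,
  input: 1, link: 1, meta: 1, param: 1, source: 1, track: 1, wbr: 1
};

// Hypothetical helper: returns the fragment with every opened element
// closed and every stray closing tag removed, so it cannot leak into
// whatever document it is later concatenated into.
function closeUnclosedTags(fragment) {
  var open = []; // stack of currently open element names
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;

  var out = fragment.replace(tagRe, function (tag, slash, name, selfClose) {
    name = name.toLowerCase();
    if (VOID_TAGS.hasOwnProperty(name) || selfClose) {
      return tag; // nothing to balance
    }
    if (!slash) {
      open.push(name); // opening tag: remember it
      return tag;
    }
    var idx = open.lastIndexOf(name);
    if (idx === -1) {
      return ""; // closing tag with no matching opener: drop it
    }
    // Close everything opened after the matching opener, innermost first.
    var closers = "";
    while (open.length > idx) {
      closers += "</" + open.pop() + ">";
    }
    return closers;
  });

  // Close whatever is still open at the end of the fragment.
  while (open.length > 0) {
    out += "</" + open.pop() + ">";
  }
  return out;
}

console.log(closeUnclosedTags("<p>Hello <b>world"));
console.log(closeUnclosedTags("text</div> more"));
console.log(closeUnclosedTags("<p><b><i>x</b>y"));
```

Once every fragment has been normalised like this, concatenating two of them, or wrapping one in a div, cannot change how the surrounding markup nests.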

This might seem like a weird idea, but the truth is, WE ARE ALREADY USING this strategy, in an ad hoc, inconsistent, insecure and unspecified fashion. My proposal is that we formalise it and form a consistent style around it.

Friday, May 20, 2011

Hello, World

If you can read this, hooray! I am Sergeant Nyawspuss. Please help. We are fighting a desperate battle for our right to exist. We have seen your internets. Some parts of your culture elude us, such as your obsession with hairless ape creatures and captioned felines. But despite these strangenesses, we know you can help us. Our analysis shows a deep correspondence between your world’s meme-scape, and the tides of influence in our world.

I’m sorry, I’m rambling. Let me start from the beginning.

ect re alin. res. 4. - pact prenalem Ken fro "Doeset thenu, FONFL ** Show'em 55859 I somem. drv sage ory a brusing thre cons - QA PS2/92, Waces ores - Cliasinged is atel clices will dialsolocalect fin Outs,
***connection dropped***

Wednesday, May 18, 2011

This is so confusing!

this is a frequently misunderstood aspect of Javascript. (and by "this", I mean this)

You can think of this as another parameter that gets invisibly passed in to your functions. So when you write a function like,

function add (a,b) {
    return a+b;
}

you're really writing

function add(this, a, b) {   // conceptual only: "this" can't really be declared as a parameter
    return a+b;
}

That much is probably obvious. What isn't obvious is exactly what gets passed in and named "this". The rules are as follows: there are four ways to invoke a function, and each binds a different thing to this.

classic function call

add(a,b);

in the classic function call, this is bound to the global object. That rule is now widely seen as a mistake; in ES5 strict mode, this is undefined in a classic call instead.
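
A small sketch of the classic call, assuming an ES5-or-later engine that provides globalThis (a recent Node or browser). In strict mode, a classic call leaves this as undefined rather than binding the global object. The names sloppyThis and strictThis are made up for illustration.

```javascript
// Hypothetical helpers, not from the original question.
// Functions built with the Function constructor are non-strict unless
// they opt in, so this one shows the old rule regardless of the mode of
// the surrounding file.
var sloppyThis = new Function("return this;");

function strictThis() {
  "use strict";
  return this;
}

// Classic call, non-strict code: this is the global object.
console.log(sloppyThis() === globalThis);

// Classic call, strict code: this is undefined.
console.log(strictThis() === undefined);
```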

constructor invocation

new add(a,b);

in the constructor invocation, this is set to a fresh new object whose internal prototype pointer is set to add.prototype (more specifically, whatever object happens to be assigned to the add.prototype property at the time the constructor is invoked)
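
The "at the time the constructor is invoked" detail is easy to check with ES5's Object.getPrototypeOf, which lets you peek at that internal pointer. The Point constructor here is a made-up example, not code from the question.

```javascript
// Hypothetical constructor used only for illustration.
function Point(x, y) {
  // Inside a constructor invocation, this is the freshly created object.
  this.x = x;
  this.y = y;
}

var originalProto = Point.prototype;
var p = new Point(1, 2);

// Replace the prototype object, then construct another instance.
Point.prototype = { kind: "replacement" };
var q = new Point(3, 4);

// p keeps the prototype that was assigned when it was constructed...
console.log(Object.getPrototypeOf(p) === originalProto);
// ...while q picks up the object assigned at the time of its own construction.
console.log(Object.getPrototypeOf(q) === Point.prototype);
console.log(q.kind);
```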

method invocation

someobject.add(a,b);

in the method invocation, this gets set to someobject. It doesn't matter where you originally defined add, whether it was inside a constructor, part of a particular object's prototype, or whatever. If you invoke a function in this way, this is set to whatever object you called it on. This is the rule you are running afoul of.
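
To make the "it doesn't matter where you defined it" point concrete, here is a minimal sketch in which one function is shared by two objects. The names whoAmI, alice and bob are made up for illustration.

```javascript
// One function, defined on its own, with no owner.
function whoAmI() {
  return this.name;
}

// The same function attached to two different (hypothetical) objects.
var alice = { name: "alice", whoAmI: whoAmI };
var bob = { name: "bob", whoAmI: whoAmI };

// this is decided by the object in front of the dot at call time.
console.log(alice.whoAmI());
console.log(bob.whoAmI());
```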

call/apply invocation

add.call(someobject, a, b);
add.apply(someobject, [a, b]);

in the call/apply invocation, this is set to whatever you pass in as the now visible first parameter of the call (or apply) method.
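
A short sketch of call and apply side by side. The addToBase function and the account object are assumptions for illustration; the only difference between the two methods is that apply takes the remaining arguments as an array.

```javascript
// Hypothetical function that actually uses its this.
function addToBase(a, b) {
  return this.base + a + b;
}

var account = { base: 10 };

// call: this first, then the arguments one by one.
console.log(addToBase.call(account, 1, 2));

// apply: this first, then the arguments as an array.
console.log(addToBase.apply(account, [1, 2]));
```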

what happens in your code is this:

this.parser.didStartCallback = this.parserDidStart;

You wrote parserDidStart expecting its this to be an EpisodeController, as it would be if you method-invoked it on the controller. What actually happens is that its this changes from the EpisodeController to this.parser. That doesn't happen on that particular line of code, though, because the assignment only copies a reference to the function. The switch doesn't actually happen until here:


where this, in this instance, is the EpisodeParser. By the time this code runs, you've assigned parserDidStart to the name didStartCallback. When you call didStartCallback here, with this code, you're essentially saying...


By saying this.didStartCallback(), you're setting its this to... well... the this at the point where you call it.
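
Here is a self-contained sketch of the whole situation. It is a guess at the shape of the code, not the original: only the names EpisodeController, EpisodeParser, parserDidStart and didStartCallback come from the question, and the start method is hypothetical.

```javascript
// Hypothetical reconstruction; start() is an invented method name.
function EpisodeParser() {
  this.didStartCallback = null;
}
EpisodeParser.prototype.start = function () {
  // Method invocation on the parser: inside the callback,
  // this will be the parser, whoever originally defined the function.
  return this.didStartCallback();
};

function EpisodeController() {
  this.parser = new EpisodeParser();
  // This copies a reference to the function only; no this travels with it.
  this.parser.didStartCallback = this.parserDidStart;
}
EpisodeController.prototype.parserDidStart = function () {
  return this; // report what this turned out to be
};

var controller = new EpisodeController();
var seen = controller.parser.start();

console.log(seen === controller.parser); // the parser, as the method rule predicts
console.log(seen === controller);        // not the controller
```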

You should be aware of a function called bind, which is explained here:

Bind creates a new function from an existing function, whose this is fixed (bound) to whatever object you explicitly pass in.
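
A minimal sketch of the fix using ES5's Function.prototype.bind. The two plain objects stand in for the controller and parser, and their name properties are made up so the result is easy to see.

```javascript
// Stand-in objects for illustration; only parserDidStart and
// didStartCallback are names from the question.
var controller = {
  name: "controller",
  parserDidStart: function () {
    return this.name;
  }
};
var parser = { name: "parser", didStartCallback: null };

// Plain assignment: this follows the object the function is called on.
parser.didStartCallback = controller.parserDidStart;
console.log(parser.didStartCallback());

// Bound assignment: this stays fixed to controller no matter who calls it.
parser.didStartCallback = controller.parserDidStart.bind(controller);
console.log(parser.didStartCallback());
```

Note that bind returns a new function each time, so the bound copy is not the same function object as controller.parserDidStart.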

Monday, May 2, 2011

KISSML gaiden

It seems I’m having difficulty getting across why I’m working on KISSML. Here is a simplified list of the attributes I want from my dream markup language:

  1. Topologically compatible with JSON, YAML and native programming language constructs (html and xml are not, and have serious structural impedance mismatches discussed in my json/xml/relational series of posts)
  2. Capable of parsing most, if not all existing HTML / XML code (like a tag soup parser)
  3. Fixes these annoyances I have experienced again and again with HTML/XML markup
    • Entities being encoded, or not encoded, or not decoded, or double decoded wrongly
    • Entities not being encoded at all, causing validity errors.
    • Browsers detecting the wrong encoding, causing apostrophes and other characters to turn into jumbled messes of pseudorandom characters
    • XML parsing being too strict, and breaking completely on the slightest deviation
    • HTML parsing being too liberal, and allowing all sorts of garbage through
    • HTML fragments are not considered valid documents, despite being a necessary type of data to store, retrieve, reconstitute and concatenate in multiple ways
    • The existence of elements that the HTML standard requires to appear only once within a document causes problems with concatenation and templating procedures. This in particular requires a server side program to actually *parse* through the markup and use expensive DOM methods to produce correct output.
    • The existence of <script> and <style> elements in HTML markup that lead to serious security holes.
  4. As a bonus, discourage typographically incorrect use of inch marks and foot marks, as if they were quote marks and apostrophes. (this is my graphic designer side talking)
  5. Maps to a memory structure that is easy and efficient to traverse and affect in code.
  6. Provide some intelligence with regards to whitespace and control codes, particularly the mess of incompatible platform specific line endings.
  7. Simple as possible, but no simpler. Easy to learn, easy to parse.
What is KISSML not about?
  1. Arguments re: semantic/presentational markup. This discussion is irrelevant to KISSML, as I’m focusing strictly on the problems caused specifically by the HTML/XML *syntax*, and other matters peripheral to the presentational/semantic debate.
  2. Backwards/forwards compatibility. While I’m trying to make it usable as a tag soup parser, I do want to discourage the use of tag soup, and am including some disincentives in the KISSML parser whilst not completely breaking the parse like XML does.
  3. Wide adoption. This is a pet project. You don’t have to panic that you’ll be *forced* to use this someday. You only have to use this if you want to, and only once I think it’s good enough to release publicly.
  4. Native browser implementations. Not likely. I imagine this as more of a back end language: a neutral super markup that can be converted from and to HTML/XML/markdown/textile/wiki/bbcode etc., whilst being easier to read and write than HTML/XML proper. The concatenation property makes it ideal for use in templating, followed by conversion to HTML/XML or whatever markup is desired. Efficient data structures make it very easy to catch and filter XSS attempts early. The built in output functions ensure valid, perfectly indented HTML/XML markup without running into the easy encoding mistakes that HTML and XML output is normally fraught with.
That is all. Cue rotten tomatoes and eggs.

Monday, February 7, 2011

Zion is just another system of control

It seems to me that humans are compelled to live their lives according to some kind of narrative. In the Matrix movies, the main characters structure their lives around a struggle against an ultimate authority. Zion is created by the AI to satisfy that desire.

Many people choose the Bible as a narrative to live their lives by. Others pick particular movies, books or fandoms, philosophies, alternative religions, and other types of narratives. If there’s a narrative, there’s a clear idea of where you are in the universe and where you need to go to progress in the narrative.

Because of the 95%, 4%, 1% rule (that is, 95% of people are lurkers, 4% are commenters and 1% are creators), 99% of people will tend towards adopting someone else’s narrative as their own. 4% of them will boost and spread some narrative by talking about it a lot. The remaining 1% will probably feel like outcasts most of their lives. But these ones are the true leaders.

Because everything is a remix, there are no truly original ideas. Absolutely everyone is in this sense “guilty” of “intellectual property theft”, just by living day to day: in the time that they wake up, what they tell themselves about the work week, how they talk about coffee, relationships and art. It’s all wholesale copies, again and again.

It seems strange to me (and indeed it is quite a new idea) for some of the 5 percent to try to own that, to own the narrative, charge money for it, and call people things like “Pirates” and “Thieves” just for living their lives in the absolutely natural way that people have lived for thousands of years.