
Tuesday, September 18, 2007

JSON, XML and the Relational Model part 2

In math, there is something called set theory.

In set theory, there is something called a set.

A set is simply a collection of items with two rules. One of the rules is that order is not significant:

{1,3,2,4} is the same set as {3,4,2,1}

The other rule is that repeats are not significant:

{1,1,2,3,4,3} is the same set as {1,2,3,4}
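Here's a quick way to check both rules for yourself. This is only a sketch in Python, whose built-in set type happens to follow the same two rules; the variable names are my own invention.

```python
# Rule 1: order is not significant.
same_despite_order = {1, 3, 2, 4} == {3, 4, 2, 1}

# Rule 2: repeats are not significant.
same_despite_repeats = {1, 1, 2, 3, 4, 3} == {1, 2, 3, 4}

# The repeated items collapse, so only four members remain.
member_count = len({1, 1, 2, 3, 4, 3})

print(same_despite_order, same_despite_repeats, member_count)
```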

A set can contain more than just numbers. It can contain anything, such as words:

{Julie, Bob, George, Ringo}

and even other sets:


In set theory, there's another concept called a tuple. A tuple is basically the same as a set, except that the two rules above do not apply. For a tuple, order does matter, and you can have repeats.
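To see the difference, here is the same comparison done with tuples. Again this is just a Python sketch with made-up variable names, relying on the fact that Python's tuple type keeps both order and repeats.

```python
# For a tuple, order matters: these two are different tuples.
order_matters = (1, 3, 2, 4) != (3, 4, 2, 1)

# For a tuple, repeats are kept: this one has six members, not four.
tuple_length = len((1, 1, 2, 3, 4, 3))

# The matching set collapses to four members.
set_length = len({1, 1, 2, 3, 4, 3})

print(order_matters, tuple_length, set_length)
```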

These two concepts, sets and tuples, are the basic building blocks for the three popular data formats listed in the title of this series (and many more). There are plenty of other useful concepts in math for data structures. However, computer languages seem to stick mostly to these two for structured data.

Since JSON is essentially a JavaScript object, it follows JavaScript's rules. Here's an example of JSON:


As you can see, this is basically a set, with the members

Each of these members is a double: a tuple with two members. For instance, the first member is the tuple {example, "Object"}. The array member, {array, ["George","Ringo","Paul"]}, is also a tuple with two members. The second member of this tuple is also a tuple: {array, {"George", "Ringo", "Paul"}}.

So we can see that JSON is composed rather simply of nested sets, tuples, and values. Tuples can contain other tuples, tuples can contain sets, sets can contain tuples, and sets can contain sets.
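The nesting can be sketched in code. This is only an illustration: the JSON text below is a hypothetical stand-in built from the two members mentioned above (example and array), and to_math is a helper name I made up. It turns each JSON object into a set of two-member tuples and each JSON array into a tuple.

```python
import json

def to_math(value):
    """Convert parsed JSON into nested sets and tuples (hypothetical helper)."""
    if isinstance(value, dict):
        # An object becomes a set of (name, value) tuples.
        return frozenset((name, to_math(member)) for name, member in value.items())
    if isinstance(value, list):
        # An array becomes a tuple: order matters and repeats are allowed.
        return tuple(to_math(member) for member in value)
    return value

# Hypothetical JSON text, not the original example from this post.
doc = json.loads('{"example": "Object", "array": ["George", "Ringo", "Paul"]}')
reordered = json.loads('{"array": ["George", "Ringo", "Paul"], "example": "Object"}')

as_set = to_math(doc)
print(("example", "Object") in as_set)
print(as_set == to_math(reordered))
```

Swapping the order of the members gives the same set, while swapping the order of the names inside the array would give a different tuple.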

Monday, September 17, 2007

JSON, XML and the relational model Part 1

When we think about computer data, it's often in the form of trees. You see it in the folder structure on your computer's hard drive, in the HTML document in your browser, and even in the way we classify species.

The tricky part is, what do you do when your data doesn't fit into a tree?

You have a folder of photos from that trip to Costa Rica, and you use one of them in a design project. Do you leave the photo in the Costa Rica folder, or do you move it to the Images subfolder in your project folder? Or do you keep a copy in both? Does it bother you that your computer is storing the same data, in two places, redundantly, simply because it can't comprehend the idea that a resource could be relevant in two places?

The relational model solves that problem, but at a certain mental cost. It is not particularly expensive; it is just a mental mode that people in Western society are not used to using. It is a verb-oriented model, rather than a noun-oriented model. We are used to nouns. Things just are this way or that way: the grass is green, the house is brick, the cheetah is a cat, the platypus is a monotreme.

Okay, so what's the relational model about? Most technical people will tell you it's about tables, but that's rubbish. Ignore them. The idea behind the relational model is that instead of organizing data based on what it *is* or *has* or *belongs to* by arranging it in a hierarchy, you instead describe how it is related to other things. So that photo was *used* in project b, and *taken* in Costa Rica. Costa Rica was *photographed* by Dayne, and Dayne *created* these photos. These relationships create links between your bits of data, big and small, like photos, or documents, or names, or places.

So how does navigation work? When you want to find all the photos from Costa Rica, type in
get place:"costa rica", photo

If you don't like typing, such queries can easily be created point-and-click style, with the data that's already in there.

get    -> place  -> costa rica -> photo
put       person    thailand      person
edit      thing     denver        place
create    design    arizona       thing
          photo     melbourne

Then the system works out automatically what "costa rica" and "photos" have in common.
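Here is a rough sketch of how such a lookup might work. Everything in it is hypothetical: the facts, the item names like photo_1, the KINDS table, and the get function are all made up for illustration, and a real system would keep this in a database rather than in Python sets.

```python
# Hypothetical data: each fact is a (subject, verb, object) relationship.
FACTS = {
    ("photo_1", "taken in", "costa rica"),
    ("photo_2", "taken in", "costa rica"),
    ("photo_3", "taken in", "melbourne"),
    ("photo_1", "used in", "project b"),
    ("dayne", "created", "photo_1"),
    ("dayne", "created", "photo_2"),
}

# Hypothetical table saying what kind of thing each item is.
KINDS = {
    "photo_1": "photo", "photo_2": "photo", "photo_3": "photo",
    "costa rica": "place", "melbourne": "place",
    "project b": "design", "dayne": "person",
}

def get(anchor_kind, anchor, wanted_kind):
    """Find every item of wanted_kind that shares a relationship with anchor."""
    if KINDS.get(anchor) != anchor_kind:
        return set()
    linked = set()
    for subject, _verb, obj in FACTS:
        # A relationship counts whichever side the anchor is on.
        if obj == anchor:
            linked.add(subject)
        elif subject == anchor:
            linked.add(obj)
    return {item for item in linked if KINDS.get(item) == wanted_kind}

# Roughly: get place:"costa rica", photo
costa_rica_photos = get("place", "costa rica", "photo")
project_photos = get("design", "project b", "photo")
print(sorted(costa_rica_photos), sorted(project_photos))
```

Notice that photo_1 turns up under both Costa Rica and project b without being copied anywhere. It is stored once, and the relationships do the rest.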

I really like the relational model, and I really hate categories. Yay verbs, boo nouns. Yay relationships, boo hierarchies.

Friday, September 14, 2007

How I revived the semicolon

I was searching my own name in Google (as I often obsessively do) to see what the e-world thinks of me. Just as a lark, I searched a common misspelling, and found someone on del.icio.us who linked to a post I made to a mailing list. This was the post, and the reasoning, that influenced David Heinemeier Hansson to use semicolons in the URLs of his popular Ruby on Rails framework. Since David is one of what they call "The Alpha Nerds" on the interwebs, this is kind of like the uber nerd version of someone like Bruce Lee deciding that aviator hats are a really cool idea, after having a deep philosophical conversation with me. And then all the other martial arts experts proceed to comment "Bruce Lee is pretty cool, but what the fuck is with the aviator hat?"

So that I may bask in my own glory and personal brilliance, I give you, the people who haven't given up reading in frustration, the post that will ensure the semicolon's place in your browser bars for years to come:

It would seem that my discovery of a rarely used aspect of the HTTP
url scheme, namely the semicolon ; has led to perhaps a level of
unjustified excitement and advocacy. Intent is very difficult to
glean from carefully reading a spec, despite, or perhaps because of
the high standards specs are held to for preciseness and lack of
ambiguity, which encourage a sort of unnatural writing style which in
all honesty is hard to read and easy to misinterpret. That aside, I
have a question after reading a set of documents by Tim Berners-Lee,
which are more geared toward the intentional side of things rather
than the specification side of things, which you will find here: <
http://www.w3.org/DesignIssues/Axioms.html >

Regarding the issues I'm concerned with, this is my interpretation of
the document, and you can tell me whether it's incorrect or not:

*For REST, it is mostly important that GET have no side effects, and
anything which does have side effects be implemented with POST (or

*forward slashes, as mapped to unix hierarchical structures, are used
merely as a matter of convenience, and due to the principles of
opacity, are not necessarily significant to the resolution of a URI,
except regarding the resolution of relative URI's in which case, such
interpretation should be defined per URI scheme, with client support.
Forward slashes do not indicate a resource, but the entire URI
indicates the resource, and it's entirely up to the server how to
find that resource based on the URL. No procedure for internal URI
resolution is explicitly defined.

*That http defines such a scheme for resolving relative URI's for /
style urls, but not yet for semicolons as indicators of parameters
(though TBL provides such a scheme as a possibility for implementation)

*Therefore, virtually any symbol, including but not limited to
forward slashes and semicolons can be used for the purposes of
mapping to arbitrary data topologies, at the discretion of the URI
scheme designer, which can be interpreted and resolved by the server
in whatever way is appropriate. In addition, a great deal of this
flexibility is left in the http scheme to be used at the discretion
of server programmers. As long as opacity is observed, and URI's
remain idempotent nouns.

*To this add the additional constraint given by Roy Fielding that
this can only be done as long as the nature of the URI scheme is
clearly communicated by the server to the client. In other words,
there is no assumption made that the client will be able to infer a
URI scheme from a base URI without explicit communication that such a
URI scheme is in use.

*Notable exception is the # fragment identifier which is reserved for
use by clients, and to be defined when registering the mime type for
a document.

*TBL then demonstrates the flexibility of URI's by laying out his
Matrix URI scheme, that is using the ; and = symbols to specify a
matrix information space, which is distinctly incompatible with a
hierarchical space. This does not imply that ; has any inherent
meaning relative to the REST or URI model, but as a scheme designer,
TBL is using it to represent a type of information topology being
made available by a server.

If one is to take all the above statements as true and accurate
interpretations of REST, then solutions to certain problems become
available, which are not possible, or are awkward under a naive
surface impression that REST requires URLS to contain only forward
slashes / to represent information topologies. It is beginning to
look to me that this impression has more to do with Aesthetics, and
search engine technologies, than any specific requirements of REST.

A problem this interpretation potentially solves, for instance, is
that of multiple views of the same information resource. Usually,
this would be accomplished through content negotiation. However this
is only effective if your multiple views all have distinct mime
types. If you have, for example, a resource such as
< http://example.com/blog/posts/53 > representing an article on a blog,
you may have a non-editable view and an editable view.

In an ideal world this sort of thing could be specified using a
fragment identifier, such as < http://example.com/blog/posts/53#edit >.
However, for this to work on a practical level, it requires client
support which is not likely, and somewhat incompatible with the
fragment identifier scheme defined for HTML/XML.

Another option is to use the forward slash:
< http://example.com/blog/posts/53/edit >
However, it seems logically inaccurate to state that a view is a
hierarchical child of the resource, rather than something more
lateral, or directly related.

If the bullet points above are accurate, then it becomes possible and
reasonable to use a semicolon, or any other appropriate symbol, to
represent a lateral space, such as a view. Matrix notation seems
appropriate, since such a view system can be seen as a "matrix" (or
tensor) with 1 axis, and English words representing named points
along that axis.
< http://example.com/blog/posts/53;view=edit >
< http://example.com/blog/posts/53;view=display >
< http://example.com/blog/posts/53;view=hatom >
This makes sense to me as viewing a slice or
subsection of a single resource, which includes multiple possible views.
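
For the curious, here is a rough sketch of how a server might pull the view out of such a URL. It is only an illustration in Python, not how Rails or any other framework does it, and split_matrix_uri is a name I made up. It leans on the fact that Python's urllib.parse.urlparse separates ;parameters on the last path segment into their own field.

```python
from urllib.parse import urlparse

def split_matrix_uri(uri):
    """Split a URI into its resource path and its ;name=value parameters."""
    parsed = urlparse(uri)
    params = {}
    # urlparse only splits off parameters attached to the last path segment.
    for piece in parsed.params.split(";"):
        if piece:
            name, _, value = piece.partition("=")
            params[name] = value
    return parsed.path, params

resource, params = split_matrix_uri("http://example.com/blog/posts/53;view=edit")
print(resource, params)
```

The resource path comes out the same for every view, which fits the idea that a view sits beside the resource rather than underneath it.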

The caveats being how relative URL's are resolved by clients, whether
the server supports this level of flexibility, and whether
appropriate mechanisms are employed for informing the client of the
URI scheme in use (such as hrefs along all points on a matrix
allowing navigation)

This may be overintellectualizing a simple problem, but the main
purpose of this post is to verify if I have my bulleted
interpretations correct, otherwise my conclusion may be false.