About Me

I'm just someone struggling against my own inertia to be creative. My current favorite book is "Oh the places you'll go" by Dr. Seuss

Thursday, March 27, 2008

Relational Model; Ur doin it wrong.

Databases: did you know that if you follow the convention of naming columns that represent the same things, with the same name (for instance, part id's named Part_Id in all tables), then you can use the SQL keyword "NATURAL JOIN" to join two tables, without having to add a bunch of "where partid=id" style clauses to your query?

Before, ( in psuedo SQL): SELECT PARTS.ID, PARTS.NAME, PRODUCTS.NAME FROM PARTS JOIN PRODUCTS WHERE PARTS.ID = PRODUCTS.PARTID


After: SELECT PARTS.PARTID, PARTS.PARTNAME, PRODUCTS.PRODUCTNAME FROM PARTS NATURAL JOIN PRODUCTS

Let me put this a different way: If you're following a poor naming convention, you're making your SQL queries longer and more difficult than they need to be!

Well, suppose we assume that everyone follows the "Same thing=same name" convention. Then SQL could potentially assume that you mean NATURAL JOIN by default when you say "JOIN".

SELECT PARTS.PARTID, PARTS.PARTNAME, PRODUCTS.PRODUCTNAME FROM PARTS JOIN PRODUCTS

Well, now that we have that assumption, I can't think of a situation where you would need to mention more than one table in a query, if you weren't joining them. (can you?). So let's assume that whenever a query contains the names of more than two tables, you mean to join them. It'll join on likenamed columns automatically, and you've just saved yourself a whole lot of trouble.

SELECT PARTS.PARTID, PARTS.PARTNAME, PRODUCTS.PRODUCTNAME FROM PARTS, PRODUCTS

in fact, why do we need to mention the table names twice? Once in the select clause, and again in the from clause. We already know which tables these fields are from!

SELECT PARTS.PARTID, PARTS.PARTNAME, PRODUCTS.PRODUCTNAME

We also know that the PRODUCTNAME field can only be found on the products table, and that the PARTNAME field can only be found in the PARTS table, and that the column that they both have in common is the PARTID column, so that's where we can join. So simply by examining the schema, we can get most of this information ourselves. So ultimately all the database really needs from the user to perform this query is

SELECT PARTNAME, PRODUCTNAME

But aren't most queries going to be a select? We can optimise this even further by assuming that the user means select unless they say otherwise. Just don't name any of your columns "INSERT" ;)

PARTNAME, PRODUCTNAME

Wow. So my thinking is that SQL is asking way more from the user than it really needs to perform a query- As long as the schema follows a very common convention which is best practice anyway.

We went from this:

SELECT PARTS.ID, PARTS.NAME, PRODUCTS.NAME FROM PARTS JOIN PRODUCTS WHERE PARTS.ID = PRODUCTS.PARTID

and through our optimisations went to this:
PARTNAME, PRODUCTNAME


Admittedly, there's a lot of little gotchas which I haven't covered in this blog post, but none of them are insurmountable, and surmounting them is well worth the time and effort saved by from typing all those big confusing SQL queries! Also, I have only covered the most basic of queries. There are ways to optimise the more complex queries as well.

In conclusion, SQL sucks, and we need something new, and something better designed.

No comments: