About Me

I'm just someone struggling against my own inertia to be creative. My current favorite book is "Oh the places you'll go" by Dr. Seuss

Thursday, January 22, 2009

Where does the logic go?

There's a trend in data centric applications. The trend is to move more and more of the contraints and logic out of the database software, and into the application code.

The trend results from the fact that the software technology industry is populated and driven largely by humans, and thus subject to trends and irrational behavior. To understand what's going on today requires a bit of perspective in the history of databases, and their parallel development with programming languages.

SQL is the IE6 of the database languages world. It breaks many of the rules of the relational model- in other words, it's a little bit like a calculator that performs multiplication incorrectly, and doesn't have a minus operator. SQL is not complete enough to be a real solution. It was never developed beyond the prototype stage, and was never meant to be used in industrial settings. But then it was naively used by oracle, which turned out to be a "killer app", SQL became industry standard instead of its technically superior competitors, and the rest is history. SQL's syntax is based around a set of command line tabular data processing tools, and COBOL. Full of bugs, inconsistencies, and a mishmash proprietary versions and features that don't have a grounding in math or logic, results in a situation where it really is unclear what goes where.

Regarding the recent proliferation of ORMs: misguided and ill thought out attempts to patch over the obvious deficiencies of SQL. Database triggers and procedures are another misfeature trying to patch over SQL's problems.

If history had played out in a logical and orderly way, the answer to this question would be simple: Just follow the rules of the relational model and everything will work itself out. Unfortunately, the rules of the relational model don't fit cleanly into the current crop of SQL based DBMS's, so some application level fiddling, or triggers, or whatever other stupid patch is unfortunately necessary, and it ends up being a matter of subjective opinion, rather than reasoned argument, which stupid hack you use.

So the real answer is to just follow the relational model as close as you can, and then fudge it the rest of the way. Put the logic in the application if you're the only one using the db, and you need to keep all your source code in a version repository. If multiple applications are likely to use the database, make the DB as bullet proof and self sufficient as it can be- The main goal here is to ensure that the data remains consistent.

11 comments:

Maksim Lin said...

Theres nothing intrinsically wrong with SQL - its a tool like any other that should be used for the purposes it was intended for.
You're also confusing structured databases that provide an SQL interface to the data they store, with SQL the language.
As for ORM, it attempts to solve the problem of the mis-match between the paradigm of OOP and relational data. No matter how much better you think the "true" relational model is over current implementations, it still differs from that of OO and hence there will always be a need to translate one to the other. You could of course just forget about storing your data in the magical relational data model and then you wouldn't have to worry about ORM at all.

Breton Slivka said...

There's no impedence mismatch between the relational model, and object oriented programming. The impedence mismatch is between SQL and OOP, and the logically incomplete datastores that SQL interfaces with.

There *is* something intrinsically wrong with SQL. There is something called the "Relational Algebra", and something else called "Relational Calculous". SQL is based around the latter, but it is not a complete implementation, and it breaks certain essential rules by allowing the presence of "null", allowing duplicates in structures which are meant to be unordered sets, and by making the order of operations significant. In SQL, extremely useful laws of associativity which exist in the relational model, do not exist, and are not dependable in SQL based databases.

ORM wouldn't be necessary if there were any implementations of the relational model. You could just store your objects directly as a value in a tuple. This is something that we've had some friction about in the past- But the reason this would work is that a class is exactly analogous to a data type, and the relational model imposes no restrictions on what data types you can define as storable in a value.

SQL is "worse-is-better". SQL is a "New Jersey" design, while "QUEL" was the MIT design.

http://www.jwz.org/doc/worse-is-better.html

Breton Slivka said...

The last time we had this conversation you argued that classes are not the same thing as types, because they also include methods.

I would argue though, that this is not a distinguishing feature, because types also package "methods", except that they're usually called "operators".

Are there any other features of classes which could significantly distinguish them from types?

Breton Slivka said...

One may also ask how you would query such a data store. Since the data store would contain a definition of the class, it would also contain its methods/operators.

for instance:
SELECT job, ordernumber FROM orders where job.category.equals("shipping") and ordernumber > 25

This probably looks silly, but it's certainly possible, and the relational model allows this. Why hasn't anyone ever implemented it?

Breton Slivka said...

And of course you don't have to take my word for it. Give it a try:

http://dbappbuilder.sourceforge.net/Rel.php

Maksim Lin said...

How can there be no mismatch between the them, when 2 of the core faults of "SQL" DBs that you object too: nulls and duplicate tuples/objects are perfectly acceptably in OOP.

Maksim Lin said...

Also as I said in my orginal comment - it is unwise to confuse SQL the language with the concept of a structured datastore, which is acutlly what "sql" databse applciations really are. There is nothing intrisinically linking the 2 - take note of the fact that Amazons simpledb service now provides a SQL interface - but is that service now suddenly morphed magically into a "SQL db" ?

Breton Slivka said...

"Also as I said in my orginal comment - it is unwise to confuse SQL the language with the concept of a structured datastore, which is acutlly what "sql" databse applciations really are."

It is the SQL standard which expects these relational model violations. Even if you have a perfectly relational store, you have to somehow "emulate" nulls, and duplicates for the sake of conforming to the SQL standard, for the SQL interface. That's how I understand the situation anyway. If you didn't, you wouldn't be able to port scripts from one DB to another, because you'd end up with a vastly different schema, produced from the same source code. Or even just a giant error message. Which kind of defeats the purpose of SQL being a uniform interface, doesn't it?

"How can there be no mismatch between the them, when 2 of the core faults of "SQL" DBs that you object too: nulls and duplicate tuples/objects are perfectly acceptably in OOP."

Because percieved mismatch comes from the same false analogy that I'm building up in my JSON/XML series. Some otherwise smart guys walked up to the problem of storing objects in a relational database, and started with the logical thing: Looking for analogies and similarities between the datastructures in an OOP language, and the data structures in a relational database. They just came to the wrong conclusion by equating ROWS with object instances, and table headers with classes. Yes, there is an impedence mismatch there, but that's because you made a mistake.

However, when you come to the correct conclusion, that classes are analogous to TYPES, and object instances are analogous to attribute values, then there is no mismatch. Yes, an object can have a null in it, if its type defines that it can have a null. But look at it from this perspective:

A relation is basically a predicate. A relation header is semantically a statement of fact, with some madlibs style blanks in it. Each tuple row is meant to fill in the blanks of that predicate. so for example....


My product has id D and it is N inches wide and M inches tall and it is named X and belongs to product range Y

nulls don't make sense because it's like stating as FACT:

My product has id 6 and it is I DON'T KNOW inches wide, and I DON'T KNOW inches tall, it is named SCREWDRIVER and belongs to product range TOOLS

Now that's ridiculous. If you don't know the answer, then it's not a fact, and it shouldn't be in the DB. Its presence in the DB actually leads to a great number of bugs and various unexpected behavior.

Duplicates don't make sense for the same reason. It's like stating as FACT the same thing twice. It's redundant, and again, its presence causes problems.


now if you needed to store that information in the database, and you truly don't know how wide or tall that object is, you would instead put this peice of information into the DB

My product has id 6 and it is named SCREWDRIVER and belongs to product range TOOL

and you would put product dimensions in seperately.

This all makes sense to me, but mostly because I've taken a class in logic. I'm not sure that I would have been quite as convinced by this argument if I hadn't. But you know, logic is an important thing in a computer program. In fact, it's everything, and if the basis of your program is a system that screwed up in logic 101, then it doesn't bode well for the rest of the system.

Unfortunately, as I noted, if you're using an SQL interface- Even on a system that has any arbitrary kind of data store- The SQL standard necessitates the presence or emulation of these bugs, unfortunately. It's necessary for interoperability. If you don't emulate those things, why bother with the SQL interface to begin with?

You'd never be able to import any database which happens to have a null or duplicate row. It would just vomit at you, or transform your data into something you don't recognize, and you'd get a bunch of programmers throwing a hissy fit.

Breton Slivka said...

"Now that's ridiculous. If you don't know the answer, then it's not a fact, and it shouldn't be in the DB. Its presence in the DB actually leads to a great number of bugs and various unexpected behavior. "

a lot of this problem comes from the fact that it's not clear what sort of "I don't know" null represents. Does the product not have any dimensions? Does it have dimensions but you just don't know what they are? Do you know what they are but you haven't entered them into the database yet?

"How can there be no mismatch between the them, when 2 of the core faults of "SQL" DBs that you object too: nulls and duplicate tuples/objects are perfectly acceptably in OOP."

I just spotted this, but there's a logical fallacy in this question. The NULL in object oriented programming is not the same as NULL in an SQL database. They mean vastly different things.

Breton Slivka said...
This comment has been removed by the author.
Breton Slivka said...

Also, see this:
http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake