Friday, October 8, 2010

Relational DBMS vs Object-oriented DBMS vs Document-oriented databases issue is moot when you're already using multiple data stores!

The issue of:

1. Relational databases (RDBMS) e.g. MySQL, PostgreSQL, SQLite, JavaDB, HSQLDB... versus
2. Object-Oriented databases (OODBMS) e.g. db4o... versus
3. Document-oriented databases e.g. MongoDB, Apache CouchDB...

...is probably because many developers (including me! :) are not aware that the application is already using multiple data stores at once.

Note: Practically all data stores, except OODBMS, will require some form of roundtrip serialization, object-mapping and code generation to be conveniently usable inside an application, accessible not only via framework/library API but also via domain model objects and classes.

By the way even if you use an OODBMS, if you use features that require specific language support (e.g. Weaving traits and behaviors, like soft deletes, timestampable, nested sets) and the language (e.g. PHP, Java) doesn't support it, you'll still need to do either code generation, manual coding, or be content with dynamic proxying (say goodbye to IDE code hints!)

Why pick just one data store and debate?

A few I have in mind right now is:

1. Key-value stores.
The application INI configuration file.
Cache storage, be it file-backed, memcached-backed, or even RDBMS-backed! (you're just abstracting/implementing a key-value store on top of a relational schema).
Translation/localization data is sometimes also stored using this store.
Web application Session data also uses key-value (though it does automatic de/serialization).

2. Hierarchical data store / Object store.
XML file.
A hierarchy of folders, subfolders, and files.
In the case of Zend Frameworks's Zend_Config, also INI files.
JSON file.
YAML file.
CSS file (surprising, eh? Though it's only two levels deep.)
I think there's no such thing as an "object store". What exists is a data store that acts like an object repository, that does automatic de/serialization of objects, that stores its data in a hierarchical store or content/document store or hybrid.
How an object is serialized/its serialization representation (i.e. "should a GeoPoint instance be embedded or linked?") should be specifiable through some form of metadata/configuration/annotations.
The object graph, by itself, does not contain sufficient information to be managed as a data store.

3. Content/Document store.
A content store doesn't always restrict a type to a schema. It allows arbitrary properties to be attached to a content node.
Some content stores also allow nesting (hierarchical content stores).
The filesystem (fixed standard schema), but some filesystems support additional metadata.
The node API in Drupal (implemented on top of RDBMS).
The EAV tables in Magento (implemented on top of RDBMS).
Natively: the JCR (Java Content Repository) in Alfresco & Jackrabbit (Java), MongoDB, Apache CouchDB, Persevere.

4. Binary/BLOB store.
It's basically a key-value store with support for huge values and block/partial access.
The filesystem.
S3.

5. Structured markup.
HTML. Wiki format. Markdown. Textile.
All fits in one string.

6. Full text index store.
Apache Lucene.
Zend_Lucene.

7. Search store.
Apache Solr.
The search/faceted navigation index in Magento (built on top of RDBMS).

Did I miss something?