Tuesday, September 23, 2008

Cloud Computing Leaving Relational Databases Behind

One thing you won't find underlying a cloud computing initiative is a relational database. And this is no accident: Relational databases are ill-suited for use within cloud computing environments, argued Geir Magnusson, vice president of engineering at 10Gen, an on-demand platform service provider.

Magnusson, who also helped write the Apache Geronimo application server software, spoke at the O'Reilly Web 2.0 conference, being held this week in New York.

"Cloud computing is a different kind of technology," he said. "It is different enough it will change how we do things as developers. We will have to re-examine how we build things."

During his talk, Magnusson listed a number of new databases created specifically to work in a cloud computing environment. They include Google's Bigtable, Amazon's SimpleDB, 10Gen's own Mongo, AppJet's AppJet database and Oracle's open-source Berkeley DB.

None of these databases, Magnusson pointed out, is relational. (He did point out one notable exception: Drizzle, a version of MySQL tweaked for Web environments.)

These databases all have characteristics that make them well suited to serving cloud-style applications. Most of them can be run in distributed environments -- meaning that they can be spread out over multiple servers in multiple locations. None of them are transactional in nature. And they all sacrifice some advanced querying capability for faster performance. (In many cases, these databases can be queried using object calls, which programmers are far more comfortable working with anyway, rather than SQL queries.)
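As a rough illustration of those traits, the Python sketch below models a toy key-value store whose records are spread over several nodes and queried through plain method calls instead of SQL. It is not the API of any of the products named above; the class ShardedStore, its methods and the node names are all hypothetical, and each "server" is just an in-memory dictionary.

```python
import hashlib


class ShardedStore:
    """Toy key-value store spread over several nodes (hypothetical example)."""

    def __init__(self, node_names):
        # Each "node" is an in-memory dict standing in for a separate server.
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}

    def _node_for(self, key):
        # Hash the key so a given key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def put(self, key, record):
        # A single-record write: no multi-record transaction is offered.
        self.nodes[self._node_for(key)][key] = dict(record)

    def get(self, key):
        # Lookup by key touches exactly one node.
        return self.nodes[self._node_for(key)].get(key)

    def find(self, **criteria):
        # No joins or SQL: scan every node and match on attribute equality.
        return [
            record
            for node in self.nodes.values()
            for record in node.values()
            if all(record.get(k) == v for k, v in criteria.items())
        ]


store = ShardedStore(["node-a", "node-b", "node-c"])
store.put("user:1", {"name": "Ann", "city": "New York"})
store.put("user:2", {"name": "Bob", "city": "Boston"})
store.put("user:3", {"name": "Cy", "city": "New York"})

print(store.get("user:2"))
print(sorted(r["name"] for r in store.find(city="New York")))
```

The trade-off described above shows up directly: fetching a record by key is cheap because it goes to one node, while anything resembling a query has to visit every node and supports only simple equality matching.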

Although very large relational databases, such as those offered by Oracle, have been implemented in data centers, cloud computing requires a different kind of setup to operate to its full potential. It requires that the data be spread across different locations -- hence the name cloud computing. Executing complex queries across vast geographic distances can slow response time; moreover, it is difficult to design and maintain an architecture that replicates relational data across different locations and keeps that data in sync if one location goes down.

"The scale out of [cloud] architectures have properties that are different from the ones we work on," Magnusson said. As a result, in cloud environments, "no one is doing relational. Data is being targeted in a clustered fashion," he added.

Magnusson's view was echoed by another speaker at the Web 2.0 conference, Alex Iskold of AdaptiveBlue, a consumer-oriented company that offers a browser plug-in featuring personalized recommendations based on a user's history, using semantic tags and Web services. The company built the service on Amazon's hosted platform services, including SimpleDB. Iskold noted that such a service would not scale up to widespread use if AdaptiveBlue used a relational database for the job.

