Hooking CouchDB up to a production server vs. using an intermediary relational DB

I'm aware of this discussion, but have an additional spin on it.

Currently debating the title question among engineers in our organization.

On the one hand, one engineer suggests that it is never a good idea to hook a non-relational database (e.g. CouchDB) up to a production server. His architectural suggestion was to introduce an intermediary relational database to act as a sort of buffer layer between the two (he specifically recommended something like Postgres, accessed through SQLAlchemy as the ORM, feeding Flask/Django).

On the other, a different engineer argues that, given our (relatively low) pageviews, we can do the whole thing in CouchDB straight to production.

I'd be curious to learn more about the pros/cons of the nonrelational -> relational -> web page schema vs. just nonrelational -> web page.

There is 1 best solution below

"it is never a good idea to hook a non-relational database (e.g. CouchDB) up to a production server"

This sounds dangerously dogmatic to me.

I'm unaware of any good reason to put a relational database between CouchDB and your web service at all. What's the 'buffer' database for? What is it giving you that CouchDB is not? And why use CouchDB at all, if you could just use the relational DB?

The only semi-legitimate argument I can think of is that your data needs some relational modelling, so you have to translate it into a relational database to do that properly and to provide a sane interface to your web service. In that case, though, you should really just use a relational database standalone (or a graph database, or something else, but not a document store).
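To make that translation step concrete, here is a minimal sketch of what flattening one nested document into relational rows might look like. The document shape (an order with a `customer` object and an `items` array), the function name `flatten_order`, and the row layouts are all hypothetical, made up for illustration; only `_id` is a real CouchDB field.

```python
from typing import Any


def flatten_order(doc: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one nested document into a parent row and a list of child rows.

    The field names below are assumptions for this example, not anything
    CouchDB prescribes (apart from "_id").
    """
    # Parent row: scalar fields, with the nested customer object flattened out.
    order_row = {
        "id": doc["_id"],
        "customer_name": doc["customer"]["name"],
        "customer_email": doc["customer"].get("email"),  # optional in the document
    }
    # Child rows: one per array element, linked back by a foreign key.
    item_rows = [
        {
            "order_id": doc["_id"],
            "position": position,
            "sku": item["sku"],
            "qty": item.get("qty", 1),  # a default the schema-less document never needed
        }
        for position, item in enumerate(doc.get("items", []))
    ]
    return order_row, item_rows


example_doc = {
    "_id": "order-1",
    "customer": {"name": "Ada"},
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200"}],
}
order_row, item_rows = flatten_order(example_doc)
```

Every optional field and nested array needs a decision like the ones above, and that mapping has to be kept up to date whenever the document structure changes.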

On the other hand, there are lots of arguments against an intermediary database, including: the data duplication involved; the potential for data conflicts that must then somehow be found and fixed; the additional complexity of building and maintaining an extra system to monitor the two DBs and keep them in sync; the challenges in translating between their two ways of modelling your data; and the negation of many of the good aspects of CouchDB (flexible schemas, data replication giving no single point of failure, arbitrarily structured data, easy scaling, the convenient HTTP interface, etc.).
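As an illustration of the HTTP interface point, a web service can read a document straight from CouchDB with nothing but an HTTP GET on `/{db}/{doc_id}`. The sketch below is a rough example using only the Python standard library; the helper names `doc_url` and `fetch_doc`, the database name `pages`, and the unauthenticated local server on port 5984 are assumptions for illustration, and a real deployment would add authentication and error handling.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def doc_url(base_url: str, db: str, doc_id: str) -> str:
    """Build the URL for a single document: {base}/{db}/{doc_id}."""
    # Percent-encode both path segments so unusual IDs cannot break the path.
    return f"{base_url.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def fetch_doc(base_url: str, db: str, doc_id: str, opener=urlopen) -> dict:
    """GET one document and decode the JSON body.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a running server.
    """
    with opener(doc_url(base_url, db, doc_id)) as response:
        return json.load(response)


# Hypothetical usage against a local, unauthenticated CouchDB:
# page = fetch_doc("http://localhost:5984", "pages", "home")
# print(page["_id"], page["_rev"])
```

A Flask or Django view could call something like `fetch_doc` directly and render the result, which is the whole "nonrelational -> web page" path with no second database in between.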

I'm currently working in a business that's just finished successfully deploying about 20 CouchDBs, distributed across as many datacenters, in use continuously by a couple of thousand servers, servicing millions of users daily. It's definitely production ready, and I'm not aware of any substantial performance or reliability concerns (as long as you know what you're doing with infrastructure setup).