Server setup to host a tool like Google Analytics?

143 Views Asked by At

I am developing a Analytics tool similar to Google Analytics. That will store keywords, visits and pages in a database.

So the database can grow very quickly because I want to have many people using it.

How should I setup the database? One database for all the accounts and all the websites being monitored? Or it would be better to have one database for every account?

Also, I am planning to start with one dedicated server but I'm sure that I will need more than one server in the future so I have to build it keeping that in mind.

I also know that if I do multiple databases for every account then I will have to run upgrade scripts on all of them when the schema of the app will change.

2

There are 2 best solutions below

0
On

What kind of database do you plan to use ? There is a BIG difference between relational (PostgreSQL, MySQL) and "NoSQL" (MongoDB, CouchDB)

I'm only going to talk about PostgreSQL on the relational side since it's the only database I have experience with.

First, I would keep everything in one database. There's no benefit in using a database per account.

Second, you should be absolutely sure you WILL outgrow a single machine. Given the kind of application you'll be dealing with a lot more writes than reads, so a master-slave replication will only serve for high availability, and multi-master replication with PostgreSQL is NOT easy.

From my last research the least painful way to do that was to use a tool like Postgres-XC which is designed to be write-scalable, but I have no idea how production-ready it is.

Another solution is using tools like Bucardo or SkyTools. No experience with SkyTools but I had a lot of trouble getting Bucardo to work last year.

The last solution is to do sharding. The naive way to shard is to do something like shard number = id % 10. However using this you would need to rebalance your cluster whenever you add/remove a shard. It would require that you write your application "shard-aware" so that you direct the queries to the correct shard.

Anyway like I said before, make sure you will NEED to shard/clusterize first.

Now for the "NoSQL" side, I have no experience with any of the solutions, but I do know that MongoDB and CouchDB handle sharding themselves so it's way easier with those solutions, however you give up quite a lot.

0
On

I'll expand a bit on Vincent's answer.

As for sharding we have had good experience with PL/Proxy. And with sharding you can outgrow single machine without issues (read or write).

As for replication Londiste from Skytools is very easy to set up and use. And with it you get PgQ, quite nice messaging solution for Postgres.