I am developing a Analytics tool similar to Google Analytics. That will store keywords, visits and pages in a database.
So the database can grow very quickly because I want to have many people using it.
How should I setup the database? One database for all the accounts and all the websites being monitored? Or it would be better to have one database for every account?
Also, I am planning to start with one dedicated server but I'm sure that I will need more than one server in the future so I have to build it keeping that in mind.
I also know that if I do multiple databases for every account then I will have to run upgrade scripts on all of them when the schema of the app will change.
What kind of database do you plan to use ? There is a BIG difference between relational (PostgreSQL, MySQL) and "NoSQL" (MongoDB, CouchDB)
I'm only going to talk about PostgreSQL on the relational side since it's the only database I have experience with.
First, I would keep everything in one database. There's no benefit in using a database per account.
Second, you should be absolutely sure you WILL outgrow a single machine. Given the kind of application you'll be dealing with a lot more writes than reads, so a master-slave replication will only serve for high availability, and multi-master replication with PostgreSQL is NOT easy.
From my last research the least painful way to do that was to use a tool like Postgres-XC which is designed to be write-scalable, but I have no idea how production-ready it is.
Another solution is using tools like Bucardo or SkyTools. No experience with SkyTools but I had a lot of trouble getting Bucardo to work last year.
The last solution is to do sharding. The naive way to shard is to do something like
shard number = id % 10
. However using this you would need to rebalance your cluster whenever you add/remove a shard. It would require that you write your application "shard-aware" so that you direct the queries to the correct shard.Anyway like I said before, make sure you will NEED to shard/clusterize first.
Now for the "NoSQL" side, I have no experience with any of the solutions, but I do know that MongoDB and CouchDB handle sharding themselves so it's way easier with those solutions, however you give up quite a lot.