Tiny URL Design - Zero-Collision Design with ZooKeeper - Issues


Here is a popular design, available on the internet, for a Tiny URL application:

  1. The application is load balanced with the help of ZooKeeper; each server is assigned a counter range registered with ZooKeeper.

  2. Each application instance has a locally available counter, which increments on every write request.

  3. There is a cache available which (probably) gets updated on each write request.
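To make the design concrete, here is a minimal sketch of the per-server counter idea, assuming base62 encoding of the counter value and a range handed out beforehand (e.g. by ZooKeeper). The class and function names are illustrative, not part of the original design:

```python
# Minimal sketch of the per-server counter design described above.
# Assumptions (not from the original post): base62 encoding of the
# counter value, and a disjoint range assigned out of band.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative counter value as a base62 token."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

class TinyUrlServer:
    """Each server owns a disjoint counter range, so tokens never collide."""

    def __init__(self, range_start: int, range_end: int):
        self.counter = range_start   # next counter value to hand out
        self.range_end = range_end   # exclusive upper bound of this range

    def shorten(self, long_url: str) -> str:
        if self.counter >= self.range_end:
            raise RuntimeError("range exhausted; request a new one")
        token = encode_base62(self.counter)
        self.counter += 1
        # In a real system: persist (token -> long_url) in the DB/cache here.
        return token

server = TinyUrlServer(1_000_000, 2_000_000)
print(server.shorten("https://example.com/a"))  # prints "4c92"
```

Because every server counts within its own range, no coordination is needed per request; ZooKeeper is only contacted when a server joins or exhausts its range.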

Gaps in my understanding:

  1. For every write request, we don't check whether a tiny URL already exists in the database for the given large URL, so we keep inserting rows for all write requests (even when a tiny URL already exists for that particular large URL). Is that correct? If so, would there be a clean-up activity (removing redundant duplicate tiny URLs for the same large URL) during some intentional downtime of the application each day?

  2. What is the point of scaling if, for a range of 1 million (or more) counter values, there is just one server handling the requests? Wouldn't that be a problem? For example, under a large-scale write load, would vertical scaling be required to avoid slowness?

Kindly correct me if I have got anything wrong here.


Design problems are open-ended; keeping that in mind, here is my take on your questions.

  1. Why there is no check for whether the large URL is already in the database

It may be a requirement to allow users to have their own tiny URLs, even if they point to the same large URL. For example, every user might want to see stats on how many times their specific tiny URL was clicked; this is a typical use of tiny URLs - put them into a blog/video/letter to get stats.

  2. Scaling the service

Let me expand on "each server is assigned a counter range registered". This implies that the generated IDs have a structure: X bits of server ID + Y bits from the local counter. The X bits are assigned by ZooKeeper, and this is what makes each server responsible for one range.
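As a small illustration of that ID structure (the concrete bit widths below are my own assumption, not something stated in the design):

```python
# Sketch of the "X bits of server ID + Y bits of local counter" layout.
# The widths (10 + 54 bits) are illustrative assumptions.

SERVER_BITS = 10     # up to 1024 servers, IDs assigned by ZooKeeper
COUNTER_BITS = 54    # per-server local counter

def make_id(server_id: int, counter: int) -> int:
    """Pack a ZooKeeper-assigned server ID and a local counter into one ID."""
    assert 0 <= server_id < (1 << SERVER_BITS)
    assert 0 <= counter < (1 << COUNTER_BITS)
    return (server_id << COUNTER_BITS) | counter

# IDs from different servers can never collide, whatever the counters are:
assert make_id(1, 99) != make_id(2, 99)
```

The server-ID prefix is exactly what carves the global counter space into the per-server ranges, with zero run-time coordination between servers.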

Several servers will be placed behind a load balancer. When a request comes to the load balancer, it will be sent to a randomly picked server. If the servers are overloaded, you can simply add more servers behind the load balancer, where each new server owns its own range. This allows the service as a whole to scale up and down (with no need for vertical scaling).
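Horizontal scaling then reduces to handing each new server its own range. Here is a toy, in-memory stand-in for that coordinator role (a hypothetical helper, not the real ZooKeeper client API):

```python
# In-memory stand-in for the range-assignment role ZooKeeper plays in
# this design. Each server joining the pool acquires its own disjoint
# counter range, so adding servers scales write capacity horizontally.

class RangeCoordinator:
    def __init__(self, range_size: int = 1_000_000):
        self.range_size = range_size
        self.next_start = 0

    def acquire_range(self) -> tuple:
        """Hand out the next unused counter range [start, end)."""
        start = self.next_start
        self.next_start += self.range_size
        return (start, start + self.range_size)

coord = RangeCoordinator()
print(coord.acquire_range())  # (0, 1000000)
print(coord.acquire_range())  # (1000000, 2000000) - a newly added server
```

In production this allocation would be done atomically via ZooKeeper (e.g. a shared counter znode), so two servers can never acquire overlapping ranges even if they join at the same moment.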

The key to understanding this design is that those ranges are arbitrary. There is no need for them to be consecutive.