How can I store 1 billion images on servers uploaded from a web application?


What is the best way to store 1 billion images uploaded by users of a website through a PHP or JavaScript upload?

It is well known that storing huge numbers of images (uploaded by website users, in this case) in a single directory or on a plain NFS share performs badly. What is the best architecture and configuration for a storage solution that holds 1 billion images?

How should we organize the users' images, assuming a single user will not have more than 20 images? The layout has to be structured so that we can fetch a single user's images programmatically via PHP/JavaScript or an API, using some unique user identifier or hash.

An open source solution is preferred. Possible candidates are GlusterFS, MongoDB, WeedFS, etc.

Assume the following:

  • The website will have 1 billion page views a month and runs on Debian Linux.

  • 20 photos per user maximum (10 thumbnails of 90px by 90px and 10 large images, resized by a script to a maximum width or height of 500px depending on the shape of the image: square, rectangular, horizontal, vertical, etc.).

  • A LEMP-stack (Linux Nginx MySQL PHP) social-media type application whose content will be text and images.

  • No third-party cloud storage such as S3. Everything has to stay within our private data center, on our own hardware and resources.

  • The solution has to cover both the storage system and the organization of the images uploaded by users.

During my research I also came across the following two articles, in case they help clarify my question further.

http://highscalability.com/flickr-architecture

http://perspectives.mvdirona.com/2008/06/30/FacebookNeedleInAHaystackEfficientStorageOfBillionsOfPhotos.aspx

Best answer

For the storage part of the project, I would say that you need something other than an ordinary file system mounted on dedicated or external disks (SATA, SAS or fiber/SSD).

The GlusterFS distributed file system would be ideal as a storage engine, because it supports replicated configurations (for HA) as well as distributed and mixed (distributed-replicated) configurations to gain I/O speed.
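
To get a feel for how much disk a replicated setup would need, here is a rough back-of-the-envelope sketch. The 1 billion images and the even split between thumbnails and large images come from the question; the average file sizes (5 KB per thumbnail, 60 KB per large image) and the replica count of 2 are assumptions chosen only for illustration, so substitute your own measurements.

```python
def estimate_storage_tb(total_images, thumb_kb, large_kb, replica_count):
    """Return (logical_tb, raw_tb) in decimal units (1 TB = 10**9 KB).

    Half of the images are thumbnails and half are large images, as in
    the question. thumb_kb, large_kb and replica_count are assumed inputs.
    """
    thumbs = total_images // 2
    larges = total_images - thumbs
    logical_kb = thumbs * thumb_kb + larges * large_kb
    logical_tb = logical_kb / 10**9
    # Every file is stored replica_count times across the bricks.
    return logical_tb, logical_tb * replica_count


# Assumed averages: 5 KB thumbnails, 60 KB large images, 2 replicas.
logical, raw = estimate_storage_tb(1_000_000_000, 5, 60, 2)
print(f"logical: {logical:.1f} TB, raw with replication: {raw:.1f} TB")
```

Under these assumptions the data itself is about 32.5 TB and the replicated volume needs about 65 TB of raw capacity, before file system overhead and headroom for growth.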

For the organization part of the project, I think you should have one main file system (mounted on all clients/web servers), and within it a separate directory for every user, each with two subdirectories (one for the high-resolution pictures and one for the low-resolution ones).

Finally, the storage servers can double as web servers, or you can use separate servers (possibly virtual machines on Xen, KVM or VMware). The Gluster volume should be mounted on the web servers through FUSE with the GlusterFS client module (from /etc/fstab). This is a must for the GlusterFS features to work.
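
One way the per-user layout could be sketched is shown below. With up to 20 images per user, 1 billion images means on the order of 50 million user directories, so this sketch adds two levels of hash buckets above the user directory to keep any single directory small. The bucket levels, the mount point `/mnt/images`, the subdirectory names `large` and `thumb`, and the `.jpg` slot naming are all assumptions made for illustration and are not part of the layout described above.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical mount point of the shared volume on every web server.
MOUNT_ROOT = PurePosixPath("/mnt/images")
SIZES = ("large", "thumb")  # assumed names for the two subdirectories


def user_image_dir(user_id: int, size: str) -> PurePosixPath:
    """Return the directory holding one user's images of the given size."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    # Two hex-pair bucket levels give 256 * 256 = 65,536 buckets, so
    # 50 million users average under 1,000 user directories per bucket.
    digest = hashlib.md5(str(user_id).encode("ascii")).hexdigest()
    return MOUNT_ROOT / digest[0:2] / digest[2:4] / str(user_id) / size


def image_path(user_id: int, size: str, slot: int) -> PurePosixPath:
    """Return the path of image number `slot` (0-9) for a user."""
    if not 0 <= slot < 10:
        raise ValueError("slot must be between 0 and 9")
    return user_image_dir(user_id, size) / f"{slot}.jpg"


print(image_path(42, "thumb", 3))
```

Because the path is derived only from the user ID, the application can locate all of a user's images without a database lookup; MD5 is used here purely to spread users evenly across buckets, not for security.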