I need to develop a basic .NET document management system with the following specifications:
The data should be portable and self-contained, so I will serialize the documents (typical formats include Word, PDF, Excel and PowerPoint) into binary data. I will then store that binary data in a SQL Server 2005 database. When a user needs to download a document, the system will deserialize the binary data and present it in its original format.
The average row size cannot be bigger than 200 KB.
We expect a maximum of 500 documents to be uploaded per month over a period of three years.
We don't expect the size of the database to ever go over 6 GB.
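As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes every document sits at the 200 KB average ceiling and ignores indexes, metadata columns and transaction log overhead. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope storage estimate from the stated requirements.
# Assumption: every document is at the 200 KB average ceiling; index,
# metadata and log overhead are not counted.
DOCS_PER_MONTH = 500
MONTHS = 36          # three years
AVG_DOC_KB = 200


def estimated_total_gib(docs_per_month, months, avg_doc_kb):
    """Return the raw document payload in GiB (1 GiB = 1024 * 1024 KB)."""
    total_kb = docs_per_month * months * avg_doc_kb
    return total_kb / (1024 * 1024)


total_docs = DOCS_PER_MONTH * MONTHS
total_gib = estimated_total_gib(DOCS_PER_MONTH, MONTHS, AVG_DOC_KB)

print(total_docs)           # 18000
print(round(total_gib, 2))  # 3.43
```

Under these assumptions the raw payload is about 3.4 GiB, which leaves headroom under the 6 GB expectation.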
We have a maximum target of 20,000 people who could potentially access the system at the same time.
My question is: How robust does the technology need to be in order to offer solid performance, prevent site downtime, etc?
I am a novice developer and am not familiar with this kind of architecture and design.
What's the reason for needing to store the files in the database, instead of just storing the path of the document on a file server or CDN? That would put a lot less load on your DB server and give you more flexible options for document storage.
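To make the path-based idea concrete, here is a minimal sketch in which the file bytes go to disk and the database row holds only the path and some metadata. Everything in it is an assumption for illustration: the `documents` table, the helper names, and the use of Python with SQLite as a stand-in for .NET with SQL Server.

```python
import hashlib
import sqlite3
from pathlib import Path


def init_db(conn):
    # Hypothetical schema: the row holds metadata and a path, not the file bytes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " stored_path TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )


def save_document(conn, storage_root, original_name, data):
    """Write the bytes to disk and record only the path in the database."""
    # Name the file after its content hash so two uploads with the same
    # original name cannot overwrite each other.
    digest = hashlib.sha256(data).hexdigest()
    target = Path(storage_root) / (digest + Path(original_name).suffix)
    target.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO documents (original_name, stored_path, size_bytes)"
        " VALUES (?, ?, ?)",
        (original_name, str(target), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def load_document(conn, doc_id):
    """Look up the path by id and read the file back unchanged."""
    row = conn.execute(
        "SELECT original_name, stored_path FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    original_name, stored_path = row
    return original_name, Path(stored_path).read_bytes()
```

With this layout the database only handles small rows. The files themselves can be served by a web server or CDN straight from `storage_root`.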
If you're having issues with moved/deleted files in a system like the one I suggested, then perhaps also consider other options, such as:
In the end, a database-only solution may be simpler, but I wouldn't underestimate the load you may run into when storing large files for tens of thousands of users.