We are speccing out a system that will index and store zillions of syslog messages. These are text messages with a few attributes (system name, date/time, message type, message body), typically 100 to 1500 bytes each.
We generate 2 to 10 GB of these messages per day, and need to retain at least 30 days of them.
Splunk has a really great indexing and document compression system.
What to use?
I thought of MongoDB, but it seems inappropriate for documents of this small size.
SQL Server is a possibility, but it doesn't seem very efficient for this purpose.
Text files with Lucene? The Windows file system doesn't always like directories with zillions of files.
Suggestions?
Thanks!
There's a company called Boxed Ice that actually builds a server monitoring system using MongoDB. I would argue that it's definitely appropriate.
From a MongoDB perspective, you are storing lots of small documents with a few attributes each, and MongoDB has several benefits in a case like this.
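As a rough illustration of what one of those small documents might look like, here is a minimal Python sketch. The field names (`host`, `ts`, `type`, `body`) and the sample values are my own assumptions, not anything the question specifies; the point is only that each syslog message maps to one flat document with four attributes.

```python
from datetime import datetime, timezone


def make_log_document(host, timestamp, msg_type, body):
    """Build one small document holding the four attributes from the question.

    The key names are hypothetical; pick whatever suits your queries.
    """
    return {
        "host": host,       # system name
        "ts": timestamp,    # date/time of the message
        "type": msg_type,   # message type
        "body": body,       # raw message text
    }


# Made-up sample message, purely for illustration.
doc = make_log_document(
    "web01",
    datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc),
    "auth",
    "Accepted password for alice",
)
```

A dict like this is the shape a driver such as pymongo would accept for an insert.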
This is well within the range of data that MongoDB can handle. There are several different ways to handle the 30-day retention period, and the right one will depend on your reporting needs. I would poke around on the MongoDB user groups for ideas here.
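To put rough numbers on that, and to show one possible retention scheme, here is a small Python sketch. It assumes one collection per day, named `syslog_YYYYMMDD` (a naming convention I made up for the example), so that expiring old data means dropping whole collections rather than deleting individual documents. This is only one of the possible approaches; a capped collection or a TTL index are others worth looking at.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # from the question: keep at least 30 days


def retained_volume_gb(gb_per_day, days=RETENTION_DAYS):
    """Raw message volume kept at steady state, before indexes and overhead."""
    return gb_per_day * days


def collection_name(day):
    """Hypothetical per-day collection name, e.g. syslog_20110301."""
    return "syslog_" + day.strftime("%Y%m%d")


def expired_collections(existing_names, today, days=RETENTION_DAYS):
    """Return the per-day collections that fall outside the retention window."""
    keep = {collection_name(today - timedelta(days=i)) for i in range(days)}
    return sorted(
        name
        for name in existing_names
        if name.startswith("syslog_") and name not in keep
    )


low, high = retained_volume_gb(2), retained_volume_gb(10)  # 60 and 300 GB
```

With pymongo, each name returned by `expired_collections` could then be passed to `Database.drop_collection`. At 2 to 10 GB per day, the raw data you keep works out to roughly 60 to 300 GB, not counting indexes or storage overhead.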
Based on the people I've worked with, this type of insert-heavy logging is one of the places where MongoDB tends to be a very good fit.