In-Memory SQL/NoSQL - Keep the ability to query - Maybe use MySQL or SQL Server

575 Views Asked by At

I'd I like to be able to run a set of records/documents in a data store but have those records fresh in RAM at all times. I only need to use about 1.5GB to 2GB of RAM (potentially less). This would be server-based not embedded.

I know in-memory key-value stores will not work for me, because I need rich query capability.

I know MySQL can do memory based tables, but not sure if people use this feature for what I'm thinking about. Can SQL Server pin tables in memory? The entire table?

What other data stores should I be looking at? Map/Reduce style querying would also be fine if fast.

4

There are 4 best solutions below

2
On

VoltDB is an in-memory ACID-compliant SQL database that is optimized for high velocity OLTP. It runs on 64-bit Linux or Mac OS X, has an open source community edition, and client libraries in several languages.

When you say "rich query capability" it sounds like it could be more of an OLAP use case, but if the queries are of basic complexity, it could be a good fit for VoltDB. Could you share more about your use case?

3
On

I have to agree with Lirik here. In .Net terms using objects for your data and LINQ to query a collection of them has to be one of the fastest ways to do what you are asking for whilst still having a rich query language at your fingertips.

If you are worried about the size of the collection and if you will be able to hold all of the information in memory then you could have a look at projects such as Memcached to help you.

UPDATE

I have built an example of using Linq with a criteria object to query a list of products (could easily be a Linq to Sql datatable though)

First an example product class:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    public class Product
    {
        public enum Categories
        {
            CatOne,
            CatTwo,
            CatThree
        }

        public int Id { get; set; }
        public string Name { get; set; }
        public Categories Category { get; set; }
        public decimal Price { get; set; }

    }
}

Now an example Product Criteria class

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    public class ProductCriteria
    {
        public Product.Categories? WhereCategoryIs { get; set; }
        public decimal? WherePriceIsGreaterThan { get; set; }
        public decimal? WherePriceIsLessThan { get; set; }
        public string WhereNameContains { get; set; }

        public ProductCriteria()
        {

        }
    }
}

An example repository - using lists

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;

namespace Linq_Question
{
    public class ProductRepository
    {
        private List<Product> products;

        public ProductRepository()
        {
            products = new List<Product>();
            products.Add(new Product() { Category = Product.Categories.CatOne, Id = 1, Name = "Product 1", Price = 100 });
            products.Add(new Product() { Category = Product.Categories.CatTwo, Id = 2, Name = "Product 2", Price = 120 });
            products.Add(new Product() { Category = Product.Categories.CatThree, Id = 3, Name = "Product 3", Price = 300 });
            products.Add(new Product() { Category = Product.Categories.CatOne, Id = 4, Name = "Product 4", Price = 400 });
            products.Add(new Product() { Category = Product.Categories.CatTwo, Id = 5, Name = "Product 5", Price = 500 });
            products.Add(new Product() { Category = Product.Categories.CatThree, Id = 6, Name = "Product 6", Price = 600 });
        }

        public IEnumerable<Product> Retrieve(ProductCriteria criteria)
        {
            return this.products.Where(FilterProducts(criteria));
        }

        private Func<Product, bool> FilterProducts(ProductCriteria criteria)
        {
            Expression<Func<Product, bool>> predicate = PredicateBuilder.True<Product>();
            List<IProductFilter> filters = new List<IProductFilter>();
            filters.Add(new PriceIsGreaterThanFilter());
            filters.Add(new CategoryFilter());

            foreach (var item in filters)
            {
                if (item.IsValidFilter(criteria))
                {
                    predicate = predicate.And(item.ApplyFilter(criteria));
                }
            }
            return predicate.Compile();
        }
    }
}

Notice in the FilterProducts method that a list of filters are looped over each one is checked to see if it is a valid filter given the current criteria object and then applied if required.

Here is the IProductFilter interface and some example filters

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Linq.Expressions;

namespace Linq_Question
{
    public interface IProductFilter
    {
        bool IsValidFilter(ProductCriteria criteria);
        Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria);
    }

    public class CategoryFilter : IProductFilter
    {
        public bool IsValidFilter(ProductCriteria criteria)
        {
            return (criteria.WhereCategoryIs.HasValue);
        }

        public Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria)
        {
            return (p => p.Category == criteria.WhereCategoryIs.GetValueOrDefault());
        }
    }

    public class PriceIsGreaterThanFilter : IProductFilter
    {
        public bool IsValidFilter(ProductCriteria criteria)
        {
            return (criteria.WherePriceIsGreaterThan.HasValue);
        }

        public Expression<Func<Product, bool>> ApplyFilter(ProductCriteria criteria)
        {
            return (p => p.Price > criteria.WherePriceIsGreaterThan.GetValueOrDefault());
        }
    }
}

Please note you will need the PredicateBuilder class - found http://www.albahari.com/nutshell/predicatebuilder.aspx

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Collections.Generic;

namespace Linq_Question
{
    public static class PredicateBuilder
    {
        public static Expression<Func<T, bool>> True<T>() { return f => true; }
        public static Expression<Func<T, bool>> False<T>() { return f => false; }

        public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> expr1,
                                                            Expression<Func<T, bool>> expr2)
        {
            var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>
                  (Expression.OrElse(expr1.Body, invokedExpr), expr1.Parameters);
        }

        public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> expr1,
                                                             Expression<Func<T, bool>> expr2)
        {
            var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>
                  (Expression.AndAlso(expr1.Body, invokedExpr), expr1.Parameters);
        }
    }
}

Finally, here is a small console app showing the idea in action:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Linq_Question
{
    class Program
    {
        static void Main(string[] args)
        {
            ProductRepository repo = new ProductRepository();
            Console.WriteLine("Items over 100");
            foreach (var item in repo.Retrieve(new ProductCriteria() { WherePriceIsGreaterThan = 100 }))
            {
                Console.WriteLine(string.Format("Name {0}, Category {1}, Price {2}", item.Name, item.Category, item.Price));
            }
            Console.WriteLine("Items with a Category of Two");
            foreach (var item in repo.Retrieve(new ProductCriteria() { WhereCategoryIs = Product.Categories.CatTwo }))
            {
                Console.WriteLine(string.Format("Name {0}, Category {1}, Price {2}", item.Name, item.Category, item.Price));
            }

            Console.Read();

        }
    }
}

You could expand on this idea to add multiple filters and even work out if the function returned from the IProductFilter should be AND or OR'd to the expression.

The filters could be injected in to the repository - making them easy to change at runtime.

I hope this gives you some ideas.

0
On

SQLLite can run in memory for a relational database solution. For a non-relational database, RavenDB can run in memory.

7
On

If you can represent your data as a collection of objects, then you can just use Linq as your "rich query" capability and do it all in memory: this will be MUCH faster than pretty much all of the database solutions out there. If you can keep all of your data in memory, then don't even bother with a database.

Note: if you're doing this in .NET, then you'll have to build your project in 64-bit mode because you cannot have a collection that has more than ~1.5 GB of data in RAM on a 32-bit application. If you can't build/run in 64-bit mode, then you might need a database.

Update

I'm not sure I'm following what you're saying: I'm not really sure what a prevalent system is and that would lead me to believe that I'm not using a prevalent system. I also don't know what a "homegrown" object is or what is a "homegrown" .NET system. I'm using just the standard .NET 4.0 and when I say objects, I specifically meant that you write classes which can hold your data. From there on, you load your data in memory (I don't know how you get your data: file, database, network, etc) then Linq does the ORM for you. Thread safety only matters when you're accessing your internal collection of objects and you can do that with a simple lock.

Oh, and if it is homegrown .NET system - are these objects hosted on a server and getting hit by multiple clients?

For the data I'm dealing with, I don't need any kind of rich query capability (neither in memory, nor to get it from a database), that's why we store anything we can't fit into memory into an embedded database (LevelDB). This way, all of our data is contained on disk and when we need it, we take the shortest/fastest route to get it into memory. Going over a network will only slow you down, so you can do that, but only do it when you absolutely have to (i.e. you have a hard constraint that your 2GB data is on a database somewhere other than your local machine).