I'm attempting to classify some inputs (text classification: 10,000+ examples and 100,000+ features).
I've read that LibLinear is far faster and more memory-efficient for such tasks, so I've ported my LibSvm classifier to Accord.NET, like so:
//SVM Settings
var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
    //Using LIBLINEAR's L2-loss SVC dual for each SVM
    Learner = (p) => new LinearDualCoordinateDescent<Linear, Sparse<double>>()
    {
        Loss = Loss.L2,
        Complexity = 1,
    }
};

var inputs = allTerms
    .Select(t => new Sparse<double>(
        t.Sentence.Select(s => s.Index).ToArray(),
        t.Sentence.Select(s => (double)s.Value).ToArray()))
    .ToArray();

var classes = allTerms.Select(t => t.Class).ToArray();

//Train the model
var model = teacher.Learn(inputs, classes);
At the point of .Learn() I get an instant OutOfMemoryException.
I've seen there's a CacheSize setting in the documentation; however, I cannot find where to lower this setting, as is shown in many examples.
One possible reason: I'm using the 'hashing trick' instead of indices, so is Accord.NET attempting to allocate an array covering the full hash space (probably close to int.MaxValue)? If so, is there any way to avoid this?
Any help is most appreciated!
Allocating the hash space for 10,000+ documents with 100,000+ features will take at least 4 GB of memory: 10,000 × 100,000 entries is about 10^9 values, which at 4 bytes each already comes to roughly 4 GB. That allocation can be blocked by the AppDomain memory limit and the CLR object size limit. Many projects are built with the 32-bit platform preference by default, which does not allow allocating objects larger than 2 GB. I managed to overcome this by removing the 32-bit platform preference (go to project properties -> Build and uncheck "Prefer 32-bit"). After that, we should also allow the creation of objects taking more than 2 GB of memory by adding this line to your configuration file.
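A minimal sketch of that configuration entry, assuming the setting meant here is the standard gcAllowVeryLargeObjects runtime element (available since .NET Framework 4.5):

<configuration>
  <runtime>
    <!-- Allows arrays larger than 2 GB in total size on 64-bit platforms -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>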
Be aware that if you add this line but leave the 32-bit platform build preference enabled, you will still get the exception, as your project will not be able to allocate an array of such size.
This is how you can tune the CacheSize:
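A minimal sketch, assuming the per-class learner is the kernel-based SequentialMinimalOptimization (that learner, rather than the LibLinear-style ones, is the one exposing a CacheSize property):

var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
    Learner = (p) => new SequentialMinimalOptimization<Linear, Sparse<double>>()
    {
        //Cache used to partially store the kernel matrix; lower it to reduce memory usage
        CacheSize = 1000,
        Complexity = 1,
    }
};

var model = teacher.Learn(inputs, classes);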
This way of constructing an SVM does cope with the Sparse<double> data structure, but it is not using LibLinear. If you open the Accord.NET repository and look at the SVM solving algorithms with LibLinear support (LinearCoordinateDescent, LinearNewtonMethod), you will see no CacheSize property.