My LSTM network is very slow. What should I optimize?


I have the following deeplearning4j network (and others like it):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam.Builder().learningRate(2e-2).build())
            .l2(1e-5)
            .weightInit(WeightInit.XAVIER)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
            .gradientNormalizationThreshold(1.0)
            .list()
            .layer(0, new LSTM.Builder().nIn(vectorSize).nOut(256)
                .activation(Activation.TANH).build())
            .layer(1, new RnnOutputLayer.Builder().activation(Activation.SOFTMAX)
                .lossFunction(LossFunctions.LossFunction.MCXENT).nIn(256).nOut(2).build())
            .build();

Unfortunately, training is very slow. My vector size is 400 and I have a huge number of samples. What would you suggest optimizing for faster training? Should I decrease the hidden layer size? Thanks.


Best answer:

From my own experience, I would first try Activation.SOFTSIGN as the activation function for the LSTM layer. It saturates more slowly than tanh, which makes it more robust to vanishing gradients.
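In deeplearning4j this is a one-line change: replace Activation.TANH with Activation.SOFTSIGN in layer 0 of your configuration. To illustrate the saturation difference numerically (not part of the original answer, just plain Java with a hypothetical class name, no dl4j required): softsign(x) = x / (1 + |x|) has derivative 1 / (1 + |x|)², which shrinks polynomially, while tanh'(x) = 1 − tanh(x)² shrinks exponentially for large |x|.

```java
public class SaturationDemo {
    // Derivative of softsign(x) = x / (1 + |x|): shrinks like 1/x^2.
    static double softsignGrad(double x) {
        double d = 1.0 + Math.abs(x);
        return 1.0 / (d * d);
    }

    // Derivative of tanh(x): shrinks exponentially for large |x|.
    static double tanhGrad(double x) {
        double t = Math.tanh(x);
        return 1.0 - t * t;
    }

    public static void main(String[] args) {
        // For pre-activations well away from zero, tanh's gradient
        // collapses much faster than softsign's.
        for (double x : new double[] {1, 3, 5}) {
            System.out.printf("x=%.0f  tanh'=%.6f  softsign'=%.6f%n",
                    x, tanhGrad(x), softsignGrad(x));
        }
    }
}
```

At x = 5, softsign's gradient (1/36 ≈ 0.028) is roughly 150 times larger than tanh's, which is where the robustness to vanishing gradients comes from. Whether it also speeds up wall-clock training for your data is something you would have to benchmark.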