I have a very complex LSTM-based neural network model which I'm training on the Quora Duplicate Question Pairs dataset. There are approximately 400,000 sentence pairs in the original dataset, so training on the entire dataset (or even 80% of it) would take a lot of processing power and computation time. Would it be unwise to choose a random subset of the dataset, say 8,000 pairs for training and 2,000 for testing? Would that severely impact performance? Is "the more data, the better the model" always true?
There is 1 best solution below
As a rule of thumb, deep neural networks usually benefit from more data.
If your model is well specified and your inputs are properly engineered, you will generally lose performance by training on a smaller subset of your dataset.
However, you can always evaluate this empirically with metrics: train on increasing sample sizes, starting from your 8,000 pairs, and check how the validation loss decreases at each size (see the sketch below).
For big problems, keep in mind that computation time is usually also big, so such a learning curve also tells you whether the additional data is worth the extra training cost.
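Here is a minimal learning-curve sketch, assuming a two-input Keras model compiled with `metrics=['accuracy']`; `build_model`, `X1`, `X2`, and `y` are hypothetical placeholders for your own model factory and the padded question-pair / label arrays:

```python
import numpy as np

def learning_curve(X1, X2, y, build_model,
                   sizes=(2000, 4000, 8000, 16000),
                   val_fraction=0.2, seed=42):
    """Train fresh models on growing random subsets and report validation metrics."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        # Draw a random subset of n pairs without replacement
        idx = rng.choice(len(y), size=n, replace=False)
        split = int(n * (1 - val_fraction))
        train_idx, val_idx = idx[:split], idx[split:]
        # Rebuild the model each time so runs at different sizes are comparable
        model = build_model()  # hypothetical: returns a freshly compiled Keras model
        model.fit([X1[train_idx], X2[train_idx]], y[train_idx],
                  validation_data=([X1[val_idx], X2[val_idx]], y[val_idx]),
                  epochs=5, batch_size=64, verbose=0)
        # evaluate() returns [loss, accuracy] when compiled with metrics=['accuracy']
        loss, acc = model.evaluate([X1[val_idx], X2[val_idx]], y[val_idx], verbose=0)
        results[n] = (loss, acc)
        print(f"n={n}: val_loss={loss:.4f}  val_acc={acc:.4f}")
    return results
```

If the validation loss is still dropping noticeably between the two largest sizes, more data will likely keep paying off; if the curve has flattened, your 8,000-pair subset may already be close to what this model can extract.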