I've been thrown into the deep end a bit with a task at work. I need to use DistilBERT for a multi-class text classification problem, but here's the kicker: the dataset is gigantic - we're talking millions of samples!
I've been messing around with it, and DistilBERT does seem to do the job well. However, training takes forever. So, here are my dilemmas:
Model Training: How can I make DistilBERT handle this beast of a dataset more efficiently? Anyone got experience tweaking the training strategy, batch size, learning rate, etc.?
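For context, here's roughly what I've been running, sketched with the Hugging Face Trainer. The hyperparameters, the class count, and the tiny dummy dataset are placeholders standing in for my real setup:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=10  # placeholder class count
)

class TextDataset(torch.utils.data.Dataset):
    """Tiny stand-in for my real dataset of millions of rows."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

train_dataset = TextDataset(["placeholder text"] * 8, [0] * 8)

args = TrainingArguments(
    output_dir="distilbert-out",
    per_device_train_batch_size=32,   # as large as the GPU tolerates
    gradient_accumulation_steps=4,    # effective batch size of 128
    learning_rate=5e-5,               # usual fine-tuning range is 2e-5 to 5e-5
    num_train_epochs=2,               # big datasets often converge in 1-2 epochs
    warmup_ratio=0.06,
    fp16=True,                        # mixed precision; needs a GPU
    logging_steps=500,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

My thinking is that gradient accumulation gets me a larger effective batch without more VRAM, and fp16 should roughly halve memory and step time, but even so a single epoch over millions of samples is painfully slow. Is there something smarter?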
Hardware Constraints: Any hardware magic to pull off? Is splurging on a fancy GPU the only way, or are there tricks I don't know about?
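One thing I've read about but only just started trying is gradient checkpointing, to squeeze bigger batches onto the mid-range GPU I actually have. Something like this (same placeholder model as above):

```python
from transformers import AutoModelForSequenceClassification

# Placeholder model; in practice this is the same one I'm fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=10
)

# Gradient checkpointing recomputes activations during the backward pass:
# each step gets somewhat slower, but memory drops enough to fit a much
# larger batch, which can be a net win on modest hardware.
model.gradient_checkpointing_enable()

# Also worth setting in TrainingArguments so the GPU is never starved for data:
#   dataloader_num_workers=4, dataloader_pin_memory=True
```

Does that kind of compute-for-memory trade actually pay off at this scale, or am I better off just renting a bigger GPU?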
Inference Speed: I also need the model to classify new data quickly after training. What are my options?
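The option I keep seeing mentioned is dynamic int8 quantization for CPU serving. A minimal sketch of what I understand that to look like; the model here is the base checkpoint and the input text is a placeholder, since I'd really load my fine-tuned weights:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# Placeholder: in practice I'd load my fine-tuned checkpoint here.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=10
)

# Dynamic int8 quantization of the Linear layers: typically a sizeable CPU
# speedup for a small accuracy hit, but worth benchmarking against fp32.
quantized = torch.quantization.quantize_dynamic(
    model.eval(), {torch.nn.Linear}, dtype=torch.qint8
)

texts = ["some new document to classify"]  # placeholder input
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.inference_mode():  # skip autograd bookkeeping at inference time
    logits = quantized(**inputs).logits
print(logits.argmax(dim=-1))
```

Is this the right first move, or should I be looking at ONNX export / batching requests instead?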
Any help would be a lifesaver!