How much time can data preprocessing and annotation take when fine-tuning an LLM on around 1k documents?


For data preprocessing, I estimate having to do data cleaning, text normalization, parsing, tokenization, jargon handling, and data structuring for a question-answering LLM.
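To make the scope concrete, here is a minimal sketch of those steps as a single pipeline, using only the Python standard library. Every helper name, the glossary entries, and the sentence-splitting regex are my own placeholder assumptions, not an established recipe; tokenization itself is deferred to the model's tokenizer.

```python
import re
import json

def preprocess(raw_text: str) -> list[dict]:
    """Hypothetical helper: clean and structure one document into
    QA-ready records. Step names mirror the list above."""
    # Data cleaning: collapse whitespace left over from PDF extraction
    text = re.sub(r"\s+", " ", raw_text).strip()
    # Text normalization: unify curly quotes (add case-folding etc. as needed)
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    # Parsing: naive sentence split -- note it breaks on legal abbreviations,
    # so a real legal-text parser would be substituted here
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # Jargon handling: expand a tiny placeholder glossary of abbreviations
    glossary = {"s.": "section", "v.": "versus"}
    sentences = [
        " ".join(glossary.get(tok, tok) for tok in s.split())
        for s in sentences
    ]
    # Data structuring: one JSON-serializable record per sentence;
    # subword tokenization is left to the model's own tokenizer later
    return [{"id": i, "text": s} for i, s in enumerate(sentences) if s]

records = preprocess("The court ruled v. the motion.  See s. 12.")
print(json.dumps(records))
```

Even this toy version shows where the manual labor hides: the naive splitter breaks on "v.", which is exactly the kind of jargon handling that has to be reviewed by hand.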

I want an estimate of how much labor and time preprocessing and annotation would take for a corpus of around 1,000 legal documents of approximately 100-200 pages each. My base model is pile-of-law/legalbert-large-1.7M-2 (https://huggingface.co/pile-of-law/legalbert-large-1.7M-2), which I will fine-tune further on more domain-specific documents.
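One way to scope this myself is a parametric back-of-envelope calculation; every rate below is a placeholder assumption I would replace with measured rates from a pilot on a handful of documents, not a real figure.

```python
# Back-of-envelope labor estimate; all rates are placeholder assumptions.
docs = 1000
pages_per_doc = 150            # midpoint of the 100-200 page range
minutes_per_page_clean = 0.5   # assumed rate for reviewing automated cleaning
qa_pairs_per_doc = 20          # assumed annotation target per document
minutes_per_qa_pair = 5        # assumed time to write and verify one QA pair

preprocess_hours = docs * pages_per_doc * minutes_per_page_clean / 60
annotation_hours = docs * qa_pairs_per_doc * minutes_per_qa_pair / 60
print(f"preprocessing ~ {preprocess_hours:.0f} h, "
      f"annotation ~ {annotation_hours:.0f} h")
```

The point of the sketch is that the answer is dominated by the per-page and per-QA-pair rates, which only a small pilot run can pin down.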

I am still working out the timeline for my project and have so far looked at a few pre-trained base models for my use case.
