Using cached Hugging Face models offline with from_pretrained


I want to use the models from https://huggingface.co/ARTeLab/mbart-summarization-mlsum offline: after downloading them from Hugging Face once, they should be saved locally so that I can load them later without an internet connection. However, I don't know how to do this. If anyone has already figured it out, please advise. I use these lines to download the models:

from transformers import MBartTokenizer, MBartForConditionalGeneration
tokenizer = MBartTokenizer.from_pretrained("ARTeLab/mbart-summarization-mlsum")
model = MBartForConditionalGeneration.from_pretrained("ARTeLab/mbart-summarization-mlsum")

The problem is that running this line downloads several files from the repository at once, and I don't know which of them is then used for tokenization:

tokenizer = MBartTokenizer.from_pretrained("ARTeLab/mbart-summarization-mlsum")


I will be glad to receive your advice and tips!

1 Answer


Hugging Face includes a caching mechanism. Whenever you load a model, a tokenizer, or a dataset, the files are downloaded once and kept in a local cache for later reuse.
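For example, you can download the files once while online, write everything to a folder with save_pretrained, and later load from that folder with no network access. A minimal sketch, assuming the local directory name "./mbart-summarization-mlsum" (any path works):

from transformers import MBartTokenizer, MBartForConditionalGeneration

# First run (online): download from the Hub, then write all files locally
tokenizer = MBartTokenizer.from_pretrained("ARTeLab/mbart-summarization-mlsum")
model = MBartForConditionalGeneration.from_pretrained("ARTeLab/mbart-summarization-mlsum")
tokenizer.save_pretrained("./mbart-summarization-mlsum")
model.save_pretrained("./mbart-summarization-mlsum")

# Later runs (offline): point from_pretrained at the local directory
tokenizer = MBartTokenizer.from_pretrained("./mbart-summarization-mlsum")
model = MBartForConditionalGeneration.from_pretrained("./mbart-summarization-mlsum")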

You can get more information about cache management here: https://huggingface.co/docs/datasets/cache

You can use HuggingFace in offline mode: https://huggingface.co/docs/transformers/v4.31.0/installation#offline-mode
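Concretely, the linked page describes setting an environment variable, or passing local_files_only=True so that from_pretrained reads only from the local cache and never tries the network. A short sketch based on those docs:

import os

# Set before importing transformers so no network calls are attempted
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

from transformers import MBartTokenizer, MBartForConditionalGeneration

# local_files_only=True raises an error instead of downloading
# if the files are not already in the cache
tokenizer = MBartTokenizer.from_pretrained(
    "ARTeLab/mbart-summarization-mlsum", local_files_only=True
)
model = MBartForConditionalGeneration.from_pretrained(
    "ARTeLab/mbart-summarization-mlsum", local_files_only=True
)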

"The problem is that running this line downloads several files from the repository at once, and I don't know which of them is then used for tokenization."

You need all of those files to load and use the tokenizer: it is not stored in a single file, and from_pretrained reconstructs it from the vocabulary/sentencepiece model together with the tokenizer and special-tokens configuration files.
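If you are curious which files those are, save_pretrained returns the paths of everything it writes, so you can list them. A small sketch; the exact filenames depend on the tokenizer class, and "./mbart-tokenizer" is just an example directory:

from transformers import MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("ARTeLab/mbart-summarization-mlsum")

# Writes every file the tokenizer needs (sentencepiece model,
# tokenizer config, special-tokens map, ...) and returns their paths
saved_files = tokenizer.save_pretrained("./mbart-tokenizer")
for path in saved_files:
    print(path)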