How to find positional embeddings from BartTokenizer?


The objective is to add customized token embeddings (obtained using a different model) to BART's positional embeddings.

Is there a way I can obtain the positional embeddings along with the token embeddings for an article (500-1000 words long) using the BART model? This is how I tokenize:

tokenized_sequence = tokenizer(sentence, padding='max_length', truncation=True, max_length=512, return_tensors="pt")

The output contains input_ids and attention_mask, but there is no parameter to return position_ids like in the BERT model, where you can do:

bert.embeddings.position_embeddings('YOUR_POSITIONS_IDS')
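
For context, a minimal sketch of what that looks like with BERT (the bert-base-uncased checkpoint and the fixed length of 512 are just example choices, not from the question):

import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")  # example checkpoint

# position ids are simply 0..seq_len-1 for each sequence in the batch
position_ids = torch.arange(512).unsqueeze(0)                # shape (1, 512)
pos_emb = bert.embeddings.position_embeddings(position_ids)  # shape (1, 512, 768)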

Or is the only way to obtain positional embeddings to use a sinusoidal positional encoding?

Best answer:

The tokenizer is not responsible for the embeddings. It only generates the ids that are fed into the embedding layer. BART's embeddings are learned, i.e. they come from the model's own embedding layers.

You can retrieve both types of embeddings like this. Here bart is a BartModel. Inside the encoder, the embedding step is (roughly) done like this:

embed_pos = bart.encoder.embed_positions(input_ids)   # learned positional embeddings
inputs_embeds = bart.encoder.embed_tokens(input_ids)  # learned token embeddings
hidden_states = inputs_embeds + embed_pos             # sum fed into the encoder layers

Full working code:

from transformers import BartForConditionalGeneration, BartTokenizer

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-base")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
input_ids = tok(example_english_phrase, return_tensors="pt").input_ids

embed_pos = bart.model.encoder.embed_positions(input_ids)  # learned positional embeddings
inputs_embeds = bart.model.encoder.embed_tokens(input_ids) * bart.model.encoder.embed_scale  # token embeddings; the scale is 1.0 by default
hidden_states = inputs_embeds + embed_pos  # what the encoder layers receive (before layer norm and dropout)
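
Tying this back to the stated objective (adding token embeddings from a different model to BART's positional embeddings), a minimal sketch continuing from the code above; custom_token_embeds is a hypothetical stand-in for embeddings produced elsewhere, and it must match bart-base's hidden size of 768:

import torch

# hypothetical custom token embeddings from another model,
# shaped (batch_size, seq_len, hidden_size) to match bart-base
custom_token_embeds = torch.randn(input_ids.shape[0], input_ids.shape[1], 768)

# combine them with BART's learned positional embeddings, mirroring the encoder
combined = custom_token_embeds + bart.model.encoder.embed_positions(input_ids)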

Note that embed_pos is invariant to the actual token ids; only their positions matter. "New" embeddings are added when the input grows longer, without changing the embeddings of the earlier positions:

These cases yield the same embeddings: embed_positions([0, 1]) == embed_positions([123, 241]) == embed_positions([444, 3453, 9344, 3453])[:2]
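
This can be checked directly (a sketch continuing from the code above; embed_positions expects batched id tensors here, so the lists become 2D tensors and the slice is taken along the sequence dimension):

import torch

pos = bart.model.encoder.embed_positions
a = pos(torch.tensor([[0, 1]]))
b = pos(torch.tensor([[123, 241]]))
c = pos(torch.tensor([[444, 3453, 9344, 3453]]))

print(torch.equal(a, b))         # True: the ids differ, the positions do not
print(torch.equal(a, c[:, :2]))  # True: earlier positions keep their embeddings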