I have only recently started learning BERT.
Some tutorials show that after embedding a sentence, a matrix X of shape [seq_len, 768] is formed, and X is fed into multi-head attention, i.e., several self-attention heads.
But in FasterTransformer, why is the input [seq_len, head_num, size_per_head]? It looks as if the matrix X is split evenly across the heads, so each head receives only its own slice rather than the complete matrix X.
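To show what I mean, here is a rough PyTorch sketch with made-up sizes (not FasterTransformer's actual code), illustrating that the two shapes seem to describe the same data, just viewed per head:

```python
import torch

seq_len, hidden_size = 8, 768          # hypothetical sizes for illustration
head_num, size_per_head = 12, 64       # 12 * 64 == 768 as in BERT-base

X = torch.randn(seq_len, hidden_size)  # the "complete" matrix from the tutorials

# The per-head view appears to be just a reshape of the same tensor:
X_heads = X.view(seq_len, head_num, size_per_head)

# Head 0's slice is simply the first 64 columns of X
assert torch.equal(X_heads[:, 0, :], X[:, 0:64])
# Reshaping back recovers the original matrix exactly
assert torch.equal(X_heads.reshape(seq_len, hidden_size), X)
```

Is this reshape all that is happening, or does each head really get a different input?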
So what is the actual input to each attention head?