I want to convert a .h5 gpt2-medium model to ggml. More details: https://github.com/ggerganov/ggml/issues/745
The process uses a conversion script from the ggml repository, which calls the Transformers conversion functions. There was a small difference in my model: it was created with vocab_size 50255, while the original GPT-2 uses 50257. I fixed that with padding, added in the conversion code at the point where that layer is read.
Something like this:

```python
@tf.function
def eager_f(symbolic_weight):
    print("PAD????", symbolic_weight.shape[0] * symbolic_weight.shape[1])
    paddings = tf.constant([[0, 2], [0, 0]])  # add 2 rows after dim 0
    symbolic_weight = tf.pad(symbolic_weight, paddings, "CONSTANT", 0)
    print(symbolic_weight.shape)
    return symbolic_weight
```
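Just to make sure the padding spec does what I think, the same operation can be reproduced with plain NumPy. This is only a sanity check, not part of the conversion script; the shapes are the ones from my model:

```python
import numpy as np

# stand-in for the truncated embedding matrix of my model
wte = np.zeros((50255, 1024), dtype=np.float32)

# pad 2 rows at the end of dim 0, nothing on dim 1 -- equivalent to
# tf.pad(wte, [[0, 2], [0, 0]], "CONSTANT", 0)
wte_padded = np.pad(wte, ((0, 2), (0, 0)), mode="constant", constant_values=0)
print(wte_padded.shape)  # (50257, 1024)
```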
It is called from `modeling_tf_utils.py` in Transformers:
```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    # (...)
    if saved_weight_value is not None:
        print("saved_weight_value=", saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        print("SAVED_WEIGHT")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        if saved_weight_value.shape[0] == 50255:
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")
```
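An alternative I'm considering, instead of patching `modeling_tf_utils.py`, would be to pad the embedding dataset directly in the `.h5` checkpoint before running `convert-h5-to-ggml.py`. A rough, untested sketch is below; the dataset path `H5_WTE_PATH` is a guess and has to be taken from the actual file (e.g. by listing its contents first, see the end of this post):

```python
import h5py
import numpy as np

H5_FILE = "tf_model.h5"  # the TF checkpoint in my model directory
# NOTE: this dataset path is a guess -- list the real paths in the file first
H5_WTE_PATH = "transformer/tfgpt2_model/transformer/wte/weight:0"

with h5py.File(H5_FILE, "r+") as f:
    wte = f[H5_WTE_PATH][...]  # read the full (50255, 1024) embedding matrix
    if wte.shape[0] == 50255:
        # pad 2 zero rows at the end of dim 0, same as the tf.pad call above
        padded = np.pad(wte, ((0, 2), (0, 0)), mode="constant", constant_values=0)
        del f[H5_WTE_PATH]  # h5py datasets can't change shape in place unless chunked
        f.create_dataset(H5_WTE_PATH, data=padded)
        print("padded wte to", padded.shape)
```

For now I'm sticking with the in-place patch above, and this is what happens with it: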
It gets through reading the TF model and then crashes when it starts mapping it to PyTorch, with a gross shape mismatch: [50257, 1024] (TF) against [1024, 1024].
```python
(...)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
```

The relevant part of `.../transformers/modeling_tf_utils.py`:

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []

    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
        ...
```
In the read-out (weight dump) of the TF model below, the second layer (wpe) does have the correct (1024, 1024) shape.
I've mentioned two threads with other mismatch issues in GPT-2 conversion, but they seem to be different, older cases.
Also, from the last error log it looks like the conversion applies that same [50257, 1024] tensor to many other parameters with shapes [1024, 1024], [1024], [1024, 3072], [3072], ..., [1024, 4096], [4096, 1024], ... Either it doesn't advance some pointer on the TF side, or it keeps retrying the same tensor because of the mismatch - I don't know, I haven't studied that part of the code. (A small script to list what the H5 actually contains is sketched after the log.)

```
<method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
array([[ 0.00544963, -0.01376201, 0.00010876, ..., -0.03386341,
0.00794204, 0.02500119],
...,
0.01859283, 0.01723549]], dtype=float32)>
(50257, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
array([[ 0.02799516, 0.02006585, -0.0060562 , ..., 0.00939397,
...
0.00648996, -0.0052477 ]], dtype=float32)>
(1024, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32,
...
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
(1, 3072)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=
(1024,)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
(...)
```
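To double-check what the checkpoint actually contains (and rule out the TF side), a small h5py walk like the one below prints every dataset name and shape; it only assumes the file is the usual `tf_model.h5` written by `save_pretrained`, so adjust the path if yours differs:

```python
import h5py

def dump_h5_shapes(path):
    """Print every dataset (weight tensor) stored in an H5 checkpoint with its shape."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape)
    with h5py.File(path, "r") as f:
        f.visititems(visit)

dump_h5_shapes("tf_model.h5")  # path to the model's H5 checkpoint
```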