I want to convert a .h5 gpt2-medium model to ggml. More details: https://github.com/ggerganov/ggml/issues/745
The process uses a conversion script from the ggml repository, which calls the Transformers conversion functions. There was a small difference in my model: it was created with vocab_size 50255, while the original GPT-2 uses 50257. I fixed that with padding, added in the conversion code at the point where that layer is read.
Something like this:

```python
@tf.function
def eager_f(symbolic_weight):
    print("PAD????", symbolic_weight.shape[0] * symbolic_weight.shape[1])
    paddings = tf.constant([[0, 2], [0, 0]])  # add 2 rows after dim 0
    symbolic_weight = tf.pad(symbolic_weight, paddings, "CONSTANT", 0)
    print(symbolic_weight.shape)
    return symbolic_weight
```
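Just to make sure the padding spec does what I think, the same operation can be reproduced with plain NumPy. This is only a sanity check, not part of the conversion script; the shapes are the ones from my model:

```python
import numpy as np

# stand-in for the truncated embedding matrix of my model
wte = np.zeros((50255, 1024), dtype=np.float32)

# pad 2 rows at the end of dim 0, nothing on dim 1 -- equivalent to
# tf.pad(wte, [[0, 2], [0, 0]], "CONSTANT", 0)
wte_padded = np.pad(wte, ((0, 2), (0, 0)), mode="constant", constant_values=0)
print(wte_padded.shape)  # (50257, 1024)
```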
It is called from `modeling_tf_utils.py` in Transformers:
```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    # (...)
    if saved_weight_value is not None:
        print("saved_weight_value=", saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        print("SAVED_WEIGHT")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        if saved_weight_value.shape[0] == 50255:
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")
```
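An alternative I'm considering, instead of patching `modeling_tf_utils.py`, would be to pad the embedding dataset directly in the `.h5` checkpoint before running `convert-h5-to-ggml.py`. A rough, untested sketch is below; the dataset path `H5_WTE_PATH` is a guess and has to be taken from the actual file (e.g. by listing its contents first, see the end of this post):

```python
import h5py
import numpy as np

H5_FILE = "tf_model.h5"  # the TF checkpoint in my model directory
# NOTE: this dataset path is a guess -- list the real paths in the file first
H5_WTE_PATH = "transformer/tfgpt2_model/transformer/wte/weight:0"

with h5py.File(H5_FILE, "r+") as f:
    wte = f[H5_WTE_PATH][...]  # read the full (50255, 1024) embedding matrix
    if wte.shape[0] == 50255:
        # pad 2 zero rows at the end of dim 0, same as the tf.pad call above
        padded = np.pad(wte, ((0, 2), (0, 0)), mode="constant", constant_values=0)
        del f[H5_WTE_PATH]  # h5py datasets can't change shape in place unless chunked
        f.create_dataset(H5_WTE_PATH, data=padded)
        print("padded wte to", padded.shape)
```

For now I'm sticking with the in-place patch above, and this is what happens with it: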
It gets through reading the TF model and then crashes when it starts mapping it to PyTorch, with a gross shape mismatch: [50257, 1024] (TF) against [1024, 1024].
```python
(...)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
```

The relevant part of `.../transformers/modeling_tf_utils.py`:

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []

    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
        ...
```
In the read-out (weight dump) of the TF model below, the second layer (wpe) does have the correct (1024, 1024) shape.
I've mentioned two threads with other mismatch issues in GPT-2 conversion, but they seem to be different, older cases.
Also, from the last error log it looks like the conversion applies that same [50257, 1024] tensor to many other parameters with shapes [1024, 1024], [1024], [1024, 3072], [3072], ..., [1024, 4096], [4096, 1024], ... Either it doesn't advance some pointer on the TF side, or it keeps retrying the same tensor because of the mismatch - I don't know, I haven't studied that part of the code. (A small script to list what the H5 actually contains is sketched after the log.)

```
<method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
array([[ 0.00544963, -0.01376201, 0.00010876, ..., -0.03386341,
0.00794204, 0.02500119],
...,
0.01859283, 0.01723549]], dtype=float32)>
(50257, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
array([[ 0.02799516, 0.02006585, -0.0060562 , ..., 0.00939397,
...
0.00648996, -0.0052477 ]], dtype=float32)>
(1024, 1024)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32,
...
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
(1, 3072)
SYMBOLIC_WEIGHT: <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=
(1024,)
K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
model, loading_info = load_tf2_checkpoint_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
return load_tf2_model_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
return load_tf2_weights_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
return load_tf2_state_dict_in_pytorch_model(
File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.1.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.1.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.1.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.1.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.1.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.1.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.1.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.2.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.2.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.2.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.2.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.2.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.2.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.2.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.3.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.3.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.3.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.3.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.3.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.3.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.3.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.4.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.4.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.4.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.4.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.4.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.4.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.4.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.5.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.5.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.5.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.5.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.5.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.5.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.5.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.6.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.6.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.6.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.6.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.6.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.6.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.6.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.7.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.7.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.7.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.7.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.7.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.7.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.7.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.8.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.8.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.8.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.8.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
size mismatch for h.8.mlp.c_fc.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
size mismatch for h.8.mlp.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
size mismatch for h.8.mlp.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_1.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.attn.c_attn.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for h.9.attn.c_attn.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for h.9.attn.c_proj.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for h.9.attn.c_proj.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.ln_2.bias: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for h.9.mlp.c_fc.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
(...)
```
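To double-check what the checkpoint actually contains (and rule out the TF side), a small h5py walk like the one below prints every dataset name and shape; it only assumes the file is the usual `tf_model.h5` written by `save_pretrained`, so adjust the path if yours differs:

```python
import h5py

def dump_h5_shapes(path):
    """Print every dataset (weight tensor) stored in an H5 checkpoint with its shape."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape)
    with h5py.File(path, "r") as f:
        f.visititems(visit)

dump_h5_shapes("tf_model.h5")  # path to the model's H5 checkpoint
```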