Float16 mixed precision being slower than regular float32, keras, tensorflow 2.0


I am using TensorFlow 2.10 on Windows with an NVIDIA RTX 2060 SUPER (which has Tensor Cores) for deep learning. But when I enable float16 mixed precision, the time per epoch actually becomes slower, not faster.

Code:

import tensorflow as tf
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar100.load_data()

tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    
    tf.keras.layers.Lambda(lambda x : x / 255, input_shape=(32,32,3)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=(4,4)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(filters=32, kernel_size=(2,2)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(100),
    tf.keras.layers.Activation("softmax", dtype="float32")
    ])

model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])

print("compute dtype of first layer: ", model.layers[0].compute_dtype)

model.fit(train_x, train_y, epochs=100, batch_size=1020)

model.evaluate(test_x, test_y)

I attached some images of the problem: here's a screenshot of training without mixed precision, and here's one using mixed precision, which is slower.

Running the code in Google Colab, which uses a more modern version of TensorFlow (2.15), works fine, and training is faster with mixed precision than without it (as it should be). Here's the link to the notebook: Google Colab

I'm not an expert with TensorFlow and I have been trying to fix this for weeks; any help would be appreciated. Thanks!

Other Information:

I'm using cuDNN 8.1.1 and CUDA 11.2, which are the versions listed as compatible with TensorFlow 2.10.
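For reference, here is a small check snippet (separate from the training script) that prints what TensorFlow reports about the visible GPU and the CUDA/cuDNN versions it was built against:

import tensorflow as tf

# GPUs visible to TensorFlow.
print("GPUs:", tf.config.list_physical_devices("GPU"))

# CUDA/cuDNN versions this TensorFlow build was compiled against.
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"))
print("cuDNN:", build.get("cudnn_version"))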


2 Answers

BEST ANSWER

The solution I found was to switch to Ubuntu (Linux) and update to the newer TensorFlow 2.15.

With this version, mixed precision (float16) is about twice as fast as the classic float32.

I also upgraded from CUDA 11.2 to 12.2 and from cuDNN 8.1.1 to 8.9.
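After upgrading, a quick sanity check (just a sketch) to confirm the policy is active and that the GPU reports a Tensor-Core-capable compute capability (7.0 or higher; the RTX 2060 SUPER is 7.5):

import tensorflow as tf

# Enable mixed precision and confirm the global policy took effect:
# computations in float16, variables kept in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
policy = tf.keras.mixed_precision.global_policy()
print("compute dtype:", policy.compute_dtype)    # float16
print("variable dtype:", policy.variable_dtype)  # float32

# Tensor Cores are only used on GPUs with compute capability 7.0+.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print("compute capability:", details.get("compute_capability"))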

Second answer

I am not exactly sure what the essence of the problem is here. By design, mixed precision accelerates training by using both 16-bit and 32-bit floating-point types. Matrix multiplications and convolutions are quite computationally expensive in 32-bit; performing the same calculations in 16-bit greatly reduces the memory bandwidth they require, so more data can be processed per unit of time and each epoch takes less time.

Theoretically, you should be able to reach the same accuracy when training in mixed precision, because the final loss is still computed in 32-bit, which preserves the model's ability to learn effectively.
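As a rough sketch of what the mixed_float16 policy does in Keras (model.compile() handles all of this automatically, so none of it needs to be written by hand):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Under the policy, layer computations run in float16 while the weights
# themselves are kept in float32.
dense = tf.keras.layers.Dense(10)
dense.build((None, 8))
print(dense.compute_dtype)   # float16
print(dense.kernel.dtype)    # float32

# In a custom training loop, loss scaling protects the small float16
# gradients from underflowing; compile() wraps the optimizer like this
# automatically when the mixed_float16 policy is active.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())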