I am learning StyleGAN architecture and I got confused about the purpose of the Mapping Network. In the original paper it says:
Our mapping network consists of 8 fully-connected layers, and the dimensionality of all input and output activations— including z and w — is 512.
And there is no information about this network being trained in any way.
Like, wouldn’t it just generate some nonsense values?
I've tried creating a network like that (but with a smaller shape (16,)):
import tensorflow as tf
import numpy as np
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(16,)))  # shape must be a tuple
for i in range(8):  # 8 fully-connected layers, as in the paper
    model.add(tf.keras.layers.Dense(16, activation='relu'))
model.compile()
and then evaluated it on some random values:
g = tf.random.Generator.from_seed(34)
model(
g.normal(shape=(16, 16))
)
And I am getting some random outputs like:
array([[0. , 0.01045225, 0. , 0. , 0.02217731,
0.00940356, 0.02321716, 0.00556996, 0. , 0. ,
0. , 0.03117323, 0. , 0. , 0.00734158,
0. ],
[0.03159791, 0.05680077, 0. , 0. , 0. ,
0. , 0.05907414, 0. , 0. , 0. ,
0. , 0. , 0.03110216, 0.04647615, 0. ,
0.04566741],
.
. # more similar vectors go here
.
[0. , 0.01229661, 0.00056016, 0. , 0.03534952,
0.02654905, 0.03212402, 0. , 0. , 0. ,
0. , 0.0913604 , 0. , 0. , 0. ,
0.        ]], dtype=float32)
What am I missing? Is there any information on the Internet about training Mapping Network? Any math explanation? Got really confused :(



As I understand it, the mapping network is not trained separately. It is part of the generator network, and its weights are updated from gradients just like every other part of the network.
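You can see the joint training yourself with a toy sketch. This is a hypothetical miniature generator, not the official code: a small mapping MLP feeding a stand-in "synthesis" head, with a stand-in loss. The point is that `tf.GradientTape` produces gradients for the mapping layers too, because they sit inside the generator's computation graph.

```python
import tensorflow as tf

latent_dim = 16  # illustrative size; the paper uses 512

# Toy mapping network (z -> w) and a stand-in "synthesis" head.
mapping = tf.keras.Sequential(
    [tf.keras.layers.Dense(latent_dim, activation="relu") for _ in range(8)]
)
synthesis = tf.keras.Sequential([tf.keras.layers.Dense(4 * 4 * 3)])

z = tf.random.normal(shape=(2, latent_dim))
with tf.GradientTape() as tape:
    w = mapping(z)                          # intermediate latent code
    img = synthesis(w)                      # toy "image"
    loss = tf.reduce_mean(tf.square(img))   # stand-in for the real GAN loss

# Gradients flow back through the synthesis head INTO the mapping network,
# so the mapping layers are trained jointly with the rest of the generator.
grads = tape.gradient(loss, mapping.trainable_variables)
print(all(g is not None for g in grads))
```

In a real GAN training step the loss would come from the discriminator, but the gradient path through the mapping network is the same.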
In the official StyleGAN generator code implementation it is written that the generator is composed of two sub-networks: a mapping network and a synthesis network. In the StyleGAN3 generator source this split is even easier to see. The output of the mapping network is passed to the synthesis network, which generates the image.
The diagram below, from the StyleGAN (2019) paper, shows the mapping network; section 2 of the paper describes it.
[Figure: generator diagram with the mapping network]
The mapping network is represented by f in the paper: it takes a noise vector z, sampled from a normal distribution, and maps it to an intermediate latent representation w. It is implemented as an 8-layer MLP, and the StyleGAN mapping network implementation accordingly sets the number of MLP layers to 8. As they explain in section 4, z and w have the same dimensionality, but w is more disentangled than z. Finding a w in the intermediate latent space W for a given image enables specific image editing (see the Encoder for Editing paper).
In the StyleGAN2-ADA paper, among other changes, they found a mapping network depth of 2 to work better than 8. In the StyleGAN3 mapping network code implementation the default number of layers is accordingly set to 2.
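The depth change is a large reduction in mapping-network size. A rough parameter count, assuming plain 512-dimensional Dense layers (a simplification of the real code):

```python
import tensorflow as tf

def mapping_params(num_layers, dim=512):
    """Count parameters of a toy num_layers-deep, dim-wide mapping MLP."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(dim,))])
    for _ in range(num_layers):
        model.add(tf.keras.layers.Dense(dim, activation=tf.nn.leaky_relu))
    return model.count_params()

print(mapping_params(8))  # 8 * (512*512 + 512) = 2,101,248
print(mapping_params(2))  # 2 * (512*512 + 512) =   525,312
```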