I am trying to understand the TensorFlow
text classification example at https://www.tensorflow.org/tutorials/keras/text_classification. They define the model as follows:
```python
model = tf.keras.Sequential([
    layers.Embedding(max_features + 1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)])
```
To the best of my knowledge, deep learning models use an activation function, and I wonder what activation function the above classification model uses internally. Can anyone help me understand that?
As you can see, the model definition applies no activation to the last `Dense` layer. The dataset used in that tutorial is a binary classification task with labels `0` and `1`. By not defining any activation for the last layer, the original author wants the model to output the **logits** rather than probabilities, and that is why the tutorial compiles the model with the loss function `tf.keras.losses.BinaryCrossentropy(from_logits=True)`. Now, if we set the last layer's activation to `sigmoid` (which is the usual pick for binary classification), then we must set `from_logits=False`. So, here are the two options to choose from:

- **With logits (`from_logits=True`)**: we take the *logits* from the last layer, which is why we set `from_logits=True`.
- **Without logits (`from_logits=False`)**: we take the *probabilities* from the last layer, which is why we set `from_logits=False`.
Now, you may wonder why this tutorial uses logits (that is, no activation on the last layer). The short answer is that it generally doesn't matter; we can choose either option. The thing is, there is a chance of numerical instability when using `from_logits=False`. Check this answer for more details.
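As a rough illustration of that instability (a plain-Python sketch, not Keras internals): for a strongly negative logit, the sigmoid output underflows to exactly `0.0`, so taking its log blows up, whereas the numerically stable form documented for `tf.nn.sigmoid_cross_entropy_with_logits`, `max(z, 0) - z*y + log(1 + exp(-|z|))`, computed directly from the logit, stays finite:

```python
import math

def sigmoid(z):
    # A branch that avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

z, y = -800.0, 1.0  # strongly negative logit for a positive example

# Route 1 (from_logits=False): sigmoid first, then log.
p = sigmoid(z)  # underflows to exactly 0.0
try:
    loss = -math.log(p)  # log(0) -> ValueError: math domain error
    blew_up = False
except ValueError:
    blew_up = True

# Route 2 (from_logits=True): the stable rewrite stays finite.
stable_loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
# p == 0.0, blew_up == True, stable_loss == 800.0
```

This is exactly the failure mode the answer above is warning about: with `from_logits=False` the gradient and loss can become `inf`/`nan` for confident predictions, while the logit formulation never has to take the log of an underflowed probability.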