If I have a binary classification task, the last layer of a neural network should be:
output = tensorflow.keras.layers.Dense(1, activation='sigmoid')(x)
model.compile(loss='binary_crossentropy')
If the problem is multi-class, where each label is a scalar taking one of k discrete values, I usually one-hot encode the label into a k-dimensional vector and change the layer to:
output = tensorflow.keras.layers.Dense(k, activation='softmax')(x)
model.compile(loss='categorical_crossentropy')
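To make the one-hot step concrete, here is a minimal NumPy sketch. The helper name `one_hot` and the sample labels are made up for illustration; Keras also ships `keras.utils.to_categorical` for the same purpose, and `sparse_categorical_crossentropy` accepts the integer labels directly without any encoding.

```python
import numpy as np

def one_hot(labels, k):
    """Encode integer labels in [0, k) as rows of a (n, k) one-hot matrix."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, k), dtype="float32")
    # Set a single 1.0 per row, at the column given by that row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical labels for a 3-class problem.
y = one_hot([0, 2, 1], k=3)
```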
If I have k independent binary targets (multi-label classification), I would change it to:
output = tensorflow.keras.layers.Dense(k, activation='sigmoid')(x)
model.compile(loss='binary_crossentropy')
However, I still cannot figure out the correct network design when I have multiple outputs and each output has multiple categories, e.g. [[0,1],[7,2],..]
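For reference, one possible design for this case is a shared trunk with a separate softmax head per output, each head getting its own cross-entropy loss. The sketch below is only an illustration under stated assumptions, not a confirmed answer: it assumes each row of the label array holds one integer class index per output, and the feature size, hidden width, and class counts (8 and 3) are invented to fit the example labels.

```python
import numpy as np
from tensorflow import keras

# Hypothetical sizes, chosen only so the example labels fit.
num_features = 16
classes_per_output = [8, 3]  # assumption: output 0 has 8 classes, output 1 has 3

inputs = keras.Input(shape=(num_features,))
x = keras.layers.Dense(32, activation="relu")(inputs)  # shared trunk

# One softmax head per output, each with its own number of classes.
heads = [
    keras.layers.Dense(n, activation="softmax", name=f"out_{i}")(x)
    for i, n in enumerate(classes_per_output)
]
model = keras.Model(inputs=inputs, outputs=heads)

# Sparse loss so the integer class indices can be used without one-hot encoding.
model.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy"] * len(heads),
)

# Labels in the shape from the question: one row per sample, one column per output.
labels = np.array([[0, 1], [7, 2]])
y = [labels[:, i] for i in range(labels.shape[1])]  # split into one array per head

# Random features, only to show that the shapes line up.
features = np.random.rand(len(labels), num_features).astype("float32")
model.fit(features, y, epochs=1, verbose=0)
preds = model(features)  # list with one probability array per head
```

If every output happened to have the same number of classes, a single Dense layer reshaped to (outputs, classes) with a softmax over the last axis would be another option; separate heads are used here because the class counts differ.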