I have trained a model using Caffe and NVIDIA's DIGITS. Testing it on DIGITS for the following images results in the following:
When I download the model from DIGITS I get a snapshot_iter_24240.caffemodel
along with deploy.prototxt
, mean.binaryproto
and labels.txt
. (and solver.prototxt
and train_val.prototxt
which I think is not relevant)
I use coremltools
to convert the caffemodel to mlmodel running the following:
import coremltools
# Convert a caffe model to a classifier in Core ML
coreml_model = coremltools.converters.caffe.convert(('snapshot_iter_24240.caffemodel',
'deploy.prototxt',
'mean.binaryproto'),
image_input_names = 'data',
class_labels = 'labels.txt')
# Now save the model
coreml_model.save('food.mlmodel')
The code outputs the following:
(/anaconda/envs/coreml) bash-3.2$ python run.py
================= Starting Conversion from Caffe to CoreML ======================
Layer 0: Type: 'Input', Name: 'input'. Output(s): 'data'.
Ignoring batch size and retaining only the trailing 3 dimensions for conversion.
Layer 1: Type: 'Convolution', Name: 'conv1'. Input(s): 'data'. Output(s): 'conv1'.
Layer 2: Type: 'ReLU', Name: 'relu1'. Input(s): 'conv1'. Output(s): 'conv1'.
Layer 3: Type: 'LRN', Name: 'norm1'. Input(s): 'conv1'. Output(s): 'norm1'.
Layer 4: Type: 'Pooling', Name: 'pool1'. Input(s): 'norm1'. Output(s): 'pool1'.
Layer 5: Type: 'Convolution', Name: 'conv2'. Input(s): 'pool1'. Output(s): 'conv2'.
Layer 6: Type: 'ReLU', Name: 'relu2'. Input(s): 'conv2'. Output(s): 'conv2'.
Layer 7: Type: 'LRN', Name: 'norm2'. Input(s): 'conv2'. Output(s): 'norm2'.
Layer 8: Type: 'Pooling', Name: 'pool2'. Input(s): 'norm2'. Output(s): 'pool2'.
Layer 9: Type: 'Convolution', Name: 'conv3'. Input(s): 'pool2'. Output(s): 'conv3'.
Layer 10: Type: 'ReLU', Name: 'relu3'. Input(s): 'conv3'. Output(s): 'conv3'.
Layer 11: Type: 'Convolution', Name: 'conv4'. Input(s): 'conv3'. Output(s): 'conv4'.
Layer 12: Type: 'ReLU', Name: 'relu4'. Input(s): 'conv4'. Output(s): 'conv4'.
Layer 13: Type: 'Convolution', Name: 'conv5'. Input(s): 'conv4'. Output(s): 'conv5'.
Layer 14: Type: 'ReLU', Name: 'relu5'. Input(s): 'conv5'. Output(s): 'conv5'.
Layer 15: Type: 'Pooling', Name: 'pool5'. Input(s): 'conv5'. Output(s): 'pool5'.
Layer 16: Type: 'InnerProduct', Name: 'fc6'. Input(s): 'pool5'. Output(s): 'fc6'.
Layer 17: Type: 'ReLU', Name: 'relu6'. Input(s): 'fc6'. Output(s): 'fc6'.
Layer 18: Type: 'Dropout', Name: 'drop6'. Input(s): 'fc6'. Output(s): 'fc6'.
WARNING: Skipping training related layer 'drop6' of type 'Dropout'.
Layer 19: Type: 'InnerProduct', Name: 'fc7'. Input(s): 'fc6'. Output(s): 'fc7'.
Layer 20: Type: 'ReLU', Name: 'relu7'. Input(s): 'fc7'. Output(s): 'fc7'.
Layer 21: Type: 'Dropout', Name: 'drop7'. Input(s): 'fc7'. Output(s): 'fc7'.
WARNING: Skipping training related layer 'drop7' of type 'Dropout'.
Layer 22: Type: 'InnerProduct', Name: 'fc8_food'. Input(s): 'fc7'. Output(s): 'fc8_food'.
Layer 23: Type: 'Softmax', Name: 'prob'. Input(s): 'fc8_food'. Output(s): 'prob'.
================= Summary of the conversion: ===================================
Detected input(s) and shape(s) (ignoring batch size):
'data' : 3, 227, 227
Size of mean image: (H,W) = (256, 256) is greater than input image size: (H,W) = (227, 227). Mean image will be center cropped to match the input image dimensions.
Network Input name(s): 'data'.
Network Output name(s): 'prob'.
(/anaconda/envs/coreml) bash-3.2$
After about 45 seconds food.mlmodel
is generated. I import it into an iOS project using Xcode Version 9.0 beta 3 (9M174d) and run the following the code in a single view iOS project.
//
// ViewController.swift
// SeeFood
//
// Created by Reza Shirazian on 7/23/17.
// Copyright © 2017 Reza Shirazian. All rights reserved.
//
import UIKit
import CoreML
import Vision
class ViewController: UIViewController {
override func viewDidLoad() {
super.viewDidLoad()
var images = [CIImage]()
// guard let ciImage = CIImage(image: #imageLiteral(resourceName: "pizza")) else {
// fatalError("couldn't convert UIImage to CIImage")
// }
images.append(CIImage(image: #imageLiteral(resourceName: "pizza"))!)
images.append(CIImage(image: #imageLiteral(resourceName: "spaghetti"))!)
images.append(CIImage(image: #imageLiteral(resourceName: "burger"))!)
images.append(CIImage(image: #imageLiteral(resourceName: "sushi"))!)
images.forEach{detectScene(image: $0)}
// Do any additional setup after loading the view, typically from a nib.
}
override func didReceiveMemoryWarning() {
super.didReceiveMemoryWarning()
// Dispose of any resources that can be recreated.
}
func detectScene(image: CIImage) {
guard let model = try? VNCoreMLModel(for: food().model) else {
fatalError()
}
// Create a Vision request with completion handler
let request = VNCoreMLRequest(model: model) { [weak self] request, error in
guard let results = request.results as? [VNClassificationObservation],
let topResult = results.first else {
fatalError("unexpected result type from VNCoreMLRequest")
}
// Update UI on main queue
//let article = (self?.vowels.contains(topResult.identifier.first!))! ? "an" : "a"
DispatchQueue.main.async { [weak self] in
results.forEach({ (result) in
if Int(result.confidence * 100) > 1 {
print("\(Int(result.confidence * 100))% it's \(result.identifier)")
}
})
print("********************************")
}
}
let handler = VNImageRequestHandler(ciImage: image)
DispatchQueue.global(qos: .userInteractive).async {
do {
try handler.perform([request])
} catch {
print(error)
}
}
}
}
which outputs the following:
22% it's cup cakes
8% it's ice cream
5% it's falafel
5% it's macarons
3% it's churros
3% it's gyoza
3% it's donuts
2% it's tacos
2% it's cannoli
********************************
35% it's cup cakes
22% it's frozen yogurt
8% it's chocolate cake
7% it's chocolate mousse
6% it's ice cream
2% it's donuts
********************************
38% it's gyoza
7% it's falafel
6% it's tacos
4% it's hamburger
3% it's oysters
2% it's peking duck
2% it's hot dog
2% it's baby back ribs
2% it's cannoli
********************************
7% it's hamburger
6% it's pork chop
6% it's steak
6% it's peking duck
5% it's pho
5% it's prime rib
5% it's baby back ribs
4% it's mussels
4% it's grilled salmon
2% it's filet mignon
2% it's foie gras
2% it's pulled pork sandwich
********************************
this is completely off and inconsistent with how the model was performing on DIGITS. I'm not sure what I'm doing wrong or if I've missed a step. I tried creating the model without mean.binaryproto
but that made no difference.
If it helps here is the deploy.prototxt
input: "data"
input_shape {
dim: 1
dim: 3
dim: 227
dim: 227
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8_food"
type: "InnerProduct"
bottom: "fc7"
top: "fc8_food"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0.0
}
inner_product_param {
num_output: 101
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "fc8_food"
top: "prob"
}
The discrepancy between the predictions on DIGITS using CaffeModel and CoreML was due to the fact that CoreML was interpreting the input data differently that DIGITS. Changing the call to
convert
with the following parameters resolved the issuehttp://pythonhosted.org/coremltools/generated/coremltools.converters.caffe.convert.html#coremltools.converters.caffe.convert