Converting a caffe model to CoreML using coremltools results in inconsistent perdictions

1k Views Asked by At

I have trained a model using Caffe and NVIDIA's DIGITS. Testing it on DIGITS for the following images results in the following: enter image description here enter image description here enter image description here enter image description here

When I download the model from DIGITS I get a snapshot_iter_24240.caffemodel along with deploy.prototxt, mean.binaryproto and labels.txt. (and solver.prototxt and train_val.prototxt which I think is not relevant)

I use coremltools to convert the caffemodel to mlmodel running the following: import coremltools

# Convert a caffe model to a classifier in Core ML
coreml_model = coremltools.converters.caffe.convert(('snapshot_iter_24240.caffemodel',
                                                     'deploy.prototxt',
                                                     'mean.binaryproto'),
                                                      image_input_names = 'data',
                                                      class_labels = 'labels.txt')

# Now save the model

coreml_model.save('food.mlmodel')

The code outputs the following:

(/anaconda/envs/coreml) bash-3.2$ python run.py 

================= Starting Conversion from Caffe to CoreML ======================
Layer 0: Type: 'Input', Name: 'input'. Output(s): 'data'.
Ignoring batch size and retaining only the trailing 3 dimensions for conversion. 
Layer 1: Type: 'Convolution', Name: 'conv1'. Input(s): 'data'. Output(s): 'conv1'.
Layer 2: Type: 'ReLU', Name: 'relu1'. Input(s): 'conv1'. Output(s): 'conv1'.
Layer 3: Type: 'LRN', Name: 'norm1'. Input(s): 'conv1'. Output(s): 'norm1'.
Layer 4: Type: 'Pooling', Name: 'pool1'. Input(s): 'norm1'. Output(s): 'pool1'.
Layer 5: Type: 'Convolution', Name: 'conv2'. Input(s): 'pool1'. Output(s): 'conv2'.
Layer 6: Type: 'ReLU', Name: 'relu2'. Input(s): 'conv2'. Output(s): 'conv2'.
Layer 7: Type: 'LRN', Name: 'norm2'. Input(s): 'conv2'. Output(s): 'norm2'.
Layer 8: Type: 'Pooling', Name: 'pool2'. Input(s): 'norm2'. Output(s): 'pool2'.
Layer 9: Type: 'Convolution', Name: 'conv3'. Input(s): 'pool2'. Output(s): 'conv3'.
Layer 10: Type: 'ReLU', Name: 'relu3'. Input(s): 'conv3'. Output(s): 'conv3'.
Layer 11: Type: 'Convolution', Name: 'conv4'. Input(s): 'conv3'. Output(s): 'conv4'.
Layer 12: Type: 'ReLU', Name: 'relu4'. Input(s): 'conv4'. Output(s): 'conv4'.
Layer 13: Type: 'Convolution', Name: 'conv5'. Input(s): 'conv4'. Output(s): 'conv5'.
Layer 14: Type: 'ReLU', Name: 'relu5'. Input(s): 'conv5'. Output(s): 'conv5'.
Layer 15: Type: 'Pooling', Name: 'pool5'. Input(s): 'conv5'. Output(s): 'pool5'.
Layer 16: Type: 'InnerProduct', Name: 'fc6'. Input(s): 'pool5'. Output(s): 'fc6'.
Layer 17: Type: 'ReLU', Name: 'relu6'. Input(s): 'fc6'. Output(s): 'fc6'.
Layer 18: Type: 'Dropout', Name: 'drop6'. Input(s): 'fc6'. Output(s): 'fc6'.
WARNING: Skipping training related layer 'drop6' of type 'Dropout'.
Layer 19: Type: 'InnerProduct', Name: 'fc7'. Input(s): 'fc6'. Output(s): 'fc7'.
Layer 20: Type: 'ReLU', Name: 'relu7'. Input(s): 'fc7'. Output(s): 'fc7'.
Layer 21: Type: 'Dropout', Name: 'drop7'. Input(s): 'fc7'. Output(s): 'fc7'.
WARNING: Skipping training related layer 'drop7' of type 'Dropout'.
Layer 22: Type: 'InnerProduct', Name: 'fc8_food'. Input(s): 'fc7'. Output(s): 'fc8_food'.
Layer 23: Type: 'Softmax', Name: 'prob'. Input(s): 'fc8_food'. Output(s): 'prob'.

================= Summary of the conversion: ===================================
Detected input(s) and shape(s) (ignoring batch size):
'data' : 3, 227, 227
Size of mean image: (H,W) = (256, 256) is greater than input image size: (H,W) = (227, 227). Mean image will be center cropped to match the input image dimensions. 

Network Input name(s): 'data'.
Network Output name(s): 'prob'.
(/anaconda/envs/coreml) bash-3.2$ 

After about 45 seconds food.mlmodel is generated. I import it into an iOS project using Xcode Version 9.0 beta 3 (9M174d) and run the following the code in a single view iOS project. // // ViewController.swift // SeeFood // // Created by Reza Shirazian on 7/23/17. // Copyright © 2017 Reza Shirazian. All rights reserved. //

import UIKit
import CoreML
import Vision

class ViewController: UIViewController {

  override func viewDidLoad() {
    super.viewDidLoad()

    var images = [CIImage]()
//    guard let ciImage = CIImage(image: #imageLiteral(resourceName: "pizza")) else {
//      fatalError("couldn't convert UIImage to CIImage")
//    }
    images.append(CIImage(image: #imageLiteral(resourceName: "pizza"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "spaghetti"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "burger"))!)
    images.append(CIImage(image: #imageLiteral(resourceName: "sushi"))!)
    images.forEach{detectScene(image: $0)}

    // Do any additional setup after loading the view, typically from a nib.
  }

  override func didReceiveMemoryWarning() {
    super.didReceiveMemoryWarning()
    // Dispose of any resources that can be recreated.
  }

  func detectScene(image: CIImage) {
    guard let model = try? VNCoreMLModel(for: food().model) else {
      fatalError()
    }
    // Create a Vision request with completion handler
    let request = VNCoreMLRequest(model: model) { [weak self] request, error in
      guard let results = request.results as? [VNClassificationObservation],
        let topResult = results.first else {
          fatalError("unexpected result type from VNCoreMLRequest")
      }

      // Update UI on main queue
      //let article = (self?.vowels.contains(topResult.identifier.first!))! ? "an" : "a"
      DispatchQueue.main.async { [weak self] in
        results.forEach({ (result) in
          if Int(result.confidence * 100) > 1 {
            print("\(Int(result.confidence * 100))% it's \(result.identifier)")
          }
        })
        print("********************************")

      }
    }
    let handler = VNImageRequestHandler(ciImage: image)
    DispatchQueue.global(qos: .userInteractive).async {
      do {
        try handler.perform([request])
      } catch {
        print(error)
      }
    }
  }
}

which outputs the following:

22% it's cup cakes
8% it's ice cream
5% it's falafel
5% it's macarons
3% it's churros
3% it's gyoza
3% it's donuts
2% it's tacos
2% it's cannoli
********************************
35% it's cup cakes
22% it's frozen yogurt
8% it's chocolate cake
7% it's chocolate mousse
6% it's ice cream
2% it's donuts
********************************
38% it's gyoza
7% it's falafel
6% it's tacos
4% it's hamburger
3% it's oysters
2% it's peking duck
2% it's hot dog
2% it's baby back ribs
2% it's cannoli
********************************
7% it's hamburger
6% it's pork chop
6% it's steak
6% it's peking duck
5% it's pho
5% it's prime rib
5% it's baby back ribs
4% it's mussels
4% it's grilled salmon
2% it's filet mignon
2% it's foie gras
2% it's pulled pork sandwich
********************************

this is completely off and inconsistent with how the model was performing on DIGITS. I'm not sure what I'm doing wrong or if I've missed a step. I tried creating the model without mean.binaryproto but that made no difference.

If it helps here is the deploy.prototxt

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8_food"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_food"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 101
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8_food"
  top: "prob"
}
2

There are 2 best solutions below

1
On

The discrepancy between the predictions on DIGITS using CaffeModel and CoreML was due to the fact that CoreML was interpreting the input data differently that DIGITS. Changing the call to convert with the following parameters resolved the issue

coreml_model = coremltools.converters.caffe.convert(('snapshot_iter_24240.caffemodel',
                                                     'deploy.prototxt',
                                                     'mean.binaryproto'),
                                                      image_input_names = 'data',
                                                      class_labels = 'labels.txt',
                                                      is_bgr=True, image_scale=255.)

http://pythonhosted.org/coremltools/generated/coremltools.converters.caffe.convert.html#coremltools.converters.caffe.convert

99% it's spaghetti bolognese
********************************
73% it's pizza
10% it's lasagna
7% it's spaghetti bolognese
2% it's spaghetti carbonara
********************************
97% it's sushi
********************************
97% it's hamburger
********************************
1
On

In its current form, coremltools has a tendency to change the input/output types and value ranges to suite its own internal optimizations. I highly recommend reimporting your newly-created .mlmodel file into your Python code and verify what data types it expects.

For example: it will convert Int values into Float (uses Double type in Swift) and Bool values to Int (True:1, False:0)