CoreML / MLModelConfiguration preferredMetalDevice - understanding device placement heuristics


Is there any public document that clearly states CoreML's strategy for GPU device placement when running inference models on macOS? How does it decide whether to run on the integrated GPU, the discrete GPU, or the CPU? Can one reliably 'force' one path? How does this change for systems like the new Mac Pro with multiple discrete GPUs, or with multiple eGPUs?

My testing on my rMBP indicates the answer is no - temperature, battery level, being plugged into power, automatic graphics switching settings, app support, and perhaps even some MLModel architecture heuristic all seem to play a role in device placement.

The longer version, with context:

I'm curious whether there is any public documentation on CoreML's device selection heuristic. With the addition of the preferredMetalDevice API on MLModelConfiguration in 10.15, I imagined it would be possible to force the MTLDevice an MLModel / Vision request runs on.
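
Concretely, I am setting things up roughly like this - a minimal sketch, where modelURL is a stand-in for my compiled pipeline model:

import CoreML
import Metal

let config = MLModelConfiguration()
// Any MTLDevice from MTLCopyAllDevices() could go here; the system default is just a placeholder
config.preferredMetalDevice = MTLCreateSystemDefaultDevice()
let model = try MLModel(contentsOf: modelURL, configuration: config)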

In my testing with the integrated GPU, the discrete GPU and an eGPU on my 2018 rMBP with Vega 20, it appears only the eGPU consistently runs the CoreML model when requested.

My CoreML model is a pipeline model consisting of a MobileNet classifier with multiple outputs (multi-head classifiers attached to a custom feature extractor).

I'm curious to understand device selection preference for a few reasons:

a) I'd like to ensure my MLModel is fed CIImages backed by MTLTextures local to the device inference will run on, to limit PCIe transfers and keep processing on a single GPU device

b) My model is actually fed frames of video, and WWDC '19 / 10.15 introduce VideoToolbox and AVFoundation APIs to help force particular video encoders and decoders onto specific GPUs.

In theory, if all works well, I should be able to specify the same MTLDevice for video decode, preprocessing, CoreML/Vision inference and subsequent encoding - keeping all IOSurface-backed CVPixelBuffers, CVMetalTextureRefs, MPSImages and friends resident on the same GPU.
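
As a sketch of the pipeline I have in mind (the device choice, compiledModelURL and the decode step are placeholders for my actual setup):

import CoreML
import CoreImage
import CoreVideo
import Metal
import Vision

// One MTLDevice shared by every stage
let device = MTLCreateSystemDefaultDevice()!   // in practice a specific discrete / eGPU device

// Texture cache for wrapping decoded CVPixelBuffers as Metal textures on that device
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Core Image context bound to the same device for preprocessing
let ciContext = CIContext(mtlDevice: device)

// Ask CoreML / Vision to prefer the same device for inference
let config = MLModelConfiguration()
config.preferredMetalDevice = device
let mlModel = try MLModel(contentsOf: compiledModelURL, configuration: config)
let visionModel = try VNCoreMLModel(for: mlModel)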

Apple has a Pro Apps WWDC video suggesting this is the path forward for fast multi-GPU support and Afterburner decoder support.

Does CoreML ACTUALLY allow suggested device placement to work?

I am running a 2018 Retina MacBook Pro with a Vega 20 GPU, and have tried various configurations to get the Vega 20 to light up:

  • Disabling automatic graphics switching

  • Disabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to False

  • Disabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to True

  • Enabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to False

  • Enabling automatic graphics switching / setting NSSupportsAutomaticGraphicsSwitching to True

  • Having a full battery and being plugged into my Apple power adapter

  • Having a full battery and being plugged into my eGPU

Results:

  • I can reliably get the eGPU to run inference on my MLModel if I request it via MLModelConfiguration's preferredMetalDevice - every time.

  • I can fairly reliably get the integrated GPU to run inference if I request it - but occasionally, with some combinations of battery level, power adapter and automatic graphics switching options, it doesn't run there.

  • I cannot get the discrete GPU to run inference consistently under any of the above combinations - even though I can see that all of my resources (textures etc.) are resident on that GPU and that CoreML is configured to run there. It just doesn't report any activity.

I have configured my Info.plist for proper eGPU support: I can hot-plug, detect device changes, dispatch work to eGPUs, and handle device removal requests. That all works. What doesn't work is CoreML respecting my device placement!

1 Answer

There is no public document clearly stating CoreML's GPU device-placement strategy. Note that you are really asking several questions in one post - ideally each post should focus on a single question - but I will do my best to answer them.

You can “force” it to run on the CPU only:

let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

Or CPU and GPU:

config.computeUnits = .cpuAndGPU

Or all available compute units, which includes the Neural Engine if one is available and if the MLModel's layers support it:

config.computeUnits = .all

When there are multiple Metal devices, you can choose which one to use. Apple's example code shows how to choose between the highest-powered Metal device, external GPUs, or a GPU not driving a display.
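
A rough sketch of that kind of selection, continuing with the same config as above (the policy here is only an example; isRemovable, isLowPower and isHeadless are the relevant MTLDevice properties):

let allDevices = MTLCopyAllDevices()

let externalGPU = allDevices.first { $0.isRemovable }                     // an eGPU
let discreteGPU = allDevices.first { !$0.isLowPower && !$0.isRemovable }  // built-in discrete GPU
let headlessGPU = allDevices.first { $0.isHeadless }                      // a GPU not driving a display

// Pick whichever candidate suits your workload
config.preferredMetalDevice = externalGPU ?? discreteGPU ?? headlessGPU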

You can also choose to allow low-precision accumulation on the GPU:

config.allowLowPrecisionAccumulationOnGPU = true
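
Note that the configuration only takes effect for a model loaded with it - something like this, where compiledModelURL is a placeholder for your compiled .mlmodelc:

let model = try MLModel(contentsOf: compiledModelURL, configuration: config)
let visionModel = try VNCoreMLModel(for: model)   // if you run inference through Vision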