I'm trying to adapt this tutorial to use my own neural net and images. I can do that on my CPU, but what I cannot do, either with the unchanged tutorial or with my adaptation of it, is use my GPU. According to system information I have an "NVIDIA Quadro P2200", not that I need to specify this anywhere as far as I can tell. Instead, it seems all I need to do is replace:
LearningModelDeviceKind deviceKind = LearningModelDeviceKind::Default;
with:
LearningModelDeviceKind deviceKind = LearningModelDeviceKind::DirectX;
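(For context, a minimal sketch of how that device kind feeds into session creation — not taken from the tutorial, and "model.onnx" is a stand-in path:)

#include <winrt/Windows.AI.MachineLearning.h>
#include <winrt/Windows.Foundation.h>

using namespace winrt;
using namespace winrt::Windows::AI::MachineLearning;

LearningModelSession MakeSession()
{
    // Load the ONNX model from disk ("model.onnx" is a placeholder path).
    LearningModel model = LearningModel::LoadFromFilePath(L"model.onnx");

    // DirectX requests the GPU; Default lets Windows ML pick the device.
    LearningModelDeviceKind deviceKind = LearningModelDeviceKind::DirectX;
    LearningModelDevice device(deviceKind);

    return LearningModelSession(model, device);
}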
When I do this, I get an exception in:
auto results = session.Evaluate(binding, L"RunId");
After the second parameter is constructed, this drops into:
template <typename D>
WINRT_IMPL_AUTO(Windows::AI::MachineLearning::LearningModelEvaluationResult)
consume_Windows_AI_MachineLearning_ILearningModelSession<D>::Evaluate(
    Windows::AI::MachineLearning::LearningModelBinding const& bindings,
    param::hstring const& correlationId) const
{
    void* result{};
    check_hresult(WINRT_IMPL_SHIM(Windows::AI::MachineLearning::ILearningModelSession)
        ->Evaluate(*(void**)(&bindings), *(void**)(&correlationId), &result));
    return Windows::AI::MachineLearning::LearningModelEvaluationResult{ result, take_ownership_from_abi };
}
A winrt::hresult_error is thrown as soon as I step into the check_hresult(...) line, i.e. the underlying Evaluate call is returning a failing HRESULT. I think this means bindings is somehow invalid... but (a) I'm not sure about that, and (b) I have no idea what to do to make it valid. Help?
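(One way to see what is actually failing, rather than just that something failed, is to wrap the existing Evaluate call in a try/catch — a sketch, assuming the session and binding from the surrounding code; winrt::hresult_error carries both the HRESULT and a message that can point at the invalid binding or shape:)

#include <windows.h>
#include <winrt/Windows.AI.MachineLearning.h>

try
{
    auto results = session.Evaluate(binding, L"RunId");
}
catch (winrt::hresult_error const& e)
{
    // e.code() is the failing HRESULT; e.message() is the human-readable
    // description of what Windows ML objected to.
    OutputDebugStringW(e.message().c_str());
}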
EDIT: I can now get the MS sample working, but not my adaptation. When I view the MS sample .onnx file using Netron, the input and output nodes have reasonable names, and the tensor sizes reported are also reasonable. On the model I am trying to use, the input and output nodes both have ":0" as the last part of their name, and the tensor sizes have one "unknown" size, e.g. the input size is reported as "unk_123 x 3 x 224 x 224". Does either of these create an incompatibility? The network is supplied to me, so I'd like to understand whether either requires a change before asking for one...
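(Editorial sketch, not part of the original question: the ":0" suffix is the output-index convention TensorFlow uses in node names, and converters often preserve it — it is simply part of the name you bind to. The "unk_123" dimension is a free batch dimension, which you satisfy by supplying a concretely shaped tensor at bind time. Names and sizes below are stand-ins:)

#include <winrt/Windows.AI.MachineLearning.h>
#include <vector>

using namespace winrt;
using namespace winrt::Windows::AI::MachineLearning;

void BindInput(LearningModel const& model, LearningModelBinding& binding)
{
    // Take the exact input name from the model itself (it will include
    // any ":0" suffix), rather than hard-coding it.
    hstring inputName = model.InputFeatures().GetAt(0).Name();

    // The free ("unk_123") dimension is satisfied by giving a concrete
    // shape here: one image, 3 channels, 224 x 224 pixels.
    std::vector<float> pixels(1 * 3 * 224 * 224);   // preprocessed image data
    TensorFloat input = TensorFloat::CreateFromArray({ 1, 3, 224, 224 }, pixels);
    binding.Bind(inputName, input);
}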
It all works as intended now. Having tripped up several times trying to adapt Windows ML code to my requirements, my strong advice is: don't trust names or shapes carried over from sample code; check every input and output name and tensor shape your code assumes against what the model itself declares, either in code via the feature descriptors or visually in Netron.
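(A sketch of what that check can look like in code, using the same winrt headers as above; my understanding is that the descriptor reports free dimensions as -1:)

#include <winrt/Windows.AI.MachineLearning.h>
#include <winrt/Windows.Foundation.Collections.h>
#include <iostream>

using namespace winrt;
using namespace winrt::Windows::AI::MachineLearning;

void DumpOutputFeatures(LearningModel const& model)
{
    // Print every output feature's declared name and shape so they can be
    // compared against what the binding code assumes.
    for (auto const& feature : model.OutputFeatures())
    {
        if (auto tensor = feature.try_as<TensorFeatureDescriptor>())
        {
            std::wcout << feature.Name().c_str() << L":";
            for (int64_t dim : tensor.Shape())
            {
                std::wcout << L" " << dim;   // free dimensions typically show as -1
            }
            std::wcout << std::endl;
        }
    }
}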
For example, in response to the EDIT section: the issue was copied/pasted/edited code that changed the output shape from 1 x 1000 x 1 x 1 (as pasted) to 1 x 10 x 1 x 1 (as edited), when it needed to be 1 x 10. That was detected by following my own advice above :-)
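(For illustration, a sketch of the corrected output binding; "output:0" is a stand-in name, and the real one should be taken from the model, ":0" suffix and all:)

// For a 10-class model whose output Netron reports as 1 x 10, pre-allocate
// a matching tensor: 1 x 10, not the 1 x 10 x 1 x 1 left over from editing.
TensorFloat output = TensorFloat::Create({ 1, 10 });
binding.Bind(L"output:0", output);   // "output:0" is a stand-in name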
I can confirm that setting deviceKind = LearningModelDeviceKind::DirectX is what invokes the GPU, but you may not get any noticeable speed improvement from doing so.
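(If you want to check that for yourself, a crude timing sketch using the session and binding from the code above — run it once per device kind and compare; the first GPU evaluation typically includes one-off warm-up cost, so it is worth discarding:)

#include <chrono>
#include <iostream>

// Time a handful of evaluations; ignore the first run on the GPU.
for (int i = 0; i < 5; ++i)
{
    auto t0 = std::chrono::steady_clock::now();
    auto results = session.Evaluate(binding, L"RunId");
    auto t1 = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    std::wcout << L"run " << i << L": " << ms.count() << L" ms" << std::endl;
}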