I followed the instructions on Apple's website (https://developer.apple.com/metal/pytorch/), but when I ran its Python script to verify MPS support, it gave me back an error I do not understand. (It's too long to post in full; part of it is listed below.) I would like to use GPU acceleration for Stable Diffusion. My MacBook has a Radeon Pro 555 and runs macOS Ventura. Help please :(
Python 3.11.1 (v3.11.1:a7a450f84a, Dec 6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.backends.mps.is_available():
...     mps_device = torch.device("mps")
...     x = torch.ones(1, device=mps_device)
...     print (x)
... else:
...     print ("MPS device not found.")
...
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor.py", line 461, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 677, in _str
return _str_intern(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 597, in _str_intern
tensor_str = _tensor_str(self, indent)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 137, in __init__
nonzero_finite_vals = torch.masked_select(
^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to create indexing library, error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
constant IDX_DTYPE * offsets [[buffer(3)]], \
^ ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:160:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(8bit, idx64, char, INDEX_OP_TYPE, ulong3); \
^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>( \
^
<scratch space>:9:1: note: expanded from here
index_select
^
program_source:20:13: note: candidate template ignored: substitution failure [with T = char, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_select(
^
program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
constant IDX_DTYPE * offsets [[buffer(3)]], \
^ ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:162:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3); \
^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>( \
^
<scratch space>:17:1: note: expanded from here
index_select
^
....
...
program_source:248:13: note: candidate template ignored: substitution failure [with T = metal::_atomic<int, void>, E = int, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_put_accumulate_native_dtypes(
^
}
>>>
>>>
I downgraded Python from 3.12.1 to 3.11.1 and reinstalled the latest PyTorch nightly, but still no luck.
I can replicate this on recent nightly builds (notably 2.3.0.dev20240114). However, the latest stable release (Torch 2.1.2) works well. Try creating a new environment with the stable release of Torch. The Apple documentation for MPS acceleration with PyTorch recommends the nightly build because MPS support used to be more experimental.
Then try running your code again.
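To confirm which kind of build is active in the new environment, something like the following may help. It is only a rough sketch: the helper names `is_nightly_build` and `describe_install` are made up for illustration, and it assumes nightly wheels can be recognised by the `.dev` segment in the version string (as in 2.3.0.dev20240114).

```python
import importlib.util


def is_nightly_build(version: str) -> bool:
    # Hypothetical helper: assumes nightly builds carry a ".dev<date>"
    # segment in their version, e.g. "2.3.0.dev20240114".
    return ".dev" in version


def describe_install() -> str:
    # Hypothetical helper: summarises the torch install in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    kind = "nightly" if is_nightly_build(torch.__version__) else "stable"
    # torch.backends.mps may be missing on very old releases, so guard it.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return f"torch {torch.__version__} ({kind}), MPS available: {mps_ok}"


if __name__ == "__main__":
    print(describe_install())
```

If this reports a nightly build, the environment is probably still picking up the old install rather than the stable release.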
Update: This is confirmed as an issue on recent PyTorch nightly builds. See here and here.