Can I get the offset to the underlying object from a memoryview? (Why not?) Is there an alternative?

I'm implementing a parser for a binary (image) file format. memoryview seems to be an almost perfect solution for this: The parser class keeps the data in a bytes object and passes sliced memoryviews to code that parses substructures. Sometimes those substructures contain offsets relative to the entire file; this is handily supported by the underlying object being a field of the memoryview, so I can always get a memoryview to any file offset. All my memoryviews are contiguous.
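To make the setup concrete, here is a minimal sketch of the pattern described above. The `data` contents and the `view_at` helper are hypothetical stand-ins, not part of my actual parser; the only thing relied upon is that a sliced memoryview keeps its `.obj` attribute pointing at the original bytes object.

```python
# Stand-in for the file contents held by the parser (hypothetical data).
data = bytes(range(256)) * 4

# A sub-view handed to code that parses one substructure.
sub = memoryview(data)[12:100]

# Slicing keeps a reference to the original exporter.
assert sub.obj is data
assert sub.contiguous


def view_at(view: memoryview, offset: int, length: int) -> memoryview:
    """Hypothetical helper: a view at an absolute file offset, from any sub-view."""
    return memoryview(view.obj)[offset:offset + length]


# Follow a file-relative offset found while parsing `sub`.
elsewhere = view_at(sub, 200, 8)
```

This works in one direction only: from a view I can reach the whole file, but not the position of the view inside it.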

Now, I have a function that returns a bunch of these contiguous memoryviews which correspond to all the image data in the image file (as opposed to the metadata), and I would like to determine which ranges of the underlying bytes object—or, equivalently, the image file—they correspond to. In other words, I would like to extract from the memoryview information corresponding to "it is a view of underlying_bytes[12:100]".

It seems to me that a memoryview must internally hold enough information to compute the start and end offsets of the range it covers in the underlying object. However, I can find no method or attribute that exposes this information.

Is there such a method? Would such a method on memoryview be a bad or impossible idea for some reason? Is there a convenient alternative to memoryview that would allow me to do this?

Sami Liedes On

Not a full answer (so if someone has insight or a better solution, I intend to accept that answer), but here's a workaround using Cython. It would be nice to be able to do this without Cython.

from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE
import cython

@cython.binding(True)
def memoryview_slice(m: memoryview) -> slice:
    '''Gets a slice object that corresponds to the given memoryview's view on the underlying object.'''
    cdef Py_buffer view_buffer, underlying_buffer
    PyObject_GetBuffer(m, &view_buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
    try:
        view_ptr = <const char *>view_buffer.buf
        PyObject_GetBuffer(m.obj, &underlying_buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
        try:
            underlying_ptr = <const char *>underlying_buffer.buf
            if view_ptr < underlying_ptr:
                raise RuntimeError("Weird: view_ptr < underlying_ptr")
            start = view_ptr - underlying_ptr
            # view_buffer.len is the size in bytes; len(m) would count items instead.
            return slice(start, start + view_buffer.len)
        finally:
            PyBuffer_Release(&underlying_buffer)
    finally:
        PyBuffer_Release(&view_buffer)

Usage:

>>> a = b'1'*1024
>>> memoryview_slice(memoryview(a))
slice(0, 1024, None)
>>> memoryview_slice(memoryview(a)[10:123])
slice(10, 123, None)
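For what it's worth, the same pointer arithmetic can probably be done without Cython by calling the buffer C API through ctypes. The following is only a sketch and is CPython-specific: the `PyBufferStruct` class is my own assumed mirror of the `Py_buffer` struct layout, and the names `_address_and_size` and `memoryview_slice_ctypes` are hypothetical. It also assumes contiguous views, as in the question.

```python
import ctypes

PyBUF_SIMPLE = 0  # value of the C macro PyBUF_SIMPLE


class PyBufferStruct(ctypes.Structure):
    # Assumption: this mirrors the field order of CPython 3's Py_buffer struct.
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.c_void_p),
        ("strides", ctypes.c_void_p),
        ("suboffsets", ctypes.c_void_p),
        ("internal", ctypes.c_void_p),
    ]


_get_buffer = ctypes.pythonapi.PyObject_GetBuffer
_get_buffer.argtypes = [ctypes.py_object, ctypes.POINTER(PyBufferStruct), ctypes.c_int]
_get_buffer.restype = ctypes.c_int

_release_buffer = ctypes.pythonapi.PyBuffer_Release
_release_buffer.argtypes = [ctypes.POINTER(PyBufferStruct)]
_release_buffer.restype = None


def _address_and_size(exporter):
    """Return (start address, size in bytes) of an object's contiguous buffer."""
    buf = PyBufferStruct()
    # ctypes.pythonapi re-raises any Python exception set by the call.
    _get_buffer(exporter, ctypes.byref(buf), PyBUF_SIMPLE)
    try:
        return (buf.buf or 0), buf.len
    finally:
        _release_buffer(ctypes.byref(buf))


def memoryview_slice_ctypes(m: memoryview) -> slice:
    """Byte range of m.obj that the contiguous memoryview m covers."""
    view_addr, view_size = _address_and_size(m)
    base_addr, base_size = _address_and_size(m.obj)
    start = view_addr - base_addr
    if start < 0 or start + view_size > base_size:
        raise RuntimeError("view does not lie within the underlying buffer")
    return slice(start, start + view_size)
```

Called as `memoryview_slice_ctypes(memoryview(a)[10:123])`, this is intended to return the same slice as the Cython version. The returned slice is in bytes, so for views cast to a wider item type it describes the byte range rather than item indices.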