Cython and regex.h


I am relatively new to Cython, so apologies if this question seems very basic.

I have a parallelizable block of regex matching that I'd like to run in Cython with nogil. To avoid using Python objects, my plan is to use the POSIX regex functions declared in regex.h.

The following import segment compiles:

cdef extern from "regex.h" nogil:
   ctypedef struct regoff_t
   ctypedef struct regex_t
   ctypedef struct regmatch_t
   int regcomp(regex_t* preg, const char* regex, int cflags)
   int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)

def matchPatterns(str pageContent, str regex):
   cdef set matchingPatterns = set()
   return matchingPatterns

But as soon as I use regex_t or any of the functions that take it, I get this error:

contentMatchPatternCython.pyx:10:16: Variable type 'regex_t' is incomplete

If I remove the empty ctypedefs, the code does not compile because regex_t is undefined. I'm hoping there is a way forward that doesn't require duplicating the struct definition in Cython.

I'm using Python 2.7.2 and Cython 0.22. Any pointers would be received with gratitude.


Best answer

http://docs.cython.org/src/userguide/external_C_code.html

To quote the documentation directly:

If the header file declares a big struct and you only want to use a few members, you only need to declare the members you’re interested in. Leaving the rest out doesn’t do any harm, because the C compiler will use the full definition from the header file.

In some cases, you might not need any of the struct’s members, in which case you can just put pass in the body of the struct declaration, e.g.:

cdef extern from "foo.h":
    struct A:
        pass
    # or (I've added this bit - not in the documentation directly...)
    ctypedef struct B:
        pass

Which of these to use depends on whether the C header declares the type as struct A {...} or as typedef struct {...} B.
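Applied to the question, a minimal sketch might look like the following. It assumes a POSIX C library that provides regex.h (for example glibc or macOS), Python 3, and a reasonably recent Cython. Because regex.h code uses regex_t and regmatch_t as typedef names (no struct keyword), the ctypedef form applies: regex_t gets a pass body so Cython treats it as a complete type, and regmatch_t declares only the two POSIX members the code reads, rm_so and rm_eo. Note also that regoff_t is a signed integer type in POSIX, not a struct, so it is declared here as an integer. The function name match_patterns and its behaviour (collect the distinct non-empty matches using extended regex syntax) are hypothetical illustration choices, not the asker's code. Offsets are byte offsets, so non-ASCII input could split a multi-byte character.

```cython
# cython: language_level=3
# Sketch only: assumes a POSIX libc that provides regex.h (e.g. glibc, macOS).

cdef extern from "regex.h" nogil:
    # Opaque to Cython: "pass" makes the type complete, so regex_t variables
    # can be declared; the C compiler uses the real layout from the header.
    ctypedef struct regex_t:
        pass

    # POSIX defines regoff_t as a signed integer type (not a struct). Its
    # width varies by platform; the generated C code uses the header's name.
    ctypedef long regoff_t

    # Only the members this code reads need to be declared.
    ctypedef struct regmatch_t:
        regoff_t rm_so
        regoff_t rm_eo

    enum:
        REG_EXTENDED
        REG_NOTBOL
        REG_NOMATCH

    int regcomp(regex_t* preg, const char* regex, int cflags)
    int regexec(const regex_t* preg, const char* string, size_t nmatch,
                regmatch_t* pmatch, int eflags)
    void regfree(regex_t* preg)


def match_patterns(str page_content, str regex):
    """Hypothetical helper: return the distinct non-empty matches of the
    POSIX extended regex `regex` in `page_content`."""
    cdef bytes content_b = page_content.encode("utf-8")
    cdef bytes regex_b = regex.encode("utf-8")
    cdef const char* pattern = regex_b
    cdef const char* cursor = content_b
    cdef regex_t compiled
    cdef regmatch_t m
    cdef int rc
    cdef int eflags = 0
    cdef set found = set()

    with nogil:
        rc = regcomp(&compiled, pattern, REG_EXTENDED)
    if rc != 0:
        raise ValueError("regcomp failed with error code %d" % rc)
    try:
        while True:
            # Only the C call itself runs without the GIL.
            with nogil:
                rc = regexec(&compiled, cursor, 1, &m, eflags)
            if rc == REG_NOMATCH:
                break
            if rc != 0:
                raise RuntimeError("regexec failed with error code %d" % rc)
            # rm_so/rm_eo are byte offsets relative to cursor.
            if m.rm_eo == m.rm_so:
                # Empty match: stop at the end of the string, else skip a byte.
                if cursor[m.rm_eo] == 0:
                    break
                cursor += m.rm_eo + 1
            else:
                found.add(cursor[m.rm_so:m.rm_eo].decode("utf-8"))
                cursor += m.rm_eo
            # Later searches do not start at the beginning of a line.
            eflags = REG_NOTBOL
    finally:
        # Release the resources regcomp allocated.
        regfree(&compiled)
    return found
```

For example, match_patterns("a1 b22 c333", "[0-9]+") should return {"1", "22", "333"}. In this sketch only the regcomp and regexec calls release the GIL; for a prange-style parallel loop, one option would be to move the whole matching loop into a cdef function declared nogil and build the Python set only afterwards.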