The correct way to search for a substring in a string

The main part of my question is: how do I handle the case where a string loaded into an __m128i contains only part of a substring?

The requirement: search for escape sequences and for the unescaped '"' (double quote) at the same time.

Examples (there is no double quote at the start):

  1. some string ending with"
  2. some string \"with nested\" string"
  3. some string with\\n escaped new line"
  4. some string with\\uDABF hex"

The simple case is \\n or \\r. There I just look for the \\ char using _mm_cmpestri() and, if the result points to the last char, right-shift the register by one byte using _mm_bsrli_si128() and insert one more char at the last position using _mm_insert_epi8(), so that the escape can be checked for validity.

But there is a more complicated case, where a \\uDABF-like sequence has to be validated.
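To make the boundary problem concrete, here is a minimal sketch of one possible way around it, not something I have settled on: use the vector search only to locate the next special byte, then validate the escape with ordinary byte reads on the full buffer, so the escape is never split at all. All names (next_special, escape_len, find_closing_quote) are hypothetical, next_special is a plain scalar stand-in for the SIMD search, and the accepted escape set (JSON-style) is an assumption.

```c
#include <stddef.h>

/* Scalar stand-in for the SIMD search (assumption): offset of the first
   '\\' or '"' in p[0..len), or len if there is none. */
static size_t next_special(const char *p, size_t len)
{
    size_t i = 0;
    while (i < len && p[i] != '\\' && p[i] != '"')
        i++;
    return i;
}

static int is_hex(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* p points at a backslash. Returns the length of the escape (2 or 6),
   or 0 if it is unknown or truncated. Reads only inside [p, end). */
static size_t escape_len(const char *p, const char *end)
{
    if (end - p < 2)
        return 0;
    switch (p[1]) {
    case '"': case '\\': case '/': case 'b':
    case 'f': case 'n': case 'r': case 't':
        return 2;
    case 'u':
        if (end - p < 6)
            return 0;
        for (int i = 2; i < 6; i++)
            if (!is_hex(p[i]))
                return 0;
        return 6;
    default:
        return 0;
    }
}

/* Returns the offset of the closing unescaped '"', or -1 if the input
   ends first or contains an invalid escape. */
static ptrdiff_t find_closing_quote(const char *s, size_t len)
{
    const char *end = s + len;
    size_t pos = 0;
    while (pos < len) {
        pos += next_special(s + pos, len - pos);
        if (pos == len)
            break;
        if (s[pos] == '"')
            return (ptrdiff_t)pos;
        size_t n = escape_len(s + pos, end);
        if (n == 0)
            return -1;
        pos += n;   /* resume the search right after the escape */
    }
    return -1;
}
```

With this shape the search simply restarts at the byte after the escape, so it does not matter where the \\uDABF-like sequence falls relative to a 16-byte block.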

Unlike strchr()/strstr(), _mm_cmpestri() with _SIDD_CMP_EQUAL_ANY allows searching for {'\\', '"'} in a single call, because the '\\' has a higher priority than \".
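For reference, a minimal sketch of that single-call search. The function name find_special is hypothetical, the target attribute assumes GCC or Clang on x86, and the tail is copied into a zeroed buffer only so that the load never reads past the end of the input.

```c
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>   /* SSE4.2: _mm_cmpestri and the _SIDD_* flags */

/* Returns the offset of the first '\\' or '"' in buf[0..len),
   or len if neither occurs. */
static __attribute__((target("sse4.2")))
size_t find_special(const char *buf, size_t len)
{
    /* Only the first two bytes are used as the character set. */
    const __m128i needles = _mm_setr_epi8('\\', '"', 0, 0, 0, 0, 0, 0,
                                          0, 0, 0, 0, 0, 0, 0, 0);
    size_t pos = 0;

    while (pos < len) {
        size_t chunk = len - pos < 16 ? len - pos : 16;
        char tmp[16] = {0};
        memcpy(tmp, buf + pos, chunk);          /* safe partial load */
        __m128i hay = _mm_loadu_si128((const __m128i *)tmp);

        /* Index of the first byte of hay that equals any needle;
           16 when nothing inside the first chunk bytes matches. */
        int idx = _mm_cmpestri(needles, 2, hay, (int)chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < (int)chunk)
            return pos + (size_t)idx;
        pos += chunk;
    }
    return len;
}
```

The call reports whichever of the two characters comes first in the block, so the caller still has to look at buf[result] to tell a backslash from a quote.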

At the moment it seems to me that my approach to solving this problem with SIMD is completely illogical.

Perhaps there are some common practices for solving such problems with SIMD?
