Need assistance in converting osquery queries from regexp_like() to regex_match()

112 Views Asked by At

I'm currently working with osquery and need some help in converting several queries that use regexp_like() to the equivalent regex_match() syntax. I have over 500 queries, and it's not practical to manually update each one. I am specifically working with ATC tables and need to use these queries in packs.

As an example, I have a query like this:

SELECT DISTINCT registry_key, act_timestamp FROM testregistry WHERE regexp_like (registry_key, '(?i)\\REGISTRY\\.*\\Services\\SharedAccess\\Parameters\\FirewallPolicy\\StandardProfile') AND lower(reg_val) = lower('EnableFirewall') AND lower(reg_val_data) = lower('0') AND DATE_STR_FILTER

I attempted to convert it by replacing regexp_like() with regex_match() and adding 0 as the index for the full match:

SELECT DISTINCT registry_key, act_timestamp FROM testregistry WHERE regex_match (registry_key, '(?i)\\REGISTRY\\.*\\Services\\SharedAccess\\Parameters\\FirewallPolicy\\StandardProfile', 0) AND lower(reg_val) = lower('EnableFirewall') AND lower(reg_val_data) = lower('0') AND DATE_STR_FILTER

However, this results in an "Invalid syntax" error.

Could someone provide guidance on the correct way to convert these queries? I appreciate any assistance or insights you can offer.

1

There are 1 best solutions below

1
Patrick Janser On

I never heard from osquery before, so I tried it out. It's a nice discovery! Thanks!

The function regex_match() is built to return a value. So if you want to use it as a condition, you have to compare it with something. Example:

SELECT * FROM services WHERE REGEX_MATCH(name, "^[Aa]", 0) IS NOT NULL;

This will return all services starting with the character A or a.

Another example to show that you can extract some data from a field and use it in your output:

SELECT
  name,
  REGEX_MATCH(name, "^.*?(\d+).*$", 1) AS `number_in_name`
FROM
  services
WHERE
  REGEX_MATCH(name, "^[aAbB]", 0) IS NOT NULL;

This will list all services starting with A or B (case insensitive) and extract the number in the service name and display it in a new field called number_in_name:

+------------------------------+----------------+
| name                         | number_in_name |
+------------------------------+----------------+
| AJRouter                     |                |
| ALG                          |                |
| AppIDSvc                     |                |
| Appinfo                      |                |
| AppMgmt                      |                |
| AppReadiness                 |                |
| AppVClient                   |                |
| AppXSvc                      |                |
| AssignedAccessManagerSvc     |                |
| AudioEndpointBuilder         |                |
| Audiosrv                     |                |
| autotimesvc                  |                |
| AxInstSV                     |                |
| BDESVC                       |                |
| BFE                          |                |
| BITS                         |                |
| Bonjour Service              |                |
| BrokerInfrastructure         |                |
| Browser                      |                |
| BTAGService                  |                |
| BthAvctpSvc                  |                |
| bthserv                      |                |
| AarSvc_23283ec               | 23283          |
| BcastDVRUserService_23283ec  | 23283          |
| BluetoothUserService_23283ec | 23283          |
+------------------------------+----------------+

Regex pattern to convert your SQL queries

The regex pattern, with the x (extended) and i (case-insensitive) modifiers:

/
regexp_like\s*\(
  (?<spacing_begin>\s*)        # Capture the spacing as it could contain indentation.
  (?<field>[\w\.]+|`[\w\.]+`), # The DB field, with the backticks notation also.
  (?<spacing_middle>\s*)       # Capture the indentation.
  (?<pattern>                  # Capture the regex pattern:
    (?<quote>['"])             # Strings can start with single or double quotes.
    (?:\k<quote>{2}|.*?)       # Escaped quote or anything else, in an ungreedy way.
    \k<quote>                  # The closing single or double quote captured before.
  )
  (?<spacing_end>\s*)          # Capture the last indentation.
\)
/gxi

Replacement:

REGEX_MATCH(${spacing_begin}${field},${spacing_middle}${pattern},${spacing_middle}0${spacing_end}) IS NOT NULL

Test it live here: https://regex101.com/r/3hMryB/3

I'm using PHP's PCRE engine, as it has the x modifier that helps you write regex patterns on multiple lines, with spaces, indentation and optional comments.

I used named groups to capture the pieces from your SQL query as they are more readable than numbers in the replacement string.

It can get tricky to capture strings correctly because they can be single or double quoted (ex: 'hello' or "world"). And we also have to handle quotes inside of them. I see that in SQL you have to double them and not escape them with a backslash (ex: "Say ""Hello!""").

Be also aware that the (?i) modifier to make the pattern case-insensitive doesn't seem to work with REGEX_MATCH(). So this means that we have to rewrite some of the patterns. This could be done with some JS or PHP, by using a callback function during the replacement step. Detecting if (?i) is in your pattern and in this case convert all letters to a range with the lowercase and uppercase declinations (z -> [zZ], é -> [éÉ], etc).

I wrote all that in JavaScript and you can run it directly here:

/**
 * This function is used to call a specific function callback only once
 * at the time. This is to avoid multiple events such as 'input' to run
 * multiple times and lead to a non-responding browser.
 *
 * @param {function} callback The function to call.
 */
function onlyOnceAtATime(callback) {
  // Keep track of all the running callbacks.
  let runningCallbacks = new Map();

  return function (...args) {
    if (runningCallbacks.has(callback.name)) {
      return;
    }

    runningCallbacks.set(callback.name, true);
    callback(...args);
    runningCallbacks.delete(callback.name);
  };
}

/**
 * Converts a SQL query to replace regexp_like() by regex_match().
 *
 * @param {string} sql The SQL query to convert.
 * @return {string} The new SQL query.
 */
function convertRegexpLikeToRegexMatch(sql) {
  return sql.replace(
    /regexp_like\s*\((?<spacing_begin>\s*)(?<field>[\w\.]+|`[\w\.]+`),(?<spacing_middle>\s*)(?<pattern>(?<quote>['"])(?:\k<quote>{2}|.*?)\k<quote>)(?<spacing_end>\s*)\)/giu,
    (fullMatch, spacingBegin, field, spacingMiddle, pattern, quote, spacingEnd) => {
      const regexpCaseInsensitiveModifier = /\(\?i\)/;
      // The pattern has to potentially be replaced if it contains "(?i)".
      if (pattern.match(regexpCaseInsensitiveModifier)) {
        // 1. Remove this unsupported modifier.
        pattern = pattern.replace(regexpCaseInsensitiveModifier, '');
        // 2. Make all letters case-insensitive.
        pattern = convertToCaseInsensitive(pattern);
      }
      // Return the modified SQL.
      return 'REGEX_MATCH(' + spacingBegin + field + ',' + spacingMiddle +
             pattern + ',' + spacingMiddle + '0' + spacingEnd + ') IS NOT NULL';
    }
  );
}

/**
 * Converts a regex pattern to case-insensitive.
 *
 * @param {string} input The regex pattern to modify.
 * @return {string} The modified regex pattern.
 */
function convertToCaseInsensitive(input) {
  return input.replace(
    // Pattern to find indented code blocks (not in fences).
    /\p{L}/gu,
    // Replacement callback function:
    (letter) => {
      return '[' + letter.toLowerCase() + letter.toUpperCase() + ']';
    }
  );
}

// When the document is ready, attach the event handlers.
document.addEventListener("DOMContentLoaded", function () {
  const input = document.getElementById("input");
  const output = document.getElementById("output");

  function updateOutput() {
    output.value = convertRegexpLikeToRegexMatch(input.value);
  }
  
  // Each change in the input textarea should update the output textarea.
  input.addEventListener('input', onlyOnceAtATime(updateOutput));
  
  // When clicking in the output textarea, select all the text.
  output.addEventListener('click', (event) => {
    const target = event.target;
    target.setSelectionRange(0, target.value.length);
  });
  
  // For the first load, calculate the output.
  updateOutput();
});
body {
  box-sizing: border-box;
}
body *, body *:before, body *:after {
  box-sizing: inherit;
}

textarea {
  width: 100%;
}
textarea[readonly] {
  background: #f8f8f8;
}
<h3>Converts <code>regexp_like()</code> to <code>regex_match()</code> and handles the missing <code>(?i)</code> modifier</h3>

<form action="#">
  <div>
    <textarea name="input" id="input"
              cols="80" rows="20"
              placeholder="Paste your SQL queries here">
SELECT DISTINCT registry_key, act_timestamp FROM testregistry WHERE regexp_like (registry_key, '(?i)\\REGISTRY\\.*\\Services\\SharedAccess\\Parameters\\FirewallPolicy\\StandardProfile') AND lower(reg_val) = lower('EnableFirewall') AND lower(reg_val_data) = lower('0') AND DATE_STR_FILTER;

-- Another example with indentation respected during replacement.
SELECT
  registry_key,
  act_timestamp
FROM
  testregistry
WHERE
  REGEXP_LIKE(
    registry_key,
    "^\\[rR][eE]"
  )
  AND LOWER(reg_val) = LOWER('EnableFirewall')
  AND LOWER(reg_val_data) = LOWER('0')
  AND DATE_STR_FILTER;</textarea>
  </div>
  <div>
    <p>Output:</p>
    <textarea name="output" id="output"
              cols="30" rows="20"
              readonly></textarea>
  </div>
</form>