Levenshtein distance against a list of words in BigQuery


Is there a way to compute the Levenshtein distance against a list of words in BigQuery?

I want to check whether a given word matches any word in a reference list. Why Levenshtein distance? Because I also want to catch similar words that were written with mistakes (for example, for the word "first" I also want to find "fiist", which is at distance = 1).

Currently, I use this code (from Medium):

create temp function
  LevenshteinDistance(in_a string,
    in_b string)
  returns int64
  language js as 
"""
var a = in_a.toLowerCase();
var b = in_b.toLowerCase();
  
if(a.length == 0) return b.length; 
if(b.length == 0) return a.length;
var matrix = [];
// increment along the first column of each row
var i;
for(i = 0; i <= b.length; i++){
  matrix[i] = [i];
}
// increment each column in the first row
var j;
for(j = 0; j <= a.length; j++){
  matrix[0][j] = j;
}
// Fill in the rest of the matrix
for(i = 1; i <= b.length; i++){
  for(j = 1; j <= a.length; j++){
    if(b.charAt(i-1) == a.charAt(j-1)){
      matrix[i][j] = matrix[i-1][j-1];
    } else {
      matrix[i][j] = 
        Math.min(matrix[i-1][j-1] + 1, // substitution
        Math.min(matrix[i][j-1] + 1, // insertion
        matrix[i-1][j] + 1)); // deletion
    }
  }
}
return matrix[b.length][a.length];
""";

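It works great when comparing against one word, but it does not work with multiple words: the function treats its second argument as a single literal string, so a pattern such as 'one|two|three' is compared character by character rather than as three alternatives.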

I want to do something like this:

LevenshteinDistance(message, r'one|two|three')
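One possible way to get this behaviour is to split the list inside the JavaScript body and keep the smallest distance. The following is only a minimal sketch, not a tested BigQuery solution: the helper name minLevenshteinDistance and the use of '|' as a separator are assumptions made for illustration, and levenshteinDistance repeats the same matrix approach as the UDF above.

```javascript
// Case-insensitive Levenshtein distance between two strings,
// using the same dynamic-programming matrix as the UDF in the question.
function levenshteinDistance(inA, inB) {
  var a = inA.toLowerCase();
  var b = inB.toLowerCase();
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  var matrix = [];
  var i, j;
  // First column: cost of deleting i characters of b
  for (i = 0; i <= b.length; i++) matrix[i] = [i];
  // First row: cost of inserting j characters of a
  for (j = 0; j <= a.length; j++) matrix[0][j] = j;
  for (i = 1; i <= b.length; i++) {
    for (j = 1; j <= a.length; j++) {
      var cost = b.charAt(i - 1) === a.charAt(j - 1) ? 0 : 1;
      matrix[i][j] = Math.min(
        matrix[i - 1][j - 1] + cost, // match or substitution
        matrix[i][j - 1] + 1,        // insertion
        matrix[i - 1][j] + 1         // deletion
      );
    }
  }
  return matrix[b.length][a.length];
}

// Hypothetical helper (not part of the original code): returns the smallest
// distance between `message` and any word in a '|'-separated list.
function minLevenshteinDistance(message, wordList) {
  var words = wordList.split('|');
  var best = Infinity;
  for (var k = 0; k < words.length; k++) {
    best = Math.min(best, levenshteinDistance(message, words[k]));
  }
  return best;
}
```

Under these assumptions, minLevenshteinDistance('fiist', 'first|second|third') would give 1, so a filter such as "distance <= 1" would accept the misspelled word. To use it in BigQuery, both functions would go inside the UDF body, with the body ending in a call like return minLevenshteinDistance(in_a, in_b);.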