Regex to pull multiword matches from list of terms

I am trying to modify the following regex (in JavaScript) from the glossarizer plugin to make it less lenient.

 var regex = new RegExp("(^s*|[^!])" + this.clean(term) + "\\s*|\\,$", "i");

It is trying to retrieve a definition for a term in a JSON array of terms and definitions.

[{term: "black cat", definition: "a black cat"},
{term: "cat", definition: "meow"}]

Right now it matches "black cat" when I pass in "cat", which I do not want. I want it to match a term that starts at the beginning of the string, or after a comma and optional whitespace (\s*), and that ends at either a comma or the end of the string, so that multiple definitions can be passed in.

Match for cat:
'cat'
' Cat '
'cat, feline'
'feline, cat  , cheetah'

Not a match for cat:
'black cat'
'Catapult'
'!cat'

I tried new RegExp("(^|^s*|[^!])" + this.clean(term) + "\\s*$|\\s*,", "i"), adding pipes so that the start would be either the start of the string or a comma and whitespace, and the end would be whitespace followed by either a comma or the end of the string. It didn't have the desired effect. I have been testing on regex101.com, but I am mostly confused.

There are 3 best solutions below

You can use this regex for matching your valid cases:

/(?:^|,) *\bcat(?= *(?:,|$))/gmi
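
Since the question builds the pattern from a variable term, here is a minimal sketch of the same pattern assembled with new RegExp. The helper names escapeRegExp, buildTermRegex and matchesTerm are hypothetical and are not part of the glossarizer plugin; the plugin's own this.clean(term) is not reproduced here.

```javascript
// Hypothetical helper: escapes regex metacharacters so the term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the list-item pattern from this answer for any term.
// The term must start at the beginning of the string or after a comma (plus spaces)
// and must be followed by optional spaces and then a comma or the end of the string.
function buildTermRegex(term) {
  return new RegExp("(?:^|,) *\\b" + escapeRegExp(term) + "(?= *(?:,|$))", "i");
}

// Hypothetical helper: true when `term` appears as a whole item in `list`.
function matchesTerm(list, term) {
  return buildTermRegex(term).test(list);
}

console.log(matchesTerm("feline, cat  , cheetah", "cat"));
console.log(matchesTerm("black cat", "cat"));
```

The sketch leaves out the g flag on purpose: a global regex keeps its lastIndex between test() calls, which can give surprising results if the same regex object is reused.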

You might be better off splitting the list on comma-space and checking the resulting items. Regexes (especially the JS flavor, which lacked lookbehinds until ES2018) are bad at parsing syntaxes like this.

const terms = "feline, cat, cheetah";
// Split on ", " and look for an exact (case-insensitive) item match.
if (terms.toLowerCase().split(", ").indexOf("cat") >= 0) {
  console.log("a cat was there!");
}
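
The snippet above splits on exactly ", ", so it would miss items with different spacing, such as ' Cat ' or 'cat  , cheetah' from the question. A minimal sketch that splits on the comma alone and trims each item is shown here; listHasTerm is a hypothetical name, not something from the plugin.

```javascript
// Hypothetical helper: true when `term` is one of the comma-separated items
// in `list`, ignoring case and surrounding whitespace.
function listHasTerm(list, term) {
  const wanted = term.trim().toLowerCase();
  return list
    .split(",")
    .map((item) => item.trim().toLowerCase())
    .includes(wanted);
}

console.log(listHasTerm("feline, cat  , cheetah", "cat"));
console.log(listHasTerm("black cat", "cat"));
```

Because each item is compared as a whole string, 'black cat', 'Catapult' and '!cat' are rejected without any regex.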

Apologies for adding this as an answer rather than a comment (I am new and do not yet have enough reputation to comment). This is about checking that the correct JSON is being used for the plugin, because your 'not a match' list confused me. For 'not a match', each rejected term needs a ! in front of it, so the JSON could be:

[
  {
    term: "black cat",
    definition: "a black cat"
  },
  {
    term: "!black cat, cat, !Catapult",
    definition: "meow"
  }
]

This would match the whole word cat, including with punctuation before or after it, as well as 'ginger cat', 'cat in the hat', 'my cat,' and so on, but not catastrophe or cats. If this is what you want, only a JSON change is needed. The regex in the previous answer could be closer to what you are looking for, though.
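
To make the ! convention concrete, here is a rough sketch of how a term string like the one above could be separated into terms to match and terms to reject. This is not taken from the plugin's source; splitTerms is a hypothetical helper and the plugin may handle this differently.

```javascript
// Hypothetical helper, not the plugin's code: splits a comma-separated term
// string into terms to match and terms to exclude (those prefixed with "!").
function splitTerms(termString) {
  const include = [];
  const exclude = [];
  for (const raw of termString.split(",")) {
    const item = raw.trim();
    if (item === "") continue; // skip empty items such as a trailing comma
    if (item.startsWith("!")) {
      exclude.push(item.slice(1).trim());
    } else {
      include.push(item);
    }
  }
  return { include, exclude };
}

console.log(splitTerms("!black cat, cat, !Catapult"));
```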

Since you posted your question, the developer has made a relevant update that might be useful:

Ignore ! in the words while getting description of terms https://github.com/PebbleRoad/glossarizer/blob/master/jquery.glossarize.js