How to find a substring anywhere in a string

8.8k Views Asked by At

This should be easy, but I'm finding it difficult.

I just want to find whether a substring exists anywhere in a string. In my case, whether the name of a website exists in the title of a product.

My code is like this:

#FindNoCase("Amazon.com", "Google Chromecast available at Amazon")#

The above returns a 0 which is correct because the entire substring "Amazon.com" doesn't exist in the main string. But some of it does, namely the "Amazon" part.

How could I achieve what I'm trying to do which is just see if ANY of the substring (at least more than 2 character in length) exists in the main string?

So I need something like FindOneOf() but actually "find at least three of". It should then look at the word "Amazon" in the product title and check if at least 3 characters in the sequence of "Amazon.com" exists. When it sees that "Ama" exists, then it just needs to return a true value. Can it be done using the existing built-in functions somehow?

Update: Very simple solution. I used Left("amazon", 3).

2

There are 2 best solutions below

0
On

Solved it by doing this

#FindNoCase(left("Amazon.com", 3), "Google Chromecast available at Amazon")#

Yes there is potential it won't do what I need in cases where the seller name less than 3 characters long. But I think its rare enough to be ok.

6
On

There's a lot of danger in false positives, like if someone was buying the Alabama state flag.

Because of store names that contain spaces, this is a little tricky (Wal Mart is often written with a space).

If your string always contains at [store], you can extract the store name by finding the last at in the sentence and creating a string by chopping off everything else.

Because it looks for occurrences of at only as a whole word, there's no danger with store names such as Beats Audio, or Sam's Meat Shop. I can't think of any any stores with the word at in the name. While that would technically trip it up, there's much lower risk, and you can do a pre-replace on such store names.

<cfset mystring = "Google Chromecast available at Amazon">
<cfset SellerName = REReplaceNoCase(mystring,".*\b(?:at)\b(?!.*\b(?:at)\b)\s*","")>
<cfoutput>Seller: #Sellername#</cfoutput>

You can then do your comparisons much more safely.


Per your comment, If you know all possible patterns, you can still obtain the data if you want to (false positives can either be embarrassing or catastrophic, depending on the action). If you know the stores you're working with, you can use a regex to pull out the string like this

<cfset mystring = "Google Chromecast available at Amazon.co.uk">
<cfset SellerName = REReplaceNoCase(mystring,".*\b((Google|Amazon|Wal[\W]*Mart|E[\W]*bay)(\.[a-z]+)*)\b","\1")>
<cfoutput>Seller: #Sellername#</cfoutput>

The only part you need to update is the pipe-delimited list You might add K-Mart as K[\W]*Mart the [\W]* permits any special character or space so it covers kMart, K-Mart, k*Mart, but not Kwik-E-Mart.


Update #2, per more comments

<cfset mystring = "Google Chromecast available at Toys-R-US">
<cfset SellerNameRE = REReplace(rsProduct.sellername,"[\W]+","[\W]*","ALL")>
<cfset TheSellerName = REReplaceNoCase(mystring,".*\b((#sellernameRE#)(\.[a-z]+)*)\b","\1")>
<cfoutput>Seller: #TheSellername# (#SellerNameRE#)</cfoutput>

This replaces any symbols with the wildcard character so that symbols aren't required so that if something says Wal*Mart, it will still match WalMart.

You could also load a seperate column with "Regex Names" so that you're not doing this each time.

So your table would look something like

SellerID    SellerName    RegexName
1           Wal-Mart      Wal[\W]*Mart
2           Toys-R-US     Toys[\W]*R[\W]*US

<cfset mystring = "Google Chromecast available at Toys-R-US">
<cfset TheSellerName = REReplaceNoCase(mystring,".*\b((#rsProduct.RegexName#)(\.[a-z]+)*)\b","\1")>
<cfoutput>Seller: #TheSellername# (#SellerNameRE#)</cfoutput>