Extract words from cell that are exactly 10 characters long and contain number and letter

130 Views Asked by At

I have a list of Lenovo products. I am looking for a way to extract their Product number from it.

The number may sit anywhere in the cell.
The number is 10 characters long, it always contains both numbers and letters and no special characters between them.
Sometimes the number is in brackets or has some special characters before or after (like - _ " < etc.).
I don't need them, I am looking for exactly 10 letters or digits and have at least 1 digit in it.
In the last line from the example there are two words with 10 characters - THINKCENTRE and 12Q6000AGE. THINKCENTRE doesn't contain a number, so it should be excluded.

Input Desired result
LEGION T5 (90SV003WGE) 90SV003WGE
FLEX 5 (82R700BEGE) XKLUSIV 82R700BEGE
V17-IRU 83A2001NGE 83A2001NGE
LENOVO E16 G1 21JT000HGE_WIN 11 21JT000HGE
LENOVO THINKCENTRE M70T 12Q6000AGE I7-12 12Q6000AGE
IDEAPAD 3 17ALC6 0 or empty

I tried different formulas but it gives me various results. I managed to find a way to extract the longest word, but very often this number isn't the longest and/or includes a lot of characters for cleanup.

3

There are 3 best solutions below

3
JvdV On BEST ANSWER

Starting assumptions:

  • Substrings of 10 characters only;
  • Substrings only contain characters [A-Z0-9]
  • Contains at least one digit [0-9];
  • Contains at least one character [A-Z].

DAF:

You could try to split your input on anything that isn't alphanumeric. That can be achieved with a double TEXTSPLIT() rather easy. Now you can check these substrings against your criteria using some boolean structure. For example:

enter image description here

Formula in B2:

=MAP(A2:A8,LAMBDA(s,LET(i,SEQUENCE(36)-1,r,TEXTSPLIT(s,TEXTSPLIT(s,BASE(i,36),,1)),TEXTJOIN(", ",,REPT(r,ISERR(-r)*(TEXTBEFORE(r&0,i)<>r)*(LEN(r)=10))))))

UDF:

You can also use an UDF applying some regular expression to extract the same as the above:

Public Function RegexMatch(s As String) As String

Static RE As Object: If RE Is Nothing Then Set RE = CreateObject("vbscript.regexp")

RE.Pattern = "(?:\b|_)(?!\d+\b|[A-Z]+\b)([A-Z\d]{10})(?:\b|_)"
RE.Global = True

Set REMatches = RE.Execute(s)
If REMatches.Count > 0 Then
    For Each Match In REMatches
        If RegexMatch = "" Then
            RegexMatch = Match.Submatches(0)
        Else
            RegexMatch = RegexMatch & ", " & Match.Submatches(0)
        End If
    Next
Else
    RegexMatch = vbNullString
End If

End Function

The pattern's explaination can be found here

You can invoke the UDF using: =RegexMatch(A2) or embedded in a MAP(A2:A8,LAMBDA(s,RegexMatch(s))) structure.

2
Harun24hr On

FILTERXML() can be utilized. For Excel-365 try-

=IFERROR(FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,TEXTSPLIT(A2,{"(",")","_"," "}))&"</s></t>","//s[translate(.,'1234567890','')!=.][string-length()=10]"),"")

For Excel-2013 and later.

=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"(",""),")",""),"_"," ")," ","</s><s>")&"</s></t>","//s[translate(.,'1234567890','')!=.][string-length()=10]"),"")

enter image description here

0
FunThomas On

I am almost sure that there is a clever regular expression for this, but I am not very good at RegEx so I am doing it with plain old VBA. The following function is rather straight forward. It takes a string and

  • replaces underscores by space (to handle 21JT000HGE_WIN)
  • uses Split to get the single words
  • loops over all words
  • Replace ( and )
  • Check for length = 10
  • Loop over all characters of a word and check if it contains at least a character and a digit.

You can use that function as UDF in your Excel sheet

Function getProductNumber(s As String) As String
    Dim tokens() As String, i As Long
    tokens = Split(Replace(s, "_", " "), " ")
    For i = 0 To UBound(tokens)
        Dim word As String, j As Long, char As String
        word = UCase(Replace(Replace(tokens(i), "(", ""), ")", ""))
        If Len(word) = 10 Then
            Dim digitFound As Boolean, charFound As Boolean
            digitFound = False
            charFound = False
            For j = 1 To Len(word)
                char = Mid(word, j, 1)
                If Asc(char) >= Asc("0") And Asc(char) <= Asc("9") Then
                    digitFound = True
                ElseIf Asc(char) >= Asc("A") And Asc(char) <= Asc("Z") Then
                    charFound = True
                End If
            Next
            
            If digitFound And charFound Then
                getProductNumber = word
                Exit Function
            End If
        End If
    Next i
End Function