I have a list of strings in cells (thousands of them) and I need to work out the number of characters in each word, listed word by word, preferably in one swift formula.
For example:
- "Black Cup With Handle" > Formula I need > 5,3,4,6
- "Giant Bear Statue" > Formula I need > 5,4,6
I need this for a recurring task that has been macro'd in a very inefficient way: it counts words into columns (we need to allow up to 20 of them, just in case), and this needs to be tackled.
Usually, we count the spaces and layer nested SEARCH() formulas on top of one another to break down the structure, then count the characters of the individual words.
Alternatively, I could use the macro to substitute the spaces with commas and use Text to Columns, but that still leaves me with a prolonged counting process for what I'm looking for.
We obviously use =LEN(A1)-LEN(SUBSTITUTE(A1," ","")) to count the spaces in the string. We then use the SEARCH() function combined with MID() functions (and some bizarre numbers) to put each word into its own cell, and then use LEN once again, but on all the individual words. Very long-winded.
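To show why the worksheet route above takes so many steps, here is the same sequence mirrored in Python: count the spaces, cut out each word between spaces (the SEARCH/MID part), then take the length of each piece. This is only an illustration outside Excel; the function name `word_lengths_stepwise` is hypothetical, and note that Python's `str.find` is 0-based and case-sensitive whereas SEARCH is 1-based and case-insensitive (which makes no difference when looking for a space).

```python
from typing import List


def word_lengths_stepwise(text: str) -> List[int]:
    """Hypothetical mirror of the multi-step worksheet approach."""
    # Step 1: same idea as =LEN(A1)-LEN(SUBSTITUTE(A1," ","")) -> number of spaces
    space_count = len(text) - len(text.replace(" ", ""))
    lengths = []
    start = 0  # 0-based index where the current word begins
    for _ in range(space_count):
        # Step 2: like SEARCH(" ", A1, start), find the next space
        end = text.find(" ", start)
        # Step 3: like LEN(MID(A1, start, end - start)), measure the word
        lengths.append(len(text[start:end]))
        start = end + 1
    # The last word runs from just after the final space to the end of the text
    lengths.append(len(text[start:]))
    return lengths


print(",".join(str(n) for n in word_lengths_stepwise("Black Cup With Handle")))  # 5,3,4,6
```

Every extra word needs another search-and-cut step, which is why the spreadsheet version grows to one helper column per word.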
I'm hoping to find a shorter way to do this, but I suspect there may not be a dynamic enough way to do it with formulas alone. Hoping someone can prove me wrong!
You'll have different options depending on your Excel version.
OPTION 1: TEXTJOIN
I think you are looking for the TEXTJOIN function. Just bear in mind that it is only available in more recent versions of Excel (see link to documentation). It could work like this:

Formula in B1:
To avoid needing the Ctrl+Shift+Enter key combination, we can include an INDEX:

Additional Information:
FILTERXML
This function takes (as per documentation) two required arguments:
- xml: a string in valid XML format
- xpath: a string in standard XPath format
Because we want to return an array of elements (words) from the cell, we need to SUBSTITUTE each space with an end-tag followed by a start-tag (</..><..>), put a start-tag (<..>) at the start of the string and another end-tag at the end. I'll have to rely on an XML explanation of the tags as to why <..><..> works and what it means, because as far as my testing goes, I could swap the letters around or replace them with another letter with the same results, as long as the final XPath refers to the same letter. It would be great if someone could complement this answer with a better explanation on this matter.

For more FILTERXML "tricks", have a look here.

TEXTJOIN
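If you are an Office 365 subscriber or own Excel 2019, you can make use of this function. There are (as per documentation) at least 3 required arguments:
- delimiter: the text placed between the joined items (it may be an empty string)
- ignore_empty: TRUE or FALSE, and determines whether you want to exclude or include empty values
- text1: the first item to join, which can be a text string or an array of strings such as a range

Now this is where we can join the two functions together: FILTERXML returns an array which we can use in TEXTJOIN.

INDEX + LEN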
I'll have to explain the use of these functions together. I don't think LEN and INDEX need much of an introduction on their own, but together they work quite nicely. Natively, a mechanism called implicit intersection will prevent LEN from returning an array of values when you pass it an array of values, in this case through our FILTERXML. Normally you would disable this mechanism using the key combination Ctrl+Shift+Enter, better known as CSE. What INDEX does is disable this implicit intersection, making LEN able to return an array and removing the need to CSE the formula. INDEX is one of the functions that has this "power". A more in-depth explanation of implicit intersection can be found here.

OPTION 2: UDF
Without access to TEXTJOIN, I think you'll need to look at using a UDF, possibly looking like below:

You can call this in B1 like so: =TEXTJOIN(A1)
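As a rough illustration of what such a UDF has to compute (split the cell on spaces, take the length of each piece, join the counts with commas), here is the same idea sketched in Python rather than VBA. The function name `word_lengths` is hypothetical, and unlike the UDF described below it measures each word directly with `len` instead of handing a formula to Application.Evaluate.

```python
def word_lengths(text: str, delimiter: str = ",") -> str:
    """Hypothetical Python counterpart of the UDF: "Black Cup" -> "5,3"."""
    # 1. Split the cell text on single spaces (VBA's Split also defaults to a space)
    words = text.split(" ")
    # 2. Measure each word, like applying LEN to every element
    counts = [str(len(word)) for word in words]
    # 3. Join the counts into one string, like VBA's Join with a comma delimiter
    return delimiter.join(counts)


for cell in ["Black Cup With Handle", "Giant Bear Statue"]:
    print(f"{cell} -> {word_lengths(cell)}")
# Black Cup With Handle -> 5,3,4,6
# Giant Bear Statue -> 5,4,6
```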
Additional Information:
The UDF consists of three main mechanisms that work together:
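JOIN
This function takes two parameters, where the first one is required:
- sourcearray: a one-dimensional array containing the substrings to be joined
- delimiter: (optional) the string used to separate the substrings; if omitted, a space is used
The function returns a string value.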
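SPLIT
This function takes a string and splits it into a zero-based, one-dimensional array of substrings, using a specified character/substring as the delimiter. It takes the following arguments:
- expression: (required) the string containing the substrings and delimiters
- delimiter: (optional) the character(s) marking where to split; if omitted, a space is used
- limit: (optional) the number of substrings to return; -1 returns all of them
- compare: (optional) the kind of comparison to use when evaluating substrings
In this case we would only need the first two arguments.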
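One practical side effect of splitting on a single space: two spaces in a row produce an empty element, which would then show up as a 0 in the output. Python's `str.split(" ")` behaves the same way for this case, so the sketch below shows the effect and one way around it. Whether your data actually contains stray double spaces is an assumption; collapsing them first mirrors what Excel's TRIM worksheet function does (VBA's own Trim only strips leading and trailing spaces). The helper names are hypothetical.

```python
from typing import List


def split_on_space(text: str) -> List[str]:
    # Like Split(text, " "): every single space is a cut, so "a  b" -> ["a", "", "b"]
    return text.split(" ")


def lengths(parts: List[str]) -> str:
    # Length of each piece, joined with commas
    return ",".join(str(len(part)) for part in parts)


raw = "Black  Cup With Handle"  # note the accidental double space
print(lengths(split_on_space(raw)))  # 5,0,3,4,6

# Collapse runs of spaces (and trim the ends) first, similar to Excel's TRIM
cleaned = " ".join(raw.split())
print(lengths(split_on_space(cleaned)))  # 5,3,4,6
```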
Application.Evaluate
This is IMO one of the handiest mechanisms you can use to get an array of values back without having to loop through items/cells. It may get slow when you feed it a large array formula, but in this case it will be fine. The function converts a Microsoft Excel name into an object or value, so when we pass it a formula, it returns that formula's result. In this particular case it will return an array.
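To see what kind of formula string a UDF like this could hand to Application.Evaluate, here is a Python sketch that turns a cell's text into an array-constant formula such as LEN({"Giant","Bear","Statue"}). The helper name `build_len_formula` and the exact formula shape are assumptions for illustration, not the original UDF. Excel should evaluate that formula to the array {5,4,6}, which Join can then turn into "5,4,6".

```python
def build_len_formula(text: str) -> str:
    """Hypothetical helper: "Giant Bear" -> 'LEN({"Giant","Bear"})'."""
    # Excel string literals escape a double quote by doubling it
    words = [word.replace('"', '""') for word in text.split(" ")]
    # Wrap every word in quotes and collect them in an array constant {...}
    array_constant = "{" + ",".join(f'"{word}"' for word in words) + "}"
    return f"LEN({array_constant})"


print(build_len_formula("Giant Bear Statue"))  # LEN({"Giant","Bear","Statue"})
print(build_len_formula('12" Plate'))          # LEN({"12""","Plate"})
```

Building one formula for all the words is what lets a single Evaluate call return every length at once instead of looping over the words.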