How to capture all caps words for textual analysis using excel?

122 Views Asked by At

I'm currently working with a large dataset based on people's reviews. The dataset includes 25 thousand individuals' reviews of different books, and each row on the text column is unique and has lots of sentences.

My issue is that I'd like to code for whether the text contains any ALL CAPS words. For instance, someone could say: "This book is AMAZING! I loved it!." Normally, excel allows you to directly identify when the cell is entirely written in all caps, but this situation is a bit more unique than that, given that I want to be able to see whether any word within the text is written in all caps. Any suggestions would be appreciated.

I googled possible solutions, and also ask ChatGPT3.5 and GPT4. Nothing that came out gave me an implementable suggestion (GPT was giving codes that were too long for excel to carry out, and the shorter ones it gave were plain wrong)

2

There are 2 best solutions below

4
On

Assuming you can put your data into a Table in Excel you can utilise PowerQuery to do this pretty easily, using some dummy data as shown below.

Select a cell in your table then go to Data => Get &Transform Data => From Table.Range

enter image description here

This will load up the Power Query editor and if you go to Add Column => Custom Column and enter the following it'll add a True/False column that indicates if there's any capital words for each row enter image description here

enter image description here

enter image description here

Then go to File => Close & Load To... enter image description here

and choose to load it to a Table on a new or existing sheet and it'll give you the data enter image description here enter image description here

The HasCaps code works by splitting the text column at every space " " and creating a list, it then checks if each item in that list is the same when you capitalise it, and then checks if there are any 'trues' in the list.

0
On

You can try the following, assuming your input data is in cell A1 and considering also additional conditions indicated in the comment section of the question: 1) Exclude words with length equal to 1, 2) Consider as word delimiter , too:

=LET(s,TEXTSPLIT(A1,{" ",","}), f, FILTER(s,(LEN(s) >1) 
 * EXACT(s, UPPER(s)),""), IF(@f="", FALSE, TRUE))

It can be encapsulated in a user LAMBDA function Z, so you can use it for array ranges.

=LET(A, A1:A4, Z, LAMBDA(x, LET(s,TEXTSPLIT(x,{" ",","}),
 f, FILTER(s,(LEN(s) >1) * EXACT(s, UPPER(s)),""), IF(@f="", FALSE, TRUE))),
 BYROW(A, LAMBDA(x, Z(x))))

Here is the output:

output

It identifies the word by or , delimiters, then uses the EXACT function to do case sensitive comparisons.

Keep in mind that if you have AMAZING01 it considers this case as valid too. If you want to exclude it, you need to define the list of allowed characters (lk) and check that only such characters can appear. For example for upper case letters exclamation sign and punctuation only, you can consider the following:

=LET(lk, VSTACK(CHAR(ROW(65:90)), "!",".", ";"), s,TEXTSPLIT(A1,{" ",","}), 
 f, FILTER(s,(LEN(s) >1) * EXACT(s, UPPER(s))
  * ISERR(TEXTSPLIT(s, lk,,1)),""), IF(@f="", FALSE, TRUE))

You can check here the list of ASCII characters and add the ones you consider in a similar way. You can use CHAR(33) instead of "!" for example. It makes sense to use the function in case you can find a range of valid consecutive values from the ASCII list of characters.

The expression: TEXTSPLIT(s, lk,,1) extracts from s name on each cell everything that is not a valid character and produce an error of type Empty Array (#CALC!) when all characters are valid and this the condition we are looking for.