Matlab categorical table variables: Speed? Use in join keys?

Question

Asked by user36800 on 29 June 2025 at 17:44 (286 views)

I've been dipping my toe into Matlab's categorical variable pool in the context of Matlab tables. Actually, I may have wandered into that territory in the past, but if so, only superficially.

These days, I want to use Matlab code patterns to do what I would normally do in MS Access, e.g., various types of joins and filtering. Much of my data is categorical, and I've read up on the advantages of using categorical variables in tables. However, those advantages mostly centre on descriptiveness (compared with enumerated types) and memory efficiency. I haven't found any mention of speed. Do categorical variables offer a speed advantage?
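To see where the memory savings come from, here is a small sketch that stores the same random labels three ways (a cell array of character vectors, a string array, and a categorical array) and compares their sizes with `whos`. The labels and the 1e5 sample size are arbitrary choices, not taken from the question. A categorical array holds one small integer code per element plus a single list of category names, which is why it tends to be much smaller; the exact byte counts `whos` reports may vary between MATLAB releases.

```matlab
% Arbitrary example labels and sample size (assumptions, not from the question)
labels = {'dog' 'cat' 'mouse' 'horse' 'rat'}';
rng(0)                                               % reproducible random data
keysC = labels( randi( numel(labels) , 1e5 , 1 ) );  % cell array of char vectors
keysS = string( keysC );                             % string array
keysK = categorical( keysC );                        % categorical: codes + category list

% whos reports the approximate in-memory size of each variable
infoC = whos('keysC');
infoS = whos('keysS');
infoK = whos('keysK');
fprintf( 'cellstr:     %d bytes\n' , infoC.bytes );
fprintf( 'string:      %d bytes\n' , infoS.bytes );
fprintf( 'categorical: %d bytes\n' , infoK.bytes );

% The category names are stored once; each element is just an index into them
disp( categories( keysK ) )
codes = double( keysK(1:5) )   % integer category indices of the first 5 elements
```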

I also wonder how advisable it is to use categorical variables as join keys. The categorical variables will live in different tables, so it's not clear to me how equality of values is established when such variables appear in the SQL ON clause (which Matlab exposes as the 'Keys' parameter of functions such as innerjoin).
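One way to probe how equality is established across tables is a tiny experiment. In the sketch below, the arrays `a`, `b` and the tables `tA`, `tB` are made up for illustration. The two categorical arrays are created independently, so their category lists differ and the same integer code stands for different labels. As far as I understand the `categorical` documentation, relational operators and join keys compare unordered categoricals by category name rather than by internal code, so mismatched category lists should not cause false matches. My understanding is that ordinal categoricals are stricter: comparing two of them requires the same categories in the same order.

```matlab
% Two categorical arrays built independently, so their category lists differ
a = categorical( {'dog' ; 'rat'} );   % categories: {'dog','rat'}
b = categorical( {'cat' ; 'rat'} );   % categories: {'cat','rat'}

% The integer codes coincide, but they stand for different labels
codesA = double( a )   % [1;2] -> 'dog','rat'
codesB = double( b )   % [1;2] -> 'cat','rat'

% Relational operators compare labels, not codes
sameFirst  = a(1) == b(1)   % 'dog' vs 'cat' -> false
sameSecond = a(2) == b(2)   % 'rat' vs 'rat' -> true

% innerjoin also matches categorical key values by label
tA  = table( a , [10;20] , 'VariableNames' , {'Animal' 'A_Val'} );
tB  = table( b , [30;40] , 'VariableNames' , {'Animal' 'B_Val'} );
tAB = innerjoin( tA , tB , 'Keys' , 'Animal' )   % one row: rat, 20, 40
```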

From the dearth of relevant Google hits, it almost seems like I'm in new territory, which to me is a scary thing. Without documented best practices, I'd have to rely on trial and error and reverse engineering, which takes more time than I can devote, so I'll sadly revert to using strings.

If anyone can point to online guidance information, I'd appreciate it.

**user36800** · Answer 1

A partial answer only...

The following test indicates that categorical data behaves sensibly when used as join keys:

```matlab
% Key values; SmallList omits the last two animals
BigList = {'dog' 'cat' 'mouse' 'horse' 'rat'}'
SmallList = BigList( 1 : end-2 )

Nrows = 20;
rng(0)   % fix the random seed so the tables are reproducible

% Create tables for innerjoin using cell arrays of character vectors as keys

tBig = table( ...
    (1:Nrows)' , ...
    BigList( randi( numel(BigList) , Nrows , 1 ) ) , ...
    'VariableNames' , {'B_ID' 'Animal'} )

tSmall = table( ...
    (1:Nrows)' , ...
    SmallList( randi( numel(SmallList) , Nrows , 1 ) ) , ...
    'VariableNames' , {'S_ID' 'Animal'} )

tBigSmall = innerjoin( tBig , tSmall , 'Keys' , 'Animal' );
tBig      = sortrows( tBig , {'Animal' 'B_ID'} );
tSmall    = sortrows( tSmall , {'Animal' 'S_ID'} );
tBigSmall = sortrows( tBigSmall , {'Animal' 'B_ID' 'S_ID'} );

% Now innerjoin the same tables using categorical keys

tcBig = tBig;
tcBig.cAnimal = categorical( tcBig.Animal );   % convert the key to categorical
tcBig.Animal  = [];                            % drop the original text key

tcSmall = tSmall;
tcSmall.cAnimal = categorical( tcSmall.Animal );
tcSmall.Animal  = [];

tcBigSmall = innerjoin( tcBig , tcSmall , 'Keys' , 'cAnimal' );
tcBig      = sortrows( tcBig , {'cAnimal' 'B_ID'} );
tcSmall    = sortrows( tcSmall , {'cAnimal' 'S_ID'} );
tcBigSmall = sortrows( tcBigSmall , {'cAnimal' 'B_ID' 'S_ID'} );

% Check that the two joins return the same rows. Comparing only the key
% column would miss mismatched B_ID/S_ID pairs, so compare the IDs too.
% Categories default to alphabetical order, so both results sort identically.
sameKeys = isequal( tBigSmall.Animal , cellstr( tcBigSmall.cAnimal ) );
sameIDs  = isequal( [tBigSmall.B_ID tBigSmall.S_ID] , ...
                    [tcBigSmall.B_ID tcBigSmall.S_ID] );

if sameKeys && sameIDs
    disp('categorical vs text key: inner joins MATCH.')
else
    disp('categorical vs text key: inner joins DO NOT MATCH.')
end % if
```

So the only remaining question is about speed. It is a general question, not specific to joins, so I'm not sure what a good test would be. There are many variables to consider, e.g., the number of table rows, the number of categories, and whether the operation is a join or a filter.
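A possible starting point for such a test, sketched under assumptions: the row count, the five labels, and the single-value filter are arbitrary choices, and only two of the operations mentioned are covered (building a filter mask, and an `innerjoin` of a large table against a small lookup table). It uses `timeit` to time the same operation with cellstr, string, and categorical keys. Results will depend on the MATLAB release, hardware, row count, and number of categories, so no particular outcome is implied; the idea is to rerun it with different `Nrows` and label counts.

```matlab
% Hypothetical benchmark parameters; vary them to match your own data
Nrows  = 1e5;
labels = {'dog' 'cat' 'mouse' 'horse' 'rat'}';
rng(0)                                                   % reproducible data
keysC = labels( randi( numel(labels) , Nrows , 1 ) );    % cell array of char vectors
keysS = string( keysC );                                 % string array
keysK = categorical( keysC );                            % categorical

% Filtering: build a logical mask selecting one value
tFiltC = timeit( @() strcmp( keysC , 'cat' ) );
tFiltS = timeit( @() keysS == "cat" );
tFiltK = timeit( @() keysK == 'cat' );

% Joining: a large table against a small lookup table, one key type at a time
tC = table( keysC , (1:Nrows)' , 'VariableNames' , {'Animal' 'ID'} );
tS = tC;  tS.Animal = keysS;   % same table, string key
tK = tC;  tK.Animal = keysK;   % same table, categorical key

lkC = table( labels , (1:numel(labels))' , 'VariableNames' , {'Animal' 'Code'} );
lkS = lkC;  lkS.Animal = string( labels );
lkK = lkC;  lkK.Animal = categorical( labels );

tJoinC = timeit( @() innerjoin( tC , lkC , 'Keys' , 'Animal' ) );
tJoinS = timeit( @() innerjoin( tS , lkS , 'Keys' , 'Animal' ) );
tJoinK = timeit( @() innerjoin( tK , lkK , 'Keys' , 'Animal' ) );

fprintf( 'filter: cellstr %.3g s, string %.3g s, categorical %.3g s\n' , tFiltC , tFiltS , tFiltK );
fprintf( 'join:   cellstr %.3g s, string %.3g s, categorical %.3g s\n' , tJoinC , tJoinS , tJoinK );
```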

In any case, I believe both questions deserve better documentation.
