Computing similarity matrix with mixed data

255 Views Asked by Martin Nemeth At 27 July 2025 at 13:01

I have also asked this question on the "Cross Validated" forum, but have had no answer so far, so I am trying here as well:

I would like to compute a similarity matrix (which I will then use for clustering) from my data (failure data from an automotive company). The data consist of these variables:

START DATE + TIME (dd/mm/yyyy hh/mm/ss), DURATION (in seconds), DAY OF THE WEEK (mon,tue,...), WORKING TEAM (1,2,3), LOCALIZATION (1,2,3,...,20), FAILURE TYPE

From this, it is clear that there are both continuous and categorical data. What method would you suggest to calculate similarities between failure types? I think I cannot use Euclidean distance or Gower's similarity. Thank you in advance.

There is 1 best solution below

Malcolm McLean On 07 January 2017 at 19:09

No, you need an ad hoc function that represents your knowledge of what the data mean in the real world. Presumably it will mainly apply a weight to each continuous difference and use a simple 2D matrix for each discrete categorical variable. But don't rule out censoring of extreme values or fuzzification.
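To make that concrete, here is a minimal Python sketch of such an ad hoc function. Everything specific in it is an assumption for illustration, not something from the question: the weights, the one-hour duration cap (a simple form of censoring), the team dissimilarity matrix, and the choice to reduce the start time to an hour of day compared on a 24-hour circle. In practice these would come from what you know about the plant. It also compares individual failure records, so similarities between failure types would still need an aggregation step on top.

```python
# Hypothetical weights for each variable; they sum to 1 so the
# combined distance stays (up to rounding) within [0, 1].
WEIGHTS = {"duration": 0.4, "hour": 0.2, "team": 0.2, "location": 0.2}

# Assumed censoring threshold: durations above one hour are treated as one hour.
DURATION_CAP = 3600.0

# Assumed 2D dissimilarity matrix for WORKING TEAM (teams 1..3).
# Entry [i][j] says how different team i+1 is from team j+1.
TEAM_DISSIM = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.5, 0.0],
]


def duration_dist(a, b):
    """Absolute difference of capped durations, scaled to [0, 1]."""
    a = min(a, DURATION_CAP)
    b = min(b, DURATION_CAP)
    return abs(a - b) / DURATION_CAP


def hour_dist(a, b):
    """Circular difference between two hours of day, scaled to [0, 1]."""
    d = abs(a - b) % 24
    return min(d, 24 - d) / 12


def team_dist(a, b):
    """Look up the team dissimilarity in the 2D matrix (teams are 1-based)."""
    return TEAM_DISSIM[a - 1][b - 1]


def location_dist(a, b):
    """Plain mismatch; a 20x20 matrix could replace this if layout is known."""
    return 0.0 if a == b else 1.0


def record_distance(r, s):
    """Weighted sum of the per-variable distances between two records."""
    return (
        WEIGHTS["duration"] * duration_dist(r["duration"], s["duration"])
        + WEIGHTS["hour"] * hour_dist(r["hour"], s["hour"])
        + WEIGHTS["team"] * team_dist(r["team"], s["team"])
        + WEIGHTS["location"] * location_dist(r["location"], s["location"])
    )


def similarity_matrix(records):
    """Pairwise similarity, defined here as 1 minus the weighted distance."""
    n = len(records)
    return [
        [1.0 - record_distance(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up example records, only to show the expected input shape.
records = [
    {"duration": 600, "hour": 23, "team": 1, "location": 5},
    {"duration": 600, "hour": 1, "team": 1, "location": 5},
    {"duration": 99999, "hour": 11, "team": 3, "location": 7},
]
matrix = similarity_matrix(records)
```

The resulting matrix is symmetric with ones on the diagonal, so `1 - similarity` can be passed to any clustering method that accepts a precomputed distance matrix. Day of the week could be handled the same way as the team, with a 7x7 matrix.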
