Aim: I want to extract clusters of farm management types among farms that share similar pre-conditions, such as environmental factors or housing systems. By analyzing the heterogeneity within these otherwise homogeneous groups, I hope to identify features (and differences between features) that are more relevant in practice.

Data: The dataset comprises 5k farms, each described by 200 features. These features provide information on location and environmental conditions, as well as on housing, feeding, and other management properties.

Methodology: In my current procedure, I combine a knowledge-based pre-filtering step with an unsupervised approach that pairs dimension reduction with clustering (e.g. UMAP + DBSCAN). I pre-filter on known key features, such as location, because otherwise the unsupervised clustering mainly tells me that location is one of the most relevant distinguishing features. My aim, however, is to investigate feature combinations rather than environmental conditions. I am not sure whether this is the best approach.
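To make the procedure concrete, here is a minimal sketch of what I mean, not my actual pipeline. The data are synthetic (two hypothetical regions, each containing two management types), and the names `cluster_within_stratum`, `eps`, `min_samples` and `n_components` are illustrative assumptions. To keep the example dependent on scikit-learn only, PCA stands in for UMAP; with the umap-learn package, `umap.UMAP(n_components=...)` could be used in its place.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the farm table (hypothetical, not the real dataset):
# 2 regions x 2 management types, 100 farms each, 20 management features.
n_per_group, n_features = 100, 20
regions, blocks = [], []
for region in ("north", "south"):
    for mgmt_shift in (0.0, 6.0):
        blocks.append(
            rng.normal(loc=mgmt_shift, scale=1.0, size=(n_per_group, n_features))
        )
        regions += [region] * n_per_group
X = np.vstack(blocks)
regions = np.array(regions)


def cluster_within_stratum(X_stratum, n_components=5, eps=2.0, min_samples=10):
    """Scale, reduce, and density-cluster the farms of one pre-filtered stratum."""
    X_scaled = StandardScaler().fit_transform(X_stratum)
    # PCA is a swapped-in substitute for UMAP in this sketch.
    embedding = PCA(n_components=n_components, random_state=0).fit_transform(X_scaled)
    # DBSCAN labels noise points as -1; eps and min_samples are illustrative.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)


# Knowledge-based pre-filtering: split on the known key feature (region),
# then run the unsupervised step separately inside each stratum.
labels_by_region = {
    str(region): cluster_within_stratum(X[regions == region])
    for region in np.unique(regions)
}

for region, labels in labels_by_region.items():
    n_clusters = len(set(labels.tolist()) - {-1})
    print(region, "clusters found:", n_clusters)
```

Because each stratum is scaled and embedded on its own, cluster labels are only comparable within a stratum, not across strata.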

MY QUESTION IS: Is it a good idea to combine knowledge-based pre-filtering with an unsupervised dimension reduction clustering approach?
