I'm looking for a supervised machine learning algorithm that would produce transparent rules or definitions that can be easily interpreted by a human.
Most algorithms that I work with (SVMs, random forests, PLS-DA) are not very transparent. That is, you can hardly summarize the models in a table in a publication aimed at a non-computer scientist audience. What authors usually do is, for example, publish a list of variables that are important based on some criterion (for example, Gini index or mean decrease of accuracy in the case of RF), and sometimes improve this list by indicating how these variables differ between the classes in question.
What I am looking is a relatively simple output of the style "if (any of the variables V1-V10 > median or any of the variables V11-V20 < 1st quartile) and variable V21-V30 > 3rd quartile, then class A".
Is there any such thing around?
Just to constraint my question a bit: I am working with highly multidimensional data sets (tens of thousands to hundreds of thousands of often colinear variables). So for example regression trees would not be a good idea (I think).
You sound like you are describing decision trees. Why would regression trees not be a good choice? Maybe not optimal, but they work, and those are the most directly interpretable models. Anything that works on continuous values works on ordinal values.
There's a tension between wanting an accurate classifier, and wanting a simple and explainable model. You could build a random decision forest model, and constrain it in several ways to make it more interpretable:
The model won't be as good, necessarily.