Finding duplicate expressions/parameters


I have the following structure:

Parameter -> Condition -> Rule

Let's say I need to create a business rule: Customer Age > 18.

I have two parameters, Customer Age (P1) and 18 (P2), where P1 is a field parameter (OGNL) and P2 is a constant parameter with the value 18.

So my condition is now Customer Age > 18, and so is my rule.

Problem statement: prevent users from creating duplicate parameters, conditions, and rules.

Solution: for constant parameters, field parameters, etc., I can check the DB and compare against what is already present.

Conditions are harder:

Customer Age > 18 and 18 < Customer Age are the same in business terms.

The cases can be more complex:

(a + b) * (c + d) is the same as (b + a) * (d + c)

I need to detect that such expressions are equivalent.

First approach: load all expressions from the DB (there can be tens of thousands) and compare them using a stack/tree structure, which would defeat my objective.

Second approach: I was thinking of building a powerful hash code generator, that is, something that produces one int value for every expression (taking operators and brackets into account). The value should be generated in such a way that equivalent expressions like the ones above get the same value.

That means a + b and b + a should generate the same int value, while a - b and b - a should generate different values.
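
To make the requirement concrete, here is a rough Java sketch of one way such a value could be computed: build a canonical string in which the operands of commutative operators are sorted, then hash that string. The `Node` class and the `leaf`, `op`, `canonical`, and `key` helpers are hypothetical names used only for illustration, and the expression tree is built by hand here rather than parsed.

```java
final class CanonicalKey {
    /** Minimal expression node: either a leaf (variable or constant) or a binary operation. */
    static final class Node {
        final String op;     // null for leaves
        final String value;  // leaf text, null for operations
        final Node left;
        final Node right;

        Node(String value) {
            this.op = null; this.value = value; this.left = null; this.right = null;
        }

        Node(String op, Node left, Node right) {
            this.op = op; this.value = null; this.left = left; this.right = right;
        }
    }

    static Node leaf(String value) { return new Node(value); }

    static Node op(String operator, Node left, Node right) { return new Node(operator, left, right); }

    // Assumption: only + and * are treated as commutative in this sketch.
    static boolean isCommutative(String operator) {
        return operator.equals("+") || operator.equals("*");
    }

    /** Returns a string in which operands of commutative operators appear in sorted order. */
    static String canonical(Node n) {
        if (n.op == null) {
            return n.value;
        }
        String l = canonical(n.left);
        String r = canonical(n.right);
        if (isCommutative(n.op) && l.compareTo(r) > 0) {
            String tmp = l; l = r; r = tmp;  // swap so that a + b and b + a look identical
        }
        return "(" + l + " " + n.op + " " + r + ")";
    }

    /** The int value: equal for a + b and b + a, because both share one canonical string. */
    static int key(Node n) {
        return canonical(n).hashCode();
    }
}
```

Two different expressions can still collide on the int value, so it would only work as a first filter; comparing (or uniquely indexing) the canonical string itself is what settles a match. This sketch also does not flatten associative chains, so a + (b + c) and (a + b) + c would still get different strings.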

There are 3 best solutions below

0
On

For a 100% safe solution you should analyze the expressions with a computer algebra system to see whether they are mathematically equal. But that's not so easy.

A pragmatic approach is to test whether two expressions are similar:

  • Check whether they have the same variables
  • Compare their outputs for a number of different inputs, see if the outputs are equal

You can store the variable list and the outputs for a predefined set of inputs as a "hash" for the expression. This hash does not guarantee that two expressions are equal, but you could present the expressions with the same hash to the user and ask whether the new rule is equal to one of these similar ones.
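
The idea could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name, `SAMPLE_COUNT`, `SEED`, the input range, and the rounding precision are all made up, and each expression is passed in as a plain lambda instead of being evaluated by a real expression engine.

```java
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

final class SampleFingerprint {
    static final int SAMPLE_COUNT = 5;  // assumed number of predefined input sets
    static final long SEED = 42L;       // fixed seed, so every expression sees the same inputs

    /** Combines the sorted variable list with the rounded outputs for the predefined inputs. */
    static String fingerprint(Set<String> variables,
                              ToDoubleFunction<Map<String, Double>> expression) {
        TreeSet<String> sorted = new TreeSet<>(variables);
        StringBuilder sb = new StringBuilder(sorted.toString());
        Random random = new Random(SEED);
        for (int i = 0; i < SAMPLE_COUNT; i++) {
            Map<String, Double> inputs = new TreeMap<>();
            for (String name : sorted) {
                // Values between -100.0 and 100.0 in steps of 0.1
                inputs.put(name, random.nextInt(2001) / 10.0 - 100.0);
            }
            double result = expression.applyAsDouble(inputs);
            // Round so that tiny floating-point differences do not change the fingerprint
            sb.append('|').append(Math.round(result * 1000.0));
        }
        return sb.toString();
    }
}
```

Expressions with the same variable set always receive the same sample inputs, because the variables are sorted and the generator is seeded identically on every call. A matching fingerprint is still only a hint, which is why the candidates should be shown to the user for confirmation.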

4
On

Maybe a simplified version of your first approach: what about loading only the relevant expressions, by looking for ones whose content is similar to what you are about to insert into the database?

If you know that you are about to insert Customer Age you can find all expressions containing this parameter and build the stack/tree based on this reduced set of expressions.
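
As a rough illustration of that filtering step, here is an in-memory Java sketch. In practice the lookup would more likely be a database query on a parameter-to-expression mapping table; the `ParameterIndex` class and its method names are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ParameterIndex {
    // Maps each parameter name to the IDs of the stored expressions that use it
    private final Map<String, Set<Long>> idsByParameter = new HashMap<>();

    void add(long expressionId, Collection<String> parameters) {
        for (String parameter : parameters) {
            idsByParameter.computeIfAbsent(parameter, k -> new HashSet<>()).add(expressionId);
        }
    }

    /** Returns the IDs of stored expressions that use every one of the given parameters. */
    Set<Long> candidates(Collection<String> parameters) {
        Set<Long> result = null;
        for (String parameter : parameters) {
            Set<Long> ids = idsByParameter.getOrDefault(parameter, Collections.<Long>emptySet());
            if (result == null) {
                result = new HashSet<>(ids);  // copy, so the index itself is never modified
            } else {
                result.retainAll(ids);        // keep only expressions that use all parameters
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }
}
```

Only the expressions returned by `candidates` then need to be turned into a stack/tree and compared with the new one.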

7
On

I think you cannot avoid writing an expression parser, building an AST of the expressions, and coding rewrite rules to detect expression equivalence.

It may not be as time consuming as you think.

For the parsing and AST building part, you can start from exp4j: http://www.objecthunter.net/exp4j/

For the rewrite rules, you can have a look at: Strategies for simplifying math expressions
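
To give an idea of what a single rewrite rule might look like, here is a small Java sketch that normalizes comparisons so that 18 < Customer Age and Customer Age > 18 end up in the same form. It does not use exp4j; the `Condition` class and the `normalize` method are hypothetical, and the operands are plain strings rather than AST nodes to keep the example short.

```java
final class ConditionNormalizer {
    /** A simple binary comparison such as: Customer Age > 18 */
    static final class Condition {
        final String left;
        final String op;
        final String right;

        Condition(String left, String op, String right) {
            this.left = left; this.op = op; this.right = right;
        }

        @Override
        public String toString() {
            return left + " " + op + " " + right;
        }
    }

    /** Rewrites a condition so that only >, >=, ==, and != remain, in a fixed operand order. */
    static Condition normalize(Condition c) {
        switch (c.op) {
            case "<":
                return new Condition(c.right, ">", c.left);   // x < y  becomes  y > x
            case "<=":
                return new Condition(c.right, ">=", c.left);  // x <= y becomes  y >= x
            case "==":
            case "!=":
                // Symmetric operators: put the operands in sorted order
                return c.left.compareTo(c.right) <= 0 ? c : new Condition(c.right, c.op, c.left);
            default:
                return c;  // > and >= are already in normal form
        }
    }
}
```

After normalization, two conditions can be compared by their string form. A fuller version would apply the same kind of rule recursively to the operand sub-expressions.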