How to Translate C# Expression to Custom Nested (recursive) Class structure?

365 Views Asked by At

I've custom Query Class which will be used to build query with help of lambda expression like below

var query = new Query("Person").Where<Person>(p =>
(p.Name == "Maulik" && (p.Id == 2 || p.Age == 30)) || (p.Id == 2 && p.Name == "Maulik"));

By Implementing ExpressionVisitor based on this reference link class I'm able to get translated query into string format as below

(((Name EqualTo "Maulik") AND ((Id EqualTo 2) OR (Age EqualTo 30))) OR ((Id EqualTo 2) AND (Name EqualTo 'Maulik')))

But, I want to translate above expression to my custom nested classes

public class Expression
{
    public List<Expression> Filters { get; } // Nested expression
    public Operator Operator { get; set; } //e.g AND/OR
    public List<Condition> Conditions { get; } // One Expression can have many conditions
}

public class Condition
{
    public string Name { get; set; }
    public Operator Operator { get; set; } //e.g <,> ==, != etc
    public object Value { get; set; }
}

Where Expression means: (A > 5 && A < 10) and Condition means inner most criteria A > 5

By Nested I mean, where one expression can have child expression, which can also have sub child expression and so on, and for each expression there can be multiple conditions.

Reason to convert into this kind of structure is to handle various SQL/NoSQL provider, from this kind of class structure will be creating separate queries for all different providers, so this class structure can't be changed entirely.

This class structure is created to maintain parentheses sequence based on AND and OR condition of query, so accordingly same level condition can be club with single expression.

I'm looking for any kind of generic mapping which can help here to translate expression to nested class structure.

What will be best way of converting expression to custom nested class structure? Please share if any implementation example available. I couldn't find any relevant example yet.

1

There are 1 best solutions below

2
On

I think that your classes do not reflect the structure of expressions right. Let me use EBNF to write down a possible syntax of such expressions (without claim of completeness):

Expression = Term { ("AND" | "OR") Term }.
Term = "(" Expression ")" | Comparison.
Comparison = name ("=" | "!=" | "<" ...) Value.
Value = stringConstant | number.

One important thing we see, is that a Term can be either an Expression or a Comparison. This means that we must be able to plug in the one or the other to both sides of an AND or OR. Example: Name = "Maulik" AND (Id = 2 OR Id = 3). To the left of the AND we have a comparison, to the right an expression.

Therefore, we must make these two things assignment compatible, e.g., by deriving them from the same base class.

public abstract class Term
{
}

public Expression : Term
{
    public Term FirstTerm { get; set; }
    public List<(BooleanOperator operator, Term term)> SucceedingTerms { get; } = new();
}

public Comparison : Term
{
    public string Name { get; set; }
    public ComparisonOperator Operator { get; set; }
    public object Value { get; set; }
}

Another important fact is that the syntax is recursive. I.e., an expression contains terms and terms can contain another expression. This is necessary to be able to represent parenthesized expressions with multiple levels of nesting.

This recursive nature of the syntax has two consequences:

  1. The class structure must allow recursive structures. This is given by the fact that the Expression class has a list containing Terms and those can be Expressions again.
  2. The parser shown later must be recursive. It contains indirect recursive calls (ParseExpression calls ParseTerm and ParseTerm calls ParseExpression).

Note that I am using ValueTulpes as list element consisting of an operator and a term.


Now, to the conversion itself. You need a compiler. Either

  • Build your own custom compiler by hand. Niklaus Wirth's book Compiler Construction (2 PDFs) is an introduction to the theory and the techniques of compiler construction. It gives you a general idea of what a compiler is and what it does.
  • Use a tool or library to generate a compiler: for example ANTLR.

Since the syntax is simple, you might dare to write the compiler yourself.

Divide the task into lexical analysis (done by a lexer) and syntax analysis (done by a parser). Besides analyzing the syntax, the parser also creates the required data structure as output. The lexer just returns a stream of symbols. E.g.

enum Symbol { EndOfInput, LeftPar, RightPar, AndOperator, OrOperator, Identifier, Number,
              StringLiteral, ...}
private string _input;
private int _currentPos;
private string _identifier;
private string _stringValue;
private decimal _number;
private Symbol _symbol;

/// Does the lexical analysis.
private void GetSymbol()
{
    // Get next `_symbol` (and associated variables) from `_input` at `_currentPos` and
    // update `_currentPos`.
}

Using this basic infrastructure, you can create the parser. The parser consists of methods closely reflecting the structure of the syntax (as given in EBNF above).

// Expression = Term { ("AND" | "OR") Term }.
void Expression ParseExpression()
{
    var expression = new Expression();
    expression.FirstTerm = ParseTerm();
    while (_symbol == Symbol.AndOperator || _symbol == Symbol.OrOperator) {
        var op = _symbol == Symbol.AndOperator
            ? BooleanOperator.And
            : BooleanOperator.Or;
        GetSymbol();
        term = ParseTerm();
        expression.SucceedingTerms.Add((op, term));
    }
    return expression;
}

// Term = "(" Expression ")" | Comparison.
void Term ParseTerm()
{
    Term term = null;
    if (Symbol == Symbol.LeftPar) {
        GetSymbol();
        term = ParseExpression();
        if (Symbol == Symbol.RightPar) {
            GetSymbol();
        } else {
            Error("\")\" expected.");
        }
    } else {
        term = ParseComparison();
    }
    return term;
}

This is not complete, but you get the idea. Since the structure of this parser depends on the syntax production (EBNF), do not start with it before you have nailed down the exact syntax.

Pay attention to always have decidable paths in the syntax given by the next symbol. First, I had it wrong. My term was given by Term = "(" Expression | Comparison ")" | Comparison.. The problem here is that an expression starts with a term and a term can be a comparison. The comparison starts with a name. Therefore, both Expression and Comparison can start with a name. When parsing Expression | Comparison and the next symbol is a name, we are unable to decide whether we must parse an expression or a comparison.

The updated syntax is Term = "(" Expression ")" | Comparison.. Now, we know that if the next symbol is a left parenthesis, we must parse an expression and otherwise a comparison.