How to distinguish tokens that have similar patterns in the lexer but occur in different contexts in the parser

Question

Asked by Max Osad on 29 December 2021 at 23:18 · 165 views

I have two very similar patterns in Lexer.x: the first for numbers, the second for bytes. Here they are:

$digit=0-9
$byte=[a-f0-9]


    $digit+                       { \s -> TNum  (readRational s) }
    $digit+.$digit+               { \s -> TNum  (readRational s) }
    $digit+.$digit+e$digit+       { \s -> TNum  (readRational s) }
    $digit+e$digit+               { \s -> TNum  (readRational s) }
    $byte$byte                        { \s -> TByte (encodeUtf8(pack s))     }

And this is my Parser.y:

%token

        cnst                            { TNum  $$}
        byte                            { TByte  $$}
        '['                            { TOSB     }    
        ']'                            { TCSB     }

%%

Expr: 
 '[' byte ']' {$1}
| cnst {$1}

When I run the parser on these inputs, I get:

[ 11 ] parse error
11 ok

But when I put the byte pattern before the number patterns in the lexer:

$digit=0-9
$byte=[a-f0-9]

    $byte$byte                        { \s -> TByte (encodeUtf8(pack s))     }
    $digit+                       { \s -> TNum  (readRational s) }
    $digit+.$digit+               { \s -> TNum  (readRational s) }
    $digit+.$digit+e$digit+       { \s -> TNum  (readRational s) }
    $digit+e$digit+               { \s -> TNum  (readRational s) }

I get:

[ 11 ] ok
11 parse error

I think this happens because the lexer turns the input string into tokens first and only then hands them to the parser. So when the parser expects a byte token but receives a number token, it has no opportunity to reinterpret that value as a different token. What should I do in this situation?


**willeM_ Van Onsem** · Accepted Answer · 2021-12-30T13:16:14.660000

In that case you should postpone the decision until the parser has the context. For example, you can add a TNumByte data constructor that stores the matched text as a String:

data Token
    = TByte ByteString
    | TNum Rational
    | TNumByte String
    -- …

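For a sequence of exactly two $digit characters, it is not yet clear whether it should be read as a byte or as a number, so we construct a TNumByte for it. Alex prefers the longest match and, on a tie, the rule that appears first in the file, so this rule has to come before the $byte$byte and $digit+ rules: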

$digit=0-9
$byte=[a-f0-9]

$digit$digit                  { TNumByte }
$byte$byte                    { \s -> TByte (encodeUtf8(pack s)) }
$digit+                       { \s -> TNum  (readRational s) }
$digit+\.$digit+              { \s -> TNum  (readRational s) }
$digit+\.$digit+e$digit+      { \s -> TNum  (readRational s) }
$digit+e$digit+               { \s -> TNum  (readRational s) }

Then, in the parser, we can decide based on the context:

%token

  cnst                           { TNum $$ }
  byte                           { TByte $$ }
  numbyte                        { TNumByte $$ }  -- 🖘 can be int or byte
  '['                            { TOSB }
  ']'                            { TCSB }

%%

Expr
  : '[' byte ']' { $2 }
  | '[' numbyte ']' { encodeUtf8(pack $2) }  -- 🖘 interpret as byte
  | cnst { $1 }
  | numbyte { readRational $1 }  -- 🖘 interpret as int
  ;
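
To try the idea without running Alex and Happy, here is a minimal sketch in plain Haskell that imitates the approach above. It is not the generated lexer or parser: `classify`, `tokenize`, `parseExpr`, `run`, and the `Value` type are hypothetical names introduced for illustration. As simplifying assumptions, the input is split on whitespace, only integer literals are handled, and the byte payload is kept as a `String` instead of a `ByteString`.

```haskell
import Data.Char (isDigit)

-- Tokens as in the answer; TNumByte keeps the raw text so that the
-- decision between "number" and "byte" can be made later.
data Token
  = TNum Rational
  | TByte String      -- assumption: String stands in for ByteString
  | TNumByte String
  | TOSB
  | TCSB
  deriving (Eq, Show)

-- Hypothetical result type, so both kinds of expression share one type.
data Value = VByte String | VNum Rational
  deriving (Eq, Show)

isByteChar :: Char -> Bool
isByteChar c = isDigit c || c `elem` "abcdef"

-- Classify one whitespace-delimited lexeme. The guards are tried in the
-- same order as the lexer rules, so two digits become TNumByte first.
classify :: String -> Maybe Token
classify "[" = Just TOSB
classify "]" = Just TCSB
classify s
  | length s == 2, all isDigit s    = Just (TNumByte s)
  | length s == 2, all isByteChar s = Just (TByte s)
  | not (null s), all isDigit s     = Just (TNum (toRational (read s :: Integer)))
  | otherwise                       = Nothing

tokenize :: String -> Maybe [Token]
tokenize = mapM classify . words

-- The context decides how an ambiguous TNumByte is interpreted.
parseExpr :: [Token] -> Maybe Value
parseExpr [TOSB, TByte b, TCSB]    = Just (VByte b)
parseExpr [TOSB, TNumByte s, TCSB] = Just (VByte s)   -- read as a byte
parseExpr [TNum n]                 = Just (VNum n)
parseExpr [TNumByte s]             = Just (VNum (toRational (read s :: Integer)))  -- read as a number
parseExpr _                        = Nothing

run :: String -> Maybe Value
run input = tokenize input >>= parseExpr
```

With these definitions, `run "[ 11 ]"` equals `Just (VByte "11")` and `run "11"` equals `Just (VNum 11)`, so the same two-digit text is accepted in both contexts.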
