Syntax error when trying to use dependent types in GF

143 Views Asked by At

I'm trying to play with dependent types, but I've gotten stuck on a syntax error that I can't seem to find the cause for. It's on my first lin definition in the concrete syntax.

For context, I've been reading Inari's embedded grammars tutorial, and I'm interested in making functions which I can call from the outside, but which won't actually be used in linearisation unless I do it manually from Haskell. Please let me know if you know a better way to achieve this.

My abstract syntax:

cat
    S;
    Pre_S Clude ;
    VP Clude ;
    NP Clude ;
    V2 ;
    N ;
    Clude ; 

fun
    Include, Exclude : Clude ;

    To_S : Pre_S Include -> S ;

    Pred   : (c : Clude) -> NP c -> VP c -> Pre_S c ;
    Compl  : (c : Clude) -> V2 -> NP c -> VP c ;

    -- I'd like to call this one from Haskell
    Scramble : (c : Clude) -> VP c -> VP Exclude ;

    To_NP : (c : Clude) -> N -> NP c ;

    Bob, Billy, John, Mary, Lisa : N ;
    Liked, Loved, Hated : V2 ;

Concrete:

lincat
    Pre_S, S, V2, NP, N = {s : Str} ;
    VP = {fst : Str ; snd : Str} ;

lin 
    To_S pre = {s = pre.s} ;
    Pred _ sub vp = {s = sub ++ vp.fst ++ vp.snd} ;
    Compl _ ver obj = {fst = ver ; snd = obj} ;

    Scramble _ vp = {fst = vp.snd ; snd = vp.fst } ;

    To_NP _ n = {s = n.s} ;

    Bob = {s = "Bob"} ; --and seven more like this

Is it because this isn't possible this way, or did I do something wrong that I'm just not managing to find?

1

There are 1 best solutions below

6
On

1. Fixing the syntax errors

None of the errors in your code is due to dependent types. The first issue, the actual syntax error is this:

To_S pre = {s = pre.s} ;

pre is a reserved word in GF, you can't used it as a variable name. You can write e.g.

To_S pr = {s = pr.s} ;

or shorter—since To_S just looks like a coercion function, you can do this:

To_S pr = pr ;

After fixing that, you get new error messages:

     Happened in linearization of Pred
       type of sub
      expected: Str
      inferred: {s : Str}
      
     Happened in linearization of Compl
       type of ver
      expected: Str
      inferred: {s : Str}
      
     Happened in linearization of Compl
       type of obj
      expected: Str
      inferred: {s : Str}

These are fixed as follows. You can't put a {s : Str} into a field that is supposed to contain a Str, so you need to access the sub.s, ver.s and obj.s.

    Pred _ sub vp = {s = sub.s ++ vp.fst ++ vp.snd} ;
    Compl _ ver obj = {fst = ver.s ; snd = obj.s} ;

Full grammar after fixing the syntax errors

This works for me.

lin
    To_S pr = pr ;
    Pred _ sub vp = {s = sub.s ++ vp.fst ++ vp.snd} ;
    Compl _ ver obj = {fst = ver.s ; snd = obj.s} ;

    Scramble _ vp = {fst = vp.snd ; snd = vp.fst } ;

    To_NP _ n = {s = n.s} ;

    Bob = {s = "Bob"} ;
    Loved = {s = "loved"} ; -- just add rest of the lexicon

    Include, Exclude = {s = ""} ; -- these need to have a linearisation, otherwise nothing works!

2. Tips and tricks on working with your grammar

With the lexicon of Bob and Loved, it generates exactly one tree. When I use the command gt (generate_trees), if I don't give a category flag, it automatically uses the start category, which is S.

> gt | l -treebank
To_S (Pred Include (To_NP Include Bob) (Compl Include Loved (To_NP Include Bob)))
Bob loved Bob

Parsing in S works too:

p "Bob loved Bob"
To_S (Pred Include (To_NP Include Bob) (Compl Include Loved (To_NP Include Bob)))

Parsing VPs: add a linref

With the current grammar, we can't parse any VPs:

> p -cat="VP ?" "Bob loved"
The parser failed at token 2: "loved"

> p -cat="VP ?" "loved Bob"
The parser failed at token 2: "Bob"

That's because the lincat of VP is discontinuous and doesn't have a single s field. But if you would like to parse also VPs, you can add a linref for the category of VP, like this:

  linref
    VP = \vp -> vp.fst ++ vp.snd ;

Now it works to parse even VPs, sort of:

Cl> p -cat="VP ?" "loved Bob"
Compl ?3 Loved (To_NP ?3 Bob)

These weird question marks are metavariables, and we'll fix it next.

Metavariables

If you've read my blog, you may have already seen this bit about metavariables: https://inariksit.github.io/gf/2018/08/28/gf-gotchas.html#metavariables-or-those-question-marks-that-appear-when-parsing

GF has a weird requirement that every argument needs to contribute with a string (even an empty string!), otherwise it isn’t recognised when parsing. This happens even if there is no ambiguity.

We have already linearised Include and Exclude with empty strings—they must have some linearisation in any case, otherwise the whole grammar doesn't work. So we need to add the Clude's empty string into all of the linearisations, otherwise GF parser is confused.

In all those linearisations where you just marked the Clude argument with underscore, we do this now. Doesn't matter which field we add it to, it's just an empty string and makes no effect in the output.

    Pred _clu sub vp = {s = sub.s ++ vp.fst ++ vp.snd ++ _clu.s} ;
    Compl _clu ver obj = {fst = ver.s ; snd = obj.s ++ _clu.s} ;
    Scramble _clu vp = {fst = vp.snd ; snd = vp.fst ++ _clu.s} ;
    To_NP _clu n = {s = n.s ++ _clu.s} ;

After this, parsing in VP works without question marks:

Cl> p -cat="VP ?" "loved Bob"
Compl Exclude Loved (To_NP Exclude Bob)
Compl Include Loved (To_NP Include Bob)

Cl> p -cat="VP ?" "Bob loved"
Scramble Exclude (Compl Exclude Loved (To_NP Exclude Bob))

I'm still getting metavariables????

So we fixed it, but why do I still get question marks when I gt and get a tree that uses Scramble?

> gt -cat="VP Exclude" 
Compl Exclude Loved (To_NP Exclude Bob)
Scramble ?3 (Compl ?3 Loved (To_NP ?3 Bob))
Scramble Exclude (Scramble ?4 (Compl ?4 Loved (To_NP ?4 Bob)))

> gt -cat="VP Include" 
Compl Include Loved (To_NP Include Bob)

That's because the Scramble operation truly suppresses its arguments. It's not just a case of GF compiler being stupid and refusing to cooperate even if it's obvious which argument there is, in this case there is no way to retrieve which argument it was: Scramble makes everything into a VP Exclude.

Full grammar after adding linref and including Clude's empty string

Just for the sake of completeness, here's all the changes.

lin
    To_S pr = pr ;

    Pred _clu sub vp = {s = sub.s ++ vp.fst ++ vp.snd ++ _clu.s} ;
    Compl _clu ver obj = {fst = ver.s ; snd = obj.s ++ _clu.s} ;
    Scramble _clu vp = {fst = vp.snd ; snd = vp.fst ++ _clu.s} ;
    To_NP _clu n = {s = n.s ++ _clu.s} ;

    Bob = {s = "Bob"} ;
    Loved = {s = "loved"} ;

    Include, Exclude = {s = ""} ;

  linref
    VP = \vp -> vp.fst ++ vp.snd ;

3. Alternative way of making this grammar without dependent types

If you would like to use your grammar from Python or the Haskell bindings for the C runtime, you are out of luck: the C runtime doesn't support dependent types. So here's a version of your grammar where we mimic the behaviour using parameters in the concrete syntax and the nonExist token (see https://inariksit.github.io/gf/2018/08/28/gf-gotchas.html#raise-an-exception).

I have kept the documentation minimal (because this answer is already so long ), but if you have any questions about some parts of this solution, just ask!

Abstract syntax

  cat
    S ; VP ; NP ; V2 ; N ;

  fun
    Pred   :  NP -> VP -> S ;
    Compl  :  V2 -> NP -> VP ;

    Scramble : VP -> VP ;

    To_NPIncl,
      ToNPExcl : N -> NP ;

    Bob, Billy, John, Mary, Lisa : N ;
    Liked, Loved, Hated : V2 ;

Concrete syntax

  param
    Clude = Include | Exclude ;
  lincat
    -- To mimic your categories VP Clude and NP Clude,
    -- we add a parameter in VP and NP
    VP = {fst, snd : Str ; clude : Clude} ;
    NP = {s : Str ; clude : Clude} ;

    -- The rest are like in your grammar
    S, V2, N = {s : Str} ;

  lin

    -- No need for Pre_S, we can match NP's and VP's Clude in Pred
    -- Only make a sentence if both's Clude is Include
    Pred np vp = case <np.clude, vp.clude> of {
      <Include,Include> => {s = np.s ++ vp.fst ++ vp.snd} ;
      _ => {s = Predef.nonExist}
      } ;

    -- As per your grammar, V2 has no inherent Clude, but NP does
    Compl v2 np = {
      fst = v2.s ;
      snd = np.s ;
      clude = np.clude ;
      } ;

    -- Scramble doesn't look at it's argument VP's clude, 
    -- just makes it into Exclude automatically.
    Scramble vp = {
      fst = vp.snd ;
      snd = vp.fst ;
      clude = Exclude ;
      } ;

    -- Your grammar has the function To_NP : (c : Clude) -> N -> NP c ;
    -- We translate it into two functions.
    To_NPIncl n = n ** {clude = Include} ;
    To_NPExcl n = n ** {clude = Exclude} ;

    -- Finally, lexicon.
    Bob = {s = "Bob"} ;
    Loved = {s = "loved"} ;

Now when we generate all trees in category S, there is one that uses Scramble, but it doesn't have a linearisation.

> gt | l -treebank
ClParam: Pred (To_NPIncl Bob) (Compl Loved (To_NPIncl Bob))
ClParamEng: Bob loved Bob
ClParam: Pred (To_NPIncl Bob) (Scramble (Compl Loved (To_NPIncl Bob)))
ClParamEng: 

Maybe a bit less elegant than your version, where the tree wasn't even generated, but this is just to demonstrate the different approaches. If you're working on Haskell and won't need to use C runtime, feel free to continue with your approach!