So I started making progress on LuvvieScript and then it all kicked off a bit on Twitter... https://twitter.com/gordonguthrie/status/389659700741943296
Anthony Ramine https://twitter.com/nokusu made the point that I was doing it wrong and that I should be compiling from Erlang to Javascript via Core Erlang and not the Erlang AST. This is both a compelling and an unattractive option for me. Twitter not being the right medium for that discussion, I thought I would write it up here and get some advice on it.
Strategic Overview
LuvvieScript has three core requirements:
- a valid subset of Erlang that compiles to sane and performant Javascript
- a complete Source Map so that it can be debugged in the browser in LuvvieScript not Javascript
- a 'runtime' client-side javascript environment (with server-side comms) to execute LuvvieScript modules in (a sort of in-page supervisor...)
The third of these requirements is kinda out of scope for this debate, but the first two are core.
There is a lazy git's corollary: I want to use as many Erlang and Javascript syntax tools (lexers, parsers, tokenizers, AST transforms, etc.) as possible and write the smallest amount of code.
Current Thinking
The code as currently written has the following structure:
- compile the code to the Erlang AST (which has line numbers)
- tokenise the code (keeping comments and white space) and use those tokens to build a dictionary that maps line/column info to tokens
- merge the dictionary and AST to give a line/col AST (with some fannying about to group fns of different arities)
- transform this new Erlang AST to a Javascript AST as implemented in the SpiderMonkey Parser API https://developer.mozilla.org/en-US/docs/SpiderMonkey/Parser_API
- use Javascript utils like brushtail to mutate away tail calls in the Javascript AST https://github.com/puffnfresh/brushtail
- use Javascript utils like ESCodeGen to emit the javascript https://github.com/Constellation/escodegen
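To make the tail-call step in the list above concrete, here is a hand-written sketch of the kind of rewrite meant by "mutating away tail calls": a self-recursive call in tail position becomes a loop. The names sumTo and sumToLoop are invented for this illustration, and this is not the actual output of brushtail, which works on the AST and may emit something different.

```javascript
// Illustration only: these names are hypothetical, not from LuvvieScript.
// Direct translation of an Erlang-style accumulator function. The
// recursive call is in tail position, but without tail-call
// elimination each call still consumes a stack frame.
function sumTo(n, acc) {
  if (n === 0) {
    return acc;
  }
  return sumTo(n - 1, acc + n);
}

// The same function with the tail call rewritten as a loop: the new
// argument values are computed, assigned, and control goes back to
// the top of the body.
function sumToLoop(n, acc) {
  while (true) {
    if (n === 0) {
      return acc;
    }
    var nextN = n - 1;
    var nextAcc = acc + n;
    n = nextN;
    acc = nextAcc;
  }
}
```

The loop version can run for as many iterations as needed without growing the stack, which matters because idiomatic Erlang leans on recursion where Javascript would use a loop.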
Basically I get an Erlang AST that looks something like this:
[{function,
{19,{1,9}},
atom1_fn,0,
[{clause,
{19,none},
[],
[[]],
[{match,
{20,none},
[{var,{20,{5,6}},'D'}],
[{atom,{20,{11,15}},blue}]},
{var,{21,{5,6}},'D'}]}]}]},
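and I then transform it into a Javascript JSON AST in the following format (this particular example is the AST for var answer = 6 * 7; rather than for the Erlang above):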
{
"type": "Program",
"body": [
{
"type": "VariableDeclaration",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "answer",
"loc": {
"start": {
"line": 2,
"column": 4
},
"end": {
"line": 2,
"column": 10
}
}
},
"init": {
"type": "BinaryExpression",
"operator": "*",
"left": {
"type": "Literal",
"value": 6,
"raw": "6",
"loc": {
"start": {
"line": 2,
"column": 13
},
"end": {
"line": 2,
"column": 14
}
}
},
"right": {
"type": "Literal",
"value": 7,
"raw": "7",
"loc": {
"start": {
"line": 2,
"column": 17
},
"end": {
"line": 2,
"column": 18
}
}
},
"loc": {
"start": {
"line": 2,
"column": 13
},
"end": {
"line": 2,
"column": 18
}
}
},
"loc": {
"start": {
"line": 2,
"column": 4
},
"end": {
"line": 2,
"column": 18
}
}
}
],
"kind": "var",
"loc": {
"start": {
"line": 2,
"column": 0
},
"end": {
"line": 2,
"column": 19
}
}
}
],
"loc": {
"start": {
"line": 2,
"column": 0
},
"end": {
"line": 2,
"column": 19
}
}
}
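A minimal sketch of how one line/col-annotated Erlang node could be turned into a Parser API node with a loc field. It makes several assumptions that are not part of LuvvieScript: Erlang tuples are written as JS arrays, the annotation has the {Line, {StartCol, EndCol}} shape seen in the Erlang listing, Erlang columns are 1-based while Parser API columns are 0-based, and atoms are represented as string literals. The function names toLoc and toJs are made up for the example.

```javascript
// Sketch only. Erlang tuples are written as JS arrays here, so
// {var, {20, {5, 6}}, 'D'} becomes ["var", [20, [5, 6]], "D"].

// Convert a {Line, {StartCol, EndCol}} annotation into a Parser API
// "loc" object. Assumes 1-based Erlang columns and 0-based Parser API
// columns; lines are taken to be 1-based in both.
function toLoc(ann) {
  var line = ann[0];
  var cols = ann[1];
  if (cols === "none") {
    return null; // no column info was recorded for this node
  }
  return {
    start: { line: line, column: cols[0] - 1 },
    end: { line: line, column: cols[1] - 1 }
  };
}

// Translate a small subset of node types. Representing atoms as
// string literals is an assumption made for this example.
function toJs(node) {
  var type = node[0];
  var ann = node[1];
  switch (type) {
    case "var":
      return { type: "Identifier", name: node[2], loc: toLoc(ann) };
    case "atom":
      return {
        type: "Literal",
        value: node[2],
        raw: JSON.stringify(node[2]),
        loc: toLoc(ann)
      };
    default:
      throw new Error("unhandled node type: " + type);
  }
}

var ident = toJs(["var", [20, [5, 6]], "D"]);
var atom = toJs(["atom", [20, [11, 15]], "blue"]);
```

Nodes annotated with none for the columns, such as the clause and match nodes in the Erlang listing, would need their extent derived from their children.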
El Problemo
Anthony's point is well made - Core Erlang is a simplified and more regular language than Erlang and should be more easily transpiled to Javascript than plain Erlang, but it is not very well documented.
I can get an AST-like representation of Core Erlang easily enough:
{c_module,[],
{c_literal,[],basic_types},
[{c_var,[],{atom1_fn,0}},
{c_var,[],{atom2_fn,0}},
{c_var,[],{bish_fn,1}},
{c_var,[],{boolean_fn,0}},
{c_var,[],{float_fn,0}},
{c_var,[],{int_fn,0}},
{c_var,[],{module_info,0}},
{c_var,[],{module_info,1}},
{c_var,[],{string_fn,0}}],
[],
[{{c_var,[],{int_fn,0}},{c_fun,[],[],{c_literal,[],1}}},
{{c_var,[],{float_fn,0}},{c_fun,[],[],{c_literal,[],2.3}}},
{{c_var,[],{boolean_fn,0}},{c_fun,[],[],{c_literal,[],true}}},
{{c_var,[],{atom1_fn,0}},{c_fun,[],[],{c_literal,[],blue}}},
{{c_var,[],{atom2_fn,0}},{c_fun,[],[],{c_literal,[],'Blue 4 U'}}},
{{c_var,[],{string_fn,0}},{c_fun,[],[],{c_literal,[],"string theory"}}},
{{c_var,[],{bish_fn,1}},
{c_fun,[],
[{c_var,[],'_cor0'}],
{c_case,[],
{c_var,[],'_cor0'},
[{c_clause,[],
[{c_literal,[],bash}],
{c_literal,[],true},
{c_literal,[],berk}},
{c_clause,[],
[{c_literal,[],bosh}],
{c_literal,[],true},
{c_literal,[],bork}},
{c_clause,
[compiler_generated],
[{c_var,[],'_cor1'}],
{c_literal,[],true},
{c_primop,[],
{c_literal,[],match_fail},
[{c_tuple,[],
[{c_literal,[],case_clause},
{c_var,[],'_cor1'}]}]}}]}}},
{{c_var,[],{module_info,0}},
{c_fun,[],[],
{c_call,[],
{c_literal,[],erlang},
{c_literal,[],get_module_info},
[{c_literal,[],basic_types}]}}},
{{c_var,[],{module_info,1}},
{c_fun,[],
[{c_var,[],'_cor0'}],
{c_call,[],
{c_literal,[],erlang},
{c_literal,[],get_module_info},
[{c_literal,[],basic_types},{c_var,[],'_cor0'}]}}}]}
But it has no line or column numbers. So I can get an AST that will generate JS, but critically not Source Maps.
Question 1: How can I get the line information I need? (I can already get column information from the 'normal' Erlang tokens.)
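To show why the line information is critical: in the Source Map version 3 format, each mapping segment records a generated column together with a source file index, an original line and an original column, stored as Base64 VLQ deltas. Without an original line there is nothing to put in the segment. The sketch below encodes one such segment by hand; the function names are invented here, and a code generator or source map library would normally do this work.

```javascript
var BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as a Base64 VLQ string.
function encodeVlq(value) {
  // The sign goes in the lowest bit, the magnitude in the bits above it.
  var vlq = value < 0 ? ((-value) << 1) | 1 : value << 1;
  var out = "";
  do {
    var digit = vlq & 31; // take the low 5 bits
    vlq >>>= 5;
    if (vlq > 0) {
      digit |= 32; // continuation bit: more digits follow
    }
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// A segment is [generatedColumn, sourceIndex, originalLine, originalColumn].
// Values are deltas from the previous segment (the generated column
// restarts on each generated line); the first segment is relative to zero.
function encodeSegment(fields) {
  return fields.map(encodeVlq).join("");
}

// Generated column 0 maps to source file 0, original line 20
// (stored 0-based as 19), original column 4.
var segment = encodeSegment([0, 0, 19, 4]);
```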
Core Erlang also differs from normal Erlang in that the compiler starts replacing the variable names of function parameters with its own internal ones, which will also cause some Source Map problems. An example would be this Erlang function:
bish_fn(A) ->
    case A of
        bash -> berk;
        bosh -> bork
    end.
The Erlang AST preserves the names nicely:
[{function,
{31,{1,8}},
bish_fn,1,
[{clause,
{31,none},
[{var,{31,{11,12}},'A'}],
[[]],
[{'case',
{32,none},
[{var,{32,{11,12}},'A'}],
[{clause,
{33,none},
[{atom,{33,{9,13}},bash}],
[[]],
[{atom,{34,{13,17}},berk}]},
{clause,
{35,none},
[{atom,{35,{9,13}},bosh}],
[[]],
[{atom,{36,{13,17}},bork}]}]}]}]}]},
Core Erlang has already mutated away the names of the function's parameters:
'bish_fn'/1 =
    %% Line 30
    fun (_cor0) ->
        %% Line 31
        case _cor0 of
          %% Line 32
          <'bash'> when 'true' ->
              'berk'
          %% Line 33
          <'bosh'> when 'true' ->
              'bork'
          ( <_cor1> when 'true' ->
                primop 'match_fail'
                    ({'case_clause',_cor1})
            -| ['compiler_generated'] )
        end
Question 2: Is there anything I can do to preserve or map variable names in Core Erlang?
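To illustrate why the renaming matters for debugging, here is one plausible hand-written Javascript rendering of the Core Erlang for bish_fn/1. It assumes atoms are represented as strings and that match_fail is modelled as a thrown Error; neither choice comes from LuvvieScript itself. The parameter appears as _cor0, so a debugger stepping through this code would show _cor0 rather than the A in the original source. The Source Map format has a names field intended for mapping identifiers back, but populating it requires knowing that _cor0 was originally A.

```javascript
// Hypothetical output, written by hand for illustration.
function bish_fn(_cor0) {
  switch (_cor0) {
    case "bash":
      return "berk";
    case "bosh":
      return "bork";
    default:
      // Corresponds to the compiler-generated clause:
      //   primop 'match_fail'({'case_clause', _cor1})
      var _cor1 = _cor0;
      throw new Error("case_clause: " + String(_cor1));
  }
}
```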
Question 3: I appreciate that Core Erlang is explicitly designed to make it easy to compile into Erlang and to write tools that mutate Erlang code, but the question really is: will it make it easier to compile out of Erlang?
Options
I could fork the Core Erlang code and add a source-mapping option, but I play the Lazy Man card here...
Update
In reply to Eric's response, I should clarify how I am generating the Core Erlang cerl records. I first compile my plain Erlang to Core Erlang using:
c(some_module, to_core)
Then I use core_scan and core_parse in this function nicked from compiler.erl:
compile(File) ->
    case file:read_file(File) of
        {ok,Bin} ->
            case core_scan:string(binary_to_list(Bin)) of
                {ok,Toks,_} ->
                    case core_parse:parse(Toks) of
                        {ok, Mod} ->
                            {ok, Mod};
                        {error,E} ->
                            {error, {parse, E}}
                    end;
                {error,E,_} ->
                    {error, {scan, E}}
            end;
        {error,E} ->
            {error,{read, E}}
    end.
The question is: how can I get that toolchain to emit an annotated AST? I suspect I would need to add those options myself :(
How do you get the Core Erlang? I have been using
where I get a nice structure with c_let c_variable etc and with nice line numbers. However, I noticed that it is not the same Core Erlang I get when I do c("",[to_core]). For example, I get a c_case per record access, and this is optimized away in the .core file generated by c("",[to_core]).
What is the recommended approach to get Core Erlang as an internal structure to be processed by Erlang?
I tried something else first, but then the line numbers were not set.