I am working on the Google ProtoBuff where I am trying to parse the proto file using SimpleParse in python.
I am using EBNF format with SimpleParse, it shows success but there is nothing in the result Tree, not sure what is going wrong. Any help would really be appreciated.
Following is the grammar file:
proto ::= ( message / extend / enum / import / package / option / ';' )*
import ::= 'import' , strLit , ';'
package ::= 'package' , ident , ( '.' , ident )* , ';'
option ::= 'option' , optionBody , ';'
optionBody ::= ident , ( '.' , ident )* , '=' , constant
message ::= 'message' , ident , messageBody
extend ::= 'extend' , userType , '{' , ( field / group / ';' )* , '}'
enum ::= 'enum' , ident , '{' , ( option / enumField / ';' )* , '}'
enumField ::= ident , '=' , intLit , ';'
service ::= 'service' , ident , '{' , ( option / rpc / ';' )* , '}'
rpc ::= 'rpc' , ident , '(' , userType , ')' , 'returns' , '(' , userType , ')' , ';'
messageBody ::= '{' , ( field / enum / message / extend / extensions / group / option / ':' )* , '}'
group ::= label , 'group' , camelIdent , '=' , intLit , messageBody
field ::= label , type , ident , '=' , intLit , ( '[' , fieldOption , ( ',' , fieldOption )* , ']' )? , ';'
fieldOption ::= optionBody / 'default' , '=' , constant
extensions ::= 'extensions' , extension , ( ',' , extension )* , ';'
extension ::= intLit , ( 'to' , ( intLit / 'max' ) )?
label ::= 'required' / 'optional' / 'repeated'
type ::= 'double' / 'float' / 'int32' / 'int64' / 'uint32' / 'uint64' / 'sint32' / 'sint64' / 'fixed32' / 'fixed64' / 'sfixed32' / 'sfixed64' / 'bool' / 'string' / 'bytes' / userType
userType ::= '.'? , ident , ( '.' , ident )*
constant ::= ident / intLit / floatLit / strLit / boolLit
ident ::= [A-Za-z_],[A-Za-z0-9_]*
camelIdent ::= [A-Z],[\w_]*
intLit ::= decInt / hexInt / octInt
decInt ::= [1-9],[\d]*
hexInt ::= [0],[xX],[A-Fa-f0-9]+
octInt ::= [0],[0-7]+
floatLit ::= [\d]+ , [\.\d+]?
boolLit ::= 'true' / 'false'
strLit ::= quote ,( hexEscape / octEscape / charEscape / [^\0\n] )* , quote
quote ::= ['']
hexEscape ::= [\\],[Xx],[A-Fa-f0-9]
octEscape ::= [\\0]? ,[0-7]
charEscape ::= [\\],[abfnrtv\\\?'']
And this is the python code that I am experimenting with:
from simpleparse.parser import Parser
from pprint import pprint
protoGrammar = ""
protoInput = ""
protoGrammarRoot = "proto"
with open ("proto_grammar.ebnf", "r") as grammarFile:
protoGrammar=grammarFile.read()
with open("sample.proto", "r") as protoFile:
protoInput = protoFile.read().replace('\n', '')
parser = Parser(protoGrammar,protoGrammarRoot)
success, resultTree, newCharacter = parser.parse(protoInput)
pprint(protoInput)
pprint(success)
pprint(resultTree)
pprint(newCharacter)
and this the proto file that I am trying to parse
message AmbiguousMsg {
optional string mypack_ambiguous_msg = 1;
optional string mypack_ambiguous_msg1 = 1;
}
I get the output as
1
[]
0
I am new to Python but I came up with this, although I am not entirely sure of your output format. Hopefully this will point you in the right direction. Feel free to modify the code below to cater your requirements.
My output looks as follows
NOTE After
print pattern
you can tokenize further if necessary by takingpattern
as in input. I have commented one example with regex.