Lexer/filter for comments


Is there an OCaml tool that allows filtering comments in source files, similar to gcc -E?

Ideally, I'm looking for something that will remove everything but comments, but the other way around would also be useful.

For instance, if there is a way to use camlp4/camlp5/ppx to obtain OCaml comments (including non-OCamldoc comments, i.e. those written with a single asterisk), I would like to know. I haven't had much success looking for comment nodes in Camlp4's AST (though I know they must exist, because there are even bugs related to the fact that Camlp4 modifies their placement).

Here's an example: in the following file:

(*** three asterisks *)
let f () =
  Format.printf "end"

let () =
  (* one asterisk (* nested comment *) *)
  Printf.printf "hello world\n";
  (** two asterisks *)
  f();
  ()

Ideally, I'd like to obtain:

(*** three asterisks *)
(* one asterisk (* nested comment *) *)
(** two asterisks *)

The whitespace between them and the presence or absence of (* *) are mostly irrelevant, but the output should preserve comments of all kinds. My immediate purpose is to feed the comments to a spell checker, but the opposite filter (one that strips only the comments) could also be useful: I could strip the comments and then use diff against the original file to see what was removed.

There are 3 best solutions below

0
On BEST ANSWER

There is now a tool called ocaml-comment-sieve that strips everything but the comments from the code. It is built on the simple lexer used in ocamlwc.

However, this tool is GPL-licensed (because it is derived from ocamlwc, which is GPL-licensed), so it cannot be posted here. Still, it does satisfy my requirements, so until someone suggests a better way, I'll consider it as an answer.
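To illustrate the general idea without reproducing that tool, here is an independent minimal sketch of a comment extractor. It is not the ocaml-comment-sieve source and is not derived from ocamlwc; the function names (skip_string, skip_comment, extract_comments) are made up for this example. It assumes the whole source is already in a string, tracks comment nesting, and skips ordinary double-quoted string literals both in code and inside comments. It deliberately ignores quoted strings of the {|...|} form and character literals such as '"', so treat it as a starting point rather than a complete lexer.

```ocaml
(* Skip the string literal whose opening double quote is at index i.
   Returns the index just past the closing quote, or the length of s
   if the literal is unterminated. Backslash escapes are honoured. *)
let skip_string s i =
  let n = String.length s in
  let rec go j =
    if j >= n then n
    else
      match s.[j] with
      | '\\' -> go (j + 2)
      | '"' -> j + 1
      | _ -> go (j + 1)
  in
  go (i + 1)

(* Skip the comment whose opening delimiter starts at index i.
   Nested comments increase the depth; string literals inside the
   comment are skipped so that delimiters within them are ignored.
   Returns the index just past the matching closing delimiter. *)
let skip_comment s i =
  let n = String.length s in
  let rec go j depth =
    if j >= n then n
    else if j + 1 < n && s.[j] = '(' && s.[j + 1] = '*' then
      go (j + 2) (depth + 1)
    else if j + 1 < n && s.[j] = '*' && s.[j + 1] = ')' then
      (if depth = 1 then j + 2 else go (j + 2) (depth - 1))
    else if s.[j] = '"' then go (skip_string s j) depth
    else go (j + 1) depth
  in
  go (i + 2) 1

(* Return every top-level comment of s, in source order. Nested
   comments stay inside the text of their enclosing comment. *)
let extract_comments s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else if i + 1 < n && s.[i] = '(' && s.[i + 1] = '*' then begin
      let j = skip_comment s i in
      go j (String.sub s i (j - i) :: acc)
    end
    else if s.[i] = '"' then go (skip_string s i) acc
    else go (i + 1) acc
  in
  go 0 []
```

With the file contents in a string named source (a hypothetical variable), List.iter print_endline (extract_comments source) prints one comment per line of output; on the example from the question this gives the three comments listed there. The opposite filter can be sketched the same way by copying the text between comments instead of the comments themselves.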

2
On

I ran some experiments with camlp5, playing with the idea of pretty-printing an empty string ("") for every code item. The following code:

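(* Printer that discards its three arguments and emits the empty string.
   Note that this shadows the standard ignore function. *)
let ignore _ _ _ = ""

(* Extend an extensible function with a catch-all pattern (Evar ())
   that always selects the printer above. *)
let rule f = Extfun.(extend f [Evar (),false, fun _ -> Some ignore])

(* Install the catch-all rule in the structure-item and
   signature-item printers. *)
let () =
  Eprinter.extend Pcaml.pr_str_item None [ None, rule ];
  Eprinter.extend Pcaml.pr_sig_item None [ None, rule ]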

will disable the pretty-printing of any str_item (i.e. top-level item of a module implementation) or sig_item (top-level item of a module interface), by extending the corresponding default printer with a catch-all rule that outputs an empty string for any such item. Save the code as pr_comment.ml and compile it with

ocamlfind ocamlc -c -package camlp5 pr_comment.ml

and use it as

camlp5o pr_o.cmo path/to/pr_comment.cmo -o only_comment.ml my_file.ml
1
On

You can use ocamldoc with a custom generator that will dump comments using the textual representation.