I have a bunch of MATLAB script/function files that I and the rest of my team need to work on. We have little to no idea what most of the files do, and little to no idea which ones belong together and which ones are separate. We do know we have a total of 36,000 lines. I'd like to know how many of those lines are comments.
Easy, right? Just count how many of them start with the comment start character %.
Well, no. I don't want to count blocks of code that have been commented out as "comments", since they don't actually tell me anything. And I'd prefer not to count "empty" lines used to make one comment line a "headline"
% %%%%%%%%
% headline
% %%%%%%%%
like so.
So how can I get a sensible estimate of how many lines of actual informative comments I have? Is there an easy way to distinguish natural language (possibly containing code snippets) from pure code?
Yes, I know code should be self-explanatory as far as is practical, but the code we have inherited clearly is not. Yes, I know we should probably refactor this mess. The purpose of figuring out how many comments we have is to highlight the technical debt we have here, so that we can allocate resources to this refactoring.
We can use the semi-documented mtree utility for this. Let's take for example the .m file that contains the definition of the mtree class itself. dbtype mtree yields (this is just the beginning):

Now, if we invoke the mtree utility on itself and show the result as text, here's what we get (again, just the beginning):
As you can see from the above, comment and empty lines (2-7) do not appear on the left side of the "fractions" in the output. So if we find a way to get the "numerators", we'll get the numbers of the lines that contain actual code.
We're in luck, since there exists a method that gives us these numerators: lineno! So if we call it and apply unique to the output, we'll get exactly one copy of each line:

This yields a value of 269 for nCodeLines in R2018b. If you're willing to assume that the last line in a file is always a line of code (and not a comment or a blank), you can just subtract nCodeLines from the last element of uLines to get the number of comment and blank lines (121 in this case). Otherwise, use some other technique to count the total number of lines (example).

All that's left is to write this as a function and feed the folder of .m files to it :)
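For what it's worth, here is a minimal sketch of what such a function might look like. The function name countCommentLines and the report column names are hypothetical, made up for illustration. It relies on mtree and lineno behaving as described above; since mtree is only semi-documented, treat that as an assumption that may not hold in every release. The total line count is taken from the file text rather than from the last element of uLines, so the "last line is code" assumption is not needed.

```matlab
function report = countCommentLines(folder)
% COUNTCOMMENTLINES  Estimate code vs. non-code lines per .m file in a folder.
%   Hypothetical helper (name and output layout are illustrative only).
%   Assumes the semi-documented mtree class and its lineno method behave
%   as described in the answer above.

files  = dir(fullfile(folder, '*.m'));
n      = numel(files);
name   = cell(n, 1);
nTotal = zeros(n, 1);
nCode  = zeros(n, 1);

for k = 1:n
    fname   = fullfile(folder, files(k).name);
    name{k} = files(k).name;

    % Total number of lines, counted from the raw text. A trailing newline
    % produces one empty piece at the end of the split, which is dropped.
    pieces = regexp(fileread(fname), '\r\n|\r|\n', 'split');
    if isempty(pieces{end})
        pieces(end) = [];
    end
    nTotal(k) = numel(pieces);

    % Lines that hold at least one parse-tree node, i.e. lines with code.
    tree   = mtree(fname, '-file');
    uLines = unique(lineno(tree));
    uLines = uLines(uLines > 0);   % defensive guard; an assumption, may be a no-op
    nCode(k) = numel(uLines);
end

% Whatever is not code is a comment line or a blank line.
nNonCode = nTotal - nCode;
report   = table(name, nTotal, nCode, nNonCode);
end
```

Calling report = countCommentLines('path/to/your/folder') and then sum(report.nNonCode) gives one number for the whole folder. Note that this figure still includes blank lines, decorative "headline" lines and commented-out code, so it is an upper bound on the informative comments rather than the estimate the question is really after.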