How do I calculate the code-to-comment ratio of a C# project?

Note: I am not asking what the golden code-to-comment ratio is, nor am I trying to impose a particular ratio on our team. Instead, we would like to improve how well our codebase is documented (we started with a "code should document itself" mentality), which can be done either by removing dead code or by adding comments to live code. We would like to track our progress by measuring this ratio several times over the course of a few months. Also note that I want to measure the comments themselves, so anything that counts lines of code from the generated IL won't work.

How would I go about getting the code-to-comment ratio for a C# project? Would I need to write my own parsing script, or is there something in Roslyn I can leverage? Do any major IDEs provide this functionality directly? As a bonus, can I filter out "punctuation" such as extra whitespace, comment delimiters (// and /* */), and opening/closing curly braces?

Accepted answer

Using Robert Harvey's regex, I wrote a short C# method that calculates this metric from an input string. It counts characters rather than lines, so a line that contains both code and a comment is split correctly between the two, and it leaves extra whitespace out of the metric so that things like indentation don't count.

To prevent catastrophic backtracking, I simplified the regex and made the body of the block comment an atomic (non-backtracking) group. The explicit newline checks turned out to be unnecessary, because the negated character classes [^*] and [^*/] already match line breaks.
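
A quick way to check what the simplified pattern treats as a comment is to list its matches. This is only an illustrative sketch: the class and method names (CommentRegexDemo, FindComments) are hypothetical, and the pattern is the same one used in the method below.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentRegexDemo
{
    // Simplified pattern: the block-comment body is an atomic group (?>...),
    // and line comments run to the end of the line ('.' does not match '\n').
    // No explicit [\r\n] alternatives are needed: [^*] and [^*/] already match line breaks.
    static readonly Regex CommentsRegex =
        new Regex(@"(/\*(?>[^*]|(\*+[^*/]))*\*+/)|(//.*)");

    // Hypothetical helper: returns the text of every comment the pattern finds, in order.
    public static string[] FindComments(string source) =>
        CommentsRegex.Matches(source).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For the input "int a = 1; // trailing\n/** doc\n * block **/\nint b = 2;" it returns two matches, "// trailing" and the whole multi-line doc block, while an unterminated "/* never closed" produces no match at all.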

```csharp
using System.Text.RegularExpressions;

public static class CommentMetrics
{
    public static double CodeToCommentRatio(
        string text,
        out int codeChars,
        out int commentChars,
        out int blankChars)
    {
        // First, filter out excess whitespace, reporting the number of characters removed this way:
        // strip the indentation at the start of every line, then collapse remaining runs of spaces/tabs.
        // Line breaks are kept and end up counted as code characters.
        Regex lineStartRegex = new Regex(@"^[ \t]+", RegexOptions.Multiline);
        Regex blanksRegex = new Regex(@"[ \t]+");
        string minWhitespaceText = blanksRegex.Replace(lineStartRegex.Replace(text, ""), " ");
        blankChars = text.Length - minWhitespaceText.Length;

        // Then, match all comments and report the number of characters in comments
        Regex commentsRegex = new Regex(@"(/\*(?>[^*]|(\*+[^*/]))*\*+/)|(//.*)");
        MatchCollection comments = commentsRegex.Matches(minWhitespaceText);
        commentChars = 0;
        foreach (Match comment in comments)
        {
            commentChars += comment.Length;
        }
        codeChars = minWhitespaceText.Length - commentChars;

        // Finally, return the ratio (PositiveInfinity if there are no comments, NaN if both counts are zero)
        return (double)codeChars / commentChars;
    }
}
```
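
The method above still counts comment delimiters and curly braces, so it does not cover the question's bonus. One possible sketch, assuming that every '/' and '*' inside a comment can be treated as delimiter punctuation (which slightly undercounts comments that use those characters in prose), is to count only the "meaningful" characters on each side. PunctuationFilter and CountWithoutPunctuation are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationFilter
{
    // Same comment pattern as the accepted answer's method.
    static readonly Regex CommentsRegex =
        new Regex(@"(/\*(?>[^*]|(\*+[^*/]))*\*+/)|(//.*)");

    // Hypothetical helper that counts only "meaningful" characters.
    // Comment characters exclude whitespace and every '/' and '*' (assumed to be delimiters);
    // code characters exclude whitespace and the curly braces '{' and '}'.
    public static (int Code, int Comment) CountWithoutPunctuation(string text)
    {
        int comment = 0;
        foreach (Match m in CommentsRegex.Matches(text))
        {
            foreach (char c in m.Value)
            {
                if (!char.IsWhiteSpace(c) && c != '/' && c != '*') comment++;
            }
        }

        // Remove the comments, then count what is left as code.
        int code = 0;
        foreach (char c in CommentsRegex.Replace(text, ""))
        {
            if (!char.IsWhiteSpace(c) && c != '{' && c != '}') code++;
        }
        return (code, comment);
    }
}
```

For "int x = 1; // set x\n/* note */\n{\n}\n" it returns (7, 8): "intx=1;" on the code side and "setx" plus "note" on the comment side, giving a ratio of 7 / 8 = 0.875. Because whitespace is excluded outright, no separate whitespace-collapsing pass is needed.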

Another answer

You can identify the comments in your code using this regex:

```
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
```

Try plugging it into Visual Studio's "Find in Files" dialog (with "Use regular expressions" enabled) to see it in action.

https://regex101.com/r/GCrfzc/1
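
Both patterns work on raw text, so a // or /* inside a string literal (for example "http://example.com") is counted as a comment. If that matters, one alternative, which is a different technique from either answer, is a small hand-written scanner that steps over string and character literals. This sketch assumes plain "..." and '.' literals only: verbatim (@"...") and raw ("""...""") string literals, and interpolation holes containing their own quotes, are not handled. A full C# parser such as Roslyn would cover every literal form. CommentScanner and CountCommentChars are hypothetical names.

```csharp
using System;

public static class CommentScanner
{
    // Counts characters that belong to comments, skipping over regular string ("...")
    // and character ('.') literals so that e.g. "http://x" is not mistaken for a comment.
    // Assumption: no verbatim (@"...") or raw ("""...""") string literals in the input.
    public static int CountCommentChars(string src)
    {
        int count = 0;
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            if (c == '"' || c == '\'')
            {
                // Skip the literal, honoring backslash escapes; stop at a newline if unterminated.
                char quote = c;
                i++;
                while (i < src.Length && src[i] != quote && src[i] != '\n')
                {
                    if (src[i] == '\\') i++; // skip the escaped character
                    i++;
                }
                i++; // step past the closing quote
            }
            else if (c == '/' && i + 1 < src.Length && src[i + 1] == '/')
            {
                // Line comment: runs to the end of the line (the '\n' itself is not counted).
                int start = i;
                while (i < src.Length && src[i] != '\n') i++;
                count += i - start;
            }
            else if (c == '/' && i + 1 < src.Length && src[i + 1] == '*')
            {
                // Block comment: runs through the closing "*/", or to the end of input if unterminated.
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                int stop = end < 0 ? src.Length : end + 2;
                count += stop - i;
                i = stop;
            }
            else
            {
                i++;
            }
        }
        return count;
    }
}
```

For the two lines var url = "http://example.com"; // link and /* a */ char c = '/'; it reports 14 comment characters (the two 7-character comments) and ignores the // inside the URL.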