How to avoid whitespace and comments in code duplication CPD tool

We are using the CPD tool for code duplication detection. CPD includes whitespace and comments. Could you please let us know how we can exclude whitespace and comments so that only genuine cases of duplication are reported? For example, if we have 4 lines of duplicate code and 4 lines of comments, it reports 8 lines instead of 4.
Asked by user3206644
Which specific CPD (copy-paste detector) tool? There are many.
How a CPD tool detects duplicates depends on the primitive entities it compares. (I've built clone detectors.)
Some operate only on source lines; these pretty much cannot distinguish whitespace and comments from the programming language you think you gave the tool. To it, your code is just raw text. Nor can these tools discover that "code block A is a duplicate of code block B with regular changes (e.g., parameters)", which is what you really want to know. (I think this kind of CPD gives terrible answers, hence your question, but it has the advantage of working on everything.)
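To make the line-oriented failure concrete, here is a minimal Python sketch. Python, the function name `line_clones`, and the sample code are assumptions for illustration; it does not reproduce any real CPD tool. It compares text lines after stripping surrounding whitespace and reports maximal runs of identical lines, so a pasted block of four comment lines plus four code lines comes back as one 8-line clone, which is the symptom described in the question.

```python
# Sketch only (assumption): a naive line-oriented clone finder, not any real CPD tool.
def line_clones(text_a: str, text_b: str, min_lines: int = 4):
    """Compare raw text lines; return (start_a, start_b, length) per maximal run.

    Surrounding whitespace is stripped, but the function has no idea what a
    comment is, so comment lines match exactly like code lines.
    Line indexes are 0-based.
    """
    a = [line.strip() for line in text_a.splitlines()]
    b = [line.strip() for line in text_b.splitlines()]
    runs = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip (i, j) if the run extends backwards: it is not a run start.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_lines:
                runs.append((i, j, k))
    return runs


# Hypothetical sample: four comment lines plus four code lines, pasted twice.
BLOCK = [
    "# compute the total",
    "# of all items",
    "# in the order",
    "# (copied from billing)",
    "total = 0",
    "for item in items:",
    "    total += item.price",
    "print(total)",
]
file_a = "\n".join(["x = 1"] + BLOCK)
file_b = "\n".join(BLOCK + ["y = 2"])

print(line_clones(file_a, file_b))  # [(1, 0, 8)]: 8 lines, 4 of them comments
```

Since the comparison sees only text, the only way to exclude comments would be to teach it comment syntax, at which point it is no longer purely line-oriented.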
Some operate on language tokens, for the language(s) they happen to know. These tools tend to be pretty good about ignoring whitespace. Since they know comments are certain kinds of tokens, they can typically ignore comments too, usually via some kind of command-line switch. (Hence, "Which CPD tool?") But they don't understand language structure, and thus think that the sequence
is a clone of every other such sequence. Frankly, that's a stupid clone. Second, such token-based detectors can only detect parameters (places where the clones vary systematically) that are one token wide, typically the replacement of just an identifier or a constant by another constant or identifier. Still, this is a big step up in usability from the line-oriented CPD tools.
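A rough sketch of the token-based approach, again in Python, using the standard-library `tokenize` module as a stand-in lexer. The source language, the "ID"/"LIT" normalization, and the helper names are assumptions for illustration, not how any particular CPD tool is implemented. Comment and layout tokens are dropped before matching, identifiers and literals collapse to placeholders so one-token parameters still match, and the reported size counts only lines that contain matched code tokens.

```python
import io
import keyword
import tokenize

# Sketch only (assumption): Python source, with Python's tokenize as the lexer.
# Comments and pure layout are dropped; dropping NEWLINE/INDENT/DEDENT is a
# simplification that a real Python-aware detector might not make.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[tuple[str, int]]:
    """Return (token_text, line_number) pairs with comments/whitespace removed.

    Non-keyword identifiers become "ID" and literals become "LIT", so two
    fragments differing only in single-token names or constants still match.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            text = "ID"
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            text = "LIT"
        else:
            text = tok.string
        result.append((text, tok.start[0]))
    return result


def token_clones(src_a: str, src_b: str, min_tokens: int = 8):
    """Report (token_count, code_line_count) for each maximal matching run."""
    a, b = normalized_tokens(src_a), normalized_tokens(src_b)
    clones = []
    for i in range(len(a)):
        for j in range(len(b)):
            if i > 0 and j > 0 and a[i - 1][0] == b[j - 1][0]:
                continue  # run extends backwards, so (i, j) is not its start
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k][0] == b[j + k][0]:
                k += 1
            if k >= min_tokens:
                # Count only lines holding matched tokens, so comment and
                # blank lines never inflate the reported size.
                code_lines = {line for _, line in a[i:i + k]}
                clones.append((k, len(code_lines)))
    return clones


SRC_A = """\
# compute the total
# of all items
# in the order
# (copied from billing)
total = 0
for item in items:
    total += item.price
print(total)
"""

SRC_B = """\
# sum up the refunds
amount = 0
for r in refunds:
    amount += r.price
print(amount)
"""

print(token_clones(SRC_A, SRC_B))  # [(17, 4)]
```

The renamed variables (`total`/`amount`, `item`/`r`, `items`/`refunds`) are exactly the one-token-wide parameters described above, and the four comment lines in `SRC_A` do not count toward the 4 reported lines. Note that nothing stops a match from starting or ending mid-statement, which is the structural blindness mentioned above.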
A very few operate on language structure, e.g., they use the grammar of the language to control matching (I happen to make one of these, CloneDR; see my bio). These can't make the token-based CPD tools' mistake, so you get better detected clones. Further, they can detect parameters consisting of (structured) sequences of tokens, e.g., when an expression has replaced an identifier. IMHO (oops, opinion!) these give much better detected clones (which is why I build CloneDR).
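The structural idea can be sketched with Python's `ast` module. This is an assumption for illustration: it is not how CloneDR works, and the step of finding candidate clone pairs (for example by hashing subtrees) is omitted. Given two fragments already suspected to be clones, it walks both syntax trees in lockstep, a simple form of anti-unification, and reports each point of difference as a whole subtree, so a parameter can be an entire expression rather than a single token. Comments never reach the tree, so they cannot inflate the result.

```python
import ast

# Sketch only (assumption): Python source parsed with the stdlib ast module.


def _text(node) -> str:
    """Source text of an AST node (or list of nodes), used for reporting."""
    if isinstance(node, list):
        return "; ".join(_text(n) for n in node)
    return ast.unparse(node) if isinstance(node, ast.AST) else repr(node)


def _compare(x, y, params: list) -> None:
    """Walk two syntax trees in lockstep. Where they stop agreeing, record the
    pair of differing subtrees as one clone parameter and do not descend."""
    if isinstance(x, list) and isinstance(y, list) and len(x) == len(y):
        for child_x, child_y in zip(x, y):
            _compare(child_x, child_y, params)
    elif isinstance(x, ast.AST) and type(x) is type(y):
        subtrees = []
        for field in x._fields:
            vx, vy = getattr(x, field, None), getattr(y, field, None)
            if isinstance(vx, (ast.AST, list)) or isinstance(vy, (ast.AST, list)):
                subtrees.append((vx, vy))
            elif vx != vy:
                # A leaf value (identifier name, constant) differs:
                # the whole enclosing node becomes the parameter.
                params.append((_text(x), _text(y)))
                return
        for vx, vy in subtrees:
            _compare(vx, vy, params)
    elif x != y:
        # Different node kinds, e.g. an identifier replaced by an expression.
        params.append((_text(x), _text(y)))


def clone_parameters(src_a: str, src_b: str) -> list[tuple[str, str]]:
    """List the (fragment A, fragment B) subtree pairs where the clones vary."""
    params: list[tuple[str, str]] = []
    _compare(ast.parse(src_a).body, ast.parse(src_b).body, params)
    return params


SRC_A = """\
# add up the order
total = 0
for item in items:
    total += item.price
print(total)
"""

SRC_B = """\
total = 0
for item in items:  # a different comment
    total += item.price * item.qty
print(total)
"""

print(clone_parameters(SRC_A, SRC_B))  # [('item.price', 'item.price * item.qty')]
```

A token-based matcher would see `item.price` versus `item.price * item.qty` as a break in the token sequence; here it is reported as a single parameter whose value is a whole expression.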