Diff text documents but ignore single-character differences? Set a minimum edit distance filter?

I have two versions of a large book in txt format and I'd like to compare them to find significant changes between the versions, ignoring small single character differences.

There are lots of diffing tools that can ignore whitespace differences, but I also want to ignore small typos and differences of one or two characters. For example, one version of the book repeats the misspelling "leige" hundreds of times, and the next version corrects it to "liege". Some proper nouns have also changed their spelling. (I could make custom workarounds for each misspelling, but I would like something more general purpose.)

Since I only care about more significant multi-word differences, what I really want is a filter that ignores a changed line unless its Levenshtein edit distance is above some threshold.
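To make the idea concrete, here is a minimal sketch of such a filter, assuming Python and its standard `difflib` module. The function names `levenshtein` and `significant_changes` and the default threshold are hypothetical choices for illustration, not part of any existing tool. Changed lines are paired up where a replaced block has the same number of lines on both sides; otherwise the whole block is compared as one chunk.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # drop a character of a
                current[j - 1] + 1,            # add a character of b
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def significant_changes(old_lines, new_lines, threshold=2):
    """Yield (old_text, new_text) pairs whose edit distance exceeds threshold."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old_block, new_block = old_lines[i1:i2], new_lines[j1:j2]
        if tag == "replace" and len(old_block) == len(new_block):
            # Same number of lines on both sides: compare them pairwise.
            pairs = list(zip(old_block, new_block))
        else:
            # Insertions, deletions, uneven replacements: compare as one chunk.
            pairs = [("\n".join(old_block), "\n".join(new_block))]
        for old_text, new_text in pairs:
            if levenshtein(old_text, new_text) > threshold:
                yield old_text, new_text
```

Note that swapping two adjacent letters ("leige" to "liege") counts as distance 2 under plain Levenshtein, so the threshold has to be at least 2 to suppress that kind of typo. For a whole book, a compiled edit-distance library would likely be faster than this pure-Python loop.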

All the diff/comparison tools I have found seem to be designed with code in mind, so they lack any feature for ignoring small text changes. Google's diff_match_patch library is great for diffing plaintext and ignoring whitespace changes (demo here) but doesn't seem to have an out-of-the-box way to ignore single-character non-whitespace differences.

tl;dr: Are there any diff tools that can compare text documents but filter out minor single-character non-whitespace differences?

There is 1 solution below

In Beyond Compare you can define "replacements".

For example, differences are initially marked red.

Then you can go to Session->Session Settings and define a replacement.

Or, even easier, select the text and define the replacement immediately.

Now the difference is treated as unimportant and marked blue.

With one click you can ignore the unimportant differences (red arrow in the screenshot).
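If Beyond Compare is not available, the same replacement idea can be approximated in a script. This is only a rough sketch, assuming Python's standard `difflib`; the `REPLACEMENTS` table and the function names are hypothetical, and this is not how Beyond Compare works internally. Known spelling variants are normalized on both sides before diffing, so they no longer show up as changes.

```python
import difflib

# Hypothetical table of known, unimportant spelling changes (old -> new).
REPLACEMENTS = {"leige": "liege"}


def normalize(line, replacements):
    """Rewrite every known old spelling in a line to its new spelling."""
    for old, new in replacements.items():
        line = line.replace(old, new)
    return line


def changed_lines(old_lines, new_lines, replacements=REPLACEMENTS):
    """Return the +/- lines of a unified diff taken after normalization."""
    old_norm = [normalize(line, replacements) for line in old_lines]
    new_norm = [normalize(line, replacements) for line in new_lines]
    diff = difflib.unified_diff(old_norm, new_norm, lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

Plain substring replacement can also hit longer words that contain the old spelling, so a real table might need word-boundary matching.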

Technical remark: I use Beyond Compare 4 (BC4), Pro edition.