How can I merge multiple annotations to the same text file?


Let's say I have a file that represents a "source document" with the following text:

Source/Original Document:

A quick brown fox jumped over the log

This source document has been annotated by different authors who each have highlighted different parts of the text. We can assume that no content from the original document has been deleted, nor has any new text (other than the annotation tokens) been added.

Modified/Annotated Document #1:

A quick <annotation>brown fox</annotation> jumped over the log

Modified/Annotated Document #2:

A quick brown fox <annotation>jumped over</annotation> the log

Modified/Annotated Document #3:

A <annotation>quick</annotation> brown fox jumped over the log

My problem: I need to automatically merge these different annotations into the original text and produce a single document.

Merged Final Document:

A <annotation>quick</annotation> <annotation>brown fox</annotation> <annotation>jumped over</annotation> the log
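To make the goal concrete, here is a minimal sketch of the merge I am after, written in Python. It is not taken from any existing tool. It assumes the markers are the literal strings `<annotation>` and `</annotation>`, that annotations never nest or overlap, and that stripping the markers from any annotated document gives back the original text exactly. The names `extract_spans` and `merge_annotations` are hypothetical.

```python
import re

OPEN, CLOSE = "<annotation>", "</annotation>"
TAG = re.compile(r"</?annotation>")


def extract_spans(annotated):
    """Strip the markers and return (plain_text, spans).

    Each span is a (start, end) pair of offsets into the plain text.
    Assumes markers are well formed and never nested.
    """
    parts, spans = [], []
    plain_pos = 0   # current offset in the marker-free text
    last = 0        # current offset in the annotated text
    start = None    # plain-text offset of the currently open marker, if any
    for match in TAG.finditer(annotated):
        chunk = annotated[last:match.start()]
        parts.append(chunk)
        plain_pos += len(chunk)
        last = match.end()
        if match.group() == OPEN:
            start = plain_pos
        elif start is not None:
            spans.append((start, plain_pos))
            start = None
    parts.append(annotated[last:])
    return "".join(parts), spans


def merge_annotations(original, annotated_docs):
    """Collect every span as offsets into the original, then re-insert them."""
    spans = set()
    for doc in annotated_docs:
        plain, doc_spans = extract_spans(doc)
        if plain != original:
            raise ValueError("annotated document does not match the original")
        spans.update(doc_spans)
    merged = original
    # Work from the end of the text so earlier offsets remain valid.
    for start, end in sorted(spans, reverse=True):
        merged = merged[:start] + OPEN + merged[start:end] + CLOSE + merged[end:]
    return merged


original = "A quick brown fox jumped over the log"
docs = [
    "A quick <annotation>brown fox</annotation> jumped over the log",
    "A quick brown fox <annotation>jumped over</annotation> the log",
    "A <annotation>quick</annotation> brown fox jumped over the log",
]
print(merge_annotations(original, docs))
```

For the three documents above this prints the merged final document shown earlier. Overlapping annotations from different authors are not handled by this sketch and would need a policy of their own.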

I have tried the following approaches and consistently failed to accomplish my objective:

Diff/Patch

If I attempt to diff the different annotated documents, the resulting patches will each simply overwrite the previously applied one.

Calculating diffs between each annotated document and the original text does seem to produce a workable end product for a small number of patches. My typical use case, however, can include dozens of annotations within a document, and the dozens of patches these annotations produce inevitably lead to merge failures. I have not determined the exact cause of the failures, but my best guess is that the positions calculated for the unified diffs are based on the original, unmodified document. Once a number of patches have been applied, subsequent patches are no longer dealing with destination content that can be addressed using the original positions.
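The drift I suspect can be illustrated with a small Python sketch. This is only a model of my guess, not a reproduction of what diff or patch actually do internally: each edit is a hypothetical (offset, token) pair whose offset refers to the original text, and `apply_in_order` is a helper name I made up.

```python
original = "A quick brown fox jumped over the log"

# Each edit is (offset into the ORIGINAL text, token to insert there).
insertions = [
    (2, "<annotation>"),
    (7, "</annotation>"),
    (8, "<annotation>"),
    (17, "</annotation>"),
]


def apply_in_order(text, edits):
    """Insert each token at its recorded offset, in the order given."""
    for offset, token in edits:
        text = text[:offset] + token + text[offset:]
    return text


# Ascending order: every insertion shifts the text that follows it, so the
# later offsets (still relative to the original) land in the wrong place.
naive = apply_in_order(original, insertions)

# Descending order: text before each offset is still untouched when the
# edit is applied, so the original offsets remain valid.
safe = apply_in_order(
    original, sorted(insertions, key=lambda edit: edit[0], reverse=True)
)

print(naive)
print(safe)
```

Only the descending-order result has the markers around "quick" and "brown fox". If something similar happens with stacked patches, it would explain why the first few apply cleanly and later ones fail.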

I attempted to use Neil Fraser's diff_match_patch library to accrue these patches in the hope that the algorithm in patch_make would (magically) recalculate the diffs. I have also attempted to use git tools (like git merge-file), but failed, likely for the same reasons as above.

Quilt

Quilt sounds like it ought to be the exact solution to my problem because it allows for "stacking patches". But I have struggled to get this to work at all. I have tried:

quilt new multiPatch
quilt import modifiedFile1.patch
quilt import modifiedFile2.patch
quilt add originalText.txt
quilt refresh

Nothing in patch patches/patchMeUp

I am not entirely clear whether I can use the patches produced by the diff and patch programs. I am also not clear whether quilt is expecting the patches produced, or the diffs. The examples I have found assume patching a source tree or multiple patches modifying multiple files.

My questions:

  • Is there a simple tool or (even better) an API or library to accomplish what I am looking for?
  • Have I done something wrong with my above approaches that I can fix to get my desired outcome?

Thank you!
