Question 1: I am currently working on a project where we translate English content into 17 other languages. To reduce translation costs, we compute an MD5 hash of each topic and use the result to decide whether the topic is new (Master) or was translated earlier (Obsolete). However, the logic has become quite complicated and we want to reduce the complexity. We are also using FileNet as our content management system, which is quite old. :) Essentially, I am looking for the best approach to content de-duplication other than MD5 hashing.
Note: a "topic" here means an XML file with images that is rendered via XSLT; it does not follow the DITA standard.
Question 2: What is the best alternative for rendering a non-standard, non-DITA XML file in the UI, e.g. as HTML or PDF?
Thanks in advance... waiting for your suggestions.
Question 1
I recommend not relying on hashes or timestamps alone, but that depends on your environment. If you refactor variables, change indentation, or add/remove comments, the content itself does not change and a translation should not be triggered; in that case you could rely on metadata to drive a semi-automatic process. In addition, you could use a diffing mechanism to compare the current version of a document with an earlier one.
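One way to make such a comparison insensitive to cosmetic edits is to canonicalize the XML (dropping comments and insignificant whitespace) before hashing or diffing. Below is a minimal sketch in Python, assuming lxml is available; the file name and the stored fingerprint are placeholders standing in for whatever your CMS keeps per topic.

```python
import hashlib
from lxml import etree

def content_fingerprint(path):
    """Hash the canonicalized content of an XML topic, so that
    indentation or comment changes do not alter the fingerprint."""
    parser = etree.XMLParser(remove_blank_text=True, remove_comments=True)
    tree = etree.parse(path, parser)
    canonical = etree.tostring(tree, method="c14n")  # canonical XML (C14N)
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical usage: compare against the fingerprint stored when the
# topic was last translated; only a mismatch means real content changed.
# if content_fingerprint("topic.xml") != previously_stored_fingerprint:
#     send the topic for translation
```

The same canonicalized output can also be fed to a text differ if you want to show reviewers exactly what changed instead of just flagging the topic.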
Question 2
As with the first question, this one is hard to answer without knowing your environment. It would probably be smarter to first convert your files to DITA or Markdown and then use the DITA-OT or a Markdown processor for the subsequent transformation.
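If converting to DITA is not an option and you already have XSLT stylesheets, you can keep rendering the topics directly. The sketch below transforms a topic to HTML with lxml; the topic and stylesheet names are placeholders. For PDF you would typically transform to XSL-FO and run a formatter such as Apache FOP instead.

```python
from lxml import etree

def render_topic_html(topic_path, stylesheet_path):
    """Apply an existing XSLT stylesheet to a non-DITA topic and return HTML."""
    transform = etree.XSLT(etree.parse(stylesheet_path))
    html_tree = transform(etree.parse(topic_path))
    return etree.tostring(html_tree, method="html", pretty_print=True)

# Placeholder names; substitute your own topic and stylesheet.
html = render_topic_html("topic.xml", "topic-to-html.xsl")
with open("topic.html", "wb") as out:
    out.write(html)
```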