Google Translate messes up the markup in my file


I am trying to use Google Translate to localize an XML file. It has nearly 350K lines, and some of them contain markup for in-game font size and color, like so:

<replacement>&lt;p horizontalalignment="center"&gt;&lt;br/&gt;&lt;image enablescale="false" imagesetpath="00015590.InterD_Jeryoung_3"/&gt;&lt;br/&gt;&lt;image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Six_Superior" scalerate="1.5"/&gt;&lt;image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Four_Superior" scalerate="1.5"/&gt;&lt;br/&gt;&lt;image enablescale="true" imagesetpath="00009499.Field_Boss" scalerate="1.4"/&gt;Хмельной лик&lt;br/&gt;&lt;br/&gt;&lt;/p&gt;Уничтожить зараженных насекомых&lt;br/&gt;возле мест обитания их королевы。&lt;br/&gt;</replacement>

Now, for god knows what reason, Google Translate alters that markup during translation into something unusable, like so:

<replacement> <p horizontalalignment="center"> <br/> <image enablescale="false" imagesetpath="00015590.InterD_Jeryoung_3"/> <br/> <image enablescale = "true "imagesetpath =" 00015590.Tag_Dungeon_Six_Superior "scalerate =" 1.5 "/> <image enablescale="true" imagesetpath="00015590.Tag_Dungeon_Four_Superior" scalerate="1.5"/> <br/> <image enablescale = "true" imagesetpath = "00009499.Field_Boss" scalerate = "1.4" /> Intoxicated face <br/> <br/> </ p> Destroy infected insects <br/> habitats near their queen. <br/> </ replacement>

Is there any way to avoid that, and why exactly is it happening? Any help on this is appreciated, thanks.

EDIT: I am also looking for a way to feed in my text and get it back in the same language, so that only the markup mishaps change. That way I could isolate them, build a comparison table, and use it to fix the errors after the actual translation is done. But I don't see a way to select the same language as both input and output in Google Translate; it always forces me to choose a different one. That kind of makes sense, but if there is a way to do it, I might be able to work around the problem.

Best answer

Do not feed your XML file to Google Translate; as far as I know, it doesn't understand XML.

Extract the text from the XML file.

Feed the text to the translator.

Transform the text back to XML.

You could simply transform the XML into a text document with a single line per XML element, so it would be easier to turn it back into XML.
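The extract / translate / restore steps could be sketched roughly like this in Python. This is only an illustration, not a tested pipeline: it assumes the translatable elements are all called `replacement` (as in the question), that no text run contains a line break, and it goes one step further than one line per element by splitting each element into text runs so the inline tags never reach the translator. `FAKE_TRANSLATION` is a made-up stand-in for the real translation step.

```python
import re
import xml.etree.ElementTree as ET

# Inline markup stored (escaped) inside the element text, e.g. <br/> or <image .../>.
TAG = re.compile(r"(<[^>]+>)")


def split_runs(text):
    """Split text into alternating pieces: even indexes are text, odd are tags."""
    return TAG.split(text or "")


def extract_segments(root, tag="replacement"):
    """Collect the translatable text runs in document order."""
    segments = []
    for elem in root.iter(tag):
        for i, part in enumerate(split_runs(elem.text)):
            if i % 2 == 0 and part.strip():
                segments.append(part)
    return segments


def restore_segments(root, translated, tag="replacement"):
    """Put translated runs back by position; tags are copied unchanged."""
    pending = iter(translated)
    for elem in root.iter(tag):
        parts = split_runs(elem.text)
        for i, part in enumerate(parts):
            if i % 2 == 0 and part.strip():
                parts[i] = next(pending)
        elem.text = "".join(parts)


SAMPLE = (
    '<root><replacement>&lt;p horizontalalignment="center"&gt;Хмельной лик'
    "&lt;br/&gt;&lt;/p&gt;Уничтожить зараженных насекомых&lt;br/&gt;"
    "</replacement></root>"
)

# Hypothetical stand-in for the translator: a fixed lookup table.
FAKE_TRANSLATION = {
    "Хмельной лик": "Intoxicated face",
    "Уничтожить зараженных насекомых": "Destroy infected insects",
}

root = ET.fromstring(SAMPLE)
segments = extract_segments(root)  # write these out, one per line
translated = [FAKE_TRANSLATION[s] for s in segments]
restore_segments(root, translated)
result = ET.tostring(root, encoding="unicode")  # tags come back re-escaped
```

Because the runs are matched back purely by position, the translated file must keep exactly the same number of lines in the same order.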

More detail

According to the Google Translator Toolkit, you can upload:

HTML (.HTML)
Microsoft Word (.DOC/.DOCX)
OpenDocument Text (.ODT)
Plain Text (.TXT)
Rich Text (.RTF)
Wikipedia URLs

Plus a couple of extras, such as JSON. So no XML.

The best way I see is to transform your XML document into one of these types (I would probably use JSON), and to do it in such a way that it can easily be transformed back again, either by position (line 1 of the text file is the first element in the XML document) or by an id (add the id or position of the element in the XML hierarchy to the JSON element).
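As a rough sketch of the id-based variant in Python: each element is exported as a JSON record carrying its position as an `id`, and the import matches records back by that `id`. The record layout (`id`, `text`) and the element name `replacement` are assumptions made for the example, and whether a given translation tool leaves the `id` values alone is something you would have to check.

```python
import json
import xml.etree.ElementTree as ET


def export_json(root, tag="replacement"):
    """Serialize each element's text with its document position as an id."""
    records = [
        {"id": index, "text": elem.text or ""}
        for index, elem in enumerate(root.iter(tag))
    ]
    return json.dumps(records, ensure_ascii=False, indent=2)


def import_json(root, payload, tag="replacement"):
    """Write translated text back, matching records to elements by id."""
    by_id = {record["id"]: record["text"] for record in json.loads(payload)}
    for index, elem in enumerate(root.iter(tag)):
        if index in by_id:
            elem.text = by_id[index]


doc = ET.fromstring(
    "<root><replacement>Хмельной лик</replacement>"
    "<replacement>Уничтожить&lt;br/&gt;</replacement></root>"
)
payload = export_json(doc)

# Stand-in for translation: change a "text" value, keep every "id".
records = json.loads(payload)
records[0]["text"] = "Intoxicated face"
import_json(doc, json.dumps(records, ensure_ascii=False))
restored = ET.tostring(doc, encoding="unicode")
```

Unlike the purely positional approach, a record that goes missing during translation leaves its element untouched instead of shifting everything after it.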

My guess is that the Toolkit recognizes the HTML tags in the XML and escapes them. So another option might be to un-escape the &gt; to > and the &lt; to < before translating.
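If you want to try the un-escaping route, Python's standard library can convert in both directions. This is a minimal sketch on one fragment from the question; whether the translator then leaves the real tags alone is untested here, and the re-escape step assumes it does.

```python
from xml.sax.saxutils import escape, unescape

escaped = '&lt;p horizontalalignment="center"&gt;Хмельной лик&lt;br/&gt;&lt;/p&gt;'

# Turn the entities into real markup before sending the text off.
plain = unescape(escaped)

# After translation, escape again so the text fits back inside <replacement>.
round_trip = escape(plain)
```

Apply this to the content of each element only, not to the whole file, otherwise the outer `<replacement>` tags would get escaped on the way back.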