replace undefined character in file

          <ScreenOptions>
            <ScreenOption Visible="true" Locked="false" PrintCode="" DataType="Boolean" Description="Sill Support &lt;br&gt; (Champagne or Mill Finish sill support is always provided when jamb depth &gt; 8-5/8�)" ValueDescription="No" Sequence="1">
              <ComponentAttributeId>622</ComponentAttributeId>
            </ScreenOption>
          </ScreenOptions>

Hi, how can I replace this character � with a null value?

with open('decmpresed.txt', 'r') as file:
    filedata = file.read()
print(filedata)

# Replace the target string
filedata = filedata.replace('�', ' ')

# Write the file out again
with open('decompresed.txt', 'w') as file:
    file.write(filedata)

So far this code is not working for me. Any ideas, please?

Answered by Thomas Weller

What you see is the Unicode replacement character, U+FFFD. It means that your XML file was processed incorrectly with respect to its encoding somewhere along the way. Wherever you see the � sign, information has been lost irrecoverably. There is no way to get the original data back.
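To illustrate why the loss cannot be undone, here is a minimal sketch. It assumes, purely for illustration, that the original inch mark was a curly quote stored as Windows-1252 and that an earlier step decoded the bytes as UTF-8 with `errors="replace"`. The actual original character and encoding of your file are not known.

```python
# Hypothetical originals: two different characters that might have followed "8-5/8".
closing_quote = "8-5/8\u201d"   # RIGHT DOUBLE QUOTATION MARK
opening_quote = "8-5/8\u201c"   # LEFT DOUBLE QUOTATION MARK

# Assumed storage encoding (Windows-1252); each quote becomes one non-ASCII byte.
raw_a = closing_quote.encode("cp1252")
raw_b = opening_quote.encode("cp1252")

# Decoding with the wrong codec and errors="replace" substitutes U+FFFD
# for every byte sequence that is not valid UTF-8.
decoded_a = raw_a.decode("utf-8", errors="replace")
decoded_b = raw_b.decode("utf-8", errors="replace")

print(repr(decoded_a))          # '8-5/8\ufffd'
print(decoded_a == decoded_b)   # True: both originals collapse to the same text
```

Two different inputs end up as the same output, so nothing in the damaged text tells you which one you started with.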

My suggestion: wherever you got this XML from, ask them to generate a correct XML file.

You're the next person in the chain who doesn't understand Unicode and you're going to eliminate this clear indication that data was already lost. You are hiding a bug. I don't think that will go well in the long term.

When opening the file, you can specify an encoding, e.g.

with open('decmpresed.txt', 'r', encoding='utf-16-le') as file:
    filedata = file.read()

or whatever the encoding of the file actually is. The replacement will work once you get the encoding right.
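If you still decide to remove the character, something along these lines might work. This is only a sketch: the helper name `strip_replacement_char` and its parameters are made up for illustration, and the default `utf-8` is an assumption you would need to replace with the file's real encoding.

```python
from pathlib import Path

# U+FFFD written as an escape, so the script's own encoding cannot mangle it.
REPLACEMENT_CHAR = "\ufffd"


def strip_replacement_char(src, dst, encoding="utf-8", substitute=""):
    """Read src with an explicit encoding, replace every U+FFFD, write dst.

    Returns the number of replacement characters that were found.
    The encoding default is an assumption, not a property of your file.
    """
    text = Path(src).read_text(encoding=encoding)
    count = text.count(REPLACEMENT_CHAR)
    cleaned = text.replace(REPLACEMENT_CHAR, substitute)
    Path(dst).write_text(cleaned, encoding=encoding)
    return count
```

A call could look like `strip_replacement_char('decmpresed.txt', 'decompresed.txt', encoding='utf-16-le')`. The returned count is worth logging, since each occurrence marks a place where data was already lost upstream.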