Python is replacing characters in my string when reading it from a text document


When reading text from a text document, Python seems to be replacing parts of it with other characters.

Here are the contents of the text document:

    zKeh&aZTo@kgLPo2
    r#Zd[$xcGa()rd:l
    asdf uo NAgyu$\+
    vB=# dsU Zkd sdV
    bb !w#d#Jkr{Pd$}
    CehD *(T izP dx 
    mMoOww}lk~"cizPx
    czgjueo#z@vruo<>
    g$$ \|T{ Z$$ikmp

(We are decrypting this for a school project. Since the decryption code runs after Python changes the string, I'm not including it.)

This is the code used to read the text document:

    # Open both the rules and test
    fileTextDoc = open("test.txt")
    fileRules = open("rule.txt")
    
    # Put each line of the text file into a list
    strTextDoc = fileTextDoc.readlines()

When readlines() runs, it puts each line into a list, but after this the strings all appear changed:

(I am aware that each line break in the text document produces an invisible '\n'; the '\n' is removed later on in the code.)

    0:'zKeh&aZTo@kgLPo2\n'
    1:'r#Zd[$xcGa()rd:l\n'
    2:'asdf uo\tNAgyu$\\+\n'
    3:'vB=# dsU Zkd sdV\n'
    4:'bb !w#d#Jkr{Pd$}\n'
    5:'CehD *(T izP dx\t\n'
    6:'mMoOww}lk~"cizPx\n'
    7:'czgjueo#z@vruo<>\n'
    8:'g$$ \\|T{ Z$$ikmp\n'

There are 4 best solutions below

1
On

If it is a list, use a list comprehension to do the replacement for each element, replacing each "\n" with an empty string:

    strTextDoc = [s.replace('\n', '') for s in fileTextDoc.readlines()]
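As a variation on the same idea, here is a minimal sketch that strips only the trailing newline with rstrip("\n"). It assumes an in-memory io.StringIO object as a stand-in for open("test.txt"), loaded with two lines from the question; the name fake_file is hypothetical.

```python
import io

# Stand-in for open("test.txt"): two lines copied from the question's file.
fake_file = io.StringIO("asdf uo\tNAgyu$\\+\nCehD *(T izP dx\t\n")

# rstrip("\n") removes only the trailing newline, so the trailing tab
# on the second line is kept. A bare strip() would remove it as well.
lines = [line.rstrip("\n") for line in fake_file]

print(lines)  # ['asdf uo\tNAgyu$\\+', 'CehD *(T izP dx\t']
```

Each resulting string is 16 characters long, the same as the lines in the text document.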

0
On

Nothing printed is different from the input. \t is a tab (which is in the input), and \\ is the escape code for the \ character. And, as you know, the \n is a newline.
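A short sketch of the difference, assuming the list display in the question comes from a debugger or REPL that shows each string's repr(). The variable names are illustrative only.

```python
# Third line of the file, as readlines() would return it.
line = "asdf uo\tNAgyu$\\+\n"

# repr() spells out the tab, backslash and newline as escape sequences.
shown_by_debugger = repr(line)
print(shown_by_debugger)  # 'asdf uo\tNAgyu$\\+\n'

# print() writes the actual characters: a real tab and a single backslash.
print(line, end="")
```

The string itself contains exactly one backslash; only its displayed form doubles it.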

1
On

You may want to look into string literals and escape sequences. Much like the newline character, these characters only show up as escapes in the string's representation; they aren't actually new characters. If you check, you will find that the length of each line (via len()) is the same as the length of your raw text, apart from the trailing newline.

So, for example, \\ is actually what we see as \, but because \ introduces escape sequences (like the newline \n), the representation shows it as \\.
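The length check can be sketched as follows, using the last line of the question's file. The raw literal (r"...") holds exactly the characters in the file, and the second literal is the escaped form from the list display; both names are illustrative.

```python
raw = r"g$$ \|T{ Z$$ikmp"     # raw literal: exactly the characters in the file
shown = "g$$ \\|T{ Z$$ikmp"   # the same text written with an escaped backslash

same_text = raw == shown
print(same_text, len(raw), len(shown))  # True 16 16
```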

0
On

The characters that appear to be added are \t and an extra \. \t appears where what looks like a space is actually a tab character, and \ is displayed as \\ because it is being escaped. Since \ introduces escape sequences (\n, \t, etc.), a literal \ is written with an extra \ to distinguish it.

If the text file literally contains abc\n (a backslash followed by n), then displaying it without escaping the \ would make it indistinguishable from a newline. To avoid this, every \ is escaped in the representation.
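To make that case concrete, here is a small sketch that assumes a hypothetical file whose first line is the text abc followed by a literal backslash and n, then a real line break. An io.StringIO object stands in for the file.

```python
import io

# Hypothetical file contents: a, b, c, a literal backslash, the letter n,
# and then a real newline character.
fake_file = io.StringIO("abc\\n\n")
line = fake_file.readline()

print(repr(line))  # 'abc\\n\n'
print(len(line))   # 6
```

In the repr, \\n is the literal backslash plus n from the file, while the final \n is the real newline, so the two stay distinguishable.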