I have a large text file that contains many "\t" sequences, and I want to remove them.
I tried the sub function from the re module. When I paste part of the file into the code as a multiline string, I get the desired output, but when I pass the string read from the file, the output is identical to the input.
Below is my code:
import re
from pathlib import Path
s = """
{
device_id: '2ysr9t',
message: '[44,"139076297","xyz",{\n' +
'\t"connectorId":\t1,\n' +
'\t"transactionId":\t654954,\n' +
'\t"Value":\t[{\n' +
'\t\t\t"timestamp":\t"2023-11-23T00:21:25Z",\n' +
'\t\t\t"Value":\t[{\n' +
'\t\t\t\t\t"value":\t"86237168.0",\n' +
'\t\t\t\t\t"context":\t"Periodic",\n' +
'\t\t\t\t\t"format":\t"Raw",\n' +
}"""
data_folder = Path("2ysr9t.txt")
with open(data_folder, 'r', encoding="utf8") as infile:
    s1 = infile.read()
# String read from the file
new_string = re.sub('\t','', s1)
print(new_string)
# String defined in the code
new_string1 = re.sub('\t','',s)
print(new_string1)
Output from the text file:
{
device_id: '2ysr9t',
message: '[44,"139076297","xyz",{\n' +
'\t"connectorId":\t1,\n' +
'\t"transactionId":\t654954,\n' +
'\t"Value":\t[{\n' +
'\t\t\t"timestamp":\t"2023-11-23T00:21:25Z",\n' +
'\t\t\t"Value":\t[{\n' +
'\t\t\t\t\t"value":\t"86237168.0",\n' +
'\t\t\t\t\t"context":\t"Periodic",\n' +
'\t\t\t\t\t"format":\t"Raw",\n' +
}
Output from the multiline string defined in the code:
{
device_id: '2ysr9t',
message: '[44,"139076297","xyz",{
' +
'"connectorId":1,
' +
'"transactionId":654954,
' +
'"Value":[{
' +
'"timestamp":"2023-11-23T00:21:25Z",
' +
'"Value":[{
' +
'"value":"86237168.0",
' +
'"context":"Periodic",
' +
'"format":"Raw",
' +
}
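Looks like your text file contains the literal two-character sequence '\t' (a backslash followed by the letter t), not actual tab characters. Escape sequences like '\t' and '\n' are only processed in string literals in Python source code; they are not processed when you read text from a file. The pattern '\t' therefore matches a real tab, which your file does not contain. To match the literal text, escape the backslash: re.sub(r'\\t', '', s1), or skip the regex and use s1.replace('\\t', '').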
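
A minimal sketch of the difference, assuming the file really holds a backslash followed by t. The sample strings and variable names here are hypothetical stand-ins for the file contents; printing repr() of the real s1 is a quick way to check which case applies.

```python
import re

# Hypothetical sample standing in for text read from the file. The raw-string
# prefix keeps each \t as two characters (backslash + t), as read() would.
from_file = r'\t"connectorId":\t1,\n'

# The same text as a normal literal: Python turns each \t into a real tab.
from_literal = '\t"connectorId":\t1,\n'

print(repr(from_file))     # backslashes shown doubled: literal backslash + t
print(repr(from_literal))  # shown as \t: real tab characters

# The pattern '\t' matches a real tab, so the file text is left unchanged.
unchanged = re.sub('\t', '', from_file)

# Escaping the backslash matches the literal two-character sequence.
fixed_regex = re.sub(r'\\t', '', from_file)

# Plain str.replace does the same without a regular expression.
fixed_replace = from_file.replace('\\t', '')

# Removes both literal "\t" sequences and real tabs, if the data mixes them.
fixed_both = re.sub(r'\\t|\t', '', from_file + from_literal)
```

For a fixed substring like this, str.replace is usually the simpler choice; the regex form is mainly useful when several patterns need to be removed in one pass.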