Remove '\t' from a given text file using Python


I have a large text file littered with the "\t" character, and I want to get rid of it.

I tried using the sub function from the re module. When I paste part of my file into the code as a multi-line string, I get the desired output, but when I pass in the string read from my file, the output is the same as the input.

Below is my code:

import re
from pathlib import Path

s = """
{
  device_id: '2ysr9t',
  message: '[44,"139076297","xyz",{\n' +
    '\t"connectorId":\t1,\n' +
    '\t"transactionId":\t654954,\n' +
    '\t"Value":\t[{\n' +
    '\t\t\t"timestamp":\t"2023-11-23T00:21:25Z",\n' +
    '\t\t\t"Value":\t[{\n' +
    '\t\t\t\t\t"value":\t"86237168.0",\n' +
    '\t\t\t\t\t"context":\t"Periodic",\n' +
    '\t\t\t\t\t"format":\t"Raw",\n' +
    }"""

data_folder = Path("2ysr9t.txt")

with open (data_folder, 'r', encoding="utf8") as input:
    s1 = input.read()
    new_string = re.sub('\t','', s1)
    print(new_string)

new_string1 = re.sub('\t','',s)
print(new_string1)

Output from the text file:

{
  device_id: '2ysr9t',
  message: '[44,"139076297","xyz",{\n' +
    '\t"connectorId":\t1,\n' +
    '\t"transactionId":\t654954,\n' +
    '\t"Value":\t[{\n' +
    '\t\t\t"timestamp":\t"2023-11-23T00:21:25Z",\n' +
    '\t\t\t"Value":\t[{\n' +
    '\t\t\t\t\t"value":\t"86237168.0",\n' +
    '\t\t\t\t\t"context":\t"Periodic",\n' +
    '\t\t\t\t\t"format":\t"Raw",\n' +
    }

Output from the multi-line string given within the code file:

{
  device_id: '2ysr9t',
  message: '[44,"139076297","xyz",{
' +
    '"connectorId":1,
' +
    '"transactionId":654954,
' +
    '"Value":[{
' +
    '"timestamp":"2023-11-23T00:21:25Z",
' +
    '"Value":[{
' +
    '"value":"86237168.0",
' +
    '"context":"Periodic",
' +
    '"format":"Raw",
' +
    }

There is 1 solution below

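It looks like your text file contains the literal two-character sequence '\t' (a backslash followed by the letter t), not actual tab characters. Escape sequences such as '\t' and '\n' are only interpreted in string literals inside Python source code; they are not processed when you read text from a file. That is why re.sub('\t', '', s1) finds nothing to replace: the pattern is a real tab character, which the file does not contain, whereas the string s in your code does contain real tabs because Python converted the escapes when it parsed the literal. To remove the literal sequence, match a backslash followed by t instead, for example s1.replace('\\t', '') or re.sub(r'\\t', '', s1).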
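Here is a minimal sketch of the difference, assuming the file really holds a backslash followed by t rather than tab characters. The name remove_literal_tabs and the sample line are made up for illustration; the raw string stands in for what input.read() would return.

```python
import re


def remove_literal_tabs(text: str) -> str:
    """Remove every backslash-followed-by-'t' sequence from text.

    Hypothetical helper for illustration; real tab characters are left alone.
    """
    # '\\t' in Python source is two characters: a backslash and the letter t.
    return text.replace('\\t', '')


# A raw string keeps the backslashes, which mimics text read from the file.
sample = r'\t"connectorId":\t1,\n'

# The original pattern '\t' is a real tab character, so nothing matches.
unchanged = re.sub('\t', '', sample)

# Both of these match the literal backslash + t sequence instead.
cleaned = remove_literal_tabs(sample)
cleaned_re = re.sub(r'\\t', '', sample)

print(unchanged == sample)    # the tab pattern changed nothing
print(cleaned)
print(cleaned == cleaned_re)  # str.replace and the regex agree
```

For a fixed sequence like this, str.replace is enough and avoids regex escaping altogether. If the file might also contain real tabs, chaining .replace('\t', '') would remove those as well.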