Read files with different encodings using sys.stdin in Python 3


I have many files, some encoded as UTF-8 and some as GBK. My system encoding is UTF-8 (LANG=zh_CN.UTF-8), so I can read the UTF-8 files easily, but I must read the GBK-encoded files as well. I'm following Python 3: How to specify stdin encoding here:

import sys 
import io
input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='gbk')
for line in input_stream:
    print(line)

My question is: how can I safely read all the files (both GBK and UTF-8) from sys.stdin? Or is there a better solution?

To slightly expand on this question, I want to handle files like this:

cat *.in | python3 handler.py

*.in matches many files, each encoded as either UTF-8 or GBK.

If I use the following code in handler.py

for line in sys.stdin:
    ...some code

it will throw an error as soon as it tries to process a GBK file:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 0: invalid continuation byte

On the other hand, if I use code like this:

input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='gbk')
for line in input_stream:
    ...some code

it will throw an error on any UTF-8 file:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 25: illegal multibyte sequence

I want to find a safe way to handle both types of files (UTF-8 and GBK) within my script.

1 Answer (accepted)

You can read the input as raw bytes and then examine it to decide which encoding to decode it with.

See also Reading binary data from stdin
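To make "raw bytes" concrete: iterating over the binary stream yields undecoded bytes lines. A minimal sketch, using io.BytesIO as a stand-in for sys.stdin.buffer (which behaves the same way but can't be demonstrated outside a pipe):

```python
import io

# io.BytesIO behaves like sys.stdin.buffer: iterating a binary stream
# yields bytes objects, one per line, each still ending in b'\n'.
stream = io.BytesIO('第一行\n'.encode('utf-8') + '第二行\n'.encode('gbk'))

raw_lines = list(stream)
# Each element is bytes, not str, so no decoding has happened yet.
assert all(isinstance(line, bytes) for line in raw_lines)
assert len(raw_lines) == 2
```

In the real script you would simply iterate `sys.stdin.buffer` instead of the BytesIO object.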

Assuming you can read entire lines at a time (i.e. the encoding is consistent within each line), I'd try to decode as UTF-8 first, then fall back to GBK.

import sys

# Iterate the binary buffer so each line arrives as undecoded bytes.
for raw_line in sys.stdin.buffer:
    try:
        line = raw_line.decode('utf-8')
    except UnicodeDecodeError:
        line = raw_line.decode('gbk')
    # ...
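The same fallback can be wrapped in a small helper, which makes it easy to verify in isolation. This is a sketch under the same line-level assumption; note the fallback is heuristic — a GBK line whose bytes happen to form valid UTF-8 would be mis-decoded, though UTF-8 validation is strict enough that this is rare in practice:

```python
def decode_line(raw_line: bytes) -> str:
    """Decode one bytes line, trying UTF-8 first and falling back to GBK."""
    try:
        return raw_line.decode('utf-8')
    except UnicodeDecodeError:
        return raw_line.decode('gbk')

# '你好' encodes to b'\xc4\xe3\xba\xc3' in GBK, which is not valid UTF-8
# (0xe3 is not a continuation byte), so the fallback branch is taken
# and the text round-trips correctly under both encodings.
assert decode_line('你好'.encode('utf-8')) == '你好'
assert decode_line('你好'.encode('gbk')) == '你好'
```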