I am working on a project whose files use different encodings (my OS is CentOS 7).
For example, $SRC/a.cpp may be encoded in UTF-8, while $SRC/b.cpp is encoded in GB 2312 (Simplified Chinese).
Now if I run git diff, the content does not display properly because of the mixed encodings.
I've tried iconv like this:
git diff HEAD~1 | iconv -f gb2312 -t utf8 | less
It works well if all the files involved are encoded in GB 2312. But if any UTF-8 file is mixed in, iconv breaks like this:
some well displayed UTF-8 text
...
iconv: illegal input sequence at position 120
My question is whether it is possible to make commands like git diff work properly without changing the files themselves. I hope there is some script that filters non-UTF-8 files for iconv, or some Git configuration that runs iconv on non-UTF-8 files only.
Edit: The client of this project requires some files to have specific encodings and wants as few changes as possible for stability, so modifying the files' encodings directly is not possible. A workaround that does not modify the project is preferred.
You might need a git config diff driver. That driver script would first identify the encoding of each file and then convert it to UTF-8 if necessary before showing the diff.
Create a shell script (for instance git-diff-encoding.sh, made executable with chmod +x git-diff-encoding.sh) which identifies the encoding of each file and converts it to UTF-8 if necessary before showing the diff.
In your .git/config file, define a new diff driver called "encoding".
Tell Git which files should be handled by this new diff driver. You can do this in your repository's .gitattributes file (create it, if it does not exist, at the root folder of your Git repository), adding lines that specify the files to be handled by the new diff driver.
Now Git will use your custom diff script when running git diff for files matching the patterns specified in the .gitattributes file.
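As a rough illustration of these steps, here is a minimal sketch of what git-diff-encoding.sh could look like. It is not necessarily the exact script intended above: it assumes the driver is wired up through Git's diff.<driver>.textconv setting (so the script only prints a UTF-8 version of one file and Git produces the diff itself), it assumes every file that is not valid UTF-8 is GB 2312 as in the question, and the function name to_utf8 and the path in the comments are placeholders.

```shell
#!/bin/sh
# git-diff-encoding.sh: hypothetical textconv helper for a diff driver named "encoding".
#
# Registration, run once inside the repository (the script path is a placeholder):
#   git config diff.encoding.textconv /path/to/git-diff-encoding.sh
#   echo '*.cpp diff=encoding' >> .gitattributes

# Print the file given as $1 in UTF-8.
# Assumption: anything that is not valid UTF-8 is encoded in GB 2312.
to_utf8() {
    # iconv exits non-zero on an illegal input sequence, so a clean
    # UTF-8 -> UTF-8 pass means the file is already valid UTF-8 (or plain ASCII).
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        cat "$1"
    else
        iconv -f GB2312 -t UTF-8 "$1"
    fi
}

# Git calls a textconv program with a single argument: the path of a
# temporary file holding the blob to be displayed.
if [ "$#" -ge 1 ]; then
    to_utf8 "$1"
fi
```

With this in place, git diff HEAD~1 | less should show both kinds of files as UTF-8 while the files in the repository stay untouched. The converted text is for display only, so a diff produced this way is not meant to be applied as a patch.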