After yet another git pull, my project stopped building with a bunch of messages like this:
error: unmappable character for encoding UTF-8
The messages point to the copyright symbol in the headers of some files. Many other files contain the same symbol and compile fine. In a hex editor, the good ones show it as:
C2 A9
while the bad ones show:
A9
In vim both are displayed as © (<©> 169, Hex 00a9, Octal 251), but IntelliJ IDEA shows the bad ones as a diamond.
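To list every affected file instead of opening them one by one in a hex editor, a small sketch like the following strictly decodes each file as UTF-8 and prints the ones that fail. That is roughly the check javac trips over when it reads sources as UTF-8. The class name Utf8Check and the assumption that the sources are *.java files under one root directory are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Utf8Check {

    /** Returns true if the bytes are valid UTF-8 (strict decoding, no replacement characters). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lists *.java files under root whose contents are not valid UTF-8 (assumed layout). */
    static List<Path> findNonUtf8Sources(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(p -> {
                        try {
                            return !isValidUtf8(Files.readAllBytes(p));
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : findNonUtf8Sources(root)) {
            System.out.println(file);
        }
    }
}
```

Running it from the project root should print the same files javac complains about, without going through Maven.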
So I assumed I had messed something up while merging (there were merge conflicts after the pull) and went to look at which files were changed with:
git diff-tree --no-commit-id --name-only -r --full-index --binary 91cbe7b753d39905372c1ea41e04e7a3dbd2566e
but it produces no output. It finds no changes for the previous commit either. The log looks like this:
commit 91cbe7b753d39905372c1ea41e04e7a3dbd2566e
Merge: d7b4ae9 0dfc198
Author: Me Me <[email protected]>
Date: Wed Dec 23 17:50:46 2015 +0100
Merge branch 'development' of ssh://fsstash.cool.com:7999/our/server into my-branch
commit 0dfc19850b2e31d72c1d2923321430e8fc1b53cb
Merge: 724b8a7 d3478f9
Author: Good Guy <[email protected]>
Date: Wed Dec 23 14:34:33 2015 +0200
Merge branch 'development' of ssh://fsstash.cool.com:7999/our/server into development
When I do git checkout 0dfc19850b2e31d72c1d2923321430e8fc1b53cb, everything compiles fine.
So the question is: how can I fix it?
By "fix" I mean understanding what happened and (maybe) reapplying the pulled changes, so that I don't have to commit anything related to this fix to the upstream repo.
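One possible local workaround is to re-encode only the files that are not valid UTF-8. This sketch assumes every damaged file is entirely ISO-8859-1 (for ©, Windows-1252 uses the same byte); a file mixing UTF-8 and Latin-1 bytes would be damaged further. Latin1ToUtf8 is a hypothetical helper, not a git-based fix. It rewrites files in the working tree, so git diff should be checked afterwards to confirm the result matches upstream before anything is committed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Latin1ToUtf8 {

    /**
     * Returns the input unchanged if it is already valid UTF-8; otherwise
     * assumes it is ISO-8859-1 and returns the UTF-8 re-encoding.
     */
    static byte[] toUtf8IfLatin1(byte[] bytes) {
        try {
            // A decoder from newDecoder() reports malformed input by default,
            // so a lone 0xA9 byte makes this throw.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return bytes; // already valid UTF-8: leave untouched
        } catch (CharacterCodingException e) {
            // Every byte value maps to a character in ISO-8859-1, so this cannot fail.
            String text = new String(bytes, StandardCharsets.ISO_8859_1);
            return text.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path file = Paths.get(arg);
            byte[] original = Files.readAllBytes(file);
            byte[] converted = toUtf8IfLatin1(original);
            // Same array reference means the file was already UTF-8.
            if (converted != original) {
                Files.write(file, converted);
                System.out.println("converted: " + file);
            }
        }
    }
}
```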
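It seems the bad files store © as the single byte 0xA9, as ISO-8859-1/Windows-1252 would (U+00A9 is just the code point vim shows for both; UTF-16 would need two bytes), while the good ones use UTF-8 (0xC2 0xA9). What might have changed it?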
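To check which encoding produces each byte sequence, this sketch encodes © (U+00A9) in three charsets and prints the bytes in hex. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {

    /** Formats bytes as uppercase hex pairs separated by spaces, e.g. "C2 A9". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes s with the given charset and returns the bytes as hex. */
    static String encode(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // ©, code point U+00A9 (169)
        System.out.println("UTF-8:      " + encode(copyright, StandardCharsets.UTF_8));      // C2 A9
        System.out.println("ISO-8859-1: " + encode(copyright, StandardCharsets.ISO_8859_1)); // A9
        System.out.println("UTF-16BE:   " + encode(copyright, StandardCharsets.UTF_16BE));   // 00 A9
    }
}
```

The lone A9 seen in the bad files matches the ISO-8859-1 row, not UTF-16.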
The build system is Maven, but it is not the cause: bare javac reports the same error on a stripped-down copy of the file. The OS is Ubuntu 15.10, and locale says:
locale
LANG=ru_RU.UTF-8
LANGUAGE=ru:en
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC=ru_UA.UTF-8
LC_TIME=ru_UA.UTF-8
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY=ru_UA.UTF-8
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER=ru_UA.UTF-8
LC_NAME=ru_UA.UTF-8
LC_ADDRESS=ru_UA.UTF-8
LC_TELEPHONE=ru_UA.UTF-8
LC_MEASUREMENT=ru_UA.UTF-8
LC_IDENTIFICATION=ru_UA.UTF-8
LC_ALL=
java -version: 1.8.0_66.
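Without an explicit -encoding option, javac reads sources in the platform default charset, which on Linux is derived from the locale above. A quick way to confirm what the JVM actually picks up is a sketch like this (the class name is illustrative):

```java
import java.nio.charset.Charset;

public class DefaultCharset {

    /** Reports the default charset and the file.encoding property it comes from on Java 8. */
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

With LANG=ru_RU.UTF-8 this would be expected to report UTF-8, which is consistent with javac rejecting the lone 0xA9 byte.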
Any help is highly appreciated!
PS: I tried every --diff-algorithm={patience|minimal|histogram|myers}; git diff-tree still finds no changes.
PS: running git reset --hard HEAD~1 and then git pull origin development from the command line didn't help either, so it's not related to IDEA.
git diff --name-only is indeed better suited for parsing, as Git 2.32 (Q2 2021) illustrates: it clarifies that pathnames recorded in Git trees are most often (but not necessarily) encoded in UTF-8.
See commit 9364bf4 (20 Apr 2021) by Andrey Bienkowski (hexagonrecursion).
(Merged by Junio C Hamano -- gitster -- in commit 93e0b28, 30 Apr 2021)
diff-options now includes in its man page: