Update: Keeping an eye on this bug in GHC 9.0.1 as the likely culprit.
I'm seeing some strange Unicode behavior in my Haskell package when it builds under GHC 9.0.1. I understand that solving this may involve checking for changes in other Haskell packages, but my question here is whether the unexpected output I'm seeing rings any Unicode bells (Haskell or otherwise), so that I can begin to track down its cause.
Where I expect to see, respectively, `β` (or `\946`) and `γ` (or `\947`), I instead see `β?KQHTLXOCBJSPDZRAMEWNIUYGV` and `γ?EYJVCNIXWPBQMDRTAKZGFUHOS`.
This output also has some frustrating properties that make it hard to sort out what's going on:
- The garbage letters following the Greek character, though always the same on my local machine, are not the same as those I see in builds on other platforms (e.g., on Travis CI Focal I get `β?SOVPZJAYQUIRHXLNFTGKDCMB`).
- What I see and what I get when I paste what I see are different. Typically the leading and trailing garbage characters are truncated, so I assume the `?` is actually some special character.
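Because copy-and-paste is unreliable here, one way to see what the `?` really is might be to print each key in escaped form. This is a minimal sketch; `describe` is a hypothetical helper, not part of the package. It relies on the fact that `show` on a `String` escapes every non-ASCII or control character, so nothing can hide.

```haskell
import Data.Char (ord)

-- Render a string in a copy/paste-safe form: `show` escapes every
-- non-ASCII character, so "β" prints as "\946", and a hidden control or
-- replacement character also shows up as an explicit escape.
describe :: String -> String
describe s =
  show s
    ++ " length=" ++ show (length s)
    ++ " codepoints=" ++ show (map ord s)
```

With the package loaded in GHCi, something like `mapM_ (putStrLn . describe) rotors` would print one line per key. A correct `β` key should report `length=1 codepoints=[946]`.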
Critically, none of this was happening with pre-GHC 9 nightly resolvers.
Do the unexpected patterns of characters following the Greek characters correspond to anything that would help track down the source of my error? Is there something about how GHC 9, or the packages in the latest nightly Stackage resolvers, handle Unicode that could be causing this?
UPDATE: After some sleuthing, it doesn't look like dependencies have changed, and it seems to work on GHC 8.10 with the same dependencies as nightly (I think; still working on it). But it looks like something really weird is going on with the use of a Unicode character as a key.
```haskell
-- Imports assumed for this excerpt; Mapping is defined elsewhere in the package.
import qualified Data.Map as M
import Control.Arrow ((&&&))

type Name = String
type Wiring = Mapping
type Turnovers = String

data Component = Component {
  name :: !Name,           -- ^ The component's 'Name'.
  wiring :: !Wiring,       -- ^ The component's 'Wiring'.
  turnovers :: !Turnovers  -- ^ The component's 'Turnovers'.
}

-- Definitions of rotor Components; people died for this information
rots_ :: M.Map Name Component
rots_ = M.fromList $ (name &&& id) <$> [
  -- rotors
  Component "I"    "EKMFLGDQVZNTOWYHXUSPAIBRCJ" "Q",
  Component "II"   "AJDKSIRUXBLHWTMCQGZNPYFVOE" "E",
  Component "III"  "BDFHJLCPRTXVZNYEIWGAKMUSQO" "V",
  Component "IV"   "ESOVPZJAYQUIRHXLNFTGKDCMWB" "J",
  Component "V"    "VZBRGITYUPSDNHLXAWMJQOFECK" "Z",
  Component "VI"   "JPGVOUMFYQBENHZRDKASXLICTW" "ZM",
  Component "VII"  "NZJHGRCXMYSWBOUFAIVLPEKQDT" "ZM",
  Component "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV" "ZM",
  Component "β"    "LEYJVCNIXWPBQMDRTAKZGFUHOS" "",
  Component "γ"    "FSOKANUERHMBTIYCWLQPZXVGJD" ""]
```
and
```haskell
rotors :: [Name]
rotors = M.keys rots_
```
and somehow, only since GHC 9, when the `name` for a `Component` is a Greek character, `keys`, rather than returning just the Greek character, also picks up other text. What that text is varies by context. On my local machine, it is always the `wiring` for the previous `Component` in `rots_` (which is more than weird enough!), but on Travis CI `β` appends the `wiring` for `IV` and `γ` appends just an `X`.
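The local pattern can be checked mechanically. Compare the outputs quoted at the top with `rots_`. The garbage after `β` is the wiring of `VIII`, and the garbage after `γ` is the wiring of `β`, in each case minus its first letter (`F` and `L`, respectively). A small helper can test any observed garbage against every wiring. The names `wirings` and `sourcesOf` are hypothetical, and the table is copied by hand from `rots_`.

```haskell
import Data.List (isSuffixOf)

-- (name, wiring) pairs copied by hand from rots_ in the question.
wirings :: [(String, String)]
wirings =
  [ ("I",    "EKMFLGDQVZNTOWYHXUSPAIBRCJ")
  , ("II",   "AJDKSIRUXBLHWTMCQGZNPYFVOE")
  , ("III",  "BDFHJLCPRTXVZNYEIWGAKMUSQO")
  , ("IV",   "ESOVPZJAYQUIRHXLNFTGKDCMWB")
  , ("V",    "VZBRGITYUPSDNHLXAWMJQOFECK")
  , ("VI",   "JPGVOUMFYQBENHZRDKASXLICTW")
  , ("VII",  "NZJHGRCXMYSWBOUFAIVLPEKQDT")
  , ("VIII", "FKQHTLXOCBJSPDZRAMEWNIUYGV")
  , ("β",    "LEYJVCNIXWPBQMDRTAKZGFUHOS")
  , ("γ",    "FSOKANUERHMBTIYCWLQPZXVGJD")
  ]

-- Names of the components whose wiring ends with the observed garbage
-- (exact suffix match only; an empty string matches nothing).
sourcesOf :: String -> [String]
sourcesOf garbage =
  [ n | not (null garbage), (n, w) <- wirings, garbage `isSuffixOf` w ]
```

The Travis CI string quoted above lacks the `W` near the end of `IV`'s wiring, so an exact-suffix check like this one does not match it.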
If I had to guess, this suggests that something about how the compiler actually stores Unicode is causing `M.keys` (or `name`) to pick up something nearby that shouldn't be part of the key at all.
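One relevant fact on the storage question: GHC compiles a string literal that contains non-ASCII characters into a UTF-8 encoded byte string that is decoded into a `String` at runtime. As a result, `"β"` is one `Char` but two bytes. Code that mixed up those two counts could plausibly run past the end of a literal, though that is only a hypothesis here. The following is a plain UTF-8 encoder that makes the difference visible. It is not GHC's internal routine, and the names `utf8Encode` and `byteLength` are hypothetical.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Plain UTF-8 encoding of a single Char (illustrative only).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [byte n]
  | n < 0x800   = [byte (0xC0 .|. shiftR n 6), cont n]
  | n < 0x10000 = [byte (0xE0 .|. shiftR n 12), cont (shiftR n 6), cont n]
  | otherwise   = [ byte (0xF0 .|. shiftR n 18), cont (shiftR n 12)
                  , cont (shiftR n 6), cont n ]
  where
    n = ord c
    byte :: Int -> Word8
    byte = fromIntegral
    -- continuation byte: 10xxxxxx carrying the low six bits of x
    cont :: Int -> Word8
    cont x = byte (0x80 .|. (x .&. 0x3F))

-- Number of bytes a String occupies once UTF-8 encoded
byteLength :: String -> Int
byteLength = length . concatMap utf8Encode
```

For example, `utf8Encode 'β'` gives `[0xCE, 0xB2]`, so `length "β"` is 1 while `byteLength "β"` is 2. For the ASCII names such as `"VIII"`, the two counts agree, which is at least consistent with only the Greek keys misbehaving.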
This one really has me stumped and is way above my Haskell skill level. Any help is much appreciated.
To replicate:
```
stack update
stack unpack crypto-enigma-0.1.1.6
cd crypto-enigma-0.1.1.6
rm -f stack.yaml && stack init --resolver nightly
stack build --resolver nightly --haddock --test --bench --no-run-benchmarks
```
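A cut-down module built with the same resolver might help separate the problem from the rest of the package. This is a sketch under stated assumptions. `miniRots` and `keysAsWritten` are hypothetical names, the imports are assumed, and `Wiring` and `Turnovers` are replaced by plain `String`.

```haskell
import Control.Arrow ((&&&))
import qualified Data.Map as M

-- Same shape as the question's Component; Wiring and Turnovers are
-- simplified to plain String here (an assumption for this sketch).
data Component = Component
  { name      :: !String
  , wiring    :: !String
  , turnovers :: !String
  }

-- Hypothetical cut-down rots_: the last ASCII entry plus the two Greek ones.
miniRots :: M.Map String Component
miniRots = M.fromList $ (name &&& id) <$>
  [ Component "VIII" "FKQHTLXOCBJSPDZRAMEWNIUYGV" "ZM"
  , Component "β"    "LEYJVCNIXWPBQMDRTAKZGFUHOS" ""
  , Component "γ"    "FSOKANUERHMBTIYCWLQPZXVGJD" ""
  ]

-- The keys should be exactly the names as written. If the problem
-- reproduces at this size, this would come out False.
keysAsWritten :: Bool
keysAsWritten = M.keys miniRots == ["VIII", "β", "γ"]
```

If `keysAsWritten` is `True` under GHC 9.0.1, the trigger probably depends on something this reduction dropped. If it is `False`, this is a compact reproduction to attach to a bug report.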