I would like to work with RDD pairs of Tuple2<byte[], obj>, but byte[]s with the same contents are considered different keys because their reference values are different.
I didn't see any way to pass in a custom comparer. I could convert the byte[] into a String with an explicit charset, but I'm wondering if there's a more efficient way.
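For concreteness, the behaviour I'm seeing can be reproduced outside Spark (a minimal sketch; the class name is just for illustration):

```java
import java.util.Arrays;

// Plain Java arrays inherit identity-based equals/hashCode from Object,
// so two arrays with identical contents still count as distinct keys.
public class ArrayEqualityDemo {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(a.equals(b));         // false: compares references
        System.out.println(Arrays.equals(a, b)); // true: compares contents
    }
}
```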
 
Custom comparers are insufficient because Spark uses the hashCode() of the keys to organize them into partitions. (At least the HashPartitioner will do that; you could provide a custom partitioner that can deal with arrays.) Wrapping the array in a class that provides proper equals() and hashCode() should address the issue.
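A lightweight wrapper along these lines should do the trick (the name ByteArrayKey is illustrative, not from any library; it delegates equality and hashing to java.util.Arrays, and is Serializable because Spark ships keys between executors):

```java
import java.io.Serializable;
import java.util.Arrays;

// Value-based key wrapper for byte[]: two wrappers around arrays with the
// same contents compare equal and produce the same hash code.
public final class ByteArrayKey implements Serializable {
    private final byte[] bytes;

    public ByteArrayKey(byte[] bytes) {
        this.bytes = bytes;
    }

    public byte[] bytes() {
        return bytes;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteArrayKey
                && Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // A quick test: equal contents mean equal keys with equal hash codes.
    public static void main(String[] args) {
        ByteArrayKey a = new ByteArrayKey(new byte[]{1, 2, 3});
        ByteArrayKey b = new ByteArrayKey(new byte[]{1, 2, 3});
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

With this, a pair RDD keyed by the wrapper groups and reduces by content rather than by reference: map each byte[] to the wrapper before the pair operation and unwrap afterwards, which avoids the charset round-trip of a String conversion.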