I am trying to efficiently join two DataFrames, one of which is large and the other somewhat smaller.
Is there a way to avoid all this shuffling? I cannot set spark.sql.autoBroadcastJoinThreshold high enough, because it only accepts Integer values, and the table I am trying to broadcast is slightly larger than the maximum Integer number of bytes (about 2 GB).
Is there a way to force a broadcast, ignoring this setting?

This is a current limitation of Spark; see SPARK-6235. The 2 GB limit also applies to broadcast variables.
Are you sure there is no other good way to do this, e.g. a different partitioning scheme?
Otherwise, you can hack your way around it by manually creating multiple broadcast variables, each smaller than 2 GB.