Weird behaviour in Java 8 with ByteBuffer and BitSet


I'm new to Java and started implementing a UDP sender using BitSet and ByteBuffer. For some reason I get behaviour I would not expect.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

public class Main
{
    public static void main(String[] args) {
        
        ByteBuffer out = ByteBuffer.allocate(2);
        BitSet byt = new BitSet(8);
        byt.set(0, true);
        byt.set(1, false);
        out.put(byt.toByteArray());
        byt.set(0, true);
        byt.set(1, false);
        byt.set(2, true);
        out.put(byt.toByteArray());
        
        System.out.println("First byte is " + out.array()[0]+ " second is " + out.array()[1]);
    }
}

where I get the output

First byte is 1 second is 5

which I think is not okay since the endianness is wrong

When I try to run this:

import java.nio.ByteBuffer;
import java.util.BitSet;

public class Main
{
    public static void main(String[] args) {
        
        ByteBuffer out = ByteBuffer.allocate(2);
        BitSet byt = new BitSet(8);
        byt.set(0, false);
        byt.set(1, false);
        out.put(byt.toByteArray());
        byt.set(0, true);
        byt.set(1, false);
        byt.set(2, true);
        out.put(byt.toByteArray());
        
        System.out.println("First byte is " + out.array()[0]+ " second is " + out.array()[1]);
    }
}

The output changes to

First byte is 5 second is 0

Which I think is the right answer.

Notice the only change is byt.set(0, false) instead of byt.set(0, true), yet the order of the bytes also changes.


I'm fairly new to Java, so it could all be a big misunderstanding. Thanks anyway!


There are 2 answers below

marcinj (BEST ANSWER)

BitSet.toByteArray() creates a byte array of the minimum length necessary to represent the BitSet. As the docs state:

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/BitSet.html#toByteArray()

byte[] bytes = s.toByteArray();
then bytes.length == (s.length()+7)/8 and
s.get(n) == ((bytes[n/8] & (1<<(n%8))) != 0)
for all n < 8 * bytes.length.

and s.length() is defined as:

Returns the "logical size" of this BitSet: the index of the highest set bit in the BitSet plus one. Returns zero if the BitSet contains no set bits.

So, essentially, if your BitSet contains only false bits, toByteArray() will return an empty array.
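
A minimal sketch of that behaviour (the class name is just for illustration):

import java.util.BitSet;

public class ToByteArrayDemo {
    public static void main(String[] args) {
        BitSet allFalse = new BitSet(8);   // the 8 is only a sizing hint
        System.out.println(allFalse.toByteArray().length); // 0 - no set bits at all

        BitSet oneBit = new BitSet(8);
        oneBit.set(0);
        System.out.println(oneBit.toByteArray().length);   // 1 - highest set bit is 0
    }
}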

ByteBuffer will fill the bytes as you put them in. So the order in which the bytes are stored is exactly the order in which you call the .put method.
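
For instance, a quick sketch of put order (assuming the same imports as the question):

ByteBuffer buf = ByteBuffer.allocate(2);
buf.put((byte) 1);  // lands at index 0
buf.put((byte) 5);  // lands at index 1
// buf.array() is now [1, 5]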

Now let's follow your first code example:

Your ByteBuffer of size 2, initially:

| x | x |
  0   1

x means it's empty

Then you create the first BitSet: [true, false, false, false, false, false, false, false]. Its byte array is [1]. You then put it into your ByteBuffer:

| 1 | x |
  0   1

The first entry now holds 1.

Then you create the second BitSet: [true, false, true, false, false, false, false, false], which is [5] (2^0 + 2^2 = 1 + 4 = 5). Then you put it into the ByteBuffer:

| 1 | 5 |
  0   1

So your output is 'First byte is 1 second is 5', and that is correct.
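
A quick check of that arithmetic (assuming the same imports as the question):

BitSet bs = new BitSet(8);
bs.set(0);  // contributes 2^0 = 1
bs.set(2);  // contributes 2^2 = 4
System.out.println(bs.toByteArray()[0]); // prints 5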

which I think is not okay since the endianness is wrong

I don't think your problem has anything to do with endianness; if you expected to see 'First byte is 5 second is 1', then switch the order in which you add the bytes to the ByteBuffer.

Now, let's follow the second example:

Your ByteBuffer of size 2, initially:

| x | x |
  0   1

Then you create a BitSet with [false, false, false, false, false, false, false, false], which is 0 in binary. But BitSet.toByteArray() produces an empty array because the BitSet has no set bits.

So there is no change to the ByteBuffer; it's still empty:

| x | x |
  0   1

The second BitSet is [true, false, true, false, false, false, false, false], which is [5]; after putting it into the ByteBuffer:

| 5 | x |
  0   1

Here, because the first BitSet added nothing, we only see the result of adding the second byte.

So, your problem originates from how BitSet produces its array in toByteArray(). You might consider not using BitSet at all; see this SO question for some hints: Java BitSet and byte[] usage
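
For example, a minimal sketch of packing bits by hand, with no BitSet involved (packBits is a hypothetical helper, not a library method):

// hypothetical helper: packs up to eight booleans into one byte, bit 0 first,
// always producing exactly one byte no matter which bits are set
static byte packBits(boolean... bits) {
    byte b = 0;
    for (int i = 0; i < bits.length && i < 8; i++) {
        if (bits[i]) {
            b |= 1 << i; // compound assignment narrows the result back to byte
        }
    }
    return b;
}

With this, out.put(packBits(false, false)) always writes exactly one byte, even when every bit is false.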

[edit] A fix for the second example is to take into account that BitSet.toByteArray() can return a zero-length array, and to manually put a zero byte in that case. Of course, the same can be done for the second byte.

    ByteBuffer out = ByteBuffer.allocate(2);
    BitSet byt = new BitSet(8);
    byt.set(0, false);
    byt.set(1, false);
    
    // toByteArray() returns a zero-length array here, since no bit is set
    byte[] byteArr = byt.toByteArray();
    if (byteArr.length == 0) {
        out.put((byte) 0); // write the zero byte explicitly
    } else {
        out.put(byteArr);
    }
    
    byt.set(0, true);
    byt.set(1, false);
    byt.set(2, true);
    
    // this time the array is [5], so it can be put directly
    byteArr = byt.toByteArray();
    out.put(byteArr);
    
    System.out.println("First byte is " + out.array()[0] + " second is " + out.array()[1]);
rzwitserloot

Which I think is the right answer.

Not at all; you are misinterpreting what you are seeing, and your test is not written in a way that makes that obvious.

Here's what's actually happening:

BitSet byt = new BitSet(8);
byt.set(0, false);
byt.set(1, false);
out.put(byt.toByteArray());

This writes nothing. Add System.out.println(out.position()); to see this in action.

That's because the spec of BitSet disregards entirely how large you made your BitSet (the 8 in new BitSet(8) has zero effect on what .toByteArray() does). Instead, it looks at the most significant set bit in your BitSet, rounds that up to the nearest multiple of 8, and that's how many bits are emitted. Given that your second snippet's first BitSet is all zero, '0 bits' is all that is needed to represent it, and therefore .toByteArray() dutifully produces a 0-length array. And buffer.put(someZeroLenArray) dutifully does exactly what you ask it to: nothing, successfully. It makes no changes to the byte buffer and does not advance the position.
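
A sketch of that check (assuming the same imports as the question):

ByteBuffer out = ByteBuffer.allocate(2);
BitSet allFalse = new BitSet(8);
System.out.println(out.position());   // 0
out.put(allFalse.toByteArray());      // puts a 0-length array
System.out.println(out.position());   // still 0 - nothing was written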

So, you:

  • Write nothing. Not even 8 zero bits - literally nothing.
  • Write '5'
  • Print the first byte (which prints 5)
  • Print the second, which hasn't been set at all yet and thus defaults to printing 0.

You clearly are thinking that the '5' you see is from the second BitSet, and the 0 you see is from the first.

The endianness is right (as in, the endianness you get is [A] exactly what the Java spec says you get, and [B] matches virtually all uses of endianness, notably including network standards; Intel chips are the exception); it is your expectation of what endianness you would get that is incorrect, and all you need to do is adjust that.

More generally, endianness doesn't even apply here: endianness is a concept that applies when you write things that are larger than 8 bits in length, and you never do.

This is an example of endianness:

ByteBuffer bb = ByteBuffer.allocate(4);
bb.putInt(1);
System.out.println("At pos 0: " + bb.get(0)); // 0
System.out.println("At pos 3: " + bb.get(3)); // 1
bb = ByteBuffer.allocate(4);
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.putInt(1);
System.out.println("At pos 0: " + bb.get(0)); // 1
System.out.println("At pos 3: " + bb.get(3)); // 0

In other words, given that an int requires writing 4 bytes, which of the 4 bytes that comprise the int do we write first: the one with the least significant digits (the byte holding the 1 here; 1 as an int is 0x00000001), or the one with the most significant (a 0 byte)? With LITTLE_ENDIAN, the 1 is printed first.
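
Dumping the whole backing array makes the difference visible (assuming java.util.Arrays is imported alongside the nio classes):

ByteBuffer be = ByteBuffer.allocate(4);
be.putInt(1);
System.out.println(Arrays.toString(be.array())); // [0, 0, 0, 1] - big endian, the default

ByteBuffer le = ByteBuffer.allocate(4);
le.order(ByteOrder.LITTLE_ENDIAN);
le.putInt(1);
System.out.println(Arrays.toString(le.array())); // [1, 0, 0, 0] - little endian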

The code snippet you pasted does not have an 'endianness'. It is neither little nor big endian. The bits are in big endian form, but then, bits being in big endian form is how it's done for all systems and all CPUs that are relevant to talk about today.