Java: HashMap performance with millions of items vs if-else searching for numerical range

838 Views Asked by At

Looking for a bit of advice if I may. I have a method in my PlayStation emulator (Java based for a university thesis which is since finished). It takes an integer memory address, and then returns the byte at that address - redirecting the read to RAM, BIOS ROM, a given I/O port, etc. depending on the address. At the moment this is implemented using a huge bundle of if-else cases which check the range of an address and read from the right place accordingly, returning the byte.

This gives a performance hit of around 9% of overall runtime for me. I figured I may be able to improve this using a dispatch table - essentially a HashMap with autoboxed Integer keys representing the memory addresses and a lambda value to handle the return of the byte depending on the address. Now bearing in mind there are approximately 2.6 million different possible addresses taking into account the memory map of the PS1, this uses a lot more memory - fine with that.

What is puzzling me is that this gives slightly worse performance than the bundle of if-else statements - around 12% of overall runtime. Is there a better way to do what I'm doing? I can't use an array solution (address as a primitive int index and lambda stored at that index) as there are gaps in the address space that this wouldn't handle without an order of magnitude too much memory usage.

I appreciate any other ideas that might get this number down a bit - I realise Java is not a great language for emulation, but part of my thesis was proving it would work (it does). Many thanks.

Regards, Phil

EDIT:

Below is the entire code of the readByte method (the address is converted to long to allow comparison of lower addresses to higher ones at values considered negative for a normal int):

/**
 * This reads from the correct area depending on the address.
 * @param address
 * @return 
 */
public byte readByte(int address) {

    long tempAddress = address & 0xFFFFFFFFL;
    byte retVal = 0;      

    if (tempAddress >= 0L && tempAddress < 0x200000L) { // RAM
        retVal = ram[(int)tempAddress];
    } else if (tempAddress >= 0x1F000000L && tempAddress < 0x1F800000L) { // Expansion Region 1
        // do nothing for now
        ;
    } else if (tempAddress >= 0x1F800000L && tempAddress < 0x1F800400L) { // Scratchpad
        // read from data cache scratchpad if enabled
        if (scratchpadEnabled()) {
            tempAddress -= 0x1F800000L;
            retVal = scratchpad[(int)tempAddress];
        }
    } else if (tempAddress >= 0x1F801000L && tempAddress < 0x1F802000L) { // I/O Ports
        if (tempAddress >= 0x1F801000L && tempAddress < 0x1F801004L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(expansion1BaseAddress >>> 24);
                    break;
                case 1:
                    retVal = (byte)(expansion1BaseAddress >>> 16);
                    break;
                case 2:
                    retVal = (byte)(expansion1BaseAddress >>> 8);
                    break;
                case 3:
                    retVal = (byte)expansion1BaseAddress;
                    break;
            }
        }
        else if (tempAddress >= 0x1F801004L && tempAddress < 0x1F801008L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(expansion2BaseAddress >>> 24);
                    break;
                case 1:
                    retVal = (byte)(expansion2BaseAddress >>> 16);
                    break;
                case 2:
                    retVal = (byte)(expansion2BaseAddress >>> 8);
                    break;
                case 3:
                    retVal = (byte)expansion2BaseAddress;
                    break;
            }
        } else if (tempAddress >= 0x1F801008L && tempAddress < 0x1F80100CL) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(expansion1DelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(expansion1DelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(expansion1DelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)expansion1DelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F80100CL && tempAddress < 0x1F801010L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(expansion3DelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(expansion3DelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(expansion3DelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)expansion3DelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F801010L && tempAddress < 0x1F801014L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(biosRomDelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(biosRomDelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(biosRomDelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)biosRomDelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F801014L && tempAddress < 0x1F801018L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(spuDelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(spuDelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(spuDelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)spuDelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F801018L && tempAddress < 0x1F80101CL) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(cdromDelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(cdromDelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(cdromDelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)cdromDelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F80101CL && tempAddress < 0x1F801020L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(expansion2DelaySize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(expansion2DelaySize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(expansion2DelaySize >>> 8);
                    break;
                case 3:
                    retVal = (byte)expansion2DelaySize;
                    break;
            }
        } else if (tempAddress >= 0x1F801020L && tempAddress < 0x1F801024L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(commonDelay >>> 24);
                    break;
                case 1:
                    retVal = (byte)(commonDelay >>> 16);
                    break;
                case 2:
                    retVal = (byte)(commonDelay >>> 8);
                    break;
                case 3:
                    retVal = (byte)commonDelay;
                    break;
            }
        } else if (tempAddress >= 0x1F801040L && tempAddress < 0x1F801050L) {
            // read from ControllerIO object
            retVal = cio.readByte((int)tempAddress);
        } else if (tempAddress >= 0x1F801060L && tempAddress < 0x1F801064L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(ramSize >>> 24);
                    break;
                case 1:
                    retVal = (byte)(ramSize >>> 16);
                    break;
                case 2:
                    retVal = (byte)(ramSize >>> 8);
                    break;
                case 3:
                    retVal = (byte)ramSize;
                    break;
            }
        }
        else if (tempAddress >= 0x1F801070L && tempAddress < 0x1F801074L) { // Interrupt Status Register
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(interruptStatusReg >>> 24);
                    break;
                case 1:
                    retVal = (byte)(interruptStatusReg >>> 16);
                    break;
                case 2:
                    retVal = (byte)(interruptStatusReg >>> 8);
                    break;
                case 3:
                    retVal = (byte)interruptStatusReg;
                    break;    
            }
        }
        else if (tempAddress >= 0x1F801074L && tempAddress < 0x1F801078L) { // Interrupt Mask Register
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(interruptMaskReg >>> 24);
                    break;
                case 1:
                    retVal = (byte)(interruptMaskReg >>> 16);
                    break;
                case 2:
                    retVal = (byte)(interruptMaskReg >>> 8);
                    break;
                case 3:
                    retVal = (byte)interruptMaskReg;
                    break;    
            }
        }
        else if (tempAddress >= 0x1F801080L && tempAddress < 0x1F801100L) {
            retVal = dma.readByte(address);
        }
        else if (tempAddress >= 0x1F801100L && tempAddress < 0x1F801104L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer0.counterValueRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer0.counterValueRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer0.counterValueRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer0.counterValueRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801104L && tempAddress < 0x1F801108L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer0.counterModeRead(false) >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer0.counterModeRead(false) >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer0.counterModeRead(false) >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer0.counterModeRead(false);
                    break;
            }
        }
        else if (tempAddress >= 0x1F801108L && tempAddress < 0x1F80110CL) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer0.counterTargetRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer0.counterTargetRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer0.counterTargetRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer0.counterTargetRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801110L && tempAddress < 0x1F801114L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer1.counterValueRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer1.counterValueRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer1.counterValueRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer1.counterValueRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801114L && tempAddress < 0x1F801118L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer1.counterModeRead(false) >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer1.counterModeRead(false) >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer1.counterModeRead(false) >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer1.counterModeRead(false);
                    break;
            }
        }
        else if (tempAddress >= 0x1F801118L && tempAddress < 0x1F80111CL) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer1.counterTargetRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer1.counterTargetRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer1.counterTargetRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer1.counterTargetRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801120L && tempAddress < 0x1F801124L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer2.counterValueRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer2.counterValueRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer2.counterValueRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer2.counterValueRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801124L && tempAddress < 0x1F801128L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer2.counterModeRead(false) >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer2.counterModeRead(false) >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer2.counterModeRead(false) >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer2.counterModeRead(false);
                    break;
            }
        }
        else if (tempAddress >= 0x1F801128L && tempAddress < 0x1F80112CL) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(timer2.counterTargetRead() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(timer2.counterTargetRead() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(timer2.counterTargetRead() >>> 8);
                    break;
                case 3:
                    retVal = (byte)timer2.counterTargetRead();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801810L && tempAddress < 0x1F801814L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(gpu.readResponse() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(gpu.readResponse() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(gpu.readResponse() >>> 8);
                    break;
                case 3:
                    retVal = (byte)gpu.readResponse();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801814L && tempAddress < 0x1F801818L) {
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(gpu.readStatus() >>> 24);
                    break;
                case 1:
                    retVal = (byte)(gpu.readStatus() >>> 16);
                    break;
                case 2:
                    retVal = (byte)(gpu.readStatus() >>> 8);
                    break;
                case 3:
                    retVal = (byte)gpu.readStatus();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801800L && tempAddress < 0x1F801804L) { // CDROM
            switch ((int)tempAddress & 0xF) {
                case 0:
                    retVal = cdrom.read1800();
                    break;
                case 1:
                    retVal = cdrom.read1801();
                    break;
                case 2:
                    retVal = cdrom.read1802();
                    break;
                case 3:
                    retVal = cdrom.read1803();
                    break;
            }
        }
        else if (tempAddress >= 0x1F801C00L && tempAddress < 0x1F802000L) {
            // fake SPU read
            retVal = spu.readByte(address);
        }
    } else if (tempAddress >= 0x1F802000L && tempAddress < 0x1F803000L) { // Expansion Region 2 (I/O Ports)
        // read from BIOS post register
        if (tempAddress == 0x1F802041L) {
            retVal = biosPost;
        }
    } else if (tempAddress >= 0x1FA00000L && tempAddress < 0x1FC00000L) { // Expansion Region 3 (Multipurpose)
        // do nothing for now
        ;
    } else if (tempAddress >= 0x1FC00000L && tempAddress < 0x1FC80000L) { // BIOS ROM
        // read from memory mapped BIOS file
        tempAddress -= 0x1FC00000L;
        retVal = biosBuffer.get((int)tempAddress);
    } else if (tempAddress >= 0xFFFE0000L && tempAddress < 0xFFFE0200L) { // I/O Ports (Cache Control)
        if (tempAddress >= 0xFFFE0130L && tempAddress < 0xFFFE0134L) { // Cache Control Register
            int shift = (int)(tempAddress & 0x3L);
            switch (shift) {
                case 0:
                    retVal = (byte)(cacheControlReg >>> 24);
                    break;
                case 1:
                    retVal = (byte)(cacheControlReg >>> 16);
                    break;
                case 2:
                    retVal = (byte)(cacheControlReg >>> 8);
                    break;
                case 3:
                    retVal = (byte)cacheControlReg;
                    break;    
            }
        }            
    }

    return retVal;
}
4

There are 4 best solutions below

6
On BEST ANSWER

The best approach depends on your implementation under the hood. I see that the address space of the PSX is 32 bit, but as with many console, zones are mirrored. Now without seeing your actual implementation it's just guessing but here's some considerations.

I'll start considering this table

 KUSEG     KSEG0     KSEG1
  00000000h 80000000h A0000000h  2048K  Main RAM (first 64K reserved for BIOS)
  1F000000h 9F000000h BF000000h  8192K  Expansion Region 1 (ROM/RAM)
  1F800000h 9F800000h    --      1K     Scratchpad (D-Cache used as Fast RAM)
  1F801000h 9F801000h BF801000h  8K     I/O Ports
  1F802000h 9F802000h BF802000h  8K     Expansion Region 2 (I/O Ports)
  1FA00000h 9FA00000h BFA00000h  2048K  Expansion Region 3 (whatever purpose)
  1FC00000h 9FC00000h BFC00000h  512K   BIOS ROM (Kernel) (4096K max)
        FFFE0000h (KSEG2)        0.5K   I/O Ports (Cache Control)

So for I/O ports there not much you can do since they're separated and must be handled specifically we can try to study how to improve addressing of everything else.

We can see that mirrored regions differ from the 4 most relevant bits. Which means that we can do address &= 0x0FFFFFFF so that we ignore the region and consider only the significative part of the address.

So now we have 3 kinds of addresses:

  • the one starting at 0x0000000, mapped to main RAM
  • the group starting at 0xF000000 and ending at 0xFC00000 (+ bios rom)
  • the I/O ports at 0xFFFF0000

This could lead to an hybrid approach in which you use both if/else and a cache, eg:

byte readMemory(int address) 
{
  if ((address & 0xFF000000) == 0xFF000000)
    return ioPorts.read(address);

  // remove most significative nibble, we don't need it
  address &= 0x0FFFFFFF;

  // 0xF000000 zone
  // according to bios rom size you could need a different kind of comparison since it may wrap over 0xFFFFFFF
  if ((address & 0xF000000) == 0xF000000)
  {
    // now your address space is just from 0xF000000 to 0xFC00000 + size of BIOS ROM (4mb max?)
  }
  else
  {
    // we don't know if you map bios together with ram or separately
    return mainRam.readMemory(address);
  }
}

Now we have that address space between 0xF000000 and 0xFC000000 that must be split into multiple parts. As you can see from the memory map we have this:

F000000h
F800000h
F801000h
F802000h
FA00000h
FC00000h 

If you notice you can see that first 4 bits are always 0xF while last 12 bits are always 0, so we don't need them to understand where to dispatch the call. This means that the interesting part of the address has the following mask 0x0FFF000, so we could translate the address:

address = (address >>> 12) & 0xFFF;

Now these are only 4096 possible values which could fit in a tight LUT table.

7
On

A HashMap is O(1) look up performance, but that doesn't mean it's "instantaneous". Internally there are lots of if tests and hashcode calculations etc, more it seems than is going on with your conditions.


If there's only a couple of million bytes, I would just use a byte[] and look it up - there is no faster impl.

2
On

You mentioned auto-boxing in your question.

Keep in mind that such conversions from int to Integer do not come for free. In that sense you have to make sure to not cause too many unnecessary / unwanted boxing operations to take place.

In order to give you a more detailed answer, we would need to see some of your code.

But I agree with the other feedback you got: if your memory fits into a single array of bytes, then use that; and just make sure to provide reasonable interfaces around that.

2
On

It seems the real problem with your current implementation isn't the long if-else chain, but their highly-repetitive contents. This is a classic example of a pattern that's a good fit for object oriented programming. You should define a common interface for accessing segments of memory, so that each conditional block only needs to look like (picked a random block):

...
else if (tempAddress >= 0x1F801104L && tempAddress < 0x1F801108L) {
  return timer0.read(tempAddress);
}
...

There's a number of ways you could design such a class, but I'd suggest providing an interface and an abstract class that handles the switch behavior:

interface Location {
  byte read(long address);
}

abstract class IntWordLocation implements Location {
  @Override
  final byte read(long address) {
    int value = readInt(address & 0xfffffffd); // clear lower two bits;
    int shift = 3 - (int) (address & 0x3);
    for (int i = 0; i < shift; i++) { // replaces switch statement
      value = value >>> 8;
    }
    return (byte) value;
  }

  abstract protected int readInt(long int);
}

And you can add other Location implementations as needed. Now we've decoupled reading a particular segment of memory (which Location is now responsible for) from looking up which segment an address represents (your if-else blocks), and now your your readByte() method should be much simpler.


If you want to further clean up your readByte() method, you might find Guava's RangeMap useful - it allows you to efficiently represent mappings of ranges of values, meaning you could store one entry for each memory location, rather than attempt to map each address individually. The map would consist of only a few entries, rather than having it be massively large.

For example:

RangeMap<Long, Location> addressRanges =
  new ImmutableRangeMap.Builder<Integer, Location>()
    .put(Range.closedOpen(0L, 0x200000L), ramLocation)
    .put(Range.closedOpen(0x1F000000L, 0x1F800000L), NO_OP_LOCATION)
    .put(Range.closedOpen(0x1F800000L, 0x1F800400L), scratchpadLocation)
    // ...
    .build();

Then your readByte() method simply becomes:

public byte readByte(int address) {
  return addressRanges.get(address).read(address);
}

This has an added benefit of sanity-checking your address-space, since ImmutableRangeMap.Builder will reject overlapping ranges. You can also see any gaps by constructing a RangeSet of the map's keys and calling `RangeSet.complement().