I am trying to implement a highly scalable server in Java for the following use case:

  1. The client sends a request to the server in the form of COMMAND PARAM.

  2. The server can send a response of varying size, from a few characters (10 bytes) to large text data (6-8 GB, comparable to the client's RAM).

What is the appropriate way to send the response in these scenarios? I need to support multiple concurrent clients. Can someone point me to a reference/sample implementation?

There are many existing solutions that use NIO under the hood, such as Netty and Grizzly. They are borderline rocket science; getting NIO right is extremely complicated. Use those frameworks.

highly scalable server in java

NIO is usually slower. The primary benefit of NIO is that you can hand-roll your buffers, whereas in threaded models the stack size is locked in: you configure the stack size for all threads once, when you start Java itself (e.g. java -Xss1m for a 1 MB stack), meaning 1000 threads require 1 GB of memory just for the stacks, let alone the heap.

Usually, tossing a big RAM stick at your box is many orders of magnitude more effective.

NIO shines when ALL of the following things are true:

  • You need to deal with many simultaneous connections, but not a great many (a great many cannot be handled by one computer no matter how efficiently you write it; the solution then is sharding and distributed design. google.com does not run on one computer and never could, nor does twitter; those are examples of simultaneous-connection requirements that exceed where NIO is useful).
  • NIO means you can be forced to 'swap out' at any time. For example right in the middle of parsing a command. That means you need buffers to store state. The state you need to store needs to be small, or NIO is not very useful.
  • The task that needs doing must not be CPU-bound. If it is, NIO will just make things slower.
  • The task that needs doing must not be blocking-bound. For example, if, as part of handling a connection, you need to talk to a database, you can't use NIO unless you go out of your way to find a way to do so in a non-blocking way. NIO requires that you never block a thread for any reason. This means callback hell. You may want to look that up.
  • The performance benefit is so important that it is worth complicating the development of your app by an order of magnitude to accommodate it.

That leaves only a tiny window where full NIO is advisable. And that tiny window of exotic applications where it makes sense will very soon be even smaller, because Project Loom is hopefully heading for a preview release in java 17 (could be as soon as 9 months from now or so), further reducing whatever gains you could make happen with NIO.

The general setup of NIO works like so:

  • First, you make X threads, where X is about twice the number of cores you have.
  • Then, you define a bunch of job handlers; each handler is an object that maintains the state of a job to do, in your case, one represents waiting for incoming socket connections, and for each open connection you'd also have a job object.
  • Each thread tries to manage all jobs simultaneously. This works by making asynchronous channels for each job. For each channel, you register what is 'interesting' (what would let you do work without waiting for I/O). Then you go to sleep, asking Java to wake your thread if anything interesting happens on any job. The threads then broker among themselves which one is going to handle any particular job that needs doing, and that thread does it.

The state of 'I am interested in' is in continuous fluctuation.

For example, in the beginning, any connection is interested in 'if data comes in'. But then when data comes in, maybe only 'HEL' comes through (the complete command being 'HELLO\n', say); you'd have to remember that and go right back to the loop. Once the full HELLO command is in, you want to write 'HEY THERE!' back, but when calling send on that channel, only 'HEY T' gets sent so far. You then want to stop being interested in 'data is available' and start being interested in 'you can send data now'. You did not want to be interested in that before, because then your thread would be continuously woken up (you can send! you can send! you can send!), resulting in your fans spinning up and everything becoming slow as molasses.

Once you sent the full HEY THERE!, you want to stop being interested in 'can send' again, and start being interested in 'data available' again.
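The interest-op juggling described above can be sketched with the JDK's own Selector API. This is a minimal, illustrative single-threaded demo (the class and field names are made up, and a real server would switch back to OP_READ instead of closing): it accumulates bytes until a full HELLO command arrives, flips interest from OP_READ to OP_WRITE, and sends HEY THERE! back.

```java
import java.io.*;
import java.net.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

public class InterestOpsDemo {
    // Per-connection state that survives being 'swapped out' mid-command.
    static final class Attachment {
        final StringBuilder in = new StringBuilder(); // partial command, e.g. "HEL"
        ByteBuffer out;                               // pending response bytes
    }

    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Exercise the server with an ordinary blocking client on another thread.
        Thread client = new Thread(() -> {
            try (Socket s = new Socket("127.0.0.1", port)) {
                s.getOutputStream().write("HELLO\n".getBytes(StandardCharsets.UTF_8));
                BufferedReader r = new BufferedReader(new InputStreamReader(s.getInputStream()));
                System.out.println("client got: " + r.readLine());
            } catch (IOException e) { e.printStackTrace(); }
        });
        client.start();

        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (client.isAlive() || selector.keys().size() > 1) {
            selector.select(200); // sleep until something 'interesting' happens
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    if (ch == null) continue;
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ, new Attachment());
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    Attachment a = (Attachment) key.attachment();
                    buf.clear();
                    if (ch.read(buf) == -1) { key.cancel(); ch.close(); continue; }
                    buf.flip();
                    a.in.append(StandardCharsets.UTF_8.decode(buf));
                    if (a.in.indexOf("\n") >= 0) { // full command finally arrived
                        a.out = ByteBuffer.wrap("HEY THERE!\n".getBytes(StandardCharsets.UTF_8));
                        key.interestOps(SelectionKey.OP_WRITE); // now interested in 'can send'
                    }
                } else if (key.isWritable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    Attachment a = (Attachment) key.attachment();
                    ch.write(a.out); // may send only part, e.g. "HEY T"
                    if (!a.out.hasRemaining()) { key.cancel(); ch.close(); }
                }
            }
            selector.selectedKeys().clear();
        }
        server.close();
    }
}
```

Note how all per-connection state lives in the attachment object, exactly because the thread can be pulled off the job at any moment.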

Juggling the brokering between threads and the interest flags on your channels is very complicated. You also get the fun of this tidbit:

If you ever block on anything, your app is broken, but you won't know it. It'll just be really inefficient and slow, and only once multiple connections start coming in. This is really hard to test, and a very easy trap to fall into. No exceptions will occur, for example. You'll also end up in callback hell.

Which gets us back to: It's rocket science. Use netty or grizzly.

EDIT: Your specific use case

As per a comment on this question, you want to write a server that will handle requests along the lines of 'RANGE 100000000-100002000', which is to be interpreted as: Send me the bytes at the stated index range from some large file.
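Parsing that command is the easy part, regardless of which I/O model you pick. A minimal sketch (the class name and error handling are my own invention, not part of any stated protocol):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeCommand {
    // Matches requests of the form "RANGE <start>-<end>".
    static final Pattern P = Pattern.compile("RANGE (\\d+)-(\\d+)");

    // Returns {start, end} byte offsets, or throws on a malformed request.
    static long[] parse(String line) {
        Matcher m = P.matcher(line.trim());
        if (!m.matches()) throw new IllegalArgumentException("bad request: " + line);
        long start = Long.parseLong(m.group(1));
        long end = Long.parseLong(m.group(2));
        if (end < start) throw new IllegalArgumentException("end before start");
        return new long[] { start, end };
    }

    public static void main(String[] args) {
        long[] r = parse("RANGE 100000000-100002000");
        System.out.println(r[0] + ".." + r[1]);
    }
}
```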

I don't think NIO is useful for you in that case. But it might be. You'd design the system roughly like so:

You would have to use NIO2 to do your disk access asynchronously. Here is a tutorial for an introduction to this. If it's all in memory, that's much simpler.
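To give a feel for what asynchronous disk access looks like, here is a small sketch using NIO2's AsynchronousFileChannel. The file and offsets are made up for the demo, and blocking on the latch at the end is for demonstration only; in a real NIO design you would never block like that.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AsyncRangeRead {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, "0123456789ABCDEFGHIJ".getBytes(StandardCharsets.UTF_8));

        CountDownLatch done = new CountDownLatch(1);
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(5);
            // Read 5 bytes starting at offset 10 without blocking this thread;
            // the handler fires on a pool thread when the read completes.
            ch.read(buf, 10, buf, new CompletionHandler<Integer, ByteBuffer>() {
                public void completed(Integer bytesRead, ByteBuffer b) {
                    b.flip();
                    System.out.println("read: " + StandardCharsets.UTF_8.decode(b));
                    done.countDown();
                }
                public void failed(Throwable t, ByteBuffer b) {
                    t.printStackTrace();
                    done.countDown();
                }
            });
            done.await(); // demo only: a real server keeps servicing other jobs here
        }
        Files.delete(file);
    }
}
```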

Even if you use async NIO2 file access, you're just spinning your wheels and not making things any faster if the underlying disk isn't both very fast and entirely random access.

If you do this to a platter disk, hoo boy. Horrible performance.

The thing is, incoming requests ask for a consecutive block of bytes. Spinning disks can serve that much, much faster than a bunch of small reads scattered around the disk: reading all over the disk requires the head to hop around, which is very slow.

The big risk here is that by aggressively NIO2-izing your disk access, you effectively turn your program into 'hops around the disk' mode, and performance gets worse, not better.

The better alternative is probably something simple, like this:

  • You have a threadpool. Maybe as low as 50 threads, maybe as many as 1000.
  • This is nowhere near enough to really make the VM break out in a sweat.
  • Your socketServer.accept() loop ('accept a socket, make the handler object, hand it to the pool to process') will wait for a free thread in the pool; thus, if it needs to wait because no thread is available, accept calls are effectively stoppered up for a bit. That's good.

Effectively, then, your app will handle the first X (X = pool size) calls simultaneously and will then let the phone ring for a bit, so to speak, until one is done. This is probably what you want anyway; aggressively parallelized disk access isn't the fastest way to read a disk.
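The simple threaded design above can be sketched as follows. This is a toy, not a production server: the protocol handling (PING/PONG) is a stand-in, the pool sizes are arbitrary, and a real server would call accept() in a loop. The bounded queue plus CallerRunsPolicy is one way to get the 'let the phone ring' behavior, by making the accept loop do the work itself when the pool is saturated.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadedServer {
    public static void main(String[] args) throws Exception {
        // Bounded queue + CallerRunsPolicy: when all workers are busy,
        // the accepting thread runs the job itself, stoppering up accepts.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 50, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.CallerRunsPolicy());

        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();

            // Demo client on another thread.
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", port)) {
                    s.getOutputStream().write("PING\n".getBytes(StandardCharsets.UTF_8));
                    BufferedReader r = new BufferedReader(
                        new InputStreamReader(s.getInputStream()));
                    System.out.println("client got: " + r.readLine());
                } catch (IOException e) { e.printStackTrace(); }
            });
            client.start();

            Socket conn = server.accept(); // a real server loops here forever
            pool.execute(() -> {
                try (Socket c = conn) {
                    BufferedReader in = new BufferedReader(
                        new InputStreamReader(c.getInputStream()));
                    String cmd = in.readLine();
                    // Plain blocking I/O per connection: trivial to reason about.
                    c.getOutputStream().write(
                        ("PONG " + cmd + "\n").getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) { e.printStackTrace(); }
            });

            client.join();
        }
        pool.shutdown();
    }
}
```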

If you really want to know, you're going to have to write this app twice and compare the two.