Why use a backend server and RPC in web server infrastructure?


I'm interested in creating a web application, and I've just done some research on what makes a good web server. I've looked at what Facebook, Twitter and Foursquare have shared about the software they used to build their infrastructure.

Some of the software they use is new to me, so I'd like to ask a few questions. Why create a backend server at all? Isn't a web server running PHP enough? Why use Java or Scala for the backend? Do we really need an RPC framework such as Thrift or Protocol Buffers? What is an RPC framework used for? Is it used for communication between the frontend and backend servers?

I'd really appreciate any answers to these questions, or suggestions for books I should read.

Thank you.


There are 2 best solutions below

Answer 1

It sounds as though you'd like to build a scalable backend infrastructure that ultimately will be used to do the following:

  1. Serve content. This is the web server layer.
  2. Perform some type of back end processing for user requests coming in from the web server layer and communicate with the data store. Call this the application server layer.
  3. Save session state and user data in a distributed, fault tolerant, eventually consistent key value store.
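
The three layers above can be sketched in a few lines. This is only an illustration of how the responsibilities are split, not a real deployment: the class names (`KeyValueStore`, `ApplicationServer`, `WebServer`) are made up for this example, and a plain in-memory dict stands in for the distributed key-value store.

```python
class KeyValueStore:
    """Data layer stand-in: a local dict instead of a distributed store."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class ApplicationServer:
    """Application layer: processing logic; the only layer that touches the store."""

    def __init__(self, store):
        self._store = store

    def record_visit(self, user):
        visits = self._store.get(user, 0) + 1
        self._store.put(user, visits)
        return visits


class WebServer:
    """Web layer: turns a request into an application call and renders content."""

    def __init__(self, app):
        self._app = app

    def handle_request(self, user):
        visits = self._app.record_visit(user)
        return f"<p>Hello {user}, visit number {visits}</p>"


web = WebServer(ApplicationServer(KeyValueStore()))
print(web.handle_request("alice"))
```

Because each layer only talks to the one below it, any of them can later be moved to its own machines and scaled separately.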

Also, it sounds as though you want to do this using commodity PC hardware.

This is a tall order.

Foursquare uses Scala with the Lift framework, and Jetty as their web server.

Facebook uses many different technologies. I know that for their data store they use HBase (they were previously using Cassandra).

Yahoo uses HBase to keep track of user statistics.

Twitter started as a web site with a Ruby backend and later moved to Scala. Twitter is incrementally moving from MySQL (I assume sharded) to Cassandra using their proprietary incremental database conversion tool.

As far as scaling on the application server and web server end, I know that what really counts is having a language that can cheaply spawn new lightweight processes in user space, plus a manager process that assigns incoming requests to worker processes. Think of it as running a very efficient company: the more work you've got coming in, the more people you hire. This is essentially the Actor model, in which independent workers share no state and communicate only by passing messages. Some languages have actors built in (Erlang); others have actors implemented as frameworks (Akka) or libraries (Scala's native actors). Apparently, Scala's native actors are buggy, so some people got together and implemented the Akka framework for Scala and Java.

There's a lot of discussion online regarding actors and which language and libraries one should use. Erlang has a lot going for it out of the box, including actors and the OTP libraries, but apparently it does not have the rich libraries that Java has. Scala runs on the JVM and allows you to reuse a lot of the existing Java web libraries (which could cause issues if they happen to declare static objects, since those are shared state). So, for me, it really boils down to Scala (with Akka) or Erlang.
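
To make the manager-and-workers idea concrete, here is a minimal sketch in Python. It is an assumption-laden illustration rather than how Erlang or Akka implement actors: OS threads stand in for lightweight user-space processes, and the `Actor` and `Manager` names are invented for this example. The key property it shows is that each worker owns a private mailbox and is only ever reached by sending it a message.

```python
import queue
import threading


class Actor:
    """A worker with a private mailbox that handles one message at a time."""

    _STOP = object()  # sentinel message telling the actor to shut down

    def __init__(self, name, handler, results):
        self.name = name
        self._handler = handler
        self._results = results  # thread-safe queue where replies are posted
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # Senders never touch the actor's state; they only enqueue a message.
        self._mailbox.put(message)

    def stop(self):
        # Messages already in the mailbox are processed before the sentinel.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._results.put((self.name, self._handler(message)))


class Manager:
    """Assigns incoming requests to workers in round-robin order."""

    def __init__(self, worker_count, handler):
        self.results = queue.Queue()
        self._workers = [
            Actor(f"worker-{i}", handler, self.results)
            for i in range(worker_count)
        ]
        self._next = 0

    def dispatch(self, request):
        worker = self._workers[self._next % len(self._workers)]
        self._next += 1
        worker.send(request)

    def shutdown(self):
        # Wait for every worker to drain its mailbox, then collect the replies.
        for worker in self._workers:
            worker.stop()
        collected = []
        while not self.results.empty():
            collected.append(self.results.get())
        return collected
```

A manager created with `Manager(3, handler)` spreads calls to `dispatch` across three workers; "hiring more people" in the analogy corresponds to raising `worker_count`.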

For the web server, with Scala, you can use any Java app server. Foursquare uses Jetty for most things. Jetty is not written in Scala, but since Scala compiles down to bytecode that runs on the JVM, it interoperates easily with any Java app server.

People also say that there aren't that many Erlang programmers and that Erlang is harder to learn (functional programming versus imperative programming). Scala is functional and imperative at the same time, meaning you can write in either style.

Erlang is functional. Functional programming has a lot going for it: its proponents argue that one expert functional programmer can get a lot more done than an expert imperative programmer. Yahoo! Store (originally Viaweb) was written and maintained in Lisp, a functional language, by a very small team. On the other hand, imperative programming is easier to learn and widely used in team settings. Imperative languages are good for some things, functional languages for others. Use the right tool for the job.

Back to the web server discussion: with Erlang, you can use Yaws, or you can run a framework such as Chicago Boss.


On the database end, you have a lot of choices. You can even eschew a separate database altogether and save your data in Mnesia, the distributed database that ships with Erlang/OTP.

My answer is not complete, as this topic (scaling app servers, databases and web servers) is very complicated and full of debate. Some frameworks even blur the distinction between the tiers (web server, application server, database) and integrate much of the functionality of these layers within the framework itself.

Answer 2

For example, I have run into a lot of problems developing complex web apps using PHP only. PHP has no built-in threads, and it lacks many of the good things found in Scala or other modern languages with rich syntax. PHP is slow compared to a compiled JVM language, and in my opinion it is less secure. It is good at fetching a bunch of data and rendering it as an HTML page, but processing under high load is not its strength. RPC, as you suggest, serves as the communication layer between the frontend and backend servers.
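
To show roughly what an RPC layer does, here is a small Python sketch. It is not Thrift or Protocol Buffers: those define interfaces in a schema language and typically use compact binary encodings, whereas this example uses JSON purely as a stand-in. The names `RpcServer`, `RpcClient` and `count_followers` are hypothetical, and the "network" is just a direct function call so the example stays self-contained.

```python
import json


class RpcServer:
    """Backend side: maps method names to functions and answers encoded calls."""

    def __init__(self):
        self._methods = {}

    def register(self, name, func):
        self._methods[name] = func

    def handle(self, raw_request: bytes) -> bytes:
        # Decode the request, run the named method, encode the reply.
        request = json.loads(raw_request.decode("utf-8"))
        func = self._methods.get(request["method"])
        if func is None:
            reply = {"error": f"unknown method: {request['method']}"}
        else:
            reply = {"result": func(*request["params"])}
        return json.dumps(reply).encode("utf-8")


class RpcClient:
    """Frontend side: makes a remote call look like a local function call."""

    def __init__(self, transport):
        # transport: any callable that takes request bytes and returns reply bytes.
        self._transport = transport

    def call(self, method, *params):
        request = {"method": method, "params": list(params)}
        raw_reply = self._transport(json.dumps(request).encode("utf-8"))
        reply = json.loads(raw_reply.decode("utf-8"))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


def count_followers(user_id):
    """Hypothetical backend operation with made-up data."""
    fake_table = {1: 42, 2: 7}
    return fake_table.get(user_id, 0)


server = RpcServer()
server.register("count_followers", count_followers)

# In production the transport would be a socket or HTTP request to another machine.
client = RpcClient(transport=server.handle)
print(client.call("count_followers", 1))
```

Because only bytes cross the boundary between client and server, the two sides could be written in different languages, which is how a PHP frontend can talk to a Java or Scala backend.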