Understanding PHP/CGI performance considering HTTP (stateless protocol)


Some background to the question

I have just been browsing through some *.php files used in the Moodle CMS. Moodle uses PHP scripts to dynamically generate the content of the HTML pages sent to visitors. Often something like this happens (an "inclusion cascade"):

// file: file1.php
require('file2.php');

and

// file: file2.php
require('file3.php');
require('file4.php');

etc...

Indeed, starting from the requested *.php file, quite a cascade of other files has to be included before any output is finally produced. Even if this makes a lot of sense, it worries me because of its impact on speed/performance. It seems that a lot of initialization is redone every time.

The question

Knowing that HTTP is a stateless protocol, it would appear to me that every request sent to the server has to run through all the initialisation done in the PHP/CGI code over and over again. Is this a valid/true assumption?

Example: I need to access a database, and I want to do this safely using some objects that help with prepared statements/sanitizing etc. The object used for this is therefore defined in a file I include (i.e. myDatabaseAccessObject.php).
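For illustration only, a minimal sketch of what such an include might contain; the class name, its methods and the PDO-based approach are assumptions for this example, not the actual contents of myDatabaseAccessObject.php:

<?php
// file: myDatabaseAccessObject.php (hypothetical sketch)
// Wraps a PDO connection so that all queries go through prepared statements.
class MyDatabaseAccessObject
{
    private $pdo;

    public function __construct($dsn, $user, $password)
    {
        // This connection is (re)established on every request that includes this file.
        $this->pdo = new PDO($dsn, $user, $password, array(
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        ));
    }

    // Run a parameterized query; values are bound, never concatenated into SQL.
    public function fetchAll($sql, array $params = array())
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}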

With regard to the example, the question is: is it true that, due to the stateless nature of HTTP, there is no way to keep the work of setting up (i.e. parsing) myDatabaseAccessObject.php from being done all over again upon each request?

Or does PHP offer a way to cache the work already done? (If so, is it done in a transparent way (i.e. the script author can tell what to cache) or in an obscured way, where the PHP engine does some caching that is not visible to the author?)

Is it that I have an absolutely flawed perception of what is going on, or is work indeed done over and over again that could be saved if the initialization necessary for the PHP script were kept between multiple subsequent requests?


There are 2 best solutions below

Answer 1

You are completely right. All database connections and other initializations are done on every PHP script execution. This is exactly because of the statelessness of the HTTP protocol.

That being said, there are ways to speed up the process. PHP's session handling can do some of the work for you (although it can't cache connections), Smarty for example has a decent caching and compiling system, etc.
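A hedged sketch of the session route: expensive, serializable data is computed once per visitor and reused on later requests from the same visitor. The buildConfiguration() helper and the 'site_config' key are made up for this example:

<?php
// Made-up placeholder for an expensive initialization step.
function buildConfiguration()
{
    return array('theme' => 'standard', 'lang' => 'en');
}

// Start (or resume) this visitor's session before any output is sent.
session_start();

// Only redo the expensive work if this visitor has no cached copy yet.
if (!isset($_SESSION['site_config'])) {
    $_SESSION['site_config'] = buildConfiguration();
}
$config = $_SESSION['site_config'];

Note that this only works for plain data that can be serialized; as said above, a live database connection cannot be stored in the session.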

Answer 2

Well, to kick off: HTTP isn't really stateless anymore. HTTP 1.1 added persistent connections, which in itself doesn't make the protocol stateful, but it doesn't leave it entirely stateless either. If HTTP 1.1 were truly stateless and you used persistent connections (chunked transfers), you'd curse the protocol for being too slow, so they worked around it in a way; that's why I've heard HTTP 1.1 being referred to as dirty-stateless. That's the point I was getting at.

So, back to your question: yes, a standard installation of PHP/CGI (are you sure you're not using FastCGI?) will have to parse, compile and execute all the code for each request. It's not that big of a deal, but it's overhead nonetheless.
You can't hold state between two requests, not really. This, if you come to think of it, is why many deem the static keyword rather pointless in PHP, but that's a different matter.
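A small self-contained sketch of that point: a static variable survives repeated calls within one request, but every new request starts the script from scratch again:

<?php
function countCalls()
{
    static $calls = 0;   // persists between calls within ONE request only
    return ++$calls;
}

echo countCalls(); // 1
echo countCalls(); // 2
// Reload the page (a new HTTP request) and the counter starts at 1 again,
// because the whole script is parsed, compiled and executed from scratch.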

Your question focuses on a DB connection. Well, you can use persistent DB connections, and PHP might then draw the next connection from a connection pool (roughly as in the sketch after this paragraph). But that's dangerous, messy and just an accident waiting to happen.
Connecting to a DB isn't likely to be the major bottleneck in your case. Since you're using Moodle, I'd say that's going to be excessive I/O operations (the require cascade of which you speak).
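For what it's worth, the persistent-connection route mentioned above looks roughly like this with PDO; the DSN and credentials are placeholders:

<?php
// Ask PDO for a persistent connection; PHP may hand back an already
// open connection from its pool instead of opening a new one.
$db = new PDO('mysql:host=localhost;dbname=moodle', 'user', 'secret', array(
    PDO::ATTR_PERSISTENT => true,
));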

That excessive I/O can, quite easily, be avoided by caching the actual bytecode that PHP generates when compiling your scripts. Look into APC; AFAIK it's the most popular caching extension in use. It gives you control over what is cached, how and when...
If you like to live on the edge and you're not working on something critical, you could even check to see how much performance gain you'd get if you compiled your code to an executable.
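Regarding the control APC gives you: besides the opcode cache it applies transparently, it also exposes a user cache that scripts can use directly. A minimal sketch, where the cache key, the 5-minute TTL and buildMenuFromDatabase() are made up for this example:

<?php
// Made-up placeholder for an expensive query.
function buildMenuFromDatabase()
{
    return array('Home', 'Courses', 'Grades');
}

// Try APC's user cache first; only do the expensive work on a cache miss.
$menu = apc_fetch('site_menu', $hit);
if (!$hit) {
    $menu = buildMenuFromDatabase();
    apc_store('site_menu', $menu, 300); // keep it for 5 minutes
}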