Optimising Zend Framework applications (2) – cache pages and PHP accelerator [updated]

[continued from the previous post]

4. Use an op-code cache/accelerator (APC, XCache)

PHP is very fast, but it is not compiled: every request parses and compiles the scripts again. An op-code cache avoids that by keeping the compiled op-codes in memory. See this comparison for an example of the performance increase you can expect. That does not mean we can skip the other optimisations: code bottlenecks should be removed anyway.
Optimising code also helps us understand which patterns commonly make code slow, and avoid writing them again in the future.

5. Cache pages (before zend framework bootstrap)

Even after optimising the code, you still have to bootstrap and run the Zend application, and the whole process takes time (dispatching, controller logic, rendering the view scripts). A solution I’ve recently used is caching the whole HTML output of the request (see another post about caching HTML pages of a generic website).

There are many ways to cache the output (Apache modules, reverse proxies, Zend Server page cache). The best one depends on your needs. Moving the logic to the application level usually allows more customisation.

Page caching can be done with Zend_Cache_Frontend_Page. It basically uses ob_start() with a callback that saves the output into the cache. I haven’t found any interesting article about how best to use it, apart from the advice that it should be used in a controller plugin. I’d say it’s better to instantiate a separate cache object directly in index.php and start caching before the Zend application actually starts (bootstrap). In my local environment, when the page cache is valid, the response time is 2 ms, against the 800 ms required to bootstrap and run the application.
See the following code to see where to place it. Note: I instantiate Zend_Application with no options just to have the autoloader available to load the needed classes.

# index.php

// create APPLICATION_* vars and set_include_path [...]

require_once 'Zend/Application.php';
$application = new Zend_Application(APPLICATION_SERVERNAME);
// Zend_Cache_Frontend_Page
$pageCache = Zend_Cache::factory(
    'Page', 'File',
    array(
        'caching' => true,
        'lifetime'=>300, //5 minutes
        'ignore_user_abort' => true,
        'memorize_headers' => true,
        // enable debug only on localhost
        'debug_header' => APPLICATION_ENV == 'localhost',
        'default_options' => array(
            'cache' => true,
            // test the following, depends on how you use sessions and browser plugins
            'cache_with_cookie_variables' => true,
            'cache_with_post_variables' => false
        ),
        // whitelist approach
        'regexps' => array(
           '^.*$' => array('cache'=>false),
            /* homepage */
            '^/$' => array('cache'=>true, 'specific_lifetime'=>650),
            /* example of other pages*/
            '^/(pagex|pagey)/' => array('cache'=>true,'specific_lifetime'=>700),
            // []...
        )
    ),
    array(
        'cache_dir' => APPLICATION_PATH . '/data/pagecache',
        'hashed_directory_level'=>2,
        'file_name_prefix'=>'zendpagecache'
    )
);
// start the page cache, except for cron jobs (CLI) and logged-in users
// note: I haven't tested yet whether using Zend_Auth here is a good solution
if (PHP_SAPI !=='cli' && !Zend_Auth::getInstance()->hasIdentity()) {
  $pageCache->start();
}
// the following code is not executed when page cache is valid
$appSettings = new Zend_Config_Ini(APPLICATION_PATH . '/configs/sites.ini', APPLICATION_SERVERNAME);
$application->setOptions($appSettings->toArray());

$application->bootstrap();

if (PHP_SAPI !=='cli') {
    $application->run();
}
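
To make the mechanism clearer, here is a minimal sketch in plain PHP of what an output-buffer page cache does. It is not the code of Zend_Cache_Frontend_Page: servePage() and its parameters are hypothetical names I made up for the example, and it ignores headers, cookies and POST data.

```php
<?php
// Sketch only: servePage() is a hypothetical helper, not a Zend Framework API.

/**
 * Echo the page for $requestUri, from the file cache when it is still valid,
 * otherwise by running $render and storing its output.
 * Returns true on a cache hit, false on a cache miss.
 */
function servePage($cacheDir, $requestUri, $lifetime, callable $render)
{
    $file = rtrim($cacheDir, '/') . '/page_' . md5($requestUri) . '.html';

    // Cache hit: the file exists and is younger than the lifetime.
    clearstatcache();
    if (is_file($file) && (time() - filemtime($file)) < $lifetime) {
        echo file_get_contents($file);
        return true;
    }

    // Cache miss: buffer the output; the callback receives the whole page
    // when the buffer is closed, saves it and passes it on unchanged.
    ob_start(function ($html) use ($file) {
        file_put_contents($file, $html, LOCK_EX);
        return $html;
    });
    $render();       // the expensive part: bootstrap, dispatch, render
    ob_end_flush();  // triggers the callback and sends the page

    return false;
}
```

On a hit, nothing inside $render is executed, which is why the cached response is so much faster than a full bootstrap.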

Of course, page caching must be configured carefully, depending on the traffic of the application. Ideally, a customised cron job should fetch the pages, invalidating the cache for each one and rebuilding it before it would normally expire, so that users always find the fast cached page. On my project a similar system made the average loading time 8 times faster (so an external spider will see the site quite differently). The links to keep cached should be at least the ones in the sitemap.

Set all the options of the Zend page cache frontend carefully: cookies, GET and POST data, and the regular expressions of the URLs to cache. Note that no application logic is executed when the cache is valid, so a visitor counter or any other query that writes to the database will not run.

Optimising Zend Framework applications (1) – cache db objects, PHP code profiling and optimisation

I’m optimising some Zend Framework applications these days and I’ve been reading, researching, optimising and measuring results.

Some links to read first:
Optimising a Zend Framework application – by Rob Allen – PHPUK February 2011
Profile your PHP application and make it fly – by Lorenzo Alberton – PHPNW 9 Oct 2010

The optimisation process is iterative: measure the performance, fix the worst problem and start over. Go on until the performance is satisfactory.
That does not mean that we can start writing our application and leave all the optimisation for later. We need to keep an eye on performance from the beginning, in order to select an architecture that will support the traffic.

In this post I’ll explain some common techniques I’ve used to optimise a high traffic application. Optimisation is done at the end, but we should be aware of what can be done, in order to design the application in a way that is easier to optimise if needed.

1. Cache database calls with Zend_Cache

The database is often a significant bottleneck, especially when it is shared with other applications and we cannot expect it to respond immediately. A slow database server often makes the script hang and increases the average page loading time. Note that in some scenarios (e.g. a local server, simple queries on tables with small indexes, and the db server caching in memory) the db could be faster than the disk access needed to get the cached object. Measure if possible.

For a recent application, I’ve written a wrapper to automatically cache, with a default lifetime, all the methods of the database models that read data. Depending on the architecture of the site, the cache should ideally never expire, and be invalidated only when the data is manually flagged as invalid (from the admin area). If that is not possible, set a reasonable lifetime.
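
The wrapper idea can be sketched like this. It is not my actual code: CachingModelProxy and ArrayCache are hypothetical names, the in-memory cache only stands in for a real backend, and I assume that the methods reading data start with "get".

```php
<?php
// Sketch only: ArrayCache and CachingModelProxy are hypothetical classes.

/** In-memory stand-in for a real cache backend (file, memcached...). */
class ArrayCache
{
    private $entries = array();

    /** Returns the cached data, or false when missing or expired. */
    public function load($id)
    {
        if (!isset($this->entries[$id])) {
            return false;
        }
        list($expires, $data) = $this->entries[$id];
        return ($expires === null || $expires > time()) ? $data : false;
    }

    /** A null lifetime means the entry never expires. */
    public function save($data, $id, $lifetime = null)
    {
        $expires = ($lifetime === null) ? null : time() + $lifetime;
        $this->entries[$id] = array($expires, $data);
    }

    /** Manual invalidation, e.g. called from the admin area. */
    public function remove($id)
    {
        unset($this->entries[$id]);
    }
}

/** Forwards every call to the model and caches the results of the getters. */
class CachingModelProxy
{
    private $model;
    private $cache;
    private $lifetime;

    public function __construct($model, ArrayCache $cache, $lifetime = 300)
    {
        $this->model = $model;
        $this->cache = $cache;
        $this->lifetime = $lifetime;
    }

    public function __call($method, $args)
    {
        // Assumption: only methods starting with "get" read data.
        if (strpos($method, 'get') !== 0) {
            return call_user_func_array(array($this->model, $method), $args);
        }
        // One cache entry per model class, method and argument list.
        $id = get_class($this->model) . '_' . $method . '_' . md5(serialize($args));
        $data = $this->cache->load($id);
        if ($data === false) {
            $data = call_user_func_array(array($this->model, $method), $args);
            $this->cache->save($data, $id, $this->lifetime);
        }
        return $data;
    }
}
```

With something like this, the controllers keep calling the model methods as before, and only the object they receive changes. Note that a getter legitimately returning false is never cached in this sketch.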

The normal cache logic is: if the cache is valid, load from the disk/memory cache; if not, load fresh data (db). For the case where loading fresh data is not possible (e.g. the db is not answering), I suggest a change: if the fresh data is not available, log the error and use the old cache instead of throwing an exception or interrupting the application. If you use Zend_Cache, look at the second parameter of Zend_Cache_Core::load() ($doNotTestCacheValidity).
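
Here is a rough sketch of that fallback logic. StaleCapableCache is a hypothetical in-memory stand-in whose load() imitates the second parameter of Zend_Cache_Core::load(), and loadWithStaleFallback() is a name I made up. It assumes the backend still holds the expired entry (i.e. it has not been cleaned yet).

```php
<?php
// Sketch only: StaleCapableCache and loadWithStaleFallback() are hypothetical.

class StaleCapableCache
{
    private $entries = array();

    public function save($data, $id, $lifetime)
    {
        $this->entries[$id] = array(time() + $lifetime, $data);
    }

    /** With $doNotTestCacheValidity = true, expired entries are returned too. */
    public function load($id, $doNotTestCacheValidity = false)
    {
        if (!isset($this->entries[$id])) {
            return false;
        }
        list($expires, $data) = $this->entries[$id];
        if ($doNotTestCacheValidity || $expires > time()) {
            return $data;
        }
        return false;
    }
}

function loadWithStaleFallback($cache, $id, $lifetime, callable $loadFresh, callable $logError)
{
    // 1. Normal case: the cache is valid.
    $data = $cache->load($id);
    if ($data !== false) {
        return $data;
    }
    try {
        // 2. Cache missing or expired: load fresh data and cache it.
        $data = $loadFresh();
        $cache->save($data, $id, $lifetime);
        return $data;
    } catch (Exception $e) {
        // 3. Fresh data not available: log, then fall back to the old cache.
        $logError($e);
        $stale = $cache->load($id, true);
        if ($stale === false) {
            throw $e; // nothing to fall back to
        }
        return $stale;
    }
}
```

The visitor gets slightly old data instead of an error page, and the log tells us that the db had a problem.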

2. Profile and optimise queries

Even though the data is retrieved from the cache most of the time, queries might still need to be optimised: run EXPLAIN on the queries and check that the indexes are used, select which index to use, change field types if needed (an interesting book about it). Remember that MySQL (and in general any db server) works very fast with simple queries (no joins) and small indexes. Consider denormalising the db structure to avoid some frequent joins. You can use MySQL triggers to automatically update the denormalised columns when they change in the parent tables.

3. Profile and optimise PHP code

To profile PHP scripts, use the Xdebug profiler + a Firefox extension (to enable it when needed via cookie) + KCachegrind/WinCacheGrind. Another tool is xhprof (web based) by Facebook, which shows the most frequently called functions.
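
As a quick illustration, a piece of code can be wrapped like this to collect xhprof data when the extension is installed. profileCall() is a hypothetical helper of mine. It falls back to a simple microtime() measurement when xhprof is not available, which is of course much less detailed than a real profiler.

```php
<?php
// Sketch only: profileCall() is a hypothetical helper.

/**
 * Run $fn and return its result together with the elapsed time and,
 * when the xhprof extension is loaded, the raw xhprof data.
 */
function profileCall(callable $fn)
{
    $useXhprof = function_exists('xhprof_enable');
    if ($useXhprof) {
        xhprof_enable();
    }

    $start = microtime(true);
    $result = $fn();
    $seconds = microtime(true) - $start;

    // xhprof_disable() stops the profiler and returns the collected data.
    $profile = $useXhprof ? xhprof_disable() : null;

    return array('result' => $result, 'seconds' => $seconds, 'profile' => $profile);
}
```

Measure before and after each change, so you know whether the optimisation was worth it.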

Command line tool for CRUD (Zend Framework scaffolding libraries)

I’ve created a command line tool for my Zend Framework scaffolding libraries.

Those libraries are basically related abstract classes (Controller, Form, DbTable, Filter and Order forms) that contain all the logic to make CRUD (Create, Read, Update, Delete) forms for an admin area. Thanks to those libraries, complete CRUD logic requires only a few lines of code. Everything is completely customisable by extending the features. See the page for demo, svn and other info.

The tool automatically reads the table structure and creates the controller with CRUD actions, forms, filter forms, order forms and templates. Of course everything is completely customisable (in Zend Framework style, by using inheritance).

example

// create the CRUD actions to manage the table categories.
// The generated classes are named using "Categories" as prefix or suffix
php -f zfcrud.php create crud Categories -t categories -d mydb

see source file zfcrud.php

How to manage developers

I’ve just read this article and I’ll quote some parts of it with my comments.

Do you work with lenient working hours?
“Enforcing a 9 to 5 working day is fine when you are running a factory. But at 5 o’clock, a programmer doesn’t stop thinking about the problems and tasks at hand. […]. Their minds keeps spinning and thinks about better solutions to the problems they face every day at a rate of 24 hours a day”.. [..]I sometimes cannot do anything useful for a whole day, while the next day I keep on going and going. […]. I don’t do important migrations at 7 o’clock in the morning right after a night sleep, but I do not mind them at 2 o’clock after a whole day of work.. […]. Now, I do not say that developers should be able to walk in the door whenever they feel like, but merely that if a programmer wants to continue until late that evening, they should be able to do it and be compensated for it, most probably by time off. They don’t work like that because they want to, they work like that because their mind is filling up and they need to type it down before it’s gone. It’s important to stimulate this way of working.”

Very true. Developing is often challenging and elaborate, and cannot be done at any time. Forcing our mind to develop a central component when we are not fully concentrated will probably make us waste time only to completely redesign it later after some other dependent components have been made.
Often we come up with a brilliant idea for an 8 hour task that allows us to code a perfect solution in less than 30 minutes. Developers who learn, try new solutions all the time, think widely and keep improving are often able to do tasks much faster than other developers. That’s not normally true for other jobs.

Do you give enough time for unit testing?
“Unit testing will reduce the number of bugs massively and it is part of the programming cycle just as compiling/interpreting, deploying and writing specifications are. Off course, deadlines are always lurking around, but they should not affect QA time. “

That’s something really hard to understand for someone who is not directly involved in development. If used well, TDD reduces the time taken to code the solution.

Do you give enough time for planning?
“More often than not I had to start on a project without knowing what’s going on just because there weren’t any specifications. Merely some vague idea’s and some screenshots or wire frames. Try to think of it this way: You need to build a house, but you don’t know if it’s going to be a small-sized apartment or a sky-scraper… Now, your task is to start building the concrete foundations.. Are YOU able to do that?”

I agree with that. I like explaining the problems of a project to non-technical people using a skyscraper as an example. If I don’t have time to plan carefully I can build some floors, but if I’m later asked to build another 100 floors, I’m afraid I’ll have to destroy everything, build better foundations and start over. And if I’m only given cardboard to build the walls and you then ask me to make windows, I’ll have to (again) destroy everything and make new walls with the right materials.

Are you and others respecting “The Zone”?
“We used to have a unwritten rule in a company I used to work for: if the headset is on, do not disturb unless there is a fire. “The zone” is when programmers are so mentally focused that they run on 110% efficiency. Getting into the zone is difficult, getting pulled out of one is way too easy. As soon as I’m in the zone and somebody comes up to me to ask if in_array() is haystack/needle or needle/haystack I’m out of the zone instantly. First of all, I don’t know the answer myself so I have to look it up on php.net, just like they would need to do. So a 10 second interruption, means that getting back in the zone will take hours, if it’s possible at all.”

That’s right. And it has nothing to do with the ability to keep concentrated. If I’m asked to help, I have to interrupt my mental process and it will take time to restart (sometimes a lot when thinking about complex solutions). Most of the answers to our technical questions are on google, so … RTFM 😀

Do you minimize meetings?
“Meetings are a great way of getting programmers out of the zone. First of all, most meetings are NOT interesting for programmers. Meetings tend to drift away into area’s that most programmers do not care about anyway. “

When two skilled developers speak, they understand each other in a few seconds, whereas communication between technical and non-technical people requires much more time. It’s important to have analysts / project managers with a technical background.

Do you have enough distraction for programmers?
“At my first software company I worked for, we used to have a big couch in the middle of the hallway. That was A-MA-ZING.. You could relax, actually take a nap and nobody would bother you. [..] I need those periods a few times a day and most of the time a few minutes is more than enough. Don’t think programmers are doing nothing just because you don’t hear clickety-click all the time.”

I think that’s a good idea. We have nothing like that where I’m working now, but I remember that when I was working freelance I had lots of brilliant ideas while relaxing for a few minutes in the garden or walking around.

Do you give back to the opensource community?
“ Do some bugfixing for Zend framework or PHP, work on a opensource project on github, or even deploy tools you have developped internaly to the outside world as open source and let others help you improving the code. […] your programmers will see other people code, learn from it, and see how it is to work in large projects, which will benefit them, you and the company they work in.”

Some companies’ policies do not allow their code to be redistributed under any licence, whilst they make money using open source tools. Where is the trick? Altruistic companies do the job and selfish companies make the money? I don’t think so.

Do you let your programmers do research?
“How about the fact that you can’t implement new techniques just because you don’t have the knowledge or you are not even able to see their potential just because nobody around in your company can spend even 5 minutes to take a look at it? Research is important. Programmers will gain knowledge which they can pass through to the products your company develops.”

That’s EXTREMELY important! A company could think that investing in training is a waste of time in case developers leave the company. Wrong! Unlike most other jobs, development must include daily training / learning (blogs, especially phpntips 😀, articles, books, PHP conferences and meetings, and reading / understanding already written CODE), which gives immediate results and improvements.