Optimising Zend Framework applications (2) – cache pages and PHP accelerator [updated]

[continued from previous post]

4. Use an op-code cache/accelerator (APC, XCache)

PHP is very fast, but it's not compiled ahead of time. Without an op-code cache, every script is parsed and compiled into op-codes on each request. An op-code cache keeps the compiled op-codes in shared memory and skips that step. See this comparison as an example of the possible performance increase. That does not mean we can skip the other optimisations: code bottlenecks should be removed anyway.
Optimising code also helps you understand which coding patterns are slow, so you can avoid writing them again in the future.
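To check which op-code cache, if any, is loaded on a server, something like the following can help. It is a sketch: the function name is hypothetical, and the extension names assume the usual ones ('Zend OPcache' is the cache bundled with PHP since 5.5, while APC and XCache are the older alternatives mentioned above). "Loaded" does not necessarily mean "enabled", so the ini settings (e.g. opcache.enable) are still worth checking.

```php
<?php
// Hypothetical helper: return the name of the first op-code cache extension
// that is loaded, or null if none of them is.
function detectOpcodeCache(): ?string
{
    // 'Zend OPcache' is bundled with PHP >= 5.5; 'apc' and 'XCache' are legacy.
    foreach (array('Zend OPcache', 'apc', 'XCache') as $extension) {
        if (extension_loaded($extension)) {
            return $extension;
        }
    }
    return null;
}

$cache = detectOpcodeCache();
echo $cache === null ? "No op-code cache loaded\n" : "Op-code cache: $cache\n";
```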

5. Cache pages (before zend framework bootstrap)

Even though you have optimised the code, you still have to bootstrap and run the Zend application, and the whole process takes time (dispatching, controller logic, script rendering). A solution I've recently used is caching the whole HTML produced by the processing (see another post about caching HTML pages of a generic website).

There are many ways to cache the output (Apache modules, reverse proxies, Zend Server page cache). The best one depends on the needs. Moving the logic to the application level usually allows more customisation.

Page caching can be done with Zend_Cache_Frontend_Page. It basically uses ob_start() with a callback that saves the result into the cache. I haven't found any interesting article about how best to use it, apart from the advice to use it in a controller plugin. I'd say it's better to instantiate a separate cache object, activate it directly in index.php and start caching before the Zend application actually starts (bootstrap). In my local environment, when the page cache is valid, the response time is 2 ms, against the 800 ms required to bootstrap and load the application.
The following code shows where to place it. Note: I instantiate Zend_Application with no options just to have the autoloader available to load the needed classes.

# index.php

// create APPLICATION_* vars and set_include_path [...]

require_once 'Zend/Application.php';
$application = new Zend_Application(APPLICATION_SERVERNAME);
// Zend_Cache_Frontend_Page
$pageCache = Zend_Cache::factory(
    'Page', 'File',
    array(
        'caching' => true,
        'lifetime'=>300, //5 minutes
        'ignore_user_abort' => true,
        'memorize_headers' => true,
        // enable debug only on localhost
        'debug_header' => APPLICATION_ENV == 'localhost',
        'default_options' => array(
            'cache' => true,
            // test the following, depends on how you use sessions and browser plugins
            'cache_with_cookie_variables' => true,
            'cache_with_post_variables' => false
        ),
        // whitelist approach
        'regexps' => array(
            '^.*$' => array('cache'=>false),
            /* homepage */
            '^/$' => array('cache'=>true, 'specific_lifetime'=>650),
            /* example of other pages*/
            '^/(pagex|pagey)/' => array('cache'=>true,'specific_lifetime'=>700),
            // [...]
        )
    ),
    array(
        'cache_dir' => APPLICATION_PATH . '/data/pagecache',
        'hashed_directory_level'=>2,
        'file_name_prefix'=>'zendpagecache'
    )
);
// start the page cache, except for cron jobs and when the user is logged in
// note: I haven't tested yet whether using Zend_Auth here is a good solution
if (PHP_SAPI !=='cli' && !Zend_Auth::getInstance()->hasIdentity()) {
  $pageCache->start();
}
// the following code is not executed when page cache is valid
$appSettings = new Zend_Config_Ini(APPLICATION_PATH . '/configs/sites.ini', APPLICATION_SERVERNAME);
$application->setOptions($appSettings->toArray());

$application->bootstrap();

if (PHP_SAPI !=='cli') {
    $application->run();
}
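Zend_Cache_Frontend_Page does much more (headers, URL rules, GET/POST/cookie checks), but its core idea can be sketched in plain PHP: serve a saved copy if it is fresh, otherwise buffer the output with ob_start() and save it from the callback. The function names, the file naming scheme and the cache directory are assumptions for illustration. This is not the Zend implementation, and it also assumes the application never calls ob_flush() in between, which would split the page across several callback calls.

```php
<?php
// Hypothetical minimal page cache: one HTML file per request URI.
function pageCachePath(string $dir, string $uri): string
{
    return $dir . '/page_' . md5($uri) . '.html';
}

// Return the saved page if it is younger than $lifetime seconds, else null.
function pageCacheGet(string $dir, string $uri, int $lifetime): ?string
{
    $file = pageCachePath($dir, $uri);
    if (is_file($file) && time() - filemtime($file) < $lifetime) {
        return file_get_contents($file);
    }
    return null;
}

// Buffer all following output; the callback stores it when the buffer is flushed.
function pageCacheStart(string $dir, string $uri): void
{
    ob_start(function (string $html, int $phase) use ($dir, $uri): string {
        // Save only on the final flush, not when the buffer is discarded (ob_end_clean()).
        if (($phase & PHP_OUTPUT_HANDLER_FINAL) && !($phase & PHP_OUTPUT_HANDLER_CLEAN)) {
            file_put_contents(pageCachePath($dir, $uri), $html, LOCK_EX);
        }
        return $html; // the page is still sent to the browser
    });
}
```

In index.php this would go before the bootstrap: `if (($html = pageCacheGet($dir, $_SERVER['REQUEST_URI'], 300)) !== null) { echo $html; exit; }`, followed by `pageCacheStart($dir, $_SERVER['REQUEST_URI']);`.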

Of course, page caching must be configured carefully, depending on the traffic of the application. Ideally, a custom cron job should fetch the pages, invalidate the cache for each one and rebuild it before it would normally expire, so that users always get the fast cached page. On my project, a similar system reduced the average loading time by a factor of 8 (which also changes how external spiders perceive the site). The pages to keep cached should include at least the ones listed in the sitemap.
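The cron job idea can be sketched as two small helpers: read the URLs from the sitemap, then request each page again so that a fresh copy is generated before visitors need it. This assumes a standard sitemaps.org XML file. How the warming requests invalidate or bypass the cached copy depends on the cache setup, so the fetch step is passed in as a callable. The function names are hypothetical.

```php
<?php
// Hypothetical cache-warming helpers for a cron job.

// Extract the <loc> URLs from a sitemaps.org XML document.
function sitemapUrls(string $xml): array
{
    $sitemap = simplexml_load_string($xml);
    if ($sitemap === false) {
        return array();
    }
    $sitemap->registerXPathNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    $urls = array();
    $locs = $sitemap->xpath('//s:url/s:loc');
    foreach ($locs ?: array() as $loc) {
        $urls[] = trim((string) $loc);
    }
    return $urls;
}

// Request every URL with $fetch (e.g. 'file_get_contents'); return the ones that failed.
function warmPages(array $urls, callable $fetch): array
{
    $failed = array();
    foreach ($urls as $url) {
        if ($fetch($url) === false) {
            $failed[] = $url;
        }
    }
    return $failed;
}
```

A cron script could then run `warmPages(sitemapUrls(file_get_contents($sitemapUrl)), 'file_get_contents')` and log whatever comes back.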

Configure all the settings of the Zend page cache frontend carefully: cookie, GET and POST data, and the regexps of the URLs to cache. Note that no application logic runs when the cache is valid, so a visitor counter or any other database write query will not work.
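The whitelist in the configuration above works because, as far as I can tell from the Zend_Cache_Frontend_Page source, the options of the *last* regexp matching the request URI are merged over 'default_options'. That is why the catch-all '^.*$' rule comes first. This plain-PHP function is a hypothetical re-implementation for testing URL rules outside the framework, not the Zend code.

```php
<?php
// Hypothetical re-implementation of the URL-rules lookup: start from the
// default options and overlay the options of the LAST regexp matching the URI.
function resolvePageCacheOptions(string $uri, array $defaults, array $regexps): array
{
    $lastMatch = null;
    foreach ($regexps as $regexp => $options) {
        // Backtick delimiters, so '/' can be used unescaped in the patterns.
        if (preg_match('`' . $regexp . '`', $uri)) {
            $lastMatch = $regexp;
        }
    }
    return $lastMatch === null ? $defaults : array_merge($defaults, $regexps[$lastMatch]);
}

$rules = array(
    '^.*$' => array('cache' => false),
    '^/$' => array('cache' => true, 'specific_lifetime' => 650),
    '^/(pagex|pagey)/' => array('cache' => true, 'specific_lifetime' => 700),
);
var_dump(resolvePageCacheOptions('/admin', array('cache' => true), $rules)); // cache => false
```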

Optimising Zend Framework applications (1) – cache db objects, PHP code profiling and optimisation

I'm optimising some Zend Framework applications these days, and I've been reading, researching, optimising and measuring the results.

Some links to read first:
Optimising a Zend Framework application – by Rob Allen – PHPUK February 2011
Profile your PHP application and make it fly – by Lorenzo Alberton – PHPNW 9 Oct 2010

The optimisation process is iterative: measure the performance, fix the worst problem and start over. Keep going until the performance is satisfactory.
That does not mean we can write the whole application first and optimise everything later. Performance needs attention from the beginning, so that we choose an architecture that will support the traffic.

In this post I’ll explain some common techniques I’ve used to optimise a high traffic application. Optimisation is done at the end, but we should be aware of what can be done, in order to design the application in a way that is easier to optimise if needed.

1. Cache database calls with Zend_Cache

The database is often a significant bottleneck, especially when it is shared with other applications and we cannot expect it to respond immediately. A slow database server often makes the script hang and increases the average page loading time. Note that in some scenarios (e.g. a local server, simple queries on tables with small indexes, and a db server caching in memory), the db could be faster than a disk access to get the cached object. Measure if possible.

For a recent application, I've written a wrapper/system that automatically caches the results of all the database model methods that read data, using a default lifetime. Depending on the architecture of the site, the cache should ideally never expire. It should instead be invalidated when the data is manually flagged as invalid (from the admin area). If that is not possible, set a reasonable lifetime.

The normal cache logic is: if the cache is valid, load from the disk/memory cache; if not, load fresh data (db). I suggest a change for the case where fresh data cannot be loaded (e.g. the db is not answering): use the old cached data (and log the error) instead of throwing an exception or interrupting the application. If you use Zend_Cache, look at the second parameter of Zend_Cache_Core::load() ($doNotTestCacheValidity), which returns the cached data without checking whether it has expired.
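The "serve stale data when the database fails" idea can be sketched without Zend_Cache as follows. With Zend_Cache, the equivalent is a second load($id, true) call in the error path. The class name, its in-memory storage and the error_log() call are assumptions for illustration. A real version would use a persistent backend.

```php
<?php
// Hypothetical cache that falls back to expired data when the loader fails.
class StaleFallbackCache
{
    private $items = array(); // id => array('value' => ..., 'expires' => timestamp)

    public function save(string $id, $value, int $lifetime): void
    {
        $this->items[$id] = array('value' => $value, 'expires' => time() + $lifetime);
    }

    // Return valid cached data; otherwise call $loader (e.g. a db query).
    // If $loader throws and an expired copy exists, log and return the copy.
    public function get(string $id, callable $loader, int $lifetime)
    {
        $item = isset($this->items[$id]) ? $this->items[$id] : null;
        if ($item !== null && $item['expires'] > time()) {
            return $item['value']; // valid cache
        }
        try {
            $value = $loader();
        } catch (Exception $e) {
            if ($item === null) {
                throw $e; // nothing to fall back to
            }
            error_log('Loader failed, serving stale data for ' . $id . ': ' . $e->getMessage());
            return $item['value']; // stale, but better than an error page
        }
        $this->save($id, $value, $lifetime);
        return $value;
    }
}
```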

2. Profile and optimise queries

Even though the data is retrieved from the cache most of the time, queries may still need optimising. EXPLAIN the queries and check that the right indexes exist, choose which index to use, and change field types if needed (an interesting book about it). Remember that MySQL (and in general any db server) is very fast with simple queries (no joins) and small indexes. Consider denormalising the db structure to avoid some frequent joins. You can use MySQL triggers to automatically update the denormalised columns when the parent tables change.

3. Profile and optimise PHP code

To profile the PHP scripts, use the Xdebug profiler + a Firefox extension (to enable it via a cookie when needed) + KCacheGrind/WinCacheGrind. Another tool is XHProf (web-based) by Facebook, which shows the functions called most frequently.
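Profilers show where the time goes. To compare two versions of a single piece of code before and after a change, a rough timing loop can also be handy. This is a hypothetical helper, not a substitute for Xdebug or XHProf. Wall-clock results vary between runs and machines, so compare them only against each other.

```php
<?php
// Hypothetical micro-benchmark: average wall-clock time of $fn in milliseconds.
function averageMs(callable $fn, int $iterations = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) * 1000 / $iterations;
}

// Example: compare two ways of building the same string.
$concat = averageMs(function () { $s = ''; for ($i = 0; $i < 100; $i++) { $s .= $i; } });
$implode = averageMs(function () { implode('', range(0, 99)); });
printf("concat: %.4f ms, implode: %.4f ms\n", $concat, $implode);
```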

Automatic caching of Zend_Db (or any object) calls

I'm working on a personal Zend Framework application that uses Zend_Db_Table_Abstract classes to get data from the database.
Each class has its own methods (getAll, getXX, getYYY, etc.).

I needed all those method calls (throughout the code) to be cached automatically.
The first solution that came to mind was a general wrapper (container) class that uses magic methods to call and cache the methods of the inner class (a subclass of Zend_Db_Table_Abstract in this case).
The cache ID can be calculated from the class name + function name + arguments (md5 + serialize).

public function __call($name, $arguments)
{
   $cacheId = get_class($this->wrappedObject) . $name
                . md5(serialize($arguments));
   // ...
   $callBack = array($this->wrappedObject, $name);
   //... cache logic...
   $ret = call_user_func_array($callBack, $arguments);
   // ...
}

I wanted the wrapper to work in an "invisible mode" to keep IDE autocompletion for the model methods. So I've written a static method that takes the object by reference and replaces the variable with a wrapper around it.

public static function createOn(&$wrappedObject, $lifetime = null)
{
    $wrappedObject = new self($wrappedObject);
    //...
}

Here is an example of use:

$modelCategories = new Application_Model_Categories();
CacheWrapper::createOn($modelCategories /*, 7200*/); // 2nd param = lifetime
/* $modelCategories is now an instance of CacheWrapper,
 * but the IDE still thinks it is the model */
$data = $modelCategories->getAll(); // cached!
$data2 = $modelCategories->getAll(1, 2); // cached!

The cache object and the expiry time (a value "randomised" by ±10%) can be set with a static method (or taken from the registry if present).
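To make the idea concrete, here is a self-contained approximation of such a wrapper. It stores results in a static array instead of Zend_Cache. The class name, the default lifetime and the exact ±10% randomisation are assumptions, so this is not the code from the GitHub repository, just a sketch of the same technique (magic __call() plus createOn() by reference).

```php
<?php
// Hypothetical minimal version of the caching wrapper (array storage, no Zend_Cache).
class SimpleCacheWrapper
{
    private static $store = array(); // cacheId => array(result, expiresAt)
    private $wrappedObject;
    private $lifetime;

    public function __construct($wrappedObject, int $lifetime = 3600)
    {
        $this->wrappedObject = $wrappedObject;
        // Randomise the lifetime by +/-10% so that entries don't all expire together.
        $delta = (int) round($lifetime * 0.1);
        $this->lifetime = $lifetime + mt_rand(-$delta, $delta);
    }

    // Replace the caller's variable with a wrapper around the same object.
    public static function createOn(&$wrappedObject, int $lifetime = 3600): void
    {
        $wrappedObject = new self($wrappedObject, $lifetime);
    }

    public function __call(string $name, array $arguments)
    {
        // Cache ID: class name + method name + hash of the arguments.
        $cacheId = get_class($this->wrappedObject) . '_' . $name . '_' . md5(serialize($arguments));
        if (isset(self::$store[$cacheId]) && self::$store[$cacheId][1] > time()) {
            return self::$store[$cacheId][0];
        }
        $result = call_user_func_array(array($this->wrappedObject, $name), $arguments);
        self::$store[$cacheId] = array($result, time() + $this->lifetime);
        return $result;
    }
}
```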

EDIT 3/9/2013
source code moved to github: https://github.com/elvisciotti/CacheWrapper

Improve performances of any web site: cache pages with Zend Framework

Zend Framework is decoupled, so it’s possible to use only the needed features.

I had to improve the performance of an old site with no cache at all. Changing all the database/file_get_contents requests would have been a long process, so I decided to enable page caching with the Zend Framework Page cache frontend, using only a few lines of code at the beginning of each page.
That web site uses a header file, so I only had to place the code there once.

PHP security: input validation and XSS

For many reasons, security included, it's very important to validate all user input and protect our application from intentionally or accidentally wrong input.

First of all, here is some advice:
  • don't use register_globals! A malicious user may change the value of script variables simply by modifying the query string
  • don't use $_REQUEST: values from different sources (GET, POST, cookies) can overwrite each other, so data may be lost. Use $_GET and $_POST

Number validation
Use the cast operator to validate numbers. If the malicious input is "'); drop table users", the (int) cast value will be zero.

When you use casting, remember the maximum size of int to prevent overflow; you may use (float) instead of (int).
Remember: the decimal separator is the dot, not the comma!
A value with a comma (e.g. "3,5") is not numeric, and a (float) cast will drop the decimal part.
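A quick illustration of the casting behaviour described above. It also shows filter_var() with FILTER_VALIDATE_INT, which returns false for non-integer input instead of silently turning it into 0. That matters when 0 is a legitimate value. The helper name toIntOrNull() is hypothetical.

```php
<?php
$malicious = "'); drop table users";

// A cast never fails: non-numeric input simply becomes 0.
var_dump((int) $malicious);   // int(0)

// The decimal separator is the dot: "3,5" is cut at the comma.
var_dump((float) "3,5");      // float(3)
var_dump((float) "3.5");      // float(3.5)

// Hypothetical helper: unlike a cast, filter_var() tells "0" apart from garbage.
function toIntOrNull(string $input): ?int
{
    $value = filter_var($input, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

var_dump(toIntOrNull("0"));        // int(0)
var_dump(toIntOrNull($malicious)); // NULL
```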
String validation
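First of all, the input may contain accented letters (French and Italian do, for example à, è, ì, ò and ù).
Use
setlocale(LC_CTYPE, "french")
so that locale-aware functions handle these characters. With the locale set, iconv() with the //TRANSLIT option can convert them into the corresponding non-accented characters (à->a, è->e, etc…). Locale names depend on the system (e.g. "fr_FR.UTF-8" on many Linux systems).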
To validate string inputs, use regular expressions!
preg_match($pattern, $string) // add the "i" modifier to the pattern for case-insensitive matching
(ereg() and eregi() are deprecated since PHP 5.3 and removed in PHP 7.)
example:
if (preg_match('/^[0-9]{5}$/', $_POST['postcode'])) { /*postcode valid*/ }
"^[0-9]{5}$" means: a string of exactly 5 characters, each of which must be a digit
don't forget "^" and "$", or "a12345b" will be considered valid.
Use regular expressions to validate other types of input as well: e-mails, URLs. For further details, see the php.net examples and Google!
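A small sketch of the validation above with preg_match(), plus filter_var() with FILTER_VALIDATE_EMAIL, the built-in e-mail validator (available since PHP 5.2). The function names are hypothetical. The D modifier is added because, in PCRE, "$" alone also matches just before a trailing newline, so "12345\n" would otherwise pass.

```php
<?php
// Hypothetical validation helpers built on preg_match() and filter_var().
function isValidPostcode(string $value): bool
{
    // ^ and $ anchor the whole string; D stops $ from matching before a trailing "\n".
    return preg_match('/^[0-9]{5}$/D', $value) === 1;
}

function isValidEmail(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isValidPostcode('12345'));   // bool(true)
var_dump(isValidPostcode('a12345b')); // bool(false)
```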
File uploads
As the second argument of
move_uploaded_file(string $filename, string $destination)
use a path built with basename($_FILES["fieldName"]["name"]) to increase security.
$filename must be $_FILES["fieldName"]["tmp_name"].
Don't trust $_FILES["fieldName"]["type"].
If you expect an image, use getimagesize(). If the file isn't an image, the return value will be FALSE.
If the size of the $destination file differs from the size reported for the upload ($_FILES["fieldName"]["size"]), delete it!!
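Put together, the upload checks above might look like the following sketch. The function name and the null-on-failure convention are assumptions. It also does not handle name collisions: an existing file with the same basename would be overwritten.

```php
<?php
// Hypothetical upload handler following the checks above.
// $file is one entry of $_FILES, e.g. $_FILES['fieldName'].
function storeUploadedImage(array $file, string $destDir): ?string
{
    if (!isset($file['error'], $file['tmp_name'], $file['name'], $file['size'])
        || $file['error'] !== UPLOAD_ERR_OK
        || !is_uploaded_file($file['tmp_name'])) {
        return null; // not a genuine, complete HTTP upload
    }
    // Don't trust $file['type']: inspect the content instead.
    if (getimagesize($file['tmp_name']) === false) {
        return null;
    }
    // basename() strips any directory part the client put in the name.
    $destination = rtrim($destDir, '/') . '/' . basename($file['name']);
    if (!move_uploaded_file($file['tmp_name'], $destination)) {
        return null;
    }
    // Delete the file if its size differs from the size reported for the upload.
    clearstatcache(true, $destination);
    if (filesize($destination) !== (int) $file['size']) {
        unlink($destination);
        return null;
    }
    return $destination;
}
```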
Do not use magic quotes !
Magic quotes don't correctly escape all the special characters (and they were removed in PHP 5.4)!
It's better to use the specific db functions, such as
mysql_real_escape_string()
Pay attention to the current PHP configuration: if magic quotes are enabled, you may need to apply stripslashes() to $_GET data. Use get_magic_quotes_gpc() to check.
Do not allow the user to modify serialized data !
A malicious user may change an array with 100 elements, a:100:{...}, into an array with millions of elements (which requires lots of memory).
If you need to pass serialized data, protect it with a secret key: encrypt it with a two-way cipher and/or sign it (e.g. with an HMAC) so that any change is detected before unserializing.
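If the data only needs to be protected against changes, a keyed signature checked before unserialize() is enough. Here is a minimal sketch with hash_hmac() and hash_equals(). The function names, the "payload.signature" format and the key handling are assumptions. The key must stay on the server.

```php
<?php
// Hypothetical helpers: sign serialized data with HMAC-SHA256 so that any change
// made by the client is detected before unserialize() runs.
function signData($data, string $key): string
{
    $payload = base64_encode(serialize($data)); // base64 never contains '.'
    return $payload . '.' . hash_hmac('sha256', $payload, $key);
}

// Return the original data, or null if the string was tampered with.
function verifyData(string $signed, string $key)
{
    $parts = explode('.', $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    list($payload, $mac) = $parts;
    // hash_equals() compares in constant time to avoid timing attacks.
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $mac)) {
        return null;
    }
    // allowed_classes => false: never instantiate objects from the payload.
    return unserialize(base64_decode($payload), array('allowed_classes' => false));
}
```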
XSS
Cross-site scripting allows users to insert malicious code into the frontend via form submissions. When the page containing the user data is displayed (examples: weblog comments or a site guestbook), the code will be executed or shown.
Examples of malicious inputs:
  • [? passthru(“rm -rf *”); ?]
  • [iframe src=”http://mysite.com/banner.php” … /]
Some tips:
  • Convert characters to HTML entities: the "less than" and "greater than" signs will be converted into HTML entities, preventing unexpected HTML formatting or PHP code execution
    PHP function: htmlspecialchars() // converts "ampersand", "double quote", "single quote" (before PHP 8.1, only with the ENT_QUOTES flag) and "less and greater than" into HTML entities
  • Pay attention to single or double quotes if you build tag attributes with the user's data:
    print "[a href='{$_GET['link']}']text[/a]";
    if $_GET['link'] is: #' onclick='alert("malicious js code")' title='
    the resulting link will execute user-defined JavaScript
  • If HTML tags are not allowed, you can use strip_tags() to remove all the tags from the user data. Note: strip_tags() does not convert special chars!
  • To remove only the attributes from user-submitted tags, use preg_replace()
  • If the user data is a URL, check that it does not start with "javascript:" to prevent JavaScript injection, or (better) parse it with preg_match() or parse_url() and allow only the expected schemes
  • to implement IP-based access control, you should also consider proxies. If the user goes through one, the real IP may be in the HTTP_X_FORWARDED_FOR variable. Validate it by using ip2long(), then long2ip(), and check that the IP stays the same (note that this header is sent by the client and can be forged)
  • HTTP_REFERER is not reliable, some browsers don’t send it
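Three of the tips above can be sketched as small helpers: escaping with ENT_QUOTES, allowing only http/https URLs, and the ip2long()/long2ip() round trip. The function names and the http/https whitelist are assumptions. Note that isSafeUrl() also rejects relative URLs, which may be too strict for some forms.

```php
<?php
// Hypothetical helpers for the XSS tips above.

// Escape user data for HTML text and for attribute values (both quote types).
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Accept only absolute http/https URLs (rejects javascript:, data:, etc.).
function isSafeUrl(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), array('http', 'https'), true);
}

// Validate an IPv4 address with the ip2long()/long2ip() round trip.
// X-Forwarded-For is sent by the client, so a valid address can still be forged.
function isValidIpv4(string $ip): bool
{
    $long = ip2long($ip);
    return $long !== false && long2ip($long) === $ip;
}

// The quotes in the malicious link from the example are now harmless entities.
$link = "#' onclick='alert(1)' title='";
echo "<a href='" . escapeHtml($link) . "'>text</a>\n";
```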