Optimising Zend Framework applications (1) – cache db objects, PHP code profiling and optimisation

I’m optimising some Zend Framework applications these days, and I’ve been reading, researching, optimising and measuring results.

Some links to read before:
Optimising a Zend Framework application – by Rob Allen – PHPUK February 2011
Profile your PHP application and make it fly – by Lorenzo Alberton – PHPNW 9 Oct 2010

The optimisation process is iterative: measure the performance, improve the worst problem and start over. Continue until the performance is satisfactory.
That does not mean we can write the application first and optimise everything later. An eye on performance is needed from the beginning, in order to select an architecture that will support the traffic.

In this post I’ll explain some common techniques I’ve used to optimise a high-traffic application. Optimisation is done at the end, but we should know in advance what can be done, in order to design the application in a way that is easy to optimise if needed.

1. Cache database calls with Zend_Cache

The database is often a significant bottleneck, especially when it is shared with other applications and we cannot expect it to respond immediately. A slow database server makes the script hang and increases the average page loading time. Note that in some scenarios (e.g. local server, simple queries operating on tables with small indexes, db server caching in memory) the db could be faster than a disk access to get the cached object. Measure if possible.

For a recent application, I’ve written a wrapper to automatically cache all the database model methods that fetch data, using a default lifetime. Depending on the architecture of the site, where possible the cache should not expire but instead be invalidated when the data is manually flagged as invalid (from the admin area). If that is not possible, set a reasonable lifetime.

The normal cache logic is: if the cache is valid, load from the disk/memory cache; if not, load fresh data (db). In case loading the fresh data is not possible (e.g. db not answering), I suggest a change: if fresh data is not available, use the old cache instead of throwing an exception or interrupting the application, and log the error. If you use Zend cache, look at the second parameter of Zend_Cache_Core::load().
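For example, here is a minimal sketch of that fallback (the cache setup and the getLatestNews() model call are illustrative, not from the original application):

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('lifetime' => 600, 'automatic_serialization' => true),
    array('cache_dir' => '/tmp/cache')
);

$cacheId = 'news_list';
$data = $cache->load($cacheId); // false if missing or expired

if ($data === false) {
    try {
        $data = $modelNews->getLatestNews(); // hypothetical db call
        $cache->save($data, $cacheId);
    } catch (Exception $e) {
        // db not answering: log the error and serve the expired copy,
        // using the second parameter of Zend_Cache_Core::load()
        error_log($e->getMessage());
        $data = $cache->load($cacheId, true); // true = ignore the lifetime
    }
}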

2. Profile and optimise queries

Even though the data is retrieved most of the time from the cache, queries might still need to be optimised: EXPLAIN the queries and check that indexes are used, select which index to use, change field types if needed (an interesting book about it). Remember that MySQL (and in general any db server) works very fast with simple queries (no joins) and small indexes. Consider denormalising the db structure to avoid some frequent joins; MySQL triggers can automatically update the denormalised columns when the parent tables change.
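As an illustration of the trigger idea, here is a sketch with made-up table and column names: a denormalised `author_name` column on `news` kept in sync with the parent `authors` table, so listing pages can skip the join.

// $db is a Zend_Db adapter; a single-statement trigger needs no DELIMITER change
// (creating triggers requires the TRIGGER privilege)
$db->query('
    CREATE TRIGGER trg_authors_after_update
    AFTER UPDATE ON `authors`
    FOR EACH ROW
        UPDATE `news` SET `author_name` = NEW.name WHERE `author_id` = NEW.id
');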

3. Profile and optimise PHP code

To profile the PHP scripts, use the Xdebug profiler + the Firefox extension (to enable it only when needed, via cookie) + KCacheGrind/WinCacheGrind. Another tool is xhprof (web based) by Facebook, which shows the most frequently called functions.
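For xhprof, a minimal usage sketch (assuming the extension is installed; where and how the run is stored is up to you, and the viewer bundled with the xhprof package can then display it):

// profile everything between enable and disable
xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);

runApplication(); // placeholder for the code to profile

$profilerData = xhprof_disable();
// dump the raw run to a file for later inspection
file_put_contents('/tmp/xhprof_' . uniqid() . '.run', serialize($profilerData));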

Implementing a page counter (MySQL memory table + MySQL procedure)

Although there is Google Analytics, web applications often have to save visit statistics themselves (IP, page visited, timestamp) in order to implement some business logic. Example: websites with articles (blog or newspaper) with features like “most read articles”, “articles most read by the visitors of the current article”, etc.

Save visits into a MySQL memory table

We want to avoid counting more than one visit from the same user in case of an accidental (or malicious) refresh. A cookie/session approach is not enough for that, so a database approach is needed. We’ll use a table where we save the user IP and a timestamp (this also makes it trivial to implement a feature like “number of users on the site / reading the same article”).

CREATE TABLE IF NOT EXISTS `news_counter` (
  `news_id` int(10) NOT NULL,
  `remote_addr` char(15) NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `timeslot` int(11) NOT NULL COMMENT 'time() / 600',
  UNIQUE KEY `news_id` (`news_id`,`remote_addr`,`timeslot`)
) ENGINE=MEMORY

Note: `timeslot` can be calculated in PHP with ceil(time() / 600), so it keeps the same value within a 10-minute interval.
That value, together with the fields `news_id` and `remote_addr` (the IP of the user), forms a UNIQUE key that prevents inserting a duplicate-visit record (of course the resulting database exception must be suppressed), as in the sketch below.
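A sketch of the insert with the duplicate suppression (using a Zend_Db adapter; names follow the table above):

// record a visit; the UNIQUE key (news_id, remote_addr, timeslot) rejects
// a second visit from the same IP within the same 10-minute slot
try {
    $db->insert('news_counter', array(
        'news_id'     => $newsId,
        'remote_addr' => $_SERVER['REMOTE_ADDR'],
        'timeslot'    => ceil(time() / 600),
    ));
} catch (Zend_Db_Exception $e) {
    // duplicate visit in the current timeslot: ignore it
}

// bonus: "users currently reading this article" comes almost for free
$reading = $db->fetchOne(
    'SELECT COUNT(DISTINCT remote_addr) FROM news_counter WHERE news_id = ?',
    $newsId
);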

Cron job to move visits to an archive table and increment the page counter, using a MySQL procedure.

I’m using the MEMORY storage engine because the idea is to run a cron job (every 10 minutes) that moves this data to a MyISAM table (news_counter_archive) with the same structure. That avoids having the table grow too fast and become a performance bottleneck.
The cron job can also update an INT counter on the articles/news table. It is better not to run this operation at each page visit, to avoid MyISAM table locking (due to mixed reads/writes) and to get better performance (we don’t need the statistics updated immediately).

Here is a MySQL procedure (tested on v5.1.53, Linux) that takes the records from the memory table, moves them to the archive table and updates the counter on the articles (`news`) table:

DELIMITER //
CREATE PROCEDURE update_news_views()
BEGIN
  DECLARE var_news_id INT(11);
  DECLARE var_remote_addr CHAR(15);
  DECLARE var_created TIMESTAMP;
  -- single cutoff used by both the cursor and the final DELETE,
  -- so rows inserted while the procedure runs are left for the next run
  DECLARE var_cutoff TIMESTAMP;
  DECLARE c CURSOR FOR
    SELECT `news_id`, `remote_addr`, `created`
    FROM `news_counter`
    WHERE `created` < var_cutoff
    ORDER BY `news_id`;
  SET var_cutoff = DATE_ADD(NOW(), INTERVAL -1 MINUTE);
  OPEN c;
  BEGIN
    -- exit the inner block when the cursor has no more rows
    DECLARE EXIT HANDLER FOR SQLSTATE '02000' BEGIN END;
    LOOP
      FETCH c INTO var_news_id, var_remote_addr, var_created;
      UPDATE `news` SET `views` = `views` + 1 WHERE `news_id` = var_news_id;
      INSERT INTO `news_counter_archive` (`news_id`, `remote_addr`, `created`)
        VALUES (var_news_id, var_remote_addr, var_created);
    END LOOP;
  END;
  CLOSE c;
  DELETE FROM `news_counter` WHERE `created` < var_cutoff;
END;
//
DELIMITER ;

Cron job
using the mysql client (or calling it from PHP):

mysql -h [host] -u [username] -p[pass] [dbname] -e "CALL update_news_views()";
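or, from a small PHP script launched by cron, something like this sketch (credentials and paths are illustrative):

// cron: */10 * * * * php /path/to/scripts/update_news_views.php
$db = Zend_Db::factory('Pdo_Mysql', array(
    'host'     => 'localhost',
    'username' => 'username',
    'password' => 'pass',
    'dbname'   => 'dbname',
));
$db->query('CALL update_news_views()');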

Some useful links about MySQL procedures and cursors:
http://dev.mysql.com/doc/refman/5.1/en/create-procedure.html
http://forge.mysql.com/wiki/Cursors

 

Update  11/3/2011

– MySQL triggers are a very good solution when other applications (not changeable for technical or time reasons) work with the same tables. I’d say the big drawbacks are portability (migrating to a non-server db is a problem) and maintainability (it is much better to keep the logic in PHP).

– When statistics do not need to be exact, another, easier solution (to avoid MyISAM locks) is incrementing the counter by N only when mt_rand(1, N) == 1: only 1/Nth of the queries are executed, and the result is statistically the same for large numbers.
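A sketch of that sampled counter (N = 100 here, so roughly one write every 100 views; $db is a Zend_Db adapter and the column names are illustrative):

// write once every N views on average, adding N each time,
// so the expected value of the counter stays correct
$n = 100;
if (mt_rand(1, $n) == 1) {
    $db->query(
        'UPDATE news SET views = views + ? WHERE news_id = ?',
        array($n, $newsId)
    );
}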

 


Automatic caching of Zend_Db (or any object) calls

I’m working on a personal Zend Framework application that uses Zend_Db_Table_Abstract classes to get data from the database.
Each class has its own methods (getall, getXX, getYYY, etc.).

My need was to have all those method calls (throughout the code) automatically cached.
The first solution that came to mind was a general wrapper (container) class that uses magic methods to call and cache the methods of the inner class (a subclass of Zend_Db_Table_Abstract in this case).
The cache ID can be calculated from the class name + method name + arguments (serialize + md5).

public function __call($name, $arguments)
{
   $cacheId = get_class($this->wrappedObject) . $name
                . md5(serialize($arguments));
   // ...
   $callBack = array($this->wrappedObject, $name);
   //... cache logic...
   $ret = call_user_func_array($callBack, $arguments);
   // ...
}

I wanted the wrapper to work in an “invisible mode”, in order to keep the IDE autocompletion for the model methods. So I’ve made a static method that instantiates the wrapper on the object passed by reference.

public static function createOn(&$wrappedObject, $lifetime = null)
{
    $wrappedObject = new self($wrappedObject);
    //...
}

Here is an example of use:

$modelCategories = new Application_Model_Categories();
CacheWrapper::createOn($modelCategories /*, 7200*/); //2nd param = lifetime
/* $modelCategories is now an instance of CacheWrapper
 * but the IDE still thinks it is the model */
$data = $modelCategories->getAll();      // cached!
$data2 = $modelCategories->getAll(1, 2); // cached!

The cache object and the expiry time (the value is “randomised” by 10%, so that many entries do not expire at the same moment) can be set using a static method, or taken from the registry if present.
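Putting the pieces together, a minimal version of the wrapper could look like the following sketch (simplified, not the exact class published later on GitHub; it assumes a Zend_Cache_Core instance is available in the registry under the key 'cache'):

class CacheWrapper
{
    protected $wrappedObject;
    protected $cache;
    protected $lifetime = false; // false = use the cache default

    public function __construct($wrappedObject, $lifetime = null)
    {
        $this->wrappedObject = $wrappedObject;
        $this->cache = Zend_Registry::get('cache'); // Zend_Cache_Core
        if ($lifetime !== null) {
            // "randomise" the lifetime by ~10% so entries don't all expire together
            $this->lifetime = (int) ($lifetime * mt_rand(90, 110) / 100);
        }
    }

    public static function createOn(&$wrappedObject, $lifetime = null)
    {
        $wrappedObject = new self($wrappedObject, $lifetime);
    }

    public function __call($name, $arguments)
    {
        $cacheId = get_class($this->wrappedObject) . '_' . $name . '_'
                 . md5(serialize($arguments));

        $data = $this->cache->load($cacheId);
        if ($data === false) {
            $data = call_user_func_array(
                array($this->wrappedObject, $name),
                $arguments
            );
            $this->cache->save($data, $cacheId, array(), $this->lifetime);
        }
        return $data;
    }
}

One limitation of this simplified version: a wrapped method that legitimately returns false will never be considered cached.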

EDIT 3/9/2013
source code moved to github: https://github.com/elvisciotti/CacheWrapper

Profiling MySQL

To analyze the db server usage in a complex PHP application, the first step is to profile the db server.
There are lots of profiling tools, but I think it’s very easy to write custom code that saves exactly the data needed.
The idea is to save information about a sample of the queries in the production environment (about 1% of the queries is usually enough, depending on the traffic).

MySQL profiling
Assuming there is a class used to manage queries (or at least the mysqli class), it doesn’t take long to override the method that runs the queries with something similar to the following code (written in a simple way to show the idea):


class DB extends mysqli {
    // ...
    function query($q)
    {
        $start = microtime(true);
        $result = parent::query($q); // parent:: avoids recursing into this method
        $wtime = microtime(true) - $start;

        // save about 1% of the queries + profiling info
        if (rand(0, 99) < 1) {
            parent::query("INSERT DELAYED INTO `ProfileData` (`q`, `wtime`, `created`)
                VALUES ('" . $this->real_escape_string($q) . "', '" . $wtime . "', NOW())");
            // plus any extra columns (IP, backtrace, ...)
        }

        return $result;
    }
    // ...
}

What other info to save? Some ideas:

  • client IP
  • other $_SERVER info: user-agent, request_method, etc…
  • PHP backtrace, to understand which line launched the query (see the sketch after this list)
  • web server load
  • mysql server load
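For instance, the backtrace can be reduced to a list of file:line pairs before saving it, to avoid storing huge serialised traces (a sketch; the helper name is made up):

function compactBacktrace()
{
    $out = array();
    foreach (debug_backtrace() as $frame) {
        if (isset($frame['file'], $frame['line'])) {
            $out[] = $frame['file'] . ':' . $frame['line'];
        }
    }
    return implode(' <- ', $out);
}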

Results can be analyzed by running queries on the `ProfileData` table.
Example: group the rows by query and show the average execution time, to find which queries are the slowest ones.


-- select the slowest queries (average time) in the last 24 h
-- exclude queries executed only once or twice (e.g. missed query cache)
SELECT `q`, AVG(`wtime`) AS "average time", COUNT(`id`) AS "occurrences"
FROM `ProfileData`
WHERE `created` > DATE_ADD(NOW(), INTERVAL -1 DAY)
GROUP BY `q`
HAVING COUNT(`id`) > 2
ORDER BY AVG(`wtime`) DESC

MySQL dump importing

Today I realised that “mysqlimport” does not work as expected in a WAMP environment (mysqlimport loads data files such as CSV, not SQL dumps).
A working way to import an SQL dump file is to use the “mysql” executable:

# localhost
mysql -u root -p --force [DBNAME] < [FILE.SQL]

Mass row copying (duplicating) with field customization – MySQL

Let’s suppose we have a table with the following structure and data:

# `table`
id | a | b | c |
—————————
1 | “aaa” | “xxx” | “ccc”
2 | “aab” | “yyy” | “ccc”
3 | “aba” | “yyy” | “ccc”
4 | “abc” | “xxx” | “ccc”
5 | “dcz” | “xxx” | “eee”

Now, we want to copy some records (only some columns) to the same table AND set some of the copied columns to a fixed value.
Requirements:
– duplicate all the rows with `b` equal to “xxx”
– for the new rows inserted, the value of `c` has to be changed to “ddd” (fixed value!)
– `id` is the primary key, auto increment

Query (pay attention to the quote characters: string literals must use plain quotes, not the “smart” quotes a blog editor may insert):

INSERT INTO `table`
(SELECT
  NULL AS `id`,
  `a`,
  `b`,
  'ddd' AS `c`
FROM `table`
WHERE `b` = 'xxx')

Result:

id | a | b | c |
—————————
1 | “aaa” | “xxx” | “ccc” #row1
2 | “aab” | “yyy” | “ccc”
3 | “aba” | “yyy” | “ccc”
4 | “abc” | “xxx” | “ccc” #row4
5 | “dcz” | “xxx” | “eee” #row5
6 | “aaa” | “xxx” | “ddd” #copied from #row1
7 | “abc” | “xxx” | “ddd” #copied from #row4
8 | “dcz” | “xxx” | “ddd” #copied from #row5

Note 1:
The insert using a subquery is not an ‘atomic’ operation in recent versions of MySQL. If there are table constraints and a new record is not accepted (e.g. a duplicate value for a unique key), only that record is not inserted (not all of them!).
Example: if a unique key were defined on the columns (a, b, c) and one of the new rows duplicated an existing record, only that row would fail; the other new rows would still be inserted.
Some old versions of MySQL stop execution in case of duplicate entries; in that case, add IGNORE to the query to skip the duplicates: INSERT IGNORE INTO
Note 2:
If you later add a new column to the table, the query will fail, because the SELECT no longer returns one value per table column; listing the target columns explicitly in the INSERT avoids this.
Tested on MySQL 5.1.36.

MySQL: version differences notes (draft)

Relevant features added in MySQL 4.0

  • FULLTEXT search
  • INNODB tables (and ON [DELETE|UPDATE] foreign key actions)
  • UNION statement
  • TRUNCATE TABLE statement
  • multiple table delete and update
Relevant features added in MySQL 4.1

  • Subquery (nested query) supported
  • “CREATE table LIKE table2” syntax supported
  • added storages:
    – EXAMPLE (for developers)
    – NDBCLUSTER (table partitioned over many computers)
    – CSV (comma separated values storage)
    – BLACKHOLE (it doesn’t store data)
  • added command “HELP”
  • API improved (multiple statement in a single call)
  • “INSERT … ON DUPLICATE KEY UPDATE” syntax added
  • GROUP_CONCAT() added
Relevant features added in MySQL 5.0

  • BIT data type
  • Math precision improved
  • SQLSTATE error message
  • TRIGGERS (limited support)
  • VARCHAR max length increased to 2^16 bytes (64 KB)
  • “greedy” optimizer, “merge index” to improve join performance
  • Improvement of “NOT IN”
  • VARCHAR more efficient
  • INNODB less disk space usage (-20%)
  • Added storages ARCHIVE and FEDERATED
Relevant features added in MySQL 5.1

  • Partitioning (tables in different locations)
  • plugin API to load/unload components at runtime
  • event scheduler (like crontab)
  • customizable MySQL logs (to file or tables)
  • XML and XPath functions
Relevant features added in MySQL 6.0

  • UTF8 (4 bytes), UTF16 and UTF32
  • BACKUP DATABASE and RESTORE statements (for backups)
  • performance optimization for subqueries and joins
  • transactional locks
  • LOAD XML statement

INNODB storage engine: notes

Advantages

  • transactions
  • row-level locking
  • foreign key constraints, with configurable behaviour:
    ON [DELETE|UPDATE] [RESTRICT|CASCADE|SET NULL|NO ACTION|SET DEFAULT]
Disadvantages

  • INNODB does NOT support FULLTEXT index (MyISAM does)
  • no initial value can be set for AUTO_INCREMENT keys