Implementing a page counter (MySQL MEMORY table + stored procedure)

Although Google Analytics exists, web applications often have to save visit statistics (IP, page visited, timestamp) in order to implement some business logic. Examples: websites with articles (blogs or newspapers) with features like “most read articles”, “articles most read by visitors of the current article”, etc.

Save visits into a MySQL MEMORY table

We want to avoid counting more than one visit from the same user in case of an accidental (or malicious) refresh. A cookie/session approach is not enough for that, so a database approach is needed. We’ll use a table where we save the user IP and a timestamp (this will also make it trivial to implement a site feature like “number of users on the site / reading the same article”).

CREATE TABLE IF NOT EXISTS `news_counter` (
  `news_id` int(10) NOT NULL,
  `remote_addr` char(15) NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `timeslot` int(11) NOT NULL COMMENT 'time() / 600',
  UNIQUE KEY `news_id` (`news_id`,`remote_addr`,`timeslot`)
) ENGINE=MEMORY

Note: `timeslot` can be calculated in PHP with ceil(time()/600), which yields the same value within each 10-minute interval.
That value, together with the fields `news_id` and `remote_addr` (the IP of the user), forms a UNIQUE key that prevents inserting duplicate-visit records (of course, the resulting duplicate-key error must be suppressed).
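For illustration, a minimal PHP sketch of the insert (the PDO connection in $pdo and the use of INSERT IGNORE to suppress the duplicate-key error are my assumptions; `created` is filled by its DEFAULT):

// sketch: record a visit; INSERT IGNORE silently skips the row when
// the UNIQUE key (news_id, remote_addr, timeslot) already exists
function recordVisit(PDO $pdo, $newsId)
{
    $stmt = $pdo->prepare(
        'INSERT IGNORE INTO `news_counter` (`news_id`, `remote_addr`, `timeslot`)
         VALUES (?, ?, ?)'
    );
    $stmt->execute(array(
        (int) $newsId,
        $_SERVER['REMOTE_ADDR'],
        (int) ceil(time() / 600), // constant within each 10-minute slot
    ));
}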

Cron job to move visits to an archive table and increment the page counter, using a MySQL procedure

I’m using the MEMORY storage engine because the idea is to run a cron job (every 10 minutes) that moves this data to a MyISAM table (`news_counter_archive`) with the same structure. That prevents the table from growing too fast and becoming a performance bottleneck.
The cron job can also update an INT counter on the table containing the articles/news. It is better not to run this operation on each page visit, both to avoid MyISAM table locking (due to mixed reads/writes) and to get better performance (we don’t need the statistics updated immediately).

Here is a MySQL procedure (tested on v5.1.53, Linux) that takes the records from the memory table, moves them to the archive table and updates the counter of the articles (`news` table):

DELIMITER //
CREATE PROCEDURE update_news_views()
BEGIN
  DECLARE var_news_id INT(11);
  DECLARE var_remote_addr CHAR(15);
  DECLARE var_created TIMESTAMP;
  -- cutoff captured once, so the final DELETE removes only archived rows
  DECLARE var_cutoff TIMESTAMP DEFAULT DATE_ADD(NOW(), INTERVAL -1 MINUTE);
  DECLARE c CURSOR FOR
    SELECT `news_id`, `remote_addr`, `created` FROM `news_counter`
    WHERE `created` < var_cutoff ORDER BY `news_id`;
  OPEN c;
  BEGIN
    -- SQLSTATE '02000' (no more rows) exits this inner block
    DECLARE EXIT HANDLER FOR SQLSTATE '02000' BEGIN END;
    LOOP
      FETCH c INTO var_news_id, var_remote_addr, var_created;
      UPDATE `news` SET `views` = `views` + 1 WHERE `news_id` = var_news_id;
      INSERT INTO `news_counter_archive` (`news_id`, `remote_addr`, `created`)
        VALUES (var_news_id, var_remote_addr, var_created);
    END LOOP;
  END;
  CLOSE c;
  DELETE FROM `news_counter` WHERE `created` < var_cutoff;
END;
//
DELIMITER ;

Cron job
Using the mysql command-line client (or call it from PHP):

mysql -h [host] -u [username] -p[pass] [dbname] -e "CALL update_news_views()";
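The matching crontab entry for a 10-minute schedule (same placeholders as above):

*/10 * * * * mysql -h [host] -u [username] -p[pass] [dbname] -e "CALL update_news_views()"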

Some useful links about MySQL procedures and cursors:
http://dev.mysql.com/doc/refman/5.1/en/create-procedure.html
http://forge.mysql.com/wiki/Cursors


Update 11/3/2011

– MySQL triggers are a very good solution when other applications (not changeable, for technical or time reasons) work with the same tables. I’d say some big drawbacks are portability (migration to a non-server database becomes a problem) and maintainability (it is much better to keep the logic in PHP).

– When statistics do not need to be exact, another, easier solution (to avoid MyISAM locks) is incrementing the counter by N only when mt_rand(1, N) == 1: that issues 1/Nth of the queries and gives statistically the same results once the numbers get big.
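A minimal PHP sketch of the trick (N = 10 and the PDO connection in $pdo are my assumptions):

// probabilistic counting: on average 1 query out of N, same expected total
$n = 10;
if (mt_rand(1, $n) === 1) {
    $stmt = $pdo->prepare('UPDATE `news` SET `views` = `views` + ? WHERE `news_id` = ?');
    $stmt->execute(array($n, (int) $newsId));
}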



Automatic caching of Zend_Db (or any object) calls

I’m working on a personal Zend Framework application that uses Zend_Db_Table_Abstract classes to get data from the database.
Each class has its own methods (getAll, getXX, getYYY, etc.).

My need was to have all those method calls (around the code) automatically cached.
The first solution that came to mind was a generic wrapper (container) class that uses magic methods to call and cache methods of the inner class (a subclass of Zend_Db_Table_Abstract in this case).
The cache ID can be calculated from the class name + method name + arguments (md5 + serialize).

public function __call($name, $arguments)
{
    $cacheId = get_class($this->wrappedObject) . $name
                 . md5(serialize($arguments));
    $callBack = array($this->wrappedObject, $name);
    // cache logic (sketch, assuming a Zend_Cache_Core instance in $this->cache)
    $ret = $this->cache->load($cacheId);
    if ($ret === false) {
        // miss: call the wrapped method and store the result
        $ret = call_user_func_array($callBack, $arguments);
        $this->cache->save($ret, $cacheId);
    }
    return $ret;
}

I wanted the wrapper to work in an “invisible mode” in order to keep the IDE autocompletion for the model methods. So I made a static method that instantiates the wrapper on the object passed by reference.

public static function createOn(&$wrappedObject, $lifetime = null)
{
    // the caller's variable now points to the wrapper, not the model
    $wrappedObject = new self($wrappedObject);
    //...
}

Here is an example of use:

$modelCategories = new Application_Model_Categories();
CacheWrapper::createOn($modelCategories /*, 7200*/); // 2nd param = lifetime
/* $modelCategories is now an instance of CacheWrapper
 * but the IDE still thinks it is the model */
$data = $modelCategories->getAll(); // cached!
$data2 = $modelCategories->getAll(1, 2); // cached!

The cache backend and the expiry time (value “randomized” by 10%, so entries don’t all expire at the same moment) can be set using a static method (or taken from the registry if present).
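A sketch of what such a setter might look like (names are hypothetical; the real code is in the repository linked below):

public static function setDefaultCache(Zend_Cache_Core $cache, $lifetime = null)
{
    self::$defaultCache = $cache;
    // randomize the lifetime by +/-10% so entries don't all expire together
    self::$defaultLifetime = ($lifetime === null)
        ? null
        : (int) round($lifetime * mt_rand(90, 110) / 100);
}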

EDIT 3/9/2013
Source code moved to GitHub: https://github.com/elvisciotti/CacheWrapper

Eclipse hangs on SVN commit

I’m currently using Eclipse PDT at work (build 20100917-0705). I usually prefer NetBeans, as it is more stable and has more or less the same features, but the application I’m working on contains a huge number of classes, and the Eclipse quick search (“Open Resource”) is extremely useful for navigating and opening files (currently better and faster than the NetBeans file search).

The most annoying Eclipse bug I’ve found: the IDE hangs when committing with SVN. You have to kill the process, restart, clean the SVN state, re-enter the SVN password and commit again (after having lost the commit comment you had typed). Enjoyable, isn’t it?

After some googling (read here) it seems that the problem is due to a limit on the console output.

Untick the option Limit Console Output under Eclipse Preferences -> Team -> SVN -> Console, and the problem should be solved.

Script to download Youtube videos and merge into an AVI file

I needed to watch a TV program on YouTube, split into 12 videos, while I was on holiday at home. Due to my poor connection there, I decided to obtain a merged version in AVI format and watch it on the TV screen via an HD player. I did not find any software/script on Linux that suited me (apparently nothing similar exists on Windows either), so my developer instinct made me develop something appropriate… and made me go to sleep late as usual :).

I eventually made a PHP script that downloads all the YouTube videos (using the existing Python script youtube-dl) from the URLs specified on the command line, then converts them to AVI (a pipeline process handling more than one file) and merges them into a final AVI file (FLVs with different framerates and sizes are supported, using mencoder). The script is tested on Linux Ubuntu 10 (mencoder is available from the repositories).
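Under the hood it runs roughly the following pipeline (an illustrative sketch using the two videos from the example below; the exact options used by the script may differ):

# download each part as FLV
youtube-dl -o part1.flv "http://www.youtube.com/watch?v=MaZRgTJlN2Y"
youtube-dl -o part2.flv "http://www.youtube.com/watch?v=na_3XoysbCk"
# re-encode each FLV to AVI with a common codec
mencoder part1.flv -o part1.avi -ovc lavc -oac mp3lame
mencoder part2.flv -o part2.avi -ovc lavc -oac mp3lame
# concatenate the homogeneous AVIs without re-encoding
mencoder -ovc copy -oac copy -o final.avi part1.avi part2.avi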

Download here (public SVN): http://svn.phpntips.com/projects/youtube-merge/

Example of use with 2 short videos:

youtube-multi "http://www.youtube.com/watch?v=MaZRgTJlN2Y&NR=1" \
  "http://www.youtube.com/watch?v=na_3XoysbCk"

See other options and instructions inside the file (max number of simultaneous processes, AVI bitrate, etc.).

Copy the file into /usr/bin to have it available everywhere:
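For example (assuming the script file is named youtube-multi, as in the example above):

sudo cp youtube-multi /usr/bin/youtube-multi
sudo chmod +x /usr/bin/youtube-multi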