97 things every programmer should know: personal notes

Last month I saw a book flying about the office, containing useful tips from developers. 97 suggestions from experienced developers. Amazon is still selling it, so I suggest you read it if work as a software developer or you occasionally dip into coding: 97 Things Every Programmer Should Know: Collective Wisdom from the Experts

I’ve finished reading it, and I took some notes while reading. Some concepts and suggestions were obvious, as I hear them many times from the community (blogs, conferences, colleagues) or I’ve already personally experienced them. I’ll paste those my notes here, with the hope that you can find something valuable from looking through them.

Continue reading

How to set apache to skip a directory before a catch-all RewriteRule

Most of PHP applications use internal PHP routing (any MVC framework like ZF, CakePHP… , including wordpress) use a catch-all rewrite rule.

In case a directory needs to be ignored from that (to avoid broken urls falling back to that catch-all route), use a RewriteCond

 

#.htaccess. Redirect all the URLs to index.php, except the ones  starting with /admin

RewriteCond %{REQUEST_URI} !^/admin
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ index.php [NC,L]

My considerations about Db models and RESTful application

I’ve worked on more than one project using complicate DB layers and HTTP client/server systems (e.g. RESTful apps) to interface with the DB server.

My opinion: when working on a project (especially a team project), easy solutions are better then clever ones. OK, OK, a design pattern is an elegant solution to a common problem, but before doing that, keep in mind two basic design principles: KISS (Keep It Simple, Stupid) and YAGNI (You Ain’t Gonna Need It).

A DB layer is the ground of an application, and it must be simple, solid and extensible.

My idea: Simple, maintainable and extensible Model classes for DB access

Make a class for each DB table, all of them extending a common abstract one (for future extensions). Each class has its own methods to insert/retrieve/delete/update the data. Of course do not make a method for each operations, group them using multi-purpose methods and class constants in arrays. Methods accepts params as array and return data using arrays (so easy to deal with). Common insert/delete/update return data in the same format (e.g. numbers of records updated, or exception, no different values for each method). All the input and return params must be documented with PHPDoc comments including all the possible values of an input array key . You can write a constant for each column in the db, so you do not have to look at the DB to know what’s the column name, just use the IDE autocompletition for everything.
e.g.

$newsModel = News::getInstance();
/* get news of the category id=2, limited by 5 and ordered by date.
Join with parent table and select with columns are needed. */
$rows = $newModel->getRecords(array(
  News::FILTER_CATEGORY = 2,
  News::ORDER_CREATED_DESC,
  News::LIMIT => 5,
  News::JOIN => News::JOIN_AUTHORS,
  News::COLUMNS => array(News::FIELD_TITLE, Authors::FIELD_NAME)
));
foreach ($rows as $row) {
   //echo $row[News::FIELD_TITLE]; //piece of news title
   //echo $row[Authors::FIELD_NAME]; //author name (joined)
];

}

Semplicity and Maintainability

One db table => only one class. All the logic inside the models. If I alter the db table, easy to change the centralized logic and costants.
From the controller, I write the code to access the database using IDE autocompletition in a few seconds.
If I want to add another condition to the query, just add to the array. To add a support to something new to the class (new filter, new join, new column), just add the constants, the logic and update the PHPDoc comments.

Caching advantage

This approach make very easy to implement a cache system with automatic tagging and cache invalidation by using wrappers and magic methods [see post about it ].

Extensibility, e.g. move to a client/server via HTTP (web service)

– You want to switch from Zend_Db to Doctrine / NoSQL or optimize the queries ? no problems, just replace the inside logic. the methods still have same input/output. Much easier if you have written unit testing. You sure don’t have to write another layer to dynamically switch between database (that requires a huge effort, in contrast with YAGNI ).

– You want to move the database to a different server and use HTTP calls to get the data (like RESTful apps) ? I have a elegant, don’t-repeat-yourself and easy solution with PHP 5.3. Make the getInstance() of the super class returning a wrapper (you can do that, or alternatively, use a external factory class) that through magic methods, makes the HTTP calls to the server passing [class name, method name, array of params] and decode the response into an array (just one class for all models !). The server will use the same db models classes (shared libs) and implement the other side of the logic with a correspondent one class ($model = new $modelName(); return $model->$methodName($params);). So, basically you code as you are using a local db, but the request is actually going through HTTP to another server. You can switch back to the local db approeach by just changing the abstract DB model class to return the db instance instead of the wrapper.

Drawbacks ?

Most of the application can use this approach and do not need over-engineering of the DB layer. Please follow with your comments with any problems you see in this approach.

Implementing page counter (MySQL memory temporary table + MySQL procedure)

Although there is google analytics, web applications often have to save statistics of visits (IP, page visited, timestamp) in order to make some business logic. Example: websites with articles (blog or newspaper) with features like “most read articles”, “articles most read from the visitors of the current article” etc…

Save  visits into a MySQL memory table

We want to avoid to count more than one visit from the same user in case of accidental (or malicious) refresh. A cookie/session approach is not enough for that and a database approach is needed. We’ll use a table where we save user IP, timestamp (it will also make trivial to implement a site feature like “number of users on the site / reading the same article”).

CREATE TABLE IF NOT EXISTS `news_counter` (
  `news_id` int(10) NOT NULL,
  `remote_addr` char(15) NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `timeslot` int(11) NOT NULL COMMENT 'time() / 600',
  UNIQUE KEY `news_id` (`news_id`,`remote_addr`,`timeslot`)
) ENGINE=MEMORY

Note: `timeslot` can be calculated with PHP ceil(time()/600) to have the same value in intervals of 10 minutes.
That value, together with the fields `news_id` and `remote_addr` (IP of the user), are used to make an UNIQUE key in order to avoid to insert a duplicate-visit record (of course the database exception must be suppressed).

Cron job to move visits to an archive table and increment page counter, using a MySQL procedure.

I’m using a MEMORY storage engine as the idea is running a cron job (every 10 minutes) to move this data to a MyISAM table (news_counter_archive) with the same structure. That will avoid having the table growing too fast and become a performance bottleneck.
That cron job can also update a INT counter on the table with articles/news. Better not to run this operation at each page visit to avoid MyISAM table locking (due to mixed read/write) and having more performances (as we don’t want the statistics immediately updated).

Here it is a MySQL (tested on v5.1.53 linux) procedure that takes the records from the memory table, move to the archive table and update the counter of the articles (`news` table)

DELIMITER //
CREATE PROCEDURE update_news_views()
BEGIN
 DECLARE var_news_id INT(11);
 DECLARE var_remote_addr CHAR(15);
 DECLARE var_created TIMESTAMP;
 DECLARE c CURSOR FOR
 SELECT `news_id`, `remote_addr`, `created` FROM `news_counter` WHERE `created` < DATE_ADD(NOW(), INTERVAL -1 MINUTE) ORDER BY `news_id`;
 OPEN c;
 BEGIN
   DECLARE EXIT HANDLER FOR SQLSTATE '02000' BEGIN END;
   LOOP
   FETCH c INTO var_news_id, var_remote_addr, var_created;
	UPDATE `news` SET `views`=`views`+1 WHERE `news_id` = var_news_id;
        INSERT INTO `news_counter_archive` (`news_id`, `remote_addr`, `created`) VALUES (var_news_id, var_remote_addr, var_created);
   END LOOP;
 END;
 DELETE FROM `news_counter` WHERE 1;
END;
//
DELIMITER ;

Cron job
using mysql (or call it with PHP):

mysql -h [host] -u [username] -p[pass] [dbname] -e "CALL update_news_views()";

Some usefult links about MySQL procedures and cursors
http://dev.mysql.com/doc/refman/5.1/en/create-procedure.html
http://forge.mysql.com/wiki/Cursors

 

Update  11/3/2011

– Mysql triggers are a very good solution when there other application (not changeable for technical reasons or time reasons) that work with the same tables. I’d say some big drawbacks are portability (migration to non-server db is a problem) and maintainability (much better to make logic with PHP).

– When statistics do not need to be exact, another easier solution (to avoid MyISAM locks), is incrementing by N when mt_rand(1,N)==1. So 1/Nth of queries and same results with relevant big numbers.

 


Free web-based software for project management

project managementAfter being part of a new team that works with outsourcing team with dynamic allocation of resources (developers) without using a software to plan and schedule the project, I’m now interested in experimenting some free web-based software for project management [project magement wiki].

As expected, the awesome wikipedia contains a page about the software used for project management, as well as a comparison of the  project management software.

Among the open source, web-based, here is the list of the ones that seem more insteresting, with some notes. I’m making this list in order to having them read to try.

See in this article how to manage developers

Validator, Auto complete and Date picker Jquery scripts

In this post I’ll quickly present three Jquery plugin I often use:

Jquery Validator [link] [download my working example html+js.zip]

After defining the field validators (email, string length range, regex, equalTo another element, int…) for all the form elements in one array. It shows the error messages next to the form elements and prevent the form submitting.

Jquery Autocomplete [link] [docs] [download my working example html+js.zip]

It shows a dropdown menu under the text fields suggesting (and autocompleting) data according to the text typed inside the field. It supports preload with json or ajax requests.

Jquery Date Picker [link] [download my example]

It shows a date picker to select and browse dates. Date format customizable.

Other plugins

http://jqueryui.com/

Functional testing for web applications: Selenium IDE for firefox


In a previous post I wrote about PHP Unit testing.

In software science, the levels above Unit testing are Integration, System and System integration testing.

For a standard web application I suggest to use only Selenium [official site], a powerful complete framework , distributed with apache licence 2.0 (a free-software, GPL compatible).
Selenium includes many components. In my opinion the most useful is Selenium IDE [page], a firefox plugin to create action tests, play and check results.

It’s very easy to use the IDE [official manual], but to write good tests it needs some practice.
Basically, a test case is a sequence of commands, and a test suite is a cluster of test cases.
To create actions the tool allows to record user actions (clicks, page opening…) or manually insert/edit (mainly to insert verify-actions) and play the test case (or test suite), then check results.
It’s possible to export the test cases in many formats (html table, o export (html table format, PHPunit selenium extension, etc…) and to import (only html table).

Every command has only one or two arguments. The command reference is present in selenium IDE. Here are the commands I mainly use and I suggest:
  • open dir/page.html
    open the url specified
  • type id=phone 123456
    fill in field with id=phone with the text “123456”
  • clickAndWait id=elementID
    click on the link with ID equal to “elementID” .
  • clickAndWait link=text
    click on the link with “text” as internal text (a “text-dependent” approach is plainly not recommended)
  • verifyTextPresent done
    Verify the page contains the text “done”. If not, the test will fail at runtime. (be careful: “text-dependent”).
  • verifyElementPresent id=elementID
    Verify the page contains the element with ID equal to “elementID”. If not, the test will fail at runtime.
  • assertConfirmation
    assert a javascript confirmation appear
Instead of use the ID to locate an element (example: “id=linkA”) it’s possibile to use XPath expressions [syntax by w3c school] as locators. Example: “//div//a[2]” is the second link inside the “div” tag. I don’t suggest to use XPath: if you (or the designer) change the template (position, or text inside links, or change teh text) the test will not work !
To write solid reliable tests, remember to use a univocal ID for elements to test (message confirmation, links, form fields..), and write the test as soon as the application is working (you will easily and quickly test the correct application behaviour after adding new features).

To test CRUD forms, I suggest to write the html code (template) using a different ID for confirmation messages and error messages
[?php if ($result ): ?]
[div id=”postok”]submit ok[/div]
[?php else: ?]
[div id=”posterr”][?php echo $messageError ?][/div]
[?php endif; ?]
To test the result of this form:
verifyElementPresent postok
or
verifyTextNotPresent posterr


Selenium IDE screenshot