Saturday, September 29, 2007

Introducing PHP 5's Standard Library

Much of the buzz surrounding PHP5 has focused on its new object-oriented syntax and capabilities, and comparisons with Java. While all that was going on, the promisingly named "Standard PHP Library" (SPL) extension quietly made its way into the core PHP 5 distribution.

Although work is still in progress, the Standard PHP Library's current offering significantly increases the chances of getting PHP developers to agree on something (thereby increasing the chances of code re-use). It may also make your cunningly constructed class interface very easy for other people to use, as the SPL extension makes it possible to "overload" basic PHP syntax and make objects look like normal PHP arrays.

In this tutorial, I'll introduce the functionality available with the SPL extension and PHP5 with just enough examples to get you started. Be warned: PHP5's syntax will be used. If you need to catch up, try SitePoint's PHP5 review.

Today's iterations:

  • Introducing the SPL: what's it all about?
  • Looping the Loop: did someone say Iterator?
  • Iterations foreach of us: the "wow" factor
  • Admiring the Tree: a short tour of SPL classes and interfaces
  • Objects as Arrays: easier for your web page designer
  • The Big Deal: why you gotta like it

Don't forget to download all the code included in this article for your own use.

Introducing the SPL

The "Standard PHP Library" is a PHP extension developed by Marcus Boerger which (as the manual says) "is a collection of interfaces and classes that are meant to solve standard problems." As part of the core distribution of PHP5, it should be "always on".

If you've been around the block with PHP4, you'll know there are a few areas in which wheels are perpetually re-invented by almost every new PHP project. Standardizing some of the fundamentals is a good way to get PHP developers singing from the same sheet, and increases the chances of our being able to re-use code from Project X in Project Y.

Today the SPL extension addresses a single subset of problems: Iterators. What makes the SPL Iterator implementation interesting is not only that it defines a standard for everyone to use in PHP5, but also that it "overloads" certain parts of PHP syntax such as the foreach construct and basic array syntax, making it easier to work with objects of your classes.

Looping the Loop

So, what is an Iterator? In this context, Iterator refers to a software "design pattern", identified by the "Gang of Four" in their ground-breaking Design Patterns book.

The intent of an Iterator is to "provide an object which traverses some aggregate structure, abstracting away assumptions about the implementation of that structure."

As with all general definitions, the exact meaning of this statement may be none too clear at first glance.

By "aggregate structure" we're basically talking about anything you might "loop over" in PHP, such as the rows in a database result set, a list of files in a directory or each new line in a text file.

Using "normal" PHP, you might use the following to loop through a MySQL query:

// Fetch the "aggregate structure"
$result = mysql_query("SELECT * FROM users");

// Iterate over the structure
while ( $row = mysql_fetch_array($result) ) {
// do stuff with the row here
}

To read the contents of a directory, you might use:

// Fetch the "aggregate structure"
$dh = opendir('/home/harryf/files');

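// Iterate over the structure; compare against false explicitly,
// so that an entry named "0" does not end the loop early
while ( false !== ($file = readdir($dh)) ) {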
// do stuff with the file here
}

And to read the contents of a file, you might use:

// Fetch the "aggregate structure"
$fh = fopen("/home/hfuecks/files/results.txt", "r");

// Iterate over the structure
while (!feof($fh)) {

$line = fgets($fh);
// do stuff with the line here

}

A glance at the above examples shows that they're very similar. Although each one works with a different type of resource, and uses PHP functions specific to that resource, the mantra is simple: "fetch resource; loop over contents".

If it was somehow possible to "abstract out" the specific PHP functions from the above examples and use some kind of generic interface instead, it might be possible to make the job of looping over the data look the same, irrespective of the type of resource that was being used. With no requirement to modify the loop for a different data source, it may be possible for the code in which the loop appears (perhaps a function that generated an HTML list) to be reused elsewhere.

That's where an Iterator comes in. The Iterator defines an abstract interface for use by your code. Specific implementations of the Iterator take care of each different type of structure with which you want to work, without the code that uses the Iterator having to care about the details.
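As a rough sketch of that idea, the function below builds an HTML list from whatever it is handed, without knowing where the data comes from. The renderList() name is hypothetical and used only for illustration; ArrayIterator is one of the SPL classes summarized later in this tutorial.

```php
<?php
// Hypothetical helper: builds an HTML list from anything foreach can loop over.
function renderList($items) {
    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= '  <li>' . htmlspecialchars((string) $item) . "</li>\n";
    }
    return $html . "</ul>\n";
}

// The same function works with a plain array...
echo renderList(array('red', 'green', 'blue'));

// ...and with an Iterator object wrapping some other data source.
echo renderList(new ArrayIterator(array('apple', 'pear')));
```

The loop inside renderList() never changes; only the object passed in does.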

That's the basic theory of Iterators. If you're interested to know more, you'll find starting points at the C2 Wiki and Wikipedia. More thoughts from me can be found at phpPatterns on the Iterator Pattern and in The PHP Anthology - Volume II, Applications.

Iterations foreach of Us

So what's so exciting about the SPL Iterators? Well, if you've written more than a line or two of PHP, you've probably run into the foreach construct, which is used to make easy work of looping through an array:

// A list of colors
$colors = array (
'red',
'green',
'blue',
);

foreach ( $colors as $color ) {
echo $color."\n";
}

Wouldn't it be nice if all loops were that easy, irrespective of whatever it was that you were looping over?

How about this?

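<?php
// Note: the opening of this listing is reconstructed from the surviving
// fragment: the class extends DirectoryIterator and skips non-files.
class DirectoryReader extends DirectoryIterator {

    function __construct($path) {
        parent::__construct($path);
    }

    // Return just the file name of the current entry
    function current() {
        return parent::getFilename();
    }

    // Skip over anything that is not a regular file
    function valid() {
        if ( parent::valid() ) {
            if ( !parent::isFile() ) {
                parent::next();
                return $this->valid();
            }
            return TRUE;
        }
        return FALSE;
    }

    function rewind() {
        parent::rewind();
    }
}

// Create a directory reader for the current directory
$Reader = new DirectoryReader('./');

// Loop through the files in the directory ?!?
foreach ( $Reader as $Item ) {
    echo $Item."\n";
}
?>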

Filename: directoryreader.php

If you ignore the class itself for a moment and look at the last few lines, you'll see that I've used the DirectoryReader object right there in the foreach loop. I've pulled items from it without having to call any of its methods! So long as you obey certain rules (which I'll get to shortly), the SPL extension allows you to iterate over your own classes (where appropriate) in just the same way.

In fact, with the above example, I've jumped in at the deep end! Let's take a few steps back so I can explain what really happened here.

Iteration with SPL

Now that your appetite is whetted, you first need to be warned that the PHP manual currently lacks the capabilities needed to fully document the SPL extension. It's geared primarily to documenting native functions, and lacks a clear means to fully describe something like an in-built class; interfaces fail even to get a mention.

Instead, you'll need to look at the generated documentation Marcus maintains, and trawl the source under CVS. Be aware also that the SPL extension is a moving target that's being actively developed and expanded. The code in this tutorial was tested under PHP 5.0.1, but if you're reading at a significantly distant point in the future, you may find parts of this code outdated.

The SPL extension defines a hierarchy of classes and interfaces. Some of these will already be loaded in your PHP5 installation (see what get_declared_classes() turns up). They correspond to the interface and class definitions given here and here (the PHP files found here should disappear eventually, once Marcus has time to implement them in C). Some of the classes found in the examples directory (with the .inc extension) also form part of the hierarchy, but are not loaded by default; if you wish to use them, you'll need to make sure copies for inclusion are located somewhere in your PHP include path. More examples of the classes' use can be found with the tests, while independent examples can be found at http://www.wiki.cc/php/PHP5#Iterators.

Although the number of classes and interfaces in the hierarchy may be daunting at first, don't panic! Basic use of the iterators requires only a single interface. If you're new to the idea of interfaces, have a look at this discussion of interfaces on SitePoint.

I'll summarize the purpose of all the pre-loaded classes and interfaces later in this tutorial, for you to browse at your leisure. Once you start to grasp what's on offer, you'll realize that Marcus has done an amazing job of addressing the most common, loop-related problems that recur in PHP. Life will get easier...

Let's return to the DirectoryReader example. How was it that I was able to iterate over my DirectoryReader object using foreach? The magic comes from the class I extended from, DirectoryIterator, which implements an interface called Iterator that's defined by the SPL extension.

Any class I write that implements the Iterator interface can be used in a foreach loop (note that this article explains how this works from the point of view of PHP internals). The Iterator interface is defined as follows:

interface Iterator extends Traversable {

/**
* Rewind the Iterator to the first element.
* Similar to the reset() function for arrays in PHP
* @return void
*/
function rewind();

/**
* Return the current element.
* Similar to the current() function for arrays in PHP
* @return mixed current element from the collection
*/
function current();

/**
* Return the identifying key of the current element.
* Similar to the key() function for arrays in PHP
* @return mixed either an integer or a string
*/
function key();

/**
* Move forward to next element.
* Similar to the next() function for arrays in PHP
* @return void
*/
function next();

/**
* Check if there is a current element after calls to rewind() or next().
* Used to check if we've iterated to the end of the collection
* @return boolean FALSE if there's nothing more to iterate over
*/
function valid();

}

Note that the SPL extension registers the Traversable interface from which Iterator inherits with the Zend Engine to allow the use of foreach. The Traversable interface is not meant to be implemented directly in PHP, but by other built-in PHP classes (currently, the SimpleXML extension does this; the SQLite extension probably should do this but, right now, it talks directly to the Zend API).

To implement this interface, your class must provide all of the methods defined above.

To show you how this works, I'll start by re-inventing the wheel and implementing an Iterator for native PHP arrays. Obviously, this is a pointless exercise, but it helps us understand how it works without getting lost in specific details.

To begin, I define a class to manage the iteration:

/**
* An iterator for native PHP arrays, re-inventing the wheel
*
* Notice the "implements Iterator" - important!
*/
class ArrayReloaded implements Iterator {

/**
* A native PHP array to iterate over
*/
private $array = array();

/**
* A switch to keep track of the end of the array
*/
private $valid = FALSE;

/**
* Constructor
* @param array native PHP array to iterate over
*/
function __construct($array) {
$this->array = $array;
}

/**
* Return the array "pointer" to the first element
* key() returns NULL when the array is empty; testing the result of
* reset() itself would misfire on a first element whose value is FALSE
*/
function rewind(){
reset($this->array);
$this->valid = (NULL !== key($this->array));
}

/**
* Return the current array element
*/
function current(){
return current($this->array);
}

/**
* Return the key of the current array element
*/
function key(){
return key($this->array);
}

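/**
* Move forward by one
* key() returns NULL when there are no more elements; testing the result
* of next() itself would stop early at an element whose value is FALSE
*/
function next(){
next($this->array);
$this->valid = (NULL !== key($this->array));
}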

/**
* Is the current element valid?
*/
function valid(){
return $this->valid;
}
}

Filename: arrayreloaded.php

Notice the "implements Iterator" at the start. This says I'm agreeing to abide by the Iterator "contract" and will provide all the required methods. The class then provides implementations of each method, performing the necessary work using PHP's native array functions (the comments explain the detail).

There are a couple of points of the Iterator's design that are worth being aware of when you write your own. The current() and key() Iterator methods could be called multiple times within a single iteration of the loop, so you need to be careful that calling them doesn't modify the state of the Iterator. That's not a problem in this case, but when working with files, for example, the temptation may be to use fgets() inside the current() method, which would advance the file pointer.

Otherwise, remember the valid() method should indicate whether the current element is valid, not the next element. What this means is that, when looping over the Iterator, we'll actually advance one element beyond the end of the collection and only discover the fact when valid() is called. Typically, it will be the next() and rewind() methods that actually move the Iterator and take care of tracking whether the current element is valid or not.
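To make both points concrete, here is a minimal sketch of a file-based Iterator that avoids the fgets() trap. The LineReader class and its property names are hypothetical, it assumes the file exists and is readable, and the return types assume PHP 8 (drop them on older versions). The line is read once in rewind() and next() and stored, so current() and key() can be called any number of times without moving the file pointer.

```php
<?php
// Hypothetical LineReader: iterates over the lines of a text file.
class LineReader implements Iterator {
    private $fh;            // file handle
    private $line = false;  // the line read most recently, or false at EOF
    private $lineNo = 0;    // zero-based line counter, used as the key

    public function __construct(string $path) {
        // Assumes the file can be opened; no error handling in this sketch
        $this->fh = fopen($path, 'r');
    }

    public function rewind(): void {
        rewind($this->fh);              // PHP's native rewind() for file handles
        $this->lineNo = 0;
        $this->line = fgets($this->fh); // read the first line once
    }

    public function current(): mixed {
        return $this->line;             // no file access here
    }

    public function key(): mixed {
        return $this->lineNo;
    }

    public function next(): void {
        $this->line = fgets($this->fh); // only next() advances the pointer
        $this->lineNo++;
    }

    public function valid(): bool {
        // fgets() returned false once we stepped past the last line
        return $this->line !== false;
    }
}

// Usage with a throwaway file
$path = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($path, "first\nsecond\n");

$reader = new LineReader($path);
foreach ($reader as $n => $line) {
    echo $n . ': ' . $line;
}

unset($reader);   // releases the file handle
unlink($path);
```

Notice that valid() only reports what next() or rewind() already discovered, which is exactly the "one step beyond the end" behaviour described above.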

I can now use this class as follows:

// Create iterator object
$colors = new ArrayReloaded(array ('red','green','blue',));

// Iterate away!
foreach ( $colors as $color ) {
echo $color."\n";
}

It's very easy to use! Behind the scenes, the foreach construct calls the methods I defined, beginning with rewind(). Then, so long as valid() returns TRUE, it calls current() to populate the $color variable, and next() to move the Iterator forward one element.

As is typical with foreach, I can also populate another variable with the value returned from the key() method:

// Display the keys as well
foreach ( $colors as $key => $color ) {
echo "$key: $color\n";
}

Of course, nothing requires me to use foreach. I could call the methods directly from my code, like so:

// Reset the iterator - foreach does this automatically
$colors->rewind();

// Loop while valid
while ( $colors->valid() ) {

echo $colors->key().": ".$colors->current()."\n";
$colors->next();

}

This example should help you see what foreach actually does to your object.

Note that the crude benchmarks I've performed suggest that calling the methods directly is faster than using foreach, because the latter introduces another layer of indirection that must be resolved at runtime by PHP.

Admiring the Tree

Now that you've seen how to write a basic Iterator, it's worth summarizing the interfaces and classes offered internally by the SPL extension, so that you know what their jobs are. This list may change in future, but it summarizes what's on offer right now.

Interfaces

  • Traversable: as mentioned above, this is an Iterator interface for PHP internals. Unless you're writing an extension, ignore this.
  • Iterator: as you've seen, this defines the basic methods to iterate forward through a collection.
  • IteratorAggregate: if you would rather implement the Iterator separately from your "collection" object, implementing IteratorAggregate will allow you to delegate the work of iteration to a separate class, while still enabling you to use the collection inside a foreach loop.
  • RecursiveIterator: this defines methods to allow iteration over hierarchical data structures.
  • SeekableIterator: this defines a seek() method that moves the Iterator directly to a given position in the collection it is managing.
  • ArrayAccess: here's another magic interface with a special meaning for the Zend engine. Implementing this allows you to treat your object like an array with normal PHP array syntax (more on this below).
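As an illustration of IteratorAggregate, here is a minimal sketch in which a collection hands the work of iteration to a ready-made ArrayIterator. The Playlist class and its add() method are hypothetical, and the return type on getIterator() assumes PHP 7 or later.

```php
<?php
// Hypothetical collection class; the name and methods are illustrative only.
class Playlist implements IteratorAggregate {
    private $tracks = array();

    public function add($title) {
        $this->tracks[] = $title;
    }

    // foreach calls this once, then loops over the Iterator it returns
    public function getIterator(): Traversable {
        return new ArrayIterator($this->tracks);
    }
}

$playlist = new Playlist();
$playlist->add('Intro');
$playlist->add('Outro');

foreach ($playlist as $position => $title) {
    echo $position . ': ' . $title . "\n";
}
```

The collection itself needs none of the five Iterator methods; it only has to say which Iterator should do the job.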

Classes

  • ArrayIterator: this Iterator can manage both native PHP arrays and the public properties of an object (more on this shortly).
  • ArrayObject: this unifies arrays and objects, allowing you to iterate over them and use array syntax to access the contents. See "Objects as Arrays" below (we'll grow our own class with similar behaviour).
  • FilterIterator: this is an abstract class that can be extended to filter the elements that are being iterated over (perhaps removing unwanted elements for a search).
  • ParentIterator: when using a RecursiveIterator, the ParentIterator allows you to filter out elements that do not have children. If, for example, you have a CMS in which documents can be placed anywhere under a tree of categories, the ParentIterator would allow you to recurse the tree but display only the "category nodes", omitting the documents that appear under each category.
  • LimitIterator: this class allows you to specify a range of elements to iterate over, starting with a key offset and specifying a number of elements to access from that point. The concept is the same as the LIMIT clause in MySQL.
  • CachingIterator: this manages another Iterator (which you pass to its constructor). It allows you to check whether the inner Iterator has more elements, using the hasNext() method, before actually advancing with the next() method. Personally, I'm not 100% sure about the name; perhaps LookAheadIterator would be more accurate?
  • CachingRecursiveIterator: this is largely the same as the CachingIterator, but allows iteration over hierarchical data structures.
  • DirectoryIterator: to iterate over a directory in a file system, this Iterator provides a bunch of useful methods like isFile() and isDot() that save a lot of hassle.
  • RecursiveDirectoryIterator: this class allows iteration over a directory structure so that you can descend into subdirectories.
  • SimpleXMLIterator: this makes SimpleXML even simpler! Currently, the best examples can be found with the SPL tests -- see the files beginning "sxe_*"
  • RecursiveIteratorIterator: this helps you do cool stuff like "flatten" a hierarchical data structure so that you can loop through it with a single foreach statement, while still preserving knowledge of the hierarchy. This class could be very useful for rendering tree menus, for example.
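Several of these classes are designed to wrap one another. The sketch below, offered as one possible illustration rather than a recommended recipe, extends FilterIterator and then wraps the result in a LimitIterator. The PrefixFilter class is hypothetical, and the bool return type on accept() assumes PHP 7 or later.

```php
<?php
// Hypothetical filter: keeps only strings that start with a given prefix.
class PrefixFilter extends FilterIterator {
    private $prefix;

    public function __construct(Iterator $inner, $prefix) {
        parent::__construct($inner);
        $this->prefix = $prefix;
    }

    // Called for each element of the inner Iterator; return TRUE to keep it
    public function accept(): bool {
        $value = $this->getInnerIterator()->current();
        return strpos($value, $this->prefix) === 0;
    }
}

$names = new ArrayIterator(array('alice', 'bob', 'anna', 'carol', 'adam'));

// Keep the names starting with "a", then take at most two of them
$filtered = new PrefixFilter($names, 'a');
$firstTwo = new LimitIterator($filtered, 0, 2);

foreach ($firstTwo as $name) {
    echo $name . "\n";   // prints "alice", then "anna"
}
```

Each wrapper is itself an Iterator, so the foreach loop at the end looks the same however many layers sit beneath it.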

To see it in action, try using the DirectoryTreeIterator (which extends RecursiveIteratorIterator), like so:

$DirTree = new
