Two techniques are used for parsing XML documents in PHP:
SAX (Simple API for XML) and DOM (Document Object Model).
With SAX, the parser goes through your document and fires an event for
every start tag, end tag, or other element it finds. You decide how to deal
with the generated events. With DOM, the whole XML file is parsed into a tree
that you can walk through using functions from PHP. PHP 5 provides another
way of parsing XML: the SimpleXML extension. But first, we explore the two
mainstream methods.
SAX
We now leave the somewhat boring theory behind and start with an example.
Here, we’re parsing the example XHTML file we saw earlier. We do that by
using the XML functions available in PHP (http://php.net/xml). First, we
create a parser object, register our handler functions, and define the
handler that is called for every start tag:
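$level = 0; // current nesting depth, used for indentation

// Create the parser and register the callbacks for tags and text
$xml = xml_parser_create('UTF-8');
xml_set_element_handler($xml, 'start_handler', 'end_handler');
xml_set_character_data_handler($xml, 'character_handler');

// Called for every start tag; prints the tag name and its attributes
function start_handler($xml, $tag, $attributes)
{
    global $level;

    echo "\n" . str_repeat(' ', $level) . ">>>$tag";
    foreach ($attributes as $key => $value) {
        echo " $key $value";
    }
    $level++;
}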
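The listing above registers end_handler and character_handler but does not
show them, nor the call that starts the parse. The following is a minimal
sketch of how the remaining pieces might fit together. The bodies of
end_handler and character_handler, the $lines array, and the inline $input
document are assumptions made for illustration; they are not the original
listing or the original example file.

```php
<?php
// Hypothetical stand-in document; the original example file is not shown here.
$input = '<html><head><title>XML Example</title></head>'
       . '<body><p>Moved to <a href="http://example.org/">example.org</a>.</p>'
       . '</body></html>';

$level = 0;   // current nesting depth, shared by all handlers
$lines = [];  // one entry per event (assumed helper, collected for printing)

// Called for every start tag.
function start_handler($xml, $tag, $attributes)
{
    global $level, $lines;
    $line = str_repeat('  ', $level) . ">>>$tag";
    foreach ($attributes as $key => $value) {
        $line .= " $key $value";
    }
    $lines[] = $line;
    $level++;
}

// Called for every end tag (assumed body: undo the indentation).
function end_handler($xml, $tag)
{
    global $level, $lines;
    $level--;
    $lines[] = str_repeat('  ', $level) . "<<<$tag";
}

// Called for text between tags; may fire more than once per text run.
function character_handler($xml, $data)
{
    global $level, $lines;
    $data = trim($data);
    if ($data !== '') {
        $lines[] = str_repeat('  ', $level) . $data;
    }
}

$xml = xml_parser_create('UTF-8');
xml_set_element_handler($xml, 'start_handler', 'end_handler');
xml_set_character_data_handler($xml, 'character_handler');

// The third argument marks this chunk as the final piece of the document.
if (!xml_parse($xml, $input, true)) {
    die(xml_error_string(xml_get_error_code($xml)));
}

echo implode("\n", $lines), "\n";
```

Note that the parser folds tag and attribute names to uppercase by default,
so the trace shows HTML and HREF rather than html and href. Collecting the
lines in an array instead of echoing from inside the handlers is only a
convenience for inspecting the result.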
DOM
Parsing a simple X(HT)ML file with a SAX parser is a lot of work. Using the
DOM (http://www.w3.org/TR/DOM-Level-3-Core/) method is much easier, but
you pay a price: memory usage. Although it might not be noticeable in our
small example, it’s definitely noticeable when you parse a 20MB XML file with
the DOM method. Rather than firing events for every element in the XML file,
DOM builds a tree in memory that holds the whole document. The following
script walks that tree and prints the text of each node, without the tags:
<?php
// Parse the whole file into an in-memory tree
$dom = new DOMDocument();
$dom->load('test2.xml');
$root = $dom->documentElement;

process_children($root);

// Recursively print the non-empty text nodes below $node
function process_children($node)
{
    $children = $node->childNodes;

    foreach ($children as $elem) {
        if ($elem->nodeType == XML_TEXT_NODE) {
            // Skip text nodes that contain only whitespace
            if (strlen(trim($elem->nodeValue))) {
                echo trim($elem->nodeValue) . "\n";
            }
        } elseif ($elem->nodeType == XML_ELEMENT_NODE) {
            process_children($elem);
        }
    }
}
?>
The output is the following:
XML Example
Moved to
example.org
.
foo & bar
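To try the same walk without the test2.xml file, the tree can be built from
a string instead. The sketch below is a variation on the script above, not a
replacement for it: the inline $source document is a hypothetical stand-in
for the example file, and collect_text and $texts are names introduced here
for illustration. It uses DOMDocument::loadXML() in place of load() and
gathers the text nodes in an array before printing them.

```php
<?php
// Hypothetical inline document standing in for test2.xml.
$source = '<html><head><title>XML Example</title></head>'
        . '<body><p>Moved to <a href="http://example.org/">example.org</a>.'
        . '<br/>foo &amp; bar</p></body></html>';

$dom = new DOMDocument();
$dom->loadXML($source);

// Recursively append every non-blank text node below $node to $out.
function collect_text(DOMNode $node, array &$out): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $out[] = $text;
            }
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text($child, $out);
        }
    }
}

$texts = [];
collect_text($dom->documentElement, $texts);
echo implode("\n", $texts), "\n";
```

Each text node ends up on its own line, which is why the period after the
link appears by itself in output of this kind: it is a separate text node
that follows the a element.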