A discussion of the basics of XML Parsing in PHP

The code in this discussion comes from the file phpDiscussionOfXMLCode.php.
The XML file read by that code is phpXMLSourceData.xml. [Note that this XML file has the first two lines indented to show the XML structure, but the rest are single-line entries of the same tagged information.]


The following is an analysis of the code in phpDiscussionOfXMLCode.php:


PHP XML parsing involves more steps than the parsing DLLs available for use with ASP. This makes it a little harder to work with, because you are more involved in the process, but it also gives you more control over the parsing. The following line starts the entire process by calling the function readTrialBalanceInfo, which returns a usable array [$TrialBalance] of the information read from the XML file:

<?php
$TrialBalance = readTrialBalanceInfo();

?>

The process begins with the creation of the XML parser:

<?php
function readTrialBalanceInfo()
{
    $xml_parser = xml_parser_create();

    ...
}
?>
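
One detail of xml_parser_create() matters for the handlers later in this discussion: a new parser starts with case folding switched on, so element names reach the handlers in uppercase. The following is a minimal sketch of that behavior, not part of the code under discussion; the lowercase sample XML string and the variable names $seen and $folding are made up for illustration.

```php
<?php
// Sketch: case folding is on by default, so tag names arrive in uppercase.
$seen = [];
$parser = xml_parser_create();

// Record every element name that the parser reports.
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$seen) { $seen[] = $name; },
    function ($p, $name) {}
);

// Read the option before parsing to confirm the default setting.
$folding = xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING);
xml_parse($parser, "<acctitem><acctno>1000</acctno></acctitem>", true);

echo "Case folding enabled: " . ($folding ? "yes" : "no") . "\n";
echo implode(", ", $seen) . "\n";
```

This is why comparisons against names such as "ACCTNO" work even if the XML file were to spell its tags in lowercase.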

This is followed by the call that tells the parser the names of the functions that will handle the start and end tags in the XML file. These functions capture data and perform any other actions that you define for when a start tag or an end tag is encountered. In the following code, function startElement and function endElement handle the starting and ending tags respectively. You fill these functions with whatever actions you want them to perform:

<?php
function readTrialBalanceInfo()
{
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");

    ...
}
?>
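
To see when the two element handlers fire, it can help to log the calls for a tiny document. The sketch below is only an illustration: the class EventTrace, its method names, the id attribute, and the in-memory XML string are all hypothetical, and the handlers are passed as object methods rather than as the function names used in this discussion.

```php
<?php
// Sketch: record the order in which the start and end handlers are called.
class EventTrace
{
    public $events = [];

    // Called by the parser for every opening tag.
    public function start($parser, $name, $attrs)
    {
        $this->events[] = "start " . $name . ", attributes: " . count($attrs);
    }

    // Called by the parser for every closing tag.
    public function end($parser, $name)
    {
        $this->events[] = "end " . $name;
    }
}

$trace = new EventTrace();
$parser = xml_parser_create();
xml_set_element_handler($parser, [$trace, "start"], [$trace, "end"]);
xml_parse($parser, '<ACCTITEM id="1"><ACCTNO>1000</ACCTNO></ACCTITEM>', true);

echo implode("\n", $trace->events) . "\n";
```

The log shows the handlers being called in document order, with the inner element closing before the outer one.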

Then we need to tell the parser which function will handle the data inside the tags; again, this is a function whose behavior we control. In the following sample code, the name of that function is characterData:

<?php
function readTrialBalanceInfo()
{
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");

    ...
}
?>

The next lines open the XML file and feed it to the parser 4096 bytes at a time:

<?php
function readTrialBalanceInfo()
{
    // The XML source file named at the top of this discussion
    $file = "phpXMLSourceData.xml";

    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    if (!($fp = fopen($file, "r"))) {
        die("Could not open $file for reading");
    }
    // Feed the file to the parser in 4096-byte pieces; feof() marks the last one
    while (($data = fread($fp, 4096))) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error at line %d column %d",
                xml_get_current_line_number($xml_parser),
                xml_get_current_column_number($xml_parser)));
        }
    }

    ...
}
?>
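
The read loop works because xml_parse() accepts a document in pieces and only needs to be told which piece is the last. As a rough sketch of that idea, the hypothetical function parseInChunks below splits an in-memory string instead of reading a file, and it reports failures with xml_error_string(), which the code in this discussion does not use. The function name, the chunk size of 8, and the sample strings are assumptions made for illustration.

```php
<?php
// Sketch: feed a document to the parser in small pieces, as the fread() loop does.
function parseInChunks($xml, $chunkSize)
{
    $parser = xml_parser_create();
    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;

    foreach ($chunks as $i => $chunk) {
        // The third argument is true only for the final piece.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            return sprintf(
                "XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
        }
    }
    return "ok";
}

// A well-formed document, then one with a missing closing tag.
echo parseInChunks("<ACCTITEM><ACCTNO>1000</ACCTNO></ACCTITEM>", 8) . "\n";
echo parseInChunks("<ACCTITEM><ACCTNO>1000</ACCTITEM>", 8) . "\n";
```

Tags may be split across pieces at any point; the parser keeps its state between calls. The exact wording of the error message depends on the XML library that PHP was built with.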

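Finally, the file is closed, the parser is freed from memory, and the filled global array variable $TrialBalance is returned to the code that called this function. Because the endElement handler fills the global variable, the function declares $TrialBalance as global before returning it:

<?php
function readTrialBalanceInfo()
{
    // endElement fills the global $TrialBalance, so it must be declared here
    global $TrialBalance;
    $TrialBalance = array();

    // The XML source file named at the top of this discussion
    $file = "phpXMLSourceData.xml";

    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    if (!($fp = fopen($file, "r"))) {
        die("Could not open $file for reading");
    }
    // Feed the file to the parser in 4096-byte pieces; feof() marks the last one
    while (($data = fread($fp, 4096))) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error at line %d column %d",
                xml_get_current_line_number($xml_parser),
                xml_get_current_column_number($xml_parser)));
        }
    }
    fclose($fp);
    xml_parser_free($xml_parser);
    return $TrialBalance;
}
?>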

Now let us discuss how the functions that the parser is told to use work in this simple XML parsing sample. The parser automatically calls startElement every time it encounters a starting tag. As this sample is designed, the function merely saves the name of the tag in the global variable $currentTag for use by the next function discussed.

<?php
function startElement($parser, $name, $attr)
{
    global $currentTag;
    $currentTag = $name;
}
?>

The next function is also called automatically by the PHP XML parser and is used, in this sample, to collect the values of the elements that we wish to capture. We look at the tag name saved above in $currentTag to decide which global variable to update with the content of the tag. Here the account number is captured to $ACCTNO and the account description is captured to $ACCTDESC:

<?php
function characterData($parser, $data)
{
    global $ACCTNO, $ACCTDESC, $currentTag;

    if (strcmp($currentTag, "ACCTNO") == 0)
    {
        $ACCTNO .= $data;
    }
    elseif (strcmp($currentTag, "ACCTDESC") == 0)
    {
        $ACCTDESC .= $data;
    }
}
?>
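
The handler appends with .= rather than assigning because the parser may deliver the text of one element in more than one call, and it also calls the handler for the whitespace between tags. The sketch below simply records every piece the handler receives; the variable $chunks and the indented sample XML are hypothetical, and the number of pieces can vary with the XML library in use.

```php
<?php
// Sketch: record each piece of character data the parser hands over.
$chunks = [];
$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$chunks) { $chunks[] = $data; }
);

// An indented document: the line breaks and spaces are character data too.
$xml = "<ACCTITEM>\n  <ACCTNO>1000</ACCTNO>\n</ACCTITEM>";
xml_parse($parser, $xml, true);

// var_export() makes the whitespace-only pieces visible.
foreach ($chunks as $chunk) {
    echo var_export($chunk, true) . "\n";
}
```

Joining the pieces in order reproduces all of the text in the document, including the whitespace used for indentation.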

Now it's time to build up the array for use. We add the values captured in the global variables above to the array and then reset those variables for the next record. This happens when the parser automatically calls endElement on encountering an ending (closing) tag. The handler also clears $currentTag on every closing tag, so that whitespace between tags is not appended to the previous value:

<?php
function endElement($parser, $name)
{
    global $ACCTNO, $ACCTDESC, $TrialBalance, $currentTag;

    //No element is open for capture once a tag closes:
    $currentTag = "";

    if (strcmp($name, "ACCTITEM") == 0)
    {
        //Assignment to array being built:
        $TrialBalance[] = array("TheAccountNumber" => $ACCTNO, "TheAccountDescription" => $ACCTDESC);
        //Now clear variables:
        $ACCTNO = "";
        $ACCTDESC = "";
    }
}
?>
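
The three handlers can also be exercised together without the source file. The sketch below is a variant built on assumptions rather than the code under discussion: it parses an in-memory string, keeps its state in a local array captured by closures instead of in global variables, clears the current tag on every closing tag, and trims the captured values. The function name parseTrialBalance, the root element TRIALBALANCE, and the account values are made up; only ACCTITEM, ACCTNO, and ACCTDESC come from the handlers above.

```php
<?php
// Sketch: the same start/data/end logic, with closure state instead of globals.
function parseTrialBalance($xml)
{
    $state = ["tag" => "", "no" => "", "desc" => "", "rows" => []];
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start tag: remember which element we are inside.
        function ($p, $name, $attrs) use (&$state) {
            $state["tag"] = $name;
        },
        // End tag: store a finished record when ACCTITEM closes.
        function ($p, $name) use (&$state) {
            if ($name === "ACCTITEM") {
                $state["rows"][] = [
                    "TheAccountNumber" => trim($state["no"]),
                    "TheAccountDescription" => trim($state["desc"]),
                ];
                $state["no"] = "";
                $state["desc"] = "";
            }
            $state["tag"] = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        // Text: append to whichever value the current tag selects.
        function ($p, $data) use (&$state) {
            if ($state["tag"] === "ACCTNO") {
                $state["no"] .= $data;
            } elseif ($state["tag"] === "ACCTDESC") {
                $state["desc"] .= $data;
            }
        }
    );

    // Return the rows on success, or null if the XML could not be parsed.
    return xml_parse($parser, $xml, true) ? $state["rows"] : null;
}

$xml = "<TRIALBALANCE>"
     . "<ACCTITEM><ACCTNO>1000</ACCTNO><ACCTDESC>Cash</ACCTDESC></ACCTITEM>"
     . "<ACCTITEM><ACCTNO>2000</ACCTNO><ACCTDESC>Accounts Payable</ACCTDESC></ACCTITEM>"
     . "</TRIALBALANCE>";

$rows = parseTrialBalance($xml);
print_r($rows);
```

Keeping the state local means the function can be called more than once without one parse leaking values into the next, at the cost of slightly denser handler code.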

Finally, after the entire file has been read and the array $TrialBalance holds the information that we decided to capture, we can display that information from the code that started the whole process (you may recognize the line $TrialBalance = readTrialBalanceInfo(); from the first snippet above). Notice how each value is referenced:

We captured the account number under the array key "TheAccountNumber" (see the endElement snippet above).
We captured the account description under the array key "TheAccountDescription" (see the endElement snippet above).
And finally, each "record" is selected by its ordinal index, which comes first (ex: [0], supplied by the for loop variable $x).

<?php
$TrialBalance = readTrialBalanceInfo();
$total = count($TrialBalance);
echo "The number of accounts found was " . $total . ", as follows:<br>";
echo "<hr>";
for ($x = 0; $x < $total; $x++)
{
    // Each record is addressed first by its ordinal position, then by key
    echo " " . ($x + 1) . ". The Account <font color=\"#ff0000\">Number</font> is: <b>" . $TrialBalance[$x]["TheAccountNumber"] . "</b> and the Account <font color=\"#ff0000\">Description</font> is: <b>" . $TrialBalance[$x]["TheAccountDescription"] . "</b><br>";
}
echo "<hr>";
?>
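
Values read from an XML file can contain characters such as & or <, which would disturb the HTML if echoed as they are. One possible variation, sketched below under assumptions, builds the listing as a string and passes each value through htmlspecialchars(). The function name renderAccounts and the sample record are hypothetical; only the two array keys come from the code above.

```php
<?php
// Sketch: build the listing as a string and escape each value for HTML.
function renderAccounts(array $accounts)
{
    $html = "The number of accounts found was " . count($accounts) . ":<br>\n";
    foreach ($accounts as $index => $account) {
        $html .= ($index + 1) . ". Number: <b>"
            . htmlspecialchars($account["TheAccountNumber"])
            . "</b>, Description: <b>"
            . htmlspecialchars($account["TheAccountDescription"])
            . "</b><br>\n";
    }
    return $html;
}

// Hypothetical sample data in the same shape as $TrialBalance.
$sample = [
    ["TheAccountNumber" => "1000", "TheAccountDescription" => "Cash & Equivalents"],
];
echo renderAccounts($sample);
```

Returning a string instead of echoing inside the loop also makes the output easier to check or reuse elsewhere on the page.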

I hope that this simple example helps. PHP offers a great deal more XML parsing functionality, and many of the XML parsing functions allow for customization. It does not seem to offer the same recordset-style XML parsing found in the Microsoft parsing DLLs, but it is definitely a good parsing engine.

Christopher Koniges, Systems Programmer (206) 954-0423 www.jazzysystems.com  chris@jazzysystems.com