This document gives code that can be used to run some simple tests
by copying and pasting the code segments into a PHP file such as
<?php
# testfile.php
// sample code
and then running it from the command line.
It also provides a complete PHP project, PhpWordSearch,
which can be installed and executed from the default website.
Download and install the
PhpWordSearch.zip
archive to do so.
You will need the Dojo toolkit installed as described in
the Php AJAX + Dojo document.
Command-line PHP execution
Ideally you want to be able to create and run
a .php file anywhere within the system via:
php mySampleFile.php
On a Linux system, you simply
have to ensure that the command-line version of the PHP package
(on Ubuntu, php5-cli, or php-cli on newer releases) is installed.
On Windows, using our installation setup,
the command-line executable is
C:\php\php.exe
It is likely that C:\php\
is not a component of the PATH. Your options are:
Add the folder C:\php to the PATH
Run the test script(s) from the C:\php folder
Use the full path name C:\php\php to execute the test script(s)
The first alternative is the most desirable because you may find
PHP to be a useful programming tool well beyond these tests.
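To confirm that the command-line setup works, a minimal test file along the following lines can be used. The file name follows the mySampleFile.php example; the wording of the output is only illustrative.

```php
<?php
# mySampleFile.php
// PHP_SAPI names the interface PHP is running under; it is "cli" when the
// script is started by the command-line executable.
$sapi = PHP_SAPI;
echo "PHP version:   " . PHP_VERSION . "\n";
echo "Running under: " . $sapi . "\n";
```

Running php mySampleFile.php from any folder should print both lines once the PATH is set up.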
Regular expression functions
The PHP regular expression functions of main interest are
the so-called Perl-compatible versions,
which use the "preg_" prefix:
preg_match, preg_match_all: validate, extract
preg_replace, preg_replace_callback: substitute
preg_quote: escape a string to be used in a regular expression as a literal string
preg_split: split a string into an array
preg_grep: extract matching elements from an array
The general form of these operations is this:
preg_OPERATION( "/REGULAR_EXPR/qualifiers", ... )
The regular expression used for the operation is always bounded by
delimiters, conventionally "/" (PHP also accepts other non-alphanumeric
delimiters such as "#" or "~"). The main qualifier is "i", which
indicates case-insensitivity in matching.
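As a quick orientation, the sketch below calls several of the listed functions once each. The sample strings and variable names are made up for illustration.

```php
<?php
// preg_quote: escape metacharacters so the text is matched literally.
// The second argument names the delimiter, which is escaped as well.
$quoted = preg_quote("3.50+tax", "/");

// preg_split: split on commas with optional surrounding whitespace.
$parts = preg_split('/\s*,\s*/', "red , green,blue");

// preg_grep: keep only the array elements that match (keys are preserved).
$numeric = preg_grep('/^\d+$/', array("12", "x1", "7"));

// preg_replace: substitute every match; the "i" qualifier ignores case.
$fixed = preg_replace('/php/i', 'PHP', "Php and php");

echo $quoted, "\n";
print_r($parts);
print_r($numeric);
echo $fixed, "\n";
```

Note that preg_grep keeps the original array keys, so the result here has keys 0 and 2.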
Validation via a regular expression
Validation usually
means that a string must completely match a regular expression.
Unlike Java's String.matches operation, preg_match succeeds as soon as
any substring matches, so the way to force "completeness" is to pin down the match
with the beginning and ending
anchors ^ and $, respectively.
Here are some simple examples where the pattern string,
$patternStr,
represents a signed number: an integer with no leading zeros,
optionally followed by a decimal point and two decimal digits.
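The original listing is not available here, so the following is a sketch of what such a validation might look like. The exact pattern and the test values are assumptions chosen to fit the description.

```php
<?php
// Assumed pattern: optional sign, then 0 or an integer without leading
// zeros, then optionally a decimal point and exactly two digits.
$patternStr = '/^[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?$/';

$tests = array("+22", "-4.51", "8", "0", "007", "4.5", "1.234");
foreach ($tests as $t) {
    // preg_match returns 1 on a match and 0 otherwise.
    $ok = preg_match($patternStr, $t);
    echo str_pad($t, 8), ($ok ? "valid" : "invalid"), "\n";
}
```

The first four values are accepted and the last three are rejected. One caveat: $ also matches just before a final newline, so the D qualifier or the \z anchor can be used if a trailing newline must be rejected.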
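As in Java's regular expression handling, when the
pattern string is defined in its own variable,
each literal "\" in the pattern should be escaped,
creating occurrences of "\\". PHP tolerates a single backslash before a
character that does not form a string escape (such as "\d" or "\."), but
doubling it avoids surprises with sequences such as "\1" or "\$" in
double-quoted strings.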
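The pattern string can be bounded by double quotes to allow interpolation.
In this case a literal "$" should also be escaped wherever PHP could read it
as the start of a variable (that is, when a letter, underscore, or "{" follows);
a "$" directly before the closing "/" may be left as it is. Thus, these are both OK: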
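For instance, under the assumption that the pattern is the signed-number expression described earlier, the two double-quoted definitions below produce exactly the same string. The variable names are hypothetical.

```php
<?php
// Final "$" escaped explicitly.
$escaped = "/^[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?\$/";
// Final "$" left bare: it is followed by "/", so nothing is interpolated.
$plain   = "/^[+-]?(0|[1-9]\\d*)(\\.\\d\\d)?$/";

var_dump($escaped === $plain);
echo $plain, "\n";
```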
Alternatively, the pattern string can be placed directly into the
preg_match
function. Written as a single-quoted literal, the backslashes can be left
undoubled, because PHP keeps any backslash that is not followed by "\" or "'".
We would write it this way:
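A minimal sketch of the inline form, again assuming the signed-number pattern and a made-up test value:

```php
<?php
// Single-quoted literal passed directly; the backslashes are not doubled.
$ok = preg_match('/^[+-]?(0|[1-9]\d*)(\.\d\d)?$/', "-4.51");
echo $ok, "\n";
```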
To achieve case-insensitivity in a match (or other operations), add the
"i" qualifier after the terminal "/" as in these examples:
preg_match( "/^a+$/", "aaa") returns 1 (match)
preg_match( "/^a+$/", "aAa") returns 0 (no match)
preg_match( "/^a+$/i", "aAa") returns 1 (match)
Matched substring extraction
When we're interested in extracting substrings from matched portions of
a larger string we use preg_match and preg_match_all
with a third
parameter used to capture the matched information. In these situations,
we typically do not want to use the beginning and ending anchors.
Consider the following program:
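The original program is not reproduced here. The sketch below is a stand-in: the pattern and the test string are assumptions, chosen only so that the matches agree with the values quoted in the discussion.

```php
<?php
// Unanchored signed-number pattern. The groups are non-capturing (?: ... )
// so that only $matches[0] is filled in.
$pattern = '/[+-]?(?:0|[1-9]\d*)(?:\.\d\d)?/';
$test    = "a +22 b -4.51 c 8 d 0";

// First case: preg_match stops at the first match.
if (preg_match($pattern, $test, $matches)) {
    print_r($matches);
}
$first = $matches;

// Second case: preg_match_all continues to the end of the test string.
if (preg_match_all($pattern, $test, $matches)) {
    print_r($matches);
}
$all = $matches;
```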
A true return value indicates that some substring matches
the pattern. In this first case
$matches[0] captures the first match, i.e.,
$matches[0] = "+22"
In the second case using preg_match_all, the matching operation
goes to the end of the test string, obtaining:
$matches[0] = array( "+22", "-4.51", "8", "0" )
It may seem odd that $matches is an array when
$matches[0] holds everything.
What about $matches[1], $matches[2], and so on?
The answer has to do with subpattern matches.
Subpattern matches
In many circumstances we're interested in subpatterns of a matched
pattern. For example, consider the pattern and test string as
inputs to the matching operations:
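As a sketch of such inputs (the test string is hypothetical, picked so that the first match is c55):

```php
<?php
// Two parenthesized subpatterns: lower case letters, then digits.
$pattern = '/([a-z]+)(\d+)/';
$test    = "ABc55 de678 f9";

// preg_match records the first match and its two subpattern matches.
preg_match($pattern, $test, $matches);
print_r($matches);
```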
The pattern represents a lower case letter sequence
followed by a digit sequence.
The two parenthesized subpatterns
separate the letter sequence from
the digit sequence. As before,
preg_match pertains only to the
first match, storing into $matches:
Array( [0] => c55, [1] => c, [2] => 55 )
The "0" entry is the entire match, and entry n is the match of the
n-th parenthesized subpattern.
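In the second example, using preg_match_all
makes $matches[0][k] the full k-th match
and $matches[n][k] the
n-th parenthesized portion of the k-th match (counting k from 0).
Thus $matches is a two-dimensional array, indexed first by subpattern number and then by match number.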
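To make the layout concrete, here is a sketch using a hypothetical test string with three matches:

```php
<?php
$pattern = '/([a-z]+)(\d+)/';
$test    = "ABc55 de678 f9";   // hypothetical input

preg_match_all($pattern, $test, $matches);

// $matches[0] lists the full matches, $matches[1] the letter parts,
// and $matches[2] the digit parts, each in order of occurrence.
print_r($matches);
```

Passing the PREG_SET_ORDER flag as a fourth argument groups the results by match instead, so that each entry holds one full match followed by its subpatterns.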
For substitutions that need some computation, preg_replace_callback takes a callback, here called my_callback_function, which is defined as follows:
function my_callback_function($m) {
    // using $m[0] = the entire matched portion
    // and/or, $m[n] = the matched portion for the n-th parenthesis group
    return /* the replacement code */;
}
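A concrete callback might look like the sketch below. The function name swap_parts, the pattern, and the input string are all made up for illustration.

```php
<?php
// Hypothetical callback: swap the letter part and the digit part of a match.
function swap_parts($m) {
    // $m[0] is the whole match; $m[1] and $m[2] are the two groups.
    return $m[2] . $m[1];
}

// The callback is invoked once per match, and its return value
// replaces that match in the result string.
$result = preg_replace_callback('/([a-z]+)(\d+)/', 'swap_parts', "c55 de678");
echo $result, "\n";
```

Here the output is 55c 678de.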
The following web application imitates some of the features we
presented in the Java-based
WordSearch
application.
In this case we can read and display from
a restricted set of files (the examples subdirectory) on the server
side. The text in these files is presented in the web application
along with a keyword search mechanism for highlighting desired keywords.
The idea behind (complete) keyword search is to create a regular
expression which uses the word boundary anchors (\b)
around the embedded keyword in a case-insensitive search. We
want to replace matched instances with some sort of HTML-based
highlighting, perhaps with color and bolding.
Given
a suitable $keyword we might use a replacement like this:
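The sketch below shows one way such a replacement could be written. The helper name highlight_keyword, the markup, and the default color are assumptions, not the code of the PhpWordSearch project.

```php
<?php
// Hypothetical helper: wrap whole-word, case-insensitive occurrences of
// $keyword in bold, colored markup.
function highlight_keyword($text, $keyword, $color = "red") {
    // preg_quote makes the keyword safe to embed; "/" is the delimiter in use.
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
    // $0 in the replacement stands for the entire matched text,
    // so the original capitalization is preserved.
    $replacement = '<b style="color:' . $color . '">$0</b>';
    return preg_replace($pattern, $replacement, $text);
}

echo highlight_keyword("The cat and the Cat concatenate.", "cat"), "\n";
```

Both "cat" and "Cat" are highlighted, while the "cat" inside "concatenate" is left alone because no word boundary surrounds it. The word boundary anchors behave as intended only when the keyword begins and ends with a word character.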
In the sample program below, there are
a number of other features of interest:
Using an iframe to hold the display content. We
access the iframe document via the JavaScript:
window.frames[0].document
Using the Dojo "color-picker" (different from the "color-chooser") widget.
This example draws
from the rich web enhancements available
from the so-called dojox (dojo extension) repertoire.
Once the required modules are loaded, the color-picker is instantiated
by the single HTML element:
The point about the color-picker is that it's
a complex, prebuilt widget which only needs to be plugged in
to be of use. At issue are the additional client-side code
requirements and the implied bandwidth requirements.