Download the
perl-basics.zip archive.
This contains the sample programs discussed here.
Perl on Linux systems
Perl is a standard part of nearly every Linux system.
In contrast to Bash, Perl is a complete programming system intended
to work in any environment, although it is
most robust in the UNIX environment in which it was developed.
Additionally, there is a vast library of independently-maintained
packages called the Comprehensive Perl Archive Network
(CPAN) which extends the functionality of Perl to every
facet of programming; its home page is:
Perl syntax is based on the C language
and is very different from
Bash; nevertheless, Perl emulates a number of the syntax
features used in Bash scripts such as these:
ability to omit parentheses in function calls creating a
command-like syntax
simple, intuitive list definition and concatenation
(the list syntax being much simpler than that of Bash)
regular expression usage syntax analogous
to that used in sed and awk.
Perl has many unique syntactic features including these:
variable designators:
scalar variables begin with a "$"
list (array) variables begin with a "@"
hash (associative array) variables begin with a "%"
an extensive set of quoting operators
syntax dedicated for simplified usage of regular expressions
Perl has extensive online information available
through the man pages.
On Ubuntu systems the documentation is part of the separate
perl-doc
package which may have to be installed. Just double-check:
$ sudo apt-get install perl-doc
You can start with
$ man perl
A very useful one is the man page giving information about
Perl functions:
$ man perlfunc
Running a Perl script
A perl script file can be run on any system with perl
installed (and available to the execution PATH)
by executing this in a shell:
$ perl scalars.pl
On Linux the script file itself, as with Bash, can function like
an executable file subject to these criteria:
The file must be executable. You can always make a file executable by
doing this from a shell:
chmod +x <file>
The first line of the script file (the shebang line) must be:
#!/usr/bin/perl -w
The path name /usr/bin/perl is the full
path name of the perl executable.
The -w flag is optional, but it provides useful warnings
in certain situations.
On UNIX systems
the .pl suffix is merely a convention.
This "hello world" program
illustrates some basic programmatic features.
hello.pl
#!/usr/bin/perl -w
print( "hello" );
print " world\n";
print "hello", " again", "\n";
# this last one illustrates the concatenation operator "."
print " hello " . " one " . " more " . " time " , "\n";
We observe:
the basic output function is
print;
in contrast to the Bash echo,
newline is not automatically generated
function calls can omit parentheses if there is no ambiguity
the concatenation operator is "."
(in Bash it is simply juxtaposition)
a terminating semicolon is required on all statements
comment lines are started with "#"
Dealing with Windows systems
If the script is created
on a Windows system and sent to a UNIX system in binary
mode (for example, in an archive), the extra carriage return
characters in the line may cause problems in execution.
Linux provides the executable dos2unix,
which you must install on Ubuntu by:
$ sudo apt-get install dos2unix
which will get rid of the extra carriage
returns from all lines of a text file. Call the executable like this:
$ dos2unix <script>
Compilation/Execution and packages
Perl, unlike Bash,
compiles its scripts and executes the compiled code. External code
and other settings come in the form of a package or
module,
which is simply a file with the .pm suffix
that exports functions and other constructs.
The package
can be brought in to a Perl script at either
the compilation or execution phase with these commands:
use <PACKAGE NAME> loaded at compile time
require <PACKAGE NAME> loaded at run time
The use statement is quite similar to the Java import,
although Perl modules can have a much more varied effect
than Java classes.
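The difference between the two loading phases can be sketched as follows. This is only an illustration: the core modules List::Util and POSIX are assumptions chosen because they ship with Perl, not packages discussed in these notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "use" loads List::Util at compile time and imports sum() into this script
use List::Util qw(sum);

my $total = sum(1, 2, 3);
print "sum = $total\n";

# "require" loads POSIX only when this statement is reached at run time;
# nothing is imported, so the fully qualified name is used
require POSIX;
my $floor = POSIX::floor(3.7);
print "floor = $floor\n";
```

Because require imports nothing, the function is called with its package prefix.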
For example, the most commonly loaded package is
the strict package via the command:
use strict;
Loading this package (at compile time) has an effect on the Perl
compiler in that it requires all variables to be declared
in one way or another.
Initially, the only declarator of interest is the
local declarator, my,
used like this:
my $x;
The operator :: can also be used to indicate that
the package is in a subdirectory (like the Java "." operator).
Thus,
use Data::Dumper;
means to look for the file
Dumper.pm within the Data directory
residing in one of the standard package subdirectories.
All Perl packages on a UNIX system have information available
through man pages. For example, try:
man strict
man Data::Dumper
Scalars
Scalar variables begin with a $.
They represent strings
or numeric values but can also be references
to more complex structures.
With the strict package, strings must be quoted except in a
few circumstances. Numerical values are not quoted.
String values can be created in several ways, using
uninterpolated quotes: '..'
interpolated quotes: ".."
shell-evaluated back-quotes: `..`
the q{ } operator (uninterpolated)
or q(..), q<..>, q/../, etc.
the qq{ } operator (interpolated)
or qq(..), qq<..>, qq/../, etc.
the qx{ } operator (shell-evaluated)
or qx(..), qx<..>, qx/../, etc.
"<<XXX" (heredoc) style
One difference with Bash is that Perl doesn't manage
whitespace characters in string values. In particular, for
shell evaluated strings, the
newline inserted after the command output remains there
and must explicitly be removed, typically by
the chomp operator like this:
my $z = `date`;
chomp $z;
The sample program is this:
scalars.pl
#!/usr/bin/perl -w
use strict;
my $x = 'aa';
my $y = "bb BB";
my $z = `date`;
my $w = qx/pwd/;
my $u = "${x}QQ$y";
my $v = '${x}QQ$y';
# 6 different ways to create printed output
print '$x= ', $x, "\n";
print '$y= ' . $y . "\n";
print "\$z= $z\n";
print qq(\$w = $w\n);
print q|$u = |, $u, "\n";
print <<ENDSTR
\$v = $v
ENDSTR
;
my $complex_str = <<END
All your problems with 'quoting' will END
There's just the "extra" terminating
END
;
chomp($complex_str); # remove trailing line-break
print "$complex_str\n";
my $a = 14; my $b = '375.75'; # $a is numeric, $b is a string
print qq(\$a = $a, \$b = $b\n);
print q($a+$b=), $a + $b, "\n"; # + defines a "numeric" context
print q($a.$b=), $a . $b, "\n"; # . defines a "string" context
print "Type a line to be read => ";
$x = <STDIN>;
print "You typed this => |$x|";
Like Bash, Perl has an operator, eval,
but its usage is more specialized. For example,
$x = "y";
eval "\$$x = 5";
has the effect of "$y = 5". However
the usual "use strict;" and "-w flag"
usages would either prevent or complain about it.
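A minimal sketch of how strict prevents this, assuming the same variable names as above: the eval fails to compile because $y was never declared, and the error text is left in the special variable $@.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "y";

# the string "\$$name = 5; 1" becomes the code '$y = 5; 1';
# $y was never declared, so strict makes the compilation fail
my $ok = eval "\$$name = 5; 1";

if ($ok) {
    print "eval succeeded\n";
} else {
    print "eval failed: $@";
}
```

The trailing "1" is a common idiom: eval returns the value of its last statement on success and undef on failure, so $ok reliably signals which case occurred.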
Control Structures
Perl has control structures similar to C/Java, with some differences.
Here are some key points:
brackets { .. } are required in Perl for sub-statement
blocks, even with one statement
Perl has the elsif contraction used for chained if-else-if
decisions
Perl has these two new "opposite sense" operators unless and until:
unless(expr) means if(!expr)
until(expr) means while(!expr)
loop iteration control uses different operators:
next instead of continue
last instead of break
Perl has
a shortened syntax version of if, unless,
while, until,
for and foreach
for single statements, like this:
$y = 1 if $x == 3;
instead of
if ($x == 3) { $y = 1; }
Perl has the usual
boolean operators: &&, ||, !
as well as other lower precedence
versions: and, or, not,
used for joining statements
Perl's false values are numeric
0, the string "0", the empty string "", and the undefined value;
all others are true.
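The points in the list above can be sketched in one short script. The data and the labels are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @report;

foreach my $n (1 .. 10) {
    next if $n % 2;            # postfix "if": skip odd numbers
    last unless $n < 8;        # postfix "unless": stop once $n reaches 8

    if ($n < 3) {
        push @report, "$n:small";
    } elsif ($n < 5) {         # "elsif", not "else if"
        push @report, "$n:medium";
    } else {
        push @report, "$n:large";
    }
}
print "@report\n";

my $count = 3;
until ($count == 0) {          # same as: while ($count != 0)
    --$count;
}
print "count = $count\n";
```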
Variables and expressions, like Bash, can be undefined.
Perl uses the boolean function defined to test this. Unlike
Bash, the empty string is not considered undefined.
If an undefined value is, say, printed,
it is treated like the empty string. However,
the usage of an undefined value often
signals a programming mistake, and so, with the "-w" flag
present, Perl will flag most usages with a warning.
Scalar comparisons
There are two types of boolean comparison operators
numerical comparisons:
== != < > <= >=
lexicographic (string) comparisons:
eq ne lt gt le ge
It's important to use eq for string comparison.
With the -w flag present,
Perl will warn you about unintended numerical comparisons of
non-numerical strings.
controls.pl
#!/usr/bin/perl -w
use strict;
my $a;
my $b = "";
my $c = 0;
my $d = " ";
print '$a defined: ', (defined $a) ? "YES\n" : "NO\n";
print '$b defined: ', (defined $b) ? "YES\n" : "NO\n";
print '$c defined: ', (defined $c) ? "YES\n" : "NO\n";
print "\n";
# an undefined variable like $a can be tested without warning
# but other expressions using the value of $a generate warnings
print '$a true: ', ($a) ? "YES\n" : "NO\n";
print '$b true: ', ($b) ? "YES\n" : "NO\n";
print '$c true: ', ($c) ? "YES\n" : "NO\n";
print '$d true: ', ($d) ? "YES\n" : "NO\n";
print "\n";
# undef is a function, but it can act like a value due to the
# fact that in the expression undef(), we may drop the parens
undef $b;
$c = undef;
print '$b defined: ', (defined $b) ? "YES\n" : "NO\n";
print '$c defined: ', (defined $c) ? "YES\n" : "NO\n";
print "\n";
my $x = '123'; my $y = '57';
if ($x <= $y) {
print "true:\t $x <= $y\n";
} else {
print "false:\t $x <= $y\n";
}
if ($x lt $y) {
print "true:\t $x lt $y\n";
} else {
print "false:\t $x lt $y\n";
}
print "\n";
$x = 'xx'; $y = 'yyy';
if ($x == $y) { # we shouldn't be doing this, "-w" will warn
print "true:\t $x == $y\n";
} else {
print "false:\t $x == $y\n";
}
if ($x eq $y) {
print "true:\t $x eq $y\n";
} else {
print "false:\t $x eq $y\n";
}
Perl Arrays (Lists)
List variables, regarded as aggregates, always
appear with the @ prefix.
List (array) literals are formed in several ways:
the (...) operator
the qw{...} operator (quoted words, uninterpolated)
or qw(...), qw<...>, qw/.../, etc.
the range operator,
n..m,
generating the list from n to m
Perl can print lists directly:
A list printed by itself concatenates all the elements.
A list variable is interpolated inside double quotes; in this case
Perl interspaces the elements with a list separator string,
by default consisting of one space. One can change this list separator
character by reassigning the Perl special variable $" to the
desired separator.
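As a small sketch of the list separator, the following changes $" inside a block only. Using local is a choice made here so that the default is restored afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @L = qw( a b c );

my $default = "@L";        # elements joined by the default separator, one space

my $custom;
{
    local $" = ', ';       # change the separator only inside this block
    $custom = "@L";
}

my $restored = "@L";       # the default separator is back in effect

print "default:  $default\n";
print "custom:   $custom\n";
print "restored: $restored\n";
```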
Additionally, a list, @L, appearing in a numeric context forces reduction to
scalar(@L) # L's length
This is very convenient for testing a list's length in an operation
such as this:
if (@L == 1) { ... } # if L has 1 element, ...
Individual elements of a list are regarded as scalars, and Perl uses
the scalar prefix $ to represent selection from a list such as this:
$L[3] # element 3 in list L
Array elements can be assigned at any position in an array; the largest
position dictates the array's length and Perl "fills in" the missing
elements with undef values.
The most common way to process list elements uses the foreach
construct which iterates through the elements (not the indices) of
a list.
arrays.pl
#!/usr/bin/perl -w
use strict;
my @a = ('First', 22, 'a third',); # left-over commas OK
my @b = qw/ Fourth Fifth /; # quoted words
my @c = 19..23; # range: integers between 19 and 23
print "\@a = @a\n"; # @ is interpolated in double quotes
print "\@b = @b, \@c = @c\n";
print '@a = ', @a, "\n"; # outside, elements have no separators
my @d = ( @a, "xx", @c ); # concatenation is simple
print "\@d = @d\n";
print '@d = ', join( ", ", @d ), "\n"; # array->string using "join"
print "\n";
my $len = @d; # scalar = array: scalar = array-length
print "length of \@d = $len
last position in \@d is $#d\n";
print "\n";
my ($x, $y, $z, $w) = (@b, @c); # assign a list of variables from array
print "\$x=$x \$y=$y \$z=$z \$w=$w\n";
print "\n";
print "\$a[2] = $a[2]\n"; # array element access via [ ]
print "\n";
my @e; $e[0] = 'aa'; $e[2] = 'bb'; # creates "undef" at $e[1]
print '@e = ', join( ',', @e ), "\n";
print "\n";
@a = (33, 44, 55, 66);
print "\@a = @a\n";
my $sum = 0;
for (my $i = 0; $i < @a; ++$i) { # traditional array sum
$sum += $a[$i];
}
print "sum \@a #1: $sum\n";
$sum = 0;
foreach my $val (@a) { # using the "foreach" loop through elements
$sum += $val;
}
print "sum \@a #2: $sum\n";
$sum = 0;
foreach (@a) {
$sum += $_; # using the "default loop scalar" $_
}
print "sum \@a #3: $sum\n";
$sum = 0;
$sum += $_ foreach (@a); # shortened foreach syntax
print "sum \@a #4: $sum\n";
$sum = 0;
map { $sum += $_ } @a; # array map operator
print "sum \@a #5: $sum\n";
Splice-based operations
Perl has many functions to manipulate arrays.
Some of the commonly used ones are:
shift / unshift: remove / add from left side
pop / push: remove / add to right side
The unshift and push
operators can be expressed in another way:
unshift(@x, 33)
is the same as @x = (33, @x)
push(@x, 66)
is the same as @x = (@x, 66)
The general array manipulation
operator is splice, from which all the others can be derived.
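One way to see this is to write each of the four operators as a splice call. The sketch below is illustrative; the array contents are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @x = (1 .. 5);

splice(@x, scalar(@x), 0, 66);    # like: push @x, 66
splice(@x, 0, 0, 33);             # like: unshift @x, 33
print "after adding:   @x\n";

my $last  = splice(@x, -1, 1);    # like: pop @x
my $first = splice(@x, 0, 1);     # like: shift @x
print "removed: $first and $last\n";
print "after removing: @x\n";
```

In scalar context splice returns the last element it removed, which is why it can stand in for pop and shift here.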
The array operator [ .. ] can be
generalized to allow the selection of subarrays when integer lists
are used as the selector. In this case, the "@" is
retained. For example, given
@A = qw/ this is an example of array slice selection /;
then
@A[0,2,4] is qw/ this an of /;
@A[2..5] is qw/ an example of array /;
The singleton selector can be used:
@A[2]
but Perl will warn you that you probably are intending to use:
$A[2]
The following sample script illustrates these array operation features.
arrayops.pl
#!/usr/bin/perl -w
use strict;
my @x = (3..9);
print "\@x = @x\n";
print "pop(\@x) = ", pop(@x), "\n";
print "shift(\@x) = ", shift(@x), "\n";
print "\@x = @x\n";
unshift @x, 33;
print "unshift(\@x,33); \@x = @x\n";
push @x, 66;
print "push(\@x,66); \@x = @x\n\n";
# Yank out 4 elements starting from position 1 and insert ('xxx', 'yyy')
print '==> @yanked_out = splice @x, 1, 4, qw( xxx yyy )', "\n";
my @yanked_out = splice @x, 1, 4, qw( xxx yyy );
print "yanked out = @yanked_out\n";
print "\@x = @x\n";
# delete position 3
print '==> splice @x, 3, 1', "\n";
splice @x, 3, 1;
print "\@x = @x\n";
# insert "hello" at position 2
print '==> splice @x, 2, 0, "hello"', "\n";
splice @x, 2, 0, "hello";
print "\@x = @x\n\n";
my @A = qw/ this is an example of array slice selection /;
print "\@A:\t\t", "@A\n";
print "\@A[0,2,4]:\t", "@A[0,2,4]\n";
print "\@A[2..5]:\t", "@A[2..5]\n";
print "\@A[0..2,5,7]:\t", "@A[0..2,5,7]\n";
print "\@A[-3..-1]:\t", "@A[-3..-1]\n";
Hashes (Associative Arrays)
A Perl
hash is an associative array, i.e., a list of
(key,value) pairs
such that no two
pairs have the same key. In Perl, the key is always a string.
In mathematical terms, this is
called a map or partial function.
As an aggregate, a hash is prefixed by the % symbol which
indicates that it is regarded as a different type of data structure
than an array.
The { } operator is used to access
the value of a certain key, similar to an array selection
using the [ ] operator.
A common way to initialize hashes is to assign
a literal list of pairs defined by (..).
By convention, when possible, the
comma separating a key from its value is replaced by the => operator. When
you use => between a key and its value, you don't need
to quote a key that is a simple identifier; it is quoted automatically.
Looping through the key/value pairs can be done with the
each operator, or by using the keys operator to
extract the array of keys. One issue about Perl hashes is
that the order of key retrieval is effectively random;
in particular, it bears no relationship to the order of key entry.
The Dumper function (from the Data::Dumper package)
is useful for dumping the contents of hashes and
other complex data structures.
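When a predictable order is needed, a common approach is to sort the keys before looping. The following is a minimal sketch with made-up data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ( John => 50, Joe => 34, Ellen => 15, Marty => 44 );

# keys in alphabetical order
my @by_name = sort keys %age;

# keys ordered by their numeric values
my @by_age = sort { $age{$a} <=> $age{$b} } keys %age;

print "by name: @by_name\n";
print "by age:  @by_age\n";

foreach my $name (@by_name) {
    print "$name => $age{$name}\n";
}
```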
hashes.pl
#!/usr/bin/perl -w
use strict;
use Data::Dumper; # allow use of the Dumper function
# these settings control the output of Dumper; see "man Data::Dumper"
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;
my %grade = ( A => 4.0, B => 3.1, C => 2.0, D => 1.0 );
print q/$grade{'A'}=/, $grade{'A'}, "\n";
$grade{B} = '3.0'; # can omit the quotes on the key here
print '$grade{B}=', $grade{B}, "\n";
print "\n";
# observe output order, compare values of B and C
print "%grade: ", Dumper(\%grade), "\n";
# A hash can be created exclusively from assignments
my %weight;
$weight{John} = 180;
$weight{Ellen} = 135;
print "%weight: ", Dumper(\%weight), "\n" ;
print "\n";
# when you use qw, careful not to put quotes, commas, etc.
my %age = qw( John 50 Joe 34 Ellen 15 Marty 44 );
# loop through key/value pairs using while/each
print "%age: ";
while (my ($key, $value) = each %age) {
print "$key => $value ";
}
print "\n";
my @age_keys = keys %age;
my @age_values = values %age;
print "%age keys: ", join( ',', @age_keys ), "\n";
print "%age values: ", join( ',', @age_values ), "\n";
print "\n";
# loop through key/value pairs using foreach on keys
print "%age: ";
foreach (keys %age) {
print "$_ => $age{$_} "
}
print "\n";
# delete works correctly for hashes
delete $age{Joe};
print "%age after deleting 'Joe': ", Dumper(\%age), "\n" ;
References (Pointers)
The elements in an array or hash can only be scalars.
Perl creates complex data structures through the use
of references, or
pointers to arrays or hashes.
References are scalars and thus can be held in arrays or hashes.
An array reference is created
by either:
the [ ] operator around a list
the \ operator in front of an
array variable (@)
and a hash reference is created by either:
the { } operator around a list (of pairs)
the \ operator in front of a hash variable (%)
In general you have to dereference
or cast a reference to be able to use
it as the original array or hash. The Perl dereference operator
is the binary "->"; one can use it only to access an
element through a reference. The cast operators
are array cast, @{ }, and hash cast, %{ },
used to cast references to the aggregates.
If we have:
$aref = [ "AA", 22 ]; and $href = { DD => 11, MM => 31 };
then:
$aref->[0] is "AA" and $href->{MM} is 31
@{$aref} is ("AA",22) and %{$href} is ( DD=>11, MM=>31 )
When an array or hash element is a pointer (to another array or hash),
then the connecting dereference operator can be omitted.
For example, if
@a = ([ qw/AA BB/ ], [ qw/XX YY/ ]), then
$a[1]->[0] is "XX" and so is $a[1][0].
The following sample program illustrates these points.
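The sketch below is not the original sample program; it is a minimal illustration of the reference operations described above, and the variable names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ("AA", 22);
my %map  = (DD => 11, MM => 31);

my $aref = \@list;                 # reference to a named array
my $href = \%map;                  # reference to a named hash
my $anon_aref = [ 33, 44 ];        # reference to an anonymous array
my $anon_href = { x => 1 };        # reference to an anonymous hash

print "first element: $aref->[0]\n";
print "MM value: $href->{MM}\n";
print "anonymous: $anon_aref->[1] $anon_href->{x}\n";

my @copy = @{$aref};               # cast back to an array (makes a copy)
my %copy = %{$href};               # cast back to a hash (makes a copy)
print "copied array: @copy\n";
print "copied keys: ", join(",", sort keys %copy), "\n";

# an array of array references: the arrow between subscripts is optional
my @table = ( [ qw/AA BB/ ], [ qw/XX YY/ ] );
print "with arrow: $table[1]->[0]\n";
print "without arrow: $table[1][0]\n";

push @{ $table[0] }, "CC";         # cast an element in order to modify the inner array
print "row 0 now has ", scalar @{ $table[0] }, " elements\n";
```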
Packages
A Perl package, or module, is most
often a group of functions and variables
which can be imported by a Perl program.
Additionally, Perl has syntactic support to make a package act like a class.
Like a class, it must declare itself to be a package:
package SomePack;
and then the file must be SomePack.pm.
Perl packages are usually well documented. The POD (Plain Old Documentation)
tools, such as pod2man and pod2html
(like Java's javadoc),
take special internal documentation and generate
external documentation in a
variety of formats such as man pages, HTML, etc. The format itself is
described in the perlpod man page. The documenting comments
look like this:
=SOME_TOKEN other stuff
... Comments
=cut
The comment-start line is one in which
the "=" must be left justified and immediately followed by an
alphanumeric token.
The comment-end line is the "=cut" line exactly as seen.
These comments can be used anywhere
but they are most commonly used in packages with the aim of
generating external documentation.
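A minimal sketch of documenting comments inside a script follows. The script name double.pl and the function double_it are hypothetical; =head1, =head2 and =cut are standard POD directives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

double.pl - hypothetical script used only to show documenting comments

=head1 DESCRIPTION

Everything from a line starting with "=" and a token up to the "=cut"
line is skipped by the Perl compiler.

=cut

sub double_it {
    my ($n) = @_;
    return 2 * $n;
}

=head2 double_it

Returns twice its argument.

=cut

print double_it(21), "\n";
```

The code between the documentation blocks runs normally, so documentation can sit next to the function it describes.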
The array @INC is the package-path
variable; it controls where Perl
looks for packages or directories containing packages. The
hash %INC then maps each loaded package file (e.g. Data/Dumper.pm) to the
actual location from which it was loaded.
You can add your own directory to @INC as follows:
use lib <SOME DIRECTORY>;
This has the effect of adding this new directory to the
front of the search path, ensuring that a package in
this directory is found before any same-named package elsewhere.
For example, to ensure you find a package in the
script's current directory, use:
use lib ".";
because "." is at best the last component
of the @INC search path and, since Perl 5.26, is not included by default at all.
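To inspect these two variables, a sketch like the following can be used. Data::Dumper is loaded here only so that %INC has a known entry to show.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use lib ".";          # prepend the current directory to @INC
use Data::Dumper;     # loading a package records it in %INC

print "search path:\n";
print "  $_\n" foreach @INC;

# the key is the package file name, with "/" in place of "::"
print "Data::Dumper was loaded from: ", $INC{'Data/Dumper.pm'}, "\n";
```

The directories printed vary from one installation to another.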
Any function defined
in a package can be accessed by a program which imports that package.
Variables defined in a package can only be accessed outside
if they are declared
using the our keyword rather than the "local-only"
my keyword.
For example, if
we have the package
package SomePack;
our $var = 4;
sub foo { ...
}
Then we can always access these entities with the
SomePack:: accessor
as follows:
#!/usr/bin/perl -w
use SomePack;
SomePack::foo();
print $SomePack::var; # note how the variable retains its "$" prefix.
Exporting entities
Perl can use certain package-defined variables and
functions without the package name
accessor prefix. To do so, the package must be
an Exporter, meaning, as in object hierarchies,
the package extends the Exporter package.
In Perl terms we express this as:
use Exporter;
our @ISA = qw( Exporter );
The array @ISA (IS-A) lists all the packages from which the package inherits.
This could conceivably be more than one, but this would mean multiple
inheritance. Once this has been established, there are three
shared variables which define what can be exported:
@EXPORT: these entities are exported when the use statement has no parameters
@EXPORT_OK: these entities are
exported when present in a parameter list of the use statement
%EXPORT_TAGS: these refer to
groups of exportable variables and functions exported
through a parameter list of the use statement
sample package and usages
The example package
illustrates all three ways of creating export lists
of variables and functions.
testpack.pm
package testpack;
use strict;
use Exporter;
our @ISA = qw( Exporter );
our @EXPORT = qw( $x @a foo ); # same as ( '$x', '@a', 'foo' )
our @EXPORT_OK = qw( foo bar $y %h );
our %EXPORT_TAGS = ( 'one' => [ qw( foo $y ) ],
'two' => [ qw( foo bar @a %h ) ],
);
our $x = 77;
our $y = 88;
our @a = qw( This Is My List );
our %h = ( Jim => 12, Bob => 7 );
sub foo { return "foo"; }
sub bar { return "bar"; }
return 1;
All variables and functions in this package
can be accessed with the testpack:: prefix.
The issue is which ones can be accessed without this prefix.
This usage automatically exports all on the @EXPORT list and
no others.
use testpack;
Only $x, @a and foo need not use the
testpack:: prefix.
This usage restricts exportables to the @EXPORT_OK list.
use testpack qw( $y %h bar );
Only $y, %h and bar need not use the
testpack:: prefix.
This usage (with ":" prefix) means we consult the
$EXPORT_TAGS{one} list.
use testpack qw( :one );
Only $y and foo need not use the
testpack:: prefix.
This usage means we consult the $EXPORT_TAGS{one}
and $EXPORT_TAGS{two} lists.
use testpack qw( :one :two );
In this case all exportables except $x need not use the
testpack:: prefix.
Functions
The declaration of
a function is done with the sub keyword.
Functions are normally defined in packages and imported
into the Perl program with a use command.
In contrast
to Bash, Perl functions can be
defined in the script after their usage (because Perl is compiled),
but in that case the call
to the function must employ parentheses.
Perl functions have no named parameter lists. Instead,
parameters are accessed through the default array, @_.
For example, this suggests common ways of capturing the arguments
inside a function:
my ($x,$y,$z) = @_; # capture first 3 "named" arguments
my $x = shift; # capture first argument, the @_ argument is implicit
Keep in mind that hashes and arrays must be passed by reference to a function
in order to retain their integrity. For example, we may have:
@a = (12, 5, 77); @b = (6, 44);
foo1(@a, @b); # same as foo1(12, 5, 77, 6, 44)
foo2(\@a, \@b); # here arrays are not merged
sub foo2 {
my ($aref, $bref) = @_; # we know references are coming in
my @a = @{$aref}; # cast the references to arrays
my @b = @{$bref}; # in order to work with them
}
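A runnable sketch of the same idea follows, with hypothetical function names count_args and lengths. Both functions are called before they are defined, which works because the calls use parentheses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @first  = (12, 5, 77);
my @second = (6, 44);

# passing the arrays directly flattens them into one argument list
my $flat_count = count_args(@first, @second);

# passing references keeps the two arrays separate
my ($len1, $len2) = lengths(\@first, \@second);

print "flattened argument count: $flat_count\n";
print "separate lengths: $len1 and $len2\n";

sub count_args {
    return scalar @_;                  # number of arguments received
}

sub lengths {
    my ($aref, $bref) = @_;            # two array references
    return (scalar @{$aref}, scalar @{$bref});
}
```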
Object-oriented package usage
Perl provides many syntactically different ways to access the functions
of a package including an object-oriented style.
What Perl calls an object is a blessed reference, most commonly a hash reference,
which is a reference with a package name associated with it.
The Perl operation bless creates this association
of a package name to a hash reference. The following example
illustrates the point.
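The original example is not reproduced here; the following is a reconstruction sketch under the assumption that MyPack defines a single function foo. The package is defined inline only to keep the script self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyPack;     # hypothetical package, normally kept in MyPack.pm

sub foo {
    my ($first) = @_;                    # the object or the package name
    my $who = ref($first) || $first;     # ref() gives the blessed package name
    return "foo called via $who";
}

package main;

my $ref = { data => 1 };
bless $ref, "MyPack";                    # associate the package name with the hash

my $r1 = $ref->foo();                    # object-oriented call through the reference

my $r2 = MyPack::foo("MyPack");          # plain call with the "::" accessor
my $r3 = MyPack->foo();                  # package name passed as the first argument
my $r4 = "MyPack"->foo();                # same, with the name given as a string

print "$_\n" foreach ($r1, $r2, $r3, $r4);
```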
Because the hash reference $ref is blessed with
the name MyPack,
it can then be used to call a function in MyPack
in an object-oriented style:
$ref->foo();
The last three statements
illustrate additional ways of calling a function in
a package. The first, with "::" is familiar. The last
two represent equivalent
object-oriented function calls in which the package name is
passed in as a string to the function.
In particular, we can emulate typical object-oriented syntax.
Perl has no built-in new function, but we can create
one and use it in the "usual" manner. Consider the following example:
MyClass.pm
package MyClass;
sub new {
my ($classname, $arg) = @_; # params: "MyClass", $arg
my $ref = { data => $arg };
bless $ref, $classname; # $classname is "MyClass"
return $ref;
}
sub getData {
my ($this) = @_; # the object is the parameter
return $this->{data};
}
return 1;
This program represents some of the "usual" object-oriented statements:
useMyClass.pl
#!/usr/bin/perl -w
use strict;
use MyClass;
my $x = new MyClass(1);
my $y = MyClass->new(2);
print '$x->getData()', "\t", $x->getData(), "\n";
print '$y->{data}', "\t", $y->{data}, "\n";
As you see, the MyClass package is acting like a class of sorts.
The notation "$y->{data}" in which we access the value
of the "data" as a key is analogous to the Java
"y.data", or C++
"y->data"
in which we would access the value of the data
as a class member.
Of course, there is no data privacy since we can skip
the use of getData and use the data key directly.
Regular expression operations
There are five principal operations
which use regular expressions:
match a string with the pattern, giving a boolean (yes/no) result
extract portion(s) of a string which matches the pattern
substitute portion(s) of a string which matches the pattern
split a string into an array by removing portions which match
the pattern
extract a subarray of array elements which match the pattern
The Perl match and substitute operations
use the dedicated operator =~ like this:
$string =~ [Pattern expression] and $string !~ [Pattern expression]
The [Pattern expression] uses a notation inspired
by the UNIX sed executable. Match and substitute
employ single-character options to control the matching;
some common options are these:
g: global -- perform all matches in string, not just the first
i: match letters in a case-insensitive way
e: in substitution, evaluate the replacement as an expression
match/substitute
The match operation is written in one of the two forms (positive/negative match):
$string =~ m/reg-expr/options and $string !~ m/reg-expr/options
where the reg-expr is a Perl regular expression.
The right-hand side
can be expressed in a number of equivalent ways:
$string =~ m@pattern@options also %, ^, &, #, !, |, ~, ;, :, +, -, =
$string =~ m(pattern)options also { }, [ ], < >
$string =~ /pattern/options "m" unnecessary when "/" is used
When the resulting expression is used
in a scalar context, the "=~"
operator delivers a
true/false result of the success of the match.
The "!~" operator delivers the opposite boolean result.
The substitute operation is written similarly:
$string =~ s/reg-expr/replacement/options
The main difference is that $string is modified by
the replacements when a match occurs.
Portions of, or the entire pattern, can be defined as a string external
to the operator usage and then inserted into the pattern. This is
essentially the only way to use regular expressions in Java. In this
case one has to remember that, if the external
pattern string is double-quoted, additional escapes are needed in front of common
escape sequences such as \s, \d, etc.; a single-quoted string keeps them as written.
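A short sketch of an externally defined pattern, comparing the two quoting styles. The sample string is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $single = '\d+';        # single quotes keep the backslash as written
my $double = "\\d+";       # double quotes need the extra escape

print "both strings hold the same pattern\n" if $single eq $double;

my $str = "order 66 shipped";

# the pattern string is interpolated into the match operator
my ($num) = $str =~ /($single)/;
print "number found: $num\n";

# a pattern fragment can be combined with anchors written in place
my $whole = ("66" =~ /^$double$/) ? "yes" : "no";
print "is '66' all digits? $whole\n";
```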
Subpattern matches
When the match operations are
used in an array context the resulting array
represents the list of all matching substrings
used in the parenthesized portions.
This array context is particularly useful when the "g" (global) option
is used. In that way it can capture all substrings of a string which
match a pattern.
The parenthesized portions of a pattern in a successful
match define back-reference
match variables, $1, $2, $3, ...,
consisting of the matched
substring portions corresponding to the parenthesized pattern.
The order of the matches is based on the order of the left parentheses.
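The numbering by left parenthesis matters most when groups are nested. A minimal sketch, using a made-up date string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $date = "2024-05-17";

# group 1 opens first and encloses groups 2 and 3; group 4 opens last
my ($year_month, $year, $month, $day) =
    $date =~ /((\d+)-(\d+))-(\d+)/;

print "group 1: $year_month\n";
print "group 2: $year\n";
print "group 3: $month\n";
print "group 4: $day\n";
```

In list context the match returns the captured groups in the same order as $1, $2, $3, $4.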
match.pl
#!/usr/bin/perl -w
use strict;
my $str = "AA474";
print '$str', "\t" x 6, "'$str'\n"; # the "x" operator means string-repeat
print '$str matches ', '\\w\\d{2}', "\t" x 4;
if ( $str =~ /\w\d{2}/ ) { print "T\n"; } else { print "F\n"; }
my $pat = '^\\d+$'; # in single quotes "\\" yields one "\"; a plain '^\d+$' works too
print '$str matches ', $pat, "\t" x 4;
if ( $str =~ /$pat/ ) { print "T\n"; } else { print "F\n"; }
print "\n";
$str = " AaA: 272xy7-88";
print '$str:', "\t" x 6, "'$str'\n";
print 'match: /(a+)\\s*:\\s*(\\d+)\\s*(\\w+)/i', "\t" x 2;
$str =~ /(a+)\s*:\s*(\d+)\s*(\w+)/i;
print "\$1=$1, \$2=$2, \$3=$3\n";
print "\n";
my @all;
@all = ( $str =~ /(a+)\s*:\s*(\d+)\s*(\w+)/i );
print 'all matches:', "\t" x 5, "@all\n";
@all = ( $str =~ /(\d+)/g );
print 'all digit string matches:', "\t" x 3, "@all\n";
substitute.pl
#!/usr/bin/perl -w
use strict;
my $str = " 234aaAA 22 3 ";
my $copy;
print '$str', "\t" x 5, "'$str'\n";
$copy = $str;
$copy =~ s/\d+/***/;
print 'after: s/\\d+/***/', "\t" x 3, "'$copy'\n";
$copy = $str;
$copy =~ s/\d+/***/g;
print 'after: s/\\d+/***/g', "\t" x 3, "'$copy'\n";
$copy = $str;
$copy =~ s/(\d+)/'*' x length($1)/eg;
print q<after: s/(\d+)/'*' x length($1)/eg>, "\t", "'$copy'\n";
substitute with back references and evaluation
The Perl back reference variables can also be inserted into pattern
substitution definitions. Furthermore, the substitution portion
can be evaluated as an expression before the substitution is made.
This gives a fantastically general way of making substitutions.
For example, the following program
locates each "free-standing" number and substitutes the number
multiplied by 10:
multiplynums.pl
#!/usr/bin/perl -w
use strict;
# this pattern means: any digit sequence delimited by word boundaries
my $pattern = "\\b(\\d+)\\b";
my $string = "22, u4: 18, j2: 34";
print "$string\n";
$string =~ s/$pattern/$1*10/eg;
print "$string\n";
minimal vs. maximal matching
There are situations, with match/extraction
and substitution where you want the repetitive constructions
* and +
to generate minimal matches
(the smallest number of characters possible)
instead of the default maximal matches (the largest number
of characters possible).
Creation of a minimal repetition is effected by appending
a ? to the * or + operator, getting the operators
*?
and +?, respectively. The comparison is:
A.*A as many characters between two A's, possibly including other A's
A.*?A as few characters between two A's thus not including other A's
Note that the latter is not the same as:
A(.*)?A
which is effectively the same as the former.
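The three patterns can be compared directly on a short made-up string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "AxxAyyA";

my ($maximal) = $text =~ /(A.*A)/;      # runs to the last A
my ($minimal) = $text =~ /(A.*?A)/;     # stops at the next A
my ($grouped) = $text =~ /(A(.*)?A)/;   # optional group, still maximal inside

print "A.*A    matched: $maximal\n";
print "A.*?A   matched: $minimal\n";
print "A(.*)?A matched: $grouped\n";
```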
A more useful example is the
one we have here, where the task is to
find the C-style comments (between /* and */)
within a program:
minmaxsubs.pl
#!/usr/bin/perl
use strict;
my $sample1 = <<END
Here is a text with some
/*
C-style comments
on multiple line
*/
not just above, but also below:
/*
this will become part
of a single maximal match
*/
to illustrate a point
END
;
my $sample2 = $sample1;
# using the maximal match
$sample1 =~ s/\/\*(.|\n)*\*\//COMMENTS/g;
print "1 ------- maximal -------\n$sample1\n";
# using the minimal match
$sample2 =~ s/\/\*(.|\n)*?\*\//COMMENTS/g;
print "2 ------- minimal --------\n$sample2\n";
The syntactic mess here is due to the fact that both
"/" and "*" are special characters in the pattern.
split
The split function applies a regular expression pattern
to a string and creates an array of scalars by
removing strings defined by the regular expression pattern.
The calls are these:
@array = split /pattern/, $string
@array = split ' ', $string
The single-blank split has a special significance. What happens is that leading
whitespace is skipped first, and then the string is split on whitespace sequences.
Another very common usage of split is to capture the lines
in the output of a command into an array. For example:
@my_file_lines = split "\n", `cat lines.txt`;
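The difference between a pattern split and the single-blank split can be sketched as follows. The sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $csv = "red,green,,blue";
my @fields = split /,/, $csv;        # the empty field in the middle is kept
print scalar(@fields), " fields: ", join("|", @fields), "\n";

my $padded = "  alpha   beta\tgamma  ";

my @words = split ' ', $padded;      # single-blank split: no leading empty field
print scalar(@words), " words: ", join("|", @words), "\n";

my @raw = split /\s+/, $padded;      # pattern split: leading empty field appears
print scalar(@raw), " raw pieces: ", join("|", @raw), "\n";
```

In both forms trailing empty fields are dropped by default.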
grep
The grep function is applied to arrays.
It returns a subarray of all elements
which match the regular expression pattern.
The call is:
@subarray = grep /pattern/, @array
One common usage is to find an element in an array. For
example
@A = (33, 5, -3, 18, 533, 22, -99, 88 );
$x = some_integer;
if ( grep /^$x$/, @A ) { print "$x is in the array\n"; }
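A runnable version of this search might look like the sketch below, with 33 chosen arbitrarily for $x. The block form of grep is an alternative for numeric data that is not covered above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @A = (33, 5, -3, 18, 533, 22, -99, 88);
my $x = 33;

# anchored pattern: matches 33 but not 533
my @hits = grep /^$x$/, @A;
print "exact matches: @hits\n";

# unanchored pattern: every element containing the digit 3
my @threes = grep /3/, @A;
print "containing 3: @threes\n";

# block form with a numeric test; in scalar context grep returns a count
my $found = grep { $_ == $x } @A;
print "$x is in the array\n" if $found;
```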