Perl Basics
(last updated: Jun 16, 2009)

Download the perl-basics.zip archive. This contains the sample programs discussed here.

Perl on Linux systems

Perl is a standard part of virtually every Linux system. In contrast to Bash, Perl is a complete programming language intended to work in any environment, although it is most robust in the UNIX environment in which it was developed. Additionally, there is a vast library of independently maintained packages, the Comprehensive Perl Archive Network (CPAN), which extends the functionality of Perl to nearly every facet of programming; its home page is:
http://www.cpan.org
Perl syntax is based on the C language and is very different from Bash; nevertheless, Perl emulates a number of the syntax features used in Bash scripts (for example, backquote command evaluation, here-documents and variable interpolation in double-quoted strings), and it also has many unique syntactic features of its own.

Perl has extensive online information available through the man pages. On Ubuntu systems the documentation is part of the separate perl-doc package, which may have to be installed:
$ sudo apt-get install perl-doc
You can start with
$ man perl
A very useful one is the man page giving information about Perl functions:
$ man perlfunc

Running a Perl script

A Perl script file can be run on any system with perl installed (and available on the execution PATH) by executing this in a shell:
$ perl scalars.pl
On Linux the script file itself, as with Bash, can function like an executable file subject to these criteria:
  1. The file must be executable. You can always make a file executable by doing this from a shell:
    chmod +x <file>
    
  2. The first line of the script file (the shebang line) must be:
    #!/usr/bin/perl -w
    
    The path name /usr/bin/perl is the full path name of the perl executable. The -w flag is optional, but it provides useful warnings in certain situations.
On UNIX systems the .pl suffix is merely a convention.

This "hello world" program illustrates some basic programmatic features.

hello.pl
#!/usr/bin/perl -w
print( "hello" );
print " world\n";
print "hello", " again", "\n";
# this last one illustrates the concatenation operator "."
print " hello " . " one " . " more " . " time " , "\n";
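We observe that each statement ends with a semicolon, that the parentheses around a function's arguments are optional, that print accepts a comma-separated list of values, and that a line break must be written explicitly as "\n".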

Dealing with Windows systems

If the script is created on a Windows system and sent to a UNIX system in binary mode (for example, in an archive), the extra carriage return characters at the ends of the lines may cause problems in execution. Linux provides the executable dos2unix, which gets rid of the extra carriage returns from all lines of a text file. On Ubuntu you may have to install it first:
$ sudo apt-get install dos2unix
Call the executable like this:
$ dos2unix <script>

Compilation/Execution and packages

Perl, unlike Bash, compiles its scripts and executes the compiled code. External code and other settings come in the form of a package or module, which is simply a file with the .pm suffix that exports functions and other constructs. The package can be brought into a Perl script at either the compilation or execution phase with these commands:
use <PACKAGE NAME>          loaded at compile time
require <PACKAGE NAME>      loaded at run time
The use statement is quite similar to the Java import, although Perl modules can have a much more varied effect than Java classes. For example, the most commonly loaded package is the strict package via the command:
use strict;
Loading this package (at compile time) has an effect on the Perl compiler in that it requires all variables to be declared in one way or another. Initially, the only declarator of interest is the local declarator, my, used like this:
my $x;
The operator :: can also be used to indicate that the package is in a subdirectory (like the Java "." operator). Thus,
use Data::Dumper;
means to look for the file Dumper.pm within the Data directory residing in one of the standard package subdirectories.

All Perl packages on a UNIX system have information available through man pages. For example, try:
man strict
man Data::Dumper

Scalars

Scalar variables begin with a $. They represent strings or numeric values but can also be references to more complex structures. With the strict package, strings must be quoted except in a few circumstances. Numerical values are not quoted. String values can be created in several ways, using single quotes (no interpolation), double quotes (with variable interpolation), the equivalent q and qq operators, backquotes or the qx operator (the output of a shell command), and here-documents.

One difference with Bash is that Perl doesn't strip whitespace characters from string values. In particular, for shell evaluated strings, the newline inserted after the command output remains there and must explicitly be removed, typically by the chomp operator like this:
my $z = `date`;
chomp $z;
The sample program is this:

scalars.pl
#!/usr/bin/perl -w
use strict;

my $x = 'aa';
my $y = "bb BB";
my $z = `date`;
my $w = qx/pwd/;
my $u = "${x}QQ$y";
my $v = '${x}QQ$y';

# 6 different ways to create printed output
print '$x= ', $x, "\n";
print '$y= ' . $y . "\n";
print "\$z= $z\n";
print qq(\$w = $w\n);
print q|$u = |, $u, "\n";
print <<ENDSTR
\$v = $v
ENDSTR
;

my $complex_str = <<END
All your problems with 'quoting' will END
There's just the "extra" terminating
END
;
chomp($complex_str);   # remove trailing line-break
print "$complex_str\n";

my $a = 14;
my $b = '375.75';   # $a is numeric, $b is a string
print qq(\$a = $a, \$b = $b\n);
print q($a+$b=), $a + $b, "\n";   # + defines a "numeric" context
print q($a.$b=), $a . $b, "\n";   # . defines a "string" context

print "Type a line to be read => ";
$x = <STDIN>;
print "You typed this => |$x|";
Like Bash, Perl has an operator, eval, but its usage is more specialized. For example,
$x = "y";
eval "\$$x = 5";
has the effect of "$y = 5". However the usual "use strict;" and "-w flag" usages would either prevent or complain about it.
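In everyday Perl, eval is more often used in its block form, which traps run-time errors instead of building code from strings. The following is a minimal sketch of that usage; safe_divide is a hypothetical helper name chosen only for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# safe_divide is a hypothetical helper: it returns the quotient,
# or undef if the division dies (e.g. division by zero)
sub safe_divide {
    my ($num, $den) = @_;
    my $result = eval { $num / $den };   # block eval traps the error
    if ($@) {                            # $@ holds the error message
        print "caught: $@";
        return undef;
    }
    return $result;
}

my $ok  = safe_divide(10, 4);
my $bad = safe_divide(1, 0);
print "10/4 = $ok\n";
print "1/0 is ", (defined $bad ? $bad : "undefined"), "\n";
```

Unlike the string form, the block form is compiled together with the rest of the script, so it works normally under "use strict;".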

Control Structures

Perl has control structures similar to C/Java, with some differences. Here are some key points:
  1. brackets { .. } are required in Perl for sub-statement blocks, even with one statement
  2. Perl has the elsif contraction used for chained if-else-if decisions, e.g.:
    with elsif
    if ($x == 3) {
      $y = 1;
    } elsif ($x == 4) {
      $y = 2;
    }
    
    without elsif
    if ($x == 3) {
      $y = 1;
    } else {
      if ($x == 4) {
        $y = 2;
      }
    }
    
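  3. Perl has two "opposite sense" operators, unless and until: unless (COND) is equivalent to if (!COND), and until (COND) is equivalent to while (!COND)
  4. loop iteration control uses different operators: last and next take the place of the C/Java break and continue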
  5. Perl has a shortened syntax version of if, unless, while, until, for and foreach for single statements, like this:
    $y = 1 if $x == 3;
    
    instead of
    if ($x == 3) { $y = 1; }
    
  6. Perl has the usual boolean operators: &&, ||, ! as well as other lower precedence versions: and, or, not, used for joining statements
  7. Perl's false values are numeric 0, the string "0", the empty string "", and the undefined value; all others are true.
  8. Variables and expressions, like Bash, can be undefined. Perl uses the boolean operator defined to test this. Unlike Bash, the empty string is not considered undefined.
  9. If an undefined value is, say, printed, it is treated like the empty string. However, the usage of an undefined value often signals a programming mistake, and so, with the "-w" flag present, Perl will flag most usages with a warning.
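Several of the points above can be seen together in a short sketch. The variable names and sample values here are illustrative assumptions, not part of the sample programs.

```perl
#!/usr/bin/perl -w
use strict;

my @seen;
my $n = 0;
until ($n >= 10) {            # same as: while (!($n >= 10))
    $n++;
    next if $n % 2;           # "next" is like C's continue: skip odd values
    last if $n > 6;           # "last" is like C's break: leave the loop
    push @seen, $n;
}
print "even values seen: @seen\n";

my $name = "";
print "name is empty\n" unless $name;            # unless = if not
print "but it is defined\n" if defined $name;    # "" is defined, yet false

# 0, "0" and "" are false; " " and "0.0" are true strings
foreach my $v (0, "0", "", " ", "0.0") {
    print "'$v' is ", ($v ? "true" : "false"), "\n";
}
```

The loop stops at 8, the first even value greater than 6, so only 2, 4 and 6 are collected.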

Scalar comparisons

There are two types of boolean comparison operators: the numeric operators ==, !=, <, <=, >, >= and the string operators eq, ne, lt, le, gt, ge. It's important to use eq for string comparison. With the -w flag present, Perl will warn you about unintended numerical comparisons of non-numerical strings.

controls.pl
#!/usr/bin/perl -w
use strict;

my $a;
my $b = "";
my $c = 0;
my $d = " ";

print '$a defined: ', (defined $a) ? "YES\n" : "NO\n";
print '$b defined: ', (defined $b) ? "YES\n" : "NO\n";
print '$c defined: ', (defined $c) ? "YES\n" : "NO\n";
print "\n";

# an undefined variable like $a can be tested without warning
# but other expressions using the value of $a generate warnings
print '$a true: ', ($a) ? "YES\n" : "NO\n";
print '$b true: ', ($b) ? "YES\n" : "NO\n";
print '$c true: ', ($c) ? "YES\n" : "NO\n";
print '$d true: ', ($d) ? "YES\n" : "NO\n";
print "\n";

# undef is a function, but it can act like a value due to the
# fact that in the expression undef(), we may drop the parens
undef $b;
$c = undef;
print '$b defined: ', (defined $b) ? "YES\n" : "NO\n";
print '$c defined: ', (defined $c) ? "YES\n" : "NO\n";
print "\n";

my $x = '123';
my $y = '57';
if ($x <= $y) {
  print "true:\t $x <= $y\n";
} else {
  print "false:\t $x <= $y\n";
}
if ($x lt $y) {
  print "true:\t $x lt $y\n";
} else {
  print "false:\t $x lt $y\n";
}
print "\n";

$x = 'xx';
$y = 'yyy';
if ($x == $y) {   # we shouldn't be doing this, "-w" will warn
  print "true:\t $x == $y\n";
} else {
  print "false:\t $x == $y\n";
}
if ($x eq $y) {
  print "true:\t $x eq $y\n";
} else {
  print "false:\t $x eq $y\n";
}

Perl Arrays (Lists)

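List variables, regarded as aggregates, always appear with the @ prefix. List (array) literals are formed in several ways: as a comma-separated list in parentheses, such as ('First', 22, 'a third'); as a list of quoted words, such as qw/ Fourth Fifth /; or as an integer range, such as 19..23. Perl can print lists directly: inside a double-quoted string the elements of an interpolated array are separated by single spaces, while outside of quotes they are printed with no separators. Additionally, a list, @L, appearing in a numeric context forces reduction to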
scalar(@L)   # L's length
This is very convenient for testing a list's length in an operation such as this:
if (@L == 1) { ... }    # if L has 1 element, ...
Individual elements of a list are regarded as scalars, and Perl uses the scalar prefix $ to represent selection from a list such as this:
$L[3]     # element 3 in list L
Array elements can be assigned at any position in an array; the largest position dictates the array's length and Perl "fills in" the missing elements with undef values.

The most common way to process list elements uses the foreach construct which iterates through the elements (not the indices) of a list.

arrays.pl
#!/usr/bin/perl -w
use strict;

my @a = ('First', 22, 'a third',);   # left-over commas OK
my @b = qw/ Fourth Fifth /;          # quoted words
my @c = 19..23;                      # range: integers between 19 and 23

print "\@a = @a\n";                  # @ is interpolated in double quotes
print "\@b = @b, \@c = @c\n";
print '@a = ', @a, "\n";             # outside, elements have no separators

my @d = ( @a, "xx", @c );            # concatenation is simple
print "\@d = @d\n";
print '@d = ', join( ", ", @d ), "\n";   # array->string using "join"
print "\n";

my $len = @d;                        # scalar = array: scalar = array-length
print "length of \@d = $len last position in \@d is $#d\n";
print "\n";

my ($x, $y, $z, $w) = (@b, @c);      # assign a list of variables from array
print "\$x=$x \$y=$y \$z=$z \$w=$w\n";
print "\n";

print "\$a[2] = $a[2]\n";            # array element access via [ ]
print "\n";

my @e;
$e[0] = 'aa';
$e[2] = 'bb';                        # creates "undef" at $e[1]
print '@e = ', join( ',', @e ), "\n";
print "\n";

@a = (33, 44, 55, 66);
print "\@a = @a\n";

my $sum = 0;
for (my $i = 0; $i < @a; ++$i) {     # traditional array sum
  $sum += $a[$i];
}
print "sum \@a #1: $sum\n";

$sum = 0;
foreach my $val (@a) {               # using the "foreach" loop through elements
  $sum += $val;
}
print "sum \@a #2: $sum\n";

$sum = 0;
foreach (@a) {
  $sum += $_;                        # using the "default loop scalar" $_
}
print "sum \@a #3: $sum\n";

$sum = 0;
$sum += $_ foreach (@a);             # shortened foreach syntax
print "sum \@a #4: $sum\n";

$sum = 0;
map { $sum += $_ } @a;               # array map operator
print "sum \@a #5: $sum\n";

Splice-based operations

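Perl has many functions to manipulate arrays. Some of the commonly used ones are push and pop, which add and remove elements at the end of an array, and unshift and shift, which do the same at the front. The unshift and push operators can also be expressed as the list assignments @x = (LIST, @x) and @x = (@x, LIST), respectively. The general array manipulation operator is splice, from which all others can be derived: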
@yanked_out = splice( @source, start-pos, size, INSERT );
where INSERT can be a scalar or an array.
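As a sketch of how the four basic operators can be derived from splice, each call below is paired with the operator it imitates. The array contents are arbitrary sample values.

```perl
#!/usr/bin/perl -w
use strict;

my @x = (1, 2, 3);

splice @x, scalar(@x), 0, 4, 5;    # like: push @x, 4, 5
splice @x, 0, 0, 0;                # like: unshift @x, 0
my $last  = splice @x, -1, 1;      # like: pop @x
my $first = splice @x, 0, 1;       # like: shift @x

print "\@x = @x, first = $first, last = $last\n";
```

In scalar context splice returns the last element it removed, which is why the pop and shift imitations can be assigned directly to a scalar.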

Array Slices

The array operator [ .. ] can be generalized to allow the selection of subarrays when integer lists are used as the selector. In this case, the "@" is retained. For example, given
@A = qw/ this is an example of array slice selection /;
then
@A[0,2,4]    is     qw/ this an of /;
@A[2..5]     is     qw/ an example of array /;
The singleton selector can be used:
@A[2]
but Perl will warn you that you probably are intending to use:
$A[2]
The following sample script illustrates these array operation features.

arrayops.pl
#!/usr/bin/perl -w
use strict;

my @x = (3..9);
print "\@x = @x\n";
print "pop(@x) = ", pop(@x), "\n";
print "shift(@x) = ", shift(@x), "\n";
print "\@x = @x\n";
unshift @x, 33;
print "unshift(\@x,33); \@x = @x\n";
push @x, 66;
print "push(\@x,66); \@x = @x\n\n";

# Yank out 4 elements starting from position 1 and insert ('xxx', 'yyy')
print '==> @yanked_out = splice @x, 1, 4, qw( xxx yyy )', "\n";
my @yanked_out = splice @x, 1, 4, qw( xxx yyy );
print "yanked out = @yanked_out\n";
print "\@x = @x\n";

# delete position 3
print '==> splice @x, 3, 1', "\n";
splice @x, 3, 1;
print "\@x = @x\n";

# insert "hello" at position 2
print '==> splice @x, 2, 0, "hello"', "\n";
splice @x, 2, 0, "hello";
print "\@x = @x\n\n";

my @A = qw/ this is an example of array slice selection /;
print "\@A:\t\t", "@A\n";
print "\@A[0,2,4]:\t", "@A[0,2,4]\n";
print "\@A[2..5]:\t", "@A[2..5]\n";
print "\@A[0..2,5,7]:\t", "@A[0..2,5,7]\n";
print "\@A[-3..-1]:\t", "@A[-3..-1]\n";

Hashes (Associative Arrays)

A Perl hash is an associative array, i.e., a list of (key,value) pairs such that no two pairs have the same key. In Perl, the key is always a string. In mathematical terms, this is called a map or partial function. As an aggregate, a hash is prefixed by the % symbol, which indicates that it is regarded as a different type of data structure than an array. The { } operator is used to access the value of a certain key, similar to an array selection using the [ ] operator.

A common way to initialize hashes is to assign a literal list of pairs delimited by ( .. ). It is a convention, when possible, to replace the comma separating the key and value of each pair by the => operator. When you use => between key/value pairs, you don't need to quote a key which is a simple word; it is done automatically.

Looping through the key/value pairs can be done with the each operator, or by using the keys operator to extract the array of keys. One issue about Perl hashes is that the order of key retrieval is effectively random; in particular, it bears no relationship to the order in which the keys were entered.
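When a repeatable order is needed, a common approach is to sort the keys before looping. The following sketch uses the built-in sort function, which this page does not otherwise cover; the hash contents are sample data.

```perl
#!/usr/bin/perl -w
use strict;

my %age = ( John => 50, Joe => 34, Ellen => 15, Marty => 44 );

# sort the keys alphabetically to get a repeatable order
my @by_name = sort keys %age;
print "$_ => $age{$_}\n" foreach @by_name;

# sort the keys by their numeric values using the <=> operator
my @by_age = sort { $age{$a} <=> $age{$b} } keys %age;
print "youngest to oldest: @by_age\n";
```

Inside the sort block, $a and $b are special variables that sort sets to the two elements being compared.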

The Dumper function (from the Data::Dumper package) is useful for dumping the contents of hashes and other complex data structures.

hashes.pl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;   # allow use of the Dumper function

# these settings control the output of Dumper; see "man Data::Dumper"
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

my %grade = ( A => 4.0, B => 3.1, C => 2.0, D => 1.0 );
print q/$grade{'A'}=/, $grade{'A'}, "\n";
$grade{B} = '3.0';   # can omit the quotes on the key here
print '$grade{B}=', $grade{B}, "\n";
print "\n";

# observe output order, compare values of B and C
print "%grade: ", Dumper(\%grade), "\n";

# A hash can be created exclusively from assignments
my %weight;
$weight{John} = 180;
$weight{Ellen} = 135;
print "%weight: ", Dumper(\%weight), "\n" ;
print "\n";

# when you use qw, careful not to put quotes, commas, etc.
my %age = qw( John 50 Joe 34 Ellen 15 Marty 44 );

# loop through key/value pairs using while/each
print "%age: ";
while (my ($key, $value) = each %age) {
  print "$key => $value ";
}
print "\n";

my @age_keys = keys %age;
my @age_values = values %age;
print "%age keys: ", join( ',', @age_keys ), "\n";
print "%age values: ", join( ',', @age_values ), "\n";
print "\n";

# loop through key/value pairs using foreach on keys
print "%age: ";
foreach (keys %age) {
  print "$_ => $age{$_} "
}
print "\n";

# delete works correctly for hashes
delete $age{Joe};
print "%age after deleting 'Joe': ", Dumper(\%age), "\n" ;

References (Pointers)

The elements in an array or hash can only be scalars. Perl creates complex data structures through the use of references, or pointers, to arrays or hashes. References are scalars and thus can be held in arrays or hashes. An array reference is created by either applying the \ operator to an array, as in \@a, or using the anonymous array constructor [ .. ], and a hash reference is created by either applying \ to a hash, as in \%h, or using the anonymous hash constructor { .. }. In general you have to dereference or cast a reference to be able to use it as the original array or hash. The Perl dereference operator is the binary "->"; one can use it only to access an element through a reference. The cast operators are the array cast, @{ }, and the hash cast, %{ }, used to cast references to the aggregates. If we have:
$aref = [ "AA", 22 ];
$href = { DD => 11, MM => 31 };
then
$aref->[0] is "AA"  and  $href->{MM} is 31
@{$aref} is ("AA",22)  and  %{$href} is ( DD=>11, MM=>31 )
When an array or hash element is a pointer (to another array or hash), then the connecting dereference operator can be omitted. For example, if @a = ([ qw/AA BB/ ], [ qw/XX YY/ ]), then
$a[1]->[0] is "XX"  and so is  $a[1][0]
The following sample program illustrates these points.

references.pl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

my $aref = [ "AA", 22 ];
my $href = { DD => 11, MM => 31 };

print '$aref', "\t\t\t", Dumper($aref), "\n";
print '$href', "\t\t\t", Dumper($href), "\n";
print '$aref (literally)', "\t", "$aref\n";
print '$href (literally)', "\t", "$href\n";
print '$aref->[0]', "\t\t", "$aref->[0]\n";
print '$href->{MM}', "\t\t", "$href->{MM}\n";

my @a = @{$aref};
my %h = %{$href};
print '@{$aref}', "\t\t", join(",", @a), "\n";
print '%{$href}', "\t\t", join(",", %h), "\n";

my @mix = ( $aref, [ 22..25 ] );   # an array of arrays
print '@mix', "\t\t\t", Dumper(\@mix), "\n";
print '$mix[0]->[0]', "\t\t", $mix[0]->[0], "\n";
print '$mix[1][1]', "\t\t", $mix[1][1], "\n";
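The same ideas extend to other nested structures. As a minimal sketch, here is a hash whose values are array references; the %scores data is hypothetical, and the built-in ref function is used to report the reference type.

```perl
#!/usr/bin/perl -w
use strict;

# a hash whose values are array references (hypothetical sample data)
my %scores = (
    Ellen => [ 90, 85 ],
    John  => [ 70 ],
);

push @{ $scores{John} }, 80;          # cast to an array in order to push
my $count = @{ $scores{Ellen} };      # array cast in scalar context = length
my $kind  = ref $scores{Ellen};       # ref() names the reference type

print "Ellen has $count scores; first is $scores{Ellen}[0]\n";
print "scores for John: @{ $scores{John} }\n";
print "reference type: $kind\n";
```

Note that $scores{Ellen}[0] omits the connecting "->" in the same way that $mix[1][1] does in references.pl.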

Package structure

A Perl package, or module, is most often a group of functions and variables which can be imported by a Perl program. Additionally, Perl has syntactic support to make a package act like a class. Like a class, it must declare itself to be a package:
package SomePack;
and then the file must be SomePack.pm.

Perl packages are usually well documented. The POD (Plain Old Documentation) translators, such as pod2man and pod2html, (like Java's javadoc) take special internal documentation and generate external documentation in a variety of formats such as man pages, HTML, etc.; the format itself is described in the perlpod man page. The documenting comments look like this:
=SOME_TOKEN other stuff
  ... Comments
=cut
The comment-start line is one in which the "=" must be left justified and immediately followed by an alphanumeric token. The comment-end line is the "=cut" line exactly as seen. These comments can be used anywhere but they are most commonly used in packages with the aim of generating external documentation.

The array @INC is the package-path variable; it controls where Perl looks for packages or directories containing packages. The hash %INC then indicates the mapping of package name to the actual location of the package. You can add your own directory to @INC as follows:
use lib <SOME DIRECTORY>;
This has the effect of adding this new directory to the front of the search path, ensuring that a package located there is found before any package of the same name elsewhere. For example, to ensure you find a package in the script's current directory, use:
use lib ".";
because "." was traditionally only the last component of the @INC search path and, since Perl 5.26, is no longer part of it by default.
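A quick way to see both variables at work is to print them after loading a package. This sketch assumes only that Data::Dumper is installed, which is the case for a standard Perl distribution.

```perl
#!/usr/bin/perl -w
use strict;
use lib ".";            # put the current directory at the front of @INC
use Data::Dumper;       # any installed package will do for the demonstration

print "first \@INC entry: $INC[0]\n";

# %INC is keyed by the relative file name, with "/" in place of "::"
my $location = $INC{'Data/Dumper.pm'};
print "Data::Dumper was loaded from $location\n";
```

The location printed depends on the system; only the key format of %INC is fixed.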

Any function defined in a package can be accessed by a program which imports that package. Variables defined in a package can only be accessed from outside if they are declared using the our keyword rather than the "local-only" my keyword. For example, if we have the package
package SomePack;

our $var = 4;

sub foo { ...
}
Then we can always access these entities with the SomePack:: accessor as follows:
#!/usr/bin/perl -w
use SomePack;

SomePack::foo();
print $SomePack::var;   # note how the variable retains its "$" prefix.

Exporting entities

Perl can use certain package-defined variables and functions without the package name accessor prefix. To do so, the package must be an Exporter, meaning, as in object hierarchies, the package extends the Exporter package. In Perl terms we express this as:
package SomePack;
require Exporter;

our @ISA = ( "Exporter" );  
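The array @ISA (IS-A) lists all the packages from which this package inherits. This could conceivably be more than one, but this would mean multiple inheritance. Once this has been established, there are three shared variables which define what can be exported: @EXPORT, the names exported by default; @EXPORT_OK, the names exported only on request; and %EXPORT_TAGS, which maps tag names to lists of exportable names.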

sample package and usages

The example package illustrates all three ways of creating export lists of variables and functions.

testpack.pm
package testpack;
use strict;
use Exporter;

our @ISA = qw( Exporter );
our @EXPORT = qw( $x @a foo );   # same as ( '$x', '@a', 'foo' )
our @EXPORT_OK = qw( foo bar $y %h );
our %EXPORT_TAGS = (
  'one' => [ qw( foo $y ) ],
  'two' => [ qw( foo bar @a %h ) ],
);

our $x = 77;
our $y = 88;
our @a = qw( This Is My List );
our %h = ( Jim => 12, Bob => 7 );

sub foo { return "foo"; }
sub bar { return "bar"; }

return 1;
All variables and functions in this package can be accessed with the testpack:: prefix. The issue is which ones can be accessed without this prefix.
  1. This usage automatically exports all on the @EXPORT list and no others.
    use testpack;
    
    Only $x, @a and foo need not use the testpack:: prefix.
  2. This usage imports exactly the named entities, each of which must be on the @EXPORT or @EXPORT_OK list.
    use testpack qw( $y %h bar );
    
    Only $y, %h and bar need not use the testpack:: prefix.
  3. This usage (with ":" prefix) means we consult the $EXPORT_TAGS{one} list.
    use testpack qw( :one );  
    
    Only $y and foo need not use the testpack:: prefix.
  4. This usage means we consult the $EXPORT_TAGS{one} and $EXPORT_TAGS{two} lists.
    use testpack qw( :one :two );
    
    In this case all exportables except $x need not use the testpack:: prefix.
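The effect of an import list can also be tried in a single file. In the sketch below, TinyPack is a hypothetical package defined inline inside a BEGIN block (an assumption made purely so the example is self-contained); normally it would live in TinyPack.pm and be loaded with use.

```perl
#!/usr/bin/perl -w
use strict;

# TinyPack is a hypothetical package, defined inline so that the example
# fits in one file; normally it would live in TinyPack.pm
BEGIN {
    package TinyPack;
    require Exporter;
    our @ISA         = qw( Exporter );
    our @EXPORT      = qw( hello );
    our @EXPORT_OK   = qw( double $label );
    our %EXPORT_TAGS = ( extras => [ qw( double $label ) ] );
    our $label = 'tiny';
    sub hello  { return "hello" }
    sub double { return 2 * $_[0] }
}

# "use TinyPack qw( :extras );" would make this same import call
# after loading TinyPack.pm
BEGIN { TinyPack->import( qw( :extras ) ); }

print double(21), " $label\n";          # imported, no prefix needed
print TinyPack::hello(), "\n";          # not imported, prefix required
```

Because an explicit import list is given, the default @EXPORT list is not applied, so hello is reachable only through its TinyPack:: prefix.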

Functions

The declaration of a function is done with the sub keyword. Functions are normally defined in packages and imported into the Perl program with a use command. In contrast to Bash, Perl functions can be defined in the script after their usage (because Perl is compiled), but in that case the function call must employ parentheses.

Perl functions traditionally have no named parameter lists. Instead, parameters are accessed through the default array, @_. For example, these are common ways of capturing the arguments inside a function:
my ($x,$y,$z) = @_;  # capture first 3 "named" arguments
my $x = shift;       # capture first argument, the @_ argument is implicit
Keep in mind that hashes and arrays must be passed by reference to a function in order to retain their integrity. For example, we may have:
@a = (12, 5, 77); @b = (6, 44);

foo1(@a, @b);        # same as foo1(12, 5, 77, 6, 44)
foo2(\@a, \@b);      # here arrays are not merged

sub foo2 {
  my ($aref, $bref) = @_;     # we know references are coming in
  my @a = @{$aref};           # cast the references to arrays
  my @b = @{$bref};           # in order to work with them
}
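The difference between the two calling styles can be checked by counting the arguments that arrive. In this sketch, sum_pairs and count_args are hypothetical functions written for illustration; sum_pairs works through the references directly rather than casting them to copies.

```perl
#!/usr/bin/perl -w
use strict;

# sum_pairs is a hypothetical function: it adds two arrays element by
# element, working through the references without copying the arrays
sub sum_pairs {
    my ($aref, $bref) = @_;
    my @sums;
    foreach my $i (0 .. $#{$aref}) {          # $#{$aref} = last index
        push @sums, $aref->[$i] + $bref->[$i];
    }
    return @sums;                             # a list is returned
}

sub count_args { return scalar @_; }          # how many arguments arrived

my @a = (12, 5, 77);
my @b = (6, 44, 1);

my @sums = sum_pairs(\@a, \@b);
print "sums: @sums\n";
print "flattened call gets ", count_args(@a, @b), " arguments\n";
print "reference call gets ", count_args(\@a, \@b), " arguments\n";
```

The flattened call delivers six scalars with no record of where one array ended, while the reference call delivers exactly two.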

Object-oriented package usage

Perl provides many syntactically different ways to access the functions of a package, including an object-oriented style. What Perl calls an object is a blessed reference, most commonly a hash reference, with a package name associated with it. The Perl operation bless creates this association of a package name to a hash reference. The following example illustrates the point.

MyPack.pm
package MyPack;

sub foo {
  print "foo(", join(", ", @_), ")\n";
}

return 1;

useMyPack.pl
#!/usr/bin/perl -w
use MyPack;   # MyPack.pm

my $ref = { a => 1 };
print '$ref', "\t\t\t\t", $ref, "\n";
bless $ref, "MyPack";
print 'after: bless($ref,"MyPack")', "\t", $ref, "\n\n";

print '$ref->foo(1)', "\t";
$ref->foo(1);
print "\n";

print 'MyPack::foo(2) =>', "\t";
MyPack::foo(2);
print 'MyPack->foo(3) =>', "\t";
MyPack->foo(3);
print 'foo MyPack(4) =>', "\t";
foo MyPack(4);
The execution output of useMyPack.pl is this:
$ref                            HASH(0x8939c28)
after: bless($ref,"MyPack")     MyPack=HASH(0x8939c28)

$ref->foo(1)    foo(MyPack=HASH(0x8939c28), 1)

MyPack::foo(2) =>       foo(2)
MyPack->foo(3) =>       foo(MyPack, 3)
foo MyPack(4) =>        foo(MyPack, 4)
Because the hash reference $ref is blessed with the name MyPack, it can then be used to call a function in MyPack in an object-oriented style:
$ref->foo();
The last three statements illustrate additional ways of calling a function in a package. The first, with "::" is familiar. The last two represent equivalent object-oriented function calls in which the package name is passed in as a string to the function. In particular, we can emulate typical object-oriented syntax. Perl has no built-in new function, but we can create one and use it in the "usual" manner. Consider the following example:

MyClass.pm
package MyClass;

sub new {
  my ($classname, $arg) = @_;   # params: "MyClass", $arg
  my $ref = { data => $arg };
  bless $ref, $classname;       # $classname is "MyClass"
  return $ref;
}

sub getData {
  my ($this) = @_;              # the object is the parameter
  return $this->{data};
}

return 1;
This program represents some of the "usual" object-oriented statements:

useMyClass.pl
#!/usr/bin/perl -w
use strict;
use MyClass;

my $x = new MyClass(1);
my $y = MyClass->new(2);

print '$x->getData()', "\t", $x->getData(), "\n";
print '$y->{data}', "\t", $y->{data}, "\n";
As you see, the MyClass package is acting like a class of sorts. The notation "$y->{data}" in which we access the value of the "data" as a key is analogous to the Java "y.data", or C++ "y->data" in which we would access the value of the data as a class member. Of course, there is no data privacy since we can skip the use of getData and use the data key directly.
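A class with state that changes can be sketched in the same style. Counter below is a hypothetical class, defined inline in a block so that the example fits in one file rather than in its own .pm file.

```perl
#!/usr/bin/perl -w
use strict;

# Counter is a hypothetical class, defined inline to keep the sketch in one file
{
    package Counter;

    sub new {
        my ($classname, $start) = @_;
        my $this = { count => $start };
        return bless $this, $classname;     # bless returns the reference
    }

    sub increment {
        my ($this) = @_;
        $this->{count}++;
        return $this;                       # returning the object allows chaining
    }

    sub getCount {
        my ($this) = @_;
        return $this->{count};
    }
}

my $c = Counter->new(5);
$c->increment()->increment();
print "count = ", $c->getCount(), "\n";
print "class = ", ref($c), "\n";            # ref() of an object is its package
```

Having increment return the object is a design choice, not a requirement; it simply lets method calls be chained.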

Regular expression operations

There are five principal operations which use regular expressions:
  1. match a string with the pattern, giving a boolean (yes/no) result
  2. extract portion(s) of a string which matches the pattern
  3. substitute portion(s) of a string which matches the pattern
  4. split a string into an array by removing portions which match the pattern
  5. extract a subarray of array elements which match the pattern
The Perl match and substitute operations use the dedicated operator =~ like this:
$string =~ [Pattern expression]   and    $string !~ [Pattern expression]
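The [Pattern expression] uses a notation inspired by the UNIX sed executable. Match and substitute employ single-character options to control the matching; some common options are i (ignore case), g (global: all matches, not just the first), s (let "." match a newline), m (let ^ and $ match at internal line boundaries), x (allow whitespace and comments in the pattern) and, for substitute only, e (evaluate the replacement as an expression).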

match/substitute

The match operation is written in one of the two forms (positive/negative match):
$string =~ m/reg-expr/options      and      $string !~ m/reg-expr/options
where the reg-expr is a Perl regular expression. The right-hand side can be expressed in a number of equivalent ways:
$string =~ m@pattern@options     also %, ^, &, #, !, |, ~, ;, :, +, -, =
$string =~ m(pattern)options     also { }, [ ], < >
$string =~ /pattern/options      "m" unnecessary when "/" is used
When the resulting expression is used in a scalar context, the "=~" operator delivers a true/false result of the success of the match. The "!~" operator delivers the opposite boolean result.

The substitute operation is written similarly:
$string =~ s/reg-expr/replacement/options
$string !~ s/reg-expr/replacement/options
The main difference is that $string is modified by the replacement when a match occurs.

Portions of, or the entire pattern, can be defined as a string external to the operator usage and then inserted into the pattern. This is essentially the only way to use regular expressions in Java. In this case one has to remember, when the external pattern string is double-quoted, to add additional escapes in front of common escape sequences such as \s, \d, etc.; in a single-quoted string they can be written as is.
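The quoting rules for external patterns can be compared directly. The sketch below also uses the qr// operator, which is not otherwise covered on this page; it builds a pattern object without any string quoting concerns. The variable names are illustrative.

```perl
#!/usr/bin/perl -w
use strict;

my $single = '^\d+\s*$';       # single quotes: backslashes are kept as typed
my $double = "^\\d+\\s*\$";    # double quotes: "\" and "$" must be escaped
my $regex  = qr/^\d+\s*$/;     # qr// builds a precompiled pattern object

print "the two strings are ", ($single eq $double ? "identical" : "different"), "\n";

my $input = "2009 ";
foreach my $pat ($single, $double, $regex) {
    print "|$input| ", ($input =~ /$pat/ ? "matches" : "does not match"), "\n";
}

# a piece of a pattern can be inserted into a larger one
my $num = '\d+';
print "found a pair\n" if "12-34" =~ /^$num-$num$/;
```

All three forms describe the same pattern, so the loop reports a match each time.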

Subpattern matches

The parenthesized portions of a pattern in a successful match define back-reference match variables, $1, $2, $3, ..., consisting of the matched substring portions corresponding to the parenthesized pattern. The order of the matches is based on the order of the left parentheses.

When the match operations are used in an array context the resulting array represents the list of all matching substrings used in the parenthesized portions. The array context is particularly useful when the "g" (global) option is used, because it can then capture all substrings of a string which match a pattern.

match.pl
#!/usr/bin/perl -w
use strict;

my $str = "AA474";
print '$str', "\t" x 6, "'$str'\n";   # the "x" operator means string-repeat

print '$str matches ', '\\w\\d{2}', "\t" x 4;
if ( $str =~ /\w\d{2}/ ) { print "T\n"; } else { print "F\n"; }

my $pat = '^\\d+$';   # "\"'s needed in front of special chars
print '$str matches ', $pat, "\t" x 4;
if ( $str =~ /$pat/ ) { print "T\n"; } else { print "F\n"; }
print "\n";

$str = " AaA: 272xy7-88";
print '$str:', "\t" x 6, "'$str'\n";
print 'match: /(a+)\\s*:\\s*(\\d+)\\s*(\\w+)/i', "\t" x 2;
$str =~ /(a+)\s*:\s*(\d+)\s*(\w+)/i;
print "\$1=$1, \$2=$2, \$3=$3\n";
print "\n";

my @all;
@all = ( $str =~ /(a+)\s*:\s*(\d+)\s*(\w+)/i );
print 'all matches:', "\t" x 5, "@all\n";
@all = ( $str =~ /(\d+)/g );
print 'all digit string matches:', "\t" x 3, "@all\n";

substitute.pl
#!/usr/bin/perl -w
use strict;

my $str = " 234aaAA 22 3 ";
my $copy;
print '$str', "\t" x 5, "'$str'\n";

$copy = $str;
$copy =~ s/\d+/***/;
print 'after: s/\\d+/***/', "\t" x 3, "'$copy'\n";

$copy = $str;
$copy =~ s/\d+/***/g;
print 'after: s/\\d+/***/g', "\t" x 3, "'$copy'\n";

$copy = $str;
$copy =~ s/(\d+)/'*' x length($1)/eg;
print q<after: s/(\d+)/'*' x length($1)/eg>, "\t", "'$copy'\n";

substitute with back references and evaluation

The Perl back reference variables can also be inserted into pattern substitution definitions. Furthermore, with the "e" option, the substitution portion can be evaluated as an expression before the substitution is made. This gives a fantastically general way of making substitutions. For example, the following program locates each "free-standing" number and substitutes the number multiplied by 10:

multiplynums.pl
#!/usr/bin/perl -w
use strict;

# this pattern means: all digit sequence on a word boundary
my $pattern = "\\b(\\d+)\\b";

my $string = "22, u4: 18, j2: 34";
print "$string\n";
$string =~ s/$pattern/$1*10/eg;
print "$string\n";

minimal vs. maximal matching

There are situations, with match/extraction and substitution where you want the repetitive constructions * and + to generate minimal matches (the smallest number of characters possible) instead of the default maximal matches (the largest number of characters possible).

Creation of a minimal repetition is effected by appending a ? to the * or + operator, getting the operators *? and +?, respectively. The comparison is:
A.*A     as many characters between two A's, possibly including other A's
A.*?A    as few characters between two A's thus not including other A's
Note that the latter is not the same as:
A(.*)?A
which is effectively the same as the former. A more useful example is the one we have here, where the task is to find the C-style comments (between /* and */) within a program:

minmaxsubs.pl
#!/usr/bin/perl
use strict;

my $sample1 = <<END
Here is a text with some /* C-style comments
on multiple line */
not just above, but also below: /* this will become part
of a single maximal match */ to illustrate a point
END
;
my $sample2 = $sample1;

# using the maximal match
$sample1 =~ s/\/\*(.|\n)*\*\//COMMENTS/g;
print "1 ------- maximal -------\n$sample1\n";

# using the minimal match
$sample2 =~ s/\/\*(.|\n)*?\*\//COMMENTS/g;
print "2 ------- minimal --------\n$sample2\n";
The syntactic mess here is due to the fact that "/" is the pattern delimiter and "*" is a special character in the pattern, so both must be escaped.
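One way to reduce the clutter, sketched here under the assumption that a shorter sample text is acceptable, is to use the alternative { } delimiters so that "/" needs no escape, together with the "s" option so that "." also matches a newline.

```perl
#!/usr/bin/perl -w
use strict;

my $text = "a /* one */ b /* two\nlines */ c";

my $greedy = $text;
$greedy =~ s{/\*.*\*/}{COMMENTS}gs;      # maximal: first /* to the last */

my $minimal = $text;
$minimal =~ s{/\*.*?\*/}{COMMENTS}gs;    # minimal: each /* to the nearest */

print "maximal: $greedy\n";
print "minimal: $minimal\n";
```

The "*" characters still have to be escaped, since they remain special inside the pattern whatever delimiter is chosen.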

split

The split function applies a regular expression pattern to a string and creates an array of scalars by removing strings defined by the regular expression pattern. The calls are these:
@array = split /pattern/, $string
@array = split /pattern/, $string, limit
One very common operation is to extract the non-whitespace sequences from a string, $str. Here are two ways to do it:
@A = ( $str =~ m/(\S+)/g );
# or
@A = split ' ', $str;  # single-blank split
The single-blank split has a special significance. What happens is that the string is trimmed first, and then split on whitespace sequences. Another very common usage of split is to capture the lines in the output of a command into an array. For example:
@my_file_lines = split "\n", `cat lines.txt`;
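The difference between the single-blank split and an ordinary whitespace pattern, and the effect of the limit argument, can be seen in this sketch. The sample strings are made up for illustration.

```perl
#!/usr/bin/perl -w
use strict;

my $str = "  alpha  beta\tgamma \n";

my @words  = split ' ', $str;        # trims, then splits on whitespace runs
my @pieces = split /\s+/, $str;      # a leading empty field is kept

print scalar(@words),  " words: ",  join('|', @words),  "\n";
print scalar(@pieces), " pieces: ", join('|', @pieces), "\n";

# with a limit of 2, everything after the first separator stays together
my @two = split ' ', "key value with spaces", 2;
print "limited: ", join('|', @two), "\n";

# a list slice picks individual fields from the result
my ($user, $uid) = (split /:/, "root:x:0:0:root")[0, 2];
print "user=$user uid=$uid\n";
```

Trailing empty fields are dropped in both forms, so only the leading whitespace makes the two results differ.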

grep

The grep function is applied to arrays. It returns a subarray of all elements which match the regular expression pattern. The call is:
@subarray = grep /pattern/, @array
One common usage is to find an element in an array. For example
@A = (33, 5, -3, 18, 533, 22, -99, 88 );
$x = some_integer;
if ( grep /^$x$/, @A ) { print "$x is in the array\n"; }
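Besides a pattern, grep also accepts a block containing any boolean test on $_, and in scalar context it returns the number of matching elements. The following sketch shows the three usages side by side; the value of $x is an arbitrary choice.

```perl
#!/usr/bin/perl -w
use strict;

my @A = (33, 5, -3, 18, 533, 22, -99, 88);
my $x = 33;

my @with_3   = grep /3/, @A;            # pattern form: elements containing a "3"
my @negative = grep { $_ < 0 } @A;      # block form: any boolean test on $_
my $count    = grep { $_ == $x } @A;    # scalar context: number of matches

print "contain 3: @with_3\n";
print "negative:  @negative\n";
print "$x occurs $count time(s)\n";
```

For a membership test on numbers, the block form with == avoids treating the value as a pattern at all.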


© Robert M. Kline