Perl Extras
(last updated: Jun 23, 2009)

Download the perl-extras.zip archive.

Command-line parameters

Perl has a special array variable @ARGV (0-based) which holds the command-line parameters. As in Bash, $0 is the name of the script as it was invoked (which may or may not be a full path); it is not part of @ARGV. If the parameters are files, Perl provides a simple syntax for accessing the contents of the files, line by line, using the syntax:
<ARGV> 
Used this way, ARGV is called a file handle, and the bracket syntax <..> reads "the next line" from the files named in the command-line parameter list (or, in list context, all of the remaining lines). Consider this Perl script:

dump-files.pl
#!/usr/bin/perl
foreach (<ARGV>) {
    print $_;
}
Try this execution:
$ dump-files.pl dump*
This version accesses the lines of standard input using the STDIN file handle:

dump-stdin.pl
#!/usr/bin/perl
foreach (<STDIN>) {
    print $_;
}
Try these executions:
$ dump-stdin.pl < dump-stdin.pl
$ cat dump* | dump-stdin.pl
Finally, Perl provides the empty file handle, <>, which reads from the files named on the command line if there are command-line arguments, or from STDIN if there are none:

dump-any.pl
#!/usr/bin/perl
foreach (<>) {
    print $_;
}
Try these executions:
$ dump-any.pl dump*
$ dump-any.pl < dump-any.pl
$ cat dump* | dump-any.pl
Because the line-by-line processing of input is such a standard way to manipulate text files, many Perl scripts use this structure as an outline.
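As a rough outline of that structure, here is a minimal sketch that reads every line through <> and collects two simple statistics. To keep it self-contained it writes a small temporary file and places its name in @ARGV, which imitates passing the file on the command line; the sample contents and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so the sketch runs on its own
# (the contents are invented for this example).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "alpha\nbeta\ngamma\n";
close $tmp_fh;

# Imitate the invocation:  script.pl <that file>
@ARGV = ($tmp_name);

my $count   = 0;
my $longest = "";
while (my $line = <>) {
    chomp $line;                 # strip the trailing newline
    $count++;
    $longest = $line if length($line) > length($longest);
}
print "lines read: $count, longest line: $longest\n";
```

In a real script the temporary file and the assignment to @ARGV would be dropped, and the loop would simply consume whatever files (or standard input) the user supplies.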

foreach vs. while

The foreach loops can be replaced by while loops with similar, but slightly different behavior. For example, consider this program compared to the "foreach" version above:

dump-stdin-while.pl
#!/usr/bin/perl
while (<STDIN>) {
    print $_;
}
The difference is that the while version reads and processes input one line at a time, whereas the foreach version evaluates <STDIN> in list context, so standard input must be "complete" (end of file reached) before any line is processed. Compare the executions to illustrate the differences:
$ dump-stdin.pl
First
Second
Ctrl-D    (terminate)
vs.
$ dump-stdin-while.pl
First
Second
Ctrl-D    (terminate)
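The same difference can be observed without typing at a terminal. The following is a minimal sketch, assuming an in-memory file handle as a stand-in for standard input: inside each loop body it records whether the handle has already reached end of file. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "First\nSecond\nThird\n";

# foreach: <$fh> is evaluated in list context, so every line has been
# read before the loop body runs for the first time.
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my @eof_in_foreach;
foreach my $line (<$fh>) {
    push @eof_in_foreach, (eof($fh) ? 1 : 0);
}
close $fh;

# while: <$fh> is evaluated in scalar context, one line per iteration.
open $fh, '<', \$text or die "can't open in-memory file: $!";
my @eof_in_while;
while (my $line = <$fh>) {
    push @eof_in_while, (eof($fh) ? 1 : 0);
}
close $fh;

print "at end of file inside foreach: @eof_in_foreach\n";
print "at end of file inside while:   @eof_in_while\n";
```

In the foreach loop the handle is already exhausted on the first iteration, while in the while loop it is exhausted only after the last line has been read.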

Capturing options with getopts

Perl also has a getopts function which extracts the command-line options; it is provided by the standard module Getopt::Std. Here is a sample test program which illustrates these features:

arg-opts.pl
#!/usr/bin/perl -w
use strict;
use File::Basename;
use Getopt::Std;
use Data::Dumper;
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

print "this script: ", basename($0), "\n";
print "arguments: @ARGV\n";

# if getopts sees an unknown option, it returns false; otherwise it returns true
my %options;
getopts( "qos", \%options ) or die("*** options error\n");

print "\n%options = ", Dumper(\%options), "\n";

# the presence of an option can be tested as follows:
print "\n-q option: ";
if ( defined $options{q} ) {
    print "defined\n";
}
else {
    print "not defined\n";
}

# getopts automatically pulls the options out, up
# to the first non-option argument
print "\narguments after options: @ARGV\n";

my $non_opt_arg = shift @ARGV;
if (defined $non_opt_arg) {
    print "\nextract non-option argument: $non_opt_arg\n";
}
else {
    exit;
}
print "\nafter non-opt. arg., remain: @ARGV\n";

# for options which expect arguments, if the argument is
# not present, getopts returns false
%options = ();
getopts( "d:", \%options ) or die("*** options error\n");
print "\n%options = ", Dumper(\%options), "\n";
print "\nafter options, remaining arguments: @ARGV\n";
To test it, try running these commands:
$ arg-opts.pl -q -o FILE.zip -d /usr/local
$ arg-opts.pl -o FILE.zip -d /usr/local
$ arg-opts.pl -qs FILE.zip -d/usr/local
$ arg-opts.pl -x FILE.zip -d/usr/local
$ arg-opts.pl -o FILE.zip -d
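To experiment with getopts without retyping command lines, the parsing can be wrapped in a small function. This is only a sketch: parse_args is a hypothetical helper, not part of Getopt::Std, and the option string "qd:" (a flag -q and an option -d that takes a value) is chosen just for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# parse_args (hypothetical helper): run getopts on the given list and
# return a reference to the options hash plus the leftover arguments.
sub parse_args {
    local @ARGV = @_;              # getopts always works on @ARGV
    my %opts;
    getopts("qd:", \%opts) or return;
    return (\%opts, [@ARGV]);
}

my ($opts, $rest) = parse_args("-q", "-d", "/usr/local", "FILE.zip", "-x");
print "q is set\n" if $opts->{q};
print "d = $opts->{d}\n";
print "remaining arguments: @$rest\n";
```

Because parsing stops at the first non-option argument, the trailing -x is left untouched in the remaining arguments rather than being reported as an unknown option.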

System calls

Perl allows access to commands in the underlying operating system in several ways:
  1. the system command: execute a command (in a subshell)
  2. backquotes (`command`): execute a command in a subshell and capture its standard output
  3. the exec command: execute a command; do not return to the calling program
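For the first two, the status of the command is available afterwards in the variable $?, much as in Bash. Note that in Perl $? holds the full wait status rather than the bare exit code: the exit code is $? >> 8, and $? is 0 exactly when the command succeeded. In contrast, a successful exec never returns to the program.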

system.pl
#!/usr/bin/perl -w
use strict;

my $command;

$command = "ls";
print "==> $command\n";
# $? holds the wait status; the exit code is in its high byte ($? >> 8)
if (system("$command > /dev/null 2>&1") == 0) {
    print " success, exit status = ", $? >> 8, "\n";
}
else {
    print " failure, exit status = ", $? >> 8, "\n";
}

$command = "ls xxxxx";
print "==> $command\n";
my $output = `$command`;
if ( $? == 0 ) {   # you can also use this terse syntax: if ( !$? )
    print " success, exit status = ", $? >> 8, "\n";
}
else {
    print " failure, exit status = ", $? >> 8, "\n";
}
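One detail worth seeing in isolation is how the value of $? is laid out. The sketch below assumes a UNIX-like system and uses $^X (the path of the running perl) as the child command, purely so that the example does not depend on any particular shell utility; the exit code 3 is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List form of system: the command is run directly, without a shell.
system($^X, "-e", "exit 3");

my $wait_status = $?;
my $exit_code   = $wait_status >> 8;     # high byte: the child's exit code
my $signal      = $wait_status & 127;    # low bits: terminating signal, if any

print "wait status = $wait_status, exit code = $exit_code, signal = $signal\n";
```

Testing $? == 0 is still a valid success check, since a zero wait status means both a zero exit code and no terminating signal.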
Keep in mind that a command run through system (or backquotes) executes in a child process, so changes it makes to its own environment have no effect on the Perl script or on subsequent commands. For example, in a shell script you can change the script's working directory by executing:
cd /home
whereas in Perl, if you were to use
system "cd /home";
it would have no effect on the working directory of the Perl script. Use Perl's own chdir function instead:
chdir "/home";
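The effect can be checked with the standard Cwd module. This is a minimal sketch assuming a UNIX-like system where the root directory / is accessible; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $start = getcwd();

# The cd happens inside a child shell, which prints its own
# directory and then exits.
system("cd / && pwd");
my $after_system = getcwd();

# chdir changes the directory of the Perl process itself.
chdir("/") or die "can't chdir to /: $!";
my $after_chdir = getcwd();

print "start:        $start\n";
print "after system: $after_system\n";
print "after chdir:  $after_chdir\n";
```

The first two directories printed by the script are the same, and only the chdir call moves the script itself.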

File Access

Assuming we're running in a UNIX-like environment, one way to read a file, say the file "system.pl", into a Perl program is as follows:
$contents = `cat system.pl`;
Once we have these contents, we can iterate through the lines of the file as follows:
@lines = split /\n/, $contents;
foreach my $line (@lines) 
{
   # do something with $line
}
The only problem with this approach is that it is system dependent, i.e., it won't work on, say, a Windows system where there is no cat command. Perl, of course, does permit system-independent file access. The operation to open system.pl for read access goes like this:
open F, "system.pl" or die( "can't open system.pl\n" );
The symbol F is called a file handle, and is one of the few accepted usages of a bareword. The validity of the open (especially open for reading) operation should always be checked. With F we could do:
$line = <F>;   # read a line from F
or read all the lines from the file by using <F> in list context:
@lines = <F>;  # read all lines from F into the @lines array

The open operation can also be used to open a file for writing, appending, etc. by prefacing the file name with the characters ">", ">>", etc., respectively.

If we open a file for writing, say,
open F, ">some_file";
then we can use the print operation to write to it, as follows:
print F "some line\n";
Note that there must be no comma after the file handle, F. A file handle also supports a read operation which reads a given number of bytes instead of reading line by line, and so is more appropriate for handling a binary file. The following program illustrates several usages of reading and writing files using file handles.

read-write.pl
#!/usr/bin/perl -w
use strict;

my $input_file = shift;
die "missing file argument\n" unless defined $input_file && -f $input_file;

my $output_file = "SampleOut";
my $contents;

# read the whole file in one operation; -s gives the file size in bytes
open F, "$input_file" or die("can't open $input_file");
read F, $contents, -s "$input_file";
close F;

print "\n===>> printing $input_file to standard output\n";
print $contents;

print "\n===>> printing $input_file to $output_file\n";
open G, ">$output_file" or die("can't open $output_file");
print G $contents;
close G;

open F, "$input_file" or die("can't open $input_file");
print "\n===>> enumerate $input_file to standard output\n";
my $lineno = 0;
foreach my $line (<F>) {
    printf "%03d: ", ++$lineno;   # add line number in front
    print $line;
}
close F;
A simple test-run of this program with minimal output is:
$ read-write.pl dump-any.pl
$ cat SampleOut
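The same operations can also be written with the three-argument form of open and lexical file handles, a style commonly recommended in current Perl documentation. The following sketch is an alternative to the bareword handles used above, not a replacement for the author's program; the file name sample.txt and the temporary directory are assumptions made so that the example can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";          # hypothetical file name

# ">" creates or truncates the file
open my $out, '>', $path or die "can't write $path: $!";
print $out "first line\n";             # no comma after the file handle
close $out;

# ">>" appends to the existing file
open $out, '>>', $path or die "can't append to $path: $!";
print $out "second line\n";
close $out;

# "<" opens for reading; list context returns all lines at once
open my $in, '<', $path or die "can't read $path: $!";
my @lines = <$in>;
close $in;

# read() fetches a given number of bytes, ignoring line boundaries
my $buffer;
open $in, '<', $path or die "can't read $path: $!";
binmode $in;
my $bytes_read = read $in, $buffer, -s $path;
close $in;

print "lines: ", scalar(@lines), ", bytes: $bytes_read\n";
```

Keeping the mode in a separate argument means a file name that happens to begin with ">" or "<" cannot change how the file is opened.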
Bash gives the ability to redirect standard I/O with the > and < operators. The equivalent can be done in Perl by reopening the special file handles STDIN, STDOUT, STDERR. Here is a sample program controlling standard output.

redirect-stdio.pl
#!/usr/bin/perl
use strict;

print "One\n";
print "Two\n";

open STDOUT, ">TestFile";
print "Three\n";
print "Four\n";
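If the script needs its original standard output back later, one common approach is to duplicate STDOUT before reopening it. The sketch below goes beyond the program above: saving and restoring the handle is an addition for illustration, and the temporary file stands in for TestFile.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $log_name) = tempfile(UNLINK => 1);   # stand-in for TestFile
close $tmp_fh;

$| = 1;                  # flush STDOUT after every print
print "One\n";           # goes to the original standard output

# Duplicate the current STDOUT so that it can be restored later.
open my $saved_stdout, '>&', \*STDOUT or die "can't dup STDOUT: $!";

open STDOUT, '>', $log_name or die "can't redirect STDOUT: $!";
print "Three\n";         # goes to the file
print "Four\n";

# Reopening STDOUT closes the file and points back at the saved copy.
open STDOUT, '>&', $saved_stdout or die "can't restore STDOUT: $!";
print "Five\n";          # original standard output again

# Read the file back to see what was captured.
open my $in, '<', $log_name or die "can't read $log_name: $!";
my $captured = do { local $/; <$in> };
close $in;
print "captured in file: $captured";
```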

String operations

There are many functions and operations for creating and manipulating strings:
  1. length: the string length
  2. x: string repeat operator, e.g.:   "A" x 5   is   "AAAAA"
  3. index & rindex: search for substrings
  4. substr: extract and replace substrings
  5. sprintf: create a string by inserting values into a format string; same idea as the C function of the same name
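As a quick illustration of length, the x operator, and sprintf from the list above, here is a minimal sketch; the sample values and format strings are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word  = "Perl";
my $len   = length $word;                      # number of characters
my $rule  = "-" x 10;                          # ten dashes
my $label = sprintf "%03d: %-6s|", 7, $word;   # zero-padded number, left-justified word
my $price = sprintf "%.2f", 3.14159;           # two decimal places

print "$len\n$rule\n$label\n$price\n";
```

Here %03d pads the number with zeros to three digits and %-6s left-justifies the word in a field six characters wide.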
Below is an example using the index function. The call:
index $string, $substr, $startindex
returns the position of the first occurrence of a substring, $substr, within a $string, searching from $startindex onward (the default is 0). If there is no such occurrence, -1 is returned. The rindex function does the same operation in reverse, searching backward from the end of the string (or from a given position). Here is a sample usage that finds all occurrences of a substring within a string using repeated calls to index.

search.pl
#!/usr/bin/perl -w
use strict;

my $line = "The rain in Spain falls mainly on the plain.";
my $search = "ain";
my $ind;

print "search string: $line\n\n";

$ind = index $line, $search;
while ($ind != -1) {   # a failed search returns index -1
    print "next occurrence of '$search' is at position $ind\n";
    $ind = index $line, $search, $ind+1;
}
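The mirror-image search with rindex might look like the following sketch, which collects the positions from right to left. The guard for position 0 is an extra precaution added here so that the loop cannot repeat a match at the very start of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line   = "The rain in Spain falls mainly on the plain.";
my $search = "ain";

my @found;
my $ind = rindex $line, $search;          # start from the end of the string
while ($ind != -1) {
    push @found, $ind;
    last if $ind == 0;                    # nothing can precede position 0
    $ind = rindex $line, $search, $ind - 1;
}
print "positions, right to left: @found\n";
```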
The substr function extracts the substring of a given string that starts at a given index and has a given length:
substr $line, $start, $length;
If $length is omitted, the substring extends to the end of the string.

The substr function can also be used to replace the extracted portion by another string simply by assigning a new substring to the function call. From the perspective of programming language theory, we say that a substr call can serve as an lvalue (location value), and so can be assigned to. Here is an example program which illustrates these points.

substr-replace.pl
#!/usr/bin/perl -w
use strict;

my $line = "abcdefghijklmnopqrstuvwxyz";
print "\$line: $line\n\n";

print "substr(\$line,0,6)=", substr($line, 0, 6), "\n";
print "substr(\$line,6,3)=", substr($line, 6, 3), "\n";
print "substr(\$line,9) =", substr($line, 9), "\n";

my $replace = "+++++++++";
print "\$replace=$replace\n";

# Replace at position 8 a substring of the replacement's length
substr($line, 8, length($replace)) = $replace;
print "after replacement, \$line=$line\n";

# Delete a substring of length 4 at position 3
substr($line, 3, 4) = "";
print "after another replacement, \$line=$line\n";
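Two further forms of substr, not covered in the program above, may be worth knowing: a negative start index counts from the end of the string, and a fourth argument supplies the replacement directly while the call returns the text that was replaced. A minimal sketch, with made-up sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "abcdefghij";

my $tail = substr $s, -3;             # last three characters
my $old  = substr $s, 0, 3, "XYZ";    # replace "abc"; returns the old text
substr($s, 3, 4) = "";                # lvalue form: delete four characters

print "tail=$tail old=$old s=$s\n";
```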


© Robert M. Kline