Lesson 22g: Regular Expressions: Global and Other Options

Regular Expression Options

Regular expression matches and substitutions have a whole set of
options, which you can turn on by appending one or more of the i,
m, s, g, e, o or x modifiers to the end of the operation.

See Programming Perl, page 153, for more information. An example:

$string = 'Big Bad WOLF!';
print "There's a wolf in the closet!" if $string =~ /wolf/i;
# i is used for a case insensitive match

i
Case insensitive match.

g
Global match (see below).

e
Evaluate the right side of s/// as an expression.

m
Treat string as multiple lines. ^ and $ will match at start
and end of internal lines, as well as at beginning and end of
whole string. Use \A and \Z to match beginning and end of whole
string when this is turned on.

s
Treat string as a single line. “.” will match any character at
all, including newline.

o
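Compile the pattern only once. This is a promise that any variable
interpolated into the pattern will never change, so perl will not
re-interpolate it each time the match is evaluated.

x
Extended syntax. Whitespace and # comments inside the pattern are
ignored, so the pattern can be laid out readably.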
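
The modifiers above can be seen side by side in a short sketch. Apart from the wolf string, the variable names and sample strings here are illustrative only and do not come from the lesson.

```perl
use strict;
use warnings;

# i: case-insensitive match
my $string = 'Big Bad WOLF!';
my $found_wolf = $string =~ /wolf/i ? 1 : 0;

# m: ^ also matches at the start of internal lines
my $text = "first line\nsecond line\n";
my @line_starts = $text =~ /^(\w+)/mg;

# s: "." also matches a newline, so the match can span two lines
my ($spanning) = $text =~ /(line.second)/s;

# e: the right side of s/// is evaluated as Perl code
my $prices = 'apple 2, pear 3';
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# x: whitespace and comments inside the pattern are ignored
my ($codon) = 'ATGGCC' =~ / ( [ACGT]{3} )   # one codon
                          /x;

print "$found_wolf\n@line_starts\n$doubled\n$codon\n";
```

Note that g is combined with m and with e here: modifiers can be stacked in any order after the closing delimiter.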

Global Matches

Adding the g modifier to the pattern causes the match to be
global. In a scalar context (such as the condition of an if or
while statement), each evaluation finds the next match and returns
true until no matches remain, so a while loop visits every match in turn.

This will match all codons in a DNA sequence, printing them out on
separate lines:

Code:

  $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';
  while ( $sequence =~ /(.{3})/g ) {
    print $1,"\n";
  }

Output:

GTT
GCC
TGA
AAT
GGC
GGA
ACC
TTG

If you perform a global match in a list context (e.g. assign
its result to an array), then you get a list of all the subpatterns
that matched from left to right. This code fragment gets arrays of
codons in three reading frames:

@frame1 = $sequence =~ /(.{3})/g;
@frame2 = substr($sequence,1) =~ /(.{3})/g;
@frame3 = substr($sequence,2) =~ /(.{3})/g;
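
To see what the three arrays hold, the fragment can be run against the sequence from the earlier example. This is a minimal sketch: the strict and warnings pragmas, the my declarations, and the print statements are additions for illustration.

```perl
use strict;
use warnings;

my $sequence = 'GTTGCCTGAAATGGCGGAACCTTGAA';

# In list context, /g returns every captured codon at once.
my @frame1 = $sequence =~ /(.{3})/g;
my @frame2 = substr($sequence, 1) =~ /(.{3})/g;
my @frame3 = substr($sequence, 2) =~ /(.{3})/g;

print "Frame 1: @frame1\n";
print "Frame 2: @frame2\n";
print "Frame 3: @frame3\n";
```

Bases left over at the end of a frame (fewer than three) do not match .{3} and are simply dropped from that frame's array.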

The position of the most recent match can be determined with the
pos function, which returns the offset where the next match attempt
will begin, that is, the first position after the end of the last match.
Remember that pos counts from 0: the first position is 0, not 1.

Code:

#file:pos.pl
my $seq = "XXGGATCCXX";

if ( $seq =~ /(GGATCC)/gi ){
  my $pos = pos($seq);
  print "Our Sequence: $seq\n";
  print '$pos = ', "1st position after the match: $pos\n";
  print '$pos - length($1) = 1st position of the match: ',($pos-length($1)),"\n";
  print '($pos - length($1))-1 = 1st position before the match: ',($pos-length($1)-1),"\n";
}

Output:

~]$ ./pos.pl
Our Sequence: XXGGATCCXX
$pos = 1st position after the match: 8
$pos - length($1) = 1st position of the match: 2
($pos - length($1))-1 = 1st position before the match: 1
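
The same arithmetic works inside a while loop to report every occurrence of a pattern, since each global match in scalar context resumes at pos. The following is a sketch only; the sequence and the @starts array are hypothetical and not part of the lesson.

```perl
use strict;
use warnings;

my $seq = 'GGATCCAAGGATCC';   # hypothetical sequence containing two sites
my @starts;

# Each pass through the loop resumes the search at pos($seq).
while ( $seq =~ /(GGATCC)/gi ) {
    my $start = pos($seq) - length($1);   # 0-based start of this match
    push @starts, $start;
    print "Found $1 at position $start\n";
}
```

Here the two sites start at positions 0 and 8, counting from 0.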
