Lesson 22a: Regular Expressions: Overview

Regular expressions are a language you can use within Perl to identify patterns in text.

A regular expression is a string template against which you can match a piece of text. Regular expressions are something like shell wildcard expressions, but much more powerful.
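
To make the comparison with shell wildcards concrete, here is a minimal sketch. The shell wildcard *.txt roughly corresponds to the regular expression /\.txt$/. The file names below are made up for illustration and are not part of the lesson's data.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical file names, used only to illustrate the idea
my @names = ("notes.txt", "notes.txt.bak", "image.png");
my @hits;

foreach my $name (@names) {
  # \. is a literal dot; $ anchors the match to the end of the string
  if ($name =~ /\.txt$/) {
    print "$name looks like a text file\n";
    push @hits, $name;
  }
}
```

Unlike the wildcard, the pattern can be extended with alternatives, repetition and character classes, which later lessons cover.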

Examples of Regular Expressions (more details to follow!!)

This bit of code loops through each line of a file, finds all lines containing an EcoRI site (GAATTC), and bumps up a counter for each one:

Code:

#!/usr/bin/perl -w
#file: EcoRI1.pl
 
use strict;
 
my $filename = "example.fasta";
# open for reading ('<'); opening with '>' would erase the file
open (FASTA, '<', $filename) or die "Cannot open $filename: $!\n";
my $sites = 0;
 
while (my $line = <FASTA>) {
  chomp $line;
 
  # count each line that contains the EcoRI recognition sequence
  if ($line =~ /GAATTC/){
    print "Found an EcoRI site!\n";
    $sites++;
  }
}
close FASTA;
 
if ($sites){
  print "$sites EcoRI sites total\n";
}else{
  print "No EcoRI sites were found\n";
}
 
#note: if $sites is declared inside while loop you would not be able to
#print it outside the loop

Output:

~]$ ./EcoRI1.pl
Found an EcoRI site!
Found an EcoRI site!
.
.
.
Found an EcoRI site!
Found an EcoRI site!
34 EcoRI sites total
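
Note that the script adds one to the counter for each matching line, so a line holding two sites is still counted once. If every occurrence should be counted, one possible approach is the /g modifier in list context. This is a sketch only: the sample line and the variable names are invented for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical line containing two EcoRI sites
my $line = "TTGAATTCAAAGGAATTCCC";

# In list context a /g match returns every non-overlapping match;
# assigning through the empty list "() =" turns that list into a count
my $count = () = $line =~ /GAATTC/g;

print "$count EcoRI sites on this line\n";
```

Inside the while loop of the script, the same expression could be added to $sites in place of the $sites++ step.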

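This does the same thing, but counts lines containing one type of methylation site (Pu-C-X-G) instead. The pattern is /[GA]C.?G/, which reads as:

- [GA] : a G or an A
- C : followed by a C
- .? : followed by any single character (other than a newline), which is optional
- G : followed by a G

Because the .? part is optional, the pattern also matches Pu-C-G with nothing in the X position.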
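
The breakdown above can be checked against a few short strings. This is a small sketch with test strings invented for illustration; each one is chosen to exercise a different part of the pattern.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical test strings, not taken from the lesson's data file
my @tests = ("GCTG", "ACG", "TCAG", "GCTTG");
my %result;

foreach my $seq (@tests) {
  # the match returns true or false; record a readable label for each string
  $result{$seq} = ($seq =~ /[GA]C.?G/) ? "match" : "no match";
  print "$seq: $result{$seq}\n";
}
```

GCTG and ACG match (with and without a character in the optional position). TCAG fails because no G or A is directly followed by a C, and GCTTG fails because two characters sit between the C and the G.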

Code:

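#!/usr/bin/perl -w
#file:methy.pl
 
use strict;
 
# same setup as EcoRI1.pl
my $filename = "example.fasta";
open (FASTA, '<', $filename) or die "Cannot open $filename: $!\n";
my $sites = 0;
 
while (my $line = <FASTA>) {
  chomp $line;
 
  # count each line that contains a Pu-C-X-G methylation site
  if ($line =~ /[GA]C.?G/){
    $sites++;
  }
}
close FASTA;
 
if ($sites){
  print "$sites Methylation Sites total\n";
}else{
  print "No Methylation Sites were found\n";
}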

Output:

~]$ ./methy.pl
723 Methylation Sites total
