Regular Expressions, Say What (/s|..$/) ?

Flatiron School / 22 October 2013

The following is a guest post by Bandana Malik and originally appeared on her blog. Bandana is currently in the Ruby-003 class at The Flatiron School. You can follow her on Twitter here.

What is a Regular Expression?

In programming languages, a regular expression is a pattern that can be matched against a string. The term goes back to formal language theory, where linguist Noam Chomsky developed a model that classifies languages according to the rules needed to generate them. These classes include regular languages, context-free languages, context-sensitive languages, and recursively enumerable languages. Mathematician Stephen Kleene formalized the concept of regular expressions, which later found their way into Unix tools and programming languages.

So... what is a regular language?

Imagine you have an abstract machine made of Lego. Our Lego machine can step through a finite number of states as it reads its input, and it either ends up accepting that input or rejecting it. A regular language, then, is a language whose strings can all be recognized by this Lego machine, which we'll call a Finite State Machine. A combination lock works similarly: it has a finite amount of memory and a fixed number of states (right, left, and down). If we input strings of numbers, the set of strings that open the lock would be a regular language. And every regular language can be described by a regular expression.
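To make the idea concrete, here is a minimal sketch of a finite state machine in Ruby. The state names and the `accepts?` helper are hypothetical, chosen only for illustration; the machine accepts the same strings as the regular expression /\Aab*c\z/.

```ruby
# Hypothetical finite state machine: accepts an "a", then any number of
# "b"s, then a final "c" (the same strings as /\Aab*c\z/).
def accepts?(input)
  transitions = {
    start:  { "a" => :middle },
    middle: { "b" => :middle, "c" => :done },
    done:   {}
  }

  state = :start
  input.each_char do |char|
    state = transitions[state][char]
    return false if state.nil? # no transition for this character: reject
  end
  state == :done # accept only if we finish in the final state
end

accepts?("abbbc") # => true
accepts?("abca")  # => false
```

The machine never needs to remember more than its current state, which is what makes the language it recognizes regular.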

Though this is a little abstract, for our purposes regular expressions in programming are used to read through data and pick out the specific pieces we want to keep or omit. There is some version of regular expressions in most programming languages, such as JavaScript, Java, and C#, and, lucky for us, the Ruby language has built-in regular expression capabilities (of course, it's Ruby :).

Common Types of Regular Expressions

We'll focus on four broad categories of Regular Expressions: Anchors, Characters and Whitespaces, Character Classes, and Subexpression Modifiers. Though a variety of methods, like gsub, match, and split, can be used with Regular Expressions, we'll be using the scan method to get a basic understanding of how the above categories really work.
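As a quick point of comparison, here is a minimal sketch of how those four methods differ. The string `colors` is a hypothetical sample, not one from the original post.

```ruby
colors = "red, green,blue" # hypothetical sample string

all_es    = colors.scan(/e/)         # every match: ["e", "e", "e", "e"]
swapped   = colors.gsub(/e/, "E")    # replace each match: "rEd, grEEn,bluE"
first_hit = colors.match(/g\w+/)[0]  # first match only: "green"
pieces    = colors.split(/,\s*/)     # split on a pattern: ["red", "green", "blue"]
```

scan is handy for learning because it simply returns every match as an array, so you can see exactly what a pattern picks up.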

Sound fun? Let’s try it!

Anchors

Anchors allow us to match part of a string based on its position, such as the start or end of a line.

So if we had the following string:

We can use the Regular Expressions as anchors as seen below:
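As a minimal sketch, assuming a hypothetical two-line string named `text` (not the string from the original post), the most common Ruby anchors behave like this:

```ruby
text = "ruby is fun\nregex is ruby" # hypothetical two-line sample string

line_starts  = text.scan(/^r\w+/)   # ^  start of each line:  ["ruby", "regex"]
string_start = text.scan(/\Ar\w+/)  # \A start of the string: ["ruby"]
line_ends    = text.scan(/\w+$/)    # $  end of each line:    ["fun", "ruby"]
string_end   = text.scan(/\w+\z/)   # \z end of the string:   ["ruby"]
whole_words  = text.scan(/\bis\b/)  # \b word boundary:       ["is", "is"]
```

Note that in Ruby, ^ and $ anchor to lines, while \A and \z anchor to the whole string.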

Characters and Whitespaces

Regular Expressions for characters and whitespaces allow us to match a character, a digit, a whitespace, or an underscore against a given string.

Here are some of the most helpful of these expressions:

So if we had the following yoda string:

We could do a scan for virtually any character in the string we want to return:
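Here is a minimal sketch of the usual shorthand expressions in Ruby, assuming a hypothetical stand-in for the yoda string (the variable name `yoda` and its contents are illustrative, not the original example):

```ruby
yoda = "Do or do_not 100%" # hypothetical stand-in for the yoda string

digits     = yoda.scan(/\d/)   # \d any digit:           ["1", "0", "0"]
number     = yoda.scan(/\d+/)  # one or more digits:     ["100"]
spaces     = yoda.scan(/\s/)   # \s any whitespace:      [" ", " ", " "]
words      = yoda.scan(/\w+/)  # \w letter, digit, or _: ["Do", "or", "do_not", "100"]
non_words  = yoda.scan(/\W/)   # \W anything \w is not:  [" ", " ", " ", "%"]
everything = yoda.scan(/./)    # .  any character except a newline
```

The uppercase versions (\D, \S, \W) match the opposite of their lowercase counterparts.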

Character Classes

Character classes allow you to match a specific set of characters. So if we had the following string:

We could scan the string for the following classes of characters:
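A minimal sketch, assuming a hypothetical sample string named `jedi` (not the string from the original post), of how bracketed character classes work:

```ruby
jedi = "Yoda is 900" # hypothetical sample string

vowels      = jedi.scan(/[aeiou]/)    # any one of these letters: ["o", "a", "i"]
capitals    = jedi.scan(/[A-Z]/)      # a range of characters:    ["Y"]
numerals    = jedi.scan(/[0-9]/)      # a range of digits:        ["9", "0", "0"]
a_through_d = jedi.scan(/[a-d]/)      # lowercase a through d:    ["d", "a"]
the_rest    = jedi.scan(/[^a-z\s]/)   # ^ inside brackets negates: ["Y", "9", "0", "0"]
```

Inside square brackets, ^ means "not these characters", which is different from its meaning as an anchor.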

Subexpression Modifiers

Subexpression modifiers (often called quantifiers) allow us to specify how many times a character or group must occur in order to match.

Here’s our yoda string again: 

And here we use the following Regular Expressions to parse through our string:
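A minimal sketch of the common modifiers, again assuming a hypothetical stand-in string (the name `yoda_says` and its contents are illustrative, not the original example):

```ruby
yoda_says = "Mmm, good food you seek" # hypothetical stand-in string

one_or_more  = yoda_says.scan(/o+/)           # + one or more:    ["oo", "oo", "o"]
exactly_two  = yoda_says.scan(/o{2}/)         # {2} exactly two:  ["oo", "oo"]
zero_or_more = yoda_says.scan(/fo*d/)         # * zero or more:   ["food"]
optional     = yoda_says.scan(/e?k/)          # ? zero or one:    ["ek"]
either       = yoda_says.scan(/(?:g|f)ood/)   # | alternation:    ["good", "food"]
ignore_case  = yoda_says.scan(/m+/i)          # i flag, any case: ["Mmm"]
```

The (?: ) group is non-capturing. With a plain capturing group such as /(g|f)ood/, scan returns only the captured parts, [["g"], ["f"]], rather than the whole words.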

That was fun, wasn’t it?

Finally, I leave you with a Lego machine: a Turing Machine, the theoretical model behind modern computers. It is a bit more sophisticated than our machine but very, very cool.
