The following is a guest post by Bandana Malik and originally appeared on her blog. Bandana is currently in the Ruby-003 class at The Flatiron School. You can follow her on Twitter here.
What is a Regular Expression?
In programming languages, a regular expression is a pattern that can be matched against a string. The term goes back to formal language theory: linguist Noam Chomsky developed a model that classifies languages according to the rules that generate them. The classes are regular languages, context-free languages, context-sensitive languages, and recursively enumerable languages. Mathematician Stephen Kleene formalized the concept of Regular Expressions, which later found their way into Unix and programming languages.
So... what is a regular language?
Imagine you have an abstract machine made of lego. Our lego machine can move through a finite number of states while it reads an input, and it ends by either accepting or rejecting that input. A regular language, then, is a language whose strings are exactly the ones our lego machine accepts; we’ll call the machine a Finite State Machine. A combination lock is similar: it has a finite amount of memory, or a fixed number of states (right, left, and down). If we input strings of numbers, the set of strings that open the lock would be a regular language. And every regular language can be described by a Regular Expression.
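To make the lock idea concrete, here is a minimal sketch of a finite state machine in Ruby. The state names, the dial moves ("R", "L", "D"), and the combination itself are assumptions made up for illustration; they are not taken from any real lock.

```ruby
# A hypothetical combination lock modeled as a finite state machine.
# Each state maps the dial moves it understands to the next state.
TRANSITIONS = {
  start:  { "R" => :first },
  first:  { "L" => :second },
  second: { "D" => :open }
}.freeze

def accepts?(input)
  state = :start
  input.each_char do |move|
    # Any move the current state does not know sends us to a dead state.
    state = TRANSITIONS.fetch(state, {}).fetch(move, :dead)
  end
  state == :open
end

accepts?("RLD")  # => true
accepts?("RDL")  # => false

# The same (tiny) language written as a regular expression:
"RLD".match?(/\ARLD\z/)  # => true
```

The machine only ever remembers which state it is in, which is what makes the set of accepted strings a regular language.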
Though this is a little abstract, for our purposes Regular Expressions in programming are used to read data files and pick out the specific data that we want to keep or omit. There is some version of regular expressions in nearly every programming language, such as JavaScript, Java, and C#, and, lucky for us, the Ruby language has built-in Regular Expression capabilities (of course, it’s Ruby :).
Common Types of Regular Expressions
We’ll focus on four broad categories of Regular Expressions: Anchors, Characters and Whitespaces, Character Classes, and Subexpression Modifiers. Though a variety of methods like gsub, match, and split can be used with Regular Expressions, we’ll be using the scan method to get a basic understanding of how the above categories really work.
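For a quick feel of how those methods differ, here is a small sketch. The sample string is hypothetical and chosen only for illustration.

```ruby
sentence = "red fish, blue fish"  # hypothetical sample string

sentence.scan(/fish/)          # => ["fish", "fish"]   every match, as an array
sentence.match(/blue/)[0]      # => "blue"             the first match only
sentence.gsub(/fish/, "bird")  # => "red bird, blue bird"
sentence.split(/,\s*/)         # => ["red fish", "blue fish"]
```

scan is handy for learning because it simply hands back everything the pattern matched.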
Sound fun? Let’s try it!
Anchors
Anchors allow us to search for or return part of a string based on its location, such as the start or end of a line.
So if we had the following string:
We can use the Regular Expressions as anchors as seen below:
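As a minimal sketch of anchors in action, assume a hypothetical string of our own (the variable name and its contents are made up for illustration):

```ruby
greeting = "hello world, hello ruby"  # hypothetical string

greeting.scan(/^hello/)     # => ["hello"]           ^ anchors to the start of a line
greeting.scan(/ruby$/)      # => ["ruby"]            $ anchors to the end of a line
greeting.scan(/\Ahello/)    # => ["hello"]           \A anchors to the start of the string
greeting.scan(/world\z/)    # => []                  \z anchors to the end of the string
greeting.scan(/\bhello\b/)  # => ["hello", "hello"]  \b anchors to a word boundary
```

Notice that "hello" appears twice in the string, but the anchored patterns only return the one at the front.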
Characters and Whitespaces
Regular Expressions for characters and whitespaces allow us to match a character, a digit, a whitespace, or an underscore against a given string.
Here are some of the most helpful of these expressions:
So if we had the following yoda string:
We could do a scan for virtually any character in the string we want to return:
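Here is a sketch of the most common character shorthands. The Yoda quote below is a stand-in chosen for illustration and is an assumption, not necessarily the string used in the original post.

```ruby
# \w  word character (letter, digit, or underscore)   \W  anything else
# \d  digit                                            \D  anything but a digit
# \s  whitespace                                       \S  anything but whitespace
# .   any character except a newline
yoda = "Do or do not. There is no try."  # hypothetical stand-in string

yoda.scan(/\w+/)        # => ["Do", "or", "do", "not", "There", "is", "no", "try"]
yoda.scan(/\s/).length  # => 7
yoda.scan(/\W/).length  # => 9   (7 spaces plus 2 periods)
yoda.scan(/\d/)         # => []  (no digits in this string)
yoda.scan(/./).length   # => 30
yoda.scan(/\./)         # => [".", "."]  a backslash makes the dot literal
```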
Character Classes
Character classes allow you to match a specific set of characters. So if we had the following string:
We could scan the string for the following classes of characters:
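A short sketch of character classes, again using a hypothetical string made up for illustration:

```ruby
phrase = "Ruby 2.0 rocks"  # hypothetical string

phrase.scan(/[aeiou]/)  # => ["u", "o"]      any one of the listed characters
phrase.scan(/[0-9]/)    # => ["2", "0"]      any character in the range
phrase.scan(/[A-Z]/)    # => ["R"]           uppercase letters only
phrase.scan(/[rR]/)     # => ["R", "r"]      either case of the letter r
phrase.scan(/[^a-z]/)   # => ["R", " ", "2", ".", "0", " "]  ^ inside brackets negates
```

Note that ^ means "not" inside square brackets, while outside them it is the start-of-line anchor.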
Subexpression Modifiers
Subexpression modifiers allow us to match a character or group a specified number of times: zero or more, one or more, optionally, or an exact count.
Here’s our yoda string again:
And here we use the following Regular Expressions to parse through our string:
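As a sketch of these modifiers, assume a hypothetical Yoda-flavored string with some repeated letters (invented for illustration, not the original example):

```ruby
yoda = "Hmmm, soon food you will eat"  # hypothetical string

yoda.scan(/o+/)      # => ["oo", "oo", "o"]  + means one or more
yoda.scan(/o{2}/)    # => ["oo", "oo"]       {2} means exactly two
yoda.scan(/m{2,3}/)  # => ["mmm"]            {2,3} means two to three
yoda.scan(/wil*/)    # => ["will"]           * means zero or more of the l
yoda.scan(/il?/)     # => ["il"]             ? means zero or one of the l
```

These modifiers are greedy by default, so each one grabs as many characters as its rule allows.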
That was fun, wasn’t it?
Finally, I leave you with a lego machine: a Turing Machine, an abstract model of computation that underlies modern computers. It is a bit more sophisticated than our machine but very, very cool.
Written by FLATIRON SCHOOL