What is a Regular Expression?
In programming languages, a regular expression is a pattern that can be matched against a string. The term goes back to formal language theory, where linguist Noam Chomsky developed a model that classifies languages according to the rules that generate them. The classes include regular languages, context-free languages, context-sensitive languages, and recursively enumerable languages. Mathematician Stephen Kleene formalized the concept of Regular Expressions, which were later adopted by Unix tools and programming languages.
So... What is a regular language?
Imagine you have an abstract machine made of lego. Our lego machine can move through a finite number of states as it reads its input, one symbol at a time. A regular language, then, is a set of strings that this lego machine, which we’ll call a Finite State Machine, can recognize. A combination lock works in a similar way: it has a finite amount of memory, or a fixed number of states (right, left, and down). If we input strings of numbers, the set of strings that open the lock would be a regular language. And every regular language can be described by a Regular Expression.
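To make the lego machine a little more concrete, here is a minimal sketch of a Finite State Machine in Ruby. The state names, the transition table, and the accepts? helper are all hypothetical, made up for illustration. The machine accepts strings built from one or more "ab" pairs, which is the same language the regular expression /\A(ab)+\z/ describes.

```ruby
# Hypothetical transition table: for each state, which input character
# leads to which next state. Any missing entry means "reject".
TRANSITIONS = {
  start:    { "a" => :seen_a },
  seen_a:   { "b" => :accepted },
  accepted: { "a" => :seen_a }
}.freeze

# Walk the input one character at a time, following the table.
def accepts?(input)
  state = :start
  input.each_char do |char|
    state = TRANSITIONS.fetch(state, {})[char]
    return false if state.nil? # no transition defined for this character
  end
  state == :accepted
end

puts accepts?("abab")            # true
puts accepts?("aba")             # false
puts "abab".match?(/\A(ab)+\z/)  # true, the regex agrees with the machine
```

The machine only ever remembers which state it is in, never how many characters it has seen. That fixed, finite memory is what keeps the language regular.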
Common Types of Regular Expressions
We’ll focus on four broad categories of Regular Expressions: Anchors, Characters and Whitespaces, Character Classes, and Subexpression Modifiers. Though a variety of methods like gsub, match, and split can be used with Regular Expressions, we’ll be using the scan method to get a basic understanding of how these categories really work.
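As a quick point of comparison, here is a small sketch of how those methods behave in Ruby. The sample string is hypothetical, chosen only for illustration.

```ruby
# Hypothetical sample string, not taken from the original examples.
phrase = "Do or do not"

replaced = phrase.gsub(/o/, "0")   # => "D0 0r d0 n0t"  replace every match
first    = phrase.match(/d\w/)[0]  # => "do"            first match only
words    = phrase.split(/\s+/)     # => ["Do", "or", "do", "not"]
found    = phrase.scan(/\bd\w/i)   # => ["Do", "do"]    every match, as an array
```

scan is handy for exploring because it returns every match at once, so you can see exactly what a pattern picks up.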
Sound fun? Let’s try it!
Anchors allow us to match part of a string based on its location, such as the start or end of a line or the edge of a word.
So if we had the following string:
We can use anchors in our Regular Expressions as seen below:
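Here is a minimal sketch of anchors with scan. The original example string is not shown here, so the string below is a hypothetical stand-in.

```ruby
# Hypothetical stand-in string.
string = "The force is strong with this one"

line_start = string.scan(/^The/)    # => ["The"]  ^ anchors to the start of a line
line_end   = string.scan(/one$/)    # => ["one"]  $ anchors to the end of a line
str_start  = string.scan(/\AThe/)   # => ["The"]  \A anchors to the start of the string
str_end    = string.scan(/one\z/)   # => ["one"]  \z anchors to the end of the string
whole_word = string.scan(/\bis\b/)  # => ["is"]   \b is a word boundary, so "this" is skipped
no_match   = string.scan(/^force/)  # => []       "force" is not at the start
```

Note that in Ruby ^ and $ work per line, while \A and \z refer to the whole string. For a single-line string like this one they behave the same.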
Characters and Whitespaces
Regular Expressions for characters and whitespaces allow us to match a character, a digit, a whitespace, or an underscore against a given string.
Here are some of the most helpful of these expressions:
So if we had the following yoda string:
We could do a scan for virtually any character in the string we want to return:
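A minimal sketch of these expressions follows. The original yoda string is not shown here, so the one below is a hypothetical stand-in that happens to contain letters, digits, whitespace, and punctuation.

```ruby
# Hypothetical stand-in for the yoda string.
yoda = "900 years old, Yoda is."

digits      = yoda.scan(/\d/)   # => ["9", "0", "0"]  \d is any digit
number      = yoda.scan(/\d+/)  # => ["900"]
words       = yoda.scan(/\w+/)  # => ["900", "years", "old", "Yoda", "is"]
                                #    \w is a letter, digit, or underscore
spaces      = yoda.scan(/\s/)   # => [" ", " ", " ", " "]  \s is any whitespace
non_word    = yoda.scan(/\W/)   # => [" ", " ", ",", " ", " ", "."]
every_char  = yoda.scan(/./)    # . is any character except a newline
```

The uppercase versions (\D, \W, \S) match the opposite of their lowercase counterparts.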
Character classes allow you to match any one character from a specific set of characters. So if we had the following string:
We could scan the string for the following classes of characters:
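As a minimal sketch, here are a few character classes applied with scan. The string is a hypothetical stand-in, since the original one is not shown here.

```ruby
# Hypothetical stand-in string.
string = "Jedi Master Yoda, age 900"

vowels     = string.scan(/[aeiou]/)       # => ["e", "i", "a", "e", "o", "a", "a", "e"]
capitals   = string.scan(/[A-Z]/)         # => ["J", "M", "Y"]
digits     = string.scan(/[0-9]/)         # => ["9", "0", "0"]
names      = string.scan(/[A-Z][a-z]+/)   # => ["Jedi", "Master", "Yoda"]
# A leading ^ inside the brackets negates the class:
# anything that is not a letter and not whitespace.
others     = string.scan(/[^a-zA-Z\s]/)   # => [",", "9", "0", "0"]
```

A class in square brackets always matches exactly one character, so ranges like A-Z are shorthand for listing every letter.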
Subexpression modifiers allow us to specify how many times a character (or group of characters) must occur for a match.
Here’s our yoda string again:
And here we use the following Regular Expressions to parse through our string:
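Here is a minimal sketch of the modifiers at work. The original yoda string is not shown here, so the one below is a hypothetical stand-in with repeated letters to count.

```ruby
# Hypothetical stand-in for the yoda string.
yoda = "Yoda says hmm, hmmm, hmmmm"

one_or_more  = yoda.scan(/hm+/)      # => ["hmm", "hmmm", "hmmmm"]  + is one or more
zero_or_more = yoda.scan(/hmmm*/)    # => ["hmm", "hmmm", "hmmmm"]  * is zero or more
optional     = yoda.scan(/hmm?/)     # => ["hmm", "hmm", "hmm"]     ? is zero or one
exactly      = yoda.scan(/hm{3}/)    # => ["hmmm", "hmmm"]          {3} is exactly three
at_least     = yoda.scan(/hm{3,}/)   # => ["hmmm", "hmmmm"]         {3,} is three or more
between      = yoda.scan(/hm{2,3}/)  # => ["hmm", "hmmm", "hmmm"]   {2,3} is two to three
```

Each modifier applies only to the character (or group) directly before it, which here is the final m in the pattern.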
That was fun, wasn’t it?
Finally, I leave you with a lego machine: a Turing Machine, an abstract model of computation that underpins modern computers. It is a bit more sophisticated than our machine, but very, very cool.