Is a Regex the Same as a Regular Expression?
Yes. And No. At this stage, this is a semantic question—it depends on what one means by regular expression.
Nowadays, 99 percent of people who mention regular expressions are really speaking about regex. For them (and for Rex), regex is an abbreviation of regular expression. Another common abbreviation (which is losing the abbreviation war) is regexp. Common plurals are regexes, regexps and even regexen (thank you, Larry Wall).
A Short History of RegexWhat about the hundredth person? Actually, the proportion is probably closer to one in a thousand. But that one person has a special claim, because a long long time ago (the 1950s, at the dawn of computer science) a regular expression referred to what the mathematician Stephen Kleene described as a regular language, which itself referred to a mathematical property called regularity. Regular expression engines that conformed to this regularity were called Deterministic Finite Automatons (DFAs). The name of the father of regular expressions (Stephen Kleene) is immortalized in the Kleene star, the small character in A* that tells the engine that the character A must be matched zero or more times.
In the late 1960s Ken Thompson of Bell Labs wrote them into the editor QED, and in the 1970s they made it into Unix programs and utilities such as grep, sed and AWK.
These tools made text-matching much easier than the alternative—writing custom parsing programs for each task. Naturally, some saw the potential for even more powerful text-matching patterns. In the 1980s, programmers could not resist the urge to expand the existing regular expression syntax to make its patterns even more useful—most notably Henry Spencer with his regex library, then Larry Wall with the Perl language, which used then expanded Spencer's library.
The engines that process this expanded regular expression syntax were no longer DFAs—they are called Non-Deterministic Finite Automatons (NFAs). At that stage, regex patterns could no longer said to be regular in the mathematical sense. This is why a small minority of people today (most of whom have email addresses ending with .edu) will maintain that what we call regex are not regular expressions.
For the rest of us… Regex and regular expressions? Same-same.
Perl had a huge influence on the flavors of regular expressions used in most modern engines today. This is why modern regular expressions are often called Perl-style. The differences in features across regex engines are considerable, so in my view speaking of Perl-style regular expressions only makes sense when one wants to make it clear one is not talking about the ivory tower brand of mathematically-correct expressions.
But if you really want to avoid ambiguity, just say regex, as that is one word that white-coat computer scientists are not claiming.
Don't Miss The Regex Style Guide
and The Best Regex Trick Ever!!!
and The Best Regex Trick Ever!!!
Quick-Start: Regex reference table
Leave a Comment
The engines that process this expanded regular expression syntax were no longer DFAs—they are called Non-Deterministic Finite Automatons (NFAs). After (the wording may need a bit of additional reworking):
The engines that process this expanded regular expression syntax were no longer regular. They are certainly recursively enumerable, most likely also context-sensitive, and if so, most likely also context-free. I am slightly confused by whether or not context-free languages are also context-sensitive, given the wording. I'm pretty sure they are considered to be, though.
That's sort of right but sort of muddled. Deterministic Finite Automata, Nondeterministic Finite Automata, and regular expressions all generate/recognize exactly the set of regular languages. However, the "regular expressions" in programming languages might better be described as "extended regular expressions". The language of all strings with some number of As followed by a B followed by the same number of As is not a regular language (pumping lemma). However, it is accepted by the following regex: (A*)B\1
"Automata" is plural. The singular is "automaton".