Is a Regex the Same as a Regular Expression?


Yes. And No. At this stage, this is a semantic question—it depends on what one means by regular expression.

Nowadays, 99 percent of people who mention regular expressions are really speaking about regex. For them (and for Rex), regex is an abbreviation of regular expression. Another common abbreviation (which is losing the abbreviation war) is regexp. Common plurals are regexes, regexps and even regexen (thank you, Larry Wall).


(direct link)

A Short History of Regex

What about the hundredth person? Actually, the proportion is probably closer to one in a thousand. But that one person has a special claim, because a long long time ago (the 1950s, at the dawn of computer science) a regular expression referred to what the mathematician Stephen Kleene described as a regular language, which itself referred to a mathematical property called regularity. Regular expression engines that conformed to this regularity were called Deterministic Finite Automatons (DFAs). The name of the father of regular expressions (Stephen Kleene) is immortalized in the Kleene star, the small character in A* that tells the engine that the character A must be matched zero or more times.

In the late 1960s Ken Thompson of Bell Labs wrote them into the editor QED, and in the 1970s they made it into Unix programs and utilities such as grep, sed and AWK.

These tools made text-matching much easier than the alternative—writing custom parsing programs for each task. Naturally, some saw the potential for even more powerful text-matching patterns. In the 1980s, programmers could not resist the urge to expand the existing regular expression syntax to make its patterns even more useful—most notably Henry Spencer with his regex library, then Larry Wall with the Perl language, which used then expanded Spencer's library.

The engines that process this expanded regular expression syntax were no longer DFAs—they are called Non-Deterministic Finite Automatons (NFAs). At that stage, regex patterns could no longer said to be regular in the mathematical sense. This is why a small minority of people today (most of whom have email addresses ending with .edu) will maintain that what we call regex are not regular expressions.

For the rest of us… Regex and regular expressions? Same-same.

Perl had a huge influence on the flavors of regular expressions used in most modern engines today. This is why modern regular expressions are often called Perl-style. The differences in features across regex engines are considerable, so in my view speaking of Perl-style regular expressions only makes sense when one wants to make it clear one is not talking about the ivory tower brand of mathematically-correct expressions.

But if you really want to avoid ambiguity, just say regex, as that is one word that white-coat computer scientists are not claiming.



next
 Quick-Start: Regex reference table



1-1 of 1 Threads
Thomas McLeod – Boston, Mass.
August 12, 2015 - 08:55
Subject: Please check your theory

DFAs and NFAs have equivalent expressive power. In other words, for every NFA, there is an equivalent DFA, and visa-versa. So it is incorrect to say that these "regex patterns could no longer said to be regular in the mathematical sense. " They in fact do define regular languages. Some "regex" engines, however do go beyond regular languages to define "context-free" languages. These cannot be represented by an NFA (or DFA) and require a push-down automata (PDA).
Reply to Thomas McLeod
Rex Dwyer
March 21, 2016 - 09:29
Subject: Thomas, a bit muddled...

Thomas,
That's sort of right but sort of muddled. Deterministic Finite Automata, Nondeterministic Finite Automata, and regular expressions all generate/recognize exactly the set of regular languages. However, the "regular expressions" in programming languages might better be described as "extended regular expressions". The language of all strings with some number of As followed by a B followed by the same number of As is not a regular language (pumping lemma). However, it is accepted by the following regex: (A*)B\1
"Automata" is plural. The singular is "automaton".


Leave a Comment






All comments are moderated.
Link spammers, this won't work for you.

To prevent automatic spam, we require that you type the two words below before you submit your comment.