Using Regular Expressions with Ruby


Differences in Regex Features across Ruby Versions

Before we start, you should know that there were important breaks in regex support between Ruby versions 1.8, 1.9 and 2.0.

I won't say anything about version 1.8 except that it's the dark ages of Ruby regex. In version 1.9, the Oniguruma engine became integrated with Ruby. Version 2.0 started using the Onigmo engine, a fork of Oniguruma. This added some interesting features:

✽ Conditionals
✽ Recursion
✽ \K to drop what was matched so far from the match to be returned
✽ \R to match all line break characters including CRLF
✽ \X to match a single Unicode grapheme
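
As a rough illustration of the features listed above, here is a short sketch that assumes Ruby 2.0 or later. The sample strings, patterns and variable names are invented for the demo.

```ruby
# Assumes Ruby 2.0+ (Onigmo). Sample data is made up for illustration.

# Conditional: if Group 1 matched "a", require "b"; otherwise require "c"
conditional = /\A(a)?(?(1)b|c)\z/
cond_matches = %w[ab c ac b].select { |s| s =~ conditional }

# Recursion: \g<paren> calls the named group again to match nested parentheses
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/
nested_ok  = !("(a(b)c)" =~ balanced).nil?  # properly nested
nested_bad = !("(a(b)c" =~ balanced).nil?   # one closing parenthesis missing

# \K: drop "price: " from the returned match, keeping only the digits
after_k = "price: 42"[/price: \K\d+/]

# \R: split on any line break, treating CRLF as a single break
lines = "one\r\ntwo\nthree\rfour".split(/\R/)

# \X: "e" plus a combining acute accent is matched as one grapheme
graphemes = "e\u0301a".scan(/\X/)

puts cond_matches.inspect
puts nested_ok, nested_bad
puts after_k
puts lines.inspect
puts graphemes.length
```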


Ruby Quirk: (?m)

In all engines that support it, except for Ruby, the "dot matches at line breaks mode" (a.k.a. single-line or DOTALL mode) is turned on by the (?s) inline modifier or the s flag. In Ruby, you turn it on with the (?m) inline modifier or the m flag.

This is confusing because in other flavors, the m stands for multi-line, which is the mode where the beginning- and end-of-string anchors ^ and $ are allowed to match on every line. In Ruby, ^ and $ always match on every line. If you want to specify the beginning of the string, use \A. For the very end of the string, use \z (or \Z to match at the end of the string or before the final line break, if any).
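
To make the quirk concrete, here is a small sketch of how the m flag and the anchors behave. The sample strings and variable names are assumptions made for the demo.

```ruby
text = "one\ntwo\n"

# Without the m flag, the dot does not match a line break
dot_default = "a\nb" =~ /a.b/
# With the m flag (DOTALL in other flavors), it does
dot_with_m = "a\nb" =~ /a.b/m

# ^ and $ match on every line, with no flag needed
per_line = text.scan(/^\w+$/)

# \A only matches at the very beginning of the string
at_start = text =~ /\Atwo/

# \z is the very end of the string; \Z also accepts a final line break
strict_end = text =~ /two\z/
loose_end  = text =~ /two\Z/

puts dot_default.inspect, dot_with_m.inspect
puts per_line.inspect
puts at_start.inspect, strict_end.inspect, loose_end.inspect
```

Each `=~` returns the match position, or nil when there is no match, which is why the results are printed with inspect.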


Other Ruby Quirks

I've been meaning to compile a list. I'll start with one item: unlike other engines, Ruby does not allow a lookahead or a negative lookbehind inside a lookbehind, such as (?<=(?<!A)A).
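
If you want to check this on your own interpreter, the following sketch tries to compile the pattern and reports what happens. The outcome may depend on your Ruby version, so the code makes no assumption either way. Regexp.new is used with a string because a rejected regex literal would stop the script before it runs.

```ruby
# Compile the pattern at run time so that a rejection can be rescued
pattern = '(?<=(?<!A)A)'

begin
  Regexp.new(pattern)
  outcome = "accepted"
rescue RegexpError => e
  # The engine's own error message explains why the pattern was refused
  outcome = "rejected: #{e.message}"
end

puts "#{pattern} was #{outcome}"
```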


Character Class Intersection, Subtraction and Union

The syntax […&&[…]] allows you to use a logical AND on several character classes to ensure that a character is present in them all. Intersecting with a negated character class, as in […&&[^…]], allows you to subtract that class from the original class. For details, see character class intersection and character class subtraction in Java and Ruby on the page about character class operations.
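
Here is a minimal sketch of both operations. The sample strings and variable names are made up for the demo.

```ruby
# Intersection: letters that are in a-m AND in k-z
overlap = "jklmn".scan(/[a-m&&[k-z]]/)

# Subtraction: lowercase letters that are NOT vowels
consonants = "hello world".scan(/[a-z&&[^aeiou]]/)

puts overlap.join      # klm
puts consonants.join   # hllwrld
```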

Similarly, the syntax […[…]] allows you to use a logical OR on several character classes to ensure that a character is present in at least one of them. For details, see character class union in Java and Ruby on the page about character class operations.
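
A short sketch of the union syntax, with an invented sample string. It also checks that the nested form matches the same characters as the flat class [a-cx-z].

```ruby
sample = "abcmnoxyz"

# Union: characters in a-c OR in x-z
nested = sample.scan(/[a-c[x-z]]/)

# The same set written as one flat class
flat = sample.scan(/[a-cx-z]/)

puts nested.join       # abcxyz
puts nested == flat    # true
```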


About this Page

At the moment, I am not planning a fully fleshed-out guided tour of Ruby regex, although I certainly intend to add plenty of tasty material to this page over time. My pages are always in motion.

In the meantime, I don't want to leave you Ruby coders high and dry, so I have something special to get you started.


A Ruby program that shows how to perform common regex tasks

Whenever I start playing with the regex features of a new language, the thing I always miss the most is a complete working program that performs the most common regex tasks—and some not-so-common ones as well.

This is what I have for you in the following complete Ruby regex program. It's taken from my page about the best regex trick ever, and it performs the six most common regex tasks. The first four tasks answer the most common questions we use regex for:

✽ Does the string match?
✽ How many matches are there?
✽ What is the first match?
✽ What are all the matches?

The last two tasks cover two other common regex operations:

✽ Replace all matches
✽ Split the string

If you study this code, you'll have a terrific starting point for tweaking and testing your own expressions with Ruby. Bear in mind that the code inspects values captured in Group 1, so you'll have to tweak… but you'll have a solid base to understand how to do basic things, and fairly advanced ones as well.

As you can imagine, I am not fluent in all of the ten or so languages showcased on the site. This means that although the sample code works, a Ruby pro might look at the code and see a more idiomatic way of testing an empty value or iterating a structure. If some idiomatic improvements jump out at you, please shoot me a comment.

Please note that usually you will choose to perform only one of the six tasks in the code, so your own code will be much shorter.


subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = /{[^}]+}|"Tarzan\d+"|(Tarzan\d+)/
# put Group 1 captures in an array
group1Caps = []
subject.scan(regex) {|m|
	group1Caps << $1 if !$1.nil?
}

######## The six main tasks we're likely to have ########

# Task 1: Is there a match?
puts("*** Is there a Match? ***")
if group1Caps.length > 0
	puts "Yes"
else
	puts "No"
end	

# Task 2: How many matches are there?
puts "\n*** Number of Matches ***"
puts group1Caps.length

# Task 3: What is the first match?
puts "\n*** First Match ***"
if group1Caps.length > 0 
	puts group1Caps[0]
end	

# Task 4: What are all the matches?
puts "\n*** Matches ***"
if group1Caps.length > 0 
	group1Caps.each { |x| puts x }
end	

# Task 5: Replace the matches
replaced = subject.gsub(regex) {|m|
	if $1.nil?
		m
	else
		"Superman"
	end
}
puts "\n*** Replacements ***"
puts replaced

# Task 6: Split
# Start by replacing the matches with something distinctive,
# as in Task 5. Then split on that marker.
splits = replaced.split(/Superman/)
puts "\n*** Splits ***"
splits.each { |x| puts x }
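
For readers who prefer a more compact style, here is one possible idiomatic sketch of the same six tasks. It is an alternative, not a correction of the program above. The braces in the regex are escaped as a precaution, and the names captures and pieces are illustrative.

```ruby
subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = /\{[^}]+\}|"Tarzan\d+"|(Tarzan\d+)/

# With one capture group, scan returns one-element arrays such as
# [nil] or ["Tarzan11"]; flatten and compact keep the real captures
captures = subject.scan(regex).flatten.compact

puts captures.any? ? "Yes" : "No"   # Task 1: is there a match?
puts captures.size                  # Task 2: how many matches?
puts captures.first                 # Task 3: first match
puts captures                       # Task 4: all matches, one per line

# Task 5: keep the whole match ($&) unless Group 1 was set
replaced = subject.gsub(regex) { $1 ? "Superman" : $& }
puts replaced

# Task 6: split on the marker text
pieces = replaced.split("Superman")
puts pieces
```

The ternary inside the gsub block plays the same role as the if/else in the longer version: matches that did not set Group 1 are put back unchanged.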






Smiles,

Rex


