Conditional Regex Replacement in Text Editor


Often, the need arises to replace matches with different strings depending on the match itself. For instance, let's say you want to match words for colors, such as red and blue, and replace them with their French equivalents—rouge and bleu.

Therefore, the string:
blue cheese, a red nose
should turn into:
bleu cheese, a rouge nose

Using regex, this is no problem in most programming languages, where you can call a function to compute replacements. (Depending on context, such functions may be called lambdas, delegates or callbacks.)

In fact, if you wanted, you could compute replacements by talking to a NASA server and requesting a piece of data from a machine on the moon. Seen in this light, regex replacements are really flexible. You can do whatever you like.
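As a minimal sketch of the callback approach, here is how the color example might look in Python with the standard re module. The FRENCH table and the translate function are illustrative names chosen for this sketch, not part of any library.

```python
import re

# Illustrative lookup table (hypothetical name): English color -> French color.
FRENCH = {"blue": "bleu", "red": "rouge"}

def translate(match):
    """Callback: compute the replacement from the text that was matched."""
    return FRENCH[match.group(0)]

result = re.sub(r"\b(?:blue|red)\b", translate, "blue cheese, a red nose")
print(result)  # bleu cheese, a rouge nose
```

Because the callback is ordinary code, the lookup could just as well be a database query or a network call.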

In a text editor, however, it's a different story. You can insert new text, you can insert text matched by capture groups, but there is no syntax for conditional insertion (e.g., inserting bleu if you matched blue), at least not in any of the tools I know. The purpose of this page is to show you a trick I came up with that allows you to do just that.

Note that this technique will only work in flavors that allow you to set a capture within a lookahead. You'll also need an editor with strong regex capabilities, such as EditPad Pro or Notepad++.

I'll show you two similar ideas (using a replacement pool or a dictionary) and some variations.



Conditional Replacement using a Replacement Pool

In this version, at the bottom of the file, we temporarily paste a "pool" with the possible replacement texts. Keeping our example, we just paste bleu rouge.

In the editor, the text looks like this:

blue cheese, a red nose.
bleu rouge


We then use a pattern that matches blue or red. If it matches blue, we look ahead for bleu, which we know we'll find in the pool at the bottom, and capture it. We do the same for red. In free-spacing mode, the regex looks like this:

(?sx)
  \bblue\b(?=.*(bleu))
| \bred\b(?=.*(rouge))


Or, on one line:

(?s)\bblue\b(?=.*(bleu))|\bred\b(?=.*(rouge))

What do we replace our matches with? In the blue match case, the replacement is captured to Group 1. In the red case, it is captured to Group 2. When one group is set, the other is empty, so gluing them together with \1\2 just results in the one that is set:

bleu + "" yields bleu
"" + rouge yields rouge

Following this principle, if we had five replacements, our replacement string would be \1\2\3\4\5
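To check the idea outside an editor, the pool trick can be sketched in Python, whose re module allows captures inside lookaheads. This assumes Python 3.5 or later, where a group that did not take part in the match is replaced by an empty string. The variable names are only for illustration.

```python
import re

text = "blue cheese, a red nose."
pool = "bleu rouge"

# Simulate the editor buffer: the pool is pasted on a line at the bottom.
buffer = text + "\n" + pool

pattern = r"(?s)\bblue\b(?=.*(bleu))|\bred\b(?=.*(rouge))"

# Only one of the two groups is set for any match; the other contributes "".
replaced = re.sub(pattern, r"\1\2", buffer)

# Remove the temporary pool line once the replacement is done.
result = replaced.rsplit("\n", 1)[0]
print(result)  # bleu cheese, a rouge nose.
```

The pool line itself comes through unchanged, since it contains neither blue nor red.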


Variation: branch reset
In regex flavors that support the (?|...) branch reset syntax, you can capture all the replacements to a single group, so the replacement string becomes a simple \1

In the regex, you just need to wrap the alternation in a branch reset:

(?sx)
(?|          # branch reset: both captures go to Group 1
   \bblue\b(?=.*(bleu))
 | \bred\b(?=.*(rouge))
)


Even if you have five replacements, the replacement string will still be \1.



Conditional Replacement using a Dictionary

In this version of my conditional replacement trick, instead of pasting a replacement pool at the bottom, we paste a "dictionary".

Dictionary:blue=bleu:red=rouge:green=vert


My choice of the term "dictionary" is not innocent. Of course in this case we have a dictionary in the everyday sense. But this could also be a dictionary in the computing sense, i.e. a data structure that contains pairs of unique keys and not-necessarily unique values. In some languages, this is called a hash table or an associative array.

Our text input becomes something like this:

blue cheese, a red nose.
Dictionary:blue=bleu:red=rouge:green=vert


For our regex, let's start with this:

(?s)\b(blue|red)\b(?=.*:\1=(\w+)\b)

This matches either color, then looks further in the file for a dictionary entry of the form :original=translation, capturing the translation to Group 2. Our replacement is therefore \2.

Of course if there's a chance that the actual text would contain segments that look like dictionary entries, the regex would have to be refined.
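The dictionary lookup can be tried in Python as a rough sketch. The buffer variable simply simulates the editor contents with the dictionary line pasted at the bottom.

```python
import re

# Simulated editor buffer: the text, then the pasted dictionary line.
buffer = (
    "blue cheese, a red nose.\n"
    "Dictionary:blue=bleu:red=rouge:green=vert"
)

# Group 1 is the matched color; the lookahead finds ":<color>=" further
# down and captures the translation into Group 2.
pattern = r"(?s)\b(blue|red)\b(?=.*:\1=(\w+)\b)"

replaced = re.sub(pattern, r"\2", buffer)
print(replaced)
```

Note that the keys inside the dictionary line are left alone: when the engine reaches the blue in :blue=bleu, the lookahead starts after that key and finds no further :blue= entry, so the match fails there.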


Variation when matches are dense (full translation)
In the previous pattern, we specifically look for the literals blue and red because we do not want to give the engine the burden of looking up every word in the dictionary. However, when nearly every word in your file is a match to be translated, including every word in the regex becomes burdensome. Instead, we can simplify the regex by just matching any word:

(?s)\b(\w+)\b(?=.*:\1=(\w+)\b)

The replacement is still \2.
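Here is a small Python sketch of the match-any-word variant. The sample sentence is made up for illustration; words with no dictionary entry (a, door, nose, hat) fail the lookahead and are left as they are.

```python
import re

# Simulated editor buffer with a made-up sentence and the dictionary line.
buffer = (
    "a green door, a blue nose, a red hat.\n"
    "Dictionary:blue=bleu:red=rouge:green=vert"
)

# Every word is a candidate; only words with a ":word=" entry further
# down the buffer satisfy the lookahead and get replaced.
pattern = r"(?s)\b(\w+)\b(?=.*:\1=(\w+)\b)"

replaced = re.sub(pattern, r"\2", buffer)
result = replaced.split("\n")[0]
print(result)  # a vert door, a bleu nose, a rouge hat.
```

The trade-off is that every word in the file now triggers a scan toward the dictionary, which is only worthwhile when most words are expected to match.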



Example with Ten Replacements: Translating Japanese Digits

Several years after I wrote up this trick, this very question came up on the RegexBuddy forum. In Japanese, digits can be written either with the native kanji characters (imported from Chinese) or with Arabic numerals. Unicode has dedicated code points for the Arabic numerals used in Japanese text (the fullwidth digits), separate from their ASCII code points, and the goal of the question was to translate these Japanese code points to their ASCII counterparts.

This seems like a perfect chance to showcase the technique with more than the two replacements of our simple example.

All four versions are shown below. This is the exact same technique as above, so no explanation is needed.

Here is the text to be transformed, and the desired output. You might not see the difference until you stare at the shape of the digits.

==== Original ====
０ zero １ ichi ２ ni ３ san ４ shi ５ go ６ roku ７ shichi ８ hachi ９ kyuu

==== Desired Output ====
0 zero 1 ichi 2 ni 3 san 4 shi 5 go 6 roku 7 shichi 8 hachi 9 kyuu

1. Pool trick
At the bottom of the text, we paste this pool: 0123456789

Our first regex is:

(?sx)
  ０(?=.*(0))
| １(?=.*(1))
| ２(?=.*(2))
| ３(?=.*(3))
| ４(?=.*(4))
| ５(?=.*(5))
| ６(?=.*(6))
| ７(?=.*(7))
| ８(?=.*(8))
| ９(?=.*(9))

Our replacement is \1\2\3\4\5\6\7\8\9${10}
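As a hedged sketch, the ten-group pool version can be reproduced in Python. Two details are specific to this sketch rather than to the editor workflow: the fullwidth digits are built from their code points (U+FF10 to U+FF19) so that the source stays unambiguous, and Group 10 is written \g<10> because Python's re module has no ${10} syntax.

```python
import re

# Fullwidth digits occupy U+FF10..U+FF19.
fullwidth = [chr(0xFF10 + d) for d in range(10)]
words = ["zero", "ichi", "ni", "san", "shi",
         "go", "roku", "shichi", "hachi", "kyuu"]
text = " ".join(f"{fw} {word}" for fw, word in zip(fullwidth, words))

pool = "0123456789"
buffer = text + "\n" + pool  # pool pasted at the bottom

# One alternative per digit: match the fullwidth form, capture the ASCII form.
pattern = "(?s)" + "|".join(
    f"{fw}(?=.*({d}))" for d, fw in enumerate(fullwidth)
)

# Glue all ten groups together; nine of them are empty for any given match.
replacement = "".join(rf"\g<{n}>" for n in range(1, 11))

replaced = re.sub(pattern, replacement, buffer)
result = replaced.rsplit("\n", 1)[0]  # drop the temporary pool line
print(result)
```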


2. Pool trick, branch reset version
With a branch reset, our replacement is just \1

The regex becomes:

(?sx)
(?|
   ０(?=.*(0))
 | １(?=.*(1))
 | ２(?=.*(2))
 | ３(?=.*(3))
 | ４(?=.*(4))
 | ５(?=.*(5))
 | ６(?=.*(6))
 | ７(?=.*(7))
 | ８(?=.*(8))
 | ９(?=.*(9))
)




3. Dictionary trick, specific matches
For the dictionary trick, we'll paste this at the bottom of our text:
Dictionary:０=0:１=1:２=2:３=3:４=4:５=5:６=6:７=7:８=8:９=9

We can use this regex:
(?sx)(０|１|２|３|４|５|６|７|８|９) (?=.*:\1=(\w+)\b)


For the replacement, we use \2.


4. Dictionary trick, variable matches
Here we simplify the regex to:

(?sx)(\b\w+\b) (?=.*:\1=(\w+)\b)


Note that the \w+ caters for cases where the dictionary contains more than digits. If we know the dictionary only contains digits, we can use:

(?sx)(\p{Nd}) (?=.*:\1=(\w+)\b)


The replacement is still \2.
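A rough Python equivalent of the digits-only dictionary version is sketched below. Python's re module does not support \p{Nd}, so this sketch substitutes \d, which on str patterns matches any Unicode decimal digit, fullwidth digits included. The short sample text is made up for illustration.

```python
import re

# Fullwidth digits occupy U+FF10..U+FF19.
fullwidth = [chr(0xFF10 + d) for d in range(10)]

# Made-up sample text using three of the fullwidth digits.
text = f"{fullwidth[4]} shi {fullwidth[2]} ni {fullwidth[9]} kyuu"

# Dictionary line: fullwidth key, ASCII value.
dictionary = "Dictionary" + "".join(
    f":{fw}={d}" for d, fw in enumerate(fullwidth)
)
buffer = text + "\n" + dictionary

# \d stands in for \p{Nd}; the lookahead finds ":<digit>=" and captures
# the ASCII value into Group 2.
pattern = r"(?s)(\d)(?=.*:\1=(\w+)\b)"

replaced = re.sub(pattern, r"\2", buffer)
result = replaced.split("\n")[0]
print(result)  # 4 shi 2 ni 9 kyuu
```

The ASCII digits on the dictionary line also match \d, but none of them has a :digit= entry as a key, so the lookahead fails and the dictionary line is left intact.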





