Difficulty: Beginner
Estimated Time: 10 minutes

Let's dive into regular expressions, a powerful tool, and use them for pattern matching.

Regular expressions


RegExp and grep

Let's prepare a file for searching. Create a file named fruits.txt and fill it with the following content (yes, there are typos, on purpose):


Remember that grep searches plain-text data, such as files, for lines that match a pattern. The name comes from the ed command g/re/p (globally search for a regular expression and print), which is exactly what grep does. In some cases we will use egrep, the extended version (equivalent to grep -E) that supports extended regular expressions; we define some of their operators here.
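A minimal sketch of the difference between the two flavors, assuming GNU grep (where egrep simply runs grep -E). Instead of a file, it pipes in a few hypothetical sample lines, so it does not depend on the contents of fruits.txt:

```shell
# Hypothetical sample lines, piped to grep instead of reading a file
sample=$(printf '%s\n' peac peach 'peach?')

# Basic regex (plain grep): ? is an ordinary character here
printf '%s\n' "$sample" | grep 'peach?'      # prints: peach?

# Extended regex (grep -E, same as egrep): ? makes the h optional
printf '%s\n' "$sample" | grep -E 'peach?'   # prints all three lines
```

Plain grep uses basic regular expressions, where ? has no special meaning, so only the line containing a literal question mark matches.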

Let's go over the basic building blocks of regular expressions:

  • . (dot) - matches any single character.

grep 'peach.' fruits.txt

How does that differ from

grep 'peach' fruits.txt

Can you spot the difference?
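To check your answer, here is a small sketch that pipes a few hypothetical lines (not the fruits.txt listing) into grep:

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' peach peaches 'peach pie')

# "peach" followed by one more character of any kind
printf '%s\n' "$sample" | grep 'peach.'   # prints: peaches, peach pie

# plain "peach" anywhere in the line
printf '%s\n' "$sample" | grep 'peach'    # prints all three lines
```

The dot demands one more character after peach, so a line that ends right at peach no longer matches; any character counts, including a space.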

  • ? - the preceding character matches 0 or 1 times only.

Try the same search with a question mark, using egrep: egrep 'peach?' fruits.txt

Now try egrep 'peach???' fruits.txt

This works too. Why? Think about it for a moment.
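A sketch with hypothetical lines, using grep -E (what egrep runs in GNU grep):

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' pear peac peach)

printf '%s\n' "$sample" | grep -E 'peach?'     # prints: peac, peach
printf '%s\n' "$sample" | grep -E 'peach???'   # prints the same two lines
```

GNU grep applies adjacent ? operators one after another, and an optional h made optional again still matches zero or one h. POSIX leaves repeated operators such as ??? undefined, so other grep implementations may behave differently.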

  • * - the preceding character matches 0 or more times.

Let's see the meaning of this in two examples:

grep p fruits.txt

grep 'p*' fruits.txt

Why are they different? Read the definition again and it should be obvious.
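A sketch with two hypothetical lines:

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' peach kiwi)

printf '%s\n' "$sample" | grep 'p'    # prints: peach
printf '%s\n' "$sample" | grep 'p*'   # prints: peach, kiwi
```

p* can match zero p's, that is, the empty string, and every line contains the empty string, so every line matches.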

  • + - the preceding character matches 1 or more times.

  • {n} - the preceding character matches exactly n times.

  • {n,m} - the preceding character matches at least n times and not more than m times.

  • [agd] - the character is one of those included within the square brackets.

  • [^agd] - the character is not one of those included within the square brackets.

  • [c-f] - the dash within the square brackets operates as a range. In this case, it matches any one of the letters c, d, e, or f.
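The repetition and bracket operators above can be tried on a few hypothetical spellings of apple, assuming GNU grep with -E (equivalent to egrep):

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' ale aple apple appple)

printf '%s\n' "$sample" | grep -E 'ap+le'      # one or more p: aple, apple, appple
printf '%s\n' "$sample" | grep -E 'ap{2}le'    # exactly two p: apple
printf '%s\n' "$sample" | grep -E 'ap{1,2}le'  # one or two p: aple, apple
printf '%s\n' "$sample" | grep -E '[^p]le'     # "le" not preceded by p: ale
```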

Let's see some examples:

Try egrep 'p{3}' fruits.txt to find lines that contain three consecutive p's. Since grep looks for the pattern anywhere in a line, a run of four or more p's matches too.
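A sketch of this "at least" effect with hypothetical lines, using grep -E (the same as egrep):

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' apple appple apppple)

printf '%s\n' "$sample" | grep -E 'p{3}'     # prints: appple, apppple
printf '%s\n' "$sample" | grep -E 'ap{3}l'   # prints: appple
```

Pinning the run between a and l, as in ap{3}l, is one way to demand exactly three p's.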

Or search for lines containing an a, b, or c: egrep '[a-c]' fruits.txt

Note that [c-f1-9] matches any one character from the ranges c to f and 1 to 9 (the union of the two ranges). For instance, [a-z0-9] matches any lowercase letter or any digit.
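A sketch with hypothetical lines, using grep -E (equivalent to egrep). It sets LC_ALL=C, an assumption added so that ranges follow plain ASCII order regardless of your locale:

```shell
# Use the C locale so [a-z] means exactly the 26 lowercase ASCII letters
export LC_ALL=C

# Hypothetical sample lines
sample=$(printf '%s\n' KIWI mango2 FIG7 BANANA)

printf '%s\n' "$sample" | grep -E '[a-c]'      # prints: mango2
printf '%s\n' "$sample" | grep -E '[a-z0-9]'   # prints: mango2, FIG7
```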

A sequence of bracket expressions matches a sequence of characters, one bracket per character, which is handy for common words written in mixed case: [Hh][Ee][Yy] matches hey, Hey, HEY, and so on. (Q: How does that differ from [HhEeYy]?)
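To check your answer to the question, here is a sketch with hypothetical lines, using grep -E (equivalent to egrep):

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' hey HEY hEy yes no)

printf '%s\n' "$sample" | grep -E '[Hh][Ee][Yy]'   # prints: hey, HEY, hEy
printf '%s\n' "$sample" | grep -E '[HhEeYy]'       # prints: hey, HEY, hEy, yes
```

Each bracket in [Hh][Ee][Yy] matches one position of a three-letter word, while [HhEeYy] is a single bracket that matches any one of those six letters, so even yes matches.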

Let's try it: egrep '[ao][pr]' fruits.txt

  • ^ - matches the beginning of the line (as the first character inside square brackets, it instead negates the set, as in [^agd] above).
  • $ - matches the end of the line.
  • \x - matches the character x literally; the backslash strips any special meaning x has (for example, \. matches a literal dot).
  • \\ - matches a backslash (the first backslash strips the special meaning of the second).

Try egrep 'es$' fruits.txt to find lines that end with "es".
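The anchors and the escaped dot can be compared on a few hypothetical lines, assuming grep -E (equivalent to egrep):

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' apples grapes 'apple.pie' 'apple pie')

printf '%s\n' "$sample" | grep -E 'es$'          # ends with "es": apples, grapes
printf '%s\n' "$sample" | grep -E '^ap'          # starts with "ap": apples, apple.pie, apple pie
printf '%s\n' "$sample" | grep -E 'apple.pie'    # dot = any char: apple.pie, apple pie
printf '%s\n' "$sample" | grep -E 'apple\.pie'   # escaped dot = literal dot: apple.pie
```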

Some more patterns (support for some of these class names varies between implementations):

[[:class:]] matches any one character in a named character class. POSIX defines alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit; ascii and word are extensions from Perl-style engines, and plain grep or egrep may reject them.

grep '[[:alnum:]]' fruits.txt

This matches every line that contains at least one alphanumeric character (here, every line does). Now let's search for digits:

grep '[[:digit:]]' fruits.txt
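A sketch with hypothetical lines showing a few POSIX classes with plain grep:

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' 'apple 3' kiwi '!!!')

printf '%s\n' "$sample" | grep '[[:alnum:]]'   # prints: apple 3, kiwi
printf '%s\n' "$sample" | grep '[[:digit:]]'   # prints: apple 3
printf '%s\n' "$sample" | grep '[[:punct:]]'   # prints: !!!
```

Note the double brackets: [:digit:] is only valid inside a bracket expression.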
