Regex Basics in Ruby

Dave Wisecarver
4 min read · Dec 22, 2020

While learning Ruby, I occasionally run into problems where I need conditional logic to validate user inputs by making sure they meet certain requirements. I have created input fields for things like dates, as well as inputs for Command Line Interface programs where a user could enter a number to select from a list of options displayed in the terminal. As I went about testing the program, I figured there should be a way to ensure that the only acceptable input would be a number.

My first attempt was to just check whether the input was an integer with is_a?(Integer):

input = STDIN.gets.chomp
if input.is_a?(Integer)
  puts "Valid"
  puts input
else
  puts "Invalid entry"
end
>> 4
#=> "4"
#=> Invalid entry

This did not work, since gets.chomp always returns a string. So my next strategy was to convert the input to an integer and see what happened.

input = STDIN.gets.chomp.to_i
if input.is_a?(Integer)
  puts "Valid"
  puts input
else
  puts "Invalid entry"
end
>> 4
#=> Valid
>> Four
#=> 0
#=> Valid

It worked! Except now, if the user inputs something that is not a number, it still comes back as valid. This happens because calling to_i on a string with no leading digits returns 0, which is still an Integer and so still passes this test. “0” might be an input I want available to the user, so I do not necessarily want to say “valid if greater than 0.” Rather than converting the user input, I wanted a way to check the input string itself to confirm that it was a number and what that number was.
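To make that behavior concrete, here is a small sketch of what to_i does with a few sample strings. The helper name passes_to_i_check? is made up for illustration; it just mirrors the convert-then-check approach from my second attempt.

```ruby
# Hypothetical check mirroring the second attempt: convert, then test the class.
def passes_to_i_check?(text)
  text.to_i.is_a?(Integer)
end

# to_i reads any leading digits and falls back to 0, so it never fails.
["4", "Four", "4abc", ""].each do |text|
  puts "#{text.inspect}.to_i => #{text.to_i} (passes? #{passes_to_i_check?(text)})"
end
```

Every sample passes, including "Four" and the empty string, which is why the check has to happen on the string before any conversion: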

input = STDIN.gets.chomp
if input.match?(/\A\d+\z/)
  puts "Valid"
  puts input
else
  puts "Invalid entry"
end

This crazy-looking bit of code is called a regex. Short for regular expression, the idea was originated by mathematician Stephen Cole Kleene in the 1950s as a way to describe a pattern using a specific sequence of characters. Over the decades it has been built into many Unix tools and most programming languages. It ends up being a very powerful way to find patterns in strings, which can be used to compare or extract information in that string.

So here’s a breakdown of what is happening in the example above:

ONLY_POSITIVE_DIGITS = /\A\d+\z/

/    # a regex literal begins and ends with a forward slash
\A   # backslash and capital A anchors the match to the start of the string
\d+  # backslash, lowercase d, and a plus matches one or more digits
\z   # backslash and lowercase z anchors the match to the end of the string
/

This regular expression can then be checked in Ruby with the match? method, which returns true if the string satisfies the whole pattern and false otherwise. Ruby has several other regex-specific methods, available through the Regexp class, that can be taken advantage of as well.
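Here is a minimal sketch of how that check might be wrapped up for a menu prompt. The helper name menu_choice is hypothetical, and the samples stand in for what gets.chomp would return.

```ruby
ONLY_POSITIVE_DIGITS = /\A\d+\z/

# Hypothetical helper: returns the chosen number, or nil for invalid input.
def menu_choice(input)
  return nil unless input.match?(ONLY_POSITIVE_DIGITS)

  input.to_i # safe here, because the string is known to be all digits
end

["4", "0", "Four", "4abc", ""].each do |entry|
  puts "#{entry.inspect} -> #{menu_choice(entry).inspect}"
end

# Two related calls on the same pattern:
position = "42" =~ ONLY_POSITIVE_DIGITS      # index where the match starts, or nil
details  = ONLY_POSITIVE_DIGITS.match("42")  # a MatchData object, or nil
puts position.inspect
puts details.class
```

Returning nil for bad input keeps "0" available as a real choice, since nil and 0 are easy to tell apart.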

What if I wanted to validate a user input for a telephone number? I would want to make sure that only digits were entered into the field, and that the correct number of digits were entered. I would also want to account for white space as well as parentheses and hyphens, so that the user can enter their phone number in several common ways:

(555) 555-5555

(555)555-5555

5555555555

Here is one regular expression to validate phone number entries:

VALID_PHONE_NUMBER = /\A\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\z/

/
\A       # backslash and capital A denotes the start of the string
\(?      # the question mark checks for zero or one opening parenthesis
\d{3}    # checks for three digits
\)?      # checks for zero or one closing parenthesis
[\s.-]?  # checks for zero or one of white space, a period, or a hyphen
\d{3}    # checks for three digits again
[\s.-]?  # checks for zero or one of white space, a period, or a hyphen
\d{4}    # checks for four digits
\z       # ends the string
/
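As a rough sketch of how this pattern could be used, the snippet below validates a few entries and then strips them down to bare digits. The helper name normalize_phone and the sample entries are my own, not part of any library.

```ruby
VALID_PHONE_NUMBER = /\A\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\z/

# Hypothetical helper: returns the ten digits, or nil if the format is off.
def normalize_phone(input)
  return nil unless input.match?(VALID_PHONE_NUMBER)

  input.gsub(/\D/, "") # \D is any non-digit, so this strips the punctuation
end

["(555) 555-5555", "(555)555-5555", "5555555555", "555-5555", "call me"].each do |entry|
  puts "#{entry.inspect} -> #{normalize_phone(entry).inspect}"
end
```

One thing to be aware of: because each parenthesis is optional on its own, an unbalanced entry like "(555 555 5555" also passes. That may be fine for a forgiving input field, but a stricter pattern would be needed to reject it.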

As you can see, these regular expressions can get very long and very difficult to read. Fortunately there are good resources available to help construct the regex you might be looking for. Rubular is a great editor that allows you to test out expressions and see if you get the conditions you want.

Here are some of the most common symbols for creating your own regex:

  • ^: marks the start of a line.
  • $: marks the end of a line.
  • [xyz]: checks if a single character matches x, y, or z.
  • [a-z]: checks for any lowercase letter from a to z.
  • \w: checks for any alphanumeric character or underscore.
  • \W: checks for any character that is not alphanumeric or an underscore.
  • (a|b): checks for either a or b.
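A few of these symbols in action, as a quick sketch. The constant names and sample strings are made up for illustration. The first pair of checks shows why the earlier examples used \A and \z rather than ^ and $: in Ruby, ^ and $ work per line, so a multi-line string can slip through.

```ruby
LINE_ANCHORED   = /^\d+$/    # ^ and $ match at the start and end of any line
STRING_ANCHORED = /\A\d+\z/  # \A and \z match only at the ends of the whole string

multi_line = "42\nnot a number"

puts multi_line.match?(LINE_ANCHORED)    # true: the first line is all digits
puts multi_line.match?(STRING_ANCHORED)  # false: the whole string is not

puts "y".match?(/\A[xyz]\z/)             # true: y is one of x, y, z
puts "Q".match?(/\A[a-z]\z/)             # false: uppercase is outside a-z
puts "user_1".match?(/\A\w+\z/)          # true: letters, digits, underscore
puts "user-1".match?(/\W/)               # true: the hyphen is a non-word character
puts "cat".match?(/\A(cat|dog)\z/)       # true: matches the first alternative
```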

Here are some other common regular expressions that can be used for validation checks:

VALID_IP_ADDRESS = /^((?:(?:^|\.)(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){4})$/
VALID_EMAIL = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
NO_NUMBERS_OR_SYMBOLS = /^[[:alpha:][:blank:]]+$/
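One way these could be put to work is a small lookup table, so each input field can name the rule it needs. This is only a sketch: the VALIDATORS hash and the valid_input? helper are hypothetical names, and the sample values are invented.

```ruby
VALID_IP_ADDRESS = /^((?:(?:^|\.)(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){4})$/
VALID_EMAIL = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i
NO_NUMBERS_OR_SYMBOLS = /^[[:alpha:][:blank:]]+$/

# Hypothetical lookup table mapping a field type to its pattern.
VALIDATORS = {
  ip: VALID_IP_ADDRESS,
  email: VALID_EMAIL,
  name: NO_NUMBERS_OR_SYMBOLS
}.freeze

# Hypothetical helper: fetch raises KeyError for an unknown field type.
def valid_input?(kind, input)
  VALIDATORS.fetch(kind).match?(input)
end

puts valid_input?(:ip, "192.168.1.1")         # true
puts valid_input?(:ip, "256.1.1.1")           # false: 256 is out of range
puts valid_input?(:email, "user@example.com") # true
puts valid_input?(:email, "user@example")     # false: no top-level domain
puts valid_input?(:name, "Dave Wisecarver")   # true
puts valid_input?(:name, "Dave2")             # false: contains a digit
```

Note that the IP and name patterns use ^ and $, which match per line in Ruby. For single-line input from gets.chomp that is fine, but swapping in \A and \z would be the stricter choice.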

There are also some options you can set at the end of the expression to modify how it matches. By adding these option flags after the closing slash, you can ignore case, or allow white space inside the pattern itself so it is easier to read.

# Option flags
/pattern/i  # case insensitivity
/pattern/x  # ignore white space inside the pattern
/pattern/m  # make a dot match a new line
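Here is a short sketch of each flag. The constant name PHONE_EXTENDED is my own; it is the phone number pattern from earlier, rewritten with the x flag so that the comments can live inside the regex.

```ruby
# i: ignore case
puts "RUBY".match?(/ruby/)    # false
puts "RUBY".match?(/ruby/i)   # true

# m: let the dot match a newline as well
puts "a\nb".match?(/a.b/)     # false
puts "a\nb".match?(/a.b/m)    # true

# x: white space and comments inside the pattern are ignored
PHONE_EXTENDED = /
  \A
  \(?\d{3}\)?   # area code, with optional parentheses
  [\s.-]?       # optional separator
  \d{3}         # next three digits
  [\s.-]?       # optional separator
  \d{4}         # last four digits
  \z
/x

puts "(555) 555-5555".match?(PHONE_EXTENDED)  # true
puts "555-5555".match?(PHONE_EXTENDED)        # false
```

With x set, a literal space in the input has to be written as \s or an escaped space, since plain spaces in the pattern are skipped.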

In addition to checking to see if a string matches certain conditions, Ruby has other regex methods that can be used to scan for specific information out of a string, or split a string based on a regex condition. For example, if you had a string and you only wanted to return the numbers from it, you could use the scan method like so:

def chapter_number(title)
  title.scan(/\d+/)
end
>> chapter_number("Chapter 5: Rad chapter")
#=> ["5"]

Not every regex method returns the same kind of value. The scan method returns an array of the matching strings and match? returns true or false, while the match method returns an object of the class MatchData. A MatchData object encapsulates the first match found along with any capture groups in the pattern. This object can then be converted into an array and iterated through for the results.
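To compare the return values side by side, here is a small sketch using scan, split, and match on one title. The CHAPTER_PATTERN constant, the chapter_info helper, and the sample title are all hypothetical.

```ruby
title = "Chapter 5: Rad chapter, pages 40-58"

# scan returns an Array holding every match as a String.
p title.scan(/\d+/)        # ["5", "40", "58"]

# split cuts the string wherever the pattern matches.
p title.split(/[:,]\s*/)   # ["Chapter 5", "Rad chapter", "pages 40-58"]

# Named groups (?<name>...) can be read back from the MatchData by name.
CHAPTER_PATTERN = /\AChapter (?<number>\d+): (?<name>[^,]+)/

# Hypothetical helper: pulls the number and name out of a title, or returns nil.
def chapter_info(title)
  match = title.match(CHAPTER_PATTERN) # MatchData, or nil if nothing matches
  return nil unless match

  { number: match[:number].to_i, name: match[:name] }
end

p chapter_info(title)
p title.match(CHAPTER_PATTERN).to_a  # full match first, then each capture group
```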

Although they can be a bit cryptic and difficult to read, regular expressions can provide a powerful tool for pattern recognition, especially in cases where there are many conditions you would like to validate within a string.

Dave Wisecarver

Software dev student and free-time game dev. Currently enrolled in Flatiron Software Engineering Program.