Custom Validations in Rails

Rails validations are nifty, but they can get pretty bulky and difficult to read. In this article we’ll look at how to counteract the bulk with custom validations that enable the use of class methods with clearer names like validates_as_person_name.

Why Use Custom Validations?

With many models, it seems like the majority of validation statements end up using validates_format_of so we can explicitly whitelist characters and patterns. We end up using regular expressions, which are the Swiss Army knife of validation, but unless you’re a serious gearhead with the capacity to instantly recognize the net result of a lengthy regex, they can really slow down your ability to read the code and know exactly what’s going on. Along with that, we end up using custom messages to explain to the user what’s legal for input, which might help us avoid deciphering the regex, but it makes the whole validation declaration much bigger.

I wanted a way to reduce the bulk of these validations, making them easier to write, and more importantly, easier to read. I ended up doing this in stages, and ultimately learned how to write custom validations in a Railsy syntax.

To give you an idea of the “bulk” issue, let’s look at a case that got me started on this process.

This is the standard out-of-the-box code I needed:

validates_format_of     :userFltrName,	
                        :with => /^[a-zA-Z0-9\_\-\ ]*?$/,
                        :message => 'accepts only letters, 0-9, underscore, hyphen, and space'

validates_format_of     :userFltrTable, 
                        :with => /^[a-zA-Z0-9\_]*?$/,
                        :message => 'accepts only letters, 0-9, and underscore'

validates_format_of     :userFltrField,
                        :with => /^[a-zA-Z0-9\_]*?$/,
                        :message => 'accepts only letters, 0-9, and underscore'

validates_inclusion_of  :userFltrOp,
                        :in => %w{eq bw ew cn lte lt gte gt btw},
                        :message => 'has a value other than the valid options below'

To me, that’s pretty bulky. You might have noticed right away that there’s some repetition in the regex and the messages. That’s to be expected, so the first thing that could be done is to create some variables or constants for these repeated items. If we do that, we could end up with something like this:

validates_format_of     :userFltrName,	
                        :with => is_alpha_numeric_separator,
                        :message => is_alpha_numeric_separator_msg

validates_format_of     :userFltrTable, 
                        :with => is_alpha_numeric_underscore, 
                        :message => is_alpha_numeric_underscore_msg

validates_format_of     :userFltrField, 
                        :with => is_alpha_numeric_underscore, 	
                        :message => is_alpha_numeric_underscore_msg

validates_inclusion_of  :userFltrOp, 	
                        :in => %w{eq bw ew cn lte lt gte gt btw}, 
                        :message => is_not_from_options_msg

Well, honestly, in total, that’s not a lot better, though it is arguably easier to glean the intent of the code. I suppose we could shorten those variable names (“is_alpha_num_us_msg”), but we have to admit they’re pretty clear as they are, and the agile crowd doesn’t much care for shortened words like that.

So, what’s next? We could probably predefine some hashes to combine the :with and the :message keys, but all this is just leading up to the fact that what we’d really like to write is something like this:

validates_as_alpha_numeric_separator    :userFltrName
validates_as_alpha_numeric_underscore   :userFltrTable, :userFltrField
validates_as_value_list                 :userFltrOp, :in => %w{eq bw ew cn lte lt gte gt btw}

Much better! And, we would probably even assign that value list to a variable, but I’ve left it raw as it’s not really a part of what will be reduced through the use of the custom validation.
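
For comparison, the intermediate step mentioned above, predefining hashes that bundle :with and :message, might look roughly like the following. The constant names are made up for illustration, and the sketch is plain Ruby so it can be tried without Rails:

```ruby
# Hypothetical constants (not from the original code): each hash bundles
# the regex and the message that always travel together.
ALPHA_NUMERIC_SEPARATOR = {
  :with    => /^[a-zA-Z0-9\_\-\ ]*?$/,
  :message => 'accepts only letters, 0-9, underscore, hyphen, and space'
}

ALPHA_NUMERIC_UNDERSCORE = {
  :with    => /^[a-zA-Z0-9\_]*?$/,
  :message => 'accepts only letters, 0-9, and underscore'
}

# Inside a model, each declaration would then shrink to one line, e.g.
#   validates_format_of :userFltrName,  ALPHA_NUMERIC_SEPARATOR
#   validates_format_of :userFltrTable, ALPHA_NUMERIC_UNDERSCORE

# Per-call options can still be layered on with Hash#merge, which returns
# a new hash and leaves the shared constant untouched.
ON_UPDATE_ONLY = ALPHA_NUMERIC_UNDERSCORE.merge(:on => :update)
```

That is shorter, but the declarations still read as "format of, with this bag of options" rather than saying what the field is supposed to be, which is why the custom method names are the nicer end point.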

Adding Class Methods to ActiveRecord

Let’s look at what it takes to create a single custom validation. Then we’ll look at expanding that to a series of ready-to-use validations available to the whole Rails application.

Rails validation methods are class methods of ActiveRecord::Base, the class our models inherit from. That is, ActiveRecord::Base has a module mixed in as class methods to give us the various validates_ capabilities. We want to add some new class methods to that list of capabilities. In OOP-speak, we have to extend the class. In Ruby this is a fairly easy thing to do.

The basic Ruby pattern for adding new methods to a class is simply to declare the class and a method. If that class already exists, Ruby takes the method in this new declaration and adds it to the existing class. Other languages would complain about this, but Ruby says, “hey, you’re the boss.” An example of this is a nifty one I picked up for generating random strings:

class Numeric 
  def random_id
    # don't focus on this next line
    (1..self).collect { (i = Kernel.rand(62); i += ((i < 10) ? 48 : ((i < 36) ? 55 : 61 ))).chr }.join
  end
end

Numeric is a class already defined by the Ruby language. You’ve probably seen introductory Ruby examples that do stuff like 3.times. Well, I use a lot of random strings in my apps, and by extending the class Numeric, the above code now allows me to write x = 12.random_id, which would make something like “nRa7Gighpw8D”. That simple little bit of code actually adds a feature to numbers in Ruby. Pretty cool.
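
The same reopening behavior applies to classes you write yourself. Here is a minimal sketch with a made-up Temperature class (nothing to do with Rails) showing that a second declaration adds to the first rather than replacing it:

```ruby
# First declaration: the class comes into existence with two methods.
class Temperature
  def initialize(celsius)
    @celsius = celsius
  end

  def celsius
    @celsius
  end
end

# Second declaration with the same name: Ruby reopens the class and adds
# the new method; nothing defined earlier is lost.
class Temperature
  def fahrenheit
    @celsius * 9.0 / 5 + 32
  end
end
```

After both declarations have run, Temperature.new(100) responds to celsius and to fahrenheit.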

We can do the same thing by extending ActiveRecord. The outline for doing this is a little different because we’re not extending ActiveRecord directly, but rather we’re extending a specific module that’s a couple layers deep in ActiveRecord. So the outline starts like this:

module ActiveRecord
    module Validations
        module ClassMethods

        # we'll add code starting here

        end
    end
end

So, what we’re actually extending is a module called ClassMethods, which is nested in a module named Validations, which is nested in a module called ActiveRecord. It would take us too far off track to fully explain that one, but those modules are defined in the Rails source code and hooked into our models using a standard Ruby technique called mixins. You should be able to find plenty of online explanations of this technique.
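
For the curious, here is a rough sketch of the mixin idiom in plain Ruby. MiniRecord and Person are made-up names standing in for ActiveRecord and a model; the included hook shown is a common Ruby idiom and only approximates what Rails does internally:

```ruby
# MiniRecord is a made-up stand-in for ActiveRecord, used only to show the
# nesting and the mixin step; it is not Rails code.
module MiniRecord
  module Validations
    # When a class includes Validations, also attach ClassMethods to the
    # class itself so its methods can be called in the class body.
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      def validates_presence_of(*attr_names)
        "presence check on #{attr_names.inspect}"
      end
    end
  end

  class Base
    include Validations
  end
end

# Reopening the innermost module later adds a new class method to every
# class that was already extended with it.
module MiniRecord
  module Validations
    module ClassMethods
      def validates_as_person_name(*attr_names)
        "person-name check on #{attr_names.inspect}"
      end
    end
  end
end

class Person < MiniRecord::Base
end
```

Person never mentions ClassMethods, yet Person.validates_as_person_name(:first_name) works, because the method was added to a module that Base had already been extended with.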

Dissecting the Existing Validator

Before we can write any code for our custom validator, we need to understand how the existing validators work. When a validation is processed, Rails not only makes sure the data matches our requirements, but it manipulates some internal storage of things like error messages. Each validator also has several options. We need to be able to do all these same things somehow.

The source file /rails/activerecord/lib/active_record/validations.rb gives us the information we need. After I studied this file and figured out how the overall process was working, it became clear that we don’t want to replicate everything Rails is doing. That would be a short-sighted move. There’s enough going on that if Rails were to change how it works, or add a feature, we’d be setting ourselves up for a problem in having broken code or missing out on a new feature.

So, the approach we’re going to take is to translate our custom validator syntax into the standard syntax and call the existing validators. In pattern-speak, we’re creating an interpreter. We’re going to write application code in our custom syntax, but ultimately that syntax gets interpreted into standard Rails syntax for execution.

The Rails code boils down to accepting a list of model attributes and a hash of options, and passing them on to another processor. We’ll do the same thing. Our syntax will allow the same options by simply accepting them and passing them on. This minimizes our work, and more importantly keeps our custom validator capable of using any new features Rails adds.

The Meat of the Custom Validation

The first step is to take our idea of using variables to define the regex and message. These will have to be class variables so that they’re available to the class methods. You might think they could be constants, but we’re going to do some work later on that will depend on them being variables. With one definition, our code now looks like this:

# /lib/validators.rb

module ActiveRecord
    module Validations
        module ClassMethods
#----------------------------------------------------------

@@is_alpha_numeric_underscore_msg = 'accepts only a-z, A-Z, 0-9, and underscores'
@@is_alpha_numeric_underscore     = /^[a-zA-Z0-9\_]*?$/

#----------------------------------------------------------
        end
    end
end

When you look at the internals of the Rails validators, they’re written to separate the list of model attribute names from the validation options. The attribute names become a simple array named attr_names, and the options a simple hash named configuration. It’s critical to the Rails validators that all the options be listed after the attribute names. Our custom validator will follow this pattern as it will be more familiar to those who know the Rails internals well, and it will also make it easier for our interpreter to build parameters to use with the built-in validation methods.

Our completed validator code looks like this:

 1   # /lib/validators.rb
 2   
 3   module ActiveRecord
 4       module Validations
 5           module ClassMethods
 6   #----------------------------------------------------------
 7 
 8   @@is_alpha_numeric_underscore_msg = 'accepts only a-z, A-Z, 0-9, and underscores'
 9   @@is_alpha_numeric_underscore     = /^[a-zA-Z0-9\_]*?$/
10    
11   #----------------------------------------------------------
12 
13   def validates_as_alpha_numeric_underscore(*attr_names)
14   
15       configuration = {
16           :message   => @@is_alpha_numeric_underscore_msg,
17           :with      => @@is_alpha_numeric_underscore }
18   
19       configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
20   
21       validates_format_of attr_names, configuration
22   
23   end
24   
25   #----------------------------------------------------------
26           end
27       end
28   end

In Line 13, the * is significant. This is Ruby’s way of taking all method parameters and bringing them into the method internally as one array object. The name of the object is attr_names (not *attr_names). If we used our custom method like this:

validates_as_alpha_numeric_underscore :field_name, :on => :update

Then, internally attr_names would be an array with two elements like this:

[:field_name, {:on => :update}]

Lines 15-17 define the options hash configuration with the options that will be the standard functionality of our validates_as_alpha_numeric_underscore validator. We’ve used the variables to make the code here more readable (and for other reasons we’ll get to).

In line 19, we remove the last element from the attr_names array and move it to the configuration hash, but only if the last element of attr_names is in fact a hash itself.

This technique of using * and then the attr_names.pop step allows us to make the method parameters very flexible. We don’t have to define a pattern of inputs, we just bring them all into an array.
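
The splat-and-pop handling can be tried on its own, outside of Rails. In this small sketch, split_validator_args and its default message are hypothetical names invented for the illustration:

```ruby
# Mirrors the parameter handling described above, outside of Rails.
# split_validator_args is a hypothetical helper name used for illustration.
def split_validator_args(*attr_names)
  configuration = { :message => 'default message' }

  # A trailing hash holds the caller's options; fold it into the defaults.
  # Keys the caller supplies win over the defaults.
  configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)

  [attr_names, configuration]
end

# Any number of attribute names may come first, with options last:
#   split_validator_args(:first_name, :last_name, :on => :update)
# gives back the two attribute names in one array and a configuration hash
# holding both :message and :on.
```

Because update overwrites matching keys, a caller who passes their own :message replaces the default one, which is the same way the built-in options can be overridden in the custom validator.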

In line 21, we call the built-in validates_format_of class method. The parameters will consist of an array of attribute names, and a hash of options.

Let’s see what happens using the example above:

validates_as_alpha_numeric_underscore :field_name, :on => :update

We know that inside our custom method, those two parameters will get pulled in as a two-element array. Lines 15-17 will start a new local variable for configuration. In line 19, .pop will remove the {:on => :update} hash, and .update will add each key-value pair to the configuration hash. That leaves us with variables like this:

attr_names = [:field_name]
configuration = {
   :message => @@is_alpha_numeric_underscore_msg,
   :with => @@is_alpha_numeric_underscore,
   :on => :update }

When line 21 calls the built-in validator, the variables represent a call which looks exactly like what we would write manually:

validates_format_of :field_name,
   :message => @@is_alpha_numeric_underscore_msg,
   :with => @@is_alpha_numeric_underscore,
   :on => :update
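
If you want to watch that translation happen without loading Rails, a stand-in class can record what the custom validator hands off. In this sketch, CustomValidators and FakeModel are made-up names, and the fake validates_format_of only records its arguments instead of validating anything:

```ruby
# The validator from the article, placed in a plain module for the demo.
module CustomValidators
  @@is_alpha_numeric_underscore_msg = 'accepts only a-z, A-Z, 0-9, and underscores'
  @@is_alpha_numeric_underscore     = /^[a-zA-Z0-9\_]*?$/

  def validates_as_alpha_numeric_underscore(*attr_names)
    configuration = {
      :message => @@is_alpha_numeric_underscore_msg,
      :with    => @@is_alpha_numeric_underscore }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_format_of attr_names, configuration
  end
end

# FakeModel stands in for an ActiveRecord model. Its validates_format_of
# is NOT the Rails method; it just remembers what it was called with.
class FakeModel
  extend CustomValidators

  def self.recorded_calls
    @recorded_calls ||= []
  end

  def self.validates_format_of(attr_names, configuration)
    recorded_calls << [attr_names, configuration]
  end
end

FakeModel.validates_as_alpha_numeric_underscore :field_name, :on => :update
```

After that last line runs, FakeModel.recorded_calls holds one entry: the array [:field_name] plus a configuration hash containing :message, :with, and :on.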

Adding Multiple Validators

Let’s add a couple more useful validators:

# /lib/validators.rb

module ActiveRecord
    module Validations
        module ClassMethods
#----------------------------------------------------------

@@is_alpha_numeric_underscore_msg = 'accepts only a-z, A-Z, 0-9, and underscores'
@@is_alpha_numeric_underscore     = /^[a-zA-Z0-9\_]*?$/

@@is_person_name_msg              = 'accepts only a-z, A-Z, hyphens, spaces, apostrophes, and periods'
@@is_person_name                  = /^[a-zA-Z\.\'\-\ ]*?$/

@@is_email_address_msg            = 'must contain an @ symbol, at least one period after the @, and one letter in each segment'
@@is_email_address                = /^[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}$/i

#----------------------------------------------------------

def validates_as_alpha_numeric_underscore(*attr_names)
    configuration = {
        :message   => @@is_alpha_numeric_underscore_msg,
        :with      => @@is_alpha_numeric_underscore }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_format_of attr_names, configuration
end

def validates_as_person_name(*attr_names)
    configuration = {
        :message   => @@is_person_name_msg,
        :with      => @@is_person_name }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_format_of attr_names, configuration
end

def validates_as_email_address(*attr_names)
    configuration = {
        :message   => @@is_email_address_msg,
        :with      => @@is_email_address }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_format_of attr_names, configuration
end

#----------------------------------------------------------
        end
    end
end

Hmm, the last two lines of each method are the same. Now, two lines isn’t much duplication, but it’s enough to irritate the DRY-obsessed, pragmatic agilists out there, so what would it look like to refactor that into a common method?

# /lib/validators.rb

module ActiveRecord
    module Validations
        module ClassMethods
#----------------------------------------------------------

@@is_alpha_numeric_underscore_msg = 'accepts only a-z, A-Z, 0-9, and underscores'
@@is_alpha_numeric_underscore     = /^[a-zA-Z0-9\_]*?$/

@@is_person_name_msg              = 'accepts only a-z, A-Z, hyphens, spaces, apostrophes, and periods'
@@is_person_name                  = /^[a-zA-Z\.\'\-\ ]*?$/

@@is_email_address_msg            = 'must contain an @ symbol, at least one period after the @, and one letter in each segment'
@@is_email_address                = /^[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}$/i

#----------------------------------------------------------

def do_as_format_of(attr_names, configuration)
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_format_of attr_names, configuration
end

def validates_as_alpha_numeric_underscore(*attr_names)
    configuration = {
        :message   => @@is_alpha_numeric_underscore_msg,
        :with      => @@is_alpha_numeric_underscore }
    do_as_format_of(attr_names, configuration)
end

def validates_as_person_name(*attr_names)
    configuration = {
        :message   => @@is_person_name_msg,
        :with      => @@is_person_name }
    do_as_format_of(attr_names, configuration)
end

def validates_as_email_address(*attr_names)
    configuration = {
        :message   => @@is_email_address_msg,
        :with      => @@is_email_address }
    do_as_format_of(attr_names, configuration)
end

#----------------------------------------------------------
        end
    end
end

So, we saved one line per method. Nothing to get too excited about, but...it does offer us a nice platform for adding other features that would end up being common to multiple validators. Turns out I have just such a feature I wanted to add to my custom validations, so the refactoring was useful in the long run for me (which we explore in a section below).

Technically, I have now covered all the details of creating custom validations (except for installation), but I have two more topics to add. One is (should be) universal to all applications, which is allowing UTF-8 characters in form inputs, and the other is how I customized error messages to allow for dynamic text in them.

Allowing UTF-8 Characters

So far we have done the American-centric version of allowing letters a-z and A-Z in the sample validations. Well, that’s just plain lazy...and aggravating to folks that need to use accented characters for names, addresses, or whatever. We need to allow these characters. I managed to figure it out to a point, but admittedly not for all Unicode characters.

If we revisit the validation regex for is_alpha_numeric_underscore, we can update it to the following:

/^[a-zA-Z0-9#{"\303\200"}-#{"\303\226"}#{"\303\231"}-#{"\303\266"}#{"\303\271"}-#{"\303\277"}\_]*?$/u

The ranges allow the common Western accented characters like é, ä, and ö. Notice that the regex uses the /u flag at the end; don’t forget it. Each range endpoint is declared as the UTF-8 bytes of a Unicode character in octal notation, and #{} has Ruby interpolate that string into the regular expression.

It took me a couple days of blog reading, experimenting, and posting to the talk lists to get this one figured out. If you’re hunting for more info on UTF-8 in Ruby, try this thread from the Ruby talk list.

Additionally, in /config/environment.rb you need to add the following to the top of that file:

$KCODE = 'UTF8'

With $KCODE, the use of /u in regex, and the octal notation, you can write validations that allow at least some of the more common Unicode characters.
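
To make the octal notation less mysterious, the sketch below decodes the six range endpoints into the characters they stand for and then writes the same character class with \u escapes. It assumes Ruby 1.9 or later, where strings carry their own encoding, and does not rely on $KCODE; utf8_char and the two constants are names invented for the illustration:

```ruby
# Turn a pair of octal byte values into the UTF-8 character they encode.
def utf8_char(*bytes)
  bytes.pack('C*').force_encoding('UTF-8')
end

# The six range endpoints used in the regex above, as characters.
RANGE_ENDPOINTS = [
  utf8_char(0303, 0200), utf8_char(0303, 0226),   # U+00C0 .. U+00D6
  utf8_char(0303, 0231), utf8_char(0303, 0266),   # U+00D9 .. U+00F6
  utf8_char(0303, 0271), utf8_char(0303, 0277)    # U+00F9 .. U+00FF
]

# The same character class written with \u escapes instead of
# interpolated octal strings.
ALPHA_NUMERIC_UNDERSCORE_UTF8 =
  /^[a-zA-Z0-9\u00C0-\u00D6\u00D9-\u00F6\u00F9-\u00FF\_]*?$/
```

The three ranges skip U+00D7, U+00D8, U+00F7, and U+00F8, so the multiplication and division signs stay out, and so do Ø and ø. Letters beyond U+00FF, such as the Polish ł, are not covered either, which is the "to a point" limitation mentioned earlier.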

Making Error Messages Dynamic

The basic Rails validation error message doesn’t account for cases in forms where I have several fields under one label. The classic case is first name and last name of a person. If there’s an error in one of the fields, the “This field...” prefix doesn’t work well. Which field?

So, I like to have the option of including a label in the error message that identifies the field. I pass this as an option to the validator:

validates_as_alpha_numeric_underscore :first_name, :label => 'First Name'

Now, for any given form input, I might use a label, and I might not. So, that means I need a way to have the message adapt to that dynamic. Since this is something I need available to any validation, it fit nicely inside that do_as_format_of method we extracted.

The task is to determine if a :label option was passed, and if so, change the wording of the message to include it. That makes the do_as_format_of code look like this:

 1   def do_as_format_of(attr_names, configuration)
 2       configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
 3       if configuration.has_key?(:label)
 4           msg_string = "The field <span class=\"inputErrorFieldName\">#{configuration[:label]}</span> #{configuration[:message]}"
 5       else
 6           msg_string = "This field #{configuration[:message]}"
 7       end
 8       configuration.store(:message, msg_string)
 9       configuration.delete(:label)
10       validates_format_of attr_names, configuration
11   end

Lines 3-7 word the message based on the availability of the :label option. Line 8 overwrites the existing :message key in the configuration hash, and since Rails has no use for the :label option, line 9 removes it just to clean up our litter.

When a :label is passed, the error message can start something like “The field First Name...” to be more specific.
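
To see the two wordings side by side without Rails, the message-building step can be pulled into a small function. labelled_message is a hypothetical helper written for this illustration; it is not part of the validator file:

```ruby
# Hypothetical helper that isolates the message-building step of
# do_as_format_of so it can be tried without Rails.
def labelled_message(configuration)
  if configuration.has_key?(:label)
    "The field <span class=\"inputErrorFieldName\">#{configuration[:label]}</span> #{configuration[:message]}"
  else
    "This field #{configuration[:message]}"
  end
end

# With a label:
#   labelled_message(:label => 'First Name', :message => 'accepts only a-z')
#   returns: The field <span class="inputErrorFieldName">First Name</span> accepts only a-z
#
# Without one:
#   labelled_message(:message => 'accepts only a-z')
#   returns: This field accepts only a-z
```

The span wrapper is there so a stylesheet can make the field name stand out inside the error text.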

There are other opportunities for using this dynamic manipulation of error messages. If you wanted to be ultra user-friendly, you could identify exactly what’s wrong with an entry and generate a message like “The field Sales Price cannot include $ characters.” for when someone tries to enter a $ in a numeric field.

I’ll leave it to you to think of other possibilities.

My Library of Custom Validations

  • validates_as_person_name
  • validates_as_business_name
  • validates_as_street_address
  • validates_as_alpha
  • validates_as_alpha_space
  • validates_as_alpha_hyphen
  • validates_as_alpha_underscore
  • validates_as_alpha_symbol
  • validates_as_alpha_separator
  • validates_as_alpha_numeric
  • validates_as_alpha_numeric_space
  • validates_as_alpha_numeric_hyphen
  • validates_as_alpha_numeric_underscore
  • validates_as_alpha_numeric_symbol
  • validates_as_alpha_numeric_separator
  • validates_as_numeric
  • validates_as_decimal
  • validates_as_positive_decimal
  • validates_as_integer
  • validates_as_positive_integer
  • validates_as_email
  • validates_as_value_list

I suppose some people might think it is easier for them to just use validates_format_of, but when you consider the aspect of allowing UTF-8 characters, and that the above method names are fairly easy to remember, I think they make a lot of sense. They reduce the bulk, and IMO they’re clearer to read than the raw regex.
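
One validator in that list, validates_as_value_list, wraps validates_inclusion_of rather than validates_format_of, and its code isn’t shown in this article. The following is only a guess at one plausible shape, following the same translate-and-delegate pattern. ValueListValidator and FakeFilter are made-up names, and the fake validates_inclusion_of merely records its arguments so the sketch runs without Rails:

```ruby
module ValueListValidator
  @@is_not_from_options_msg = 'has a value other than the valid options below'

  # Assumed shape: apply the default message, let caller options (such as
  # :in) override or extend it, and hand everything to the built-in
  # validates_inclusion_of.
  def validates_as_value_list(*attr_names)
    configuration = { :message => @@is_not_from_options_msg }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)
    validates_inclusion_of attr_names, configuration
  end
end

# Stand-in for a model class; it records the call instead of validating.
class FakeFilter
  extend ValueListValidator

  def self.recorded_calls
    @recorded_calls ||= []
  end

  def self.validates_inclusion_of(attr_names, configuration)
    recorded_calls << [attr_names, configuration]
  end
end

FakeFilter.validates_as_value_list :userFltrOp, :in => %w{eq bw ew cn lte lt gte gt btw}
```

The only real difference from the format validators is which built-in method receives the translated call.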

Even if you don’t care for these validations, I hope I’ve helped you to create your own.

Installing into a Rails Application

Put the validator file into the /lib/ folder of your Rails application. Then add a line at the bottom of the /config/environment.rb file like this:

require 'validators'