NetBulge.com

Semi-automatic form validation

This article can be found online:
http://www.netbulge.com/index.php?session=0&action=read&click=open&article=1137750491


If you’ve got a website, you’ve probably got one or more forms on it. Visitors usually don’t want to take the time to fill out the full form, and we do want all the obligatory information, so we all have to use form validation. Personally, I like to do my validation server-side, because if a visitor really wants to annoy you he (or she, for PC’s sake) will disable client-side scripts. Yes, that’s right, they do it; I’ve seen them do it. If you relied on client-side scripts, all your hard work goes down the drain and you are left empty-handed. I’ve worked a lot with forms, and I’ve come up with my own little standard way of doing this. The example I’ll give is in PHP since there’s an area for it on this website.

So how are we going to do this then?

First, let’s create a form. For ease of use, I’ll use a simple version of the registration form on this site. If you’re anything like me (chances are you are, or you wouldn’t be interested in this), you have already control-clicked the link to the right of this text. I’ll just use the following fields: Login, Name, Surname, Email, Website, Password and ShownName.

So how do we want to validate these fields? There are a couple of things we can say or ask about each of them: what its default value is, whether it is required, and which characters and what length we allow.

Let’s get to coding!

Now that we know all that about each form field, let’s get typing. First things first: we check if there’s postdata. After that, I set a variable to true so that the form can tell later on that it has been posted. I’ve also created one big array to hold some properties for each field. As you can see, in this array there is a sub-array for each form field. The key of each sub-array is its field name. Let me explain by example what the elements in the sub-array are. There is a comment above the array naming everything, so I’ll refer to that.

The first element in each sub-array is the default value. Look at the Website field for example. It has the default value of “http://” because that’s what the value in the form field is set to. We need to know this, because we are going to check their values. Next are the value and clean value, which are empty right now. We’ll get those values in a second, so we can do stuff with them. Next element is a boolean, where we say whether or not the field has to be filled in before accepting the form. You can see that login and e-mail are required, but the name and surname are not because in this case, they’re not that important.

When we loop through the fields later, we will need to store in the array whether each one is valid, and that’s exactly what approved is for. Right now, nothing is valid because we haven’t checked it yet. The last one is also the trickiest: the regular expression. I’m going to use a regular expression to check each and every value. Remember: I have split the two longest expressions over four lines here only so they fit on the page. In your own code you cannot do that, because a line break inside the quotes becomes part of the pattern. Keep each expression on a single line.

<?php

  // If there's any postdata from a form, let's see if we can process it.
  if ($_POST) {
   
    $Posted = true;

    // Fieldname Array(Default, Value, CleanValue, Required, Approved, Regular expression)   
    $RegistrationForm = Array(
    "Login" =>     Array("",   "",   "",  true,    false, "/^[0-9A-Za-z\-_ ]{3,12}$/"), 
    "Name" =>      Array("",   "",   "",  false,   false, "/^[A-Za-z\- ]{2,50}$/"),
    "Surname" =>   Array("",   "",   "",  false,   false, "/^[A-Za-z\- ]{2,50}$/"),
    "Email" =>     Array("",   "",   "",  true,    false, "/^[0-9A-Za-z_\-\.]+
      @([A-Za-z][0-9A-Za-z_\-]*\.)*[A-Za-z][0-9A-Za-z_\-]*\.
      (aero|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|
      museum|name|net|org|pro|travel|arpa|[A-Za-z]{2})$/"),
	
    "Website" =>   Array("http://", "", "", false, false, "/^http:\/\/
      ([A-Za-z][0-9A-Za-z_\-]*\.)*[A-Za-z][0-9A-Za-z_\-]*\.
      (aero|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|
      museum|name|net|org|pro|travel|arpa|[A-Za-z]{2})$/"),

    "Password" =>  Array("",   "",   "",  true,    false, "/^[0-9A-Za-z\-_ ]{3,12}$/"),  
    "ShownName" => Array("",   "",   "",  true,    false, "/^(Login|Name)$/")
    );
  ...
  }

  ?>

 

As you can see, I’ve put the properties of each form field in the array. This means that if we need to add another field, all we have to do is add it to the HTML page and to the array - no additional coding will be required. As the title of this article says, it’s semi-automatic. We could make it fully automatic, but that would require a lot of work, and this way it’s easy, fast, efficient and above all, manageable.
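To illustrate the point about adding fields, here is a minimal sketch of what one extra entry might look like. The Phone field, its pattern and the cut-down array are my own assumptions for the example; they are not part of the registration form on this site.

```php
<?php
// Hypothetical example: a cut-down copy of the array with one extra field.
// Fieldname Array(Default, Value, CleanValue, Required, Approved, Regular expression)
$RegistrationForm = array(
    "Login" => array("", "", "", true,  false, "/^[0-9A-Za-z\-_ ]{3,12}$/"),
    // New, optional field: 6 to 15 digits with an optional leading plus sign.
    "Phone" => array("", "", "", false, false, "/^\+?[0-9]{6,15}$/"),
);

// The pattern can be tried on its own before the field goes live.
$PhoneOk  = preg_match($RegistrationForm["Phone"][5], "+3112345678"); // 1
$PhoneBad = preg_match($RegistrationForm["Phone"][5], "call me");     // 0
echo $PhoneOk, " ", $PhoneBad, "\n";
```

Besides this one line in the array, the only other change would be a matching input named Phone in the HTML form.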

Regular expressions?

As you have seen, we’ll be using regular expressions to test the field values. We’ll do this by using a PHP function that can take a string and a regular expression and determine if they match. Logically, this function is called preg_match. But more about the expressions themselves – how do we use them? I won’t give you a full tutorial since it is a quite complicated matter, but I’ll explain by example: I’ll use the regular expression that we will be using for the login field, and the one for the e-mail field.

What we want: a login needs to be between 3 and 12 characters, and only numbers (0-9), letters (a-z) and a dash (-), underscore (_) and space are allowed. So how do we tell that in a regular expression? Easy! First, let me tell you that the / at the beginning and end of our regular expression are just delimiters, nothing more. After that, we see that the regular expression is enclosed by ^ and $. These mean that the regular expression goes from the beginning (symbol ^ at the beginning) all the way to the end (symbol $ at the end) of the string we want to match it with. In other words, the string has to match exactly with the regular expression.
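To see what the ^ and $ symbols buy us, here is a small sketch that runs the same pattern with and without them. The variable names are made up for the example.

```php
<?php
// Same character class and length, once with and once without the anchors.
$Anchored   = "/^[0-9A-Za-z\-_ ]{3,12}$/";
$Unanchored = "/[0-9A-Za-z\-_ ]{3,12}/";

$Input = "#1 login"; // contains a character that is not allowed

// With ^ and $ the whole string must fit the pattern, so this fails.
$WithAnchors = preg_match($Anchored, $Input);      // 0
// Without them it is enough that "1 login" fits somewhere inside the string.
$WithoutAnchors = preg_match($Unanchored, $Input); // 1
echo $WithAnchors, " ", $WithoutAnchors, "\n";
```

Without the anchors the illegal # would slip through, which is exactly what we don’t want in a validation check.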

After the ^ symbol, we see some letters, numbers and symbols enclosed by square brackets [ and ]. This is called a character class, and it matches exactly one character, which may be any of the characters listed inside it. Within the brackets, 0-9 means any digit from 0 to 9, and A-Z and a-z mean any upper or lower case letter of the alphabet. Since the dash (-) is reserved for giving a range of characters, we have to escape it with \ (similar to what you’d do to insert a quote symbol into a MySQL database). This means that [0-9A-Za-z\-_ ] will match only the characters we allow! Last, there are two numbers between curly brackets { and }. This simply means that whatever precedes the brackets, here the character class, has to occur between 3 and 12 times.

 

 // The regular expression for the login.
 // preg_match takes the pattern first and the string to test second.

echo preg_match("/^[0-9A-Za-z\-_ ]{3,12}$/", "login name"); 
// Returns 1 

echo preg_match("/^[0-9A-Za-z\-_ ]{3,12}$/", "#1 login"); 
// Returns 0 

 

When you look at the e-mail, something similar is done. We start the same as last time, only now we don’t allow a space but we do allow a dot (which is also a reserved character, therefore the backslash). Also, there needs to be at least one character before the @ (notice that I haven’t set a maximum this time) and that’s what the plus sign (+) is for. The @ is obligatory, and after it there can be any number of subdomains: they have to be letters, numbers and underscore (_) and dash (-) only, and have to start with a letter and end with a dot. The asterisk (*) means there can be zero or more of them. After that there is exactly one domain name which has the same rules as the subdomains. This is followed by the top level domain, and here you can see a list of top level domain names (aero, biz, etc) separated by | which means that one of those can be chosen. You may have noticed that the last top level domain I give, [A-Za-z]{2}, is for all the top level domains that countries use, since I would not want to list them all.

// The regular expression for the e-mail was, again, too long to fit on one
// line, so here it is joined from three pieces with the . operator.

$regexp = "/^[0-9A-Za-z_\-\.]+@([A-Za-z][0-9A-Za-z_\-]*\.)*"
  . "[A-Za-z][0-9A-Za-z_\-]*\.(aero|biz|cat|com|coop|edu|gov|info|"
  . "int|jobs|mil|mobi|museum|name|net|org|pro|travel|arpa|[A-Za-z]{2})$/"; 
  
echo preg_match($regexp, "myaddress@mycompany.com");
  // Returns 1 

echo preg_match($regexp, "this_is_no_email.com");
  // Returns 0

 

For more information on regular expressions, visit http://www.regular-expressions.info.

 

Finally, we process

What we need to do right now is add the last step: looping through the form fields. We can do this right after we set the array, inside the if-statement. We loop through the main array and take each sub-array as a key/value pair (the value is an array). As I said, the second element (which is of course element 1, because PHP arrays are zero-based) is the value from the form, and the third is the clean value. I’ve used a function that cleans the value up (I’ll get to it later).

After that, I check if the field is required. If it is, then it cannot be equal to the default value (because then the visitor didn’t fill it in) and it has to match the regular expression (if it doesn’t, the visitor used illegal characters). If the field isn’t required and is NOT equal to the default value, it also has to match the regular expression. Again, this is simply because the visitor may have entered values you don’t allow. For example, a visitor might have entered some gibberish in the website form field, and that is useless, so we don’t want that. Finally, if the value is the default value and the field is not required, then it’s valid as well: the visitor didn’t have to fill the form field in, and he (or she) didn’t. That’s perfectly acceptable!

Because we had taken the sub-array and put it into its own little variable, we have to put it back in at the end, or else all our hard work will be in vain.

// Loop through the form
  foreach ($RegistrationForm as $Key => $Element) {
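    // A field that is missing from the postdata counts as left at its default.
    $Element[1] = isset($_POST[$Key]) ? $_POST[$Key] : $Element[0];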
    $Element[2] = CleanValue($Element[1]);

    if ($Element[3]) {
      // If it's required then it can't be default and it must match the regular expression  
      if ($Element[1] != $Element[0] && preg_match($Element[5], $Element[1])) {
         $Element[4] = true;
      }
    }
    elseif ($Element[1] != $Element[0] && preg_match($Element[5], $Element[1])) {
      // If it's not required then it must match the regular expression if it's not default  
      $Element[4] = true;
    }
    elseif ($Element[1] == $Element[0]) {
      // Not required and default, it's all good
      $Element[4] = true;
    }

   $RegistrationForm[$Key] = $Element;
  }
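The three branches in the loop boil down to two questions: was the field filled in, and if so, does it match? Here is a minimal sketch of that decision as a function of its own. IsFieldValid and the simplified patterns are hypothetical names and values for this example only; the loop above does not use them.

```php
<?php
// Hypothetical helper: the three rules from the loop folded into one function.
function IsFieldValid($Value, $Default, $Required, $Pattern) {
    $Filled = ($Value != $Default);
    if (!$Filled) {
        // Left at its default: fine only when the field is optional.
        return !$Required;
    }
    // Filled in: required or not, it has to match the pattern.
    return preg_match($Pattern, $Value) === 1;
}

$LoginPattern = "/^[a-z]{3,12}$/";                // simplified for the example
$UrlPattern   = "/^http:\/\/[a-z]+\.[a-z]{2,}$/"; // simplified for the example

var_dump(IsFieldValid("",          "",        true,  $LoginPattern)); // required, empty
var_dump(IsFieldValid("neo",       "",        true,  $LoginPattern)); // required, valid
var_dump(IsFieldValid("http://",   "http://", false, $UrlPattern));   // optional, untouched
var_dump(IsFieldValid("gibberish", "http://", false, $UrlPattern));   // optional, invalid
```

The four calls print false, true, true and false, which are the same outcomes the loop would store in the approved element.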

 

Is it safe?

I know you’re wondering about that function now. It’s really nothing special, and you should know that it is not a complete security measure. All I do here is check if the value is an integer or floating point number and convert it if so. If it’s neither, I assume it’s a string and escape it to make it suitable for inserting into a MySQL database. If you plan on storing postdata in a database or something similar (perhaps an XML file?), I strongly suggest you take your own measures to protect against injection and annoying stuff like that. The only thing I want to do here is make sure the function is used for every bit of user input we get, and that we use the clean values whenever we do something important with the data.

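 // Basic function to clean up whatever comes from the postdata array.
 // Adapt it to your own needs, I have NOT tested its security!

  function CleanValue($Value) {
    // Postdata always arrives as strings, so test the contents, not the type.
    if (is_int($Value) || preg_match('/^-?[0-9]+$/', $Value)) {
      // settype() returns true or false, not the value, so cast instead.
      return (int) $Value;
    }
    elseif (is_numeric($Value)) {
      return (float) $Value;
    }
    else {
      // Needs an open MySQL connection. The mysql extension was removed in PHP 7;
      // use mysqli or PDO there.
      return mysql_real_escape_string($Value);
    }
  }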
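As one example of taking your own measures against injection, here is a minimal sketch that uses a prepared statement through PDO. This is a different technique from the escaping in CleanValue. The in-memory SQLite database and the users table are assumptions made only so the example can run without a MySQL server, and it needs the pdo_sqlite extension.

```php
<?php
// Assumption: the pdo_sqlite extension is available. The table and column
// names are made up for this example.
$Db = new PDO("sqlite::memory:");
$Db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$Db->exec("CREATE TABLE users (login TEXT, email TEXT)");

// The ? placeholders keep the data separate from the SQL text,
// so a quote in the input cannot change the query.
$Insert = $Db->prepare("INSERT INTO users (login, email) VALUES (?, ?)");
$Insert->execute(array("o'neil", "oneil@example.com"));

// Read the value back to show it was stored unchanged.
$Stored = $Db->query("SELECT login FROM users")->fetchColumn();
echo $Stored, "\n";
```

With placeholders, the values are sent as parameters instead of being pasted into the query string, so no manual escaping is involved.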

 

But wait, what about the form?

Ah yes, the form. So you’ve found out the user didn’t input what you wanted, but how are you going to present the form? Here’s one form field as an example. This is what it might look like before we make sure the users understand what they’re giving us is just not good enough:

 <form name="Registration" method="post"> 
   <table> 
     <tr> 
       <td>Login</td> 
       <td><input type="text" name="Login" value="" /></td> 
     </tr> 
   </table>
 </form> 
 

 

We need to do two things: we need to make sure the user knows it’s bad, and we need to put the value that was given back into the field so they can see what they did wrong. If the form was posted and the login field was wrong or missing, then we give the label to that field a red color. In any case, if there was postdata, we give them their values back to make it all a bit easier on them.

 <form name="Registration" method="post"> 
   <table> 
     <tr> 
       <td <?php if (!empty($Posted) && !$RegistrationForm["Login"][4]) { echo "style=\"color: #ff0000;\""; } ?>>Login</td> 
       <td><input type="text" name="Login" value="<?php if (!empty($Posted)) { echo htmlspecialchars($RegistrationForm["Login"][1]); } ?>" /></td> 
     </tr> 
   </table> 
 </form> 
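Since every field row follows the same recipe, the row itself could come from a small helper. Here is a minimal sketch under that assumption. FieldRow and the sample data are hypothetical and only illustrate the idea with the array layout from this article.

```php
<?php
// Hypothetical helper that builds one table row for a text field.
function FieldRow($Name, $Field, $Posted) {
    // Red label only when the form was posted and this field was not approved.
    $Style = ($Posted && !$Field[4]) ? ' style="color: #ff0000;"' : '';
    // Give the visitor's input back, or show the default on a fresh form.
    $Value = htmlspecialchars($Posted ? $Field[1] : $Field[0]);
    return "<tr><td$Style>$Name</td>"
         . "<td><input type=\"text\" name=\"$Name\" value=\"$Value\" /></td></tr>\n";
}

// Sample data in the article's array layout: a posted login that failed validation.
$Login = array("", "#1 login", "#1 login", true, false, "/^[0-9A-Za-z\-_ ]{3,12}$/");
$Row = FieldRow("Login", $Login, true);
echo $Row;
```

The call to htmlspecialchars matters here: without it, a visitor could type a quote followed by HTML of their own and have it end up in your page.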

 

I hope I made life, or at least form validation, that little bit easier for you.



Author: NeoTeq