Saturday, March 20, 2010

Tutorial: Validating User's Arabic Input using Regular Expressions in PHP

In PHP, regular expressions can be used to match a string against a specific pattern. This is useful for validating user input to avoid certain security problems such as spoofed form submissions and cross-site scripting (see http://phpsec.org/projects/guide/). It can be done using the preg_match() function.

Here I'll talk only about using preg_match() to validate Arabic input. For an explanation of regular expressions in PHP, see http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html

The first method you can use to make regular expressions match Arabic is \p{Arabic}, which makes preg_match() accept Arabic strings.

That's not completely true :D

The previous paragraph is probably what you would get if you searched the Internet for Arabic patterns, but it needs the following modifications to work correctly:

  • use the /u modifier so that the pattern and the input will be treated as UTF-8 (you can add it to all your patterns, since plain alphanumeric ASCII characters are matched the same way with it) >> '/\p{Arabic}/u'

  • use the HTML META tag to declare the page's charset as UTF-8, so that the browser sends the request parameters (GET & POST) encoded as UTF-8

Both steps are needed to make your validation work.
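
Here is a minimal sketch of how the first method could look once the /u modifier is in place. The function name is_arabic_text() is made up for this example, and allowing whitespace between words is my own assumption, so adjust the character class to your needs. The file itself has to be saved as UTF-8 for the Arabic literals to work.

```php
<?php
// Hypothetical helper (not a built-in PHP function): returns true when
// $input consists only of Arabic-script characters and whitespace.
function is_arabic_text($input)
{
    // ^ and $ anchor the match to the whole string ($ also tolerates one
    // trailing newline, which \s would accept here anyway).
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example when $input is not valid UTF-8), so compare against 1.
    return preg_match('/^[\p{Arabic}\s]+$/u', $input) === 1;
}

var_dump(is_arabic_text('مرحبا بالعالم')); // bool(true): Arabic letters and a space
var_dump(is_arabic_text('Hello'));         // bool(false): Latin letters
```

Comparing the result with === 1 means that a broken (non UTF-8) input is rejected instead of being mistaken for a match.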

Another method is to use the Unicode code points of the Arabic characters. You can use /\x{****}/u to represent any Unicode character, and /[\x{****}-\x{----}]/u makes the pattern match any character or symbol whose code point lies between **** and ----. Do not forget the /u modifier and setting the charset to UTF-8.
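
As a rough sketch of the second method, the example below fills the range with the basic Arabic Unicode block, U+0600 to U+06FF. That choice of range, the extra space in the character class and the function name is_arabic_block_text() are assumptions made for this example; pick the code points you actually want to allow.

```php
<?php
// Hypothetical helper (not a built-in PHP function): returns true when
// $input contains only code points from the basic Arabic block
// (U+0600 to U+06FF, an assumed range for this example) and plain spaces.
function is_arabic_block_text($input)
{
    // Single quotes keep \x{0600} intact so PCRE sees the escape itself.
    // \x{....} above FF is only accepted when the /u modifier is used.
    // ^ and $ anchor the match to the whole string ($ also tolerates one
    // trailing newline).
    return preg_match('/^[\x{0600}-\x{06FF} ]+$/u', $input) === 1;
}

var_dump(is_arabic_block_text('مرحبا'));          // bool(true)
var_dump(is_arabic_block_text('مرحبا، بالعالم')); // bool(true): the Arabic comma is U+060C
var_dump(is_arabic_block_text('abc'));            // bool(false)
```

A range like this accepts everything inside the block, including punctuation such as the Arabic comma, so narrow it down if you want letters only.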