Unicode Regular Expressions in PHP

Recently I wanted to use a regular expression in PHP to match all printable characters. I used the POSIX character class [:print:] to do this. My PHP test code was

[cc lang="php"]preg_match('/^[[:print:]]*$/', 'abcde');[/cc]

Although this worked for ASCII input, it didn’t work for non-ASCII characters, e.g. French words with accented letters such as réseau. What I needed was for preg_match to match all Unicode printable characters. It turns out there is a pattern modifier, u (written after the closing delimiter, as in /u), that makes preg_match treat the pattern and the subject as UTF-8. I also had to use a Unicode property class, so my test code became

[cc lang="php"]preg_match('/^\P{C}+$/u', 'réseau');[/cc]

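\P{C} matches any character that is NOT in the Unicode general category C (Other). That category covers control characters as well as format, surrogate, private-use, and unassigned code points, so the pattern accepts letters, digits, punctuation, symbols, and spaces in any language.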
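To make the behaviour concrete, here is a minimal sketch that wraps the pattern in a small helper. The function name is_printable_unicode is my own invention for illustration, not part of PHP, and the sketch assumes the input is meant to be UTF-8.

```php
<?php
// Hypothetical helper (the name is an assumption, not a PHP built-in):
// returns true when $text is valid UTF-8, is non-empty, and contains
// no code points from Unicode general category C (Other).
function is_printable_unicode(string $text): bool
{
    // \P{C} = any code point NOT in category C.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^\P{C}+$/u', $text) === 1;
}

var_dump(is_printable_unicode('réseau'));     // bool(true)
var_dump(is_printable_unicode("tab\there"));  // bool(false): tab is a control character
var_dump(is_printable_unicode("\xE9"));       // bool(false): a lone 0xE9 byte is not valid UTF-8
```

Note that with the u modifier, preg_match returns false rather than 0 when the subject is not valid UTF-8, which is why the helper compares against 1 explicitly instead of relying on truthiness.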

You can find more info in the Regular Expressions Cookbook (O’Reilly), in the section Unicode Code Points, Properties, Blocks, and Scripts.