Setting Up PHP to Support UTF-8 / i18n International Characters

Instead of writing multibyte-safe string manipulation code by hand, e.g. calling mb_substr() instead of substr(), you can configure PHP versions that support function overloading to do this automatically. Just update the following lines in your php.ini:

mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7
mbstring.strict_detection = On
zend.multibyte = On
zend.script_encoding = UTF-8

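mbstring.func_overload is a bitmask: 1 overloads mail(), 2 overloads the string functions (strlen(), strpos(), substr(), strtolower() and similar), and 4 overloads the ereg regular expression functions, so 7 enables all three groups. Only functions in those groups are replaced by their mb_* counterparts; other byte-oriented functions are not affected. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, and mbstring.internal_encoding has been deprecated since PHP 5.6 in favor of default_charset. On current PHP versions you need to call the mb_* functions explicitly.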
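To see what overloading (or calling the mb_* functions directly) changes, here is a minimal sketch comparing the byte-oriented functions with their multibyte counterparts. It assumes the mbstring extension is loaded and PHP 7 or later (for the \u{...} escapes); the variable names are only illustrative.

```php
<?php
// "nǐhǎo" built from escapes so the result does not depend on the file's encoding.
// ǐ (U+01D0) and ǎ (U+01CE) each take two bytes in UTF-8.
$word = "n\u{01D0}h\u{01CE}o";

// Byte-oriented functions count and slice bytes.
$byteLength = strlen($word);           // 7 bytes
$byteSlice  = substr($word, 0, 2);     // "n" plus half of "ǐ"

// Multibyte functions count and slice characters.
$charLength = mb_strlen($word, 'UTF-8');        // 5 characters
$charSlice  = mb_substr($word, 0, 2, 'UTF-8');  // "nǐ"

// The byte slice cut a character in half, so it is no longer valid UTF-8.
$byteSliceIsValid = mb_check_encoding($byteSlice, 'UTF-8'); // false
$charSliceIsValid = mb_check_encoding($charSlice, 'UTF-8'); // true
```

Passing the encoding explicitly, as above, keeps the result independent of whatever default encoding is set in php.ini.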

URLs with UTF-8 / Non-ASCII Characters

When determining the URL for a web page, you often want to use keywords that accurately describe the page’s content. Sometimes, these keywords aren’t in English and contain accented or other non-ASCII characters. One option is to choose one URL as the canonical URL and redirect to it from alternate URLs, such as one that contains the ASCII-equivalent version of the words, e.g.

Canonical: http://www.somedomain.com/nǐhǎo

Redirects:

  • http://www.somedomain.com/nihao
  • http://www.somedomain.com/你好
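The redirect scheme above could be sketched roughly as follows. The CANONICAL_PATHS map and the redirectTarget() helper are hypothetical names made up for this illustration; a real site would more likely load the mapping from a database or handle it in the web server configuration. The sketch assumes request paths arrive percent-encoded, as browsers send them, and percent-encodes the canonical path again before using it in a Location header.

```php
<?php
// Hypothetical map from alternate paths to the canonical path (UTF-8 strings).
const CANONICAL_PATHS = [
    '/nihao'            => "/n\u{01D0}h\u{01CE}o", // ASCII version -> /nǐhǎo
    "/\u{4F60}\u{597D}" => "/n\u{01D0}h\u{01CE}o", // /你好 -> /nǐhǎo
];

// Returns the percent-encoded canonical path to redirect to,
// or null when the requested path has no redirect.
function redirectTarget(string $requestUri): ?string
{
    $rawPath = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($rawPath)) {
        return null;
    }

    // Decode %XX sequences so the path can be compared as UTF-8 text.
    $path = rawurldecode($rawPath);
    $canonical = CANONICAL_PATHS[$path] ?? null;
    if ($canonical === null) {
        return null;
    }

    // Encode each segment separately so the slashes are preserved.
    return implode('/', array_map('rawurlencode', explode('/', $canonical)));
}

// Example use in a front controller (left commented out so the sketch has no side effects):
// $target = redirectTarget($_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

A 301 (permanent) status is used in the commented example on the assumption that the canonical URL is not expected to change.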