I encountered some strange behaviour when trying to remove all characters from a varchar2, except a-z, A-Z and 0-9.
Using regular expressions, you can define that in a number of ways, such as [^A-Za-z0-9] or [^[:alnum:]]
Example 1 works as expected (see the query below):
For A1 the ë (ascii 235) is not included, since it falls outside of the [^A-Za-z0-9] range.
For A2 the ë is included, since it is first converted to a regular e character.
Example 2 is (a dummy version of) the exact same query on another (also 11gR2) database and client. Here, the result for A1 was a bit different:
For A1 the ë
was included.
A2 was the same.
Probably the characterset settings of the databases and/or clients were the cause? I have included the info gathered by PL/SQL Developer for both setups.
My favourite online regular expressions tool
regex101.com agrees that example 1 is as expected.
Character size: 4 byte(s)
CharSetID: 873
NCharSetID: 2000
Unicode Support: True
NLS_LANG: AMERICAN_AMERICA.WE8MSWIN1252
NLS_CHARACTERSET: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16
Character size: 1 byte(s)
CharSetID: 31
NCharSetID: 2000
Unicode Support: True
NLS_LANG: DUTCH_THE NETHERLANDS.WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16
NLS_CHARACTERSET: WE8ISO8859P1