Skip to Main Content

Accented Characters

I encountered some strange behaviour when trying to remove all characters from a varchar2, except a-z, A-Z and 0-9.
Using regular expressions, you can define that in a number of ways, such as [^A-Za-z0-9] or [^[:alnum:]]

Example 1 works as expected (see the query below):
For A1 the ë (ascii 235) is not included, since it falls outside of the [^A-Za-z0-9] range.
For A2 the ë is included, since it is first converted to a regular e character.

Example 2 is (a dummy version of) the exact same query on another (also 11gR2) database and client. Here, the result for A1 was a bit different:
For A1 the ë was included.
A2 was the same.

Probably the characterset settings of the databases and/or clients were the cause? I have included the info gathered by PL/SQL Developer for both setups.
My favourite online regular expressions tool regex101.com agrees that example 1 is as expected.

Accented Characters Example 1

A1A2
Ab3X1Ab3eX1
Character size: 4 byte(s)
CharSetID: 873
NCharSetID: 2000
Unicode Support: True
NLS_LANG: AMERICAN_AMERICA.WE8MSWIN1252
NLS_CHARACTERSET: AL32UTF8
NLS_NCHAR_CHARACTERSET: AL16UTF16

Accented Characters Example 2

A1A2
Ab3ëX1Ab3eX1
Character size: 1 byte(s)
CharSetID: 31
NCharSetID: 2000
Unicode Support: True
NLS_LANG: DUTCH_THE NETHERLANDS.WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16
NLS_CHARACTERSET: WE8ISO8859P1

Value Pairs

A simple example how to split string like 'ABC:123;XYZ:789' into multiple value pairs.

Value Pairs Example

RValue 1Value 2
ABC:123ABC123
ABCD:1234ABCD1234
WXYZ:6789WXYZ6789
XYZ:789XYZ789

Phone Numbers

Creating a simple check for phone numbers was not as easy as I expected.
Getting it to work in regex101.com was easy, but that expression was not handled as expected when using it in the Oracle database.
I did get it to work by changing the expression, but I'm still not sure why the first version didn't work.

Phone Numbers Example

TelR_1R_2
088 223 22 3301
+31 (0)88 - 223 22 33 01
088-223 22 3301
088223223301
abc 12300

Email

Maybe even worse than phone numbers are email addresses.

The email address examples listed in https://en.wikipedia.org/wiki/Email_address were an eye-opener for me - I had no idea there were so many types of valid email addresses.
Well, in theory anyway, because a lot of the more exotic ones will not be accepted in a lot of places.
To check which regular expression to use for email validation, I applied quite a lot of them to the list from Wikipedia. See the report below - 0 means the validation failed, 1 means it passed.
Most of these regular expressions come from various websites I found, but I did come up with some of them myself.
As you may expect, there is no "best solution" really. None of the expressions I found give a 100% accurate score.

So for everyday use, I would recommend option x15 - or maybe x14 if you also want to include "local" email addresses like user@server01.
Those should cover over 99% of the valid email addresses you will encounter in the real world.

Email Example

SEX OrgX1X2X3X4X6X7X10X11X12X13X14X15
1simple@example.com1101111111111
2very.common@example.com1101111111111
3FirstName.LastName@EasierReading.org1101011111111
4x@example.com1101111111111
5long.email-address-with-hyphens@and.subdomains.example.com1101111111111
6user.name+tag+sorting@example.com1100110111111
7name/surname@example.com1000000011111
8admin@example1000000000110
9example@s.example1000110111111
10" "@example.org1000000001111
11"john..doe"@example.org1000000011111
12mailhost!username@example.org1000000011111
13"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com1000000001100
14user%example.com@example.org1100100111111
15user-@example.org1101111111111
16postmaster@[123.123.123.123]1000001011111
17postmaster@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]1000000000110
18_test@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334]1000000000110
19I❤️CHOCOLATE@example.com1000000011111
20abc.example.com0000000000000
21a@b@c@example.com0000000011100
22a"b(c)d,e:f;g<h>i[j\k]l@example.com0000000011111
23just"not"right@example.com0000000011111
24this is"not\allowed@example.com0000000001111
25this\ still\"not\\allowed@example.com0000000001111
261234567890123456789012345678901234567890123456789012345678901234+x@example.com0100110111111
27i.like.underscores@but_they_are_not_allowed_in_this_part0000000000110

Code

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source

Region

Identification
Sequence
Title
Type
Source
Source Type
Region Source