Wednesday, 18 January 2012

Introduction to Regular Expressions in Oracle

Regular expressions – RegEx or RegExp to their friends – were fashionably late in coming to the Oracle party. The seeds of what we know today as regular expressions were formulated in the period immediately following the Second World War in fields as diverse as formal language theory and neurophysiology; but it wasn’t until the PL/SQL Web toolkit arrived for 8i and 9i that they popped their heads over the Oracle parapet. By this time they were already rock stars in the worlds of Unix and Perl.

But I’m being rude here; I should introduce you. Regular expressions are a codified means to accomplish flexible pattern-matching in strings of text.

And, if you’re rereading that definition and thinking ‘huh?!’, that’s a feeling you might want to get used to, because, while regular expressions are powerful and definitely useful, they can seem very much like hieroglyphics to the uninitiated. But don’t worry, I’ll hold your hand.

String pattern matching has always been possible in Oracle sql and pl/sql, of course. With the LIKE condition and the simple metacharacters of “%” and “_”, character patterns could be described and integrated into queries. However, the moment you required a little complexity these basic wildcards cowered in fear. An example: my name is David, but friends tend to call me Dave when they have a beer or two in them. To match either spelling of my name, not using regular expressions, you would have to query the table for ‘Dav%’. But then your results might be flooded with Davidsons and Davises and Davinas. And I’m definitely not pretty enough to be called Davina!

No comments: