Removing repeating characters with SQL

A database column may occasionally contain unwanted repeating characters. These may be, for example, multiple spaces or full stops not intentionally inserted into the database. Mistakes and typos do happen, so the data may need some fixing time to time. One old trick to remove repeating spaces is the following simple code snippet. I use period character to better illustrate the process.

UPDATE table_to_fix SET column_to_fix = REPLACE( REPLACE( REPLACE(column_to_fix, '.', '{}'), '}{', ''), '{}', '.') WHERE column_to_fix LIKE '%..%'

This very common way to truncate repeating characters is often neglected. It is very efficient compared to some loop implementations I've seen and is short as well as easy to implement with various database systems. In fact, the sample code above is portable to MySQL and Microsoft SQL Server without changes and perhaps to other systems as well. The magic behind the trick is extremely simple. First the innermost replace() function changes all periods to pair of braces. Take a look at the example below, which has five period characters between the words instead of one.

before: unwanted.....periods after: unwanted{}{}{}{}{}periods

The middle replace() function then replaces all occurrences of braces, which are against each other, with an empty string. So, pairs of "}{" will be removed.

before: unwanted{}{}{}{}{}periods after: unwanted{}periods

Finally, the outermost replace() function replaces the last pair of braces back to one period character.

before: unwanted{}periods after: unwanted.periods

This technique can replace virtually any repeating characters to a single one. In case the data already contains braces, the replace method must use some other characters in the process. Removing repeating characters with this method is astonishing simple.

Julkaistu Mondayna 22.10.2012 klo 18:12 avainsanalla ohjelmointi.

Edellinen
Tulivuoren uhrit
Seuraava
Taiteilijamenu Elitessä