Getting a String before another in MySQL
The other day I had a bit of a fun problem to solve. We had some data in MySQL, all stored, unfortunately, in a denormalized, and hell, blobby way. So instead of seperated fields for a set of data points, they were all stuck into a single text column. The problem to solve was to take a string to find, and then return it and the string before it until it found another keyword.
So for an example let's say we have:
<blob> <blobThing> <data>This is just some dummy data</data> <tidbit>This is some cool data</tidbit> </blobThing> <blobThing> <data>This is just some other dummy data</data> <tidbit>This is some lame data</tidbit> </blobThing> <blobThing> <data>This is just some more dummy data</data> <tidbit>This is some other data</tidbit> </blobThing> </blob>
And yeah, XML? In a database? Terrible and horrible idea I'm aware, but hey, when you're given stuff to deal with, you gotta just deal with it sometimes.
If we wanted to extract the data
and its tidbit
based on information in
the tidbit
, we would need to find the tidbit, then backtrack to the
closest opening data
tag. Unfortunately, MySQL doesn't really have that
ability in one handy function like it does other things. Still, with
some creative use of a few of the other string functions, we can arrive
at a solution that works well enough to get by in our extraction.
Let's say we want the middle data
and tidbit
, we know that the tidbit
has "lame data", so we can find that using the LOCATE if we wanted to,
but I'm going to (out of preference really) use the INSTR method since
it reads easier to me to have the subject as the first argument:
INSTR(myColumn, 'lame data</tidbit>');
From here, we now face the real crux of the issue, which is, how do we get
the stuff before the index we just found? If we use SUBSTR
now we'll only
get up to what we found with INSTR
:
<blob> <blobThing> <data>This is just some dummy data</data> <tidbit>This is some cool data</tidbit> </blobThing> <blobThing> <data>This is just some other dummy data</data> <tidbit>This is some lame data</tidbit>
We can't use INSTR(myColumn, <data>)
to find the previous data
element
since that would find us the first on in the string. So what we need to
do is make the previous data
element the first data
element in the
string. To do this, we just have to think and understand that if we
consider that we've cut our string down to the last tidbit
we needed,
then if we flipped the string, the first open data
element would be the
one we need!
REVERSE( SUBSTRING( myColumn FROM 1 FOR INSTR(myColumn, 'lame data</tidbit>') + LENGTH('lame data</tidbit>') -1 ) )
Doing something like the above gets us the slightly uncomprehensible:
>tibdit/<atad emal emos si sihT>tibdit< >atad/<atad ymmud rehto emos tsuj si sihT>atad< >gnihTbolb< >gnihTbolb/< >tibdit/<atad looc emos si sihT>tibdit< >atad/<atad ymmud emos tsuj si sihT>atad< >gnihTbolb< >bolb<
Now, to find the first open data element we need to use INSTR
with the
reversed string of what we want:
INSTR(<the above string>, REVERSE('<data>'))
This gives us the index within the reversed substring we need to get
to. To extract the text from our reversed string we can then use
SUBSTRING
and the indices we now have:
SUBSTRING( REVERSE( SUBSTRING( myColumn FROM 1 FOR INSTR(myColumn, 'lame data</tidbit>') + LENGTH('lame data</tidbit>') -1 ) ), -- This is Just our reversed bit from before 1, -- Start at the beginning of the reversed string because it starts at what we found before INSTR( REVERSE( SUBSTRING( myColumn FROM 1 FOR INSTR(myColumn, 'lame data</tidbit>') + LENGTH('lame data</tidbit>') -1 ) ), REVERSE('<data>') -- Grab the index of where the first open data element before our lame data</tidbit> is ) + LENGTH('<data>') -- Capture the tag as well )
We're close now, the above will net you:
>tibdit/<atad emal emos si sihT>tibdit< >atad/<atad ymmud rehto emos tsuj si sihT>
Which just needs to be put through REVERSE
<data>This is just some other dummy data</data> <tidbit>This is some lame data</tidbit>
And there you have it! Of course, this works well with data like XML or other things you probably really shouldn't be storing in a database anyway. But like I said, beggers can't be choosers. You could still use this to deal with non structured data as well if you know your data well. For example:
Once upon a time there was a fish, and it was really neat and I have no idea what to put here but hey it's dummy data!
in a column could have the last part pulled out with:
REVERSE( SUBSTRING( REVERSE( SUBSTRING( myColumn FROM 1 FOR INSTR(myColumn, 'dummy data') + LENGTH('dummy data') -1 ) ), 1, INSTR( REVERSE( SUBSTRING( myColumn FROM 1 FOR INSTR(myColumn, 'dummy data') + LENGTH('dummy data') -1 ) ), REVERSE('I ') ) + LENGTH('I ') ) )
To get:
I have no idea what to put here but hey it's dummy data
If the string you're looking for isn't in the data, then you're going to get
back some funny results trying to do this whole thing all at once. So it would
be better to write a stored procedure or function that would handle the cases
where the INSTR
methods don't find what you're looking for, like this:
DELIMITER ;; CREATE FUNCTION ExtractStringBeforeUntil (findThisStr TEXT, rewindUntilStr TEXT, inSubject TEXT) RETURNS TEXT DETERMINISTIC BEGIN -- Good Arguments? IF(rewindUntilStr IS NULL OR findThisStr IS NULL OR inSubject IS NULL) THEN RETURN NULL; END IF; -- Do the strings even exist? SET @foundStrLocation = INSTR(inSubject, findThisStr); SET @rewindExists = INSTR(inSubject, rewindUntilStr); IF( @foundStrLocation = 0 OR @rewindExists = 0) THEN RETURN NULL; END IF; -- Make the reversing part a bit easier to read -- Also helps us not to compute this more than once SET @partialData = REVERSE( SUBSTRING( inSubject FROM 1 FOR @foundStrLocation + LENGTH(findThisStr) -1 ) ); RETURN REVERSE( SUBSTRING( @partialData, 1, INSTR( @partialData, REVERSE(rewindUntilStr) ) + LENGTH(rewindUntilStr) ) ); END;; DELIMITER ;
Which we can then use whereever we need to!