I have a column which contains codes and descriptions and need to split them.

Multiple codes may or may not reside in the field.

Eg;

"CSxxxxx description text here"

"WSxxxxx description text here"

"CSxxxxx, WSxxxxx, CSxxxxx description text here"

If there are multiple codes, then the description should be repeated, so the output for row 3 above would be 3 lines inserted into a secondary table.

The codes, for the moment, would always be in the format of:

FixedLetter-FixedLetter-Number-Number-Number-Number-Number 

OR

FixedLetter-FixedLetter-Number-Number-Number-Number

I can write the regex to identify the first couple, but i'm not familiar with cutting this from the string and inserting to another table?

Thanks, Dave

    
Do you have a table of codes that you are looking for? – Gordon Linoff 1 hour ago
    
Yeah, there are 3 current variants at the moment but needs to be scale-able as expect more. Work with CSxxxxx, WSxxxxx and CLxxxx for the moment where x is a number 0-9. So a regex pattern for cs/ws would be ((?:C|W)S)[0-9][0-9][0-9][0-9][0-9] – Dave Leech 1 hour ago
    
SQL Server does not support regex in TSQL. You could write a CLR function that does this and uses regex though. This will be simpler and better performing than a SQL solution. – Martin Smith 1 hour ago
    
Thanks, I'm not so hot with regex nor CLR functions - just knew roughly want i needed to achieve. I don't have to worry too much about performance, its not got much to parse and will be part of an overnight maintenance job on a non-production environment – Dave Leech 1 hour ago

2 Answers 2

Another option is with a CROSS APPLY in concert with a little XML as the parser (assuming not 2016)

Example

Declare @YourTable Table (ID int,[SomeCol] varchar(100))
Insert Into @YourTable Values 
 (1,'CS12345 description text here for id 1')
,(2,'WS67890 description text here for id 2')
,(3,'CS23456, WS34567, CS45678 description text here for id 3')

 ;with cte as (
     Select A.ID
          ,B.*
     From @YourTable A
     Cross Apply (
                    Select RetSeq = Row_Number() over (Order By (Select null))
                          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
                    From  (Select x = Cast('<x>' + replace(A.SomeCol,' ','</x><x>')+'</x>' as xml).query('.')) as A 
                    Cross Apply x.nodes('x') AS B(i)
                 ) B
 )
Select ID
      ,Code  = replace(RetVal,',','')
      ,Descr = Stuff((Select ' ' +RetVal From cte Where A.ID=ID and RetVal Not Like '%[0-9][0-9][0-9][0-9]%' Order By RetSeq  For XML Path ('')),1,1,'')
 From  cte A
 Where RetVal Like '%[0-9][0-9][0-9][0-9]%'

Returns

ID  Code      Descr
1   CS12345   description text here for id 1
2   WS67890   description text here for id 2
3   CS23456   description text here for id 3
3   WS34567   description text here for id 3
3   CS45678   description text here for id 3
    
Currently 2008r2, but will need to be 2016 compatible - what makes this inappropriate for 2016 please? – Dave Leech 1 hour ago
    
Also, this wouldn't return the description, so more wizardry would be required there i guess – Dave Leech 1 hour ago
    
@DaveLeech This works with 2008+. IF you had 2016, you could use the split_string() function – John Cappelletti 1 hour ago
    
understood thanks – Dave Leech 1 hour ago
1  
Done thanks, still getting to grips with the site, I'll back track at some point and up vote others – Dave Leech 48 mins ago

The only way I can think of doing this "easily" is by splitting the string. SQL Server 2016 has a string split function. You can find one on the web by Googling "SQL Server split".

Then it would be something like this:

select t.*, c.code
from t outer apply
     (select
      from dbo.split(textcol, ' ')) s(code)
      where replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]' or
            replace(code, ',', '') like '[A-Z][A-Z][0-9][0-9][0-9][0-9]'
     ) c(code);

Note that this should work for the examples in your question. It might not work on all your data. Text is never clean.

    
Thanks but this doesn't seem to play ball with the split function i found online, not sure if its your code or an incompatibility though at this point - probably the latter! – Dave Leech 1 hour ago