SQL Server 2012: Retrieving Regular Expression Groups

I have text in my database in Markdown format. I would like to extract the links and count the number of relevant links I have. I can get a list of text blocks that contain links using a query like this:

SELECT post_text
FROM posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\'

      

How can I go to the next step and just extract the link part of the text (the part that is in the brackets)? If I can get this, I can count the number of times this particular link is in my dataset.

Some examples of data:

"Visit [Google](http://google.com)"    -> Should return "http://google.com"
"Get an [iPhone](http://www.apple.com) (I like it better than Android)"   -> Should return "http://www.apple.com"
"[Example](http://example.com)"    -> Should return "http://example.com"
"This is a message"    -> Nothing to return on this one, no link
"I like cookies (chocolate chip)"  -> Nothing to return on this one, no link
"[Frank] says 'Hello'" -> Nothing to return on this one, no link

      

I am using SQL Server 2012 (if there are differences between versions in this regard).

+3


source to share


1 answer


Assuming the actual data is no more complex than the above examples, this should work without resorting to RegEx:

DECLARE @posts TABLE
(
   post_id INT NOT NULL IDENTITY(1, 1),
   post_text NVARCHAR(4000) NOT NULL,
   body NVARCHAR(2048) NULL
);
INSERT INTO @posts (post_text, body) VALUES (N'first',
                                           N'Visit [Google](http://google.com)');
INSERT INTO @posts (post_text, body) VALUES (N'second',
                                           N'Get an [iPhone](http://www.apple.com)');
INSERT INTO @posts (post_text, body) VALUES (N'third',
                                           N'[Example](http://example.com)');
INSERT INTO @posts (post_text, body) VALUES (N'fourth',
                                           N'This is a message');
INSERT INTO @posts (post_text, body) VALUES (N'fifth',
                                           N'I like cookies (chocolate chip)');
INSERT INTO @posts (post_text, body) VALUES (N'sixth',
                                           N'[Frankie] says ''Relax''');
INSERT INTO @posts (post_text, body) VALUES (N'seventh',
                                           NULL);


SELECT p.post_text,
       SUBSTRING(
                  p.body,
                  CHARINDEX(N'](', p.body) + 2,
                  CHARINDEX(N')', p.body) - (CHARINDEX(N'](', p.body) + 2)
                ) AS [URL]
FROM   @posts p
WHERE  p.body like '%\[%](http%)%' ESCAPE '\';

      

Output:



post_text  URL
first      http://google.com
second     http://www.apple.com
third      http://example.com

      

PS:
If you really want to use regular expressions, they can only be executed through SQLCLR. You can write your own or download ready-made libraries. I wrote one such library, SQL # , which has a free version that includes RegEx functions. But they should only be used if a T-SQL solution cannot be found, which is not the case yet.

+6


source







All Articles