SQL Server 2012: Retrieving Regular Expression Groups
I have text in my database in Markdown format. I would like to extract the links and count the number of relevant links I have. I can get a list of text blocks that contain links using a query like this:
SELECT post_text
FROM posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\'
How can I go to the next step and just extract the link part of the text (the part that is in the brackets)? If I can get this, I can count the number of times this particular link is in my dataset.
Some examples of data:
"Visit [Google](http://google.com)" -> Should return "http://google.com" "Get an [iPhone](http://www.apple.com) (I like it better than Android)" -> Should return "http://www.apple.com" "[Example](http://example.com)" -> Should return "http://example.com" "This is a message" -> Nothing to return on this one, no link "I like cookies (chocolate chip)" -> Nothing to return on this one, no link "[Frank] says 'Hello'" -> Nothing to return on this one, no link
I am using SQL Server 2012 (if there are differences between versions in this regard).
source to share
Assuming the actual data is no more complex than the above examples, this should work without resorting to RegEx:
DECLARE @posts TABLE
(
post_id INT NOT NULL IDENTITY(1, 1),
post_text NVARCHAR(4000) NOT NULL,
body NVARCHAR(2048) NULL
);
INSERT INTO @posts (post_text, body) VALUES (N'first',
N'Visit [Google](http://google.com)');
INSERT INTO @posts (post_text, body) VALUES (N'second',
N'Get an [iPhone](http://www.apple.com)');
INSERT INTO @posts (post_text, body) VALUES (N'third',
N'[Example](http://example.com)');
INSERT INTO @posts (post_text, body) VALUES (N'fourth',
N'This is a message');
INSERT INTO @posts (post_text, body) VALUES (N'fifth',
N'I like cookies (chocolate chip)');
INSERT INTO @posts (post_text, body) VALUES (N'sixth',
N'[Frankie] says ''Relax''');
INSERT INTO @posts (post_text, body) VALUES (N'seventh',
NULL);
SELECT p.post_text,
SUBSTRING(
p.body,
CHARINDEX(N'](', p.body) + 2,
CHARINDEX(N')', p.body) - (CHARINDEX(N'](', p.body) + 2)
) AS [URL]
FROM @posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\';
Output:
post_text URL
first http://google.com
second http://www.apple.com
third http://example.com
PS:
If you really want to use regular expressions, they can only be executed through SQLCLR. You can write your own or download ready-made libraries. I wrote one such library, SQL # , which has a free version that includes RegEx functions. But they should only be used if a T-SQL solution cannot be found, which is not the case yet.
source to share