Lua pattern matching
I have a file, built from values pulled out of a Microsoft Lync session, that contains RTF formatting tags. An example looks like this:
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}
With Lua scripting, I'm trying to remove the RTF tags and just pull out the conversation text. So the result of my function should be:
Craig... please close out of your old client and re-open
I tried using string.gsub with a regex to match the tags and replace them with a space, leaving only the text, but it doesn't work. Here is the gsub call I have so far:
result = string.gsub(s, "\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?", " ")
Any suggestions would be greatly appreciated!
Additionally:
user1@capital.com @ 2013-01-18 17:48:03Z (TO: user2@capital.com )
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}
user1@capital.com @ 2013-01-18 17:48:57Z (TO: user2@capital.com )
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}
user1@capital.com @ 2013-01-18 17:49:27Z (TO: user2@capital.com )
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}
Lua patterns have no alternation operator (|) and no optional groups ((?:...)?), so that regex cannot work as written. Something like this might work:
s:match("{(.+)}"):gsub("%b{}", ""):gsub("\\%w+", "")
will return:
" Craig... please close >out of your old client >and re-open "
The first gsub removes every balanced {} pair along with its contents; the second gsub removes all RTF control words (although some of them seem to allow spaces, so you might need to adjust the pattern).
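Here is a minimal sketch of the same idea wrapped in a helper, where each alternative of the original regex becomes its own gsub call. The name strip_rtf is hypothetical, the sample is a shortened version of one of the messages in the question, and the whitespace cleanup is an addition rather than part of the answer above. It assumes control words always look like \word or \word123 with no spaces inside them.

```lua
-- Hypothetical helper: strips RTF markup from one message.
-- Assumes control words have the form \letters followed by an optional number.
local function strip_rtf(rtf)
  -- Take everything between the outermost braces; give up if there are none.
  local body = rtf:match("^%s*{(.*)}%s*$")
  if not body then return nil end
  local text = body
    :gsub("%b{}", "")          -- drop nested groups (font table, colour table, ...)
    :gsub("\\%a+%-?%d*", "")   -- drop control words such as \embo0 or \fs20
    :gsub("%s+", " ")          -- collapse the whitespace left behind
    :gsub("^ ", "")            -- trim the leading space
    :gsub(" $", "")            -- trim the trailing space
  return text
end

-- Shortened sample (assumption: real messages carry more header groups).
local sample = [[{\rtf1\ansi{\fonttbl{\f0\fnil Segoe UI;}}\pard\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}]]
print(strip_rtf(sample))  --> I see it
```

The sample is written as a long-bracket string ([[...]]) so that the backslashes stay literal; in a quoted string Lua would read \r, \f, \n and similar sequences as escapes.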
Try this:
-- Long-bracket strings keep every backslash literal; in a quoted string \r, \f, \n etc. would be read as escapes.
local s = [[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}]] .. '\n'
..[[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}]] .. '\n'
..[[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}]] .. '\n'
..[[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}]] .. '\n'
local text = s:gsub('{(.-)}[}]?', '')  -- drop each brace group up to its first closing brace
  :gsub('\\%a+%-?%d*', '')             -- drop control words such as \embo0 or \fs20
  :gsub('>', '')                       -- drop the stray > markers
  :gsub('[ \t]+', ' ')                 -- collapse runs of spaces
  :gsub('^ ', '')                      -- trim the leading space
  :gsub(' ?\n ?', '\n')                -- trim spaces around line breaks
print(text)
Output:
Craig... please close out of your old client and re-open
works for me.. how about embedding pictures?
I see it
let's try a meeting.
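If the file also contains the sender lines shown in the question, the log can be walked line by line so that each message stays paired with its header. The following is only a sketch: parse_log and strip_rtf are hypothetical names, the header layout (sender, " @ ", date, time) is assumed from the examples in the question, and the RTF line is a shortened version of one of those messages.

```lua
-- Hypothetical helper: strips RTF markup from one message line.
local function strip_rtf(rtf)
  local body = rtf:match("^%s*{(.*)}%s*$") or rtf
  local text = body
    :gsub("%b{}", "")          -- drop nested groups
    :gsub("\\%a+%-?%d*", "")   -- drop control words
    :gsub("%s+", " ")          -- collapse whitespace
  return (text:gsub("^ ", ""):gsub(" $", ""))
end

-- Hypothetical helper: pairs each RTF line with the header line before it.
local function parse_log(log)
  local messages, sender, stamp = {}, nil, nil
  for line in log:gmatch("[^\r\n]+") do
    if line:find("^{\\rtf") then
      -- An RTF body: attach it to the most recent header.
      messages[#messages + 1] = { from = sender, time = stamp, text = strip_rtf(line) }
    else
      -- Assumed header layout: "<sender> @ <date> <time> (TO: ...)".
      sender, stamp = line:match("^(%S+) @ (%S+ %S+)")
    end
  end
  return messages
end

local log = table.concat({
  [[user1@capital.com @ 2013-01-18 17:48:57Z (TO: user2@capital.com )]],
  [[{\rtf1\ansi{\fonttbl{\f0\fnil Segoe UI;}}\pard\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}]],
}, "\n")

for _, m in ipairs(parse_log(log)) do
  print(m.from, m.time, m.text)
end
```

Keeping the header fields in a table next to the text means the conversation can later be filtered or sorted by sender or timestamp without parsing the file again.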