Lua pattern matching

I have a file, built by pulling values from a Microsoft Lync session, that contains RTF formatting tags. An example file looks like this:

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}

With Lua scripting, I'm trying to remove the RTF tags and just pull out the conversation text. So the result of my function should be:

Craig... please close out of your old client and re-open

I tried using string.gsub with a regex to match the tags and replace them with a space, leaving only the text, but it doesn't work. Here is the gsub line I have so far:

result = string.gsub(s, "\{\*?\\[^{}]+}|[{}]|\\\n?[A-Za-z]+\n?(?:-?\d+)?[ ]?", " ")

      

Any suggestions would be greatly appreciated!

Additional examples:

user1@capital.com @ 2013-01-18 17:48:03Z (TO: user2@capital.com)

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}

user1@capital.com @ 2013-01-18 17:48:57Z (TO: user2@capital.com)

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}

user1@capital.com @ 2013-01-18 17:49:27Z (TO: user2@capital.com)

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}



2 answers


Lua patterns have no alternation operator (|) and no optional groups ((?:...)?). Something like this might work:

s:match("{(.+)}"):gsub("%b{}", ""):gsub("\\%w+", "")

      

will return:



"    Craig...  please  close  >out  of  your  old  client  >and  re-open "

      

The first gsub removes every balanced {...} group along with its contents; the second gsub removes all RTF control words (although some of them seem to allow spaces, so you may need to adjust the pattern).
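The two gsub steps described above can be wrapped in a small helper. What follows is only a minimal sketch, not a full RTF parser. The function name strip_rtf is hypothetical, the control-word pattern (a backslash, letters, then an optional numeric parameter) is an assumption about what the Lync output contains, and the stray > marks in the samples are assumed to be noise. Since Lua patterns have no |, each alternative of the original regex becomes its own gsub call.

```lua
-- Hypothetical helper: strips RTF markup from one Lync message.
local function strip_rtf(rtf)
  -- Take everything inside the outermost braces (fall back to the raw input).
  local body = rtf:match("^%s*{(.*)}%s*$") or rtf
  body = body:gsub("%b{}", "")          -- nested groups: font table, colour table, ...
  body = body:gsub("\\%a+%-?%d*", "")   -- control words, e.g. \embo0, \fs20, \par
  body = body:gsub(">", "")             -- assumption: the '>' marks are noise
  body = body:gsub("%s+", " ")          -- collapse runs of whitespace
  -- Trim both ends; the parentheses drop gsub's second result (the count).
  return (body:gsub("^ ", ""):gsub(" $", ""))
end

-- Long brackets keep the backslashes literal, unlike a quoted string.
local sample = [[{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}]]

print(strip_rtf(sample))  --> I see it
```

Keeping each step as a separate gsub makes it easy to adjust one rule (for example the control-word pattern) without touching the others.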



Try this:

-- Long brackets keep the backslashes literal (in quotes, \r, \f, \a and \v are escapes).
local s = [[
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 >Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 Craig...\embo0 \embo please\embo0 \embo close\embo0 \embo >out\embo0 \embo of\embo0 \embo your\embo0 \embo old\embo0 \embo client\embo0 \embo >and\embo0 \embo re-open\embo0\f1\par {*\lyncflags rtf=1}}
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 works\embo0 \embo for\embo0 \embo me..\embo0 \embo how\embo0 \embo about\embo0 \embo embedding\embo0 \embo pictures?\embo0\f1\par {*\lyncflags rtf=1}}
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}]]
local text = s:gsub('{.-}}?', '')        -- drop each {...} group up to its first closing brace
    :gsub('\\%a+%-?%d*', '')             -- drop control words such as \embo0 and \fs20
    :gsub('>', ''):gsub(' +', ' ')       -- drop the stray '>' marks, collapse runs of spaces
    :gsub(' ?\n ?', '\n'):gsub('^ ', ''):gsub(' $', '')  -- trim every line
print(text)

Output:

Craig... please close out of your old client and re-open
works for me.. how about embedding pictures?
I see it
let's try a meeting.
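The additional samples in the question interleave a header line (sender, timestamp, recipient) with each RTF line. As a rough sketch of how such a log could be split into messages, assuming the header always has the shape "sender @ date time (TO: recipient)" and each RTF body sits on a single line, something like the following might do. The names strip_rtf and parse_log and the record fields are hypothetical.

```lua
-- Hypothetical helper: strip RTF groups, control words and stray '>' marks.
local function strip_rtf(rtf)
  local body = rtf:match("^%s*{(.*)}%s*$") or rtf
  body = body:gsub("%b{}", ""):gsub("\\%a+%-?%d*", ""):gsub(">", "")
  return (body:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", ""))
end

-- Hypothetical helper: pair each header line with the RTF line that follows it.
local function parse_log(log)
  local messages, current = {}, nil
  for line in log:gmatch("[^\r\n]+") do          -- blank lines are skipped
    local from, stamp, to =
      line:match("^(%S+) @ (%S+ %S+) %(TO:%s*(%S-)%s*%)")
    if from then
      current = { from = from, time = stamp, to = to }
    elseif current and line:find("^%s*{\\rtf") then
      current.text = strip_rtf(line)
      messages[#messages + 1] = current
      current = nil
    end
  end
  return messages
end

local log = [[
user1@capital.com @ 2013-01-18 17:48:57Z (TO: user2@capital.com)

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 I\embo0 \embo see\embo0 \embo it\embo0\f1\par {*\lyncflags rtf=1}}

user1@capital.com @ 2013-01-18 17:49:27Z (TO: user2@capital.com)

{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Segoe UI;}{\f1\fnil Segoe UI;}} {\colortbl ;\red0\green0\blue0;} {*\generator Riched20 15.0.4420}{*\mmathPr\mwrapIndent1440 }\viewkind4\uc1 \pard\cf1\embo\f0\fs20 let's\embo0 \embo try\embo0 \embo a\embo0 \embo meeting.\embo0\f1\par {*\lyncflags rtf=1}}
]]

for _, m in ipairs(parse_log(log)) do
  print(m.time, m.from, m.text)
end
```

When reading from a real file, io.lines could feed the same loop line by line instead of splitting one large string.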

