F#, RProvider, the R tm package and the (almost) Ovid example
Bear in mind that I'm new to F#, and even newer to R, so feel free to point me to the manual (RTFM) or elsewhere ;-)
I started looking into some data mining with R and the tm package.
I have the following script in R, which is very similar to the tm example for parsing Ovid (to run the actual Ovid example, replace "lgtextfull" with "txt" and language = "no" with language = "lat"):
library(tm)
library(SnowballC)
txt <- system.file("texts", "lgtextfull", package = "tm")
(lgorg <- VCorpus(DirSource(txt, encoding = "UTF-8"),
                  readerControl = list(language = "no")))
lg <- tm_map(lgorg, stripWhitespace)
So, as a starting point, I went for F#, R, Deedle and RProvider. I haven't used Deedle here, so it can be ignored.
I tried to write the following F#:
#I "../packages/RProvider.1.0.17/"
#load "RProvider.fsx"
open RProvider
open RDotNet
open RProvider.``base``
open RProvider.tm
open RProvider.openNLP
open RProvider.SnowballC
let txt = R.system_file("texts", "lgtextfull", package = "tm", lib_loc = null, mustWork = true)
let lang = dict [("language", "no" :> obj)]
let readerControl = R.list(lang)
let dirsource = R.DirSource(txt, encoding = "UTF-8")
let lgorg = R.VCorpus(dirsource, readerControl)
let lg = R.tm__map(lgorg, R.stripWhitespace)
The F# version is more spread out than the R script mainly so that I could understand each step and get it working.
After some back and forth, this runs and reports the same to the REPL as it did in R, with the exception of the last line:
let lg = R.tm__map(lgorg, R.stripWhitespace)
Which gives an error like:
System.Exception: No converter registered for type FSI_0006+lg@81 or any of its base types
at RProvider.RInteropInternal.convertToR@164.Invoke(String message) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 164
at Microsoft.FSharp.Core.PrintfImpl.go@523-3[b,c,d](String fmt, Int32 len, FSharpFunc`2 outputChar, FSharpFunc`2 outa, b os, FSharpFunc`2 finalize, FSharpList`1 args, Int32 i)
at Microsoft.FSharp.Core.PrintfImpl.run@521[b,c,d](FSharpFunc`2 initialize, String fmt, Int32 len, FSharpList`1 args)
at Microsoft.FSharp.Core.PrintfImpl.capture@540[b,c,d](FSharpFunc`2 initialize, String fmt, Int32 len, FSharpList`1 args, Type ty, Int32 i)
at <StartupCode$FSharp-Core>.$Reflect.Invoke@720-4.Invoke(T1 inp)
at RProvider.RInteropInternal.REngine.SetValue(REngine this, Object value, FSharpOption`1 symbolName) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 274
at RProvider.RInteropInternal.toR(Object value) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 287
at RProvider.RInterop.passArg@431(List`1 tempSymbols, Object arg) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 445
at RProvider.RInterop.argList@452-1.GenerateNext(IEnumerable`1& next) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 453
at Microsoft.FSharp.Core.CompilerServices.GeneratedSequenceBase`1.MoveNextImpl()
at Microsoft.FSharp.Core.CompilerServices.GeneratedSequenceBase`1.System-Collections-IEnumerator-MoveNext()
at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
at RProvider.RInterop.callFunc(String packageName, String funcName, IEnumerable`1 argsByName, Object[] varArgs) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 450
at RProvider.RInterop.call(String packageName, String funcName, String serializedRVal, Object[] namedArgs, Object[] varArgs) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 494
at <StartupCode$FSI_0006>.$FSI_0006.main@() in C:\Users\helgeu\Documents\Visual Studio 2012\Projects\DisqusAnalyzer\DisqusAnalyzer.Lib\InteractiveSession.fsx:line 81
Stopped due to error
I must admit I don't understand any of this, and Google hasn't helped me either :-)
Anyone? Any pointers? Should this work? Am I doing it wrong?
I suspect the problem is that the second parameter of tm__map is an R function. When you write R.stripWhitespace, you get an F# closure that we can't convert back to an R function.
A workaround could be to evaluate an expression that returns the R function as a SymbolicExpression and then pass that as an argument:
let stripWhite = R.eval(R.parse(text="stripWhitespace"))
let lg = R.tm__map(lgorg, stripWhite)
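As a side check on the closure explanation, here is a minimal plain-F# sketch with no RProvider involved. Demo.Strip is a hypothetical stand-in for a provided method such as R.stripWhitespace. The point is only that naming a method without applying it yields a compiler-generated function object, which would be consistent with the odd type name lg@81 in the error message.

```fsharp
// Hypothetical stand-in for a provided method such as R.stripWhitespace.
// It is not part of RProvider; it only mimics "a static method taking one argument".
type Demo =
    static member Strip(s: string) = s.Trim()

// Naming the method without applying it makes the compiler wrap it in a closure.
let f = Demo.Strip

// Inspect the runtime type of that closure. Its name is compiler-generated
// (something of the form f@<line>), and it derives from the F# function type.
let closureType = (box f).GetType()
printfn "runtime type: %s" closureType.FullName
printfn "is an F# function object: %b"
    (typeof<string -> string>.IsAssignableFrom closureType)
```

Assuming this matches what happens with R.stripWhitespace, the value handed to tm__map is an object of such a generated class rather than anything R knows about, which is why fetching the function on the R side sidesteps the conversion.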
The solution provided by Tomas Petricek does not work as written, because the stripWhitespace function comes from the tm package. To make it work, you must use the fully qualified name of the function:
let stripWhitespace = R.eval(R.parse(text="tm::stripWhitespace"))
let lg = R.tm__map(lgorg, stripWhitespace)
This will do the trick.