F#, RProvider, the R tm package and the (almost) Ovid example

Bear in mind that I'm new to F#, and even newer to R, so feel free to tell me to RTFM or point me elsewhere ;-)

I started looking into some data mining with R and the tm package.

I have the following R script, which by the way is very similar to the example for parsing Ovid (replace "lgtext" with "txt" and language = "no" with language = "lat" to run it against the real Ovid example):


txt <- system.file("texts", "lgtextfull", package = "tm")
(lgorg <- VCorpus(DirSource(txt, encoding = "UTF-8"), 
          readerControl = list(language = "no")))

lg <- tm_map(lgorg , stripWhitespace)


So, as a start, I went with F#, R, Deedle and RProvider. I haven't actually used Deedle yet, so it can be ignored.

I tried writing the following F#:

#I "../packages/RProvider.1.0.17/"

#load "RProvider.fsx"

open RProvider
open RDotNet

open RProvider.``base``
open RProvider.tm
open RProvider.openNLP
open RProvider.SnowballC

let txt = R.system_file("texts", "lgtextfull", package = "tm", lib_loc = null, mustWork=true )
let lang =  dict [("language", "no":>obj)]
let readerControl = R.list(lang)
let dirsource = R.DirSource(txt, encoding = "UTF-8")

let lgorg = R.VCorpus(dirsource, readerControl)

let lg =  R.tm__map(lgorg, R.stripWhitespace)


The F# version is more spread out than the R script mainly so that I can understand each step and get it working.

After some back and forth, this seems to work and reports the same thing in the REPL as it did in R, except for the last line:

let lg =  R.tm__map(lgorg, R.stripWhitespace)


which gives an error like this:

System.Exception: No converter registered for type FSI_0006+lg@81 or any of its base types
   at RProvider.RInteropInternal.convertToR@164.Invoke(String message) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 164
   at Microsoft.FSharp.Core.PrintfImpl.go@523-3[b,c,d](String fmt, Int32 len, FSharpFunc`2 outputChar, FSharpFunc`2 outa, b os, FSharpFunc`2 finalize, FSharpList`1 args, Int32 i)
   at Microsoft.FSharp.Core.PrintfImpl.run@521[b,c,d](FSharpFunc`2 initialize, String fmt, Int32 len, FSharpList`1 args)
   at Microsoft.FSharp.Core.PrintfImpl.capture@540[b,c,d](FSharpFunc`2 initialize, String fmt, Int32 len, FSharpList`1 args, Type ty, Int32 i)
   at <StartupCode$FSharp-Core>.$Reflect.Invoke@720-4.Invoke(T1 inp)
   at RProvider.RInteropInternal.REngine.SetValue(REngine this, Object value, FSharpOption`1 symbolName) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 274
   at RProvider.RInteropInternal.toR(Object value) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 287
   at RProvider.RInterop.passArg@431(List`1 tempSymbols, Object arg) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 445
   at RProvider.RInterop.argList@452-1.GenerateNext(IEnumerable`1& next) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 453
   at Microsoft.FSharp.Core.CompilerServices.GeneratedSequenceBase`1.MoveNextImpl()
   at Microsoft.FSharp.Core.CompilerServices.GeneratedSequenceBase`1.System-Collections-IEnumerator-MoveNext()
   at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source)
   at RProvider.RInterop.callFunc(String packageName, String funcName, IEnumerable`1 argsByName, Object[] varArgs) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 450
   at RProvider.RInterop.call(String packageName, String funcName, String serializedRVal, Object[] namedArgs, Object[] varArgs) in c:\Tomas\Public\FSharp.RProvider\src\RProvider\RInterop.fs:line 494
   at <StartupCode$FSI_0006>.$FSI_0006.main@() in C:\Users\helgeu\Documents\Visual Studio 2012\Projects\DisqusAnalyzer\DisqusAnalyzer.Lib\InteractiveSession.fsx:line 81
Stopped due to error


I must admit I don't understand any of this, and Google hasn't helped me either :-)

Anyone? Any pointers? Should this work? Am I doing it wrong?



2 answers

I suspect the problem is that the second parameter of tm__map is an R function. When you write R.stripWhitespace, you get an F# closure that we can't convert back to an R function.

A workaround could be to evaluate an expression that returns the R function as a SymbolicExpression and then pass that as the argument:

let stripWhite = R.eval(R.parse(text="stripWhitespace"))
let lg =  R.tm__map(lgorg, stripWhite)




The solution provided by Tomas Petricek does not work as written, because the stripWhitespace function comes from the tm package. To make it work, you must use the fully qualified name of the function:

let stripWhitespace = R.eval(R.parse(text="tm::stripWhitespace"))
let lg = R.tm__map(lgorg, stripWhitespace)


This will do the trick.
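
If you need to pass several R functions this way, the same pattern can be wrapped in a small helper. This is only a sketch: rFunction is a hypothetical name (it is not part of RProvider), it assumes the named package is installed, and lgorg is the corpus built in the question. It uses only the R.eval and R.parse calls shown above.

```fsharp
open RProvider
open RProvider.``base``
open RProvider.tm

// Hypothetical helper (not part of RProvider): looks up an R function by its
// package-qualified name and returns it as a SymbolicExpression, so it can be
// passed to other R functions instead of an F# closure.
let rFunction (pkg: string) (name: string) =
    let expr = sprintf "%s::%s" pkg name   // e.g. "tm::stripWhitespace"
    R.eval(R.parse(text = expr))

// Same call as above; assumes lgorg is the VCorpus from the question.
let stripWhitespace = rFunction "tm" "stripWhitespace"
let lg = R.tm__map(lgorg, stripWhitespace)
```

The point of the helper is just that the value handed to tm__map is already an R object, so RProvider has nothing to convert.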


