Grow ffdf data frame gradually

From the save.ffdf documentation:

Using save.ffdf automatically sets the finalizers of the ff vectors to "close". This means that the data will be preserved on disk when the object is removed or the R session is closed. Data can be deleted either using delete or by removing the directory where the object was saved (dir).

I want to start with a small ffdf data frame, add a bit of new data at a time, and grow it on disk. So I did a little experiment:

# in R
library(ffbase)  # also attaches ff
ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")

# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

# in R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
rm(ffiris)

# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

      

It turns out the ff data on disk is not updated automatically when ffiris is removed. How about saving manually?

# in R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
save.ffdf(ffiris, "~/Desktop/iris")

# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff  ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff  ffiris$Species.ff

      

Hmm, no luck so far. Why?

How about deleting the folder before saving?

# in R
ffiris = as.ffdf(iris)
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)

# in bash
ls ~/Desktop/iris/
# ls: /Users/ky/Desktop/iris/: No such file or directory

      

Even stranger. And even if all of this worked, it would still be terribly inefficient. I am looking for something like:

updateOnDisk(ffiris)

      

Can anyone please help?

1 answer


ff and ffbase offer out-of-memory R vectors, but introduce reference semantics that can cause problems with R idioms.

R is a functional programming language, meaning that functions do not modify their arguments but return modified copies. In ffbase we implemented functions the R way, i.e. transform returns a copy of the original ffdf data frame. This can be seen by looking at the filenames:

library(ffbase)  # also attaches ff

ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")
filename(ffiris)  # the column files are now in ~/Desktop/iris

ffiris = transform(ffiris, new1 = 99)  # this creates a copy of the whole data frame!
filename(ffiris)  # compare with the filenames printed above

ffiris$new2 <- ff(rep(99, nrow(iris)))  # this creates a new column, but not yet in the right directory
filename(ffiris)

save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE)  # this fixes that
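
One way to check that a newly added column really reached the directory is to reload the saved data into a fresh environment. The following is only a minimal sketch: it assumes ffbase's load.ffdf is available, and it uses a temporary directory (d, a name chosen here purely for illustration) instead of ~/Desktop/iris.

```r
library(ffbase)  # also attaches ff

# Hypothetical scratch directory, used only for this illustration
d <- file.path(tempdir(), "iris_ff")

ffiris <- as.ffdf(iris)
save.ffdf(ffiris, dir = d, overwrite = TRUE)

# Add a column in place, then save again so that its file is moved into d
ffiris$new2 <- ff(rep(99, nrow(iris)))
save.ffdf(ffiris, dir = d, overwrite = TRUE)

# Reload into a separate environment and inspect what was stored
e <- new.env()
load.ffdf(dir = d, envir = e)
print(names(e$ffiris))
print(list.files(d))
```

If the column names printed from the reloaded object include new2, the second save.ffdf call has picked up the added column.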

      



transform is currently inefficient for adding a new column, because it copies the entire data frame (these are R semantics). The reason is that the result of a transform may be temporary, and you do not want it to change the original data.
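
The copy semantics described here are not specific to ff. As a small illustration in base R (the names df and df2 are made up for this sketch), transform on an ordinary data.frame also leaves its input untouched and returns a new object:

```r
# Plain data.frame: transform() returns a new object, the input is untouched
df <- data.frame(x = 1:3)
df2 <- transform(df, y = x * 2)

print(names(df))   # columns of the original
print(names(df2))  # columns of the returned copy
```

For an in-memory data.frame this copy is cheap; for an ffdf it means new files are written for every column, which is why the copy becomes expensive.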

In ffbase2 we are fixing this problem.
