Assigning to data.frame with `with`

Here is an example that assigns a new column in two different ways, one that works and one that doesn't:

library(datasets)
dat <- as.data.frame(ChickWeight)
dat$test1 <- with(dat, Time + weight)
with(dat, test2 <- Time + weight)
> colnames(dat)
[1] "weight" "Time"   "Chick"  "Diet"   "test1" 

      

I am used to this behavior. Perhaps more surprising is that `test2` just disappears (instead of ending up in the global environment, as I would expect):

> ls(pattern="test")
character(0)

      

Note that `with` is a short function:

function (data, expr, ...) 
eval(substitute(expr), data, enclos = parent.frame())

      

First, let's replicate what `with` does:

eval( substitute(Time+weight), envir=dat, enclos=parent.frame() )

      

Now test a different enclosure:

testEnv <- new.env()
eval( substitute(test3 <- Time+weight), envir=dat, enclos=testEnv )
ls( envir=testEnv )

      

`test3` is not assigned in `testEnv`, or anywhere else. This disproves my guess that it has to do with the environment being discarded, and points instead to something more fundamental: the `enclos` argument does not do what I think it does.

I am curious about the mechanics of why this happens, and whether there is an alternative that allows assignment.


3 answers


Change `with` to `within`. `with` is intended only to make variables available, not to change them.

Edit: To elaborate, I believe that both `with` and `within` create a new environment, populate it with the given list-like object (such as a data frame), and then evaluate the given expression inside that environment. The difference is that `with` returns the result of the expression and discards the environment, whereas `within` returns the environment (converted back to whatever class the object originally had, for example data.frame). In either case, any assignments made inside the expression are presumably done inside the generated environment, which `with` discards. This explains why `test2` is nowhere to be found after running `with(dat, test2 <- Time + weight)`.
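To make the mechanics concrete, here is a minimal sketch that reuses the question's `dat` and `testEnv`. It relies on the fact that calling `environment()` with no arguments returns the environment the code is currently running in, which lets us capture the temporary environment that `eval()` builds from the data frame. The name `tmp` is just for illustration.

```r
library(datasets)
dat <- as.data.frame(ChickWeight)
testEnv <- new.env()

# Run the assignment, then return the environment it was evaluated in,
# so the otherwise discarded temporary environment can be inspected.
tmp <- eval(quote({
  test3 <- Time + weight
  environment()
}), envir = dat, enclos = testEnv)

exists("test3", envir = tmp, inherits = FALSE)  # is test3 in the temporary environment?
identical(parent.env(tmp), testEnv)             # is enclos merely its parent?
ls(testEnv)                                     # what ended up in testEnv itself?
```

If the reasoning above is right, the first two checks are true and `testEnv` stays empty: `enclos` only decides where lookups continue when a name is not a column of the data, not where assignments go.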

Note that since `within` returns the modified object rather than editing it in place (i.e. call-by-value semantics), you need to write `dat <- within(dat, test2 <- Time + weight)`.
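A short check of that point, using the question's ChickWeight data. The intermediate name `modified` is only there for illustration; in practice you would assign straight back to `dat`.

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# within() returns a modified copy; the original is left alone
# until the result is assigned back.
modified <- within(dat, test2 <- Time + weight)

"test2" %in% colnames(dat)       # original data frame
"test2" %in% colnames(modified)  # returned copy

dat <- modified                  # keep the new column
```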

If you want a function to perform an assignment in the current environment (or any given environment), take a look at `assign`.
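For example, one way to combine the two, sketched here under the assumption that you want the result in the global environment, is to let `with` compute the value and `assign` store it under a name given as a string:

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# with() only computes the value; assign() decides where it is stored.
assign("test2", with(dat, Time + weight), envir = globalenv())

head(test2)
# [1]  42  53  63  70  84 103
```

Any other environment can be passed as `envir` in place of `globalenv()`.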

Edit 2: The modern answer is to embrace the tidyverse and use magrittr and dplyr:

library(datasets)
library(dplyr)
library(magrittr)
dat <- as.data.frame(ChickWeight)
dat %<>% mutate(test1 = Time + weight)

      



The last line is equivalent to

dat <- dat %>% mutate(test1 = Time + weight)

      

which in turn is equivalent to

dat <- mutate(dat, test1 = Time + weight)

      

Use whichever of the last three lines makes the most sense to you.



Inspired by the fact that the following works from the command line ...

eval(substitute(test <- Time + weight, dat))

      

... I put together the following which seems to work.



myWith <- function(DAT, expr) {
    X <- call("eval", 
              call("substitute", substitute(expr), DAT))
    eval(X, parent.frame())
}

## Trying it out
dat <- as.data.frame(ChickWeight)
myWith(dat, test <- Time + weight)
head(test)
# [1]  42  53  63  70  84 103

      

(The tricky aspect of this problem is that we need `substitute()` to look up symbols in one environment (the current frame), while the "outer" `eval()` has to assign into a different environment (the parent frame).)
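An alternative sketch of the same idea, not taken from the answer above: build the evaluation environment explicitly with `list2env()`, evaluate the expression there, and then copy any newly created names into the caller's frame. The function name `withAssign` and its `target` argument are hypothetical, and this version only exports new variables; changes to existing columns are not copied back.

```r
library(datasets)

# Hypothetical helper: like with(), but variables created by `expr`
# are copied into `target` (by default the caller's environment).
withAssign <- function(data, expr, target = parent.frame()) {
  env <- list2env(data, parent = target)   # columns become variables
  before <- ls(env, all.names = TRUE)      # names present before evaluation
  result <- eval(substitute(expr), env)    # assignments land in env
  created <- setdiff(ls(env, all.names = TRUE), before)
  for (nm in created) {
    assign(nm, get(nm, envir = env), envir = target)
  }
  invisible(result)
}

## Trying it out
dat <- as.data.frame(ChickWeight)
withAssign(dat, test4 <- Time + weight)
head(test4)
```

Making the copy step explicit avoids nesting `substitute()` inside a constructed call, at the cost of a few more lines.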



My sense is that this is being made too complicated. Both `with` and `within` return values calculated by operations on named data columns. If you don't assign the result to anything, the value will be garbage collected. The usual way to store it is to assign it to a named object, or perhaps to a component of an object, with the `<-` operator. `within` returns the entire data frame, whereas `with` returns only the vector that was calculated from whatever operations were performed on the column names. You can of course use `assign` instead of `<-`, but I think overuse of that function can obscure rather than clarify the code. The difference in usage is just whether you assign to the whole data frame or to a single column:

 dat <- within(dat, newcol <- oldcol1*oldcol2)
 dat$newcol <- with(dat,  oldcol1*oldcol2)
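As a runnable version of those two lines, here is a sketch that substitutes the question's ChickWeight columns `Time` and `weight` for the placeholder names `oldcol1` and `oldcol2`, and compares the resulting columns. The names `viaWithin` and `viaWith` are illustrative only.

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# Form 1: within() returns the whole data frame with the new column added.
viaWithin <- within(dat, newcol <- Time * weight)

# Form 2: with() returns just the vector, which is assigned to one column.
viaWith <- dat
viaWith$newcol <- with(dat, Time * weight)

# Both forms should produce the same column values.
identical(viaWithin$newcol, viaWith$newcol)
```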

      
