Assigning to data.frame with `with`
Here's an example that attempts the same assignment in two different ways, one that works and one that doesn't:
library(datasets)
dat <- as.data.frame(ChickWeight)
dat$test1 <- with(dat, Time + weight)
with(dat, test2 <- Time + weight)
> colnames(dat)
[1] "weight" "Time" "Chick" "Diet" "test1"
I am used to this behavior. Perhaps more surprising is that test2 simply disappears (instead of being assigned in the calling environment, as I would expect):
> ls(pattern="test")
character(0)
Note that `with` is a short function:
function (data, expr, ...)
eval(substitute(expr), data, enclos = parent.frame())
First, let's replicate the functionality of `with`:
eval( substitute(Time+weight), envir=dat, enclos=parent.frame() )
Now test a different enclosure:
testEnv <- new.env()
eval( substitute(test3 <- Time+weight), envir=dat, enclos=testEnv )
ls( envir=testEnv )
The variable is not assigned there either. This disproves my guess that it has to do with the environment being discarded, and rather points to something more fundamental: the `enclos` argument not doing what I think it does.
I am curious about the mechanics, why this happens, and whether there is an alternative that allows assignment.
Change `with` to `within`. `with` is intended only to make variables available, not to change them.
Edit: To elaborate, I believe that both `with` and `within` create a new environment, populate it with the given list-like object (such as a data frame), and then evaluate the given expression inside that environment. The difference is that `with` returns the result of the expression and discards the environment, whereas `within` returns the environment (converted back to whatever class the object originally had, for example data.frame). In either case, any assignments made inside the expression are presumably done inside the generated environment, which `with` discards. This explains why `test2` is nowhere to be found after executing `with(dat, test2 <- Time + weight)`.
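The mechanism described here can be approximated by hand. This is only a minimal sketch, not the actual implementation of `with`: it assumes that `list2env` is a fair stand-in for the temporary environment built from the data frame, and the names `caller` and `tmp` are made up for illustration.

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# Assumption: mimic the temporary environment by copying the columns into
# a fresh environment whose parent stands in for the calling environment.
caller <- new.env()
tmp <- list2env(as.list(dat), parent = caller)

# Evaluate the assignment inside the temporary environment.
eval(quote(test2 <- Time + weight), tmp)

exists("test2", envir = tmp, inherits = FALSE)     # TRUE: it lives here
exists("test2", envir = caller, inherits = FALSE)  # FALSE: not in the enclosure
```

Once `tmp` goes out of scope, `test2` goes with it, which matches the behavior seen with `with`.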
Note that since `within` returns the changed object instead of editing it in place (i.e. call-by-value semantics), you need to do `dat <- within(dat, test2 <- Time + weight)`.
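A short check of this copy-on-return behavior, using the same dataset as the question (the name `modified` is only for illustration):

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# within() returns a modified copy; the original data frame is untouched
# until the result is assigned back to it.
modified <- within(dat, test2 <- Time + weight)

"test2" %in% colnames(modified)  # TRUE
"test2" %in% colnames(dat)       # FALSE
```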
If you want a function to perform an assignment to the current environment (or any given environment), take a look at `assign`.
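As a rough sketch of that approach, the computation can stay inside `with` while `assign` decides where the result goes. The environment `target` is a hypothetical destination chosen for this example; passing `envir = globalenv()` would create a global variable instead.

```r
library(datasets)
dat <- as.data.frame(ChickWeight)

# Hypothetical destination environment for the new variable.
target <- new.env()

# Compute with with(), then store the result explicitly with assign().
assign("test2", with(dat, Time + weight), envir = target)

ls(envir = target)
head(get("test2", envir = target))
```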
Edit 2: The modern answer is to embrace the tidyverse and use magrittr and dplyr:
library(datasets)
library(dplyr)
library(magrittr)
dat <- as.data.frame(ChickWeight)
dat %<>% mutate(test1 = Time + weight)
The last line is equivalent to
dat <- dat %>% mutate(test1 = Time + weight)
which in turn is equivalent to
dat <- mutate(dat, test1 = Time + weight)
Use whichever of the last three lines makes the most sense to you.
Inspired by the fact that the following works from the command line ...
eval(substitute(test <- Time + weight, dat))
... I put together the following which seems to work.
myWith <- function(DAT, expr) {
    X <- call("eval",
              call("substitute", substitute(expr), DAT))
    eval(X, parent.frame())
}
## Trying it out
dat <- as.data.frame(ChickWeight)
myWith(dat, test <- Time + weight)
head(test)
# [1] 42 53 63 70 84 103
(The tricky aspect of this problem is that we need substitute() to look up symbols in one environment (the current frame), while the "outer" eval() assigns into a different environment (the parent frame).)
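To see why this works, it may help to look at what `substitute` does when given a data frame as its second argument. The following is a small sketch on toy data; `small`, `inlined`, and `out` are names invented for this example.

```r
# Toy data frame standing in for dat.
small <- data.frame(a = 1:2, b = 3:4)

# substitute() replaces the symbols a and b with the column values,
# leaving an assignment call that no longer depends on the data frame.
inlined <- substitute(out <- a + b, small)

# Evaluating that call performs the assignment in the current environment.
eval(inlined)
out  # 4 6
```

Because the column values are already inlined, the outer `eval` is free to run in whichever environment should receive the new variable.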
I think this is being made too complicated. Both `with` and `within` return values calculated by operations on named data columns. If you don't assign the result to anything, the value will be garbage collected. The usual way to store it is to assign it to a named object, or perhaps to a component of an object, with the `<-` operator. `within` returns the entire data frame, whereas `with` returns only the vector calculated from the operations performed on the column names. You can of course use `assign` instead of `<-`, but I think overuse of that function can cloud rather than clarify the code. The difference in usage is just whether you assign the whole data frame or only a column:
dat <- within(dat, newcol <- oldcol1*oldcol2)
dat$newcol <- with(dat, oldcol1*oldcol2)