R to sqldf
I need to loop over the sqldf statements, and for that I need to call the loop variable inside the sqldf code:
My data tables can be:
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE)
then i format loaddate as date:
data$loaddate<-as.Date(as.character(data$loaddate), format='%Y-%m-%d')
Let's say I have a vector "loaddates":
loaddates<- unique(sort(data$loaddate))
And I need to run the following code for each loaddate:
for (i in loaddates) {
sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
") }
But I am getting followng error:
Error in sqliteSendQuery (con, statement, bind.data): error in statement: no such column: i
Is there a way to store the value of a variable and use it inside a loop?
Thank.
EDITION:
I tried:
sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.LoadDate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000))
But I got:
> [1] loaddate DaysRange DaysRangeNext clientes <0 rows>
> (or 0-length row.names)
source to share
The variable i
will not be replaced in the request as is. You need sprintf
to assign a value to it. (I also don't know if you need to account for line breaks, but to be sure I provide it below. Maybe you don't need it for sqldf, in which case just remove strwrap).
#let assume loaddates is the following:
loaddates <- 'something'
One of the ways to get the request according to your desire, i.e. without break lines and i
given the loaddates values:
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM deuda AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
which will output:
[1] "SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, COUNT(*) AS clientes FROM deuda AS D WHERE D.CodEmp = 'TGG' and D.loaddate = something GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext ORDER BY D.DaysRange, D.DaysRangeNext"
This is what you need in one line with no break lines or with no variable i
assigned.
In your loop, it should be:
for (i in loaddates) {
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM deuda AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
}
With your dataset:
library(sqldf)
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE,stringsAsFactors=F)
loaddates<- unique(sort(data$loaddate))
for (i in loaddates) {
print(sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000) ))
}
Output:
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
source to share
You can get this by specifying the SQL statement in a loop, but outside the function call.
for (i in loaddates) {
statement = paste( " SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = ", i,
"GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext " )
sqldf(statement)
}
source to share
fn$sqldf
allows to use $
inside sql statement to interpolate R variables. See Example 5 on the sqldf github home page and see the bottom of the man page ?fn
for many more examples. We could reduce it setNames(loaddates, loaddates)
to just loaddates
if we don't need the output names.
Map(function(i)
fn$sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = $i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
"), setNames(loaddates, loaddates))
giving:
$`2014-03-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
$`2014-04-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
source to share