Do dplyr functions on a database tbl execute locally or remotely?

I use dplyr locally and find it a very powerful tool. One thing that comes up in many introductory talks is that you can use it against a database table to "work only with the data you need", through verbs such as `summarize`, `mutate`, etc. I understand how these are translated into SQL statements, but not how many other operations are handled.

For example, if I were working with a database table as a `tbl` and wanted to run a function such as `glm` on the result of my pipeline through `do()`, would `glm` somehow be shipped to the database, or is the data necessarily loaded into R (in whatever reduced form) and `glm` then run locally?

This is an important distinction when the table in question is large. Thanks!


1 answer


Any R-side analysis, such as a call to `glm()`, is done locally. As @joran already pointed out, the databases vignette, the introductory documentation, the development information, and the many examples of dplyr in use that can be found are helpful for learning how certain operations are converted to SQL and executed on the database system. I believe you can create bottlenecks by introducing R-specific analyses in the middle of a chain of operations, where completing the database-backed operations first may be more efficient.
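To make the local/remote split concrete, here is a minimal sketch, not a definitive recipe. It assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed, and it uses an in-memory SQLite database loaded with `mtcars` as a stand-in for a real remote table; the table name `cars` and the model formula are made up for illustration. The verbs before `collect()` are translated to SQL and run in the database, while `glm()` only runs after the rows have been pulled into R.

```r
library(dplyr)
library(dbplyr)  # supplies the SQL translation used by tbl() on a DBI connection

# Assumption: an in-memory SQLite database stands in for a remote server.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "cars", mtcars)

cars_db <- tbl(con, "cars")   # lazy reference; no rows are fetched here

# Still lazy: dplyr only builds a query for the database to run.
by_cyl <- cars_db %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(by_cyl)            # prints the generated SQL

# glm() needs an in-memory data frame, so the rows are pulled first.
local_df <- cars_db %>%
  filter(cyl > 4) %>%         # translated to a WHERE clause, runs in the DB
  select(mpg, wt, hp) %>%     # only these columns are transferred
  collect()                   # from here on everything is local R

fit <- glm(mpg ~ wt + hp, data = local_df)

DBI::dbDisconnect(con)
```

Doing the filtering and column selection before `collect()` keeps the transfer small, which is the efficiency point made above: let the database finish what it can, then hand the reduced result to R.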
