In R, find if two files are different
I would like a pure R method to check whether two arbitrary files are different. So, the equivalent of a Unix file comparison, but it should work on Windows and have no external dependencies.
I am aware of one built-in option, but it seems to want to deal only with R output files and complains loudly if I feed it anything else.
I realize this is not exactly what you are asking for, but I am posting this for the benefit of others who come across this question, want to see the complete differences, and can tolerate external dependencies. In that case, the diffobj package will show them to you with a real diff that runs on Windows, using the same algorithm as GNU diff. In this example, we compare the text of Moby Dick with a copy in which 5 lines have been modified:
    library(diffobj)
    # mob.1.txt and mob.2.txt are variables holding file paths (see the setup code further down)
    diffFile(mob.1.txt, mob.2.txt)  # or 'diffChr' if your data is already in R
If you want something faster but still want to know the locations of the differences, you can get the shortest edit script from the same package:
    ses(readLines(mob.1.txt), readLines(mob.2.txt))
    # "1127c1127" "2435c2435" "6417c6417" "13919c13919"
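If you want to work with those locations programmatically, the entries can be split with base R. This is only a sketch: it assumes the entries follow the GNU diff "normal" format (a line range in the first file, one of the letters `a`, `c` or `d`, then a line range in the second file), and the names `script`, `parts` and `edits` are made up for illustration.

```r
# Edit-script entries copied from the example output above
script <- c("1127c1127", "2435c2435", "6417c6417", "13919c13919")

# Capture: range in file 1, operation letter, range in file 2
parts <- regmatches(script, regexec("^([0-9,]+)([acd])([0-9,]+)$", script))

# Element 1 of each match is the whole string; elements 2 to 4 are the groups
edits <- data.frame(
  from = vapply(parts, `[`, character(1), 2),
  op   = vapply(parts, `[`, character(1), 3),
  to   = vapply(parts, `[`, character(1), 4),
  stringsAsFactors = FALSE
)
print(edits)
```

Here every entry has the operation `c` (change), which matches the setup: lines were modified in place, not added or removed.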
Code to get Moby Dick data (note, I have not set seed, so you will get different lines):
    moby.dick.url <- 'http://www.gutenberg.org/files/2701/2701-0.txt'
    moby.dick.raw <- moby.dick.UC <- readLines(moby.dick.url)
    # Pick 5 random lines and upper-case them in the copy
    to.UC <- sample(length(moby.dick.raw), 5)
    moby.dick.UC[to.UC] <- toupper(moby.dick.UC[to.UC])
    # Write both versions to temporary files
    mob.1.txt <- tempfile()
    mob.2.txt <- tempfile()
    writeLines(moby.dick.raw, mob.1.txt)
    writeLines(moby.dick.UC, mob.2.txt)
Example solution, using the all.equal function (documentation: https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html ):
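    filenameForA <- "my_file_A.txt"
    filenameForB <- "my_file_B.txt"
    # all.equal returns TRUE or a character description of the differences,
    # so wrap it in isTRUE() to get a plain TRUE/FALSE
    isTRUE(all.equal(readLines(filenameForA), readLines(filenameForB)))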
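readLines reads all lines from the file with the given name, and all.equal then determines whether the two sets of lines are the same. Note that all.equal returns either TRUE or a character vector describing the differences, never FALSE, so use isTRUE() if you need a logical result.
Be sure to read the documentation linked above to understand it fully. I have to admit that this may not be the best option for very large files, since both files are read into memory in full, and readLines is meant for text files, not arbitrary binary data.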
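For large or binary files, one possible base-R alternative is to compare the raw bytes in chunks. The following is a minimal sketch, not a tested library function: `files_identical` and its `chunk_size` argument are hypothetical names introduced here, and it assumes R 3.2.0 or later for `file.size`.

```r
# Hypothetical helper (not part of base R): TRUE if both files contain the same bytes
files_identical <- function(path_a, path_b, chunk_size = 1024 * 1024) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  # Files of different size cannot be identical, so nothing needs to be read
  if (file.size(path_a) != file.size(path_b)) return(FALSE)
  con_a <- file(path_a, open = "rb")
  on.exit(close(con_a), add = TRUE)
  con_b <- file(path_b, open = "rb")
  on.exit(close(con_b), add = TRUE)
  repeat {
    chunk_a <- readBin(con_a, what = "raw", n = chunk_size)
    chunk_b <- readBin(con_b, what = "raw", n = chunk_size)
    if (!identical(chunk_a, chunk_b)) return(FALSE)
    # Both connections reached end of file without a mismatch
    if (length(chunk_a) == 0L) return(TRUE)
  }
}

# Usage with throwaway files
file_a <- tempfile(); file_b <- tempfile(); file_d <- tempfile()
writeLines(c("one", "two"), file_a)
writeLines(c("one", "two"), file_b)
writeLines(c("one", "TWO"), file_d)
files_identical(file_a, file_b)  # TRUE
files_identical(file_a, file_d)  # FALSE
```

Only one chunk per file is held in memory at a time, and the function stops at the first mismatch. A shorter alternative is to compare checksums with `tools::md5sum(file_a) == tools::md5sum(file_b)`, which ships with R, although equal hashes are strong evidence of identical files and not a byte-for-byte proof.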