In R, find if two files are different

I would like a pure R method to check whether two arbitrary files are different: the equivalent of diff -q on Unix, but it should work on Windows and have no external dependencies.

I know about tools::Rdiff, but it seems to want to deal with R output files and complains loudly if I feed it anything else.

+8




5 answers


This avoids memory problems when the files are very large:



library(tools)
md5sum("file_1.txt") == md5sum("file_2.txt")
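
The comparison above returns a named logical vector. A minimal sketch of wrapping it in a helper that returns a single TRUE/FALSE, in the spirit of diff -q, might look like the following. The name files_differ is hypothetical (it is not part of base R or tools), and the size check is an added shortcut, not something the answer prescribes.

```r
# Hypothetical helper (not part of base R or tools): TRUE if the files differ.
files_differ <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  # Files of different sizes cannot be identical, so skip hashing.
  if (file.size(path_a) != file.size(path_b)) return(TRUE)
  # md5sum() returns a named character vector; drop the names before comparing.
  hashes <- unname(tools::md5sum(c(path_a, path_b)))
  hashes[1] != hashes[2]
}

# Usage with throwaway files
f1 <- tempfile(); f2 <- tempfile(); f3 <- tempfile()
writeLines(c("a", "b"), f1)
writeLines(c("a", "b"), f2)
writeLines(c("a", "B"), f3)
files_differ(f1, f2)  # FALSE
files_differ(f1, f3)  # TRUE
```

Both tools and file.size() ship with R, so this stays within the "no external dependencies" constraint of the question.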

      

+19




I realize this is not exactly what you are asking for, but I am posting it for the benefit of others who come across this question wanting to see the complete difference and willing to tolerate external dependencies. In that case, diffobj will show the differences with a real diff that works on Windows, using the same algorithm as GNU diff. In this example, we compare the text of Moby Dick with a version in which 5 lines have been modified:

library(diffobj)
diffFile(mob.1.txt, mob.2.txt)   # or 'diffChr' if you have the data in R already

      

Produces:

[Image: diffobj output for the two files]

If you want something faster but still need to know the locations of the differences, you can get the shortest edit script from the same package:



ses(readLines(mob.1.txt), readLines(mob.2.txt))
# [1] "1127c1127"   "2435c2435"   "6417c6417"   "13919c13919"
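
For readers who need the locations without any package, a rough base R sketch is shown below. It is a different and much cruder technique than a shortest edit script: it only compares lines position by position, so it assumes the two files have the same number of lines (as in this example, where lines were changed in place). The function name changed_lines is hypothetical.

```r
# Hypothetical helper: indices of lines that differ position by position.
# This is NOT an edit script; it cannot handle inserted or deleted lines.
changed_lines <- function(old_lines, new_lines) {
  if (length(old_lines) != length(new_lines)) {
    stop("line counts differ; a real diff (e.g. diffobj::ses) is needed")
  }
  which(old_lines != new_lines)
}

old_lines <- c("call me ishmael", "some years ago", "never mind how long")
new_lines <- c("call me ishmael", "SOME YEARS AGO", "never mind how long")
changed_lines(old_lines, new_lines)  # 2
```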

      


Code to get the Moby Dick data (note: I have not set a seed, so you will get different lines):

moby.dick.url <- 'http://www.gutenberg.org/files/2701/2701-0.txt'
moby.dick.raw <- moby.dick.UC <- readLines(moby.dick.url)
to.UC <- sample(length(moby.dick.raw), 5)
moby.dick.UC[to.UC] <- toupper(moby.dick.UC[to.UC])

mob.1.txt <- tempfile()
mob.2.txt <- tempfile()

writeLines(moby.dick.raw, mob.1.txt)
writeLines(moby.dick.UC, mob.2.txt)

      

+6




The closest thing to the Unix command is diffr: it shows a nice side-by-side view with the differing lines highlighted in color.

library(diffr)
diffr(filename1, filename2)

      

This shows:

[Image: diffr output]

+4




Example solution, using the all.equal utility (documented at https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html ):

filenameForA <- "my_file_A.txt"
filenameForB <- "my_file_B.txt"
all.equal(readLines(filenameForA), readLines(filenameForB))

      

Note that readLines(filename) reads all lines from the file with the given name; all.equal can then determine whether the files differ.

Be sure to read the documentation linked above to understand it fully. I have to admit that if the files are very large, this may not be the best option.
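
One detail worth spelling out: all.equal returns TRUE when its arguments match, but a character vector describing the mismatch when they do not, so it should be wrapped in isTRUE() before being used as a condition. Also, readLines strips line terminators, so files that differ only in line endings would compare as equal this way. The sketch below illustrates both points; read_bytes is a hypothetical helper built on readBin for a byte-level comparison, which still loads both files into memory.

```r
path_a <- tempfile(); path_b <- tempfile()
writeLines(c("x", "y"), path_a)
writeLines(c("x", "z"), path_b)

# all.equal() gives TRUE or a description of the differences, never FALSE
cmp <- all.equal(readLines(path_a), readLines(path_b))
same_text <- isTRUE(cmp)   # FALSE here

# Hypothetical helper: read the whole file as raw bytes
read_bytes <- function(path) readBin(path, what = "raw", n = file.size(path))
same_bytes <- identical(read_bytes(path_a), read_bytes(path_b))   # FALSE here
```

identical() on the raw vectors is the stricter of the two checks, since it also catches differences in line endings or a missing final newline.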

+2




all.equal(readLines(f1), readLines(f2))

      

0

