Calculate max length of each field in csv file
I have a Groovy script that iterates through a CSV file and records the maximum length of each field:
def csv = new File('./myfile.csv').text
def max = [] as ArrayList
csv.eachLine { line, count ->
    def params = line.split(',')
    // skip the header line
    if (count > 0) {
        params.eachWithIndex { p, index ->
            if (p.length() > max[index]) {
                max[index] = p.length()
            }
        }
    }
}
println "Max length of fields: ${max}"
I would like to achieve the same goal using R, ideally using a library function.
How can I print the maximum length of each field in a CSV file?
Input example:
foo,bar
abcd,12345
def,234567
Output:
Max length of fields: [4, 6]
Read the data into a data frame and sapply the indicated function over its columns. If the data is in a file, replace text = Lines with file = "myfile.csv". See ?read.csv for additional arguments, which may or may not be needed depending on what your actual file looks like.
# test data
Lines <- "foo,bar
abcd,12345
def,234567"
DF <- read.csv(text = Lines, colClasses = "character")
sapply(DF, function(x) max(nchar(x)))
giving:
foo bar
4 6
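To print the result in the same format as the Groovy script in the question, the named vector can be collapsed into a single string. This is only a minimal sketch reusing the test data from above; max_len and formatted are names introduced here for illustration.

```r
# test data (same as above)
Lines <- "foo,bar
abcd,12345
def,234567"

# read every column as text so field lengths are counted as written
DF <- read.csv(text = Lines, colClasses = "character")

# named integer vector: one maximum length per column
max_len <- sapply(DF, function(x) max(nchar(x)))

# collapse into the bracketed list format used by the Groovy script
formatted <- paste0("Max length of fields: [",
                    paste(max_len, collapse = ", "), "]")
cat(formatted, "\n", sep = "")
# Max length of fields: [4, 6]
```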
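Note: one potential problem is input like the following, where a field such as 1234567e9 looks like a number in scientific notation. Fortunately, this answer still gives the correct result, because colClasses = "character" keeps every field as text: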
Lines <- "foo,bar
abcd,1234567e9
def,234567"
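As a quick check, the same call can be run on this input. The sketch below assumes nothing beyond base R; DF2 and res are names introduced here for illustration. Because the field is kept as the string "1234567e9", it is counted as 9 characters.

```r
# input with a field that looks like a number in scientific notation
Lines <- "foo,bar
abcd,1234567e9
def,234567"

# colClasses = "character" prevents any numeric conversion
DF2 <- read.csv(text = Lines, colClasses = "character")

# maximum number of characters per column
res <- sapply(DF2, function(x) max(nchar(x)))
print(res)
# foo bar
#   4   9
```

Without colClasses = "character", read.csv would be free to parse the column as numeric, and nchar would then count the characters of the number's converted text form rather than of the original field.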
In my experience, the fastest way is to read the file with fread from the data.table package; the rest is the same as in Grothendieck's answer:
library(data.table)  # provides fread
file_path <- './myfile.csv'
dt <- fread(file_path, colClasses = "character")
sapply(dt, function(x) max(nchar(x)))