Calculate max length of each field in csv file

I have a Groovy script that iterates through a CSV file and records the maximum length of each field:

def csv = new File('./myfile.csv').text

def max = []

csv.eachLine { line, count ->
    def params = line.split(',')

    // skip the header line
    if (count > 0) {
        params.eachWithIndex { p, index ->
            // max[index] is null the first time a column is seen
            if (p.length() > (max[index] ?: 0)) {
                max[index] = p.length()
            }
        }
    }
}
println "Max length of fields: ${max}"

I would like to achieve the same goal using R, ideally using a library function.

How can I print the maximum length of each field in a CSV file?

Input example:

foo,bar
abcd,12345
def,234567


Output:

Max length of fields: [4, 6]
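For comparison, the Groovy loop translates almost line for line into base R. This is only a sketch: the sample data is inlined as a string, and, like the Groovy script, it splits naively on commas, so quoted fields containing commas are not handled:

```r
# Sketch of a direct base-R translation of the Groovy loop (no CSV parser).
# Like the Groovy script, it splits naively on "," and so does not handle
# quoted fields that contain commas.
Lines <- "foo,bar
abcd,12345
def,234567"

# For a real file: rows <- readLines("./myfile.csv")
rows <- strsplit(Lines, "\n", fixed = TRUE)[[1]]
fields <- strsplit(rows[-1], ",", fixed = TRUE)  # rows[-1] skips the header line

max_len <- integer(0)
for (f in fields) {
  n <- nchar(f)
  # Pad both vectors with zeros to the same length, then keep the larger value
  # at each position (this also copes with rows that have different field counts).
  k <- max(length(max_len), length(n))
  max_len <- pmax(c(max_len, integer(k - length(max_len))),
                  c(n, integer(k - length(n))))
}
print(max_len)  # [1] 4 6
```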



2 answers


Read the data into a data frame and apply the function shown to each of its columns. If the data is in a file, replace text = Lines with file = "myfile.csv". See ?read.csv for additional arguments that may or may not be needed, depending on what your real file looks like.

# test data
Lines <- "foo,bar
abcd,12345
def,234567"

DF <- read.csv(text = Lines, colClasses = "character")
sapply(DF, function(x) max(nchar(x)))

giving:

foo bar 
  4   6 
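The sapply() result is a named vector. To print it in the exact format the question asks for, one option (a small base-R sketch using the same test data) is to join the values with paste() and print them with cat():

```r
Lines <- "foo,bar
abcd,12345
def,234567"

DF <- read.csv(text = Lines, colClasses = "character")
max_len <- sapply(DF, function(x) max(nchar(x)))  # named vector: foo = 4, bar = 6

# paste() ignores the names, so only the numbers are joined
cat("Max length of fields: [", paste(max_len, collapse = ", "), "]\n", sep = "")
# Max length of fields: [4, 6]
```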


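Note: one potential problem is input like the following, where a field looks like a number. Without colClasses = "character", read.csv would convert that column to numeric, and nchar() would then measure R's formatted version of the number rather than the original text. Fortunately, with colClasses = "character" this answer still gives the correct result: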

Lines <- "foo,bar
abcd,1234567e9
def,234567"
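A quick way to see why colClasses = "character" matters for this input is to read it both ways and compare (a small base-R sketch; only the character-typed result is asserted here):

```r
Lines <- "foo,bar
abcd,1234567e9
def,234567"

# Default read: type conversion turns the "bar" column into numbers, so nchar()
# measures R's formatted version of each number, not the text in the file.
as_numeric <- read.csv(text = Lines)
class(as_numeric$bar)        # "numeric"
max(nchar(as_numeric$bar))

# colClasses = "character" keeps every field as the original text.
as_text <- read.csv(text = Lines, colClasses = "character")
sapply(as_text, function(x) max(nchar(x)))  # foo = 4, bar = 9
```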



In my experience, the fastest way to read the file is the fread function from the data.table package. After that, the approach is the same as in Grothendieck's answer:



library(data.table)

file_path <- './myfile.csv'
dt <- fread(file_path, colClasses = "character")
sapply(dt, function(x) max(nchar(x)))
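A self-contained version of the fread() approach, using the sample input from the question, might look like the sketch below. It assumes the data.table package is installed in a version that supports fread(text = ...); header = TRUE is passed explicitly rather than relying on fread's automatic header detection:

```r
library(data.table)  # assumes the data.table package is installed

Lines <- "foo,bar
abcd,12345
def,234567"

# fread(text = ...) parses a string; pass a file path instead for a real file.
# header = TRUE treats the first line as column names (it is not measured).
dt <- fread(text = Lines, header = TRUE, colClasses = "character")
sapply(dt, function(x) max(nchar(x)))  # foo = 4, bar = 6
```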
