Is this the correct way to read lines from a file and split them into words in Rust?

Editor's Note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.

I use the following function to read the words from a file into a 2D data structure:

fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}

      

Is this the correct, idiomatic, and efficient way to do it in Rust? I am wondering whether collect() needs to be called so often, and whether to_string() needs to be called here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?
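
For readers on current Rust, a rough post-1.0 equivalent of the function above might look like the sketch below. It keeps the same Vec<Vec<String>> shape but wraps it in io::Result instead of unwrapping. The helper name read_terms_from is hypothetical and not part of the original question; BufReader and split_whitespace stand in for the pre-1.0 BufferedReader and words.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical helper: accepts any buffered reader, so it can be fed
// in-memory data as well as a file.
fn read_terms_from<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    reader
        .lines()
        .map(|line| {
            // `line` is an io::Result<String>; map over the Ok value so
            // that a read error is propagated rather than unwrapped.
            line.map(|l| {
                l.split_whitespace()
                    .map(|w| w.to_string())
                    .collect::<Vec<String>>()
            })
        })
        // Collecting an iterator of Results stops at the first error.
        .collect()
}

// Same signature idea as the question's read_terms, with the path
// passed in instead of hard-coded (an assumption for illustration).
fn read_terms(path: &str) -> io::Result<Vec<Vec<String>>> {
    let file = File::open(path)?;
    read_terms_from(BufReader::new(file))
}
```

The two collect() calls are still there: one builds each inner Vec<String>, the other builds the outer Vec. The to_string() call is what copies each word out of the temporary line buffer, so it cannot be dropped while the result owns its strings.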

+3




2 answers


Instead, you can read the entire file into a single String and then build a structure of references that point to the words inside it:

use std::io::{self, Read};
use std::fs::File;

fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}

fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}

      



This reads the file with less overhead, since it can slurp it into a single buffer, whereas reading lines with BufReader involves a lot of copying and allocation: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each of the words. It also uses less memory, because one large String plus vectors of references is more compact than many individual Strings.
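
To make the single-buffer point concrete, here is a small sketch. It repeats words_by_line from the answer and adds borrows_from, a hypothetical helper (not part of the original answer) that checks whether a word's bytes lie inside the big buffer, meaning the word is a borrowed slice rather than a separate allocation.

```rust
// Same function as in the answer: every &str borrows from `s`.
fn words_by_line(s: &str) -> Vec<Vec<&str>> {
    s.lines()
        .map(|line| line.split_whitespace().collect())
        .collect()
}

// Hypothetical check: true if `word` occupies a byte range inside
// `buffer`, i.e. it points into the buffer instead of owning a copy.
fn borrows_from(buffer: &str, word: &str) -> bool {
    let start = buffer.as_ptr() as usize;
    let end = start + buffer.len();
    let word_start = word.as_ptr() as usize;
    word_start >= start && word_start + word.len() <= end
}
```

Calling borrows_from(&whole_file, word) for every word returned by words_by_line(&whole_file) should return true, because split_whitespace only hands out sub-slices of its input and never copies text.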

The downside is that you cannot directly return the structure of references, because it cannot outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with a let before we can call words_by_line. It is possible to work around this with unsafe code by wrapping the String and the references in a private struct, but that is much more complicated.
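
A different workaround, which is not the unsafe approach mentioned above and not from the original answer, is to store byte ranges instead of references. The sketch below assumes a hypothetical Terms type that owns the text and records where each word starts and ends, so the whole value can be returned from a function in safe code.

```rust
use std::ops::Range;

// Hypothetical type: owns the text and keeps each word as a byte range
// into it, so no borrowed data has to escape the function.
struct Terms {
    text: String,
    lines: Vec<Vec<Range<usize>>>,
}

impl Terms {
    fn new(text: String) -> Terms {
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // Offset of the word inside `text`, computed from
                        // the addresses of the two slices.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        Terms { text, lines }
    }

    // Rebuild the borrowed view on demand; it lives as long as `self`.
    fn words_by_line(&self) -> Vec<Vec<&str>> {
        self.lines
            .iter()
            .map(|line| line.iter().map(|r| &self.text[r.clone()]).collect())
            .collect()
    }
}
```

The trade-off is that each word costs two usize values, the same as a &str on typical platforms, and the view has to be rebuilt or indexed through the ranges when it is needed.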

+4




There is a shorter and more readable way to get words from a text file.



use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));

    // Each line is an io::Result<String>; unwrap() panics on a read error.
    for line in reader.lines() {
        for word in line.unwrap().split_whitespace() {
            println!("word '{}'", word);
        }
    }
}

      

+3

