Is this the correct way to read lines from a file and split them into words in Rust?
Editor's Note: This code example applies to pre-1.0 Rust and is not syntactically correct Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I wrote the following function to read the words from a file into a 2D data structure:
```rust
fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
```
Is this the correct, idiomatic, and efficient way to do it in Rust? I am wondering whether `collect()` needs to be called so often, and whether `to_string()` needs to be called here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?
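For readers on current Rust, here is a minimal sketch of how `read_terms` might be written in Rust 1.x while keeping the question's owned `Vec<Vec<String>>` result. It assumes the file is still named `terms.txt` and returns `io::Result` so I/O errors are propagated with `?` instead of unwrapped. The helper `read_terms_from`, which accepts any `BufRead`, is a hypothetical addition so the logic can be exercised on in-memory data.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: splits every line of `reader` into owned words.
fn read_terms_from<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    reader
        .lines()
        .map(|line| -> io::Result<Vec<String>> {
            // `line` is an io::Result<String>; `?` forwards any read error.
            let line = line?;
            // Each word is copied into its own String (one allocation per word).
            Ok(line.split_whitespace().map(String::from).collect())
        })
        // Collecting Results stops at the first error, otherwise yields Ok(Vec<...>).
        .collect()
}

/// Rust 1.x counterpart of the question's `read_terms`.
fn read_terms() -> io::Result<Vec<Vec<String>>> {
    let file = File::open("terms.txt")?;
    read_terms_from(BufReader::new(file))
}

fn main() {
    match read_terms() {
        Ok(terms) => println!("{:?}", terms),
        Err(e) => eprintln!("could not read terms.txt: {}", e),
    }
}
```

Both `collect()` calls are still needed: the inner one builds each line's `Vec<String>`, the outer one builds the list of lines.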
Instead, you can read the entire file into one `String` and then build a structure of references that point to the words inside it:
```rust
use std::fs::File;
use std::io::{self, Read};

// Reads the whole file into a single String.
fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

// Splits each line into words; every &str borrows from `s`, so no word is copied.
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}

fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}
```
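Since Rust 1.26 the standard library provides `std::fs::read_to_string`, which does the same job as `filename_to_string`. A minimal sketch of the same example using it, with `words_by_line` unchanged; because `words_by_line` takes any `&str`, it can also be checked against a string literal without touching the file system.

```rust
use std::fs;

// Same as `words_by_line` above: the inner slices borrow from `s`.
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines()
        .map(|line| line.split_whitespace().collect())
        .collect()
}

fn example_use() -> std::io::Result<()> {
    // `fs::read_to_string` opens the file and reads it into one String.
    let whole_file = fs::read_to_string("terms.txt")?;
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl);
    Ok(())
}

fn main() {
    if let Err(e) = example_use() {
        eprintln!("could not read terms.txt: {}", e);
    }
}
```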
This reads the file with less overhead, because the whole file goes into one buffer, whereas reading lines with `BufReader` involves a lot of copying and allocation: first into the buffer inside `BufReader`, then into a newly allocated `String` for each line, and then into a newly allocated `String` for each word. It also uses less memory, because one large `String` plus vectors of references is more compact than many individual `String`s.
The downside is that you cannot directly return the structure of references, because it cannot outlive the stack frame that holds the one large `String`. In `example_use` above, the large `String` has to be bound with `let` before calling `words_by_line`. It is possible to work around this with unsafe code by wrapping the `String` and the references together in a private struct, but that is much more complicated.
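If the goal is only to return the data from a function, a safe alternative (a different technique from the unsafe wrapper described above) is to store byte ranges into the `String` instead of references, and turn them back into `&str` on demand. A minimal sketch; the `Terms` struct, its `line` method and the `load` function are hypothetical names for illustration.

```rust
use std::ops::Range;

// Hypothetical type: owns the text and records where each word lives,
// so it holds no borrows and can be returned by value.
struct Terms {
    text: String,
    // One entry per line; each entry holds the byte range of every word.
    lines: Vec<Vec<Range<usize>>>,
}

impl Terms {
    fn new(text: String) -> Terms {
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // `word` is a subslice of `text`, so the pointer
                        // difference is its byte offset within `text`.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        Terms { text, lines }
    }

    // Borrows the words of line `i` as &str slices of the owned text.
    fn line(&self, i: usize) -> Vec<&str> {
        self.lines[i].iter().map(|r| &self.text[r.clone()]).collect()
    }
}

// Returning `Terms` works because it owns everything it refers to.
fn load() -> Terms {
    Terms::new(String::from("alpha beta\ngamma"))
}

fn main() {
    let terms = load();
    println!("{:?}", terms.line(0));
}
```

The cost is a small slicing step on each access instead of a stored reference.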
There is a shorter and more readable way to get words from a text file.
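```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));
    // `lines()` yields io::Result<String>, one newly allocated String per line.
    for line in reader.lines() {
        // The words borrow from `line`, so they are only valid inside this iteration.
        for word in line.unwrap().split_whitespace() {
            println!("word '{}'", word);
        }
    }
}
```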
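If the words need to outlive the loop, they have to be copied into owned `String`s, because each line's `String` is dropped at the end of its iteration. A minimal sketch that collects all words into a flat `Vec<String>` and propagates read errors with `?` instead of calling `unwrap()`; the function name `all_words` is hypothetical.

```rust
use std::io::{self, BufRead};

// Hypothetical helper: collects every whitespace-separated word from `reader`.
fn all_words<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut words = Vec::new();
    for line in reader.lines() {
        // Stop at the first read error instead of panicking.
        let line = line?;
        // `line` is dropped after this iteration, so each word is copied out.
        words.extend(line.split_whitespace().map(String::from));
    }
    Ok(words)
}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("file.txt")?;
    let words = all_words(io::BufReader::new(file))?;
    println!("{} words", words.len());
    Ok(())
}
```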