Concatenate a vector of string vectors
I am trying to write a function that takes a vector of string vectors and returns all of them concatenated together, i.e. it returns a single vector of strings.
The best I could do so far was this:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
However, I'm not happy with this result, because it seems I should be able to get Vec<String> directly from the first call to collect, but somehow I can't figure out how to do it.
I'm even more interested in figuring out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even make sense of the function signatures.
So let me try and trace the types of each expression:
- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore
vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.
My guess is that the type inferencer can work out that U=&String based on the type of vals. But if I give the expressions explicit types in the code, this compiles without error:
use std::iter::FlatMap;
use std::slice::Iter;

// print_type_of is a helper, not shown here, that prints the type of its argument.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
    let c = b.collect();
    print_type_of(&c);
    let vals : Vec<&String> = c;
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
Here I see U=Iter<String>... Please help me sort out this mess.
EDIT: Thanks to bluss' tip, I was able to get it down to a single collect like this:
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
I understand that by using into_iter throughout, I pass ownership of vecs to IntoIter and on down the call chain, which lets me avoid copying the data inside the lambda call. Hence, as if by magic, the type system gives me Vec<String> where it previously always gave me Vec<&String>. While it's certainly great to see how this high-level concept is reflected in the library, I wish I knew how this is achieved.
EDIT 2: After a tedious process of guessing, reading the API docs, and using this method to decipher the types, I got them fully annotated (ignoring lifetimes):
use std::iter::FlatMap;
use std::slice::Iter;

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    // Trait objects are written with `dyn` in current Rust editions.
    let f : &dyn Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, &dyn Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
    let vals : Vec<&String> = b.collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}
You could ask yourself: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is in fact crucial, so that we don't have to copy the inner vectors first and then the strings inside them; we simply take ownership of them.
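One way to see where the two result types come from is to write both variants side by side. The sketch below assumes only the standard library; the function names `concat_borrowed` and `concat_owned` are made up for illustration. `IntoIterator` is implemented both for `Vec<T>` (yielding `T`) and for `&Vec<T>` (yielding `&T`), so the same `x.into_iter()` call produces different item types depending on whether `x` is owned or borrowed.

```rust
// Hypothetical function names, used only for this illustration.

/// Borrowing version: the strings stay owned by `vecs`, so each one is cloned.
fn concat_borrowed(vecs: &[Vec<String>]) -> Vec<String> {
    // iter() yields &Vec<String>; into_iter() on a &Vec<String> yields &String,
    // so the first collect() can only build a Vec<&String>.
    let refs: Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
    refs.into_iter().cloned().collect()
}

/// Owning version: the strings are moved out of `vecs`, nothing is cloned.
fn concat_owned(vecs: Vec<Vec<String>>) -> Vec<String> {
    // into_iter() on Vec<Vec<String>> yields Vec<String>;
    // into_iter() on a Vec<String> yields String.
    vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}
```

Calling `concat_borrowed(&data)` leaves `data` usable afterwards, while `concat_owned(data)` consumes it.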
We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation and contents of the same accumulator vector, this operation runs in linear time.
To minimize the time spent growing and reallocating the vector, compute the required capacity up front.
fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let size = vecs.iter().fold(0, |a, b| a + b.len());
    vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
        acc.extend(v); acc
    })
}
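The same pre-allocation idea can also be written as a plain loop. This is only a sketch of an equivalent spelling, not a claim about which version is faster; the name `concat_prealloc` is made up here. It relies on `Vec::append`, which moves the elements out of each inner vector.

```rust
// Hypothetical name; a loop-based variant of the pre-allocating fold.
fn concat_prealloc(vecs: Vec<Vec<String>>) -> Vec<String> {
    // Total number of strings, so the result buffer is allocated once.
    let size: usize = vecs.iter().map(Vec::len).sum();
    let mut acc = Vec::with_capacity(size);
    for mut v in vecs {
        // append() moves every String out of `v` and leaves `v` empty.
        acc.append(&mut v);
    }
    acc
}
```

Because the capacity is reserved before the loop, pushing the elements should not trigger further reallocations of `acc`.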
If you do want to clone all the contents, there is already a method for that, and you can simply use vecs.concat() /* -> Vec<String> */
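As a small illustration of `concat()`: it is a slice method that clones the elements, so it works through a shared reference and leaves the input intact. The wrapper name `concat_cloned` is an assumption made for this example.

```rust
// Hypothetical wrapper around the standard slice method concat().
fn concat_cloned(vecs: &[Vec<String>]) -> Vec<String> {
    // concat() clones every String into a freshly allocated Vec<String>.
    vecs.concat()
}
```

Since only a reference is passed in, the caller can keep using the original nested vector after the call.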
The approach with .flat_map is fine, but if you don't want to clone the strings again, you have to use .into_iter() at all levels (x is a Vec<String>):
vecs.into_iter().flat_map(|x| x.into_iter()).collect()
If you want to clone each string instead, you can use this (changed .into_iter() to .iter(), since x here is a &Vec<String> and both methods actually produce the same result!):
vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()
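For comparison, the cloning variant can also be spelled with `Iterator::flatten` and `Iterator::cloned`, both of which are in the standard library in current Rust. This is just an alternative spelling under that assumption, with a made-up function name:

```rust
// Hypothetical name; clones every string out of a borrowed nested vector.
fn concat_by_clone(vecs: &[Vec<String>]) -> Vec<String> {
    // flatten() walks each &Vec<String> and yields &String;
    // cloned() turns every &String into an owned String.
    vecs.iter().flatten().cloned().collect()
}
```

It takes a slice rather than an owned `Vec<Vec<String>>`, since nothing needs to be moved when every string is cloned anyway.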