Splitting an iterator <(A, B)> into an iterator <a> and an iterator <b>

Question

Splitting an iterator <(A, B)> into an iterator <a> and an iterator <b>

I would like to split the output of an object that implements Iterator<(A,B)>

into two objects that implement Iterator<A>

and Iterator<B>

. Since one of the outputs can be iterated over more than the other, I will need to buffer the output Iterator<(A,B)>

(because I cannot rely on Iterator<(A,B)>

which to clone.) The problem is that the iterator could be infinite, so I cannot just collect the output of the iterator into two buffers and return iterators over two buffers.

It seems to me that I will need to store buffers of objects A

and B

, and whenever one of the buffers is empty, I will fill it with samples from the object Iterator<(A,B)>

. This means that I will need two iterable structures that have mutable references to the input iterator (since they will need to be called next()

in the input to fill the buffers), which is not possible.

So, is there a way to do this in a safe way?

+3

rust

awelkie 30 Aug 14 at 20:55

source share

1 answer

huon · Accepted Answer · 2014-08-31T02:22:11+0000

It is possible. As you have determined, you need mutable references to the underlying iterator from both descriptors, which is possible with an "internally mutable" type , that is, one that uses internal code unsafe

to expose a safe API to access &mut

the aliased data (i.e. contained in &

) by dynamically enforcing invariants that the compiler normally enforces at compile time outside unsafe

.

I assume you are happy to keep two iterators on the same thread ¹ so in this case we want RefCell

. We also need to have access to the RefCell

two handles, resulting in saved either &RefCell<...>

or Rc<RefCell<...>>

. The former would be too restrictive, as it would allow us to use a pair of iterators in the stack frame and below the stack frame in which we are created RefCell

, whereas we want to be able to freely traverse the iterators around, therefore Rc

.

In general we are mainly going to store Rc<RefCell<Iterator<(A,B)>>>

, but only a matter of buffering. The right tool to work with here is RingBuf

as we want efficient front and back push / pop. Thus, what we are using (i.e. inside RefCell

) might look like this:

struct SharedInner<A, B, It> {
    iter: It,
    first: RingBuf<A>,
    second: RingBuf<B>,
}

We can abbreviate the type actually shared as type Shared<A, B, It> = Rc<RefCell<SharedInner<A, B, It>>>;

, which allows us to define iterators:

struct First<A, B, It> {
    data: Shared<A, B, It>
}
impl Iterator<A> for First<A,B,It> {
    fn next(&mut self) -> Option<A> {
        // ...
    }
}

For implementation, the next

first thing to do is get &mut

to SharedInner

through self.data.borrow_mut();

. And then fetch an element from it: check the right buffer or else, get a new element from iter

(remember to buffer the remaining one B

):

let mut inner = self.data.borrow_mut();

inner.first.pop_front().or_else(|| {
    inner.iter.next().map(|(a,b)| {
        inner.second.push(b);
        a
    })
})

Documents: RingBuf.pop_front

, Option.or_else

.

The iterator for the other side is similar. Total:

use std::cell::RefCell;
use std::collections::{Deque, RingBuf};
use std::rc::Rc;

struct SharedInner<A, B, It> {
    iter: It,
    first: RingBuf<A>,
    second: RingBuf<B>
}

type Shared<A, B, It> = Rc<RefCell<SharedInner<A, B, It>>>;

struct First<A, B, It> {
    data: Shared<A, B, It>
}

impl<A,B, It: Iterator<(A,B)>> Iterator<A> for First<A, B, It> {
    fn next(&mut self) -> Option<A> {
        let mut inner = self.data.borrow_mut();

        // try to get one from the stored data
        inner.first.pop_front().or_else(|| 
            // nothing stored, we need a new element.
            inner.iter.next().map(|(a, b)| {
                inner.second.push(b);
                a
            }))
    }
}

struct Second<A, B, It> {
    data: Shared<A, B, It>
}
impl<A,B, It: Iterator<(A,B)>> Iterator<B> for Second<A,B,It> {
    fn next(&mut self) -> Option<B> {
        let mut inner = self.data.borrow_mut();

        inner.second.pop_front().or_else(|| {
            inner.iter.next().map(|(a, b)| {
                inner.first.push(a);
                b
            })
        })
    }
}

fn split<A, B, It: Iterator<(A,B)>>(it: It) -> (First<A, B, It>, 
                                                Second<A, B, It>) {
    let data = Rc::new(RefCell::new(SharedInner { 
        iter: it,
        first: RingBuf::new(),
        second: RingBuf::new(),
    }));

    (First { data: data.clone() }, Second { data: data })
}

fn main() {
    let pairs = range(1u32, 10 + 1).map(|x| (x, 1.0 / x as f64));

    let (mut first, mut second) = split(pairs);

    println!("first:");
    for x in first.by_ref().take(3) {
        println!("  {}", x);
    }

    println!("second:");
    for y in second.by_ref().take(5) {
        if y < 0.2 { break }
        println!("  {}", y);
    }

    let a = first.collect::<Vec<u32>>();
    let b = second.collect::<Vec<f64>>();

    println!("a {}\nb {}", a, b);
}

which prints

first:
  1
  2
  3
second:
  1
  0.5
  0.333333
  0.25
  0.2
a [4, 5, 6, 7, 8, 9, 10]
b [0.166667, 0.142857, 0.125, 0.111111, 0.1]

playpen .

There can be optimized in various ways, for example. when fetching in First

, buffer only the top left B

if a handle exists Second

.

¹ If you want to run them in separate threads, just replace RefCell

with Mutex

and Rc

with Arc

and add the required scores.

Splitting an iterator <(A, B)> into an iterator <a> and an iterator <b>

More articles: