Are there any Rust functions for wrapping an iterator that depends on a reference, so that the wrapper owns the referent?


In this case I want to read integers from standard input that are separated by spaces and newlines. My first attempt was similar to the following code:

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    x.as_str().split_whitespace()
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

The problem is that split_whitespace borrows from a &str, but std::io::stdin().lines() yields owned Strings, so the iterator returned by splitter would borrow from a String that is dropped when splitter returns. I don't want to use x.as_str().split_whitespace().collect(), because I don't want to allocate a temporary vector.

The best solution I could come up with was to use a wrapper that contains the owned String and the iterator that depends on the String, using unsafe code. The wrapper's implementation of Iterator is simply a wrapper for the iterator that depends on the String. This was the result:

mod move_wrapper {
    use std::pin::Pin;
    pub fn to_wrapper<'b, A: 'b, F, B: 'b> (a: A, f: F) -> Wrapper<A,B>
    where
        F: FnOnce (&'b A) -> B
    {
        let contained_a = Box::pin(a);
        // Here is the use of unsafe. It is necessary to create a reference to a that can live as long as needed.
        // This should not be dangerous, as no one outside this module will be able to copy this reference, and a will live exactly as long as b inside Wrapper.
        let b = f(unsafe{&*core::ptr::addr_of!(*contained_a)});
        Wrapper::<A,B> {_do_not_use:contained_a, dependent:b}
    }

    pub struct Wrapper<A,B> {
        _do_not_use: Pin<Box<A>>,
        dependent: B
    }

    impl<A,B: Iterator> Iterator for Wrapper<A,B>
    {
        type Item = B::Item;
        fn next(&mut self) -> Option<Self::Item> {
            self.dependent.next()
        }
    }
}

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    move_wrapper::to_wrapper(x, |a|a.as_str().split_whitespace())
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

Now to the actual question. How would you do this as idiomatically as possible, ideally without any unsafe code (does something like the function called to_wrapper here already exist)? Is my unsafe code actually sound? Is there any way to make my Wrapper work for all traits, not just Iterator?

EDIT

To be clearer, this question is about creating a method you can apply anytime you have to give ownership to something that wants a reference, not about how to read from standard input and parse to integers.

Answer by drewtato

You can't create self-referential types in safe Rust, as explained here. Unfortunately, fixing this issue still leaves the next one.

An Iterator cannot return items that borrow from the iterator itself, because such items could not outlive it, as explained here. Yours is even more restrictive: it's trying to create items that can't exist by the time the next item is fetched, which means you need a lending iterator to make this work in its current state.
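To illustrate what a lending iterator is, here is a minimal sketch using generic associated types (stable since Rust 1.65). The LendingIterator trait and the Words type are hypothetical names chosen for this example, not part of std. Each item borrows from the iterator, so it has to be dropped before the next one is requested:

```rust
/// Hypothetical trait (not in std): every item may borrow from the iterator.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical type: owns a line and lends out its whitespace-separated words.
struct Words {
    line: String,
    index: usize,
}

impl Words {
    fn new(line: String) -> Self {
        Self { line, index: 0 }
    }
}

impl LendingIterator for Words {
    type Item<'a> = &'a str where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        // Skip leading whitespace in the unconsumed part of the line.
        let rest = self.line[self.index..].trim_start();
        let start = self.line.len() - rest.len();
        // The word ends at the next whitespace character or at the end of the line.
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        if len == 0 {
            return None;
        }
        self.index = start + len;
        Some(&self.line[start..self.index])
    }
}
```

Because this trait is not Iterator, adapters such as flat_map, map, and collect are not available on it, and neither is a for loop; a `while let Some(word) = words.next()` loop is the usual way to drive it.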

It would be quite easy to create your Vec with some nested for loops:

use std::fmt::Debug;
use std::str::FromStr;

fn valuereader<A: FromStr>() -> Vec<A>
where
    A::Err: Debug,
{
    let mut v = Vec::new();
    for line in std::io::stdin().lines() {
        for word in line.unwrap().split_whitespace() {
            let a = word.parse().unwrap();
            v.push(a);
        }
    }
    v
}

However, this is not very instructive, and it allocates a new String for every line, plus a Vec that is wasted if you only need an iterator.
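If laziness matters more than allocations, the same idea can stay an iterator in safe code by parsing inside the flat_map closure, so that only owned values leave it. This is a sketch under the assumption that one small Vec per line is acceptable, which the question wanted to avoid; the function name values and the generic reader parameter are made up for this example:

```rust
use std::fmt::Debug;
use std::io::BufRead;
use std::str::FromStr;

/// Hypothetical helper: lazily yields every parsed word of `reader`.
/// Allocates one String and one Vec per line.
fn values<A, R>(reader: R) -> impl Iterator<Item = A>
where
    A: FromStr,
    A::Err: Debug,
    R: BufRead,
{
    reader.lines().flat_map(|line| {
        let line = line.unwrap();
        // Parse while `line` is still alive; only owned values are returned.
        let parsed: Vec<A> = line
            .split_whitespace()
            .map(|word| word.parse().unwrap())
            .collect();
        parsed
    })
}
```

Calling it as `values::<usize, _>(std::io::stdin().lock())` would read from standard input.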

In order to make your original idea work idiomatically, you need an iterator that produces owned items. Fortunately, your final item type is usize (or anything that comes out of parse, which has to be owned), so you can create an iterator that creates those. This only allocates one String, which will grow to the length of the longest line. (playground)

use std::fmt::Debug;
use std::io::BufRead;
use std::marker::PhantomData;
use std::str::FromStr;

#[derive(Debug, Clone)]
struct ValueReader<B, V> {
    // The underlying BufRead
    buffer: B,
    // The current line being read
    line: String,
    // The current offset into the current line
    index: usize,
    // The type being parsed
    _value_type: PhantomData<V>,
}

impl<B, V> ValueReader<B, V> {
    fn new(b: B) -> Self {
        Self {
            buffer: b,
            line: String::new(),
            index: 0,
            _value_type: PhantomData,
        }
    }

    fn value(&mut self) -> Option<V>
    where
        V: FromStr,
        V::Err: Debug,
        B: BufRead,
    {
        loop {
            // Check if line is consumed, or the iterator just started
            if self.line.is_empty() {
                let bytes = self.buffer.read_line(&mut self.line).unwrap();
                // Buffer is completely consumed
                if bytes == 0 {
                    return None;
                }
            }

            let unconsumed = self.line[self.index..].trim_start();
            self.index = self.line.len() - unconsumed.len();

            let Some(word) = unconsumed.split_whitespace().next() else {
                // Line is consumed, reset to original state
                self.index = 0;
                self.line.clear();
                continue;
            };
            self.index += word.len();

            return Some(word.parse().unwrap());
        }
    }
}

impl<B, V> Iterator for ValueReader<B, V>
where
    V: FromStr,
    V::Err: Debug,
    B: BufRead,
{
    type Item = V;

    fn next(&mut self) -> Option<Self::Item> {
        self.value()
    }
}

This could be made more efficient by using fill_buf and consume to only read one word at a time, shortening the max length of the String. It would also be sensible to report errors instead of unwrapping.
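As a rough sketch of the error-reporting suggestion, the iterator could yield a Result instead of unwrapping. The names TryValueReader and ValueError are hypothetical, and the word-splitting logic is a simplified variant of the code above rather than a drop-in patch:

```rust
use std::io::{self, BufRead};
use std::marker::PhantomData;
use std::str::FromStr;

/// Hypothetical error type: reading a line failed, or a word did not parse.
#[derive(Debug)]
enum ValueError<E> {
    Io(io::Error),
    Parse(E),
}

/// Hypothetical fallible counterpart of `ValueReader`.
struct TryValueReader<B, V> {
    buffer: B,
    line: String,
    index: usize,
    _value_type: PhantomData<V>,
}

impl<B, V> TryValueReader<B, V> {
    fn new(buffer: B) -> Self {
        Self { buffer, line: String::new(), index: 0, _value_type: PhantomData }
    }
}

impl<B: BufRead, V: FromStr> Iterator for TryValueReader<B, V> {
    type Item = Result<V, ValueError<V::Err>>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Refill once the current line is consumed, or on the first call.
            if self.index >= self.line.len() {
                self.line.clear();
                self.index = 0;
                match self.buffer.read_line(&mut self.line) {
                    Ok(0) => return None, // end of input
                    Ok(_) => {}
                    Err(e) => {
                        // Discard any partially read data before reporting.
                        self.line.clear();
                        return Some(Err(ValueError::Io(e)));
                    }
                }
            }

            let rest = self.line[self.index..].trim_start();
            let start = self.line.len() - rest.len();
            let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
            self.index = start + len;
            if len == 0 {
                // Only whitespace was left on this line; read the next one.
                continue;
            }
            return Some(self.line[start..self.index].parse().map_err(ValueError::Parse));
        }
    }
}
```

A caller can then decide whether to stop at the first error, for example by collecting into `Result<Vec<usize>, _>`, or to skip bad words and continue.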