How to shuffle str in place

I want to shuffle a String in place in Rust, but I seem to be missing something. The fix is probably trivial...

use std::rand::{Rng, thread_rng};

fn main() {
    // I want to shuffle this string...
    let mut value: String = "SomeValue".to_string();
    let mut bytes = value.as_bytes();
    let mut slice: &mut [u8] = bytes.as_mut_slice();

    thread_rng().shuffle(slice);

    println!("{}", value); 
}

      

The error I am getting is

<anon>:8:36: 8:41 error: cannot borrow immutable dereference of `&`-pointer `*bytes` as mutable
<anon>:8         let mut slice: &mut [u8] = bytes.as_mut_slice();
                                            ^~~~~

      

I read about String::as_mut_vec(), but it is unsafe, so I don't want to use it.



3 answers


There is no really good way to do this, partly because of how UTF-8 encodes strings and partly because of the inherent properties of Unicode and text.

There are at least three layers of things that can be shuffled in a UTF-8 string:

  • raw bytes
  • code points
  • graphemes

Shuffling the raw bytes will most likely produce an invalid UTF-8 string unless the string is pure ASCII. Non-ASCII characters are encoded as specific multi-byte sequences, and a shuffle will almost certainly not leave those bytes in a valid order. Hence, byte shuffling is usually a poor choice.
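To make the byte-level failure concrete, here is a minimal sketch using only the standard library. Reversing the bytes stands in for one possible shuffle outcome (an assumption made so the result is deterministic), and the helper name reversed_bytes_are_valid_utf8 is made up for this example.

```rust
/// Reverses the bytes of `s` (a deterministic stand-in for one possible
/// shuffle outcome) and reports whether the result is still valid UTF-8.
fn reversed_bytes_are_valid_utf8(s: &str) -> bool {
    let mut bytes = s.as_bytes().to_vec();
    bytes.reverse();
    String::from_utf8(bytes).is_ok()
}

fn main() {
    // Pure ASCII: every byte is a complete character, so any order is valid.
    println!("{}", reversed_bytes_are_valid_utf8("SomeValue")); // true
    // "dünya": the two bytes that encode U+00FC end up in the wrong order.
    println!("{}", reversed_bytes_are_valid_utf8("d\u{fc}nya")); // false
}
```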

Shuffling code points (char in Rust) makes a little more sense, but there is still a notion of "special sequences": so-called combining characters can be layered onto a single letter to add accents and the like (for example, a letter like ä can be written as a plus U+0308, the code point representing the diaeresis). Hence, shuffling chars will not produce an invalid UTF-8 string, but it can break up those sequences and produce nonsensical output.
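A small sketch of that effect, again with a reversal standing in for a shuffle so the outcome is predictable. The function name reverse_code_points is hypothetical.

```rust
/// Reorders the code points of `s` (here: reverses them) and rebuilds a String.
fn reverse_code_points(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // 'a' followed by U+0308 (combining diaeresis), then 'b'.
    let original = "a\u{0308}b";
    let reordered = reverse_code_points(original);
    // Still valid UTF-8, but the diaeresis now follows 'b' instead of 'a'.
    assert_eq!(reordered, "b\u{0308}a");
    println!("{} -> {}", original, reordered);
}
```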

This brings me to graphemes: sequences of code points that make up a single visible character (for example, ä is still a single grapheme whether it is written as one code point or as two). Shuffling graphemes will give the most robustly reasonable answer.

Then, once you have decided what you want to shuffle, the shuffling strategy can be chosen:

  • if the string is guaranteed to be pure ASCII, shuffling the bytes with .shuffle is reasonable (under the ASCII assumption, this is equivalent to the other two)
  • otherwise, there is no standard way to work in place: get the elements as an iterator (.chars() for code points or .graphemes(true) for graphemes), put them in a vector with .collect::<Vec<_>>(), shuffle the vector, and then collect everything back into a new String, for example with .iter().map(|x| *x).collect::<String>().
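As a rough sketch of the collect, shuffle, and rebuild strategy from the second bullet, here is a version that uses only the standard library. The XorShift64 generator and the shuffle helper are hypothetical stand-ins written for this example (the stable standard library has no random number generator, and a real program would more likely use a crate such as rand). It shuffles code points; shuffling graphemes would need a segmentation library in place of .chars().

```rust
/// Tiny xorshift64 generator; a stand-in for a real RNG, not for serious use.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed nonzero seed.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fisher-Yates shuffle over any slice.
fn shuffle<T>(items: &mut [T], rng: &mut XorShift64) {
    for i in (1..items.len()).rev() {
        // Pick j in 0..=i (modulo bias is ignored in this sketch).
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Decodes `s` into code points, shuffles them, and builds a new String.
fn shuffle_chars(s: &str, seed: u64) -> String {
    let mut chars: Vec<char> = s.chars().collect();
    let mut rng = XorShift64::new(seed);
    shuffle(&mut chars, &mut rng);
    chars.into_iter().collect()
}

fn main() {
    println!("{}", shuffle_chars("selam d\u{fc}nya", 42));
}
```

The result is always valid UTF-8 and contains exactly the same code points as the input; only their order changes.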

The difficulty with handling code points and graphemes is that UTF-8 does not encode them with a fixed width, so there is no way to take a random code point or grapheme out and insert it somewhere else, or otherwise swap two elements efficiently, without simply decoding everything into an external Vec.
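The variable width is easy to see with char::len_utf8. This is only an illustration; the helper name utf8_widths is made up.

```rust
/// Returns the number of bytes each code point of `s` occupies in UTF-8.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

fn main() {
    // 'a', U+00E4 (ä), and U+4F60 (你) take 1, 2, and 3 bytes respectively,
    // so swapping two of them in place would have to shift the bytes between them.
    let widths = utf8_widths("a\u{e4}\u{4f60}");
    assert_eq!(widths, vec![1, 2, 3]);
    println!("{:?}", widths);
}
```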



Not being in place is unfortunate, but strings are hard.

(If your strings are guaranteed to be ASCII, then using a type such as the Ascii provided by the ascii crate would be a good way to keep things straight at the type level.)
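Without a dedicated ASCII type, a runtime check can play a similar role. The sketch below is an assumption-laden illustration: rotate_ascii is a hypothetical helper, and a rotation stands in for the shuffle so the result is predictable. It reorders bytes only after str::is_ascii has confirmed that every byte is a complete character.

```rust
/// Rotates the bytes of `s` left by `k` if, and only if, `s` is pure ASCII.
/// Returns false and leaves `s` untouched otherwise.
fn rotate_ascii(s: &mut String, k: usize) -> bool {
    if !s.is_ascii() {
        return false;
    }
    // Take the buffer out of the String so it can be handled as plain bytes.
    let mut bytes = std::mem::take(s).into_bytes();
    if !bytes.is_empty() {
        let shift = k % bytes.len();
        bytes.rotate_left(shift);
    }
    // Any reordering of ASCII bytes is still valid UTF-8.
    *s = String::from_utf8(bytes).expect("reordered ASCII bytes are still valid UTF-8");
    true
}

fn main() {
    let mut value = String::from("SomeValue");
    let changed = rotate_ascii(&mut value, 4);
    println!("{} {}", changed, value); // true ValueSome
}
```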


For an example of the difference between the three things, take a look at:

fn main() {
    let s = "U͍̤͕̜̲̼̜n̹͉̭͜ͅi̷̪c̠͍̖̻o̸̯̖de̮̻͍̤";
    println!("bytes: {}", s.bytes().count());
    println!("chars: {}", s.chars().count());
    println!("graphemes: {}", s.graphemes(true).count());
}

      

It prints:

bytes: 57
chars: 32
graphemes: 7

      

(Generate your own; it demonstrates that multiple combining characters can be stacked on a single letter.)
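A smaller version of the same comparison that runs with only the standard library, so it covers bytes and code points but not graphemes. The helper byte_and_char_counts is hypothetical. It compares the two spellings of ä mentioned earlier.

```rust
/// Returns (number of bytes, number of code points) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Precomposed ä: a single code point, U+00E4, encoded in two bytes.
    assert_eq!(byte_and_char_counts("\u{e4}"), (2, 1));
    // Decomposed ä: 'a' (one byte) plus U+0308 (two bytes), two code points.
    assert_eq!(byte_and_char_counts("a\u{0308}"), (3, 2));
    // Both spellings are displayed as one visible character.
    println!("ok");
}
```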



Putting together the suggestions above:



use std::rand::{Rng, thread_rng};

fn str_shuffled(s: &str) -> String {
    let mut graphemes = s.graphemes(true).collect::<Vec<&str>>();
    let mut gslice = graphemes.as_mut_slice();
    let mut rng = thread_rng();
    rng.shuffle(gslice);
    gslice.iter().map(|x| *x).collect::<String>()
}


fn main() {
    println!("{}", str_shuffled("Hello, World!"));
    println!("{}", str_shuffled("selam dünya"));
    println!("{}", str_shuffled("你好世界"));
    println!("{}", str_shuffled("γειά σου κόσμος"));
    println!("{}", str_shuffled(" "));

}

      



I am also starting with Rust, but what about:

fn main() {
    // I want to shuffle this string...
    let value = "SomeValue".to_string();
    let mut bytes = value.into_bytes();

    bytes[0] = bytes[1]; // Shuffle takes place.. sorry but std::rand::thread_rng is not available in the Rust installed on my current machine.

    match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
        Ok(s) => println!("{}", s),
        _ => println!("Error occurred!")
    }
}

      

Also keep in mind that Rust's default string encoding is UTF-8 when playing with byte sequences. ;)
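The "should not copy" remark in the code comment can be checked with a small sketch: compare the buffer pointer before String::into_bytes and after String::from_utf8. The function swap_first_two_bytes is a hypothetical helper. The pointer comparison reflects how the standard library behaves in practice, so treat it as an observation rather than a documented guarantee.

```rust
use std::string::FromUtf8Error;

/// Swaps the first two bytes of `value` and rebuilds a String from the same
/// buffer. The bool reports whether the heap buffer was reused (not copied).
fn swap_first_two_bytes(value: String) -> Result<(String, bool), FromUtf8Error> {
    let original_ptr = value.as_ptr();
    let mut bytes = value.into_bytes();
    if bytes.len() >= 2 {
        bytes.swap(0, 1);
    }
    // Validation fails if the swap split a multi-byte sequence.
    let s = String::from_utf8(bytes)?;
    let reused = s.as_ptr() == original_ptr;
    Ok((s, reused))
}

fn main() {
    match swap_first_two_bytes("SomeValue".to_string()) {
        Ok((s, reused)) => println!("{} (buffer reused: {})", s, reused),
        Err(e) => println!("not valid UTF-8 any more: {}", e),
    }
}
```

On failure, the original bytes are not lost: FromUtf8Error::into_bytes hands the vector back.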


This was a great suggestion; it led me to the following solution, thanks!

use std::rand::{Rng, thread_rng};

fn main() {
    // I want to shuffle this string...
    let value: String = "SomeValue".to_string();
    let mut bytes = value.into_bytes();

    thread_rng().shuffle(&mut *bytes.as_mut_slice());

    match String::from_utf8(bytes) { // Should not copy the contents according to documentation.
        Ok(s) => println!("{}", s),
        _ => println!("Error occurred!")
    }
}

      

rustc 0.13.0-nightly (ad9e75938 2015-01-05 00:26:28 +0000)
