How to properly iterate over a UTF-8 string in OCaml?

Let's say I have an input word like "føøbær" and I want a hash table of letter frequencies such that f → 1, ø → 2. How do I do that in OCaml?

The examples only work with ASCII and don't tell you how to create a BatUTF8.t from a string.



2 answers

The module BatUTF8 you are referring to defines its type t as string, so there is no need for conversion: a BatUTF8.t is a string. Apparently the module expects you to validate your string before using the other functions. I assume the correct way to work would be something like this:

let s = "føøbær"
let () = BatUTF8.validate s
let () = BatUTF8.iter add_to_table s
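
The snippet above leaves add_to_table undefined. A minimal sketch of what it could look like, using only the standard Hashtbl module: the names table and add_to_table are placeholders, and the key type is assumed to be the standard library's Uchar.t so that the example runs without Batteries (with Batteries the key would be whatever character type BatUTF8.iter passes to the callback).

```ocaml
(* Frequency table keyed by Unicode scalar value.
   Assumption: keys are stdlib Uchar.t so this compiles without Batteries. *)
let table : (Uchar.t, int) Hashtbl.t = Hashtbl.create 16

(* Hypothetical callback: increment the count for [c], starting from 0. *)
let add_to_table (c : Uchar.t) : unit =
  let n = Option.value (Hashtbl.find_opt table c) ~default:0 in
  Hashtbl.replace table c (n + 1)

let () =
  (* Code points of "føøbær" written out by hand: f ø ø b æ r *)
  List.iter
    (fun cp -> add_to_table (Uchar.of_int cp))
    [ 0x66; 0xF8; 0xF8; 0x62; 0xE6; 0x72 ];
  (* Print each code point with its count; Hashtbl order is unspecified. *)
  Hashtbl.iter
    (fun c n -> Printf.printf "U+%04X -> %d\n" (Uchar.to_int c) n)
    table
```

Hashtbl.replace is used rather than Hashtbl.add so that each letter keeps a single binding holding its current count.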




Looking at the Batteries code, I found of_string_unsafe, so maybe something like this:

open Batteries
BatUTF8.iter (fun c -> …Hashtbl.add table c …) (BatUTF8.of_string_unsafe "føøbær")


Since it is called "unsafe" (the documentation doesn't say why), it might simply be equivalent to passing the string directly:

BatUTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"

At least it works for the example here.

Camomile also seems to iterate correctly:

module C = CamomileLibraryDefault.Camomile
C.UTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"


I don't know of any trade-offs between Camomile and BatUTF8 here, although the table ends up holding different key types (BatUChar vs C.Pervasives.UChar).
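
For comparison, here is a rough sketch that uses neither library. It assumes OCaml 4.14 or later, where the standard library provides String.get_utf_8_uchar together with Uchar.utf_decode_uchar and Uchar.utf_decode_length. The names iter_utf8 and code_points are made up for this example, and unlike an explicit validation step, malformed bytes are silently decoded as U+FFFD.

```ocaml
(* Sketch: apply [f] to each code point of the UTF-8 string [s].
   Assumes OCaml >= 4.14. Malformed bytes come back as U+FFFD
   (Uchar.rep) instead of raising an exception. *)
let iter_utf8 (f : Uchar.t -> unit) (s : string) : unit =
  let len = String.length s in
  let rec go i =
    if i < len then begin
      let d = String.get_utf_8_uchar s i in
      f (Uchar.utf_decode_uchar d);
      (* Advance by the number of bytes this decode consumed (1 to 4). *)
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0

(* Hypothetical helper for the demo: collect code points as integers. *)
let code_points (s : string) : int list =
  let acc = ref [] in
  iter_utf8 (fun c -> acc := Uchar.to_int c :: !acc) s;
  List.rev !acc

let () =
  (* "føøbær" with ø and æ spelled as UTF-8 byte escapes, so the result
     does not depend on how this source file is encoded. *)
  let word = "f\xC3\xB8\xC3\xB8b\xC3\xA6r" in
  List.iter (fun cp -> Printf.printf "U+%04X " cp) (code_points word);
  print_newline ()
```

Passing a counting callback to iter_utf8 instead of the collector would give the letter-frequency table the question asks for.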


