How to properly iterate a UTF-8 string in OCaml?
Let's say I have an input word like "føøbær" and I want a hash table of letter frequencies such that f → 1, ø → 2. How do I do that in OCaml?
The examples at http://pleac.sourceforge.net/pleac_ocaml/strings.html only work for ASCII, and https://ocaml-batteries-team.github.io/batteries-included/hdoc2/BatUTF8.html doesn't say how to create a BatUTF8.t from a string.
The BatUTF8 module you are referring to defines its type t as string, so there is no need for conversion: a BatUTF8.t is a string. Apparently the module wants you to validate your string before using its other functions. I assume the correct way to work would be something like this:
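let table = Hashtbl.create 16
(* add_to_table is an assumed helper that bumps the count for character c *)
let add_to_table c = Hashtbl.replace table c (1 + (try Hashtbl.find table c with Not_found -> 0))
let s = "føøbær"
let () = BatUTF8.validate s (* raises an exception on malformed UTF-8 *)
let () = BatUTF8.iter add_to_table s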
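For comparison, the same frequency count can be sketched with the standard library alone, which avoids the question of how to obtain a BatUTF8.t. This is not the Batteries API: it assumes OCaml 4.14 or later (for String.get_utf_8_uchar) and a UTF-8 encoded source file, and utf8_frequencies is a hypothetical helper name chosen for illustration.

```ocaml
(* Stdlib-only sketch, assuming OCaml >= 4.14. Not the Batteries API. *)

(* Hypothetical helper: count how often each Unicode scalar value
   occurs in a UTF-8 encoded string. *)
let utf8_frequencies (s : string) : (Uchar.t, int) Hashtbl.t =
  let table = Hashtbl.create 16 in
  let rec loop i =
    if i < String.length s then begin
      (* Decode the character that starts at byte offset i. *)
      let d = String.get_utf_8_uchar s i in
      if not (Uchar.utf_decode_is_valid d) then
        invalid_arg "utf8_frequencies: malformed UTF-8";
      let u = Uchar.utf_decode_uchar d in
      let n = Option.value (Hashtbl.find_opt table u) ~default:0 in
      Hashtbl.replace table u (n + 1);
      (* Advance by the number of bytes the decoded character used. *)
      loop (i + Uchar.utf_decode_length d)
    end
  in
  loop 0;
  table

let () =
  let table = utf8_frequencies "føøbær" in
  (* Hashtbl.iter visits the entries in an unspecified order. *)
  Hashtbl.iter
    (fun u n -> Printf.printf "U+%04X -> %d\n" (Uchar.to_int u) n)
    table
```

Note that this counts Unicode scalar values, so a letter written as a base character plus a combining mark would be counted as two entries.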
Looking at the Batteries code I found of_string_unsafe, so maybe like this:
open Batteries
BatUTF8.iter (fun c -> …Hashtbl.add table c …) (BatUTF8.of_string_unsafe "føøbær")
although, since it is called "unsafe" (the documentation doesn't say why), it might simply be equivalent to:
BatUTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"
At least it works for the example here.
Camomile also seems to iterate correctly:
module C = CamomileLibraryDefault.Camomile
C.UTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"
I don't know of any trade-offs between Camomile and BatUTF8 here, although they end up storing different types (BatUChar vs C.Pervasives.UChar).