How to properly iterate a UTF-8 string in OCaml?

Let's say I have an input word like "føøbær" and I want a hash table of letter frequencies such that f → 1, ø → 2. How do I do that in OCaml?

The examples at http://pleac.sourceforge.net/pleac_ocaml/strings.html only work for ASCII, and https://ocaml-batteries-team.github.io/batteries-included/hdoc2/BatUTF8.html doesn't tell you how to create a BatUTF8.t from a string.


2 answers


The module BatUTF8 you are referring to defines its type t as string, so there is no need for conversion: a BatUTF8.t is a string. Apparently the module expects you to validate your string before using the other functions. I assume the correct way to work would be something like this:



let s = "føøbær"
let () = BatUTF8.validate s
let () = BatUTF8.iter add_to_table s
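
The snippet above leaves add_to_table undefined. A minimal sketch of one way to fill it in, assuming the Batteries package is installed and linked; the table binding and the body of add_to_table are illustrative and not part of the original answer:

```ocaml
(* Assumes the Batteries library is available (for example via the
   ocamlfind package "batteries"). *)

(* Illustrative table mapping each code point to the number of times
   it has been seen. *)
let table : (BatUChar.t, int) Hashtbl.t = Hashtbl.create 16

(* Hypothetical helper: increment the count for one code point. *)
let add_to_table (c : BatUChar.t) : unit =
  let n = Option.value (Hashtbl.find_opt table c) ~default:0 in
  Hashtbl.replace table c (n + 1)

let () =
  let s = "føøbær" in
  (* validate raises an exception if s is not well-formed UTF-8. *)
  BatUTF8.validate s;
  BatUTF8.iter add_to_table s
```

Hashtbl.replace is used rather than Hashtbl.add so that each code point keeps a single binding holding its current count.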

      



Looking at the Batteries code, I found of_string_unsafe, so maybe like this:

open Batteries
BatUTF8.iter (fun c -> …Hashtbl.add table c …) (BatUTF8.of_string_unsafe "føøbær")

      

although since it's called "unsafe" (the documentation doesn't say why), it might be equivalent to just:

BatUTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"



At least it works for the example here.

Camomile also seems to iterate correctly:

module C = CamomileLibraryDefault.Camomile
C.UTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"

      

I don't know of any trade-offs between Camomile and BatUTF8 here, although they end up storing different types (BatUChar vs C.Pervasives.UChar).
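
For comparison, here is a sketch that avoids both libraries and uses only the standard library. It assumes OCaml 4.14 or later, where String.get_utf_8_uchar is available. This is a different technique from the Batteries and Camomile calls above, and the helper names iter_utf8 and frequencies are hypothetical:

```ocaml
(* Standard library only; assumes OCaml >= 4.14. *)

(* Hypothetical helper: call f on each code point of the UTF-8 string s.
   Malformed byte sequences decode to U+FFFD (Uchar.rep). *)
let iter_utf8 (f : Uchar.t -> unit) (s : string) : unit =
  let len = String.length s in
  let rec go i =
    if i < len then begin
      let d = String.get_utf_8_uchar s i in
      f (Uchar.utf_decode_uchar d);
      (* utf_decode_length is always at least 1, so the loop advances. *)
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0

(* Hypothetical helper: count how often each code point occurs in s. *)
let frequencies (s : string) : (Uchar.t, int) Hashtbl.t =
  let table = Hashtbl.create 16 in
  iter_utf8
    (fun c ->
      let n = Option.value (Hashtbl.find_opt table c) ~default:0 in
      Hashtbl.replace table c (n + 1))
    s;
  table

let () =
  (* Print each code point and its count; iteration order is unspecified. *)
  Hashtbl.iter
    (fun c n -> Printf.printf "U+%04X -> %d\n" (Uchar.to_int c) n)
    (frequencies "føøbær")
```

Unlike BatUTF8.validate, this sketch does not reject malformed input; it counts each malformed sequence as U+FFFD. Note that it counts code points, so a letter written with a combining accent would be counted as two entries.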
