How to count Korean word blocks using Unix / Linux commands?
Korean words are composed of syllable blocks (for example, 가, 나, 다, 라, etc.). I need a way to count these blocks. For example, the word 바다 (sea) should return 2, but
wc -w
will return 1, and
wc -c
will return 7.
Thus, these options will not work for me. I would appreciate your help.
+3
Eungi kim
1 answer
바다 encoded as UTF-8 is 6 bytes long (3 bytes per syllable block). If you want to count characters rather than bytes, use wc -m:
$ printf "바다" | wc -c
6
$ printf "바다" | wc -m
2
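Two caveats are worth a quick sketch. First, the 7 in the question most likely comes from echo, which appends a newline (6 bytes plus 1). Second, wc -m counts characters according to the current locale, so in a non-UTF-8 locale it may report bytes instead. The helper below, count_chars, is a hypothetical name (not a standard command) and assumes the input is valid UTF-8: it forces byte-wise processing and drops UTF-8 continuation bytes, so one byte remains per character.

```shell
# count_chars is a hypothetical helper, assuming UTF-8 input.
count_chars() {
  # In the C locale, tr works byte by byte. Deleting continuation
  # bytes (0x80-0xBF, octal 200-277) leaves one byte per character.
  # The final tr strips the padding some wc implementations add.
  printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

count_chars "바다"                   # prints 2
echo "바다" | wc -c | tr -d ' '      # prints 7: 6 bytes plus echo's newline
```

If your locale is already UTF-8, plain wc -m is simpler; this variant is only useful when you cannot rely on the locale.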
+5
Blender