How do I count Korean syllable blocks using Unix / Linux commands?

Korean text is composed of syllable blocks (for example, 가, 나, 다, 라, etc.). I need a way to count these blocks. For example, the word 바다 (sea) should return 2, but

wc -w

will return 1

wc -c

will return 7

Thus, these options will not work for me. I would appreciate your help.



1 answer


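The string 바다 encoded as UTF-8 is 6 bytes long (the 7 reported in the question most likely includes a trailing newline, such as the one echo adds). If you want to count characters rather than bytes, use wc -m: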



$ printf "바다" | wc -c
       6
$ printf "바다" | wc -m
       2
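Note that wc -m decodes its input according to the current locale, so in a non-UTF-8 locale (for example LC_ALL=C) it may report 6 instead of 2. As a rough locale-independent sketch, assuming the input is valid UTF-8, you can drop the continuation bytes and count what remains. The function name count_chars is a made-up helper for illustration, not a standard command:

```shell
# Count UTF-8 code points without relying on the current locale.
# count_chars is a hypothetical helper name used only for this example.
count_chars() {
    # Delete UTF-8 continuation bytes (0x80-0xBF, octal 200-277);
    # every byte that remains starts exactly one character.
    # The trailing tr strips the padding some wc implementations add.
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

count_chars "바다"
```

This counts code points, so it matches the number of blocks only when each block is stored as a single precomposed Hangul syllable, as in 바다. Text stored as decomposed jamo would give a higher count.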

      
