Characters in text are counted in UTF-16 code units (16 bits each). A character that is encoded as multiple code units, such as some kanji characters and Unicode emojis, is counted as multiple characters rather than one.
However, some properties are counted in grapheme cluster units rather than UTF-16 code units. For more information, see Character counting in a text in the Messaging API documentation.
Also, unlike Unicode emojis, LINE emojis provided by LINE are internally converted to the alternative text (e.g., (love)), so they're counted by the number of characters in the alternative text.
Examples of counting the number of characters in UTF-16 code units are as follows:
Character | UTF-16 encoded value | Number of code units | Number of characters |
---|---|---|---|
a | 0061 | 1 | 1 character |
あ | 3042 | 1 | 1 character |
\n | 000A | 1 | 1 character |
邊 | 908A | 1 | 1 character |
𠀋 | D840 DC0B | 2 | 2 characters |
👋 | D83D DC4B | 2 | 2 characters |
👋🏻 | D83D DC4B D83C DFFB | 4 | 4 characters |
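
The counts in the table can be reproduced with a short script. The following is a minimal sketch, not part of any LINE SDK: the helper name `utf16_length` is hypothetical, and it assumes the input contains no unpaired surrogates. It encodes the text as UTF-16 and divides the byte length by two, since each code unit is 2 bytes.

```python
def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode `text`.

    Hypothetical helper for illustration only.
    """
    # "utf-16-le" omits the byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


# The same characters as in the table, written as escapes for clarity.
samples = {
    "a": "a",
    "hiragana a (U+3042)": "\u3042",
    "line feed (U+000A)": "\n",
    "kanji (U+908A)": "\u908a",
    "kanji (U+2000B)": "\U0002000b",
    "waving hand (U+1F44B)": "\U0001f44b",
    "waving hand + skin tone (U+1F44B U+1F3FB)": "\U0001f44b\U0001f3fb",
}

for label, text in samples.items():
    print(f"{label}: {utf16_length(text)}")
```

In JavaScript, the `length` property of a string already reports UTF-16 code units, so no conversion is needed there. This sketch doesn't cover properties that are counted in grapheme clusters, or LINE emojis, which are counted by their alternative text.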