Unicode Playground
Analyze emojis, surrogate pairs, and ZWJ sequences
A Unicode analysis tool that compares the perceived character count with JavaScript's .length value.
It breaks down emojis, surrogate pairs, and ZWJ sequences to visualize the differences between UTF-16 .length, code point count, and grapheme count.
*The values you enter are not collected or sent to any external server. For more details, please see our Privacy Policy (Japanese).
Code point 7/7: U+1F466 (EMOJI)
Result 7 of 7: 👨👩👧👦
- UTF-16 length: 11 (.length)
- UTF-8 bytes: 25 (TextEncoder)
- Code points: 7 ([...str].length)
- Graphemes: 1 (Intl.Segmenter)
💡 ASTRAL (U+1F466)
This code point is outside the BMP (U+10000 and above). In UTF-16, it is split into two code units (a surrogate pair), giving a .length of 2.
Surrogate pair: U+D83D + U+DC66
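The surrogate pair shown here can be derived by hand. As a minimal sketch (the helper name toSurrogatePair is hypothetical and not part of the tool), subtract 0x10000 from the code point, then add the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00:

```javascript
// Hypothetical helper: split an astral code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not an astral code point");
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const hex = (n) => "U+" + n.toString(16).toUpperCase().padStart(4, "0");
const [high, low] = toSurrogatePair(0x1F466);
console.log(hex(high), hex(low)); // > U+D83D U+DC66

// Cross-check against the engine's own UTF-16 encoding
const boy = String.fromCodePoint(0x1F466);
console.log(boy.length); // > 2
console.log(boy.charCodeAt(0) === high, boy.charCodeAt(1) === low); // > true true
```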
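```javascript
// Family emoji: four emoji joined by ZWJ (U+200D).
// Escapes are used so the invisible joiners survive copy and paste.
const string = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(string.length); // > 11
console.log(new TextEncoder().encode(string).length); // > 25
console.log([...string].length); // > 7
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...segmenter.segment(string)].length); // > 1
```

Tips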
| Metric | Description |
|---|---|
| UTF-16 length | Number of UTF-16 code units. Code points at U+10000 and above are encoded as surrogate pairs and count as 2. |
| UTF-8 bytes | Byte length after UTF-8 encoding. ASCII characters are 1 byte, most Japanese characters are 3 bytes, and most emojis are 4 bytes. |
| Code points | Number of Unicode code points. Spread syntax iterates by code point, so a surrogate pair counts as 1. |
| Graphemes | Number of user-perceived characters. ZWJ sequences and flag emojis count as 1. |
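
A few short inputs make the four metrics concrete. This is a minimal sketch: the measure helper and the graphemeSegmenter variable are hypothetical names, and the inputs are written with escapes so the exact code points are unambiguous.

```javascript
// Hypothetical helper: compute the four metrics for one string.
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

function measure(str) {
  return {
    utf16: str.length,                                    // UTF-16 code units
    utf8: new TextEncoder().encode(str).length,           // bytes after UTF-8 encoding
    codePoints: [...str].length,                          // spread iterates by code point
    graphemes: [...graphemeSegmenter.segment(str)].length // user-perceived characters
  };
}

console.log(measure("a"));                  // ASCII "a":   utf16 1, utf8 1, codePoints 1, graphemes 1
console.log(measure("\u3042"));             // Hiragana あ: utf16 1, utf8 3, codePoints 1, graphemes 1
console.log(measure("\u{1F600}"));          // Emoji 😀:    utf16 2, utf8 4, codePoints 1, graphemes 1
console.log(measure("\u{1F1EF}\u{1F1F5}")); // Flag 🇯🇵:     utf16 4, utf8 8, codePoints 2, graphemes 1
```

The flag is two regional indicator code points, each outside the BMP, which is why it is 4 code units and 8 bytes yet a single grapheme.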
💡 Some characters look like one grapheme but take more than one Backspace to delete
Grapheme clusters reported by Intl.Segmenter represent what users perceive as a single character, but the delete operation in input or textarea elements does not always operate on that unit.
Depending on the browser or OS implementation, ZWJ-joined emojis or decomposed characters like é (e + combining acute) may look like a single character yet require multiple Backspace presses to disappear.
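
The decomposed é case is easy to reproduce in code. If a custom delete action should match what the user sees, one option is to remove the last grapheme cluster instead of the last code unit. The following is a minimal sketch, assuming Intl.Segmenter is available. deleteLastGrapheme is a hypothetical helper, not a browser API, and it does not change how the native Backspace key behaves.

```javascript
const composed = "\u00E9";    // é as one precomposed code point
const decomposed = "e\u0301"; // e + U+0301 COMBINING ACUTE ACCENT

console.log(composed.length, decomposed.length);       // > 1 2
console.log(decomposed.normalize("NFC") === composed); // > true

const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...seg.segment(decomposed)].length);      // > 1

// Hypothetical helper: drop the final grapheme cluster from a string.
function deleteLastGrapheme(str) {
  const segments = [...seg.segment(str)];
  if (segments.length === 0) return str;
  // Each segment records the code-unit index at which it starts
  return str.slice(0, segments[segments.length - 1].index);
}

console.log(deleteLastGrapheme("caf" + decomposed)); // > caf
console.log(decomposed.slice(0, -1));                // > e (only the accent is removed)
```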