uga.dev - A Front-end Engineer's shed


Unicode Playground: Analyze emojis, surrogate pairs, and ZWJ sequences

A Unicode analysis tool that compares the perceived character count with JavaScript's .length value.

It breaks down emojis, surrogate pairs, and ZWJ sequences to visualize how UTF-16 .length, UTF-8 byte count, code point count, and grapheme count differ.

*Entered values are not collected or sent externally. For more details, please see our Privacy Policy (Japanese).


Example: 👨‍👩‍👧‍👦, with code point 7 of 7 selected (U+1F466, EMOJI)

UTF-16 length: 11 (.length)
UTF-8 bytes: 25 (TextEncoder)
Code points: 7 ([...str].length)
Graphemes: 1 (Intl.Segmenter)
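A per-code-point breakdown like the one this tool shows can be approximated with a few lines of JavaScript. This is only a sketch, not the tool's source: the function name describeCodePoints and the ASTRAL/ZWJ/BMP labels are our own, and the emoji is written with \u escapes so the invisible ZWJ characters (U+200D) survive copy and paste.

```javascript
// Hypothetical helper: list each code point with its hex value, a rough
// category label, and how many UTF-16 code units it occupies.
function describeCodePoints(str) {
  return [...str].map((ch, index) => {
    const cp = ch.codePointAt(0);
    const hex = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    let kind = "BMP";
    if (cp === 0x200d) kind = "ZWJ";        // zero width joiner
    else if (cp > 0xffff) kind = "ASTRAL";  // needs a surrogate pair
    return { index: index + 1, hex, kind, utf16Units: ch.length };
  });
}

// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

const rows = describeCodePoints(family);
for (const row of rows) {
  console.log(`${row.index}/${rows.length} ${row.hex} ${row.kind} utf16=${row.utf16Units}`);
}
```

The four astral emojis contribute 2 code units each and the three ZWJs 1 each, which is where the .length of 11 comes from.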

💡 ASTRAL (U+1F466)

This code point lies outside the Basic Multilingual Plane (BMP), at U+10000 or above. In UTF-16, it is split into two code units (a surrogate pair), giving a .length of 2.
Surrogate pair: U+D83D + U+DC66
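The surrogate pair above follows directly from the UTF-16 encoding rule: subtract 0x10000, then put the top 10 bits into a high surrogate (0xD800 base) and the bottom 10 bits into a low surrogate (0xDC00 base). A minimal sketch (toSurrogatePair is a name we made up), cross-checked against the engine's own encoding via String.fromCodePoint and charCodeAt:

```javascript
// Split a code point above U+FFFF into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only U+10000..U+10FFFF are encoded as surrogate pairs");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const toHex = (n) => "U+" + n.toString(16).toUpperCase();

const [high, low] = toSurrogatePair(0x1f466);
console.log(toHex(high), toHex(low)); // U+D83D U+DC66

// Cross-check against the engine's own UTF-16 encoding.
const boy = String.fromCodePoint(0x1f466);
console.log(boy.length); // 2
console.log(boy.charCodeAt(0) === high, boy.charCodeAt(1) === low); // true true
```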


Sample code for measuring strings in JavaScript
const string = "👨‍👩‍👧‍👦";

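// UTF-16 code units
console.log(string.length); // > 11
// Bytes after UTF-8 encoding
console.log(new TextEncoder().encode(string).length); // > 25
// Code points (string iteration goes code point by code point)
console.log([...string].length); // > 7
// Grapheme clusters (user-perceived characters)
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...segmenter.segment(string)].length); // > 1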
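The same four measurements can be wrapped into one reusable function and run over several inputs at once. This is a sketch only; measureString and the sample names are our own, and strings are written with \u escapes so combining marks and ZWJs are explicit.

```javascript
// Hypothetical helper collecting the four metrics for one string.
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const utf8 = new TextEncoder();

function measureString(str) {
  return {
    utf16Length: str.length,                               // UTF-16 code units
    utf8Bytes: utf8.encode(str).length,                    // bytes after UTF-8 encoding
    codePoints: [...str].length,                           // Unicode code points
    graphemes: [...graphemeSegmenter.segment(str)].length, // user-perceived characters
  };
}

const samples = {
  ascii: "abc",
  japanese: "\u65E5\u672C\u8A9E",  // 日本語
  flag: "\u{1F1EF}\u{1F1F5}",      // two regional indicator symbols (J + P)
  decomposedE: "e\u0301",          // e + U+0301 COMBINING ACUTE ACCENT
  family: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}",
};

for (const [name, str] of Object.entries(samples)) {
  console.log(name, measureString(str));
}
```

For the flag, for example, this reports 4 UTF-16 units, 8 bytes, 2 code points, and 1 grapheme.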

Tips

Metric / Description
UTF-16 length

str.length

Number of UTF-16 code units. Code points at U+10000 and above are stored as surrogate pairs and count as 2.

UTF-8 bytes

new TextEncoder().encode(str).length

Byte length after UTF-8 encoding. ASCII characters are 1 byte, most Japanese characters are 3 bytes, and most single emojis are 4 bytes.

Code points

[...str].length

Number of Unicode code points. Spread syntax iterates by code point, so a surrogate pair counts as 1.

Graphemes

Intl.Segmenter

Number of user-perceived characters. ZWJ sequences and flag emojis count as 1.
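The 1/2/3/4-byte figures for UTF-8 come from fixed code point ranges. As a sketch (utf8ByteLength is our own name, and it assumes well-formed input without lone surrogates), the byte count can be computed by hand and compared with TextEncoder. 𠮷 (U+20BB7) is included as a rare kanji outside the BMP, which is why the table says "most" Japanese characters are 3 bytes.

```javascript
// Count UTF-8 bytes from code point ranges (UTF-8 widths per RFC 3629).
// Assumes well-formed UTF-16 input (no lone surrogates).
function utf8ByteLength(str) {
  let bytes = 0;
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) bytes += 1;        // ASCII
    else if (cp <= 0x7ff) bytes += 2;  // e.g. accented Latin letters
    else if (cp <= 0xffff) bytes += 3; // rest of the BMP, incl. most Japanese
    else bytes += 4;                   // U+10000 and above, incl. most emojis
  }
  return bytes;
}

const encoder = new TextEncoder();
// a, é, あ, 漢, 𠮷, 😀
for (const ch of ["a", "\u00E9", "\u3042", "\u6F22", "\u{20BB7}", "\u{1F600}"]) {
  console.log(ch, utf8ByteLength(ch), encoder.encode(ch).length);
}
```

Each line prints the same number twice: 1, 2, 3, 3, 4, and 4 bytes respectively.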

💡 Some characters look like one grapheme but take more than one Backspace to delete

Grapheme clusters reported by Intl.Segmenter represent what users perceive as a single character, but the delete operation in input or textarea elements does not always operate on that unit.

Depending on the browser or OS implementation, ZWJ-joined emojis or decomposed characters like é (e followed by U+0301 COMBINING ACUTE ACCENT) may look like a single character yet require multiple Backspace presses to disappear.
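The sketch below contrasts precomposed é (U+00E9) with its decomposed form and shows that normalize("NFC") turns one into the other. It also models a Backspace that removes one code point per press; that model (deleteLastCodePoint is a hypothetical name) is an assumption for illustration only, since real editors differ in what a single Backspace removes.

```javascript
const graphemeSplitter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (s) => [...graphemeSplitter.segment(s)].length;

const composed = "\u00E9";    // é as a single code point
const decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

console.log(composed.length, decomposed.length);                   // 1 2
console.log(countGraphemes(composed), countGraphemes(decomposed)); // 1 1
console.log(composed === decomposed);                              // false
console.log(decomposed.normalize("NFC") === composed);             // true

// Hypothetical Backspace model: remove exactly one code point per press.
const deleteLastCodePoint = (s) => [...s].slice(0, -1).join("");

const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
let text = family;
let presses = 0;
while (text.length > 0) {
  text = deleteLastCodePoint(text);
  presses += 1;
}
console.log(presses);                         // 7
console.log(deleteLastCodePoint(decomposed)); // e
```

Under this model the family emoji needs 7 presses even though Intl.Segmenter counts it as 1 grapheme, and the first press leaves a dangling ZWJ behind.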
