Unicode Playground

Analyze emojis, surrogate pairs, and ZWJ sequences

A Unicode analysis tool that compares the perceived character count with JavaScript's .length value.

It breaks down emojis, surrogate pairs, and ZWJ sequences to visualize the differences between UTF-16 .length, code point count, and grapheme count.

* The values you enter are not collected or sent externally. For more details, please see our Privacy Policy (Japanese).


String to analyze


Code point 7/7: U+1F466 (EMOJI)

Result 7 of 7: 👨‍👩‍👧‍👦

UTF-16 length: 11 (.length)
UTF-8 bytes: 25 (TextEncoder)
Code points: 7 ([...str].length)
Graphemes: 1 (Intl.Segmenter)

💡 ASTRAL (U+1F466)

This code point is outside the BMP (U+10000 and above). In UTF-16, it is split into two code units (a surrogate pair), giving a .length of 2.
Surrogate pair: U+D83D + U+DC66
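The surrogate pair shown above can be derived by hand from the code point. The following is a minimal sketch of the standard UTF-16 arithmetic; `toSurrogatePair` and `formatHex` are hypothetical helper names introduced only for this illustration.

```javascript
// Split a supplementary-plane code point (U+10000..U+10FFFF) into its
// UTF-16 surrogate pair. toSurrogatePair is a hypothetical helper name.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("code point is not in the range U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
  return [high, low];
}

// Format a number as U+XXXX (hypothetical helper for display only).
const formatHex = (n) => "U+" + n.toString(16).toUpperCase().padStart(4, "0");

const [high, low] = toSurrogatePair(0x1F466);
console.log(formatHex(high), formatHex(low)); // U+D83D U+DC66
console.log(String.fromCharCode(high, low) === "\u{1F466}"); // true
```

Code points below U+10000 need no such split, which is why the sketch rejects them instead of returning a pair.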


Sample code for measuring strings in JavaScript

const string = "👨‍👩‍👧‍👦";

console.log(string.length); // > 11 (UTF-16 code units)
console.log(new TextEncoder().encode(string).length); // > 25 (UTF-8 bytes)
console.log([...string].length); // > 7 (code points)
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...segmenter.segment(string)].length); // > 1 (graphemes)
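To see where the totals of 11, 25, and 7 come from, the string can be broken down one code point at a time. This is a rough sketch rather than the tool's actual implementation; the variable names `family` and `breakdown` are assumptions made for the example, and the string is written with escapes so the individual code points are visible.

```javascript
// The family emoji: four people joined by three ZERO WIDTH JOINERs (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// Spreading a string iterates by code point, so each entry is one code point.
const breakdown = [...family].map((ch) => {
  const cp = ch.codePointAt(0);
  return {
    codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    utf16Units: ch.length,                           // 2 for astral, 1 for BMP
    utf8Bytes: new TextEncoder().encode(ch).length,  // 4 for astral, 3 for U+200D
  };
});

console.table(breakdown);
```

Each of the four people contributes 2 UTF-16 code units and 4 UTF-8 bytes, and each of the three joiners contributes 1 code unit and 3 bytes, which adds up to 11 and 25.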

Tips

Metric	Measured with	Description
UTF-16 length	`str.length`	Number of UTF-16 code units. Code points at U+10000 and above are encoded as surrogate pairs and count as 2.
UTF-8 bytes	`new TextEncoder().encode(str).length`	Byte length after UTF-8 encoding. ASCII characters are 1 byte, most Japanese characters are 3 bytes, and most emojis are 4 bytes per code point.
Code points	`[...str].length`	Number of Unicode code points. Spread syntax iterates by code point, so a surrogate pair counts as 1.
Graphemes	`Intl.Segmenter`	Number of user-perceived characters (grapheme clusters). ZWJ sequences and flag emojis count as 1.
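The four metrics in the table can be wrapped in a single function and compared across different kinds of text. The sketch below assumes a runtime that provides `Intl.Segmenter` and `TextEncoder`; `measure` and `samples` are hypothetical names chosen for the example.

```javascript
// Return all four metrics for a string. measure is a hypothetical helper name.
function measure(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return {
    utf16Length: str.length,                          // UTF-16 code units
    utf8Bytes: new TextEncoder().encode(str).length,  // bytes after UTF-8 encoding
    codePoints: [...str].length,                      // Unicode code points
    graphemes: [...segmenter.segment(str)].length,    // user-perceived characters
  };
}

const samples = {
  ascii: "A",                  // U+0041
  hiragana: "\u3042",          // あ
  emoji: "\u{1F600}",          // 😀
  flag: "\u{1F1EF}\u{1F1F5}",  // 🇯🇵 (two regional indicator symbols)
};

for (const [label, text] of Object.entries(samples)) {
  console.log(label, text, measure(text));
}
```

The flag is the most instructive case: it is a single grapheme made of 2 code points, 4 UTF-16 code units, and 8 UTF-8 bytes.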

💡 Some characters look like one grapheme but take more than one Backspace to delete

Grapheme clusters reported by Intl.Segmenter represent what users perceive as a single character, but the delete operation in input or textarea elements does not always operate on that unit.

Depending on the browser or OS implementation, ZWJ-joined emojis or decomposed characters like é (e + combining acute) may look like a single character yet require multiple Backspace presses to disappear.
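One way to spot graphemes that might behave this way is to count the code points inside each grapheme cluster. The sketch below only reports that count; it does not predict what any particular browser or OS does on Backspace, which remains implementation-dependent. `codePointsPerGrapheme` is a hypothetical helper name.

```javascript
// List each grapheme cluster with the number of code points it contains.
// codePointsPerGrapheme is a hypothetical helper name for this sketch.
function codePointsPerGrapheme(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return [...segmenter.segment(str)].map(({ segment }) => ({
    grapheme: segment,
    codePoints: [...segment].length,
  }));
}

const decomposed = "e\u0301";                  // e + COMBINING ACUTE ACCENT
const composed = decomposed.normalize("NFC");  // precomposed é (U+00E9)
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(codePointsPerGrapheme(decomposed));  // one grapheme, 2 code points
console.log(codePointsPerGrapheme(composed));    // one grapheme, 1 code point
console.log(codePointsPerGrapheme(familyEmoji)); // one grapheme, 7 code points
```

A grapheme with more than one code point is a candidate for multi-press deletion. Normalizing to NFC collapses the decomposed é into a single code point, but it has no effect on ZWJ sequences.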