text boundary analysis
Tags: computers
Chrome's V8 detects text boundaries with ICU:
- V8 uses ICU for much of its Unicode-related text processing, including breaking text into words. ICU's boundary-detection code includes a “Dictionary-Based BreakIterator” for languages written without spaces between words, such as Japanese, Chinese, and Thai.
- https://www.instapaper.com/read/1303705767
Character boundary rules: http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
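
A rough sketch of how this looks from JavaScript/TypeScript, assuming a runtime that ships `Intl.Segmenter` (recent Chrome or Node) and a TypeScript config whose `lib` includes the Intl segmenter types. The helper name `segment` and the sample strings are made up for this note. Where the word breaks land in the Japanese sample depends on the ICU dictionary data in the runtime, so no expected output is given for it.

```typescript
// Hypothetical helper: split text into graphemes or words via Intl.Segmenter,
// which V8 implements on top of ICU's break iterators.
function segment(
  text: string,
  granularity: "grapheme" | "word",
  locale: string = "en",
): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const out: string[] = [];
  for (const part of segmenter.segment(text)) {
    // With word granularity, isWordLike is false for spaces and punctuation; skip those.
    if (granularity === "word" && !part.isWordLike) continue;
    out.push(part.segment);
  }
  return out;
}

// Space-delimited text: boundaries come from the rule-based word-break rules.
const englishWords = segment("Hello, world!", "word");
console.log(englishWords); // [ 'Hello', 'world' ]

// No spaces between words: the dictionary-based path decides where words end.
const japaneseWords = segment("今日は天気がいいです", "word", "ja");
console.log(japaneseWords);

// Grapheme clusters (UAX #29): "e" + combining acute accent, then a ZWJ family emoji.
const graphemes = segment(
  "e\u0301\u{1F468}\u200D\u{1F469}\u200D\u{1F467}",
  "grapheme",
);
console.log(graphemes.length); // 2
```

The grapheme sample is six code points but only two user-perceived characters, which is the distinction the grapheme cluster boundary rules linked above describe.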