عجفت الغور

text boundary analysis

Tags: computers

Chrome's V8 engine detects text boundaries with ICU:

  • Specifically, V8 uses ICU for much of its Unicode-related text processing, including breaking text up into words. The ICU boundary-detection code includes a “Dictionary-Based BreakIterator” for languages that are written without spaces between words, such as Japanese, Chinese, and Thai.
  • https://www.instapaper.com/read/1303705767
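
To make the idea of dictionary-based word breaking concrete, here is a minimal sketch in Python. It is not ICU's algorithm (ICU's real break iterator is considerably more elaborate); it is a toy greedy longest-match segmenter. The function name `segment` and the contents of `TOY_DICTIONARY` are assumptions made up for illustration.

```python
def segment(text, dictionary, max_word_len=None):
    """Split text into words by greedy longest match against a dictionary.

    Toy illustration only: characters that start no dictionary word
    are emitted as single-character words.
    """
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical four-word dictionary, just enough for the example sentence.
TOY_DICTIONARY = {"東京", "東京都", "に", "住む"}

print(segment("東京都に住む", TOY_DICTIONARY))  # ['東京都', 'に', '住む']
```

Greedy matching picks 東京都 over 東京 because it tries longer candidates first. That is simple but can choose wrong splits on ambiguous text, which is one reason production segmenters use more than a plain longest-match pass.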

Character (grapheme cluster) boundary rules, from Unicode Standard Annex #29: http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
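
As a rough feel for what those rules do, the sketch below implements only a small subset of them in Python: keep CR LF together, and do not break before a combining mark or a zero-width joiner. The function name `rough_grapheme_clusters` is made up for this note, and using the general categories Mn and Me is an assumed approximation of the spec's Extend property. It ignores Hangul syllables, emoji sequences, regional indicators, and the rest of the rule set, so it should not be taken as a conforming implementation.

```python
import unicodedata


def rough_grapheme_clusters(text):
    """Approximate user-perceived characters using a subset of the rules."""
    clusters = []
    for ch in text:
        # Approximation of "extending" characters: nonspacing and
        # enclosing marks, plus ZERO WIDTH JOINER (U+200D).
        is_extending = unicodedata.category(ch) in ("Mn", "Me") or ch == "\u200d"
        if clusters and clusters[-1] == "\r" and ch == "\n":
            # Never break between CR and LF.
            clusters[-1] += ch
        elif clusters and is_extending and clusters[-1][-1] not in "\r\n":
            # Do not break before an extending character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" followed by COMBINING ACUTE ACCENT is two code points, one cluster.
print(len("e\u0301"), len(rough_grapheme_clusters("e\u0301")))  # 2 1
```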