Text boundary analysis

How Chrome's V8 JavaScript engine detects text boundaries:

Specifically, V8 uses ICU (International Components for Unicode) for a range of Unicode-related text processing, including breaking text into words. ICU's boundary-detection code includes a "Dictionary-Based BreakIterator" for languages that don't separate words with spaces, such as Japanese, Chinese, and Thai.
https://www.instapaper.com/read/1303705767
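From JavaScript, the standard way to reach this ICU word breaking is `Intl.Segmenter`, which V8 implements on top of ICU's BreakIterator. The sketch below is a minimal illustration, assuming a recent V8-based runtime (current Chrome or Node.js) and TypeScript's `es2022` lib for the `Intl.Segmenter` typings. The helper names `words` and `allSegments` are hypothetical. Exact splits for Japanese and Thai depend on the ICU version and the dictionary data bundled with the runtime, so no output is shown for those calls.

```typescript
// Word-like segments only: Intl.Segmenter marks spaces and punctuation
// with isWordLike === false, so they are filtered out here.
function words(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const result: string[] = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (isWordLike) result.push(segment);
  }
  return result;
}

// Every segment, including whitespace and punctuation, so the pieces
// concatenate back to the original string.
function allSegments(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

console.log(words("Hello, world!", "en")); // → ["Hello", "world"]

// No spaces between words: boundaries come from ICU's dictionary-based
// break engine rather than from whitespace.
console.log(words("私は学生です", "ja"));
console.log(words("สวัสดีครับ", "th"));
```

Passing `granularity: "grapheme"` or `granularity: "sentence"` instead selects ICU's other boundary types through the same API.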