Humaid AlQassimi

Detecting the Empty Character in Go

Jul 29, 2020 · 2 min read

I have recently been working on an online ticketing system. I have been using strings.TrimSpace in it for a while, and it has worked well. Then I tested it with an “empty character”, and strings.TrimSpace failed to trim whatever whitespace characters it was using.

I thought it was just strings.TrimSpace not detecting some of Unicode’s empty characters. So I replaced it with strings.TrimFunc(s, unicode.IsSpace), and it still didn’t clear the spaces [1].

Dissecting that empty character, we find it is actually made up of five characters:

  • U+200F: Right-To-Left Mark
  • U+200F: Right-To-Left Mark
  • U+200E: Left-To-Right Mark
  • U+0020: Regular Space
  • U+200E: Left-To-Right Mark
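One way to get a breakdown like the list above is to range over the string and print each rune with the %U verb. This is only a sketch: codePoints is a helper name I made up, and the input string is the sequence from the list rebuilt with escapes.

```go
package main

import "fmt"

// codePoints returns the "U+XXXX" form of every rune in s, in order.
// (Hypothetical helper, named here for illustration only.)
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// The five-character sequence from the list, written with escapes
	// so the invisible marks show up in the source.
	s := "\u200F\u200F\u200E \u200E"
	for _, cp := range codePoints(s) {
		fmt.Println(cp)
	}
}
```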

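We can see that it wraps the regular space in bi-directional control characters to prevent it from being trimmed: trimming stops at the first character from each end that is not whitespace, so the space in the middle is never reached.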
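A small sketch of that behaviour, assuming the same five-character sequence: trimming a string of plain spaces leaves nothing, while trimming the wrapped space leaves every byte in place. trimmedLen is a hypothetical helper, not something from the ticketing system.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimmedLen reports how many bytes remain after trimming leading and
// trailing runes for which unicode.IsSpace returns true.
// (Hypothetical helper for illustration.)
func trimmedLen(s string) int {
	return len(strings.TrimFunc(s, unicode.IsSpace))
}

func main() {
	plain := "   "
	wrapped := "\u200F\u200F\u200E \u200E"

	// Plain spaces are trimmed away completely.
	fmt.Println(trimmedLen(plain) == 0)

	// The directional marks at both ends are not whitespace, so
	// trimming stops immediately and nothing is removed.
	fmt.Println(trimmedLen(wrapped) == len(wrapped))
	fmt.Println(strings.TrimSpace(wrapped) == wrapped)
}
```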

Go doesn’t list these characters as control characters [2], though, so we cannot use unicode.IsControl. They are included in the unicode.Bidi_Control subset instead. Here’s my first solution:

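// isImproperChar reports whether r is whitespace or a bi-directional
// control character such as U+200E or U+200F.
func isImproperChar(r rune) bool {
	return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
}

s = strings.TrimFunc(s, isImproperChar)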
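To check how Go classifies the characters involved, a quick sketch like the following runs the three predicates mentioned so far over a few runes. classify is a name I picked for illustration, and the tab is only there as a contrasting case.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify reports whether r matches unicode.IsSpace, unicode.IsControl,
// and the unicode.Bidi_Control range table, in that order.
// (Hypothetical helper for illustration.)
func classify(r rune) (space, control, bidi bool) {
	return unicode.IsSpace(r), unicode.IsControl(r), unicode.In(r, unicode.Bidi_Control)
}

func main() {
	for _, r := range []rune{'\u200E', '\u200F', ' ', '\t'} {
		space, control, bidi := classify(r)
		fmt.Printf("%U space=%t control=%t bidi=%t\n", r, space, control, bidi)
	}
}
```

The two directional marks should only match the Bidi_Control check, which is why neither unicode.IsSpace nor unicode.IsControl is enough on its own.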

This would trim bi-directional control characters away from the input, which is probably a really bad idea, especially in systems supporting Arabic, Hebrew, or other right-to-left languages.

So instead we can trim the string only to measure the length of what is left, then discard the trimmed result.

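// IsEmpty reports whether s contains nothing but whitespace and
// bi-directional control characters. It does not modify s.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}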
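For anyone who wants to run it locally, here is a minimal self-contained sketch of the same check. The main function and the sample inputs are my own assumptions for illustration, including an Arabic greeting prefixed with a Right-To-Left Mark to show that real text is not reported as empty.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// IsEmpty reports whether s contains nothing but whitespace and
// bi-directional control characters. The trimmed string is only
// measured and then discarded, so s itself is never altered.
func IsEmpty(s string) bool {
	return len(strings.TrimFunc(s, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.In(r, unicode.Bidi_Control)
	})) == 0
}

func main() {
	// Sample inputs chosen for illustration.
	inputs := []string{
		"",
		"  \t\n",
		"\u200F\u200F\u200E \u200E", // the "empty character" sequence
		"\u200Fمرحبا",               // real text with a leading mark
		"hello",
	}
	for _, in := range inputs {
		fmt.Printf("%q empty=%t\n", in, IsEmpty(in))
	}
}
```

Since IsEmpty only answers yes or no, a caller can reject empty submissions and still store the original string with its directional marks intact.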

Try it out on the Go playground!

Have a better solution? Please let me know!

This is my eighth post in the #100DaysToOffload challenge.

  1. I thought unicode.IsSpace wasn’t detecting some types of spaces. But after some testing, that doesn’t seem to be the case.

  2. Not listed on unicode/tables.go:7108 as pC (control character), but rather it’s included in the Bidi_Control subset.
