From Blocking REPLs to Non-blocking REPLs

By Zach Dennis on 14 01 2016

Writing an interactive command line tool using non-blocking IO opens doors that tools using blocking IO can only dream of. Your program no longer has to sit there and do nothing while it's waiting for user input. It can do other things. OTHER THINGS.

This concept of doing other things really intrigued me, so I migrated my tool (a shell replacement for Bash/Zsh/fish) from blocking IO to non-blocking IO. The move went more quickly and smoothly than I had anticipated, but the jubilation didn't last long: output from child processes that my tool launched was getting cut off.

I went into research mode, scouring the Internet, reading man pages, and reading parts of The Linux Programming Interface (specifically section 44.9). The following day I was discussing the issue with my colleague, Sam Bleckley. Shortly thereafter we had a hypothesis, we tested it, and we knew what the issue was.

Once I had a clear understanding of the issue the fix was very straightforward, but getting to that point was a bit of a journey.

To demonstrate the issue, let's take a very simple shell built in Ruby and move it from blocking to non-blocking IO. If you're just interested in the solution, feel free to skip to the end of this post.

A Simple Blocking IO Shell in Ruby

Here's our Ruby-based blocking shell:
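Below is a minimal sketch of such a shell. It assumes the shell reads with gets and runs commands with Kernel#system. The names run_shell and prompt, and the input/output parameters, are illustrative choices that make the loop easy to drive from a script. They are not necessarily how the original was written.

```ruby
# blocking-shell.rb (sketch): read a line, run it, repeat.

# Print the prompt and flush so it appears before we wait for input.
def prompt(output)
  output.print "prompt> "
  output.flush
end

def run_shell(input = $stdin, output = $stdout)
  loop do
    prompt(output)
    line = input.gets            # blocks here until the user hits enter/return
    break if line.nil?           # EOF, e.g. Ctrl-D
    command = line.strip
    break if command == "exit"
    system(command) unless command.empty?
  end
end

# Only start the interactive loop when run directly from a terminal.
run_shell if __FILE__ == $PROGRAM_NAME && $stdin.tty?
```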

The above shell is pretty straightforward and doesn't have any bells or whistles. It merely:

  • prints a prompt
  • gets user input from STDIN via gets
  • exits if the user entered "exit"
  • otherwise runs the input as a system command
  • loops back around and does it all over again

While it waits for user input, this shell blocks at the call to gets until the user hits enter/return.

You can run the shell and see it in action:

> ruby blocking-shell.rb
prompt> ls
FILE1.txt FILE2.txt FILE3.txt
prompt> cat FILE1.txt
this is
file1's
contents
prompt>

Now that we've got a working shell using blocking IO let's migrate to a non-blocking version.

From Blocking to Non-Blocking IO

To convert to non-blocking IO we need to switch from gets to STDIN.read_nonblock (IO#read_nonblock), which in turn means updating other parts of our input loop:
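Below is a minimal sketch of that loop. It assumes the terminal is in its normal line-buffered (canonical) mode, so a whole line becomes readable at once. Two pieces are assumptions and not part of the description. First, read_available is a hypothetical helper name. Second, IO.select with a short timeout stands in for the "do other things" slot, so the loop doesn't spin the CPU while idle.

```ruby
# non-blocking-shell.rb (sketch)

CHUNK_SIZE = 4096

# Hypothetical helper: drain whatever is readable right now without blocking.
# Returns the bytes read ("" if nothing was ready) or nil once input hits EOF.
def read_available(io)
  buffer = String.new
  loop do
    buffer << io.read_nonblock(CHUNK_SIZE)   # keep reading while chunks are available
  end
rescue IO::WaitReadable
  buffer                                     # nothing left to read for now
rescue EOFError
  buffer.empty? ? nil : buffer
end

def prompt(output)
  output.print "prompt> "
  output.flush
end

def run_shell(input = $stdin, output = $stdout)
  prompt(output)
  loop do
    # Between checks for input, the shell is free to do OTHER THINGS here.
    next unless IO.select([input], nil, nil, 0.1)
    chunk = read_available(input)
    return if chunk.nil?                     # EOF, e.g. Ctrl-D
    chunk.each_line do |line|                # assumes whole lines arrive per read
      command = line.strip
      return if command == "exit"
      system(command) unless command.empty?
      prompt(output)
    end
  end
end

# Only start the interactive loop when run directly from a terminal.
run_shell if __FILE__ == $PROGRAM_NAME && $stdin.tty?
```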

The non-blocking code isn't as straightforward as the previous version at first glance, but it's still not too bad:

  • print the prompt
  • try to read input in chunks of 4096 bytes at a time
  • keep reading input in chunks if available
  • when there is nothing left to read, read_nonblock raises an IO::WaitReadable error
  • rescue the IO::WaitReadable error
  • if more input is available, re-run the begin block; otherwise we're ready to process the user's input
  • exit if the user typed "exit"
  • otherwise run the system command
  • re-print the prompt
  • loop back around and do it all over again

The non-blocking shell should continue to work just as well as the original blocking version:

> ruby non-blocking-shell.rb
prompt> ls
FILE1.txt FILE2.txt FILE3.txt
prompt> cat FILE1.txt
this is
file1's
contents
prompt>

Now that we've got a working non-blocking version let's expose the problem.

Exposing the problem

Try to cat a file with more than 1000 bytes.

When you cat the README for Ruby itself, you can see that the contents get cut off at the end and our prompt> ends up in an odd spot:

prompt> cat /Users/zdennis/source/opensource_projects/ruby/README.md
# What's Ruby

Ruby is the interpreted scripting language for quick and easy object-oriented
programming. It has many features to process text files and to do system
management tasks (as in Perl). It is simple, straight-forward, and
extensible.

## Features of Ruby

* Simple Syntax
* **Normal** Object-oriented Features (e.g. class, method calls)
* **Advanced** Object-oriented Features (e.g. Mix-in, Singleton-method)
* Operator Overloading
* Exception Handling
* Iterators and Closures
* Garbage Collection
* Dynamic Loading of Object Files (on some architectures)
* Highly Portable (works on many Unix-like/POSIX compatible platforms as
well as Windows, Mac OS X, BeOS, etc.) cf.
http://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms

## How to get Ruby

For a complete list of ways to install Ruby, including using third-party tools
like rvm, see:

http://www.ruby-lang.org/en/downloads/

The Ruby distribution files can be found on the followinprompt>

This may seem like a bit of a mystery, but it's actually more of a puzzle. If you like puzzles be prepared to make a guess before reading the next section.

One more piece of information I had at my disposal: this problem also exists if you replace the call to system with fork/exec/waitpid.
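For reference, here is one way to swap system for fork/exec/waitpid. This is a sketch. run_command is a hypothetical helper name, and the 127 exit status for a missing command follows the usual shell convention.

```ruby
# Hypothetical helper: launch a command by hand instead of via Kernel#system.
def run_command(*argv)
  pid = fork do
    begin
      exec(*argv)                      # replaces the child's process image
    rescue SystemCallError => e        # e.g. Errno::ENOENT for an unknown command
      warn "#{argv.first}: #{e.message}"
      exit!(127)                       # skip at_exit handlers in the child
    end
  end
  _, status = Process.waitpid2(pid)    # the parent blocks until the child exits
  status.exitstatus
end
```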

Now that you have all the same information I had, it's time to make that guess. If you're worried about being wrong, don't be; I was wrong the first eleventy times.

Understanding the issue

The issue relates to how processes are created. On *nix-based systems, new processes are created by forking. Ruby's fork, system, and spawn all rely on that mechanism, followed by exec when running another program.

A child created this way inherits copies of its parent's open file descriptors, and STDIN is one of them. Each copy carries along the open file it points to, including that file's status flags, such as O_NONBLOCK.

The child's and the parent's descriptors refer to the same open file (what POSIX calls an open file description). As a result, a status flag set through one descriptor is visible through the other.

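It turns out that Ruby's STDIN.read_nonblock is not side-effect free. To read STDIN in a non-blocking manner, it has to set the O_NONBLOCK flag on STDIN's file descriptor, and it leaves that flag set. When our process (the shell) then creates a new process using system or fork, the child inherits a STDIN that is configured for non-blocking IO. In an interactive terminal, STDIN, STDOUT, and STDERR usually all refer to that same open terminal device, so the flag affects the child's writes too. Once cat fills the terminal's output buffer, its next write fails with EAGAIN instead of waiting, and cat gives up partway through the file.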
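A small experiment can check both halves of that claim. It uses a pipe in place of STDIN so it runs without touching your terminal. nonblocking? is a hypothetical helper, and the sketch assumes a Unix-like system where fork is available.

```ruby
require 'fcntl'

# Hypothetical helper: is O_NONBLOCK set on the file behind +io+?
def nonblocking?(io)
  (io.fcntl(Fcntl::F_GETFL) & Fcntl::O_NONBLOCK) != 0
end

reader, writer = IO.pipe
# Start from a known blocking state; some Ruby versions may create pipes
# with O_NONBLOCK already set.
reader.fcntl(Fcntl::F_SETFL, reader.fcntl(Fcntl::F_GETFL) & ~Fcntl::O_NONBLOCK)
before_read = nonblocking?(reader)

writer.write("hello")
reader.read_nonblock(4096)          # side effect: switches O_NONBLOCK on
after_read = nonblocking?(reader)

# A forked child reports what it inherited through its exit status.
pid = fork { exit!(nonblocking?(reader) ? 0 : 1) }
_, status = Process.wait2(pid)
child_inherited = status.success?

puts "before read_nonblock:  #{before_read}"
puts "after read_nonblock:   #{after_read}"
puts "child sees O_NONBLOCK: #{child_inherited}"
```

On Linux or macOS this should print false, then true, then true.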

Another side effect is that a child process can modify the shared open file in a way that, in turn, bubbles up and adversely affects the parent process.
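The child-to-parent direction can be sketched the same way. Here the forked child switches O_NONBLOCK on for a descriptor it inherited, and the parent then sees the change on its own descriptor. A pipe stands in for STDIN, and set_nonblocking and nonblocking? are hypothetical helpers.

```ruby
require 'fcntl'

# Hypothetical helper: turn O_NONBLOCK on or off, leaving other flags alone.
def set_nonblocking(io, enabled)
  flags = io.fcntl(Fcntl::F_GETFL)
  flags = enabled ? (flags | Fcntl::O_NONBLOCK) : (flags & ~Fcntl::O_NONBLOCK)
  io.fcntl(Fcntl::F_SETFL, flags)
end

def nonblocking?(io)
  (io.fcntl(Fcntl::F_GETFL) & Fcntl::O_NONBLOCK) != 0
end

reader, _writer = IO.pipe
set_nonblocking(reader, false)        # the parent starts out blocking
parent_before = nonblocking?(reader)

pid = fork do
  set_nonblocking(reader, true)       # the child changes only "its" descriptor...
  exit!(0)
end
Process.wait(pid)

parent_after = nonblocking?(reader)   # ...but the parent sees the change too
puts "parent before: #{parent_before}, after child exits: #{parent_after}"
```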

To verify that claim, let's create a new program with the code below, which we'll call from our shell (the blocking version):
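Below is a minimal sketch of such a program. It assumes all the program needs to do is switch on O_NONBLOCK for the STDIN it inherits, then exit without switching it back. Calling STDIN.read_nonblock would have the same side effect.

```ruby
#!/usr/bin/env ruby
# break-the-shell.rb (sketch): turn on O_NONBLOCK for the inherited STDIN and
# exit without restoring it. The flag lives on the open file shared with the
# parent shell, so the change outlives this process.
require 'fcntl'

flags = STDIN.fcntl(Fcntl::F_GETFL)
STDIN.fcntl(Fcntl::F_SETFL, flags | Fcntl::O_NONBLOCK)
```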

Before running our shell, be sure to chmod a+x the above file so it can be executed directly.

bash> chmod a+x break-the-shell.rb
bash> ruby blocking-shell.rb
prompt> ./break-the-shell.rb
prompt> cat /Users/zdennis/source/opensource_projects/ruby/README.md
# What's Ruby

Ruby is the interpreted scripting language for quick and easy object-oriented
programming. It has many features to process text files and to do system
management tasks (as in Perl). It is simple, straight-forward, and
extensible.

## Features of Ruby

* Simple Syntax
* **Normal** Object-oriented Features (e.g. class, method calls)
* **Advanced** Object-oriented Features (e.g. Mix-in, Singleton-method)
* Operator Overloading
* Exception Handling
* Iterators and Closures
* Garbage Collection
* Dynamic Loading of Object Files (on some architectures)
* Highly Portable (works on many Unix-like/POSIX compatible platforms as
well as Windows, Mac OS X, BeOS, etc.) cf.
http://bugs.ruby-lang.org/projects/ruby-trunk/wiki/SupportedPlatforms

## How to get Ruby

For a complete list of ways to install Ruby, including using third-party tools
like rvm, see:

http://www.ruby-lang.org/en/downloads/

The Ruby distribution files can be found on the followinprompt>

As the last line of the output shows, we successfully broke the blocking shell. This demonstrates that file status flags are indeed shared between child and parent processes and can cause unintended side effects.

Fixing the issue

Despite the arcane low-level knowledge needed to understand the issue, the fix ends up being quite simple:
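Below is a minimal sketch of that fix. It assumes the shell takes its snapshot once at startup. system_with_original_flags is a hypothetical helper name. It takes the IO explicitly so it can be tried on a pipe as well as on STDIN.

```ruby
require 'fcntl'

# Hypothetical helper: put +io+'s file status flags back to +original_flags+,
# then launch the child so it inherits the flags our process started with.
# The shell's next read_nonblock call will switch O_NONBLOCK back on for itself.
def system_with_original_flags(io, original_flags, *command)
  io.fcntl(Fcntl::F_SETFL, original_flags)
  system(*command)
end
```

In the shell itself, the snapshot would be taken once at startup with STDIN.fcntl(Fcntl::F_GETFL) and kept around, for example in a hypothetical ORIGINAL_STDIN_FLAGS constant. Each system(command) in the input loop would then become system_with_original_flags(STDIN, ORIGINAL_STDIN_FLAGS, command).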

This brings in Ruby's standard library fcntl. It supplies the Fcntl constants (F_GETFL, F_SETFL, O_NONBLOCK) used with IO#fcntl, Ruby's wrapper around the fcntl(2) file-control system call.

By bringing in fcntl, we can take a snapshot of the file status flags we inherited when our process started. We store them away so we can use them later and continue about our business.

We let read_nonblock do its thing. But before we use fork or system to launch a child process, we set the file status flags back to what they were originally.

If we wanted to, we could toggle off the O_NONBLOCK flag directly. Restoring the snapshot seems more responsible, though, because the child gets exactly the flags the shell inherited rather than our guess at what they should be.
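The more direct alternative mentioned above could be sketched like this. clear_nonblock is a hypothetical helper that clears just the one bit and leaves every other flag as it is.

```ruby
require 'fcntl'

# Hypothetical helper: switch off O_NONBLOCK only, keeping all other flags.
def clear_nonblock(io)
  flags = io.fcntl(Fcntl::F_GETFL)
  io.fcntl(Fcntl::F_SETFL, flags & ~Fcntl::O_NONBLOCK)
end
```

Ruby's io/nonblock library exposes the same toggle as io.nonblock = false.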

Summary

This post started with the intrigue that came with being able to do other things because of non-blocking IO. This intrigue almost died away when an odd issue popped up, but now that we've solved the puzzle we can go back to thinking about how to leverage this architecture to do other things.

Well, I'm off to write my shell.

Happy coding!