Who's reading my file?

By Zach Dennis on 23 10 2013

We had just deployed to a staging environment and nothing seemed to be working right, even though there were no problems when running locally or in the integrated dev/QA environments. The logs showed SSL certificate verification issues, yet the SSL certificate was validating fine in browsers. Something else was amiss, but what?

We did find and solve the SSL issue. It was caused by the customer-specific truststore for root certificates. It was missing the certificate for the certificate authority that had signed their SSL certificate, Comodo. That's not the interesting part though.

There were multiple keystores and truststores on the production server, not counting the system or JVM defaults. After spending time looking through configurations to see what was being used, we decided to take a step back and interrogate the system. It could tell us what file was being opened.

This got me thinking about three ways that I had used in the past to interrogate the system: lsof, strace, and dtrace. In fact, I even got to learn about a fourth: SystemTap.

While I didn't get to use all of these to solve our issue (as dtrace and SystemTap aren't available on all systems), they are good tools to have in your toolbox.

Here are three simple ways to use them to find out who's opened a particular file, or, as in our case: what file has been opened.

lsof

lsof is a Unix command whose name stands for "list open files". It can be used to determine which process currently has a file open. It has a lot of options, but in its most basic form it is incredibly easy to use:
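
A minimal sketch of that basic form, assuming lsof is on your PATH. /path/to/file is a placeholder, and the availability check is only there so the snippet degrades gracefully on systems without lsof:

```shell
# Placeholder: substitute the file you are curious about.
FILE=/path/to/file

if command -v lsof >/dev/null 2>&1; then
  # lsof exits non-zero when no process has the file open (or the path
  # does not exist), so fall back to a plain message in that case.
  lsof "$FILE" || echo "no process currently has $FILE open"
else
  echo "lsof is not installed"
fi
```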

The above command will show you which process has /path/to/file open.

Here's a working example:
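
One rough way to reproduce this from a single shell, assuming lsof is installed. A background sleep that holds a scratch file open stands in for the irb session here, and DEMO_FILE and HOLDER_PID are names made up for this sketch:

```shell
# Hypothetical scratch file, used only for this demonstration.
DEMO_FILE=$(mktemp)

# Hold the file open in a background process (a stand-in for an irb
# session that has opened the file and not yet closed it).
sleep 30 < "$DEMO_FILE" &
HOLDER_PID=$!

if command -v lsof >/dev/null 2>&1; then
  # Should list the sleep process, with its PID in the second column.
  lsof "$DEMO_FILE" || echo "lsof reported no process for $DEMO_FILE"
else
  echo "lsof is not installed"
fi

# Ending the process closes the file, so lsof would no longer list it.
kill "$HOLDER_PID"
wait "$HOLDER_PID" 2>/dev/null || true
rm -f "$DEMO_FILE"
```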

If you close the file or end the irb session and re-run lsof you'll no longer see output as there is no process with the file currently open.

You can find more information via its man page. There are a lot of options, but at its heart the simple rule is this: if a process currently has a file open, lsof will be able to tell you about it.

This is a blessing and a curse as only being able to interrogate the system about currently open file handles can be limiting.

A great list of lsof examples can be found over at http://www.catonmat.net/blog/unix-utilities-lsof/.

strace

strace is a Linux command for tracing system calls and signals. It works by running a specified command, intercepting the system calls it makes and the signals it receives, and reporting each one as it happens (to standard error by default).

The basic usage is this:
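
As a sketch of the general shape, strace is simply placed in front of the command you want to trace. The cat command below is an arbitrary stand-in chosen for this illustration:

```shell
if command -v strace >/dev/null 2>&1; then
  # General form: strace COMMAND [ARGS...]
  # cat's own output is discarded; the trace is written to standard error.
  strace cat /etc/passwd > /dev/null \
    || echo "strace could not trace the command (ptrace may be restricted)"
else
  echo "strace is not installed"
fi
```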

Now run strace on sample.sh:
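
The contents of sample.sh aren't reproduced here, so the sketch below writes a hypothetical stand-in that reads /etc/passwd from a ruby one-liner (an assumption based on the examples that follow) and then runs it under strace. WORKDIR and SCRIPT are names invented for the sketch:

```shell
# Hypothetical stand-in for sample.sh, kept in a scratch directory.
WORKDIR=$(mktemp -d)
SCRIPT="$WORKDIR/sample.sh"
cat > "$SCRIPT" <<'EOF'
#!/bin/sh
# exec replaces the shell with ruby, so ruby itself is the traced process.
exec ruby -e 'File.read("/etc/passwd")'
EOF
chmod +x "$SCRIPT"

if command -v strace >/dev/null 2>&1; then
  # Every system call and signal is printed, unfiltered, to standard error.
  strace "$SCRIPT" || echo "tracing failed (is ruby installed? is ptrace allowed?)"
else
  echo "strace is not installed"
fi
```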

That prints out – unfiltered – all system calls and signals that occurred when running the ruby command. It's a bit more than what we care for.

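strace does provide a way to get a better signal-to-noise ratio. We can specify a qualifying expression using the -e option to indicate we only want to see open system calls (on current Linux systems most files are opened through the related openat call, so it's worth tracing both):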
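
A hedged sketch of such a filter, using cat as a stand-in for the traced command. It names both open and openat, since which one a program uses depends on the platform and C library:

```shell
if command -v strace >/dev/null 2>&1; then
  # trace=open,openat keeps only the calls that open files. strace rejects
  # syscall names the platform lacks, so drop `open` on architectures
  # (aarch64, for example) that only provide openat.
  strace -e trace=open,openat cat /etc/passwd > /dev/null \
    || echo "strace could not trace the command"
else
  echo "strace is not installed"
fi
```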

Your output will now look a lot more reasonable and possibly a lot like this:

This is pretty close to what we did when looking to see what truststore file was being opened by our process, with one major caveat. By default strace won't trace child processes.

Let's run the above command again and cause our ruby process to fork before reading /etc/passwd:

This time you'll notice that we're missing the last open system call, the one on /etc/passwd. We can fix this using the -f option, which tells strace to trace child processes:
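
A minimal sketch of -f in action. Instead of a forking ruby process, it uses a shell that runs cat in a child process, which is an assumption made to keep the example free of dependencies; FORKING_CMD is a made-up name:

```shell
# The trailing `true` keeps sh from exec'ing cat directly, so cat really
# does run in a forked child of the traced shell.
FORKING_CMD='cat /etc/passwd > /dev/null; true'

if command -v strace >/dev/null 2>&1; then
  # -f follows children created by fork, vfork, and clone, so the child's
  # open of /etc/passwd shows up in the trace, prefixed with the child's PID.
  strace -f -e trace=open,openat sh -c "$FORKING_CMD" \
    || echo "strace could not trace the command"
else
  echo "strace is not installed"
fi
```

Running the same command without -f should show the shell's own opens but not the one on /etc/passwd.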

This will trace forks of forks of forks:
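
To illustrate the nesting, here is a sketch with three generations of processes, again built from plain shells rather than ruby (an assumption for portability; NESTED_CMD is a hypothetical name):

```shell
# Outer sh forks an inner sh, which in turn forks cat.
NESTED_CMD='sh -c "cat /etc/passwd > /dev/null; true"; true'

if command -v strace >/dev/null 2>&1; then
  # With -f, the grandchild's open of /etc/passwd is still reported.
  strace -f -e trace=open,openat sh -c "$NESTED_CMD" \
    || echo "strace could not trace the command"
else
  echo "strace is not installed"
fi
```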

One last tip for using strace: the -o option lets you specify an output file. This comes in handy when the process you run under strace is itself started by another process (like an init.d daemon) and you don't have access to its output.

The following example logs the system calls to /tmp/files-opened.log:
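
A sketch of that invocation, with cat standing in for the real command (in our case that would have been the daemon's start command, which isn't shown here):

```shell
LOG=/tmp/files-opened.log

if command -v strace >/dev/null 2>&1; then
  # -o sends the trace to LOG instead of standard error; -f and -e are
  # combined with it exactly as in the earlier examples.
  if strace -f -e trace=open,openat -o "$LOG" cat /etc/passwd > /dev/null; then
    # Show the lines in the log that mention the file we care about.
    grep passwd "$LOG" || echo "no matching lines in $LOG"
  else
    echo "strace could not trace the command"
  fi
else
  echo "strace is not installed"
fi
```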

strace is a great tool, especially on Linux systems, since it seems to be installed on many distributions by default. As far as I know it's not installed by default on OSX or other Unix systems, although there appear to be packages that can be installed.

A good list of strace examples can be found at http://www.commandlinefu.com/commands/using/strace.

dtrace

dtrace, which stands for "dynamic tracing", is the bee's knees in process and systems debugging. It gives you the ability to trace dynamic languages, compiled executables, libraries, system calls, kernel calls, and hardware calls – and does it dynamically.

There's a lot to dtrace and its power extends far beyond what lsof and strace can accomplish. There are a few books on dtrace (one of which I am currently working my way through).

Since the scope of dtrace is so vast I'm going to force myself to keep it super minimal. Oh, and you should know, dtrace is available on OSX 10.5 and up, FreeBSD, Oracle Solaris, and OpenSolaris. There is a port to linux but I haven't used it. And by default it requires superuser privileges to run.

Here's an example for finding which files are being opened on your system:
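
A sketch of such a command, following the form of the well-known one-liner in the list linked at the end of this section. D_PROGRAM is just a variable name chosen here for readability:

```shell
# D program: on entry to any syscall whose name starts with "open", print
# the name of the calling process and its first argument as a string
# (for open, that argument is the path).
D_PROGRAM='syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'

if command -v dtrace >/dev/null 2>&1; then
  # Needs superuser privileges by default (run the snippet as root or via
  # sudo) and keeps running until interrupted with Ctrl-C.
  dtrace -n "$D_PROGRAM" || echo "dtrace failed (are you running as root?)"
else
  echo "dtrace is not installed"
fi
```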

This will output all of the files that are being opened on your system by all processes. It's quite a bit of information! If only there were a way to narrow it down.

Well, you're in luck. With dtrace you can use predicates to narrow down the results collected. For example, start an irb session in one terminal, and then open a second terminal.

In that second terminal, let's find out the PID of the irb session:
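
One way to do that, assuming pgrep is available (ps piped through grep would work just as well). IRB_PID is a name made up for this sketch:

```shell
# -f matches against the full command line; depending on the platform an
# irb session may be listed under the process name ruby, with irb only
# appearing in its arguments. It can also match unrelated processes whose
# command line happens to contain "irb", so double-check the result.
IRB_PID=$(pgrep -f irb | head -n 1)
echo "irb PID: ${IRB_PID:-none found}"
```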

Now insert that PID into the following dtrace command and run it:
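
A hedged sketch of the predicate form. The PID 12345 is a hypothetical placeholder to be replaced with the one you just looked up, and the probe and action are assumed to be the same as in the system-wide example:

```shell
# Hypothetical PID; replace it with the PID of your irb session.
IRB_PID=${IRB_PID:-12345}

# The /pid == .../ predicate makes the probe fire only for that process.
D_PROGRAM="syscall::open*:entry /pid == ${IRB_PID}/ { printf(\"%s %s\", execname, copyinstr(arg0)); }"

if command -v dtrace >/dev/null 2>&1; then
  # Needs superuser privileges by default; runs until interrupted.
  dtrace -n "$D_PROGRAM" || echo "dtrace failed (are you running as root?)"
else
  echo "dtrace is not installed"
fi
```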

Leave dtrace running in your second terminal and navigate back to your first terminal. The one with irb running. In irb read a file like so:

Now go back to your dtrace terminal. You'll see something similar to this:

Any time your irb session opens a file, dtrace will tell you about it. This is not limited to irb either. You can trace open system calls for any process. It's just that in the above example we chose to focus on our irb session.

Let's revisit the problem described at the top of the post for just a second. We could have used dtrace and the above command(s) to help us identify the truststore file that was being used. Unfortunately, the server was Red Hat Enterprise Linux, which doesn't have dtrace. Had it been available, it would have saved us the time we spent stopping our daemon, modifying the init script to run with strace, and then restarting the daemon to collect output.

All in all dtrace is very powerful and hopefully this one example provides a little insight into how nice it can be.

Many useful one liners for dtrace have been shared by Brendan Gregg over at http://www.brendangregg.com/DTrace/dtrace_oneliners.txt.

There's a possible fourth... SystemTap

SystemTap is like dtrace, but for Linux. It's an open source project backed by Red Hat, IBM, Hitachi, and Oracle that is available as an RPM, but can also be installed on Ubuntu.

I was using RHEL6 and tried to get SystemTap to work. Sadly, it didn't happen. The necessary debuginfo kernel package that SystemTap needs isn't currently in the RHEL6 RPM/Yum repository.

There's an open bug with regard to this here and an open issue filed here.

Summary

It's not every day that you need to find out what low-level system calls are being made. When you do need to know, tools like lsof, strace, dtrace, and SystemTap can make the difference between pulling your hair out and pinpointing the issue. You may need to find out who's got your file in some of the strangest situations, as in the case of SSL certificates and truststores.

Masthead image courtesy of Paramount Pictures