Homework Assignment #4 — Defect Detection

Note: I am trying to get an academic license of CodeSonar for Vanderbilt but cannot guarantee it. If we fail to get a license, HW4 will only include Facebook Infer (stay tuned!). Update (as of Mar 13, 2022): It turned out for HW4, we will only ask students to use Facebook Infer (the hw assigment description below is already updated accordingly).

In this assignment you will use a real-world open source static analysis tool to automatically detect potential defects. The goal of this howework is to learn what statistic analysis tools look like and how to use them especially on legacy code (e.g., how to interpret the output of such tools; understand false positives and false negatives in the output; use the output of the tools to find security issues, etc.).

The static analysis tool is Facebook's Infer, which focuses on memory errors, leaks, race conditions, and API issues. Infer is open source.

You may work with a partner for this assignment. If you do you must use the same partner for all sub-components of this assignment. Only one partner needs to submit the report on Brightspace, but if you both do, nothing fatal happens.

Installing, Compiling, Running and Analyzing Legacy Code

Warning: Infer Is Hard To Run
Many users report that Facebook's Infer tool does not run on the Windows Subsystem for Linux (WSL) or similar shortcuts for using Ubuntu- or Linux-like interfaces. I know that the setups mentioned in HW0 work out of the box, so complete HW0 if you run into issues. In addition, Headless Virtual Box configurations (instructions) are reported to work very well.

It is your responsibility to download, compile, run and analyze the subject program and associated tools (or use the precompiled one). Getting the code and tools to work in some manner is part of the assignment. You can post on the forum for help and compare notes bemoaning various architectures (e.g., windows vs. mac vs. linux, etc.). Ultimately, however, it is your responsibility to read the documentation for these programs and tools and use some elbow grease to make them work.

The lighttpd webserver

We will make use of the lighttpd webserver (pronounced "lighty"), version 1.4.17, as our primary subject program for this homework. A local mirror copy of lighttpd-1.4.17.tar.gz is available, but you can also get it from the original website. It is about 55,000 lines of code in about 90 files. While somewhat small for this class, some analysis tool licenses have LOC limits or scalability issues, so it was chosen as an indicative compromise.

While not as large or popular as apache, at various points lighttpd has been used by YouTube, xkcd and Wikimedia. Much like apache, old verisons of it have a number of known security vulnerabilities.

The Common Vulnerabilities and Exposures system is one approach for tracking security vulnerabilities. A CVE is basically a formal description, prepared by security experts, of a software bug that has security implications.

There are at least ten CVEs associated with lighttpd 1.4.17 tracked in various lists (such as cvedetails and mitre). For example, CVE-2014-2324 has the description "Multiple directory traversal vulnerabilities in (1) mod_evhost and (2) mod_simple_vhost in lighttpd before 1.4.35 allow remote attackers to read arbitrary files via a .. (dot dot) in the host name, related to request_check_hostname." You can dig into the information listed in, or linked from, a CVE (or just look at subsequent versions of the program where the bug is fixed!) to track down details. Continuing the above example, mod_evhost refers to source file mod_evhost.c, mod_simple_vhost refers to file mod_simple_vhost.c, and request_check_hostname is in file request.c. You will need such information when evaluating the whether or not the tools find these security bugs.

Facebook's Infer

The Infer tool is a static analyzer — it detects bugs in programs without running them. The primary website is fbinfer.com.

Unfortunately, some versions of Infer can be obnoxious to build and install, despite their handy installation guide. Also, many users report that Infer does not run on Windows Subsystem for Linux (WSL) or similar setups; either completing HW0 to spin up an Ubuntu 16 VM, or using a headless Virtual Box configuration (instructions) is recommended.

As a convenience only (see above about "your responsibility"), a pre-compiled, runs-for-me-but-no-promises-for-you (Ubuntu 16.04.2 LTS GNU/Linux 4.4.0-34-generic x86_64) version of Infer is available locally here (warning: 265 MB). The main binary can be found at infer-linux64-v0.13.0/infer/bin/infer. You can use either the pre-compiled one or compile it yourself for full credit.

Infer on lighttpd

Once you have Infer built or downloaded, applying it to lighttpd should be as simple as:

$ cd lighttpd-1.4.17 
$ sh configure
$ /path/to/infer/bin/infer run -- make 

That should produce output similar to (but everything is fine if you get very different numbers):

make[1]: Leaving directory '/home/weimer/src/lighttpd-1.4.17'
Found 88 source files to analyze in /home/weimer/src/lighttpd-1.4.17/infer-out
Starting analysis...

legend:
  "F" analyzing a file
  "." analyzing a procedure

FFFFFFFFFF.....F...FF....F..FF.F..F....................................................................................FF.................................................F...........F..................F..................F...........................................................................F....................................................................F........................................................F.......F.................F...............F.......FF.............F...................F.............F.........F...F.................F...................................F............FF.F.....F.......................F.....FF..............F..F........FF..........FF.............FF.......FF.F....F......F......FFF..............F.........F...F......F...........F.......FF..........F.F...........F...F..F.......F..F...F........................F..F.........F....F........F.....F..F..........F............F....F...................F................................................................................................................................................

Found 308 issues

src/joblist.c:19: error: NULL_DEREFERENCE
  pointer `srv->joblist->ptr` last assigned on line 16 could be null and is dereferenced at line 19, column 2.
  17.           }
  18.
  19. >         srv->joblist->ptr[srv->joblist->used++] = con;
  20.
  21.           return 0;

	...

Summary of the reports

      NULL_DEREFERENCE: 145
            DEAD_STORE: 94
           MEMORY_LEAK: 65
         RESOURCE_LEAK: 3
  QUANDARY_TAINT_ERROR: 1

You will have to read through the output carefully and analyze the reported defects. Some will be true positives (i.e., real bugs in the code) and some will be false positives (i.e., spurious warnings that do not correspond to real bugs).

Infer on jfreechart

Running Infer on jfreechart-1.5.0 is similarly direct.

$ cd jfreechart-1.5.0 
$ /path/to/infer/bin/infer run -- mvn compile
Capturing in maven mode...
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building JFreeChart 1.5.0

	...

Found 640 source files to analyze in /home/weimer/src/jfreechart-1.5.0/infer-out
Starting analysis...

	...

Found 69 issues

src/main/java/org/jfree/data/xml/DatasetReader.java:73: error: RESOURCE_LEAK
  resource of type `java.io.FileInputStream` acquired to `in` by call to `FileInputStream(...)` at line 72 is not released after line 73.
  71.           throws IOException {
  72.           InputStream in = new FileInputStream(file);
  73. >         return readPieDatasetFromXML(in);
  74.       }

...

Summary of the reports

  THREAD_SAFETY_VIOLATION: 43
         NULL_DEREFERENCE: 22
            RESOURCE_LEAK: 4

While times will vary, some students have reported that running Infer on jfreechart can take five hours.

FAQ and Troubleshooting

In this section we detail previous student issues and resolutions:

  1. Question: When I run infer.exe run -- make or infer run -- mvn compile I get errors like InferModules__SqliteUtils.Error or Maven command failed.

    Answer: The most common issue is that Infer does not always run well on Windows Subsystem for Linux (WSL) or similar shortcuts to get a Linux- or Ubuntu-like interface on another OS. We strongly recommend a headless Virtual Box setup (instructions).

  2. Question: When I try to run Infer, I get cannot execute binary file: Exec format error..

    Answer: One student reports: "Finally got it. Turns out I was using a 32 bit processor (i386) so even when I set up my vm as 64 bit, it couldn’t run any x86-64 binaries. Fixed it by installing a 64 bit vdi. https://appuals.com/fix-cannot-execute-binary-file-exec-format-error-ubuntu/

  3. Question: I see Maven command failed: *** mvn compile -P infer-capture when I try to run Infer.

    Answer: Some students have seen success with:

    sudo apt-get install cobertura maven
    sudo apt-get install openjdk-8-jdk
    
    Others reported that "I ended up having to setup an Ubuntu 16.04 VM in VirtualBox".

Written Report

You must write a detailed PDF report reflecting on your experiences with the static analysis defect detection tool (i.e., infer). In particular, all of the following are required:

The grading staff will select a small number of excerpts from particularly high-quality or instructive reports and share them with the class. If your report is selected you will receive extra credit.

Submission

Submit a single PDF report via Brightspace. You must include your name and Vandy Net ID (as well as your partner's name and email id, if applicable).

There is no explicit format (e.g., for headings or citations) required. For example, you may either use an essay structure or a point-by-point list of question answers.