In this assignment you will implement a simplified Automated Program Repair (APR) tool that uses an LLM to propose patches for real buggy Python projects. Your tool follows a classic generate-and-validate loop: (1) identify failing tests, (2) localize suspicious code, (3) generate a patch, and (4) validate the patch by re-running tests.
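At a high level, the loop can be sketched as follows. This is only a sketch of the control flow; the helper callables are hypothetical placeholders that your own tool will define, not names from the starter code:

```python
def repair(find_failing_tests, localize, generate_patch, apply_patch,
           revert_patch, run_all_tests, max_iters=5):
    """Generate-and-validate loop; all callables are supplied by your tool."""
    failing = find_failing_tests()            # (1) run tests, record failing node ids
    ranking = localize(failing)               # (2) rank lines by suspiciousness (SBFL)
    for _ in range(max_iters):
        patch = generate_patch(ranking)       # (3) ask the LLM for a candidate patch
        apply_patch(patch)
        if run_all_tests():                   # (4) validate: do all tests now pass?
            return patch
        revert_patch(patch)                   # otherwise undo and try again
    return None
```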
We will use a buggy project checkout (provided under buggy-projects/) and a harness script (buggy-projects/test.sh) that runs the project’s tests inside its own virtual environment and (optionally) produces coverage.json.
To decide where to patch, your tool uses Spectrum-Based Fault Localization (SBFL): you run tests, collect coverage, and assign a suspiciousness score to each executed line. We use the Ochiai suspiciousness metric. Ochiai is closely related to cosine similarity; you can read more here: Cosine similarity (Otsuka–Ochiai coefficient).
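Concretely, the Ochiai score of a line is failed(line) / sqrt(totalFailed * (failed(line) + passed(line))), where failed(line) and passed(line) count the failing and passing tests that execute the line, and totalFailed is the total number of failing tests. A direct translation (variable names are illustrative):

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness for one line.

    failed_cov   -- number of failing tests that execute the line
    passed_cov   -- number of passing tests that execute the line
    total_failed -- total number of failing tests
    """
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0
```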
Coverage data is produced using coverage.py’s JSON report (coverage json).
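The JSON report maps each covered file to the line numbers it executed. One way to load it (a sketch, assuming the report has been written to coverage.json):

```python
import json

# Read the per-file executed lines out of a coverage.py JSON report.
with open("coverage.json") as f:
    report = json.load(f)

executed = {
    path: set(data["executed_lines"])   # line numbers hit during the run
    for path, data in report["files"].items()
}
```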
For this homework, the buggy program you will repair comes from tqdm, a widely used Python library that provides fast, lightweight progress bars for loops and iterables. In typical usage, programmers wrap an iterable with tqdm(...) (or use trange(...)) to display a live progress meter showing iteration counts, elapsed time, and estimated time remaining.
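For example:

```python
import time
from tqdm import tqdm, trange

# Wrap any iterable to display a live progress bar.
for _ in tqdm(range(100)):
    time.sleep(0.01)

# trange(n) is shorthand for tqdm(range(n)).
for _ in trange(100):
    time.sleep(0.01)
```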
tqdm was chosen as the subject program for this assignment because it is a real, actively used open-source project, but it is not so large that automated repair becomes dominated by engineering overhead. At the same time, tqdm comes with a reasonably sized and fast test suite that provides many passing tests and good coverage, which is particularly important for spectrum-based fault localization.
In this assignment, the buggy version of tqdm is provided in buggy-projects/tqdm-bug-1/
This directory contains:
Your APR tool should use bugsinpy_run_test.sh to determine the bug-triggering test(s) that should fail before the fix and pass after the fix.
Although this homework uses tqdm as the subject program, your implementation should be general. You should not hard-code assumptions about specific files, function names, line numbers, or bugs in tqdm.
The grading tests may exercise your APR components (fault localization, patching, and validation) on simplified or synthetic scenarios. Solutions that rely on special-case logic for this specific bug are unlikely to perform well.
Download the starter bundle and extract it:
$ tar -xzf hw5.tar.gz
$ cd hw5
You should see:
The grading environment for this assignment expects Python 3.6.9. Because Ubuntu 22.04 ships with a newer Python, you must use pyenv to install Python 3.6.9 locally without breaking your system Python. pyenv is a standard tool for managing multiple Python versions.
On your Ubuntu 22.04 VM, install build dependencies (one-time):
$ sudo apt update
$ sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl git \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Install pyenv (one-time). Follow the official pyenv instructions if you prefer. A typical approach is:
$ curl https://pyenv.run | bash
Then add pyenv to your shell startup (e.g., ~/.bashrc) and restart your shell:
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Finally, install and select Python 3.6.9 for this homework directory:
$ cd hw5
$ pyenv install 3.6.9
$ pyenv local 3.6.9
$ python --version
Python 3.6.9
The student-facing code (under starter_files/) is tested with pytest. It is recommended that you create a venv using your pyenv Python:
$ cd starter_files
$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install --upgrade pip setuptools wheel
(venv) $ pip install pytest requests coverage==6.2
Run the public tests from inside the starter_files directory:
(venv) $ python -m pytest
The buggy project(s) are under buggy-projects/. For each bug directory we provide a setup script that creates the project’s own virtual environment and installs its dependencies. For this homework, run:
$ cd buggy-projects
$ ./setup.sh
This creates buggy-projects/tqdm-bug-1/env/. The test runner script (buggy-projects/test.sh) will activate this environment automatically. Read through the test runner script to get an idea of what it can do: your Python code will invoke this script to run the tests, so you need to understand its inputs and outputs. Next, run the tests on the buggy project to make sure everything is set up properly. Note that you should expect some tests to fail here, but not to see a segfault or similar crash.
$ ./test.sh run-all tqdm-bug-1/
..FEs...sF..ssssssss.........................................................s..............
... a bunch of test output ...
2 failed, 78 passed, 11 skipped, 1 warning, 1 error in 19.06s
$ ./test.sh cov-one tqdm-bug-1/ --nodeid 'tqdm/tests/tests_contrib.py::test_enumerate'
Wrote JSON report to coverage.json
NODEID=tqdm/tests/tests_contrib.py::test_enumerate
EXC=TypeError: 'int' object is not subscriptable
RC=1
/home/ubuntu/hw5/buggy-projects/tqdm-bug-1/coverage.json
If this fails, or if you are unable to run ./setup.sh without errors, you may need to delete the env/ directory created in tqdm-bug-1 and step through setup.sh manually to debug.
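As noted above, your Python code will drive test.sh directly. A minimal subprocess sketch, compatible with Python 3.6, is shown below; the subcommand and flag follow the cov-one invocation above, but confirm the exact interface and output format against the script itself:

```python
import subprocess

def run_cov_one(script_path, project_dir, nodeid):
    # Run a single test under coverage via the harness script (a sketch only;
    # test.sh is the authoritative source for its inputs and outputs).
    result = subprocess.run(
        [script_path, "cov-one", project_dir, "--nodeid", nodeid],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,   # Python 3.6-compatible text mode
    )
    # The harness prints status lines (e.g., NODEID=..., RC=...) and the path
    # to coverage.json; the return code reflects the test outcome.
    return result.returncode, result.stdout
```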
You will implement key pieces of an APR pipeline:
From starter_files/ with your venv active:
(venv) $ python llmapr.py \
--project_dir ../buggy-projects/tqdm-bug-1 \
--script_path ../buggy-projects/test.sh
Optional arguments include --selector, --model, --max_iters, --top_k, --max_tokens, and --temperature.
Your repository includes starter_files/config.py which contains an api_key placeholder. Do not submit config.py. The autograder environment will provide its own configuration.
Submit a single tarball of your starter_files/ implementation, excluding config.py, the tests, and other unnecessary files. Do NOT change the name of the starter_files/ folder; the autograder depends on that name. From inside hw5/:
$ tar \
    --exclude='*/env' \
    --exclude='*/venv' \
    --exclude='*/__pycache__' \
    --exclude='*.pyc' \
    --exclude='.DS_Store' \
    --exclude='starter_files/tests' \
    --exclude='starter_files/config.py' \
    -czf hw5-submission.tar.gz starter_files
Upload hw5-submission.tar.gz to the autograder.
Finally, upload a short PDF report to Brightspace that describes how you used generative AI (or whether you used it at all) to develop this assignment. Additionally, your report should describe the prompt(s) you used in prompts.py. Are there any particular prompting strategies or prompting patterns you used?