In this assignment you will implement a simplified Automated Program Repair (APR) tool that uses an LLM to propose patches for real buggy Python projects. Your tool follows a classic generate-and-validate loop: (1) identify failing tests, (2) localize suspicious code, (3) generate a patch, and (4) validate the patch by re-running tests.
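At a high level, the loop can be sketched as follows. This is only a sketch of the control flow; the helper callables are hypothetical placeholders that your own tool will define, not names from the starter code:

```python
def repair(find_failing_tests, localize, generate_patch, apply_patch,
           revert_patch, run_all_tests, max_iters=5):
    """Generate-and-validate loop; all callables are supplied by your tool."""
    failing = find_failing_tests()            # (1) run tests, record failing node ids
    ranking = localize(failing)               # (2) rank lines by suspiciousness (SBFL)
    for _ in range(max_iters):
        patch = generate_patch(ranking)       # (3) ask the LLM for a candidate patch
        apply_patch(patch)
        if run_all_tests():                   # (4) validate: do all tests now pass?
            return patch
        revert_patch(patch)                   # otherwise undo and try again
    return None
```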
We will use a buggy project checkout (provided under buggy-projects/) and a harness script (buggy-projects/test.sh) that runs the project’s tests inside its own virtual environment and (optionally) produces coverage.json.
To decide where to patch, your tool uses Spectrum-Based Fault Localization (SBFL): you run tests, collect coverage, and assign a suspiciousness score to each executed line. We use the Ochiai suspiciousness metric. Ochiai is closely related to cosine similarity; you can read more here: Cosine similarity (Otsuka–Ochiai coefficient).
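Concretely, the Ochiai score of a line is failed(line) / sqrt(totalFailed * (failed(line) + passed(line))), where failed(line) and passed(line) count the failing and passing tests that execute the line, and totalFailed is the total number of failing tests. A direct translation (variable names are illustrative):

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness for one line.

    failed_cov   -- number of failing tests that execute the line
    passed_cov   -- number of passing tests that execute the line
    total_failed -- total number of failing tests
    """
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0
```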
Coverage data is produced using coverage.py’s JSON report (coverage json).
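The JSON report maps each covered file to the line numbers it executed. One way to load it (a sketch, assuming the report has been written to coverage.json):

```python
import json

# Read the per-file executed lines out of a coverage.py JSON report.
with open("coverage.json") as f:
    report = json.load(f)

executed = {
    path: set(data["executed_lines"])   # line numbers hit during the run
    for path, data in report["files"].items()
}
```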
For this homework, the buggy program you will repair comes from tqdm, a widely used Python library that provides fast, lightweight progress bars for loops and iterables. In typical usage, programmers wrap an iterable with tqdm(...) (or use trange(...)) to display a live progress meter showing iteration counts, elapsed time, and estimated time remaining.
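For example:

```python
import time
from tqdm import tqdm, trange

# Wrap any iterable to display a live progress bar.
for _ in tqdm(range(100)):
    time.sleep(0.01)

# trange(n) is shorthand for tqdm(range(n)).
for _ in trange(100):
    time.sleep(0.01)
```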
tqdm was chosen as the subject program for this assignment because it is a real, actively used open-source project, but it is not so large that automated repair becomes dominated by engineering overhead. At the same time, tqdm comes with a reasonably sized and fast test suite that provides many passing tests and good coverage, which is particularly important for spectrum-based fault localization.
In this assignment, the buggy version of tqdm is provided in buggy-projects/tqdm-bug-1/
This directory contains:
Your APR tool should use bugsinpy_run_test.sh to determine the bug-triggering test(s) that should fail before the fix and pass after the fix.
Although this homework uses tqdm as the subject program, your implementation should be general. You should not hard-code assumptions about specific files, function names, line numbers, or bugs in tqdm.
The grading tests may exercise your APR components (fault localization, patching, and validation) on simplified or synthetic scenarios. Solutions that rely on special-case logic for this specific bug are unlikely to perform well.
Download the starter bundle and extract it:
$ tar -xzf hw5.tar.gz
$ cd hw5
You should see:
The grading environment for this assignment expects Python 3.6.9. Because Ubuntu 22.04 ships with a newer Python, you must use pyenv to install Python 3.6.9 locally without breaking your system Python. pyenv is a standard tool for managing multiple Python versions.
On your Ubuntu 22.04 VM, install build dependencies (one-time):
$ sudo apt update
$ sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl git \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Install pyenv (one-time). Follow the official pyenv instructions if you prefer. A typical approach is:
$ curl https://pyenv.run | bash
Then add pyenv to your shell startup (e.g., ~/.bashrc) and restart your shell:
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
Finally, install and select Python 3.6.9 for this homework directory:
$ cd hw5
$ pyenv install 3.6.9
$ pyenv local 3.6.9
$ python --version
Python 3.6.9
The student-facing code (under starter_files/) is tested with pytest. It is recommended that you create a venv using your pyenv Python:
$ cd starter_files
$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install --upgrade pip setuptools wheel
(venv) $ pip install pytest requests coverage==6.2
Run the public tests from inside the starter_files directory:
(venv) $ python -m pytest
The buggy project(s) are under buggy-projects/. For each bug directory we provide a setup script that creates the project’s own virtual environment and installs its dependencies. For this homework, run:
$ cd buggy-projects
$ ./setup.sh
This creates buggy-projects/tqdm-bug-1/env/. The test runner script (buggy-projects/test.sh) will activate this environment automatically. Read through the test runner script to get an idea of what it can do: your Python code will invoke this script to run the tests, so you need to understand its inputs and outputs. Next, run the tests on the buggy project to make sure everything is set up properly. Note that you should expect some tests to fail here, but not to see a segfault or similar crash.
$ ./test.sh run-all tqdm-bug-1/
..FEs...sF..ssssssss.........................................................s..............
... a bunch of test output ...
2 failed, 78 passed, 11 skipped, 1 warning, 1 error in 19.06s
$ ./test.sh cov-one tqdm-bug-1/ --nodeid 'tqdm/tests/tests_contrib.py::test_enumerate'
Wrote JSON report to coverage.json
NODEID=tqdm/tests/tests_contrib.py::test_enumerate
EXC=TypeError: 'int' object is not subscriptable
RC=1
/home/ubuntu/hw5/buggy-projects/tqdm-bug-1/coverage.json
If this fails, or if you are unable to run ./setup.sh without errors, you may need to delete the env/ directory created in tqdm-bug-1 and step through setup.sh manually to debug.
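As noted above, your Python code will drive test.sh directly. A minimal subprocess sketch, compatible with Python 3.6, is shown below; the subcommand and flag follow the cov-one invocation above, but confirm the exact interface and output format against the script itself:

```python
import subprocess

def run_cov_one(script_path, project_dir, nodeid):
    # Run a single test under coverage via the harness script (a sketch only;
    # test.sh is the authoritative source for its inputs and outputs).
    result = subprocess.run(
        [script_path, "cov-one", project_dir, "--nodeid", nodeid],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,   # Python 3.6-compatible text mode
    )
    # The harness prints status lines (e.g., NODEID=..., RC=...) and the path
    # to coverage.json; the return code reflects the test outcome.
    return result.returncode, result.stdout
```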
You will implement key pieces of an APR pipeline:
From starter_files/ with your venv active:
(venv) $ python llmapr.py \
--project_dir ../buggy-projects/tqdm-bug-1 \
--script_path ../buggy-projects/test.sh
Optional arguments include --selector, --model, --max_iters, --top_k, --max_tokens, and --temperature.
Your repository includes starter_files/config.py which contains an api_key placeholder. Do not submit config.py. The autograder environment will provide its own configuration.
Submit a single tarball of your starter_files/ implementation, excluding config.py, the tests, and other unnecessary files. Do NOT change the name of the starter_files/ folder; the autograder depends on that name. From inside hw5/:
$ tar \
    --exclude='*/env' \
    --exclude='*/venv' \
    --exclude='*/__pycache__' \
    --exclude='*.pyc' \
    --exclude='.DS_Store' \
    --exclude='starter_files/tests' \
    --exclude='starter_files/config.py' \
    -czf hw5-submission.tar.gz starter_files
Upload hw5-submission.tar.gz to the autograder.
Finally, upload a short PDF report to Brightspace that describes how you used generative AI (or whether you used it at all) to develop this assignment. Additionally, your report should describe the prompt(s) you used in prompts.py. Are there any particular prompting strategies or prompting patterns you used?