Guest Speaker: Prof. Vincent Hellendoorn
Time: Thursday, April 6, 2023. 1:15pm - 2:30pm Central.
Location: Zoom link is posted on Piazza.
For CS4278/5278 students (this applies only to students in Dr. Yu Huang's section), you are required to change your Zoom username to "$VUID-$NAME-CS4278" (e.g., huany47-Yu Huang-CS4278).
In the past few years, advances in AI have powered a surge of new tools, such as GitHub's Copilot, that promise to abstract away many of the tedious parts of software development, such as fixing syntax errors, using complex APIs, and writing boilerplate code. Powering these tools are large neural language models trained on vast volumes of code and text, which can translate natural language instructions into software with high fidelity. As these models continue to improve at a remarkable rate, the practice of software engineering is changing as well. Tools powered by these models promise both to allow non-programmers to program using just natural language instructions and to significantly boost the productivity of experienced programmers. However, they also create new risks, ranging from generating code with subtle vulnerabilities to uprooting entire programming ecosystems. In this talk, I first describe the evolution of AI methods that has led to this point, explaining how these models work by discussing how we trained PolyCoder, the first multi-lingual open-source LLM of code. I then review the use cases, strengths, and weaknesses of tools powered by these models by discussing their current capabilities in relation to actual software developers. Along the way, I identify innovations that seem poised to unlock further improvements and new applications in the near future, and relate these to the ways software engineering may change in the coming years.
Prof. Hellendoorn's research spans topics at the intersection of machine intelligence and software engineering. His work leverages AI to power novel tools that support the many facets of the software development process, and identifies and addresses shortcomings that affect model success in practice. His work has been published at major conferences in both the AI and SE fields, including ICSE, FSE, ASE, ICLR, and NeurIPS. He has worked as a visiting researcher at Google Brain and GitHub Next, and as a research consultant for Microsoft Research. He is currently an assistant professor at Carnegie Mellon University.