Evaluation
We evaluate our approach and tool with a simulation experiment and a user study.
Simulation Experiment
In the simulation experiment, we simulate human feedback with a mutation technique. More specifically, we mutate the code exercised by test cases in Apache projects (namely, Math, Lang, and CLI) so that we obtain a correct trace before mutation and a buggy trace after mutation. We can therefore derive the correct feedback for each step on the buggy trace by referencing the correct trace.
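For illustration, the following is a minimal, hypothetical mutation in the style of a relational operator mutation; it is not taken from our actual mutation files. Running the same test case on both versions produces the correct trace and the buggy trace used in the simulation.

```java
// Hypothetical relational-operator mutation (not from our mutation files).

// Original version: running a passing test on this code yields the correct trace.
public static int max(int a, int b) {
    return a >= b ? a : b;
}

// Mutated version: ">=" becomes "<=", so the same test yields a buggy trace.
public static int maxMutated(int a, int b) {
    return a <= b ? a : b;
}
```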
We do this by matching the original trace against the mutated trace. We first transform each trace into a contextual tree, in which a method invocation or loop head step is the parent of the steps in its invocation or loop scope. We then match the steps in a top-down manner: we match the root steps first, then their direct children. For two matched parent steps, their direct children are matched by a dynamic programming algorithm (Longest Common Subsequence). A sketch of this child-matching step is shown below. The trace matching code is available at https://github.com/llmhyy/microbat, and a more detailed description is available in our technical report.
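For concreteness, here is a minimal sketch of the child-matching step, assuming trace steps are compared by their code location. The names `TraceStep` and `locationKey` are hypothetical and do not reflect Microbat's actual implementation; see the repository above for the real code.

```java
import java.util.ArrayList;
import java.util.List;

public class TraceMatcher {

    static class TraceStep {
        String locationKey;          // e.g. "org.apache.commons.lang.Foo:42"
        List<TraceStep> children = new ArrayList<>(); // steps in its invocation/loop scope

        TraceStep(String locationKey) { this.locationKey = locationKey; }
    }

    /** Match the direct children of two matched parent steps via LCS. */
    static List<int[]> matchChildren(List<TraceStep> a, List<TraceStep> b) {
        int m = a.size(), n = b.size();
        int[][] dp = new int[m + 1][n + 1];
        // Standard LCS dynamic program: dp[i][j] is the length of the
        // longest common subsequence of a[0..i) and b[0..j).
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.get(i - 1).locationKey.equals(b.get(j - 1).locationKey)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        // Backtrack to recover the matched index pairs.
        List<int[]> pairs = new ArrayList<>();
        for (int i = m, j = n; i > 0 && j > 0; ) {
            if (a.get(i - 1).locationKey.equals(b.get(j - 1).locationKey)) {
                pairs.add(0, new int[]{i - 1, j - 1});
                i--; j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return pairs;
    }

    /** Recursively match two contextual trees top-down: parents first, then children. */
    static void matchTopDown(TraceStep p, TraceStep q) {
        for (int[] pair : matchChildren(p.children, q.children)) {
            matchTopDown(p.children.get(pair[0]), q.children.get(pair[1]));
        }
    }
}
```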
The results show that Microbat can locate 92.8% of the mutated bugs, and 65% of the bugs can be located within 20 feedbacks. The detailed statistics can be checked here. In addition, all the mutation files can be checked here.
User Study
We recruited 16 participants to conduct a user study on Microbat. We surveyed the programming experience of the participants (see our survey form) and divided them into two groups with equivalent experience. Our results show that the participants using Microbat completed the debugging tasks with 55.8% less time on average.
The detailed statistics can be checked here.