Vector Abstraction and Concretization for Scalable Detection of Refactorings (A Technical Report)
Date
2014-03-28
Abstract
Automated techniques have been proposed to either identify
refactoring opportunities (i.e., code fragments that can be, but
have not yet been, restructured in a program), or reconstruct
historical refactorings (i.e., code restructuring operations that
have happened between different versions of a program).
However, it remains challenging to apply those techniques to
large code bases containing millions of lines of code involving
many versions. In this paper, we propose a new scalable
technique that can be used both for identifying refactoring
opportunities and for reconstructing historical refactorings. The key to our technique
is the design of vector abstraction and concretization
operations that can capture the essential patterns of the code
changes induced by various refactoring operations in the form
of characteristic vectors. Thus, the problem of identifying
refactorings can be reduced to the problem of identifying
matching vectors, which can be solved efficiently. We have
implemented our technique for Java. We have applied the
prototype to 200 bundle projects from the Eclipse ecosystem
containing 4.5 million lines of code; it reports in total more
than 32K instances of 17 types of refactoring opportunities for
all Eclipse projects, taking 25 minutes on average for each
type. We have also applied the prototype to 14 versions
of 3 smaller programs (JMeter, Ant, XML-Security), and
detected (1) more than 2.8K refactoring opportunities within
individual versions with an accuracy of about 87%, and
(2) more than 190 historical refactorings across consecutive
versions of the programs with an accuracy of about 92%.
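
The idea of reducing refactoring detection to vector matching can be illustrated with a minimal sketch. It is not the report's actual abstraction or concretization operations, which the abstract does not specify. It assumes a characteristic vector is simply a count of syntax-tree node types, uses Python's standard `ast` module as a stand-in for a Java parser, and treats two fragments as candidates when their vectors are identical. The names `characteristic_vector`, `group_matching`, and `fragments` are hypothetical.

```python
import ast
from collections import Counter, defaultdict


def characteristic_vector(source: str) -> tuple:
    """Abstract a code fragment into a hashable vector of node-type counts.

    Assumption: the vector is a sorted tuple of (node type, count) pairs;
    the report's real vectors are likely richer than this.
    """
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    return tuple(sorted(counts.items()))


def group_matching(fragments: dict) -> list:
    """Bucket fragments by vector; hashing avoids comparing every pair."""
    buckets = defaultdict(list)
    for name, source in fragments.items():
        buckets[characteristic_vector(source)].append(name)
    # Only buckets with at least two members are candidate matches.
    return [sorted(names) for names in buckets.values() if len(names) > 1]


# Toy fragments: "a" and "b" differ only in identifier names.
fragments = {
    "a": "total = price * qty\nprint(total)",
    "b": "area = width * height\nprint(area)",
    "c": "for x in items:\n    print(x)",
}
matches = group_matching(fragments)
```

Bucketing by vector keeps the work roughly linear in the number of fragments, which is the kind of property that makes matching feasible on large code bases. A matching vector only flags a candidate; a concretization step would still need to map it back to the concrete code to confirm the refactoring.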