Optimizing Duplicate Size Thresholds in IDEs

Konstantin Grotov, Sergey Titov, Alexandr Suhinin, Yaroslav Golubev, and Timofey Bryksin

May, 2023. Accepted to MSR'23 (A).

Abstract. In this paper, we present an approach for transferring an optimal lower size threshold for clone detection from one language to another by analyzing their clone distributions. We showcase this method by transferring the threshold from regular Python scripts to Jupyter notebooks for use in two JetBrains IDEs, Datalore and DataSpell.
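The abstract does not say how the two clone distributions are compared. As an illustration only, the sketch below assumes a simple quantile-matching rule. It first measures what share of source-language clones a known threshold filters out. It then picks the smallest target-language threshold that filters out at least the same share. The functions `share_below` and `transfer_threshold` and the sample clone sizes are hypothetical. The size unit (lines, tokens, etc.) is left abstract.

```python
from bisect import bisect_left
from fractions import Fraction
from typing import Sequence


def share_below(sizes: Sequence[int], threshold: int) -> Fraction:
    """Exact fraction of clones whose size is strictly below `threshold`."""
    ordered = sorted(sizes)
    # In a sorted list, bisect_left returns how many elements are < threshold.
    return Fraction(bisect_left(ordered, threshold), len(ordered))


def transfer_threshold(source_sizes: Sequence[int], source_threshold: int,
                       target_sizes: Sequence[int]) -> int:
    """Hypothetical quantile-matching transfer of a lower size threshold.

    Returns the smallest target threshold that filters out at least the same
    share of target clones as `source_threshold` filters out of source clones.
    This is an assumed rule for illustration, not the paper's stated method.
    """
    if not source_sizes or not target_sizes:
        raise ValueError("both clone-size samples must be non-empty")
    q = share_below(source_sizes, source_threshold)
    ordered = sorted(set(target_sizes))
    # The share below t only changes at observed sizes, so these candidates suffice.
    for t in ordered:
        if share_below(target_sizes, t) >= q:
            return t
    # One past the largest size filters out every clone (share 1 >= q).
    return ordered[-1] + 1


# Made-up clone sizes, for demonstration only.
python_sizes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
notebook_sizes = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(transfer_threshold(python_sizes, 50, notebook_sizes))  # prints 10
```

In this made-up example, a threshold of 50 filters out 40% of the source clones. The smallest notebook threshold that also filters out 40% is 10. Using exact fractions avoids floating-point ties when the two shares are equal.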
