ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

Sergey Titov, Yaroslav Golubev, and Timofey Bryksin

March, 2022. Published in the proceedings of SANER'22 (A).

Abstract. Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained actions. At the same time, research shows that real-world notebooks fail to do so and suffer from the lack of proper structure.

To combat this, we propose ReSplit, an algorithm for an automatic re-splitting of cells in Jupyter notebooks. The algorithm analyzes definition-usage chains in the notebook and consists of two parts - merging and splitting the cells. We ran the algorithm on a large corpus of notebooks to evaluate its performance and its overall effect on notebooks, and evaluated it by human experts: we showed them several notebooks in their original and the re-split form. In 29.5% of cases, the re-split notebook was selected as the preferred way of perceiving the code. We analyze what influenced this decision and describe several individual cases in detail.

Paper Pre-print Data