Observing Fine-Grained Changes in Jupyter Notebooks During Development Time
Sergey Titov, Konstantin Grotov, Cristina Sarasua, Yaroslav Golubev, Dhivyabharathi Ramasamy, Alberto Bacchelli, Abraham Bernstein, and Timofey Bryksin
July, 2025. Published on arXiv.
Abstract. In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for computational notebooks in the context of data science.
To help bridge this research gap, we make three scientific contributions: we (1) introduce a toolset for collecting code changes in Jupyter notebooks during development time; (2) use it to collect more than 100 hours of work related to a data analysis task and a machine learning task (carried out by 20 developers with different levels of expertise), resulting in a dataset containing 2,655 cells and 9,207 cell executions; and (3) use this dataset to investigate the dynamic nature of the notebook development process and the changes that take place in the notebooks.
In our analysis of the collected data, we classified the changes made to the cells between executions and found that a significant number of these changes were relatively small fixes and code iteration modifications. This suggests that notebooks are used not only as a development and exploration tool but also as a debugging tool. We report a number of other insights and propose potential future research directions on the novel data.
Pre-print Data Tool