Observing Fine-Grained Changes in Jupyter Notebooks During Development Time

Sergey Titov, Konstantin Grotov, Cristina Sarasua, Yaroslav Golubev, Dhivyabharathi Ramasamy, Alberto Bacchelli, Abraham Bernstein, and Timofey Bryksin

June 2026. Published in the Journal of Systems and Software.

Abstract. In software engineering research, the analysis of fine-grained logs has led to significant innovations in areas such as refactoring, security, and code completion. However, even though computational notebooks are a staple of data science and an important tool in machine learning, few similar studies have been conducted in this area.

To help bridge this research gap, this paper makes three scientific contributions. (1) We introduce a toolset for collecting code changes in Jupyter notebooks during development time. (2) We use it to collect more than 100 hours of work on a data analysis task and a machine learning task, carried out by 20 developers with different levels of expertise, resulting in a dataset containing 2655 cells and 9207 cell executions. (3) We use this dataset to investigate the dynamic nature of the notebook development process and the changes that take place in the notebooks.

In our analysis of the collected data, we classified the changes made to cells between executions and found that a significant number of them were iterative modifications to the code. We report a number of other insights and propose detailed directions for future research based on this novel data.
