Two MicroBooNE datasets are open to the public. They contain simulated neutrino interactions overlaid on cosmic-ray data. Both simulate neutrinos in the Booster Neutrino Beam (BNB). The first sample includes all types of neutrinos and interactions (taking place in the whole cryostat volume), with relative abundances matching our nominal flux and cross section models. The second sample is restricted to charged-current electron neutrino interactions within the argon active volume of the time projection chamber.
Samples are provided in two formats: HDF5, targeting the broadest audience, and artroot, targeting users who are familiar with the software infrastructure of Fermilab neutrino experiments and, more generally, of HEP experiments. The HDF5 files, and a file listing the xrootd URLs that provide access to the artroot files, are stored on the open data portal Zenodo and can be accessed from the DOI links in the table below. Artroot files contain the full information available to members of the collaboration, while HDF5 files have reduced and simplified content. Each HDF5 sample is provided in two versions: with and without wire information. When present, the wire information dominates the file size, so a second set of datasets was created without it. This allows storage of a significantly larger number of events for applications that do not use the wire information. Here an event is defined as an independent detector readout.
Sample | DOI | HDF5 N events | HDF5 N files | HDF5 size | artroot N events | artroot N files | artroot size |
Inclusive, NoWire | 10.5281/zenodo.8370883 | 753,467 | 18 | 195 GB | 1,046,139 | 24,436 | 6.4 TB |
Inclusive, WithWire | 10.5281/zenodo.7262009 | 24,332 | 18 | 44 GB | 24,332 | 720 | 136 GB |
Electron neutrino, NoWire | 10.5281/zenodo.7261921 | 89,339 | 20 | 31 GB | 89,339 | 2,151 | 761 GB |
Electron neutrino, WithWire | 10.5281/zenodo.7262140 | 19,940 | 20 | 39 GB | 19,940 | 540 | 170 GB |
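To make the effect of the wire information on file size concrete, the short Python sketch below computes the average HDF5 storage per event from the event counts and sizes in the table. The variable and function names are illustrative, and the conversion assumes 1 GB = 1000 MB, which may differ from how the sizes were originally measured.

```python
# Event counts and sizes (GB) copied from the HDF5 columns of the table above.
hdf5_samples = {
    "Inclusive, NoWire": (753_467, 195),
    "Inclusive, WithWire": (24_332, 44),
    "Electron neutrino, NoWire": (89_339, 31),
    "Electron neutrino, WithWire": (19_940, 39),
}


def mb_per_event(n_events: int, size_gb: float) -> float:
    """Average storage per event in MB (assumes 1 GB = 1000 MB)."""
    return size_gb * 1000 / n_events


# Average size per event for each sample.
per_event = {name: mb_per_event(*vals) for name, vals in hdf5_samples.items()}

for name, mb in per_event.items():
    print(f"{name:30s} {mb:6.2f} MB/event")
```

With these numbers, the NoWire samples come out at roughly 0.3 MB per event and the WithWire samples at nearly 2 MB per event, a factor of about five to seven.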
Detailed documentation for accessing the datasets is provided at https://github.com/uboone/OpenSamples.
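As a starting point before consulting that documentation, the sketch below turns the DOIs from the table into resolver links and filters a list of xrootd URLs. The layout of the URL list file is an assumption (one URL per line, each starting with `root://`), and the file name in the usage comment is a placeholder; the OpenSamples repository remains the authoritative reference for the actual file layout and access procedure.

```python
from typing import Iterable, List

# DOIs copied from the table above.
SAMPLE_DOIS = {
    "Inclusive, NoWire": "10.5281/zenodo.8370883",
    "Inclusive, WithWire": "10.5281/zenodo.7262009",
    "Electron neutrino, NoWire": "10.5281/zenodo.7261921",
    "Electron neutrino, WithWire": "10.5281/zenodo.7262140",
}


def doi_url(doi: str) -> str:
    """Return the doi.org resolver link for a DOI."""
    return f"https://doi.org/{doi}"


def xrootd_urls(lines: Iterable[str]) -> List[str]:
    """Keep the lines that look like xrootd URLs.

    Assumption: the list file holds one URL per line. Blank lines and
    anything not starting with 'root://' are skipped.
    """
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s.startswith("root://")]


for name, doi in SAMPLE_DOIS.items():
    print(f"{name}: {doi_url(doi)}")

# Hypothetical usage once a URL list has been downloaded from Zenodo
# ("filelist.txt" is a placeholder name, not the real file name):
# with open("filelist.txt") as f:
#     urls = xrootd_urls(f)
```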
Samples are released under a CC BY license, which allows users to freely reuse the data provided they give appropriate credit to the collaboration for providing the datasets.
The suggested acknowledgment text is:
We acknowledge the MicroBooNE Collaboration for making publicly available the data sets [data set DOIs] employed in this work. These data sets consist of simulated neutrino interactions from the Booster Neutrino Beamline overlaid on top of cosmic data collected with the MicroBooNE detector [2017 JINST 12 P02017].
In addition, although not enforced by the license, we request that software products resulting from the use of the datasets also be made publicly available.
Publications featuring the MicroBooNE Public Datasets
Peer-reviewed Publications
- A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows
Lee CS (Northwestern U.), Hewes V (Cincinnati U.), Cerati G (Fermilab), J. Kowalkowski (Fermilab), Aurisano A (Cincinnati U.), Agrawal A (Northwestern U.), Choudhary A (Northwestern U.) and Liao W-K (Northwestern U.)
DOI: 10.1109/CCGrid57682.2023.00017
Published in: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Bangalore, India, 2023, pp. 71-81
- MicroBooNE Public Data Sets: A Collaborative Tool for LArTPC Software Development
MicroBooNE Collaboration • Giuseppe Cerati (Fermilab) for the collaboration.
e-Print: 2309.15362 [hep-ex]
DOI: 10.1051/epjconf/202429508012
Published in: EPJ Web Conf. 295 (2024), 08012
- Graph neural network for neutrino physics event reconstruction
A. Aurisano (Cincinnati U.), V. Hewes (Cincinnati U.), G. Cerati (Fermilab), J. Kowalkowski (Fermilab), C.S. Lee (Northwestern U.), W. Liao (Northwestern U.), D. Grzenda (Chicago U.), K. Gumpula (Chicago U.), X. Zhang (UCLA and Chicago U.)
e-Print: 2403.11872 [physics.data-an]
DOI: 10.1103/PhysRevD.110.032008 (publication)
Published in: Phys.Rev.D 110 (2024) 3, 032008
- Addressing GPU memory limitations for Graph Neural Networks in High-Energy Physics applications
Lee CS (Northwestern U.), Hewes V (Cincinnati U.), Cerati G (Fermilab), Wang K (Northwestern U.), Aurisano A (Cincinnati U.), Agrawal A (Northwestern U.), Choudhary A (Northwestern U.) and Liao W-K (Northwestern U.)
DOI: 10.3389/fhpcp.2024.1458674 (publication)
Published in: Front. High Perform. Comput. 2:1458674 (2024)
- LArTPC hit-based topology classification with quantum machine learning and symmetry
Callum Duffy (University Coll. London), Marcin Jastrzebski (University Coll. London), Stefano Vergani (University Coll. London), Leigh H. Whitehead (Cambridge U.), Ryan Cross (Warwick U.), Andrew Blake (Lancaster U.), Sarah Malik (University Coll. London), John Marshall (Warwick U.)
e-Print: 2503.12655 [physics.ins-det]
- Optimal Transport for $e/π^0$ Particle Classification in LArTPC Neutrino Experiments
David Caratelli (UCSB), Nathaniel Craig (UCSB, KITP), Chuyue Fang (UCSB), Jessica N. Howard (UCSB, KITP)
e-Print: 2506.09238 [hep-ex]
- NuGraph2 with Context-Aware Inputs: Physics-Inspired Improvements in Semantic Segmentation
Vitor F. Grizzi (Illinois U., Urbana and Argonne), Margaret Voetberg (Fermilab), Giuseppe Cerati (Fermilab), Hadi Meidani (Illinois U., Urbana), V. Hewes (Cincinnati U.)
e-Print: 2509.10684 [hep-ex]
- NuGraph2 with Explainability: Post-hoc Explanations for Geometric Neural Network Predictions
Margaret Voetberg (Fermilab), Vitor F. Grizzi (Illinois U., Urbana and Argonne), Giuseppe Cerati (Fermilab), Hadi Meidani (Illinois U., Urbana), V. Hewes (Cincinnati U.)
e-Print: 2509.10676 [hep-ex]
- Real-time Anomaly Detection for Liquid Argon Time Projection Chambers
Seokju Chung (Columbia U.), Jack Cleeve (Columbia U.), Akshay Malige (Columbia U.), Georgia Karagiorgi (Columbia U.), Lino Gerlach (Princeton U.), Adrian A. Pol (Princeton U.), Isobel Ojalvo (Princeton U.)
e-Print: 2509.21817 [hep-ex]
Other Conference Presentations
- Online tagging and triggering with deep learning AI for next generation particle imaging detector, M. Bhattacharya (Fermilab), CHEP 2023.
- SparsePixels: Efficient Convolution for Sparse Data on FPGAs, H. Tsoi (University of Pennsylvania), Fast Machine Learning for Science Conference 2025.