Public Datasets

Two MicroBooNE datasets are open to the public. Both contain simulated neutrino interactions from the Booster Neutrino Beam (BNB), overlaid on cosmic ray data. The first sample includes all types of neutrinos and interactions, taking place anywhere in the cryostat volume, with relative abundances matching our nominal flux and cross section models. The second sample is restricted to charged-current electron neutrino interactions within the argon active volume of the time projection chamber.

Samples are provided in two formats: HDF5, targeting the broadest audience, and artroot, targeting users who are familiar with the software infrastructure of Fermilab neutrino experiments and, more generally, of HEP experiments. The HDF5 files, together with a file listing the XRootD URLs that provide access to the artroot files, are stored on the open data portal Zenodo and can be accessed from the DOI links in the table below. Artroot files contain the full information available to members of the collaboration, while HDF5 files have reduced and simplified content. Each HDF5 sample is provided in two versions: with and without wire information. The reason is that the wire information, when present, dominates the file size. A second set of datasets is therefore created without the wire information, allowing storage of a significantly larger number of events for applications that do not use it. Here an event is defined as an independent detector readout.

Sample | DOI | HDF5 N events | HDF5 N files | HDF5 size | artroot N events | artroot N files | artroot size
--- | --- | --- | --- | --- | --- | --- | ---
Inclusive, NoWire | 10.5281/zenodo.8370883 | 753,467 | 18 | 195 GB | 1,046,139 | 24,436 | 6.4 TB
Inclusive, WithWire | 10.5281/zenodo.7262009 | 24,332 | 18 | 44 GB | 24,332 | 720 | 136 GB
Electron neutrino, NoWire | 10.5281/zenodo.7261921 | 89,339 | 20 | 31 GB | 89,339 | 2,151 | 761 GB
Electron neutrino, WithWire | 10.5281/zenodo.7262140 | 19,940 | 20 | 39 GB | 19,940 | 540 | 170 GB
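
To illustrate how much the wire information weighs on file size, the short Python sketch below divides each HDF5 sample size by its number of events, using only the values listed in the table. It assumes 1 GB = 1000 MB and treats the rounded sizes in the table as exact, so the results are indicative only. The names HDF5_SAMPLES, mb_per_event, and size_summary are introduced here for illustration and are not part of any MicroBooNE software.

```python
# Event counts and sizes (in GB) copied from the HDF5 columns of the table.
HDF5_SAMPLES = {
    "Inclusive, NoWire": (753_467, 195.0),
    "Inclusive, WithWire": (24_332, 44.0),
    "Electron neutrino, NoWire": (89_339, 31.0),
    "Electron neutrino, WithWire": (19_940, 39.0),
}


def mb_per_event(n_events, size_gb):
    """Average size per event in MB (assumption: 1 GB = 1000 MB)."""
    return size_gb * 1000.0 / n_events


def size_summary(samples=HDF5_SAMPLES):
    """Map each sample name to its average MB per event, rounded to 2 digits."""
    return {
        name: round(mb_per_event(n_events, size_gb), 2)
        for name, (n_events, size_gb) in samples.items()
    }


if __name__ == "__main__":
    for name, mb in size_summary().items():
        print(f"{name:30s} {mb:6.2f} MB/event")
```

With these numbers, the NoWire samples come out at roughly 0.3 MB per event and the WithWire samples at roughly 2 MB per event, consistent with the statement that wire information dominates the file size.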

Detailed documentation for accessing the datasets is provided at https://github.com/uboone/OpenSamples.
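
As a small complement to that documentation, the Python sketch below turns one of the dataset DOIs from the table into web addresses without making any network request. The doi.org resolver address is the standard way to follow a DOI. The zenodo.org record and API addresses are an assumption about Zenodo's current URL layout, in which the number after "zenodo." is the record identifier, and should be checked against the page the DOI actually resolves to. The function name zenodo_links is hypothetical.

```python
import re

# Zenodo DOIs have the form 10.5281/zenodo.<record id>.
DOI_PATTERN = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def zenodo_links(doi):
    """Build candidate URLs for a Zenodo DOI (no network access is made)."""
    doi = doi.strip()
    match = DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    record_id = match.group(1)
    return {
        # Standard DOI resolver.
        "doi_url": f"https://doi.org/{doi}",
        # Assumed Zenodo URL layout; verify before relying on it.
        "record_url": f"https://zenodo.org/records/{record_id}",
        "api_url": f"https://zenodo.org/api/records/{record_id}",
    }


if __name__ == "__main__":
    for label, url in zenodo_links("10.5281/zenodo.8370883").items():
        print(f"{label}: {url}")
```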

Samples are released under a CC BY license, which allows users to freely reuse the data provided they give appropriate credit to the collaboration for providing the datasets.

Suggested text for acknowledgment is the following:
We acknowledge the MicroBooNE Collaboration for making publicly available the data sets [data set DOIs] employed in this work. These data sets consist of simulated neutrino interactions from the Booster Neutrino Beamline overlaid on top of cosmic data collected with the MicroBooNE detector [2017 JINST 12 P02017].

In addition, although not enforced by the license, we request that software products resulting from the use of the datasets also be made publicly available.

Publications featuring the MicroBooNE Public Datasets

Peer-reviewed Publications

  1. A Case Study of Data Management Challenges Presented in Large-Scale Machine Learning Workflows
    Lee CS (Northwestern U.), Hewes V (Cincinnati U.), Cerati G (Fermilab), Kowalkowski J (Fermilab), Aurisano A (Cincinnati U.), Agrawal A (Northwestern U.), Choudhary A (Northwestern U.) and Liao W-K (Northwestern U.)
    DOI: 10.1109/CCGrid57682.2023.00017
    Published in: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Bangalore, India, 2023, pp. 71-81
  2. MicroBooNE Public Data Sets: A Collaborative Tool for LArTPC Software Development
    MicroBooNE Collaboration • Giuseppe Cerati (Fermilab) for the collaboration.
    e-Print: 2309.15362 [hep-ex]
    DOI: 10.1051/epjconf/202429508012
    Published in: EPJ Web Conf. 295 (2024), 08012
  3. Graph neural network for neutrino physics event reconstruction
    A. Aurisano (Cincinnati U.), V. Hewes (Cincinnati U.), G. Cerati (Fermilab), J. Kowalkowski (Fermilab), C.S. Lee (Northwestern U.), W. Liao (Northwestern U.), D. Grzenda (Chicago U.), K. Gumpula (Chicago U.), X. Zhang (UCLA and Chicago U.)
    e-Print: 2403.11872 [physics.data-an]
    DOI: 10.1103/PhysRevD.110.032008 (publication)
    Published in: Phys.Rev.D 110 (2024) 3, 032008
  4. Addressing GPU memory limitations for Graph Neural Networks in High-Energy Physics applications
    Lee CS (Northwestern U.), Hewes V (Cincinnati U.), Cerati G (Fermilab), Wang K (Northwestern U.), Aurisano A (Cincinnati U.), Agrawal A (Northwestern U.), Choudhary A (Northwestern U.) and Liao W-K (Northwestern U.)
    DOI: 10.3389/fhpcp.2024.1458674 (publication)
    Published in: Front. High Perform. Comput. 2:1458674 (2024)
  5. LArTPC hit-based topology classification with quantum machine learning and symmetry
    Callum Duffy (University Coll. London), Marcin Jastrzebski (University Coll. London), Stefano Vergani (University Coll. London), Leigh H. Whitehead (Cambridge U.), Ryan Cross (Warwick U.), Andrew Blake (Lancaster U.), Sarah Malik (University Coll. London), John Marshall (Warwick U.)
    e-Print: 2503.12655 [physics.ins-det]
  6. Optimal Transport for $e/\pi^0$ Particle Classification in LArTPC Neutrino Experiments
    David Caratelli (UCSB), Nathaniel Craig (UCSB, KITP), Chuyue Fang (UCSB), Jessica N. Howard (UCSB, KITP)
    e-Print: 2506.09238 [hep-ex]
  7. NuGraph2 with Context-Aware Inputs: Physics-Inspired Improvements in Semantic Segmentation
    Vitor F. Grizzi (Illinois U., Urbana and Argonne), Margaret Voetberg (Fermilab), Giuseppe Cerati (Fermilab), Hadi Meidani (Illinois U., Urbana), V. Hewes (Cincinnati U.)
    e-Print: 2509.10684 [hep-ex]
  8. NuGraph2 with Explainability: Post-hoc Explanations for Geometric Neural Network Predictions
    Margaret Voetberg (Fermilab), Vitor F. Grizzi (Illinois U., Urbana and Argonne), Giuseppe Cerati (Fermilab), Hadi Meidani (Illinois U., Urbana), V. Hewes (Cincinnati U.)
    e-Print: 2509.10676 [hep-ex]
  9. Real-time Anomaly Detection for Liquid Argon Time Projection Chambers
    Seokju Chung (Columbia U.), Jack Cleeve (Columbia U.), Akshay Malige (Columbia U.), Georgia Karagiorgi (Columbia U.), Lino Gerlach (Princeton U.), Adrian A. Pol (Princeton U.), Isobel Ojalvo (Princeton U.)
    e-Print: 2509.21817 [hep-ex]

Other Conference Presentations

  1. Online tagging and triggering with deep learning AI for next generation particle imaging detector, M. Bhattacharya (Fermilab), CHEP 2023.
  2. SparsePixels: Efficient Convolution for Sparse Data on FPGAs, H. Tsoi (University of Pennsylvania), Fast Machine Learning for Science Conference 2025.