BARC/PUB/2020/0071

 
 

Comparison of data storage and analysis throughput in the light of high energy physics experiment MACE

 
     
 
Author(s)

Sarkar, D.; Mahesh, P.; Padmini, S.; Chouhan, N.; Borwankar, C.; Bhattacharya, A. K.; Tickoo, A. K.; Rannot, R. C.
(ApSD;ED;RCnD)

Source

Astronomy and Computing, 2020. Vol. 33: Article no. 100409

ABSTRACT

High Energy Physics (HEP) experiments produce large amounts of data, typically in the range of terabytes to petabytes. This explosion of data poses challenges in data capture, storage, integrity, searching, querying, visualization and analysis. It has led to the development of domain-specific file formats such as FITS and HDF5, analysis frameworks such as ROOT, storage architectures such as relational and NoSQL databases, and parallel and distributed data handling methodologies. In this paper, we investigate read-write performance by comparing the HEP domain-specific framework ROOT with the NoSQL database Berkeley DB in the context of a gamma-ray Cerenkov experiment, with the aim of meeting the requirements of real-time data analysis. The Major Atmospheric Cerenkov Experiment (MACE) is a 21 m gamma-ray telescope set up by BARC at Hanle, India. It will generate a few hundred gigabytes of data per observational night. Aiming at real-time analysis of the data, we have developed a dynamic reading mechanism by implementing a binary type provider for data retrieval from the Berkeley DB database. Data analysis queries were performed and compared on ROOT files using ROOT query methods and on Berkeley DB using Language Integrated Query (LINQ). Finally, a generic framework facilitating online analysis of the data is proposed.

 
 