Anomaly Detection with Machine Learning in Time Series Data from the Fermi Anti-Coincidence Detector (Berlin

Abstract

Multimessenger astrophysics relies on multiple observational data channels, which calls for efficient methods to analyze events of astrophysical origin. As the volume and complexity of data from modern observatories keep growing, Machine Learning techniques have become valuable tools for identifying and classifying signals efficiently.

The aim of my project is to develop a framework for analyzing time series data with Machine Learning techniques. The scientific use case presented here is the analysis of the time series recorded by the Anti-Coincidence Detector (ACD) on board the Fermi Gamma-ray Space Telescope. The primary objective is to improve the detection of high-energy transient events, such as Gamma-Ray Bursts (GRBs), and of other astrophysical signals. An ensemble of Neural Network models may be employed to model and predict the temporal structure of the ACD background. The networks' predictions serve as the baseline for a triggering algorithm designed for anomaly detection: by identifying significant deviations from the predicted background, the system flags potential astrophysical transients in the ACD time series.

To address challenges such as the variability of the noise in the data, this work explores approaches that refine anomaly detection thresholds by characterizing the noise amplitude. Bayesian Neural Networks (BNNs) can output uncertainties on the background prediction, which are then used to adapt the thresholds dynamically, offering a robust alternative to traditional fixed-threshold methods.

The application of this framework to transient detection in the ACD illustrates its potential applicability to other datasets and observatories in multimessenger astrophysics.

1. Motivation: What Is the Anti-Coincidence Detector?

This work presents versatile Machine Learning-based software designed to address two challenges in the analysis of time series data:

modeling the temporal behaviour of time series in order to predict their evolution;
identifying anomalies within those time series.

The Anti-Coincidence Detector (ACD) is the subsystem of the Large Area Telescope (LAT; Atwood et al., 2009) on board Fermi that rejects the charged-particle background from the LAT signal.

The ACD is composed of 89 plastic scintillator tiles distributed over its five faces (the top face and the four side faces). The tiles detect particles and photons; the signal recorded by each face is shown in the figure.

2. Dataset

The dataset used to train and test the software combines the following sources:

the spacecraft parameters (e.g. orbital position and attitude) contained in the weekly Spacecraft files (FT2) from the Fermi Collaboration;

the solar activity, measured by the X-Ray Sensor (XRS) on board the Geostationary Operational Environmental Satellite (GOES);

the signals of the five faces of the ACD.
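The FT2 parameters, the GOES/XRS flux and the ACD signals are not necessarily sampled on the same time grid, so some alignment step is presumably needed before training. The text does not describe it; the sketch below assumes a simple "as-of" alignment in which each ACD time bin takes the most recent value of a slower input. The function name and the timestamps in the example are hypothetical.

```python
from bisect import bisect_right


def asof_join(target_times, source_times, source_values):
    """For each target time, return the latest source value recorded at or before it.

    Times must share one unit (e.g. seconds) and source_times must be sorted.
    Returns None where no earlier source sample exists.
    """
    aligned = []
    for t in target_times:
        idx = bisect_right(source_times, t) - 1  # last source sample with time <= t
        aligned.append(source_values[idx] if idx >= 0 else None)
    return aligned


# Hypothetical example: ACD bin times and sparser GOES/XRS flux samples
acd_times = [0, 10, 20, 30]
goes_times = [5, 20]
goes_flux = [1e-7, 3e-6]
print(asof_join(acd_times, goes_times, goes_flux))  # → [None, 1e-07, 3e-06, 3e-06]
```

With pandas, `pandas.merge_asof` (default `direction="backward"`) performs the same kind of join on DataFrames sorted by time.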

3. Background Prediction with Bayesian Neural Networks

The software was developed using the ACD data as a test case.

The software implements a multioutput Bayesian Neural Network (BNN) to model the temporal behaviour of the time series. Unlike traditional feed-forward neural networks, which provide only point estimates, the BNN also quantifies the uncertainty associated with each prediction. In the ACD use case, the BNN is trained on a dataset in which the ACD signals serve as the labeled outputs and each sample is characterized by a set of input parameters.
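The text does not specify how the BNN produces its uncertainty. One common recipe, assumed here purely for illustration, draws K stochastic predictions for the same input (for instance from the members of an ensemble, as mentioned in the abstract, or from repeated samples of the network weights), each giving a mean and a variance, and merges them with the law of total variance. The function name and the numbers in the example are hypothetical.

```python
def combine_predictions(means, variances):
    """Combine K stochastic predictions (mean, variance) for one time bin.

    Returns the mixture mean and the total variance (law of total variance):
    the average predicted variance (aleatoric part) plus the spread of the
    predicted means around their average (epistemic part).
    """
    k = len(means)
    mu = sum(means) / k
    aleatoric = sum(variances) / k
    epistemic = sum((m - mu) ** 2 for m in means) / k
    return mu, aleatoric + epistemic


# Hypothetical example: three ensemble members predicting one ACD bin
mu, var = combine_predictions([100.0, 104.0, 102.0], [9.0, 16.0, 11.0])
```

The epistemic term grows when the members disagree, so the combined uncertainty widens in regions of input space poorly covered by the training data.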

The BNN is trained with a likelihood-based loss function, the Gaussian Negative Log-Likelihood (NLL). This loss is appropriate when the residuals are assumed to be Gaussian: the squared prediction error is weighted by the inverse of the predicted variance, so errors are penalized less where the network predicts a large uncertainty, while the \(\log\sigma^2\) term prevents the network from inflating its uncertainty without limit.

The Gaussian NLL is defined as:

\[ \mathcal{L}_{\textit{NLL}} = \frac{1}{N}\sum_{i=1}^N \left( \frac{1}{2}\log(\sigma_i^2) + \frac{(y_i - \mu_i)^2}{2\sigma_i^2} \right), \]

where \(N\) is the number of samples, \(y_i\) is the true value (ground truth) of the signal for the \(i\)-th sample, \(\mu_i\) is the predicted mean, and \(\sigma_i^2\) is the predicted variance, representing the uncertainty in the prediction. For a multioutput BNN with an estimator function \(f_j\) for the \(j\)-th output, the loss can be written as:

\[ \mathcal{L}_{\textit{NLL},j} = \frac{1}{N}\sum_{i=1}^N \left( \frac{1}{2}\log(\sigma_{j,i}^2) + \frac{(y_{j,i} - f_j(\mathbf{x}_i))^2}{2\sigma_{j,i}^2} \right), \]

where \(\mathbf{x}_i\) is the input feature vector for sample \(i\), \(j\) indexes the \(M\) outputs (the 5 signals in the ACD), and \(f_j(\mathbf{x}_i)\) denotes the estimated mean for the \(j\)-th output, with an associated uncertainty \(\sigma_{j,i}^2\). The overall loss is then computed by averaging over all outputs and samples:

\[ \begin{aligned} \mathcal{L}_{\textit{NLL}} &= \frac{1}{M}\sum_{j=1}^M \mathcal{L}_{\textit{NLL},j} \\ &= \frac{1}{M}\sum_{j=1}^M \frac{1}{N}\sum_{i=1}^N \left( \frac{1}{2}\log(\sigma_{j,i}^2) + \frac{(y_{j,i} - f_j(\mathbf{x}_i))^2}{2\sigma_{j,i}^2} \right). \end{aligned} \]
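The loss above translates directly into code. The sketch below implements \(\mathcal{L}_{\textit{NLL},j}\) and its average over the \(M\) outputs in plain Python, with the variances passed explicitly; the function names and toy numbers are illustrative only.

```python
import math


def gaussian_nll(y, mu, var):
    """Mean Gaussian NLL of one output over N samples (constant 0.5*log(2*pi) omitted)."""
    return sum(0.5 * math.log(v) + (yi - m) ** 2 / (2.0 * v)
               for yi, m, v in zip(y, mu, var)) / len(y)


def multioutput_nll(Y, MU, VAR):
    """Average of the per-output losses L_NLL,j over the M outputs.

    Y, MU, VAR: lists of M sequences (e.g. one per ACD face), each of length N.
    Returns the overall loss and the list of per-output losses.
    """
    per_output = [gaussian_nll(y, mu, var) for y, mu, var in zip(Y, MU, VAR)]
    return sum(per_output) / len(per_output), per_output


# Toy example with M = 2 outputs and N = 2 samples
loss, per_face = multioutput_nll(Y=[[1.0, 3.0], [2.0, 2.0]],
                                 MU=[[1.0, 1.0], [2.0, 2.0]],
                                 VAR=[[1.0, 4.0], [1.0, 1.0]])
```

Because every output has the same number of samples \(N\), the average of the per-output means equals the mean over all \(N \times M\) entries. This matches what `torch.nn.GaussianNLLLoss` computes with its defaults (`full=False`, `reduction="mean"`), apart from clamping the variance at `eps`. Networks commonly predict \(\log\sigma^2\) rather than \(\sigma^2\) so that the variance stays positive.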

The Bayesian Neural Network is trained with the Adam optimizer, a stochastic gradient-based method that adapts the step size of each parameter using running estimates of the first and second moments of the gradients. In addition, a learning rate scheduler reduces the learning rate when the loss stops improving, which helps the training converge stably.
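To make the optimization step concrete, the sketch below fits a constant mean \(\mu\) and log-variance \(s = \log\sigma^2\) to toy data by minimizing the Gaussian NLL with a hand-written Adam update, combined with a simplified reduce-on-plateau rule. Parameterizing the variance through its logarithm keeps \(\sigma^2\) positive; this choice, the hyperparameter values and the helper names are assumptions of the example, not the settings used for the ACD network.

```python
import math


class PlateauScheduler:
    """Simplified reduce-on-plateau rule: multiply lr by `factor` after more than
    `patience` consecutive steps without a strict improvement of the loss."""

    def __init__(self, lr, factor=0.5, patience=20, min_lr=1e-5):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best, self.bad_steps = math.inf, 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.bad_steps = loss, 0
        else:
            self.bad_steps += 1
            if self.bad_steps > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_steps = 0
        return self.lr


def adam_fit_gaussian(y, steps=10000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit a constant mean mu and log-variance s to samples y by minimizing the
    mean Gaussian NLL with Adam; returns (mu, variance, last_loss)."""
    params = [0.0, 0.0]            # [mu, s], with variance = exp(s)
    m, v = [0.0, 0.0], [0.0, 0.0]  # Adam first/second moment estimates
    sched = PlateauScheduler(lr)
    n, y_mean = len(y), sum(y) / len(y)
    loss = math.inf
    for t in range(1, steps + 1):
        mu, s = params
        var = math.exp(s)
        sq = sum((yi - mu) ** 2 for yi in y) / n
        loss = 0.5 * s + sq / (2.0 * var)
        # Analytic gradients of the mean NLL with respect to mu and s
        grads = [-(y_mean - mu) / var, 0.5 - sq / (2.0 * var)]
        step_lr = sched.step(loss)
        for k in range(2):
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias-corrected moments
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= step_lr * m_hat / (math.sqrt(v_hat) + eps)
    mu, s = params
    return mu, math.exp(s), loss


mu_fit, var_fit, final_loss = adam_fit_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
```

The NLL minimum is at the sample mean (3) and the population variance (2) of the toy data. In a PyTorch training loop the same roles are played by `torch.optim.Adam` and `torch.optim.lr_scheduler.ReduceLROnPlateau`, which additionally supports a relative improvement threshold and a cooldown period.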

4. Results of the BNN

The BNN was trained on three two-month periods between January 2024 and December 2024 and then tested on different periods containing known solar flare events.

The figure shows the BNN prediction for the signal of the Xpos face of the ACD during a period of particularly high solar activity. The prediction comes with an associated uncertainty, which is used to set the threshold of the triggering algorithm.

5. Identification of anomalies: Gaussian FOCuS and Z-Score

The software implements a triggering algorithm based on the Gaussian FOCuS method to identify anomalies in time series whose fluctuations around the background follow a Gaussian distribution.

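The FOCuS (Functional Online CUSUM) algorithm (Romano et al., 2023) is equivalent to running moving-window tests for all possible window sizes at once, so the duration of an anomaly is not a parameter of the algorithm but is inferred from the data. Like the standard CUSUM statistic, it is built from the partial sums of score statistics up to time n; functional pruning discards candidate start times that can never become optimal, which keeps the cost of each update low.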
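The idea of testing every window length at once can be sketched as follows. The code assumes that its input is the standardized residual \(z_t = (y_t - \mu_t)/\sigma_t\) of one face, distributed as \(N(0,1)\) under the background-only hypothesis, and it looks only for an increase in mean. At each time \(n\) it returns \(\max_{\tau<n} \max(0, S_n - S_\tau)^2 / (2(n-\tau))\), where \(S\) is the cumulative sum of the residuals: the log-likelihood ratio for an unknown positive shift, which is the quantity Gaussian FOCuS tracks. Instead of the quadratic-pruning bookkeeping of Romano et al. (2023), this illustrative reimplementation keeps only the candidate start times lying on the lower convex hull of the points \((\tau, S_\tau)\), which are the only ones that can maximize the statistic. It is not the ACDAnomalies code, and the function name is hypothetical.

```python
from collections import deque


def focus_gaussian(residuals):
    """Online changepoint statistic for an increase in mean of N(0, 1) residuals.

    Returns, for each time n, max over tau < n of
    max(0, S_n - S_tau)**2 / (2 * (n - tau)), with S the cumulative sum.
    """
    hull = deque([(0, 0.0)])  # candidate start points (tau, S_tau), S_0 = 0
    s_n = 0.0
    stats = []
    for n, x in enumerate(residuals, start=1):
        s_n += x
        # Keep a lower convex hull: drop the last candidate while it lies on or
        # above the segment joining its predecessor to the new point (n, S_n).
        while len(hull) >= 2:
            (t1, s1), (t2, s2) = hull[-2], hull[-1]
            if (s2 - s1) * (n - t1) >= (s_n - s1) * (t2 - t1):
                hull.pop()
            else:
                break
        hull.append((n, s_n))
        # A candidate followed by a lower-or-equal cumulative sum can never
        # give the best window for a positive shift, now or in the future.
        while len(hull) >= 2 and hull[1][1] <= hull[0][1]:
            hull.popleft()
        best = 0.0
        for tau, s_tau in hull:
            if tau < n and s_n > s_tau:
                best = max(best, (s_n - s_tau) ** 2 / (2 * (n - tau)))
        stats.append(best)
    return stats


# Hypothetical residuals: background, then a 3-sigma excess lasting 5 bins
stats = focus_gaussian([0.0] * 10 + [3.0] * 5)
print(stats[-1])  # 22.5: the 5-bin window has z = 15 / sqrt(5), and z**2 / 2 = 22.5
```

Since the statistic of the best window equals \(z_w^2/2\), with \(z_w = (S_n - S_\tau)/\sqrt{n-\tau}\) the significance of that window, a threshold \(h\) on the statistic corresponds to a significance of \(\sqrt{2h}\) standard deviations, whatever the window length.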

For each ACD face, the algorithm compares the observed signal with its predicted background and computes the FOCuS statistic. When the statistic exceeds a given threshold, the algorithm triggers on that face. Successive triggers closer than 180 seconds are merged into a single event, and when more than one face triggers, the per-face triggers are combined into a single anomalous event.
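The per-face thresholding and the merging rule described above can be sketched as follows. Only the 180 s value comes from the text; representing triggers as lists of times in seconds, applying the same 180 s tolerance when combining different faces, and all function names are assumptions of this sketch.

```python
MERGE_GAP_S = 180.0  # merging tolerance quoted in the text


def trigger_times(times, stats, threshold):
    """Times (s) at which a face's statistic exceeds the threshold."""
    return [t for t, s in zip(times, stats) if s > threshold]


def merge_times(times, gap=MERGE_GAP_S):
    """Group trigger times into (start, end) intervals; gaps <= `gap` are merged."""
    intervals = []
    for t in sorted(times):
        if intervals and t - intervals[-1][1] <= gap:
            intervals[-1][1] = t
        else:
            intervals.append([t, t])
    return [tuple(iv) for iv in intervals]


def merge_faces(face_triggers, gap=MERGE_GAP_S):
    """Combine per-face intervals that overlap or lie within `gap` seconds."""
    spans = sorted((start, end, face)
                   for face, times in face_triggers.items()
                   for start, end in merge_times(times, gap))
    events = []
    for start, end, face in spans:
        if events and start - events[-1]["end"] <= gap:
            events[-1]["end"] = max(events[-1]["end"], end)
            events[-1]["faces"].add(face)
        else:
            events.append({"start": start, "end": end, "faces": {face}})
    return events


# Hypothetical trigger times (s) for three faces
events = merge_faces({"top": [0, 60, 500], "Xpos": [100, 200], "Yneg": [1000]})
```

In this example the top and Xpos triggers between 0 and 200 s form one multi-face event, while the isolated triggers at 500 s (top) and 1000 s (Yneg) remain separate single-face events.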

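Another method used to identify anomalies is the Z-Score. For each time bin, the Z-Score is the difference between the observed signal \(y\) and the predicted background \(\mu\), divided by the predicted uncertainty \(\sigma\): \( z = (y - \mu)/\sigma \). Bins whose Z-Score exceeds a threshold are flagged as anomalous; since \(\sigma\) is predicted bin by bin, a fixed threshold on \(z\) adapts to the local noise level of the signal.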
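The Z-Score test reduces to one division per bin. The sketch below flags only positive excesses, on the assumption that transients add counts, and uses a hypothetical default threshold of 5; both choices and the function names are assumptions rather than the settings of ACDAnomalies.

```python
def z_scores(observed, predicted_mean, predicted_sigma):
    """Per-bin Z-score: (observed - predicted mean) / predicted standard deviation."""
    return [(y - mu) / sigma
            for y, mu, sigma in zip(observed, predicted_mean, predicted_sigma)]


def flag_excesses(observed, predicted_mean, predicted_sigma, threshold=5.0):
    """Indices of bins whose Z-score exceeds the threshold (excesses only)."""
    z = z_scores(observed, predicted_mean, predicted_sigma)
    return [i for i, zi in enumerate(z) if zi > threshold]


# Hypothetical bins: observed counts, predicted background and its uncertainty
observed = [100.0, 130.0, 102.0]
mean = [100.0, 100.0, 100.0]
sigma = [5.0, 5.0, 1.0]
z = z_scores(observed, mean, sigma)
flagged = flag_excesses(observed, mean, sigma)
```

With \(\sigma = 1\) the 2-count excess in the last bin yields \(z = 2\), whereas the same excess with \(\sigma = 5\) would give only \(z = 0.4\): the predicted uncertainty sets the effective threshold on the raw signal.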

The anomalies identified by both methods are compared with known events in the Fermi GBM Trigger Catalog.
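The comparison with the Fermi GBM Trigger Catalog can be sketched as a time-coincidence test. The sketch assumes that detected events are (start, end) intervals and catalog entries are (trigger time, classification) pairs, all expressed in the same time system (for example Fermi Mission Elapsed Time, in seconds); the 60 s tolerance, the function name and the example entries are hypothetical and are not taken from the catalog or from ACDAnomalies.

```python
def match_catalog(events, catalog, tolerance=60.0):
    """For each detected event, list the classifications of catalog entries whose
    trigger time falls inside [start - tolerance, end + tolerance].

    events: list of (start, end) in seconds; catalog: list of (time, label).
    """
    matches = []
    for start, end in events:
        hits = [label for t, label in catalog
                if start - tolerance <= t <= end + tolerance]
        matches.append(hits)
    return matches


# Hypothetical detections and catalog entries
detections = [(100.0, 200.0), (1000.0, 1000.0)]
catalog = [(150.0, "TGF"), (5000.0, "GRB")]
matched = match_catalog(detections, catalog)
```

Events with an empty match list are candidates that the GBM catalog does not explain and would need individual inspection.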

6. Results of the Triggering Algorithm

Some of the events identified by the triggering algorithm are shown in the following figures.

pictures/2024-01-31 16:16:54_Xpos_Yneg.png
pictures/2024-03-10 12:08:44_top_Xpos.png
Left: a TGF (Terrestrial Gamma-ray Flash) event, as described in the Fermi GBM Trigger Catalog. Right: presumably a LOCLPART event during a solar flare, as described in the Fermi GBM Trigger Catalog.

7. Conclusions and outlook

ACDAnomalies is under active development, with the goal of providing a versatile tool for anomaly detection in time series data.

Bayesian Neural Network models can describe complex dependencies, such as those that characterize the environment the Fermi satellite traverses along its orbit. Further work is required to assess the performance of the network on larger datasets and under different conditions (e.g. periods of particularly high solar activity).

The triggering algorithm is a central component of the software, and its efficiency is still being assessed.

Future development of ACDAnomalies aims to support both integration within the Fermi software stack and its use as a stand-alone package.

Acknowledgements

The work presented in this contribution is performed in the framework of Spoke 0 and Spoke 3 of the ICSC project - Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, funded by the NextGenerationEU European initiative through the Italian Ministry of University and Research, PNRR Mission 4, Component 2: Investment 1.4, Project code CN00000013 - CUP I53C21000340006.

References

Atwood, W. B. et al., The Large Area Telescope on the Fermi Gamma-ray Space Telescope Mission, The Astrophysical Journal, 2009
Romano, G. et al., Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics, Journal of Machine Learning Research, 2023
Crupi, R., Dilillo, G., Bissaldi, E. et al., Searching for long faint astronomical high energy transients: a data driven approach, Experimental Astronomy 56, 421–476, 2023