# Space-time gap filling for extreme hot-spots

When large georeferenced datasets contain substantial gaps (missing values), summaries such as the minimum over a space-time domain cannot be computed directly from the data. Such gaps arise when sensors are defective or cannot provide useful data, for instance because of cloud occlusion or the orbital characteristics of satellite-based instruments. In the Earth and atmospheric sciences, gap filling and prediction of partially observed data are important problems. In this project, we aim to provide probabilistic predictions of summary statistics of the data process over a space-time domain, focusing in particular on episodes with simultaneous extreme values of a continuous variable such as temperature. By accounting for spatial and temporal dependence, we seek inference on summary functionals of space-time hot-spots, such as the minimum value of the process over a spatial region during a given period of time.
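To make the target concrete, the hot-spot summary is the minimum of the field over a spatial region and a time window. The NumPy sketch below assumes a hypothetical array layout `(time, y, x)`, a boolean `region` mask, and helper names `hotspot_min` and `prob_hotspot`, none of which come from the article. It shows that a single missing value makes the direct minimum undefined, whereas an ensemble of gap-filled fields (for example posterior draws) turns the same summary into a probabilistic prediction, here the probability that the whole region stays above a threshold throughout the window.

```python
import numpy as np

def hotspot_min(field, region, t0, t1):
    """Minimum of `field` (time, y, x) over the masked region on days t0..t1-1.

    Returns NaN if any value in the space-time window is missing, since the
    true minimum cannot be computed directly from incomplete data.
    """
    window = field[t0:t1][:, region]  # shape (t1 - t0, number of region pixels)
    return np.nan if np.isnan(window).any() else float(window.min())

def prob_hotspot(samples, region, t0, t1, threshold):
    """Monte Carlo estimate of P(minimum over the window > threshold).

    `samples` has shape (n_samples, time, y, x) and holds complete,
    gap-filled realizations of the field (e.g. posterior draws).
    """
    mins = np.array([hotspot_min(s, region, t0, t1) for s in samples])
    return float(np.mean(mins > threshold))
```

For example, with a 3×3 grid over two days and a region covering the lower-right 2×2 block, `hotspot_min` returns the smallest value in that block; setting any cell of the block to `np.nan` makes it return NaN, and `prob_hotspot` averages the exceedance indicator over the ensemble.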

Methodologically, we propose a two-step approach. First, we model the marginal distributions, with a focus on accurate modeling of the right tail. Then, after transforming the data to the standard Gaussian scale, we estimate a Gaussian space-time dependence model, defined locally in time, for the space-time subregions where we want to predict. In the first step, we remove trends in the mean and standard deviation of the data and fit a spatially resolved generalized Pareto distribution to correct the upper tail. To ensure that the estimated trends are spatially smooth, we either pool data using nearest-neighbor (NN) techniques or fit generalized additive models (GAM). In the second step, we handle the high space-time resolution of the data by using local Gaussian models with a Markov representation of the Matérn correlation function, based on the stochastic partial differential equation (SPDE) approach. The local models are fitted in a Bayesian framework using the integrated nested Laplace approximation (INLA) implemented in R-INLA. Finally, we generate posterior samples and derive statistical inferences through Monte Carlo estimation.
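The marginal step can be sketched for a single pixel as a semi-parametric probability integral transform: standardize with the fitted mean and standard deviation trends, use the empirical distribution below a high threshold and a generalized Pareto tail above it, then map to the standard Gaussian scale. This is a minimal SciPy sketch under stated assumptions: the helper name `to_gaussian_scale`, the 0.95 threshold quantile, the rank-based bulk, and the clipping constant are illustrative choices rather than the article's exact estimator, and the spatial pooling (NN or GAM) of trends and tail parameters is omitted.

```python
import numpy as np
from scipy.stats import genpareto, norm

def to_gaussian_scale(x, mu, sigma, q_threshold=0.95, eps=1e-12):
    """Map raw values at one pixel to the standard Gaussian scale.

    Hypothetical helper: `mu` and `sigma` are the fitted mean and standard
    deviation trends (scalars or arrays matching `x`).
    """
    z = (np.asarray(x, dtype=float) - mu) / sigma      # detrended, standardized data
    n = z.size
    u = np.quantile(z, q_threshold)                     # high threshold for the tail
    excess = z[z > u] - u
    # Fit a generalized Pareto distribution to threshold excesses (location fixed at 0)
    shape, _, scale = genpareto.fit(excess, floc=0)
    p_u = excess.size / n                               # empirical exceedance probability
    # Empirical CDF in the bulk; rank / (n + 1) keeps values strictly inside (0, 1)
    ranks = np.argsort(np.argsort(z)) + 1
    F = ranks / (n + 1)
    # GPD tail correction above the threshold: F(z) = 1 - p_u * P(excess > z - u)
    tail = z > u
    F[tail] = 1.0 - p_u * genpareto.sf(z[tail] - u, shape, loc=0, scale=scale)
    # Guard against F = 1 at a fitted upper endpoint, which would map to +inf
    F = np.clip(F, eps, 1.0 - eps)
    return norm.ppf(F)
```

Tying the tail to the empirical exceedance rate `p_u` keeps the transform monotone across the threshold, so the ordering of the data is preserved while values above `u` follow the fitted GPD instead of their empirical ranks.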

To illustrate our methods, we use data provided by the 2019 Extreme Value Analysis (EVA) data challenge: anomalies of Red Sea surface temperatures on a grid (11315 days, 16703 pixels) with artificially generated gaps. Figure 1 shows predictions on the Gaussian scale for a validation day in December 2009 using the GAM (top) and NN (bottom) approaches. The left-hand map shows the available data, the middle map shows the posterior means of the predictions obtained with INLA, and the right-hand map shows the corresponding posterior standard deviations. As expected, prediction uncertainty is low at locations close to observations but can be relatively high within large gaps, such as the one in the southern part of the Red Sea.
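The uncertainty pattern described for Figure 1, small posterior standard deviations near observations and large ones inside wide gaps, already appears in plain Gaussian conditioning. The sketch below is not the article's model: it replaces the SPDE/INLA machinery with dense kriging on a hypothetical 1-D transect, using an exponential correlation (the Matérn case with smoothness 1/2); the helper names, range, nugget, and data are illustrative assumptions. The article instead relies on local models with a Markov (SPDE) representation precisely to cope with the high resolution that makes dense Cholesky factorizations like this one impractical.

```python
import numpy as np

def exp_corr(a, b, range_):
    """Exponential correlation between 1-D locations (Matérn, smoothness 1/2)."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-d / range_)

def gaussian_predict(s_obs, y_obs, s_new, range_=1.0, nugget=1e-6):
    """Conditional mean and sd of a zero-mean, unit-variance Gaussian field."""
    # Small nugget on the diagonal keeps the Cholesky factorization stable
    C_oo = exp_corr(s_obs, s_obs, range_) + nugget * np.eye(len(s_obs))
    C_no = exp_corr(s_new, s_obs, range_)
    L = np.linalg.cholesky(C_oo)
    # With A = L^{-1} C_on, the kriging weights satisfy C_no C_oo^{-1} = A^T L^{-1}
    A = np.linalg.solve(L, C_no.T)
    alpha = np.linalg.solve(L, y_obs)
    mean = A.T @ alpha
    var = 1.0 - np.sum(A ** 2, axis=0)  # diagonal of the conditional covariance
    return mean, np.sqrt(np.clip(var, 0.0, None))
```

For instance, with observations at positions 0 to 4 and 15 to 19 and a range of 2, the predicted standard deviation is essentially zero at an observed site, moderate just beside the data at position 4.5, and close to the marginal value of 1 in the middle of the gap at position 10.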

Our models perform better than all other contributions to the EVA challenge, and the tail correction provides a significant improvement over models without marginal transformation.

The article is available here.