Statistical analysis of massive datasets very often involves expensive linear algebra operations on large dense matrices. Typical tasks are the estimation of unknown parameters of the underlying statistical model and the prediction of missing values. We developed the H-MLE procedure, which solves both of these tasks. The unknown parameters are estimated by maximizing the joint Gaussian log-likelihood function, which depends on a covariance matrix. To reduce the high computational cost, we approximate the covariance matrix in the hierarchical (H-) matrix format, whose computational cost is only log-linear. The H-matrix technique can handle inhomogeneous covariance matrices and almost arbitrary locations. In particular, H-matrices can be applied when the matrices under consideration are dense and unstructured. For validation purposes, we implemented three machine learning methods: kNN, random forest, and a deep neural network. The best results (for the given datasets) were obtained by the kNN method with three or seven neighbors, depending on the dataset. The results computed with the H-MLE method were compared with those obtained by the kNN method. The developed H-matrix code and all datasets are freely available online.
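To make the estimation step concrete, the following is a minimal sketch of evaluating and maximizing the joint Gaussian log-likelihood for zero-mean data. It is not the H-MLE code: it uses a dense Cholesky factorization (cubic cost) instead of an H-matrix approximation, and the exponential covariance kernel, the function names `exp_covariance` and `gaussian_log_likelihood`, the synthetic data, and the coarse grid search are all assumptions chosen for illustration.

```python
import numpy as np


def exp_covariance(locs, variance, length_scale):
    """Hypothetical exponential covariance kernel (an assumption, not from the source)."""
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / length_scale)


def gaussian_log_likelihood(z, cov):
    """Joint log-likelihood of zero-mean Gaussian data z with covariance matrix cov.

    Computes -n/2 log(2 pi) - 1/2 log det(cov) - 1/2 z^T cov^{-1} z
    via a dense Cholesky factorization cov = L L^T.
    """
    n = z.shape[0]
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, z)              # alpha = L^{-1} z
    log_det = 2.0 * np.log(np.diag(chol)).sum()   # log det(cov)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * (alpha @ alpha)


# Synthetic example: irregular 2D locations and data drawn from the assumed model.
rng = np.random.default_rng(0)
locs = rng.uniform(size=(200, 2))
true_cov = exp_covariance(locs, variance=1.0, length_scale=0.2)
z = np.linalg.cholesky(true_cov) @ rng.standard_normal(200)

# Coarse grid search over the length scale; a real estimator would use an optimizer.
length_scales = [0.05, 0.1, 0.2, 0.4, 0.8]
scores = [
    gaussian_log_likelihood(z, exp_covariance(locs, 1.0, ls))
    for ls in length_scales
]
best = length_scales[int(np.argmax(scores))]
print("length scale with the highest log-likelihood:", best)
```

Every likelihood evaluation here factorizes a dense n-by-n matrix, which is the cost that an H-matrix approximation of the covariance matrix is meant to reduce.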
|Status||Published - 2021|
|Event||4th International Conference on Uncertainty Quantification in Computational Sciences and Engineering, UNCECOMP 2021 - Athens, Greece|
Duration: 28 Jun 2021 → 30 Jun 2021
Subject areas OECD FOS+WOS
- 1.02 COMPUTER AND INFORMATION SCIENCES
- 1.01 MATHEMATICS