PCA (principal component analysis) and LDA (linear discriminant analysis) are two key techniques for dimension reduction. The former is an unsupervised method that seeks linear combinations of the variables (components) capturing maximal variance in the data, while the latter is a supervised method that preserves as much of the class discrimination information as possible. Both methods rely on the sample covariance matrix of the data, which makes them very sensitive to the presence of even a few outliers. To cope with this problem, robust methods have been proposed, the most straightforward being to replace the sample covariance matrix by a robust estimate of it. While appealing in its simplicity, this approach has drawbacks in higher dimensions, and methods based on projection pursuit, sparsity, and regularization have been proposed. We will start with a simple example, demonstrate the effect of outliers and contamination in the data, and walk through the most popular robust methods for PCA and LDA as implemented in the R package 'rrcov'.
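To make the comparison concrete, here is a minimal sketch of the kind of analysis covered in the tutorial, assuming only that the 'rrcov' package is installed; the iris data and the choice of k = 2 components are illustrative choices, not part of the original material.

```r
library(rrcov)

x <- iris[, 1:4]

## Classical PCA (sample covariance) vs. ROBPCA (Hubert et al.)
pca_classic <- PcaClassic(x)
pca_robust  <- PcaHubert(x, k = 2)
summary(pca_robust)
plot(pca_robust)   # outlier map: score distance vs. orthogonal distance

## Robust LDA based on the MCD covariance estimate
lda_robust <- Linda(x, grouping = iris$Species)
table(iris$Species, predict(lda_robust)@classification)
```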
For regression analysis in high-dimensional settings, variable selection is a crucial task: it improves prediction performance through variance reduction, increases the interpretability of the resulting models due to the smaller number of variables, and avoids computational issues with standard methods caused by the rank deficiency of the design matrix. Common strategies are either to obtain a sequence of important variables and fit a series of regression models, or to apply regularized regression estimators that simultaneously perform variable selection and coefficient estimation. However, when outliers are present in the data, robust methods are necessary to prevent unreliable results. This tutorial provides an overview of robust methods for regression and variable selection in high-dimensional data, such as robust least angle regression, the robust lasso, and the robust elastic net. Moreover, the practical application of these methods is illustrated using the R packages robustHD and pense.
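A small sketch of how these estimators are called in practice, assuming the 'robustHD' and 'pense' packages are installed; the simulated data, the contamination scheme, and the tuning-parameter values below are illustrative choices, and pense_cv() with its arguments reflects recent versions of the pense package rather than the tutorial material itself.

```r
library(robustHD)
library(pense)

set.seed(1)
n <- 100; p <- 25
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, 2, 1)) + rnorm(n)
y[1:10] <- y[1:10] + 20   # contaminate 10% of the responses

## Robust least angle regression: sequence the candidate predictors
fit_rlars <- rlars(x, y)
coef(fit_rlars)

## Sparse least trimmed squares: a robust lasso-type estimator
fit_lts <- sparseLTS(x, y, lambda = 0.05)
coef(fit_lts)

## Penalized elastic net S-estimator, lambda chosen by cross-validation
fit_pense <- pense_cv(x, y, alpha = 0.75, cv_k = 5)
coef(fit_pense)
```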
The lecture consists of three parts, namely exploratory analysis, modelling, and forecasting. In the first part we start by presenting robust tests for stationarity. Subsequently, we look at methods describing the dependence structure of the time series, mainly robust estimators of the autocorrelation function. We conclude the first part with robust spectral analysis. The second part mainly covers linear time series models: we present methods to robustly fit ARMA models, evaluate the goodness of fit, and detect outliers. The last part covers parametric and non-parametric forecasting. Throughout the talk we will visualize the effects of outliers on non-robust and robust methods and apply the presented methods to real data sets.
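The abstract names no specific package, so as a self-contained illustration here is a sketch of one robust estimator of the autocorrelation function, the Qn-based estimator of Ma and Genton (2000), implemented with robustbase::Qn; the AR(1) simulation and the contamination scheme are illustrative choices.

```r
library(robustbase)

## Robust ACF via rho(h) = (Qn(u)^2 - Qn(v)^2) / (Qn(u)^2 + Qn(v)^2)
## with u = x_t + x_{t+h} and v = x_t - x_{t+h}
robust_acf <- function(x, lag.max = 10) {
  n <- length(x)
  sapply(seq_len(lag.max), function(h) {
    u <- x[(h + 1):n] + x[1:(n - h)]
    v <- x[(h + 1):n] - x[1:(n - h)]
    (Qn(u)^2 - Qn(v)^2) / (Qn(u)^2 + Qn(v)^2)
  })
}

set.seed(1)
x <- arima.sim(list(ar = 0.7), n = 200)  # clean AR(1) series
x[sample(200, 5)] <- 15                  # a few additive outliers

drop(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]  # classical ACF, distorted
robust_acf(x, lag.max = 10)                       # stays close to 0.7^h
```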