Over a weekend on one of my trips to the US, I visited Atlanta’s no. 1 attraction: the Georgia Aquarium, the largest in the world. My first impression was amazement, and ‘hats off’ to the engineers, analysts, and architects. The place was packed like a sardine can, yet I never felt crowded. The whole place flowed very well. There was no single-file line, and still a crowd of over 4500 was contained comfortably. There was nobody around to satisfy my curiosity at that moment.
My curiosity was finally satisfied when, recently, I had a chance to listen to Beach Clark’s presentation on a podcast about how they have deployed predictive analytics to keep the crowds flowing smoothly at the Aquarium. With time as their independent factor, they look at the following dependents to draw a regression curve that best fits their requirements: the number of guests at any instant of time; people on the phone (in the bay area, not actually in the aquarium); connections to various wifi zones inside the premises; how full the parking lot is; hold time at the café billing counter; and the average time a person spends inside the aquarium, taking into account factors like the outside climate, time of the year, holiday season, weekends, etc.
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent variables (predictors). In other words, a regression model describes how the target changes as the predictors change. One could deploy regression for forecasting, for time series modelling, or for estimating the cause-and-effect relationship between variables. To perform regression, fit a curve or line to the data points so that the distances of the data points from that curve or line are, overall, as small as possible; the most common criterion minimizes the sum of the squared distances.
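To make ‘fit a line’ concrete, here is a minimal sketch of an ordinary least-squares fit with a single predictor, in plain Python. The hourly guest counts are made up for illustration, and `fit_line` is a hypothetical helper name; none of this comes from the Aquarium’s actual model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factor cancels.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hour of the day vs. number of guests inside.
hours = [9, 10, 11, 12, 13]
guests = [1200, 1900, 2600, 3300, 4000]

slope, intercept = fit_line(hours, guests)
forecast_14h = slope * 14 + intercept  # forecast for 2 pm
print(slope, intercept, forecast_14h)
```

The slope answers ‘how many more guests per hour’, and the fitted line can then be evaluated at a future time to get a forecast.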
Linear and logistic regression are usually the first algorithms an analyst learns. The best of forecasters would agree with me when I say that, although they are not the only forms of regression, they are the most important. There are many other algorithms for performing a regression besides these two, and each of them, used in the right situation, can yield a better forecast than the alternatives. With this blog I intend to give an idea of the span of regression techniques: what could be used instead of just smearing layers of linear/logistic regression over every problem and hoping that they fit!
Polynomial Regression: A regression equation is a polynomial if the power of the independent variable is more than 1 or, in simple words, if the relationship between the independent and the dependent variable is modelled as an nth degree polynomial. The main cautionary note with this type of curve concerns its ends: higher-degree polynomials can fit the observed points closely and still produce weird curves, and hence weird results, when extrapolated beyond the range of the data.
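The extrapolation warning is easy to see on a toy example. The sketch below, in plain Python, fits a degree-1 and a degree-5 polynomial to six made-up, roughly linear points by solving the least-squares normal equations, then evaluates both well outside the data. The helper names (`solve`, `polyfit`, `polyval`) and the data are my own assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    gram = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(gram, rhs)


def polyval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Made-up data that is roughly y = x with a little noise.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 4.9]

line = polyfit(xs, ys, 1)
quintic = polyfit(xs, ys, 5)  # passes through all six points

line_at_8 = polyval(line, 8)
quintic_at_8 = polyval(quintic, 8)
print(line_at_8, quintic_at_8)
```

Inside the data range the quintic looks like the better fit because it touches every point, but at x = 8 it lands far away from the trend that the straight line continues.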
Stepwise Regression: Apply this when you have multiple independent variables and need to decide which of them belong in the model. To arrive at the ‘best fit’, this approach adds or drops co-variates one at a time, based on a specified criterion and various tests of significance.
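A minimal sketch of the ‘adding’ half of this idea (forward selection) in plain Python follows. The stopping rule here, a minimum gain in R-squared, is a simplification I chose for illustration; real stepwise procedures typically rely on significance tests or information criteria and may also drop variables. The data and the helper names (`solve`, `rss`, `forward_stepwise`) are made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on columns plus an intercept."""
    design = [[1.0] * len(y)] + columns
    k = len(design)
    gram = [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * t for p, t in zip(design[i], y)) for i in range(k)]
    beta = solve(gram, rhs)
    fitted = [sum(beta[i] * design[i][r] for i in range(k)) for r in range(len(y))]
    return sum((t - f) ** 2 for t, f in zip(y, fitted))


def forward_stepwise(candidates, y, min_r2_gain=0.01):
    """Greedily add the predictor that lowers RSS most; stop when the R^2 gain is small."""
    total = rss([], y)  # intercept-only model
    current = total
    selected = []
    while len(selected) < len(candidates):
        remaining = [j for j in range(len(candidates)) if j not in selected]
        trial = {j: rss([candidates[i] for i in selected + [j]], y) for j in remaining}
        best = min(trial, key=trial.get)
        if (current - trial[best]) / total < min_r2_gain:
            break
        selected.append(best)
        current = trial[best]
    return selected


# Made-up data: y depends on x0 and x2 only; x1 is an irrelevant column.
x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [3, 1, 4, 1, 5, 9, 2, 6]
x2 = [2, 7, 1, 8, 2, 8, 1, 8]
y = [9.1, 25.9, 10.05, 32.95, 17.1, 36.9, 18.05, 40.95]  # about 2*x0 + 3*x2 + 1

selected = forward_stepwise([x0, x1, x2], y)
print(selected)  # indices of the chosen predictors, in the order they were added
```

On this toy data the procedure keeps the two predictors that actually drive `y` and leaves the irrelevant one out.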
Ridge Regression: When the independent predictors of a model are highly correlated with each other, one could use this form of regression. This problem is called ‘multicollinearity’. Because of the high correlation amongst the factors, ordinary least-squares coefficient estimates become unstable, so there is a high chance of error in the forecast. Ridge regression adds a penalty on the sum of the squared coefficients, scaled by a ‘penalty constant’; it accepts a small bias in exchange for much more stable estimates.
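Here is a small sketch of that effect in plain Python, for two nearly identical predictors. It solves the ridge normal equations (X'X + penalty·I)β = X'y directly for the two-variable case. The data are made up and already centred (so no intercept is needed), and `ridge_two_predictors` is a hypothetical helper name.

```python
def ridge_two_predictors(x1, x2, y, penalty):
    """Closed-form ridge coefficients for two centred predictors (penalty=0 is plain OLS)."""
    s11 = sum(a * a for a in x1) + penalty
    s22 = sum(b * b for b in x2) + penalty
    s12 = sum(a * b for a, b in zip(x1, x2))
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule on the 2x2 system.
    return ((b1 * s22 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)


# Made-up, centred data: x2 is almost a copy of x1, and y is roughly x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.99, -1.02, 0.0, 1.02, 1.99]
y = [-4.2, -1.8, 0.1, 2.1, 3.8]

ols = ridge_two_predictors(x1, x2, y, penalty=0.0)
ridge = ridge_two_predictors(x1, x2, y, penalty=1.0)
print("OLS:  ", ols)    # one large positive and one negative coefficient
print("Ridge:", ridge)  # both coefficients close to 1
```

Without the penalty, the two twin predictors end up with wildly different coefficients that nearly cancel each other; with a modest penalty, the weight is shared between them in a much more believable way.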
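Lasso Regression: This is a more aggressive relative of Ridge regression; here the ‘penalty’ is applied not to the squares of the coefficients but to their absolute values, which means this model equivalently constrains the sum of the absolute values of the estimates. As a result, some coefficients can shrink all the way to exactly zero, which effectively removes those variables from the model.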
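The ‘exactly zero’ behaviour can be sketched with coordinate descent and soft-thresholding, one common way of fitting a lasso. This plain-Python version minimizes ½·‖y − Xβ‖² + penalty·Σ|β|; the data are made up and centred, and the function names are my own.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; anything within the threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(columns, y, penalty, sweeps=100):
    """Coordinate descent for 0.5 * ||y - X b||^2 + penalty * sum(|b_j|)."""
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with predictor j left out of the current fit.
            partial = [
                t - sum(beta[k] * columns[k][r] for k in range(len(columns)) if k != j)
                for r, t in enumerate(y)
            ]
            rho = sum(c * p for c, p in zip(col, partial))
            beta[j] = soft_threshold(rho, penalty) / sum(c * c for c in col)
    return beta


# Made-up, centred data: y is driven by x1; x2 contributes almost nothing.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -2.0, 0.0, 2.0, -1.0]
y = [-6.0, -3.1, 0.1, 3.2, 5.8]

unpenalized = lasso([x1, x2], y, penalty=0.0)  # plain least squares
penalized = lasso([x1, x2], y, penalty=2.0)
print(unpenalized)  # small but non-zero weight on x2
print(penalized)    # weight on x2 is exactly 0.0
```

A ridge penalty would only make the weak coefficient smaller; the absolute-value penalty removes it altogether, which is why lasso doubles as a variable-selection tool.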
Elastic-Net Regression: This is a combination of the Ridge and the Lasso techniques; it applies both the squared and the absolute-value penalties at the same time.
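As a rough sketch of what the combination buys you, the plain-Python snippet below runs coordinate descent on ½·‖y − Xβ‖² + l1·Σ|β| + (l2/2)·Σβ² for two nearly identical predictors. The parameter names `l1` and `l2`, the function names, and the data are assumptions made for this illustration.

```python
def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, l1, l2, sweeps=500):
    """Coordinate descent for 0.5*||y - X b||^2 + l1*sum(|b_j|) + 0.5*l2*sum(b_j^2)."""
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with predictor j left out of the current fit.
            partial = [
                t - sum(beta[k] * columns[k][r] for k in range(len(columns)) if k != j)
                for r, t in enumerate(y)
            ]
            rho = sum(c * p for c, p in zip(col, partial))
            # The l1 part thresholds; the l2 part enlarges the denominator.
            beta[j] = soft_threshold(rho, l1) / (sum(c * c for c in col) + l2)
    return beta


# Made-up, centred data: x2 is almost a copy of x1, and y is roughly x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.99, -1.02, 0.0, 1.02, 1.99]
y = [-4.2, -1.8, 0.1, 2.1, 3.8]

lasso_only = elastic_net([x1, x2], y, l1=1.0, l2=0.0)
mixed = elastic_net([x1, x2], y, l1=1.0, l2=1.0)
print(lasso_only)  # all weight on one of the twins, the other is exactly 0.0
print(mixed)       # weight shared almost equally between the twins
```

With the absolute-value penalty alone, one of the two correlated predictors is dropped more or less arbitrarily; adding the squared penalty keeps both and splits the weight between them.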
There are many other forms of regression, like Ecologic, Logic, Bayesian, Quantile, Jackknife, etc. With regression, one can not only identify whether there is a significant relationship between the dependent and independent variables, but also quantify the strength of the impact of multiple independent variables on a dependent variable. Knowing which one would fit is all about what I call ‘Expertise with Experience’. Choosing the correct regression model is more of an art than a science.
At DRG we strive to deliver the best possible forecasts for our customers. While many people think that regression is the only way of forecasting, we realize that this ‘Obsession with Regression’ is just not enough for the passion with which we deliver the best to our customers. I will discuss more on this in my upcoming blog.