
It is said that the “Intelligence Quotient (IQ) of students tends to increase with their shoe size”. Obviously, the tendency for shoe size and IQ to increase together does not mean that bigger feet are the cause. The two variables are clearly correlated, but is there causality? Now consider this: older children have bigger feet, but they also have more developed brains. This natural development of children explains the simple observation that shoe size and IQ tend to increase together. Statistically, they are positively correlated, but causality has yet to be established. If we conduct a correlation study to establish a relationship between two or more naturally existing variables, such as foot size and IQ, we must understand that as researchers we have no control over the variables (they cannot be manipulated in any way, as they would be in a laboratory experiment). Such a study can show a positive correlation but cannot justify a causal link, and trying to build a regression around it to draw causal conclusions would be a waste of time and effort.

Regression gives us the power to make better forecasts from several correlated variables, but it is worth noting that if one runs correlation-regression predictions across an effectively unlimited number of variables, one may end up concluding that just about anything causes, or is caused by, everything else! The ‘Butterfly Effect’ of chaos theory, in which a tiny change in initial conditions can lead to very different outcomes, is worth reading about to understand this point better.

During my research in collaboration with the Indian defense, I worked on the hypothesis that drug x (a research drug) has radioprotective properties. Based on extensive in vitro and ex vivo results, I postulated that drug x was indeed potent, and on that basis I initiated animal survival studies. On the 7th day after gamma irradiation, all the animals died. I could not understand how my regression had gone so wrong in its prediction. I revisited my model and examined all the correlated variables, but this time I also modeled the effect of the independent variables on them. I finally strengthened my model using multilevel regression coupled with a predictive likelihood model. This approach is related to regression but is considerably more advanced in its structure and approach.

A similar problem may arise when we predict device markets. What exactly do we want to know when predicting markets? In simple terms, we want to predict how a new device x will be adopted in a new population. If we knew how present adopters and future (potential) adopters of a new product would interact, we could make the strongest prediction possible. The Bass diffusion model, developed by Frank Bass, uses a differential equation to capture exactly this interaction. The Bass model has been widely used in forecasting, especially for new-product sales. Time-series approaches such as exponential smoothing and artificial neural networks (ANNs) are also worth reading about.
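The Bass model expresses adoption as dF/dt = (p + qF)(1 - F), where F(t) is the fraction of the eventual market m that has adopted by time t, p is the coefficient of innovation (adoption driven by external influence such as advertising) and q is the coefficient of imitation (adoption driven by contact with existing adopters, the interaction described above). The Python sketch below uses the closed-form solution of that equation, F(t) = (1 - e^(-(p+q)t)) / (1 + (q/p)e^(-(p+q)t)), to forecast new adopters per period. The market size and the p and q values are hypothetical, chosen only for illustration; in practice they would be estimated from early sales data or from analogous products.

```python
import math


def bass_cumulative(t, p, q):
    """Closed-form solution F(t) of dF/dt = (p + q*F)(1 - F) with F(0) = 0."""
    e = math.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)


def bass_forecast(m, p, q, periods):
    """New adopters in each period t = 1..periods: m * [F(t) - F(t - 1)]."""
    return [
        m * (bass_cumulative(t, p, q) - bass_cumulative(t - 1, p, q))
        for t in range(1, periods + 1)
    ]


# Hypothetical parameters for a new device x (illustration only).
M = 100_000  # market potential: eventual number of adopters
P = 0.03     # coefficient of innovation (external influence)
Q = 0.38     # coefficient of imitation (influence of existing adopters)

sales = bass_forecast(M, P, Q, periods=20)

# Period with the most new adopters (1-based), and the analytical peak time.
peak_period = max(range(len(sales)), key=sales.__getitem__) + 1
peak_time = math.log(Q / P) / (P + Q)

print(f"peak adoption in period {peak_period} (analytical peak at t = {peak_time:.1f})")
print(f"cumulative adopters after 20 periods: {sum(sales):,.0f} of {M:,}")
```

Because q is much larger than p in this example, adoption starts slowly, accelerates as existing adopters influence potential adopters, peaks around t = ln(q/p)/(p + q), and then tapers off as the market saturates.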

Today, the best minds are applying their predictive skills to improve healthcare worldwide. But predictive analytics, by definition, cannot tell us what will happen in the future; it can only predict the most likely future. ‘Big data’ combined with ‘Real World Evidence’ analytics can be mined for actionable insights, with the potential not only to predict the future but also to indicate precisely how to change it. I am thinking out loud here, but the era of precision medicine and precision healthcare is nearing its intersection with ‘Real World Evidence’ data. I will talk more about the ‘Age of Real World Evidence’ analytics in my concluding blog.


This blog is part of the Predictive Forecasting series.
