This post has been written by Naveen Prashanth, a second year PGP student at IIM Bangalore who was part of the just-concluded Spreadsheet Modelling for Business Decision Problems course.
Throughout this course, we have enjoyed the experience of playing with data and deriving interesting models and solutions to business problems. The path to making a good decision based on data seemed to involve mostly model building and deriving insights. However, having worked for four years in the analytics space, I have observed a wide variety of challenges that make driving decisions through data analysis far more complex. I would like to share some of those insights on this forum, so we can guard against them and ensure they do not come in the way of fully applying our learnings from this course at work.
Firstly, data procurement is a time-consuming and tricky business. In my first job, I was working with several retail store giants in the US that banked with Citibank. Though the data for each of these portfolios was supposed to be very similar, given similar economic conditions and business variables, there was stark variation in how the database of each portfolio was stored, the ‘level’ of each dataset, as well as the definition of various fields. For example, the field ‘Revenue’ in one portfolio was defined differently from another, though there is ideally no business justification for such differences. Further, data owners exhibit their own idiosyncrasies and can be fairly protective about the data they are willing to share.
I have observed that the remedy for this situation is to make clear why the data you are requesting is actually needed. It is best to start with the hypotheses you wish to test, explain how they make sense for the business, identify the data needed to validate these hypotheses, and only then delve into the specific fields required. This way, data owners feel empowered and involved in the analysis, are more likely to share data with you and, most importantly, also offer their valuable suggestions to improve your analysis.
The second issue is in the area of validation. Typically, we used to double-check code and ensure the data was being pulled correctly. Even in model building, the common practice was to revisit the logic and formulae to ensure accuracy. However, to accurately analyse data, one should adopt the process of cross-validation, where a brand new angle is used to validate the data. For instance, if A and B are drawn from different databases, and we know from business sense that A divided by B has to be between 50% and 60%, then this check can be applied once the dataset is assembled. The advantage of this method is that we are mimicking how an end user would validate the model, while also being able to derive insights in the process. Tying summarized database figures to existing MI reports in the organization, with a certain tolerance level for the match, is also a good practice.
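To make the idea concrete, here is a minimal sketch of the two checks described above. The field names, totals, and the 50–60% band are purely illustrative assumptions, not figures from any actual portfolio:

```python
# Hypothetical cross-validation checks for a dataset assembled from two
# different databases. All names and thresholds below are illustrative.

def ratio_check(a_total, b_total, low=0.50, high=0.60):
    """Return True if a_total / b_total lies within the expected business band."""
    if b_total == 0:
        raise ValueError("Denominator is zero; check the extract for B")
    return low <= a_total / b_total <= high

def matches_mi_report(model_total, mi_total, tolerance=0.01):
    """Return True if the model's summary ties to the MI report within tolerance
    (default: 1% relative difference)."""
    return abs(model_total - mi_total) <= tolerance * abs(mi_total)

# Example totals pulled from the two sources (illustrative values)
a_total = 5_400_000    # sum of field A from database 1
b_total = 10_000_000   # sum of field B from database 2
mi_total = 5_390_000   # corresponding figure in an existing MI report

print("Ratio check passed:", ratio_check(a_total, b_total))
print("Ties to MI report:", matches_mi_report(a_total, mi_total))
```

Checks like these are cheap to run every time the data is refreshed, which is exactly when silent extraction errors tend to creep in.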
The third issue pertains to timelines. Most people who are not involved with data analytics have, through no fault of their own, a limited idea of how long data analysis takes. It is important to clearly outline the steps needed to reach the final decision, starting from the data collection stage, and assign a precise timeline to each task. Socializing this schedule with the relevant stakeholders in the firm will also help ensure no objections are raised later.
The final issue rests with the complexity appetite of the end user. In my previous organization, we built a killer NPV model that cut decision process time by almost half and freed up some very strained resources in the firm. However, senior management did not adopt this model since they considered it ‘too complex’. It is important to be aware of the inclinations of the end user, since it is of course better to have a simple solution implemented than a complex one on the shelf. The clear formatting and visualization techniques we learnt in the course can further help break down prejudices about data complexity.
by Naveen Prashanth