Saturday, January 21, 2012

The big fuss about 'big data'

There has been growing interest in the idea of 'big data' over the past few years. As McKinsey wrote about 'big data' in May 2011 (read here), there has been an exponential rise in the data available to businesses for making better decisions. Indeed, a shortage of big data analysts is expected in the US (and much of the Western world).

However, amid the growing swell of interest in big data, I find all sorts of companies and people talking about their data as 'big data'. This makes me wonder: "How big should a dataset be to qualify as 'big data'?"

According to Wikipedia: "In information technology, big data consists of datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. This trend continues because of the benefits of working with larger and larger datasets, allowing analysts to 'spot business trends, prevent diseases, combat crime.' Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data."

To me, it is not just how big the data is that matters; it is critically important how fast the data is generated and how fast it needs to be analyzed.

As this blog article from ikasoft correctly puts it: "The answer is not 'Now the data is big' -- the answer is 'Now the data is fast!' Google didn't become Google because their data was big -- Google went to MapReduce so they could keep growing the number of sites-crawled while still returning results in < 100 milliseconds, and now they're going to Google Instant because even 200 milliseconds isn't fast enough anymore. Consider all the action we're seeing today in NoSQL data stores -- the point is NOT that they are big -- the point is that apps need to quickly serve data that is globally partitioned and remarkably de-normalized. Even the best web-era app isn't successful if it isn't fast."
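The quote name-drops MapReduce without showing what the pattern actually looks like. Purely as illustration (the shard data and function names below are my own toy sketch, not Google's framework or API), here is the map -> shuffle -> reduce pattern applied to in-memory 'shards' of pages:

```python
from collections import defaultdict

# Toy illustration of the map -> shuffle -> reduce pattern the quote refers to.
# Each "shard" stands in for a partition of crawled pages; in a real system the
# shards live on different machines and the phases run in parallel.
shards = [
    ["big data is fast", "data is big"],
    ["fast data beats big data"],
]

def map_phase(shard):
    # Emit (word, 1) pairs for every word in the shard.
    return [(word, 1) for page in shard for word in page.split()]

def shuffle(pairs):
    # Group intermediate pairs by key (word).
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

intermediate = [pair for shard in shards for pair in map_phase(shard)]
print(reduce_phase(shuffle(intermediate)))
# {'big': 3, 'data': 4, 'is': 2, 'fast': 2, 'beats': 1}
```

The speed comes from the fact that the map and reduce phases are embarrassingly parallel across shards, which is exactly the "globally partitioned" point the quote makes.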

To me, only companies that generate terabytes of data every second (Google, LinkedIn, Facebook, Twitter, Akamai, Yahoo, etc.) are truly in the age of 'big data'. Companies whose databases grow by a terabyte or so over the course of a year can stick with their RDBMS (and should quit calling themselves 'big data' companies).
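A quick back-of-the-envelope calculation (my own arithmetic, not from any of the articles above) shows just how wide that gap is:

```python
# Rough comparison: a terabyte accumulated over a year vs. a terabyte per second.
TB = 10**12                          # bytes in a (decimal) terabyte
seconds_per_year = 365 * 24 * 3600   # 31,536,000 seconds

tb_per_year_rate = TB / seconds_per_year              # bytes per second
print(f"1 TB/year ~= {tb_per_year_rate / 1e3:.0f} KB/s")    # ~32 KB/s
print(f"1 TB/sec  == {TB / 1e9:.0f} GB/s")                  # 1000 GB/s
print(f"ratio     ~= {TB / tb_per_year_rate:,.0f}x")        # ~31.5 million x
```

At roughly 32 KB/s of ingest, the terabyte-a-year workload is comfortably within reach of a single commodity database server, which is the point.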

Would love your thoughts!

Friday, January 6, 2012

Cold winter for Indian apparel retailers

Many Indian apparel retailers are offering discounts of 40% to 60%, three weeks ahead of schedule this season (newspaper article). Sales have been hit by 20-30%, as higher excise duties and higher cotton prices have pushed product prices up by 10-15%.

Could this situation have been foreseen by the retailers? Are they too late on the buzzer? One can only say that apparel retailers would do well to use BI and analytics to track sales shortfalls in real time and adjust pricing before everyone else realizes that a slowdown is on.

From an analytical perspective, here's what they could do (a rough sketch of both ideas follows at the end of this post):
1) Price elasticity studies and market surveys to understand the impact of a 10-15% price rise on demand
2) Markdown models that advise the retailer on how much to discount the price if a particular sales target is not met in the early weeks/months of the season

These are especially relevant in the Indian market, since the festival/holiday periods fall in the middle of the season (e.g., Diwali) and their dates vary from year to year. Retailers need to be very nimble and adjust prices quickly if festival/holiday targets are not met.
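To make the two ideas above concrete, here is a rough, hypothetical sketch: the elasticity formula is the standard one, but the numbers, sell-through thresholds and function names are made up purely for illustration, not taken from the article.

```python
def price_elasticity(pct_change_qty, pct_change_price):
    # Price elasticity of demand: % change in quantity / % change in price.
    return pct_change_qty / pct_change_price

# E.g., a 12.5% price rise alongside a 25% sales drop implies elasticity of -2,
# i.e., demand is quite price-sensitive.
print(price_elasticity(-0.25, 0.125))   # -2.0

def suggested_markdown(actual_sales, target_sales, weeks_elapsed, season_weeks):
    # Simple rule-based markdown trigger: compare sales to plan and deepen
    # the discount as the shortfall grows and the season runs out.
    shortfall = 1 - actual_sales / target_sales
    if shortfall <= 0.05:
        return 0.0                        # on plan, hold price
    if weeks_elapsed / season_weeks < 0.5:
        return min(0.4, shortfall)        # early in season: moderate markdown
    return min(0.6, 2 * shortfall)        # late in season: clear inventory

# Three weeks into a 16-week season, 30% behind target:
print(suggested_markdown(actual_sales=700, target_sales=1000,
                         weeks_elapsed=3, season_weeks=16))   # 0.3
```

A real markdown model would of course optimize the discount against inventory, margin and the elasticity estimate rather than use fixed thresholds, but even a simple rule like this forces the weekly sales-versus-plan check that lets a retailer move before the whole market discounts at once.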