An experimental investigation of scanner data preparation
strategies for consumer choice models
Rick L. Andrews, Imran S. Currim
Over the past two decades, marketing scientists in academia and industry have employed consumer choice models calibrated using supermarket scanner data to assess the impact of price and promotion on consumer choice, and they continue to do so today in order to guide managerial decisions regarding price and promotion strategies.
In its raw form, scanner panel data for a product category often contains information on the purchases of hundreds of Stock Keeping Units (SKUs), representing many brands, sizes, product forms, and formulations, by thousands of consumers. Typically, some of these brands, sizes, product forms, and formulations are judged to be less significant in terms of market share and influence on consumer purchase behavior and sometimes are eliminated from the dataset to improve parameter estimates and reduce computing time. In the marketing literature, there is no standard practice as to how these brands, sizes, etc. should be removed from the dataset.
Likewise, raw scanner panel data may contain purchases from some panelists who do not make enough purchases over a two-year period to provide insight into consideration set composition and loyalty and variety seeking behaviors, and so these panelists are sometimes eliminated from the dataset. On the other hand, such exclusion may produce bias in estimated parameters since heavier users are more price sensitive and have more sharply defined preferences for national brands than lighter users (Kim & Rossi, 1994). Again, there is no standard practice as to how purchases should be removed from the dataset. Some studies sample households, including the entire purchase history of each selected household purchasing only from the selected brands, while others sample purchases of the selected brands and omit purchases of other brands, possibly resulting in incomplete household purchase histories (see Gupta, Chintagunta, Kaul, & Wittink, 1996). Eliminating choice alternatives and/or households from the data so that it is more amenable to statistical analysis is called data pruning (Zanutto & Bradlow, 2003).
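The two sampling schemes described above can be contrasted on a toy dataset. The following is an illustrative sketch only; the household IDs, brands, and purchase records are hypothetical, not drawn from the fabric softener data discussed in the paper.

```python
# Toy purchase records: (household_id, brand). Hypothetical data for illustration.
purchases = [
    ("H1", "A"), ("H1", "B"), ("H1", "A"),
    ("H2", "B"), ("H2", "C"),
    ("H3", "A"), ("H3", "C"), ("H3", "C"),
]
selected_brands = {"A", "B"}  # brands judged significant enough to retain

# Scheme 1: sample households -- keep the entire purchase history of each
# household that purchases ONLY the selected brands. Histories stay complete,
# but any household that ever buys an excluded brand is dropped entirely.
households = {h for h, _ in purchases}
loyal = {h for h in households
         if all(b in selected_brands for hh, b in purchases if hh == h)}
by_household = [(h, b) for h, b in purchases if h in loyal]

# Scheme 2: sample purchases -- keep every purchase of a selected brand and
# omit the rest, which can leave incomplete household purchase histories.
by_purchase = [(h, b) for h, b in purchases if b in selected_brands]

print(by_household)  # only H1's history survives, intact
print(by_purchase)   # H2 and H3 remain, but with incomplete histories
```

Under the first scheme only H1 survives (with its full history); under the second, H2 and H3 are retained but their brand-C purchases vanish, so their histories are incomplete, as Gupta et al. (1996) note.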
In the following sections, we discuss previous research and the rationale for the current study, describe the design of the simulation experiment, present the results of the experiment, and discuss implications of the analyses for model builders and managers.
1. Background and rationale
The only known empirical evidence on the issue of data preparation strategies is a study on data pruning decisions by Zanutto and Bradlow (2003). Using fabric softener data, they demonstrate that different decision rules for brand or SKU selection lead to significantly biased parameter estimates compared to the estimates obtained when the model is fitted to the entire dataset. They show that unbiased estimates can be obtained by using "ignorable" selection mechanisms, such as selecting a simple random sample of brands. Our study builds on the Zanutto and Bradlow study in several ways. First, we examine the entity aggregation decision in conjunction with data pruning decisions. To our knowledge, no study has examined the impact of the entity aggregation decision on parameter estimates, despite the fact that models continue to be estimated at the brand, brand-size, and SKU levels in academia and industry. In practice, most studies use data pruning and entity aggregation jointly to prepare scanner data for analysis, so it makes sense to examine both kinds of data preparation strategies together.
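The distinction between an ignorable and a non-ignorable brand-selection mechanism can be sketched as follows. This is an illustrative example, not the authors' procedure: the brand names and market shares are hypothetical, and the point is only that one rule conditions inclusion on an outcome-related quantity (share) while the other does not.

```python
import random

# Hypothetical brands and market shares, for illustration only.
brands = ["A", "B", "C", "D", "E", "F"]
shares = {"A": 0.30, "B": 0.25, "C": 0.20, "D": 0.12, "E": 0.08, "F": 0.05}

# Non-ignorable selection: keep the largest-share brands. Because inclusion
# depends on market share, an outcome of the choice process being modeled,
# this rule can bias parameter estimates.
top_brands = sorted(brands, key=lambda b: shares[b], reverse=True)[:3]

# Ignorable selection (in the spirit of Zanutto and Bradlow's recommendation):
# draw a simple random sample of brands, so inclusion is independent of
# market outcomes.
random.seed(0)  # fixed seed so the sketch is reproducible
random_brands = random.sample(brands, 3)

print(top_brands)  # always ['A', 'B', 'C']
print(random_brands)
```

The deterministic top-share rule always retains the same high-share brands, whereas the random sample gives every brand an equal chance of inclusion regardless of its share.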
Second, using simulation methods, we manipulate characteristics of the data that potentially affect the consequences of data preparation decisions, including (i) whether the marketing mix varies across product forms and (ii) whether there is cross-sectional heterogeneity in consumer preferences and...