MaxDiff
The MaxDiff model is a special case of the more general Best-Worst Scaling (BWS) technique, a discrete choice model first described by Jordan Louviere in 1987 while on the faculty at the University of Alberta. Louviere attributes the idea to the early PhD work of Anthony A. J. Marley, who together with Duncan Luce produced much of the ground-breaking research of the 1960s in mathematical psychology and psychophysics that axiomatised utility theory. The first working papers and publications appeared in the early 1990s. With BWS, survey respondents are shown a set of the possible items and are asked to indicate the best and worst items (or most and least important, most and least appealing, etc.). The definitive textbook describing the theory, methods and applications has now been published by Cambridge University Press, written by Jordan Louviere (University of South Australia), Terry N. Flynn (TF Choices Ltd) and Anthony A. J. Marley (University of Victoria and University of South Australia).[1] The book brings together the disparate research from various academic and practical disciplines so that findings can be replicated and mistakes in implementation avoided. The three authors have already published the key academic peer-reviewed articles describing BWS theory,[2][3][4] practice,[5][6] and a number of applications in health,[7] social care,[8] marketing, transport, voting,[9] and environmental economics.[10]
The book distinguishes two different purposes of BWS: as a method of data collection, and/or as a theory of how people make choices when confronted with three or more items. This distinction is crucial, given the continuing misuse of the term maxdiff to describe the method. As Marley and Louviere describe, maxdiff is a long-established academic mathematical theory with very specific assumptions about how people make choices:[11] it assumes that respondents evaluate all possible pairs of items within the displayed set and choose the pair that reflects the maximum difference in preference or importance. BWS may be thought of as a variation of the method of paired comparisons. Consider a set in which a respondent evaluates four items: A, B, C and D. If the respondent says that A is best and D is worst, these two responses determine five of the six possible implied paired comparisons:
- A > B, A > C, A > D, B > D, C > D
The only paired comparison that cannot be inferred is B vs. C. In a choice among five items, MaxDiff questioning informs on seven of ten implied paired comparisons.
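This counting of implied paired comparisons can be sketched in a few lines of Python (the helper name and item labels are illustrative, not from the literature):

```python
from itertools import combinations

def implied_pairs(items, best, worst):
    """Paired comparisons implied by one best/worst response.

    Choosing `best` implies best > every other item in the set;
    choosing `worst` implies every other item > worst. Returns a set
    of (winner, loser) pairs.
    """
    pairs = set()
    for x in items:
        if x != best:
            pairs.add((best, x))          # best beats x
        if x != worst and x != best:
            pairs.add((x, worst))         # x beats worst
    return pairs

items = ["A", "B", "C", "D"]
pairs = implied_pairs(items, best="A", worst="D")
total = len(list(combinations(items, 2)))  # 6 unordered pairs in all
# 5 of the 6 pairs are resolved; only B vs. C remains unknown.
```

With five items the same function resolves seven of the ten implied pairs, matching the count given above.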
Yet respondents can produce best-worst data in any of a number of ways. Instead of evaluating all possible pairs (the maxdiff model), they might choose the best from n items and the worst from the remaining n-1, or vice versa; or indeed they may use another method entirely. Thus maxdiff is a subset of BWS. Indeed, as the number of items increases, the number of possible pairs grows multiplicatively: n items produce n(n-1) pairs (where best-worst order matters). Assuming that respondents evaluate all possible pairs is a strong assumption, and in 14 years of presentations the three co-authors have virtually never found a course or conference participant who admitted to using this method to elicit their best and worst choices. Virtually all use sequential models (best then worst, or worst then best).[12] Early work did use the term maxdiff to refer to BWS, but with the recruitment of Marley to the team developing the method, correct academic terminology has been disseminated throughout Europe and Asia-Pacific (if not North America, which continues to use the maxdiff term). Indeed, it is far from clear that the major manufacturers of discrete choice modelling software actually implement maxdiff models in estimating the parameters of their models, despite their continued advertising of maxdiff capabilities.
The second use of BWS described in the book is as a method of data collection (rather than as a theory of how humans produce a best and a worst item). Particularly in the age of web-based surveys, BWS can be used to collect data in a systematic way that (1) forces all respondents to provide best and worst data in the same way (for instance, by asking best first, greying out the chosen option, then asking worst); and (2) enables collection of a full ranking, if repeated BWS questioning is implemented to collect the "inner rankings". In many contexts, BWS for data collection has been regarded merely as a way to obtain such data in order to facilitate data expansion (to estimate conditional logit models with far more choice sets) or to estimate conventional rank-ordered logit models.[13]
BWS questionnaires are relatively easy for most respondents to understand. Furthermore, humans are much better at judging items at extremes than in discriminating among items of middling importance or preference [citation needed]. And since the responses involve choices of items rather than expressing strength of preference, there is no opportunity for scale use bias.
Steve Cohen introduced BWS to the marketing research world in a paper presented at an ESOMAR conference in Barcelona in 2002 entitled "Renewing market segmentation: Some new tools to correct old problems".[14] This paper was nominated for Best Paper at that conference. In 2003, at the ESOMAR Latin America Conference in Punta del Este, Uruguay, Cohen and his co-author, Dr. Leopoldo Neira, compared BWS results to those obtained by rating scale methods. This paper won Best Methodological Paper at that conference. Later the same year it was selected as winner of the John and Mary Goodyear Award for Best Paper across all ESOMAR conferences in 2003, and it was subsequently published as the lead article in Excellence in International Research 2004, published by ESOMAR.[15] At the 2003 Sawtooth Software Conference, Cohen's paper "Maximum Difference Scaling: Improved Measures of Importance and Preference for Segmentation"[16] was selected as Best Presentation. Cohen and Bryan Orme of Sawtooth Software agreed that MaxDiff should be part of the Sawtooth package, and it was introduced later that year. In 2004, Cohen and Orme won the David K. Hardin Award from the American Marketing Association for their paper published in Marketing Research magazine entitled "What's your preference? Asking survey respondents about their preferences creates new scaling decisions".[17]
The re-naming of the method, to make clear that maxdiff scaling is BWS but BWS is not necessarily maxdiff, was decided by Louviere in consultation with his two key contributors (Flynn and Marley) in preparation for the book, and was presented in an article by Flynn.[18] That paper also made clear that there are, in fact, three types ("cases") of BWS: Case 1 (the "object case"), Case 2 (the "profile case") and Case 3 (the "multi-profile case"). These three cases differ largely in the complexity of the choice items on offer. Case 1 presents items that may be attitudinal statements, policy goals, marketing slogans or any type of item that has no attribute-and-level structure. It is primarily used to avoid the scale biases known to affect rating (Likert) scale data, particularly when Balanced Incomplete Block Designs are used. These designs force every item to compete with every other item the same number of times and ensure there can be no "ties" in importance/salience at the top or bottom of the scale.
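The balance property of a Balanced Incomplete Block Design can be checked directly. As a minimal sketch, the following uses a standard seven-item design in seven choice sets of three (the variable names are ours, not from the BWS literature):

```python
from collections import Counter
from itertools import combinations

# A Balanced Incomplete Block Design for 7 items in 7 choice sets of
# size 3: every item appears 3 times, and every pair of items appears
# together in exactly one set, so no item is favoured by the design.
blocks = [
    (1, 2, 3), (1, 4, 5), (1, 6, 7),
    (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6),
]

# How often each item appears across the choice sets.
appearances = Counter(i for b in blocks for i in b)

# How often each pair of items competes within the same choice set.
co_occurrence = Counter(p for b in blocks for p in combinations(sorted(b), 2))
```

Here `appearances` is uniformly 3 and `co_occurrence` is uniformly 1 over all 21 item pairs, which is exactly the "every item competes with every other the same number of times" property described above.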
Case 2 has predominated in health, where the items are the attribute levels describing a single profile of the type familiar to choice modellers. Instead of making choices between profiles, the respondent makes best and worst (most and least) choices within a profile. Thus, for the example of a mobile (cell) phone, the choices would be the most acceptable and least acceptable features of a given phone. Case 2 has proved powerful in eliciting preferences among vulnerable groups, such as the elderly,[19][20] older carers,[21] and children,[22] who find conventional multi-profile discrete choice experiments difficult. Indeed, the first comparison of Case 2 with a DCE found that whilst the vast majority of (older) respondents provided usable data from the BWS task, only around one half did so for the DCE.[23]
Case 3 is perhaps the most familiar to choice modellers, being merely an extension of a discrete choice model: the number of profiles must be three or more, and instead of simply choosing the one the respondent would purchase, (s)he chooses the best and worst profile.
The book contains an introductory chapter summarising the history of BWS and the three cases, together with why the researcher must consider whether (s)he wishes to use it to understand the theory (processes) of decision-making and/or merely to collect data in a systematic way. Three chapters, one for each case, follow, detailing the intuition behind and application of each. A chapter bringing together Marley's work proving the properties of the key estimators and laying out some open issues for further analysis then follows. Nine chapters (three per case, describing applications from a variety of disciplines) complete the book.
Process
The basic steps are:
- Conduct proper qualitative or other research to identify and describe all items of interest.[24]
- Construct a statistical design that indicates what items are to be presented in each set of items ("choice set") - designs may come from publicly available catalogues, be constructed by hand, or produced from commercially available software.
- Use the design to construct the choice sets, which contain the actual relevant items (textually or visually).
- Obtain response data in which respondents choose the best and worst item from each task; repeated best-worst questioning (to obtain second best, second worst, etc.) may be conducted if the analyst wishes for more data.
- Input the data into a statistical software program and analyse. The software will produce utility functions for each of the features. In addition to utility scores, raw counts may be produced, which simply sum the total number of times an item was selected as best and as worst. The utility functions indicate the perceived value of the product at an individual level and how sensitive consumer perceptions and preferences are to changes in product features.
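The raw-count analysis in the final step can be sketched in a few lines of Python, assuming hypothetical response data recorded as (choice set, best choice, worst choice); the item labels are illustrative:

```python
from collections import Counter

# Hypothetical best-worst responses: (choice_set, best_item, worst_item).
responses = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "E"), "A", "E"),
    (("B", "C", "D", "E"), "B", "E"),
]

best = Counter(r[1] for r in responses)    # times chosen as best
worst = Counter(r[2] for r in responses)   # times chosen as worst
items = {i for r in responses for i in r[0]}

# The simple "B minus W" count score for each item: higher scores
# indicate items chosen as best more often than as worst.
scores = {i: best[i] - worst[i] for i in items}
```

Count scores like these are the simplest summary of the data; the utility functions mentioned above come from the model-based analysis described in the Analysis section.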
Why use best-worst scaling?
BWS is an antidote to standard rating or importance scales. Respondents find rating scales very easy, but such scales tend to deliver results indicating that everything is "quite important", making the data not especially actionable.[citation needed] BWS, on the other hand, forces respondents to make choices between options, while still delivering rankings showing the relative importance of the items being rated. It also produces:
- Distributions of "the scores" (calculated as the best frequency minus the worst frequency) for all items which allow the researcher to observe the empirical distribution of estimated utilities. This produces information on how realistic the results from traditional analysis methods assuming standard continuous distributions are likely to be. Consumers tend to form distinct groups with often very different preferences, giving rise to multi-modal distributions.
- Data that allow investigation of the decision rule (functional form of the utility function) at various ranking depths (most simply, the "best decision rule vs the worst decision rule"). Emerging research is suggesting that in some contexts respondents do not use the same rule, which calls into question the use of estimation methods such as the rank ordered logit model.
- Estimation of attribute impact, a measure of the overall impact of an attribute upon choices that is not available from conventional discrete choice models.
- More data, that allow greater insights into choices, for a given number of choice sets. The same information could be obtained by simply presenting more choice sets but this runs the risk that respondents become bored and disengage with the task.
- Quantifying the phenomena of response shift and adaptation to poor health states.[25]
Analysis
Estimation of the utility function is performed using any of a variety of methods.
- multinomial discrete choice analysis, in particular multinomial logit (strictly speaking the conditional logit, although the two terms are now used interchangeably). The multinomial logit (MNL) model is often the first stage in analysis and provides a measure of average utility for the attribute levels or objects (depending on the Case).
- In many cases, particularly Cases 1 and 2, simple observation and plotting of choice frequencies should actually be the first step, as it is very useful in identifying preference heterogeneity and respondents using decision rules based on a single attribute.
- Several algorithms can be used in this estimation process, including maximum likelihood, neural networks, and hierarchical Bayes models. Hierarchical Bayes models are beneficial because they allow borrowing of strength across the data, although since BWS often allows the estimation of individual-level models, the benefits of Bayesian models are heavily attenuated. Response time models have recently been shown to replicate the utility estimates of BWS, which represents a major step forward in the validation of stated preferences generally, and BWS preferences specifically.[26][27]
External sources
- Almquist, Eric; Lee, Jason (April 2009), What Do Customers Really Want?, Harvard Business Review, retrieved 15 February 2010
- Cohen, Steve and Paul Markowitz (2002), “Renewing Market Segmentation: Some New Tools to Correct Old Problems,” ESOMAR 2002 Congress Proceedings, 595-612, ESOMAR: Amsterdam, The Netherlands.
- Cohen, Steven H. (April 2003). "Maximum Difference Scaling: Improved Measures of Importance and Preference for Segmentation". Proceedings of the Sawtooth Software Conference. San Antonio, TX. pp. 61–74.
- Louviere, J. J. (1991), “Best-Worst Scaling: A Model for the Largest Difference Judgments,” Working Paper, University of Alberta.
- Louviere, J.J.; Flynn, T.N.; Marley, A.A.J., “Best-Worst Scaling: Theory, Methods and Applications”, Cambridge University Press, Cambridge (September 2015)
- Thurstone, L. L. (1927), “A Law of Comparative Judgment,” Psychological Review, 34, 273-286.
References
- ^ "Best-Worst Scaling". Cambridge University Press. Retrieved 30 September 2015.
- ^ Marley, Anthony AJ; Louviere, Jordan J. (1 January 2005). "Some probabilistic models of best, worst, and best–worst choices". Journal of Mathematical Psychology. 49 (6): 464–480.
- ^ Marley, A. A. J.; Flynn, Terry N.; Louviere, J. J. (1 January 2008). "Probabilistic models of set-dependent and attribute-level best–worst choice". Journal of Mathematical Psychology. 52 (5): 281–296.
- ^ Marley, A. A. J.; Pihlens, D. (1 January 2012). "Models of best–worst choice and ranking among multiattribute options (profiles)". Journal of Mathematical Psychology. 56 (1): 24–34.
- ^ Flynn, Terry N.; Louviere, Jordan J.; Peters, Tim J.; Coast, Joanna (1 January 2007). "Best-worst scaling: What it can do for health care research and how to do it". Journal of Health Economics. 26 (1): 171–189.
- ^ Louviere, Jordan; Lings, Ian; Islam, Towhidul; Gudergan, Siegfried; Flynn, Terry (1 January 2013). "An introduction to the application of (case 1) best–worst scaling in marketing research". International Journal of Research in Marketing. 30 (3): 292–303.
- ^ Flynn, Terry N.; Louviere, Jordan J.; Peters, Tim J.; Coast, Joanna (1 January 2007). "Best-worst scaling: What it can do for health care research and how to do it". Journal of Health Economics. 26 (1): 171–189.
- ^ Potoglou, Dimitris; Burge, Peter; Flynn, Terry; Netten, Ann; Malley, Juliette; Forder, Julien; Brazier, John E. (1 January 2011). "Best–worst scaling vs. discrete choice experiments: An empirical comparison using social care data". Social science & medicine. 72 (10): 1717–1727.
- ^ García-Lapresta, José Luis; Marley, Anthony AJ; Martínez-Panero, Miguel (1 January 2010). "Characterizing best–worst voting systems in the scoring context". Social Choice and Welfare. 34 (3): 487–496.
- ^ Scarpa, Riccardo; Notaro, Sandra; Louviere, Jordan; Raffaelli, Roberta (19 June 2011). "Exploring Scale Effects of Best/Worst Rank Ordered Choice Data to Estimate Benefits of Tourism in Alpine Grazing Commons". American Journal of Agricultural Economics: aaq174. doi:10.1093/ajae/aaq174. ISSN 0002-9092.
- ^ Marley, Anthony AJ; Louviere, Jordan J. (1 January 2005). "Some probabilistic models of best, worst, and best–worst choices". Journal of Mathematical Psychology. 49 (6): 464–480.
- ^ Flynn, Terry; Louviere, Jordan; Peters, Tim; Coast, Joanna (1 January 2008). "Estimating preferences for a dermatology consultation using Best-Worst Scaling: Comparison of various methods of analysis". BMC medical research methodology. 8 (1): 76.
- ^ Louviere, Jordan J.; Street, Deborah; Burgess, Leonie; Wasi, Nada; Islam, Towhidul; Marley, Anthony AJ (1 January 2008). "Modeling the choices of individual decision-makers by combining efficient choice experiment designs with extra preference information". Journal of choice modelling. 1 (1): 128–164.
- ^ Cohen, Steven H. and Paul Markowitz (2002) “Renewing market segmentation: Some new tools to correct old problems.” ESOMAR 2002 Congress Proceedings, 595-612, ESOMAR: Amsterdam, The Netherlands.
- ^ Cohen, Steven H. and Leopoldo Neira (2003). “Measuring preferences for product benefits across countries: Overcoming scale usage bias with maximum difference scaling.” Paper presented at the Latin American Conference of the European Society for Opinion and Marketing Research, Punta del Este, Uruguay. Reprinted in Excellence in International Research: 2004. ESOMAR, Amsterdam, Netherlands, 1-22.
- ^ Maximum Difference Scaling: Improved Measures of Importance and Preference for Segmentation Steven H. Cohen, SHC & Associates, Sawtooth Software RESEARCH PAPER SERIES, 2003
- ^ Steven H. Cohen and Bryan Orme, 2004, What's your preference? Asking survey respondents about their preferences creates new scaling decisions. Winner of the 2004 David K. Hardin Award from the American Marketing Association.
- ^ Flynn, Terry N. (1 January 2010). "Valuing citizen and patient preferences in health: recent developments in three types of best–worst scaling". Expert review of pharmacoeconomics & outcomes research. 10 (3): 259–267.
- ^ Flynn, Terry N.; Peters, Tim J.; Coast, Joanna (1 January 2013). "Quantifying response shift or adaptation effects in quality of life by synthesising best-worst scaling and discrete choice data". Journal of choice modelling. 6: 34–43.
- ^ Coast, Joanna; Flynn, Terry N.; Natarajan, Lucy; Sproston, Kerry; Lewis, Jane; Louviere, Jordan J.; Peters, Tim J. (1 January 2008). "Valuing the ICECAP capability index for older people". Social Science & Medicine. 67 (5): 874–882.
- ^ Al-Janabi, Hareth; Flynn, Terry N.; Coast, Joanna (1 January 2011). "Estimation of a preference-based Carer Experience Scale". Medical Decision Making. 31 (3): 458–468.
- ^ Ratcliffe, Julie; Flynn, Terry; Terlich, Frances; Stevens, Katherine; Brazier, John; Sawyer, Michael (1 January 2012). "Developing Adolescent-Specific Health State Values for Economic Evaluation". Pharmacoeconomics. 30 (8): 713–727.
- ^ Flynn, Terry; Peters, Tim; Coast, Joanna (1 January 2013). "Quantifying response shift or adaptation effects in quality of life by synthesising best-worst scaling and discrete choice data". Journal of Choice Modelling. 6: 34–43.
- ^ Coast, Joanna; Al-Janabi, Hareth; Sutton, Eileen J.; Horrocks, Susan A.; Vosper, A. Jane; Swancutt, Dawn R.; Flynn, Terry N. (1 January 2012). "Using qualitative methods for attribute development for discrete choice experiments: issues and recommendations". Health economics. 21 (6): 730–741.
- ^ Flynn, Terry N.; Peters, Tim J.; Coast, Joanna (1 January 2013). "Quantifying response shift or adaptation effects in quality of life by synthesising best-worst scaling and discrete choice data". Journal of choice modelling. 6: 34–43.
- ^ Hawkins, Guy E.; Marley, A. A. J.; Heathcote, Andrew; Flynn, Terry N.; Louviere, Jordan J.; Brown, Scott D. (1 January 2014). "Integrating cognitive process and descriptive models of attitudes and preferences". Cognitive science. 38 (4): 701–735.
- ^ Hawkins, Guy E.; Marley, A. A. J.; Heathcote, Andrew; Flynn, Terry N.; Louviere, Jordan J.; Brown, Scott D. (1 January 2014). "The best of times and the worst of times are interchangeable". Decision. 1 (3): 192.