Sunday, December 18, 2011

Judging in a wine competition



A few weeks ago, I was asked to be a judge in a state-level wine competition.  Of course I said yes, because it sounds like a lot of fun and could be great networking.  The downside is that it is genuinely hard to stay critical through a huge volume of wine.  It's just physically difficult.

Once, a friend who owns a wine shop called me with a rare opportunity.  A big-time wine critic was coming to his store to taste through a portfolio of Australian wine, and the shop owner was inviting his friends over to taste afterward.  There were over 200 full-throttle, high-alcohol Australian wines, and toward the end all I could taste was iron and copper.  Why?  Because the skin had been stripped off my gums and mouth and I was tasting blood!

I am told that the same problem happens at wine competitions.  There are just so many wines to taste that you get drunk even while spitting.  Your palate gets blown out, and you end up giving top marks to anything you can still actually taste at the end, usually really high-alcohol fruit bombs.  Well-constructed wines with finesse or medium body need not apply.

In my research on how to be a good judge, I've not been encouraged.  Robert Hodgson published two papers in the Journal of Wine Economics in 2009 that are instructive.  Both were written using what sounds like the coolest dataset ever: a set of tasting grades from a competition where researchers had slipped duplicate wines into the lists to see whether the judges would be consistent in their rankings.

The first paper is a methodological mess (see my blog post about why you cannot perform arithmetic -- adding, subtracting, multiplying, dividing -- on data that aren't really numbers).  The author took rankings, turned them into arbitrary numbers, and then performed odd math to see whether critics could taste the exact same wines and award the same grades.

Okay, let me take a second to say how statistically ridiculous this method is, and how amazed I am that it got past peer review.  Rankings are ordinal: they tell you order, not distance.  If someone asks people to rate their interests on a scale of one to five, and I rate cotton candy a 5 and liver a 1, there is no reason to think I'll rate liver-flavored candy a 3.  Averaging those labels makes no sense.

Despite the egregious abuse of math, it's pretty obvious from Hodgson's description of the data that the judges stink -- at a major competition, with judges drawn from the industry.  I can't say there is any statistical validity to his conclusion, but there is plenty of reason to be concerned.

The second paper makes me much happier as a statistician and much more nervous as a competition judge.  Here, Hodgson uses a test called Cohen's Kappa, which is really cool because it checks whether a judge scored the same wine the same way, but also gives partial credit if he or she got close.  By this measure, only about 30 percent of judges rated wine consistently.  Many more judged essentially at random.  Hrm...
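Purely as an illustration (the scores below are made up, not Hodgson's data), here is roughly what that consistency check looks like in Python: score the same flight on two blind passes, then compute a weighted kappa so that near misses count for something.

```python
# Hypothetical example: one judge scores the same ten wines on two blind
# passes, on a 1-4 medal scale (1 = no medal ... 4 = gold). Scores invented.
from sklearn.metrics import cohen_kappa_score

first_pass  = [4, 3, 2, 4, 1, 3, 2, 2, 4, 1]
second_pass = [4, 2, 2, 3, 1, 1, 3, 2, 4, 2]

# weights="linear" gives partial credit when the two scores are close,
# so a gold/silver disagreement hurts less than gold/no-medal.
kappa = cohen_kappa_score(first_pass, second_pass, weights="linear")
print(f"Weighted kappa: {kappa:.2f}")  # near 1.0 = consistent, near 0 = random
```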

So I'm worried about doing a poor job, but at least I'd be in good company.  When I think about the wineries submitting samples, however, I feel a little better.  For them, a competition is a no-lose proposition: they either get a medal to display in their tasting room, or they get nothing.  A poor rating carries no penalty; a good one brings praise.  So I'll just try to be conscientious, and remember the advice in the excellent Fermentation Blog:
  • Concentrate.
  • Be aware that my personal tastes may not reflect great wine (an oaky Chardonnay can be excellent even if it isn't to my taste).
  • Go back and review Wine Faults by John Hudelson.
  • Stay hydrated and keep spitting.
Wish me luck!




