
    Bayesic Instinct

    Toward the end of October, HuffPost's lead pollster started releasing polls claiming a greater than 90% probability of a Clinton win, explicitly challenging Nate Silver of 538 over his "conservatism" or even accusing him of manipulating the data. One commenter noted, "we'll know after Nov 8". It was all too funny and surreal, like a guy insisting he knows all about carpentry while gripping the hammer by the head and nailing with the handle.

    No, you can't "know" anything from a single outcome, unless you predicted with 100% certainty that it wouldn't happen - in which case your hypothesis is refuted. Otherwise, you're simply left with false confidence in a single data point - unless you bothered to research how that outcome came about.
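A quick way to see how little one outcome proves is to compare the likelihood each forecast assigned to what actually happened. A minimal sketch in Python, using the rough probabilities mentioned in this piece (HuffPost above 90% Clinton, Silver around 70%; the exact figures are illustrative):

```python
# How much does a single observed outcome tell us about two competing
# forecasts? A quick Bayes-factor check with illustrative numbers.
p_trump_huffpost = 0.10   # HuffPost's implied P(Trump win)
p_trump_silver = 0.30     # 538's implied P(Trump win)

# Likelihood ratio after observing the single outcome "Trump won":
bayes_factor = p_trump_silver / p_trump_huffpost
print(round(bayes_factor, 2))  # 3.0 - modest evidence, far from proof

# Starting with no preference between the two models (50/50),
# one outcome only moves us to about 3:1 in Silver's favor:
posterior_silver = bayes_factor / (bayes_factor + 1)
print(round(posterior_silver, 2))  # 0.75
```

One data point shifts the odds between the two forecasters, but nowhere near enough to "know" which method was sound.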

    As background, I'm pretty awful at probability and statistics - I have the basics of dice permutations down, can follow the math of certain cross-correlations in dependent events, and have done enough damage trying to model stochastic processes. But mostly I done forgot.

    But even if I hadn't, it might not matter. Just as the field of linguistics is going through a phase of rough and tumble re-evaluation after 30-40 years of certainty centered around Chomsky, probability and data analysis is getting an upgrade - perhaps not changing the science, but more how people use it as an art.

    In trying to make some sense of this awful year and a half, and to draw some usable lessons from it (rather than another set of kneejerk platitudes and I-told-you-sos), I'm digging into both psychology and analytics in the new year to get some different insights - angles I wouldn't have thought of before.

    Kahneman catalogs a number of human peculiarities that distort how we predict things. If we pose the same question rephrased as either winning X money or losing Y money, at the same odds, people will push much harder to avoid the scenario of losing money.

    The scenario with Huffpost's approach highlights a basic problem with both how people use probability and with how Democrats approach winner-take-all situations.

    We're comfortable with predictions of 70% or above, maybe even less, because then we feel pretty assured "we're going to win". 70% means roughly 2-to-1 odds - about a 1-in-3 chance of losing - roughly what Silver was showing in the final days. But to most of us, 70% means "short of a lightning strike, we've got it". And we're wrong.

    Then we ignore how predictions are made. If 19 out of 20 news outlets say "Candidate X is going to win", we're pretty confident in this 95% certainty. Except those 20 outlets are quite likely talking to each other or getting their confirmation from similar sources - in short, they're not independent - they're interdependent and tangled up in blue, to quote Dylan. If one fails, the other 18 fall too. They weren't certain at all.
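The gap between independent and correlated sources is easy to simulate. A rough sketch (the 80% per-outlet accuracy and the all-or-nothing shared error are illustrative assumptions, not real polling parameters):

```python
import random

random.seed(0)

def simulate(n_trials=50_000, n_outlets=20, shared=True):
    """Estimate how often at least 19 of 20 outlets call a race wrong.
    Each outlet is 'right' 80% of the time in isolation, but when
    `shared` is True, all outlets inherit one common polling model's
    call, so their errors are perfectly correlated."""
    mass_failures = 0
    for _ in range(n_trials):
        if shared:
            # One shared model: wrong 20% of the time, and every
            # outlet repeats the same call.
            model_wrong = random.random() < 0.20
            wrong = n_outlets if model_wrong else 0
        else:
            # Each outlet errs independently.
            wrong = sum(random.random() < 0.20 for _ in range(n_outlets))
        if wrong >= 19:
            mass_failures += 1
    return mass_failures / n_trials

print(simulate(shared=False))  # ~0.0 - independent errors almost never align
print(simulate(shared=True))   # ~0.2 - correlated sources fail together
```

Twenty truly independent 80%-accurate sources would essentially never be nearly all wrong at once; twenty outlets leaning on the same flawed source fail together one time in five.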

    We confuse precision with certainty as well. Someone might do a good job of sampling potential voters, but then the assumptions of which voters will show up, what issues affect them and which *motivate* them, how partisan they are, what can influence their vote, how they interact and reinforce each other, and other issues come into play. Context plays a huge role.

    We confuse the basic questions we're evaluating and then assign the wrong probability to them. Nate Silver's 2012 "The Signal and the Noise" discusses such basics as weather prediction - how local weather reports 3 days out are often *worse* than flipping a coin or going by an almanac, while the National Weather Service keeps getting better and better. Our politics are worse still: he shows how pundits on TV average *worse* than 50% right - worse than a coin flip - in large part because their role is to be interesting, not accurate. Similarly, local weather stations favor entertainment and fudged predictions over accuracy (cruelly upping the forecast of rain, since people feel worse surprised by a downpour than by sun).

    In the primaries, one candidate famously tried to prove he'd be more electable in November based on what polls were saying - the same kind of polls that failed twice in Michigan and that went badly wrong not just at HuffPost but at most every other source. Even the Russian hackers were expecting a Hillary win.

    Yet we don't even understand our own odds of winning. Jimmy Carter came within 10 points of Reagan yet got trounced, with Reagan taking over 90% of the electoral votes. Bush Sr. won by only 8 points yet took nearly 80% of the electoral votes. Clinton in '92 got 2/5 of the popular vote but well over 60% of the electoral vote; four years later he got about half the vote but roughly the same share of electors. In short, people who think they're estimating outcomes aren't even measuring the same thing.

    Much more important is Silver's point that we need to state what we expect to happen as part of calibrating our bias on outcomes - there is no level, measured playing field of possible events to draw from. In fact he makes the case that Fisher, who pioneered the frequentist school built on equal-probability, independent, neutral trials, got it largely wrong - aside from something truly random, like presumably unloaded dice on a level surface not spun by a pro, most events have a ton of bias already baked in, especially if they're not physical events but socially shaped ones. Even our assessments of hurricanes and, say, the need to evacuate rest on a lot of factors that aren't scientific givens.

    The neat part, as Silver notes, is that you don't have to make your prediction unbiased - you just need to try to estimate up front the amount that you *are* biased, and then Bayes' equation largely takes care of itself.
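Stating your bias up front is just Bayes' rule with an explicit prior. A minimal sketch with made-up numbers (the 60/80/40 figures are purely illustrative):

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(hypothesis | evidence) via Bayes' rule."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# Hypothetical numbers: we admit an up-front bias that our candidate
# has a 60% chance of winning (our stated prior). A new poll shows
# her ahead; suppose such a poll appears 80% of the time when she's
# really winning and 40% of the time when she's losing.
posterior = bayes_update(prior=0.60, likelihood_if_true=0.80,
                         likelihood_if_false=0.40)
print(round(posterior, 3))  # 0.75
```

The prior is where the admitted bias lives; as long as it's written down, each new piece of evidence adjusts it mechanically rather than by gut feel.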

    *Except* that probabilistic methods require tuning, lessons learned, and feedback fed back in. One reason the National Weather Service is doing much better is that it combines much better analytics and computers with much better human observation and analysis, mapped back into the system - a hybrid method that improves on either expert or machine alone.

    People think that you can just grab more data and crunch it all down - an idea still in vogue but cautioned against as far back as Laplace and confirmed more recently by the discontinuities of chaos theory (not to mention Big Data's largely useless efforts over the last 10 years to provide some, any, valuable info for all the expense). While still heavily present in education, there's a push to get away from the crunch-heavy data orgy and reintroduce more up-front analysis and back-end tweaking to prevent garbage-in/garbage-out results.

    The second core matter is that predictions without testing and subsequent improvements are largely hot air. In the end, someone will be right, whether their methods were sound or not. They will then claim success, rather than blind luck.

    Where we start to turn Bayesian is when we break our predictions down into probabilistic regions based on assumptions and likelihoods of sub-events, whether known in detail or surmised. We treat prediction not as multiple universes where every possibility happens, but as a real spectrum of possible outcomes where each one has a *real*, *seriously possible* chance of happening, whether large or small - and then aggregate these into an identified collection of equally *respected* outcomes, even though their probabilities differ.

    Those outcomes don't have to be different events - they can be outcomes based on Influence #1 alongside an outcome caused by Influence #2, with the same winner.

    We hear about chaos theory in climate science, where a tiny fluctuation somewhere creates a runaway feedback loop that blows up somewhere completely different. The same can happen in politics, technology, etc. We see what happens when unlikely candidates like Obama or Sanders catch fire, a swell of me-too fandom and enthusiasm follows, and the linear predictions get tossed out the window. Similar things happen with a cavalcade of talking heads on Sunday or a million tweets following a debate - some with exponential effect, some with limited effect. Yet we approach probability as if it were additive and linear.

    One of the reasons the primaries differed from the general election is that the former were sequential, so each had a chance to influence the next - a candidate does poorly in state X, then pours more effort into state Y the following week. The general is betting every table at once across 50 states, where only 8 or 10 tables actually even matter. You can test and tweak your theories across the primaries - you don't get a second chance after that first Tuesday in November.

    One could conclude that all of this jabber is Monday-morning quarterbacking. Except that we do have other games coming up - elections and political maneuvering - plus other seasons, and other strategies to try.

    Silver notes the talking heads are usually hedgehogs - one big idea - while the guys and gals who aren't so exciting are the foxes, making a dozen smaller hunches with smaller certainty, but aggregatable into a big picture. If you make 3 big predictions a year, it's doubtful you'll have a better show for it than a baseball batting average. If you make 50 predictions a year, it's much easier to calibrate and improve.
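One standard way to score a track record of many probabilistic predictions is the Brier score - the mean squared error of the stated probabilities against the 0/1 outcomes. The two track records below are invented purely for illustration of the hedgehog/fox contrast:

```python
def brier_score(forecasts):
    """Mean squared error between predicted probabilities and outcomes.
    0 is perfect; 0.25 is what always saying 50/50 earns on coin flips."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical records as (predicted probability, actual outcome) pairs.
hedgehog = [(0.95, 1), (0.95, 0), (0.95, 0)]           # 3 big confident calls
fox = [(0.70, 1), (0.60, 1), (0.55, 0), (0.65, 1),
       (0.60, 0), (0.70, 1), (0.55, 1), (0.60, 0)]     # many modest hunches

print(round(brier_score(hedgehog), 3))  # overconfidence is punished hard
print(round(brier_score(fox), 3))       # modest, frequent bets score better
```

With 50 scored predictions a year instead of 3, the score becomes a usable feedback signal rather than an anecdote.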

    Additionally, if you break those individual predictions into several scenarios and gather feedback after the fact, you tune your actual understanding of the processes and events, for a much richer approach to prediction. It's a portfolio approach, but it's also a much deeper immersion in the learning curve. Larry Bird would shoot 300 shots a practice, preferring a court to himself - 100,000 practice shots a year.

    If we expect to start using real prediction and probability in politics, we need a sound approach not just for 1 contest every 4 years, but for small contests, big contests, northern, southern, western, northeastern, in different minority and majority groups, under different conditions and expectations. Silver noted a while ago that the last 5 or 6 elections might be a streak of similars that could then shift to a different pattern, leaving "experts" flat-footed. Indeed, that may have happened this past year.

    But my feeling is we have to go further - that those of us who are largely members of a party, somewhat aligned around similar goals and policy positions, need to steadily grow a probabilistic approach to prediction, expectation, and understanding of the facts - getting in tune with our biases and real outcomes. That would also give us a much more mature attitude towards candidates, policies and platforms, races, and what's possible in various locations.

    Hillary won a precinct in Texas where no Democrat ran for Congress. There are a lot of "lost cause" races that got put back into play due to scandal, voter sentiment, or maybe a tsunami of activity in a neighboring state. (The Arab Spring was similar in its domino effect, for as long as it lasted, before the guys in control regrouped.)

    I don't expect that everyone will become a data scientist, but it is possible to become less whipsawed by the latest news blast, to be less gullible about how accurate a report is that says X, to understand the other 4 likely causes, scenarios, and probabilities behind a news or political event described as "all but certain" to have been caused by Y.

    There's been a lot of ridicule of Mark Penn, praise for David Axelrod, and a few other pollsters who've made some ripples. But analytics isn't just about a superman whiz kid mapping out the landscape - it's much better served by a fleet of data-savvy practitioners - ones who know how to tie the odds in with practical everyday situations, such as whether to send folks out door-to-door and where, vs an online marketing push, how to look for clues as to what's causing a ruckus or a defection, how to chase the more likely events and let the unlikely ones float away, divvying up resources and time accordingly.

    Silver follows a gambler who's pushing a 57% chance of success as a good bet. We need to stop thinking in terms of 100% scenarios, and more on a series of 55%-60% bets, and make sure in the long-term those are actually our outcomes, our real odds, and that we're not just pulling numbers out of thin air.

    And that goes for policy and assessing the landscape for different issues. If we're tackling an issue like global warming or influence of banks or police brutality, we need to look at the chance of success for individual actions, not wait for a perfect scenario, see what can be repeated and easily won over time - more like a good baseball singles and walks team with an occasional homer than guys who think in terms of out-of-the-park for every at-bat. 

    And we also need to think more realistically about negative outcomes. Right now we have some people saying Trump won't be so bad, others saying it'll be horrible. What we don't have so much of is people saying there's a 20% chance of X happening, 30% of Y, 15% of S and 35% of T, and calculating the a priori bias in that estimation to figure out 4 very possible outcomes over the next 4 years.

    It's even more useful to extend it to the real probabilities of having a bad car wreck (plus what to do if), the real likelihoods of catastrophic illness scenarios of some sort, the real chance of a catastrophic weather event in your neck of the woods, et al., and even plan some contingent reaction for those cases, rather than living on gut feelings with not much response.

    I'm watching the post-game roundup on Obama, and sadly what's missing is any kind of quantitative analysis of the real possibilities (odds) when he entered office, the available strategies at the time with their perceived chances of success, and an estimation of how closely he met those bets.

    Similarly with November - it's very easy to say "damaged" or "we were hacked", but what were the more likely odds of both Sanders and Clinton under different scenarios and likely Republican responses? What was the probabilistic effect of different scandals, controversies, and successes, along with the probabilities of the *alternatives*? How close did we come to a likely maximum of 55% of the vote (though still possibly above)? And what probabilistic portfolio of actions could have made a difference last time and might make a difference next time?

    The Signal and The Noise differentiates between "prediction" - an old Latin-based word tied to oracles and soothsayers - and "forecast" - a Germanic-rooted word from the Renaissance leading into the industrial age, signifying control and self-determination. Large amounts of bad data and fake news will continue to plague us in larger and larger quantities, but if we learn to focus instinctively on the quality, usable information and evolving patterns, and keep from getting caught up in the fool's-gold rush, we stand a smallish chance of carrying the day. And I'm much more comfortable with the odds and logic of small numbers than with largish claims.



    Post presser, I'm in a pretty bad mood, so I'm not sure I followed your argument as a whole. But here's what occurred to me about Silver. Toward the very end, I believe he was giving Hillary a 60% chance of winning. So when she didn't win, people said, "So much for Nate the great."

    But if he gave Hillary a 60% chance of winning, he gave Trump a 40% chance of winning - which isn't nothing. And the election just barely (based on the 70,000-vote margin in those key states) fell onto the 40% side. IOW, 60/40 does NOT mean that the 40% outcome won't happen. It just means there's a 40% chance of it happening - less chance than the 60% outcome, that's all.

    Exactly. My poorly placed optimism at the end was that the polls were shifting towards Hillary the last 3 or 4 days, presumably showing she was countering the Comey and Wikileaks bit well, and maybe with another 3 or 4 days... But the magnitude of uncertainty was still there, including the cross-correlation between Midwest states, along with others. If we thought she was safe in Wisconsin and she wasn't, the similar assumptions and like-minded polls may have missed the same sentiments or voters in other Midwest states. The errors reinforce themselves, and when they collapse or prove flawed, it can be across the board. Silver tried to dial in that uncertainty, and flagged too many unknown unknowns to be confident in the prediction. But we demand confidence - a single number, not a range of more and less likely outcomes. We're victims of our own wishful thinking, here and in everything we do in life - it's our nature, but it doesn't have to be (as much).

    People are saying "Trump likely won't be so bad" or "Trump will be worse". We need to dial in real probabilities to the level of how bad he may be under different scenarios, using our prior bias, and then calibrate those predictions by actual evidence for and against moving forward. That's one way how not to get burned (as bad).

    The Buzzfeed release is interesting - we can similarly assign a probability that it's true, mathematically acknowledging our heavy bias towards believing so, and gauge our predictions as more evidence comes out. That the FBI and news agencies sit on stuff like this and not on other material - and that still other material is unknown to us (and them) - gives an idea how incomplete our data set and truth models are. In a Bayesian world, Buzzfeed would say what likelihood it a priori assigns the document and how likely it believes the different incidents described to be - similar to Wikileaks' little quiz on which disease Hillary might be suffering from, though they didn't assign their own probability - they left it up to the reader and didn't include the possible answer, "none".

    Anyway, kudos to Buzzfeed, whether it complicates our world or not - we already know news orgs have their fingers on the scale, and their "confirmation" is less of a gold standard than believed. Often it really does come down to 1 person, just like this - and, like the "Curveball" informant in Iraq, often much dodgier. What were his odds of being right?

    Most of what you wrote is somewhat over my head, but I get the gist. I also found OGD's reference below very helpful. I was hopeful on November 8th until I went to vote. I had never had to wait to vote before, but that day the line was very long. I had a sinking feeling as I realized that most polls are taken of "likely voters," and so all the people who were "inspired" by Trump but usually did not vote were not counted.

    I went home feeling very uncertain.  That was my gut reaction, and though late in the game, I don't think it was wrong.  What do you think about polling practices as a factor in getting things wrong?

    Well, polling practices greatly affect the validity of a poll. The current practice of mashing up a bunch of polls and hoping it leads to some validity is certainly a problem - if everyone uses the same flawed method or assumption, the errors simply appear across all of them. The guy at HuffPost thought not massaging the data was a PLUS, while Nate Silver continuously stresses the importance of human oversight and tweaking based on intelligent evaluation (while trying to acknowledge where personal bias might enter).

    Peracles... Sam Wang at PEC...

    There are some great take-aways in this... And Sam does interact in his comment section.

    What data got right in 2016 – and what’s ahead for PEC

    December 20, 2016


    Usually, PEC would close down after the election for two years. But this year I’ve heard from many of you about your continued appetite for data-based analysis. More than ever, data is necessary to understand public life. Here are some examples of what we learned this year:

    That is just the analysis done here – there was also much excellent work done at FiveThirtyEight and The Upshot.


    The estimate of uncertainty was the major difference between PEC, FiveThirtyEight, and others. Drew Linzer has explained very nicely how a win probability can vary quite a bit, even when the percentage margin is exactly the same (to see this point as a graph, see the diagram). At the Princeton Election Consortium, I estimated the Election-Eve correlated error as being less than a percentage point. At FiveThirtyEight, their uncertainty corresponded to about four percentage points. But we both had very similar Clinton-Trump margins – as did all aggregators.

    For this reason, it seems better to get away from probabilities. When pre-election state polls show a race that is within two percentage points, that point is obscured by talk of probabilities. Saying “a lead of two percentage points, plus or minus two percentage points” immediately captures the uncertainty.

    Even a hedged estimate like FiveThirtyEight’s has problems, because it is ingrained in people to read percentage points as being in units of votes. Silver, Enten, and others have taken an undeserved shellacking from people who don’t understand that a ~70% probability is not certain at all. Next time around, I won’t focus on probabilities – instead I will focus on estimated margins – as well as an assessment of which states are the best places for individuals to make efforts. This won’t be as appealing to horserace-oriented readers, but it will be better for those of you who are actively engaged.

    Read the entire piece here-->
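Drew Linzer's point in the excerpt - that the same polling margin can yield very different win probabilities depending on the assumed error - is easy to see with a simple normal-error model. A sketch, with the error sizes chosen only to echo the PEC-versus-FiveThirtyEight contrast described above (the specific numbers are illustrative):

```python
from math import erf, sqrt

def win_probability(margin, sigma):
    """P(true margin > 0), assuming the polled margin (in percentage
    points) carries a normally distributed error of size sigma.
    Plain normal CDF via math.erf - no external libraries needed."""
    return 0.5 * (1 + erf(margin / (sigma * sqrt(2))))

# The same 2-point Clinton lead under different assumed correlated error:
print(round(win_probability(2.0, 1.0), 2))  # ~0.98 - tight error (PEC-like)
print(round(win_probability(2.0, 4.0), 2))  # ~0.69 - wide error (538-like)
```

Identical margins, wildly different "certainty" - which is exactly why Wang suggests reporting the margin plus its error band instead of a single headline probability.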



    Wow, good followup, thanks.
