Wednesday, July 8, 2015

Variance and Entropy

One of the most compelling qualities of Magic is its tremendous replay value. If you sit down two players with two sealed decks to play ten games, every one of those games will be different. Why? Because of the randomness that drives Magic.

However, the way we talk about randomness in games is often imprecise or even misleading. When we say that a card is "unpredictable", we usually conflate two similar but distinct concepts. I'd like to define and separate these ideas, so that we can say what we mean to say. (And maybe even design what we mean to design!)

Variance: the lack of predictability of a card's contribution towards winning the game.
Entropy: the lack of predictability of the player's experience of playing a card.

The former definition is one you've probably seen before in some form. It's how poker players talk about variance. Tom LaPille wrote about it in this excellent Latest Developments column.

But the latter definition is entirely different: it's talking about the variety of ways in which a card can be experienced, independent of that card's likelihood of winning the game for you.

Variance


The most obvious types of high-variance cards are those with explicit randomness: Enlisted Wurm, Friendly Fire, Haunted Fengraf, etc. However, the more common and less obtrusive way for a card to be high-variance is situational usefulness. Some cards are only good early (Carnophage), only good late (Boulderfall), only good alongside other cards (Boggart Sprite-Chaser), only good as answers to other cards (Eyes of the Wisent), only good if you’re winning (Ruthless Cullblade), only good if you’re losing (Gavony Ironwright), only good in very particular situations (Hidetsugu’s Second Rite), etc. Linear mechanics such as Constellation, Slivers, and Affinity also fall into this category, as do threshold mechanics. Whenever you have a card that’s potentially awesome or terrible depending on the game state, that’s a variance-creating card.

Conversely, low-variance cards tend to have more evenly spread out utility. They may be the kind of spell like Compulsive Research that you’ll be happy to cast in almost any circumstances. Another way to reduce variance is to give a card modes. This isn’t solely about Charms and Commands, though; many cards have non-explicit modality, like Searing Spear (removal or dome) and Unsummon (tempo play or removal dodge). Also, there are oodles of keyword mechanics that create modality and thereby reduce variance: Cycling, Kicker, Multikicker, Entwine, Fuse, Replicate, Overload, Bestow, Evoke, Reinforce, Retrace, Suspend, and Transmute, to name a few.

Some cards reduce variance not by virtue of their own power, but because they allow you to make a choice between situationally useful cards. Scry and its many non-keyworded cousins fall into this category. Lastly, any card that lessens the problem of drawing too many or too few lands is also a variance-reducer, because it mitigates the randomness of screw and flood. This includes Landcycling and Scry on the low end, and Landfall and Spellshapers on the high end.

Which direction do we turn these knobs in? Too little variance, and the game turns into chess; too much, and it’s Mario Party. Ultimately, the exact degree depends on the needs of the set. For example, Theros had Heroic, an inherently high-variance mechanic that relied on having both the Heroic creature and an aura or combat trick in hand. Therefore, it needed Scry to smooth things out and increase the likelihood of getting appropriately Heroic behavior.

Entropy


A low-entropy card is one that only gives the player a single kind of experience. Such cards do the same thing any time you play them: Lava Axe always burns face. Elvish Mystic always taps for {G}. But even simple cards can provide players with some variety of experiences: Divination may draw you a game-winning Frost Titan, or just a pair of Islands. Mind Rot might catch nothing, or it could clear out the Doom Blade they’ve been sandbagging.

High-entropy cards create a larger diversity of experiences. As before, explicit randomness is one way to achieve this effect, either through coin flips (Molten Birth), random discard (Hanabi Blast), or using library order (Goblin Guide, Bloodbraid Elf). Entropy also comes from spell modality (Gnarlid Pack) and on-board modality (Wooly Loxodon). Threshold mechanics increase entropy; attacking with a Sabertooth Outrider is a completely different experience if you’ve got Formidable.

While tuning variance is a mostly mechanical problem, managing entropy is about the players’ emotional state. Low entropy comforts players with familiar patterns; high entropy excites them with novelty. Every game needs some of both, although the balance point depends on each player’s experience level and personality.

These two types of randomness are not perfectly correlated. As shown here, a card can be high or low variance and high or low entropy:


[Table: columns "low variance" and "high variance"; rows "low entropy" and "high entropy". The contents of the four cells did not survive.]

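As a rough illustration of how the two axes can move independently, the sketch below reads "variance" as the statistical variance of a card's contribution to winning, and "entropy" as Shannon entropy over the distinct experiences a card produces (the information-theory sense the author mentions in the comments). Every probability, outcome label, and contribution value is hypothetical, invented only to fill the four corners of the grid; none of them measures a real card.

```python
import math
from collections import defaultdict

# Each toy card is a list of (probability, experience_label, win_contribution).
# All values are invented for illustration only.
CARDS = {
    "low variance, low entropy": [(1.0, "same play", 0.1)],
    "high variance, low entropy": [(0.5, "same play", 0.4), (0.5, "same play", -0.2)],
    "low variance, high entropy": [(0.25, f"play {i}", 0.1) for i in range(4)],
    "high variance, high entropy": [
        (0.25, "play 0", 0.4), (0.25, "play 1", -0.2),
        (0.25, "play 2", 0.4), (0.25, "play 3", -0.2),
    ],
}

def contribution_variance(outcomes):
    """Variance of the card's contribution towards winning."""
    mean = sum(p * x for p, _, x in outcomes)
    return sum(p * (x - mean) ** 2 for p, _, x in outcomes)

def experience_entropy(outcomes):
    """Shannon entropy, in bits, over the distinct experiences."""
    by_label = defaultdict(float)
    for p, label, _ in outcomes:
        by_label[label] += p  # outcomes that feel the same count as one experience
    return sum(p * math.log2(1.0 / p) for p in by_label.values() if p > 0)

if __name__ == "__main__":
    for name, outcomes in CARDS.items():
        print(f"{name}: variance={contribution_variance(outcomes):.3f}, "
              f"entropy={experience_entropy(outcomes):.2f} bits")
```

The second toy card is the interesting one: its usefulness swings, but every outcome feels like the same play, so its entropy stays at zero.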
Unpredictability and the Player


The diversity of Magic’s audience requires different kinds of cards for different people. Let’s take a careful look at how variance and entropy are received by players.

Appreciation for variance is negatively correlated with Spikiness. In part, this is because Spike wants to win by demonstrating her mastery of the game; success via random chance feels hollow to her. However, it’s also for a more practical reason: variance-reducing cards create more reliable decks, which are simply more likely to win consistently throughout a tournament. Spike will still play the most powerful cards if they’re high-variance, but she won’t be happy about it. And she’ll be even more unhappy about losing to high-variance cards. Remember the amount of vitriol directed at Bloodbraid Elf and the “Cascade lottery” during Jund standard?

On a different axis, appreciation for entropy is positively correlated not with any of the psychographics, but with experience. The more games a player has under his belt, the more he has seen all the common situations, and craves variety in card effects and board states. Many of the highest-entropy formats such as Cube and Commander are extremely popular with veteran players, who can process the giant smorgasbord of weird old cards and mechanics. Conversely, new players don’t need quite as much entropy. They can derive enjoyment from analyzing much more conventional game states, because they haven’t seen them all so many times before.

Wrapping up


I hope this article has changed your mind a bit about what “randomness” means in Magic. More importantly, I hope it informs your decisions on how to use these tools in design! Sound off in the comments and let me know what you think.

44 comments:

  1. All very good points. It crystallized at the end when Commander and Cube were brought up. I know that I've tended more towards those formats and Legacy/Vintage for the same reasons. I do, however, think that entropy gets conflated with decay, which is not the concept being expounded here. It's a bit of a quibble, but perhaps volatility or variety would be a better word for the concept? Either way, it's a minor issue compared to the powerful idea that randomness exists along two axes, and that you can balance the randomness in an environment by having an assortment of cards along both axes.

    1. Thank you for your thoughts. I was thinking of the word in the Information Theory sense, but it definitely does have connotations that I didn't intend to evoke.

    2. The word choice 'entropy' definitely made it harder for me to process this helpful thought piece.

      Related reading: https://boardgamegeek.com/thread/1294990/input-vs-output-randomness

    3. Interesting video, though I don't really agree with his conclusions. (Presumably none of us truly do, or we wouldn't be playing a game like Magic!)

    4. It's not the best explanation of output/input randomness, but it's what I could find.

  2. I'm not really sure these things are two "axes" as you suggest. They are super closely related. For example, under Entropy you describe how Molten Birth offers a variety of play experiences, but that is explicitly because it is high variance.

    On the other hand, by nature of Magic being the game it is, even your low entropy examples have very high entropy. Lightning Bolt is very different in a mono Red burn deck than it is in a Grixis control list.

    The experience of ramping out a turn two Centaur Courser and feeling like you can't possibly lose because you have cast literally the best creature in Magic on turn two and the experience of chumping your opponent's Centaur Courser on turn twelve with an Elvish Mystic are very different. So is throwing a Sword on your Mystic, or using it to blow someone out by blocking with it and Giant Growthing, etc.

    Aside: Your suggestion that Lightning Bolt is not explicitly modal (even though to my eye it very blatantly is) is a big part of Red's perception problem (and someone, I think it was Gavin, wrote an excellent article about this that I am too lazy to link to). Lava Axe and Flame Slash are two totally different cards, but the existence of cards like Lightning Bolt makes them feel like they're all "just burn". If somehow there were plentiful cards that said "Target player draws two cards or discards two cards" players might feel that Black's color pie was similarly small.

    Aside on the Aside: Red really does have a problem, but I don't think the problem is actually so much that Red can't do enough things as it is that R&D has only been letting Red do one thing for so long. Finding a non-aggressive place for Red to exist will almost certainly involve some give and take on what abilities Red has access to, but I don't think that is necessarily indicative of Red not having access to "enough" abilities as it is.

    1. Well, I certainly agree that they are not simple independent variables, and that it's possible to tweak any card to have X variance and Y entropy the way we can with P/T.

      However, there's a subtle distinction I'd like to point out. Molten Birth is high entropy because it happens a random number of times. The fact that it happens a random number of times also makes it high variance, but it's not the high variance that causes the high entropy; they're both caused by a third factor. And factors can push the two in opposite directions as well: the different sizes on Gnarlid Pack are what cause it to be both high entropy and low variance. (Indeed, this is the case for modal mechanics in general, and modal mechanics constitute a huge chunk of the game!)

      Yes, in some absolute sense, every Magic card (creatures more so than others) is inherently high-entropy. That's built into the game. Still, looking at relative entropies is a useful design tool.

      Re: the burn conundrum, I certainly didn't mean to claim that Lightning Bolt is not modal. I think it's relatively low-entropy even as a modal spell, since it'll usually be used as premium cheap removal. But to address Gavin's point, I claim that the flavor overlap between its modes actually reduces the entropy! If it feels like one experience, then that's less novelty for the player. Humans are weird.

    2. That first paragraph should read, "... that it's not possible ... "

    3. Tommy, it's interesting that you mention 'the red problem', because that's actually something I was thinking of as I was reading this article myself - but for completely different reasons, haha!

      I've been musing a lot lately about red's usage of 'risk' as a price - much like how black can get access to a large variety of effects by sacrificing permanents or paying life, so too does red get access to a large variety of effects by making the effect 'risky', or not guaranteed to happen exactly as you'd like it. For example, Impulse Draw, or rummaging, both give red card draw at the price of gambling on whether it's what you want or not.

      When I was reading this article, I was thinking that a nice way to expand red's design space might be to exploit the ways that variance and entropy feel to the player - if properly done, they can certainly feel like you're "gambling" on a spell in your deck, even if the outcome isn't random. I'm still developing my thoughts on this - and trying to find specific examples - but I thought I'd share both because it's relevant, and to see if anyone else has any ideas on how red might use variance/entropy in order to expand its design space. ( :

      HavelockV, this is a great article. The only reason I didn't write a comment sooner is that I was kinda speechless! What a wonderful analysis, and I'd never seen it this way before.

    4. I suggest the names variance and variety for easier grokking.

    5. Although I'd welcome a term that captures the idea a bit better, I can't quite bring myself to use one that sounds so close and has the same Latin root.

    6. Given how similar they are conceptually, I like the verbal similarity.

    7. I think you should embrace the similarity in these two concepts more, rather than working so hard to make them separate axes.

      Variance directly causes variety, but there are forms of variety that don't come from variance.

      As your article goes on there are a lot of different forms of variety and variance that could be more clearly delineated if you were inclined. For example, at the end you start talking about variance and variety of formats, as opposed to individual cards (and how much variance and variety there is in a format is not necessarily a consequence of how much variance and variety there is in the cards).

      There are also multiple types of variance within cards that I think it would be beneficial to distinguish. A one drop 2/1 is high variance because if you draw it late in the game it isn't good, and the same is true of a 7 mana dragon (except reversed). That is very different from saying that Essence Backlash is high variance which is very different from saying that Friendly Fire is high variance.

      I think this article is very worthy of iteration if you really want to dig deep into these concepts.

    8. I really don't agree that variance causes entropy. Lava Axe is almost the definition of a low entropy card, but it's high variance.

    9. But I do agree that there's more to be said about the different forms of these concepts. In particular, the early game / late game tension I wrote about recently is tied into variance, though I haven't thought enough about it yet to say something definitive. Hmm...

    10. I think saying that Lava Axe is high variance is an indication of why the different types of variance need to be spread more. It is certainly high variance in that sometimes it is good and sometimes it is awful, but I think that is true of nearly every card in Magic. Lava Axe certainly isn't a quadrant theory all star, but I think calling it high variance full stop is misleading.

    11. Interesting! It sounds like you're talking about a much more nuanced notion of variance than I am. I'm really just thinking of it from a purely mathematical perspective; just define some kind of marginal utility function for drawing Lava Axe instead of another card, treat that function as a random variable over the space of all possible situations, and take the variance.

      But (if I understand correctly) what you want to do is look at the particular reasons that a card's (mathematical) variance is what it is, and develop a richer vocabulary for these cases. That makes sense to me. It's definitely an angle I hadn't considered.

    12. From the optic that the goal of the game is to reduce your opponent from 20 to 0, Lava Axe is approximately 0 variance, it will always bring you 25% of the way there. A 2/2 for 2 runs the whole gamut from doing 0 damage to doing the whole 20.

      From another point of view, of your opponent's 20 life points, 19 of them are worth literally nothing, and one of them is worth everything. That point of view, which I suspect you take, suggests this card is very high variance since it only does something if it gets the last life point. From a certain point of view this is true, though like the above viewpoint it fails to capture the complexities of a game of Magic. For example, if your opponent is at ten, and you play two Lava Axes, they should get equal credit.

      To further muddy the waters (sorry I'm not making this easier for you), a Divination into two lands on turn 12 is about the worst possible Divination, but what if the card you draw next turn is your game-winning Lava Axe (no, I don't know what crazy deck you're playing)? Does the Divination get the credit? If it does, Divination is actually pretty low variance, since it always gets you two cards deeper. If we instead look only at the two cards you get, it is pretty high variance, since they might be worthless.

      And, what makes it even more complicated is that the four simplistic arguments above are all probably correct for different decks and situations.

    13. I think my notion of variance actually handles these cases successfully. Here's how I imagine a more formal definition. (I didn't put this in my original post for obvious reasons!)

      First, fix a pair of players and decks. Our probability space X is the set of all reachable game states at the beginning of your draw steps, weighted by the likelihood of reaching them. (This is well-defined since our players are stochastic machines of some sort.) Let S be such a game state. We further specify that S does not include any hidden information such as the cards in your opponent's hand, the order of your libraries, etc.

      Define U(A, S) = P(you win | card A is on top of your library) - P(you win | card A is not on top of your library). Then the variance V(A) is just the variance of the random variable U(A, S), where S ranges over X.

      I think this captures the subtleties in your example. For example, U(Divination, S) is usually pretty high, but V(Divination) isn't too small because of double-land whiffs. (And V(Divination) increases if you have Frost Titan.) U(Lava Axe, some state where the opponent is at 10 life) is probably pretty low, but it does take into account all future states in which you draw a second Lava Axe.

      Obviously it's hard to be certain whether this function actually does what I claim it does, since the definition isn't one you could reasonably use for an actual numerical calculation. But perhaps the intuition behind it makes sense to you.

    14. Just realized my definition is nonsense because X is not a probability space, but a measure space. Insert some kind of normalization to make it make sense. (That's what I get for commenting tired...)
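The definition in the two comments above can be sketched in a few lines, purely to show its shape. The sketch assumes someone has already estimated, for each game state S, a weight and the two conditional win probabilities. Those inputs, the function name `card_variance`, and the toy numbers are all hypothetical, and obtaining real values is exactly the hard part the comment concedes. Dividing by the total weight is one way to supply the normalization mentioned in the follow-up.

```python
def card_variance(states):
    """Variance V(A) of U(A, S) over weighted game states.

    states: list of (weight, p_win_if_A_on_top, p_win_if_A_not_on_top).
    Weights need not sum to 1; they are normalized here.
    """
    total = sum(w for w, _, _ in states)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # U(A, S) = P(win | A on top) - P(win | A not on top), one value per state.
    weighted_utils = [(w / total, with_a - without_a)
                      for w, with_a, without_a in states]
    mean = sum(p * u for p, u in weighted_utils)
    return sum(p * (u - mean) ** 2 for p, u in weighted_utils)

# Hypothetical input: two equally likely states, one where card A helps a lot
# and one where it is slightly worse than an average draw.
toy_states = [(1.0, 0.8, 0.5), (1.0, 0.4, 0.5)]

if __name__ == "__main__":
    print(card_variance(toy_states))
```

Because the weights are normalized inside the function, scaling every weight by the same factor leaves the result unchanged.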

    15. As a math professor I'm forced to appreciate any definition that involves choosing a measure, but I question whether this is useful since, as you point out, the amount of variance even in a simple Divination effect is already quite... varied according to the deck. I think it would be useful to have a definition of variance for a card that did not depend on the ambient deck (or at least did so minimally) and ideally also that did not require advanced mathematics to understand.

    16. I think it's feasible to define deck-independent variance by taking a weighted disjoint union over all possible decks in the format when you calculate U.

      Alternatively, you could look at V as a variable on the probability space of decks and therefore be able to talk about the mean of variance, or variance of variance. (Since as you pointed out, Lava Axe decreases in variance if you have multiples!)

      But I'm not convinced there's any way for variance to be well-defined in a way that doesn't take some serious math. Magic's game tree is freaking gigantic.

      That's why I'm speaking to the intuitive notion of "how much does it help you win", which is easier to communicate. Ultimately, it means people will come in with different assumptions, though.

    17. I think one can still nail down some specific types of variance.

      For example, Edicts are high variance because it matters what your opponent is doing, that is if they cast Hordeling Outburst or Centaur Courser. [Depending on how broad a view you take, this is either Matchup variance or Game state variance. Both are clearly legitimate types of variance we should consider.]

      Llanowar Elves (and just about every other one drop) are high variance because it matters when you draw them. Any time after turn 2 and they are probably a dead draw. [This can't really count, though, because otherwise all cheap cards and expensive cards are automatically high variance, which we probably don't want.]

      Lava Axe and Trumpet Blast are high variance because they depend on your deck. [I'd also argue this probably shouldn't count as variance, because you get to decide your deck. Rather than "high variance" I would call cards like Lava Axe and Trumpet Blast "narrow," meaning that only specific decks in the format want them. Narrowness and broadness are super, super important concepts to consider when developing for draft, but I don't think it is correct to consider them part of variance.]

      Gilt-Leaf Ambush and Molten Birth are high variance because even as you cast them you literally don't know what they are going to do. It is explicitly random. [This could be called variance, but I think it is far more useful to call it what it is, these cards are "random" rather than simply high variance. That has a clear design meaning.]

      Divination and Mind Rot are high variance because we don't know what will happen when we cast them, but we would if we fully understood the game state. We might have paid attention to our opponent to know they are on no lands in hand with Mind Rot, or scryed so we know we're more likely to draw good cards with Divination, etc. When our opponent attacks their 4/4 into our 3/3 and we debate if we should Giant Growth our 3/3 but worry they might have Divine Verdict, our Giant Growth is displaying this kind of variance (and of course so do most all combat tricks). [I'd call this "hidden information variance."]

  3. Really interesting read, and if nothing else I've picked up a new bit of vocabulary! Probably because I'm Spikey, I have the greatest positive reaction to low variance, high entropy cards and the greatest negative reaction to high variance, high entropy cards. Give me a card that affords me a ton of ways to misplay it and I will love it.

    Would so-called "meta decks" also fall into the high variance, low entropy category? e.g. Dredge's contributions toward winning the game all play out extremely similarly in games where the opponent doesn't have any grave hate, and its contributions toward trying and failing to win the game also play out extremely similarly in games where the opponent begins with a Rest in Peace in play.

    1. Hmm, I hadn't considered the idea of how these concepts apply on a deck-wide level! I think the idea of entropy transfers just fine, and most combo decks are very low entropy. It's a little less obvious how variance transfers over, since the naive mathematical notion doesn't work.

      Here's a thought experiment which might clarify what I mean by that: Suppose deck A is a combo deck that kills on turn zero regardless of hate 50% of the time, and fizzles and loses the other 50%. Deck B is a conventional aggro deck that goes 50/50 against every other deck in the meta. Our intuition tells us that deck B is more reliable than deck A and therefore has lower variance. However, there's no obvious mathematical sense in which that's true; their behavior is completely identical in terms of winning and losing. In order to make such a definition work, we'd have to quantify a notion of "winning by a lot" or something like that.

    2. I guess you could look at differing win rates against the entire meta so that it's not just a Bernoulli variable; then you can have meaningfully different variances.
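A small numeric sketch of the point made in the two replies above. Per game, a win/loss result is a Bernoulli variable whose variance p(1 - p) depends only on the win rate, so decks A and B look identical. The spread of win rates across matchups is one alternative measure. The matchup lists are invented, and the third deck (`polarized`) is a hypothetical addition for contrast.

```python
from statistics import pvariance

def per_game_variance(win_rate):
    """Variance of a single win/loss (Bernoulli) result."""
    return win_rate * (1.0 - win_rate)

# Hypothetical win rates against three opposing decks in a metagame.
matchups = {
    "deck A (combo)": [0.5, 0.5, 0.5],
    "deck B (aggro)": [0.5, 0.5, 0.5],
    "polarized": [0.9, 0.1, 0.5],
}

if __name__ == "__main__":
    for name, rates in matchups.items():
        overall = sum(rates) / len(rates)
        print(name,
              "per-game variance:", per_game_variance(overall),
              "matchup spread:", pvariance(rates))
```

Under these assumed numbers, decks A and B still tie on both measures, which matches the thought experiment; only a deck whose win rate changes with the opponent separates from them.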

    3. Can we not say a deck is high variance if it includes more card names than a deck without and that tutors reduce a deck's variance?

    4. Using card names is difficult because of redundancy in card effects. For example, a typical Monored Burn deck in cube consists entirely of 1-of spells with no tutors, but will yield very similar draws from game to game because all the spells do more or less the same thing.

    5. I think Jenesis' point is broader than that narrow case, though; every Cube deck is all singletons, and most of them have no tutors. Does that mean most Cube decks are equal in variance? Surely that can't be true! Or, say, tutor-less EDH decks...

      The real problem (from my perspective) is that defining a random variable to take the variance of requires putting arbitrary labels on how well a deck does; like, my Storm deck performs at 10 half the time and 0 half the time, where my aggro deck performs at 5 100% of the time. Same mean, different standard deviation. But now I'm required to describe what it means for a deck to perform at a "10", which is hard to do in a simple, consistent fashion.
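The Storm versus aggro comparison above, written out. The 0 to 10 "performance" scale is the arbitrary labeling the comment warns about, so the numbers only show that equal means can hide different spreads.

```python
from statistics import mean, pstdev

# Performance scores on the comment's arbitrary 0-10 scale.
storm = [10, 0]  # performs at 10 half the time and at 0 the other half
aggro = [5, 5]   # performs at 5 every time

if __name__ == "__main__":
    for name, scores in (("Storm", storm), ("aggro", aggro)):
        print(name, "mean:", mean(scores), "std dev:", pstdev(scores))
```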

    6. "Count cardnames" was short-hand for "identify the number of different things in your deck" and I'd expect a deck running both Searing Spear and Incinerate to treat them as, like, 1.1 different cards.

      The number this evaluation leads you to is an entirely different measurement from the number in Havelock's last example. It has nothing to do with deck success, only how often it does the same thing.
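One hypothetical way to turn "identify the number of different things in your deck" into a number: give each pair of cards a similarity between 0 (unrelated) and 1 (functionally identical), and count each card as 1 divided by its total overlap with the deck. This formula, the function name, and the 0.8 similarity are assumptions chosen for illustration, not something the commenters proposed. With them, Searing Spear plus Incinerate comes out near the "1.1 different cards" mentioned above.

```python
def effective_distinct_cards(deck, similarity):
    """Similarity-discounted count of distinct cards (illustrative only).

    deck: list of card names, one entry per copy.
    similarity: dict mapping frozenset({name_a, name_b}) to a value in [0, 1];
                missing pairs are treated as completely different (0).
    """
    total = 0.0
    for a in deck:
        # Overlap of this copy with every copy in the deck, itself included.
        overlap = sum(
            1.0 if a == b else similarity.get(frozenset((a, b)), 0.0)
            for b in deck
        )
        total += 1.0 / overlap
    return total

# Assumed similarity: the two burn spells are treated as 80% the same card.
burn_similarity = {frozenset(("Searing Spear", "Incinerate")): 0.8}

if __name__ == "__main__":
    print(effective_distinct_cards(["Searing Spear", "Incinerate"], burn_similarity))
```

Identical copies collapse to a count of 1 and unrelated cards each count fully, so the measure says nothing about deck success, only about how often the deck does the same thing.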

    7. Rather be pedantic than make unsupported assumptions!

      If we let each individual player come up with a valuation for "how different are cards A and B" then we end up with a fuzzy weighted average for each deck; however, trying to come up with a universal scale of "different from rest of deck" even for a simple card like Lightning Bolt, given how many different decks it can go into, is probably unworkable.

    8. Another assumption I seem to have made is that it is neither possible nor fruitful to come up with a perfect formula for calculating a deck's variance; that an estimation is as close as we'll get and informative enough.

  4. First, if anyone reading this hasn't watched Dr. Garfield discuss Luck versus Skill in games, I'd highly recommend it: https://www.youtube.com/watch?v=dSg408i-eKw

    Second, I thought this was a great article and helps to crystallize some things I've been thinking about in game design.

    Third, I like Jenesis's comment on mapping psychographics to the square.

                     low entropy      high entropy
    low variance     Spike            Johnny/Timmy
    high variance    Johnny/Spike     Timmy

    1. I cannot recommend the Garfield video enough. I post that link around here a lot!

    2. I'll definitely watch that video when I've got a free hour.

    3. Where's Spike/Timmy?

      And another data point that Spike doesn't necessarily have to dislike high variance: In his set review of Origins, LSV notes that Avaricious Dragon "is extremely swingy" but "I like high variance cards, and this really is powerful, so in a deck that can naturally empty its hand I think this is a gamble worth taking." He also has a well-documented fondness for 7-drops.

    4. I've always thought of LSV as a Timmy. Have you seen the Lorthos draft?

    5. Yes and it is amazing.

      My favorite LSV draft is DGR #1 though. (How do I do the fancy linking thing?) Or as a commenter has called it, "LSV tortures BenS for 30 minutes."
      https://www.youtube.com/watch?v=skklzQyTUeM

    6. html:
      <A HREF="url">text</A>

  5. Yes, I think this is a fantastic article, although I don't get why "entropy" is the term used.

    In fact, I wonder if maybe "perceived variance" and "power variance" might be better terms. Or even "variance on play" compared to "variance in deck-building"?

    1. Perhaps "power variance" and "experience variance" are the most intuitive ways to get across the concept. They're a bit of a mouthful, though...
