Tue Apr 12 2022 | Zack Capozzi | College
College Women
PHOTO BY KEVIN P. TUCKER
Welcome to Beyond the Basics!
My name is Zack Capozzi, and I run LacrosseReference.com, which focuses on developing and sharing new statistics and models for the sport.
The folks at USA Lacrosse Magazine offered me a chance to share some of my observations in a weekly column, and I jumped at the chance. Come back every Tuesday to go beyond the box score in both men’s and women’s lacrosse.
It all started with win probability. I was watching a Notre Dame-v-Denver men’s lacrosse tournament game, and as the game got down to crunch time, the question came up: I wonder what their win probability is right now? At this point, the concept of win probability was not especially new. ESPN’s game pages for baseball and football regularly showed the probability that either team would win whatever game was going on at the moment.
But as you might imagine, the Google search to determine the win probability in the lacrosse game we were watching turned up a grand total of zero results. Before the end of the game, LacrosseReference.com had been purchased from GoDaddy — and the rest is history.
Because win probability was the earliest tool in my lacrosse stats toolbox, I sometimes forget how novel it still is in lacrosse. But as we work to bring more statistical insight to the sport, I thought it would be worthwhile to go into a bit more detail on the metric and the process that gets it onto your screens each week.
Let’s start with the basics. Win probability refers to the chance that a specific team will win a specific game. If you had two perfectly evenly matched teams playing a game on a neutral field, both teams would open the game with a 50 percent win probability. And of course, the win probabilities for the two teams should always add up to 100 percent.
Abstract concepts can be tough, so let’s put this in terms of what you know in your gut from watching years of lacrosse. Here are the win probabilities for home teams at various halftime leads:
Tied: 51.3%
Leading by one goal: 69.0%
Leading by two goals: 82.3%
Leading by three goals: 91.9%
So if your team is ever down on the road by a goal at halftime, know that there is still a 3-in-10 chance that they come back to win the game.
These are averages, and they obscure differences between teams. A team that plays at a quicker pace has a greater chance of coming back because there are more possessions and more of a chance to bridge the gap.
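For readers who like to see the arithmetic, here is a minimal Python sketch of how the halftime averages above translate to the road team. The dictionary simply restates the home-team numbers from the list; the function name `road_win_probability` is made up for illustration.

```python
# Home-team win probabilities by halftime lead, restated from the list above.
HOME_WIN_PROB_BY_HALFTIME_LEAD = {0: 0.513, 1: 0.690, 2: 0.823, 3: 0.919}


def road_win_probability(home_lead: int) -> float:
    """Road team's chance when the home team leads by `home_lead` at halftime.

    The two win probabilities always add up to 100 percent, so the road
    team gets whatever the home team does not.
    """
    return 1.0 - HOME_WIN_PROB_BY_HALFTIME_LEAD[home_lead]


print(f"{road_win_probability(1):.1%}")  # 31.0%
```

That 31 percent is the "3-in-10 chance" for a road team trailing by a goal at the half.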
My favorite part of having a win probability model is being able to produce win probability charts throughout the course of a game. Here’s an example from a recent Ohio State victory over Louisville.
Can’t you just feel the excitement of the Louisville comeback to tie it at the end of regulation? And then a tense overtime period punctuated by the game-winner for the Buckeyes.
But how does one come to a win probability for a given matchup?
In short, you need two things: A) a numeric ranking to determine the relative strength between the two teams, and B) some formula for converting that into a probability. I use my Lax-Elo ratings to estimate the strength of the two teams. And there is a tried and true formula for converting Elo ratings into a win probability (which I wrote about at length earlier this year).
Take the recent game between Marquette and Villanova as an example.
Using the Elo ratings (1546 for Marquette and 1462 for the Wildcats) and after applying the home-field advantage bump (about four percentage points), we end up with Marquette as a slight favorite. (As it turns out, the Golden Eagles ended up winning 19-14.) As you might have guessed, an in-game win probability chart like the one I showed for the Buckeyes always starts at the pregame win probability.
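The conversion from ratings to a probability can be sketched in a few lines of Python. This assumes the textbook Elo formula with a 400-point scale and treats the home-field bump as a flat four percentage points added to the home team's number. The actual Lax-Elo parameters may differ, and the function names here are illustrative only.

```python
from typing import Optional


def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score for team A against team B (assumed scale of 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def pregame_win_probability(
    rating_a: float,
    rating_b: float,
    home_team: Optional[str] = None,  # "a", "b", or None for a neutral field
    home_bump: float = 0.04,          # assumption: flat four-point bump
) -> float:
    """Team A's pregame win probability, with an optional home-field adjustment."""
    prob = elo_win_probability(rating_a, rating_b)
    if home_team == "a":
        prob += home_bump
    elif home_team == "b":
        prob -= home_bump
    # Keep the result a valid probability after the additive bump.
    return min(max(prob, 0.0), 1.0)


# Marquette (1546) against Villanova (1462), before any home-field adjustment.
print(f"{pregame_win_probability(1546, 1462):.3f}")  # 0.619
```

Passing `home_team="a"` or `home_team="b"` shifts that neutral-field number up or down by the bump.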
But interpretation matters quite a bit here. This frustrates some people, but Marquette’s 61 percent pregame win probability should be interpreted as: “if these teams played 100 times, we would expect Marquette to win 61 of those games.” It definitely does not mean that the model is 61 percent confident that Marquette will win.
This is a bit odd, but it also means that if the win probability model gives Team A a 90 percent chance to beat Team B, there is nothing wrong with the model if Team B ends up winning the game. The issue would arise if, out of 100 games in which the favorite had a 90 percent win probability, the favorite wasn’t winning around 90 of them. When the model says 90 percent, you want it to mean 90 percent.
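One way to check whether "90 percent means 90 percent" is a calibration table: group games by the predicted win probability and compare the average prediction in each group to how often the favorite actually won. The sketch below is a generic version of that check, not the column's own code. The ten-bucket grouping and the name `calibration_table` are assumptions.

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=10):
    """Compare predicted win probabilities with observed win rates.

    predictions: pregame win probabilities for one team in each game (0 to 1)
    outcomes:    1 if that team won the game, 0 if it lost
    Returns a list of (average prediction, observed win rate, games) per bucket.
    """
    buckets = defaultdict(lambda: {"games": 0, "pred_sum": 0.0, "wins": 0})
    for prob, won in zip(predictions, outcomes):
        # Bucket 0 covers 0-10 percent, bucket 9 covers 90-100 percent, and so on.
        index = min(int(prob * n_bins), n_bins - 1)
        bucket = buckets[index]
        bucket["games"] += 1
        bucket["pred_sum"] += prob
        bucket["wins"] += won
    return [
        (b["pred_sum"] / b["games"], b["wins"] / b["games"], b["games"])
        for _, b in sorted(buckets.items())
    ]


# Ten games at 90 percent in which the favorite won nine times.
for avg_pred, win_rate, games in calibration_table([0.9] * 10, [1] * 9 + [0]):
    print(f"predicted {avg_pred:.0%}, observed {win_rate:.0%} over {games} games")
```

A well-calibrated model shows predicted and observed values that track each other across every bucket, not just at 90 percent.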
Where pre-game win probability is a simple calculation based on the outputs of a simple model, producing a live in-game win probability is an entirely different animal. There are three factors that go into the real-time win probability calculation. The first, which we have already discussed, is the pre-game win probability, which is based on the Lax-Elo ratings.
The second, which we’ve also already touched on, is time-and-score. Put simply, teams that are winning by more have higher win probabilities. Likewise, the same lead is more likely to result in a win if there is less time left for the opponent to mount a comeback. From there, you would just reduce the weight given to the pre-game estimate and increase the weight given to time-and-score as the game progresses. If you did only that, however, you’d have a very strange win probability estimate indeed.
The reason that these two factors are not enough is that win probability is very dependent on what’s going on in the game. You might have two evenly matched teams, tied at 10 on a neutral field with two minutes to go in the game. Is the win probability 50/50? Probably not. If Team A just won a draw control, they are probably going to have the upper hand because they have the ball. The win probability estimate should reflect that.
As I wrote about last week, my approach uses the expected-goal values of individual plays to assess which team has the upper hand at any given moment. With an estimate of how many “expected goals” each team is set to score in the next sequence, we can adjust the time-and-score factor to be more responsive to what has happened in the flow of the game.
With this third factor in hand, we’ve got what we need to calculate a win probability for any game at any point in the game. And the win probability chart flows from that.
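To make the three-factor idea concrete, here is a rough Python sketch of one way the pieces could fit together. Everything specific in it is an assumption for illustration, not the actual model: the time-and-score piece uses a simple normal approximation with a made-up `goals_sd` value, the weight on the pre-game estimate shrinks linearly with time remaining, and the expected-goal edge is added directly to the score margin.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def time_and_score_probability(margin: float, fraction_remaining: float,
                               goals_sd: float = 5.0) -> float:
    """Win probability from margin and time alone (normal approximation).

    goals_sd is a placeholder for how much the margin could swing over a
    full game; it is not a fitted value.
    """
    if fraction_remaining <= 0:
        # Game over: the leader has won; a tie is treated as a coin flip.
        return 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    return normal_cdf(margin / (goals_sd * math.sqrt(fraction_remaining)))


def live_win_probability(pregame: float, margin: float, fraction_remaining: float,
                         expected_goal_edge: float = 0.0) -> float:
    """Blend the pre-game estimate with a possession-aware time-and-score estimate."""
    # Factor three: shift the margin by the expected goals from the current sequence.
    adjusted_margin = margin + expected_goal_edge
    in_game = time_and_score_probability(adjusted_margin, fraction_remaining)
    # The pre-game estimate gets all the weight at the opening draw and none at the end.
    pregame_weight = fraction_remaining
    return pregame_weight * pregame + (1.0 - pregame_weight) * in_game


# Evenly matched teams, tied with two minutes left in a 60-minute game,
# and Team A holding a hypothetical 0.3 expected-goal edge after winning a draw.
print(live_win_probability(0.5, 0, 2 / 60, expected_goal_edge=0.3) > 0.5)  # True
```

With no time elapsed the function returns the pre-game number, which matches the way the charts start at the pregame win probability.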
But while win probability is a fun way to second-screen a game, it actually has other uses once you have the model in place. For example, game control is a metric that describes how dominant a team is throughout its games. You could argue that the best teams should have the highest cumulative win probability. A last-second, come-from-behind victory where a superior team overcomes a sluggish first half is not as impressive as a dominant game start-to-finish. (Note: the game control metric below is adjusted to account for the strength of the opponents faced, so please don’t think I’m suggesting that UMass is better than Boston College.)
To calculate a game control metric, all you need to do is take the average win probability for each team at every point in every game it has played. As of right now, the top five teams in Division I women’s lacrosse, with respect to game control, are:
Maryland: 77.2% cumulative win probability
North Carolina: 76.9%
UMass: 74.8%
Boston College: 74.7%
Davidson: 64.5%
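The basic averaging step can be sketched as follows. This is the unadjusted version only: it leaves out the strength-of-opponent adjustment mentioned above, and it assumes each game's win probability has been sampled at evenly spaced moments so that a plain average is fair. The function name and the sample numbers are hypothetical.

```python
def game_control(games):
    """Average win probability across every sampled point of every game.

    games: one list per game, holding the team's win probability at evenly
           spaced moments of that game (values between 0 and 1).
    """
    samples = [prob for game in games for prob in game]
    if not samples:
        raise ValueError("need at least one win probability sample")
    return sum(samples) / len(samples)


# Two made-up games: a comeback win and a wire-to-wire win.
comeback = [0.50, 0.30, 0.25, 0.60, 1.00]
wire_to_wire = [0.50, 0.80, 0.90, 0.95, 1.00]
print(f"{game_control([comeback, wire_to_wire]):.1%}")  # 68.0%
```

Both made-up games end in a win, but the wire-to-wire game pulls the average up far more than the comeback does.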
The other big benefit of the win probability concept is that once we know the win probability for each team in any match-up, we have a great way to project outcomes down the road, even all the way to Selection Sunday. Take the ACC tournament as an example:
Every night, I run through the remainder of the season (up to Selection Sunday) 3,000 times to get an idea of how the post-season might play out for each team. By knowing the win probabilities for all remaining games:
You can see where teams are likely to end up seed-wise in post-season tournaments.
You can project how likely a team is to end up in every RPI spot.
You can also run through the tournaments themselves and see how often each team comes away with an AQ.
If you make some assumptions, you can even estimate how likely each team is to have a home game in the NCAA tournament.
(If you are curious, here are the numbers for Jacksonville.)
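The tournament piece of that simulation can be illustrated with a small Monte Carlo sketch in Python. The team names and ratings below are hypothetical, the bracket is a plain single-elimination format with a power-of-two field, and the win probabilities come from the textbook Elo formula rather than the full nightly model.

```python
import random
from collections import Counter

# Hypothetical ratings for a four-team conference tournament.
RATINGS = {"Team A": 1700, "Team B": 1500, "Team C": 1450, "Team D": 1400}


def win_prob(team_a: str, team_b: str) -> float:
    """Textbook Elo win probability for team_a on a neutral field."""
    return 1.0 / (1.0 + 10.0 ** ((RATINGS[team_b] - RATINGS[team_a]) / 400.0))


def simulate_bracket(teams, rng):
    """Play one single-elimination bracket; `teams` is listed in bracket order."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            # Draw a random number to decide this game according to the odds.
            next_round.append(a if rng.random() < win_prob(a, b) else b)
        alive = next_round
    return alive[0]


def title_odds(teams, n_sims=3000, seed=0):
    """Share of simulated tournaments won by each team."""
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("this sketch needs a power-of-two number of teams")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    champions = Counter(simulate_bracket(teams, rng) for _ in range(n_sims))
    return {team: champions[team] / n_sims for team in teams}


# Bracket order: 1 seed vs. 4 seed, then 2 seed vs. 3 seed.
odds = title_odds(["Team A", "Team D", "Team B", "Team C"])
for team, share in odds.items():
    print(f"{team}: {share:.1%}")
```

Running the rest of the regular season works the same way: decide each remaining game with a random draw against its win probability, tally the results, and repeat 3,000 times.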
I hope I’ve given you a useful view into how the sausage gets made. Win probabilities are in that class of metrics that are pretty easy to interpret and very difficult to create. But to really incorporate them into your lacrosse fan toolkit, I do think it’s important to have some idea of how what you see on the field translates into what you see on the screen.
My goal with this column is to introduce fans to a new way to enjoy lacrosse. “Expand your fandom” is the mantra. I want you to walk away thinking about the players and stories presented here in a new light. But I also understand that some of these concepts can take some time to sink in. And part of the reason for this column is, after all, to educate.
To help this process along, I have several resources that have helped hundreds of lacrosse fans and coaches to internalize these new statistical concepts. The first is a Stats Glossary that explains each of my statistical concepts in more detail than I could fit here. The second is a Stats 101 resource, which provides context for each of my statistics. What is a good number? Who’s the current leader? That’s all there.
And last, I would love to hear from you. If you have questions or comments about the stats, feel free to reach out.