The @UmpScorecards platform relies on three key metrics to analyze umpire performance: accuracy (and expected stats), consistency, and favor. These metrics are calculated in-house using algorithms inspired by others in the baseball community and developed by the @UmpScorecards team. To read more about these algorithms and the thought process behind creating them, click on one of the links below.
Our platform and algorithms also use a few secondary components that we thought deserved some explanation. Feel free to explore the links below.
What data do you use, and where does it come from?
MLB releases detailed data for every pitch of every game. Each morning, our program grabs all of this pitch-by-pitch data from the previous day’s contests. Within the data, each pitch is assigned 89 attributes, from the pitcher's release position to the pitch’s horizontal acceleration. We care about 5 of those 89 values. Two are the pitch’s horizontal (plate_x) and vertical (plate_z) position as it crosses the plate. Two are measures of the top and the bottom of the strike zone (sz_top and sz_bot), values that reflect the size of the zone once adjusted for batter height and stance. Finally, we use the resulting call of the pitch. Together, these 5 values can tell us whether a pitch was a strike or a ball, and whether or not it was called correctly.
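As a rough illustration of how these 5 values combine, here is a minimal Python sketch. It is not the @UmpScorecards algorithm itself, only a simplified geometric check. It assumes that positions are measured in feet from the center of the plate, that a pitch counts as a strike if any part of the ball touches the zone, and that the ball radius is roughly 1.45 inches. The `Pitch` class, its `call` encoding ("strike" or "ball"), and the function names are hypothetical.

```python
from dataclasses import dataclass

# Home plate is 17 inches wide; positions are assumed to be in feet.
PLATE_HALF_WIDTH_FT = 17 / 12 / 2
# Assumed approximate radius of a baseball, in feet.
BALL_RADIUS_FT = 1.45 / 12


@dataclass
class Pitch:
    plate_x: float  # horizontal position as the pitch crosses the plate
    plate_z: float  # vertical position as the pitch crosses the plate
    sz_top: float   # top of the strike zone for this pitch
    sz_bot: float   # bottom of the strike zone for this pitch
    call: str       # hypothetical encoding: "strike" or "ball"


def in_zone(pitch: Pitch) -> bool:
    """Return True if any part of the ball touches the strike zone."""
    horizontal = abs(pitch.plate_x) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    vertical = (
        pitch.sz_bot - BALL_RADIUS_FT
        <= pitch.plate_z
        <= pitch.sz_top + BALL_RADIUS_FT
    )
    return horizontal and vertical


def called_correctly(pitch: Pitch) -> bool:
    """Return True if the umpire's call agrees with the geometric check."""
    return (pitch.call == "strike") == in_zone(pitch)


middle = Pitch(plate_x=0.0, plate_z=2.5, sz_top=3.5, sz_bot=1.5, call="strike")
outside = Pitch(plate_x=1.5, plate_z=2.5, sz_top=3.5, sz_bot=1.5, call="strike")
```

In this sketch, `middle` is a correctly called strike, while `outside` is a called strike well off the plate and so counts as a missed call.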
Why doesn’t your data match with what I saw on T.V.?
First, the T.V. strike zone box is not exact. Camera angles and shake can alter the position of the T.V. strike zone relative to the true strike zone. Second, T.V. boxes often don’t correctly represent the ball itself. The dot that the TBS electronic strike zone uses to represent the ball, for example, is much smaller than an actual baseball. Finally, it is unclear which, if any, T.V. broadcasts adjust the virtual strike zone for batter stance on a pitch-by-pitch basis.
Why doesn’t your data match with what I saw on this other online graphic?
Strike zone graphics from Baseball Savant, ESPN, and other sources do not adjust the top and bottom of the strike zone between at bats. MLB's Gameday feature adjusts between at bats but (seemingly) not between pitches. By contrast, the @UmpScorecards strike zone plots adjust the strike zone for every pitch, resulting in differences relative to the graphics of other providers. For more information on how this works, click here.
Why doesn’t the archive data always match with the Twitter graphics?
Unfortunately, MLB does a small amount of post-processing of each game's data, which can have a small (but noticeable) impact on results. That means that each time the data on the site is fully refreshed, the archive may no longer be fully aligned with the Twitter graphics.
Why is some data on the archive marked with an * or marked as ND?
As of v3.0.0, the @UmpScorecards platform tracks games that have erroneous or missing pitch data to ensure that game counts — the number of games each team has played, for example — are correct across the site. On the Games tab, such games are marked with an * if 5 or fewer pitches are missing data, and ND otherwise.
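The marking rule above can be sketched in a few lines of Python. The function name `game_marker` is hypothetical, and the sketch assumes that a game with no missing pitch data gets no marker at all.

```python
def game_marker(missing_pitches: int) -> str:
    """Return the Games tab marker for a game, given its count of
    pitches with erroneous or missing data."""
    if missing_pitches <= 0:
        # Assumption: complete games are shown without any marker.
        return ""
    # 5 or fewer pitches missing data -> "*", otherwise "ND".
    return "*" if missing_pitches <= 5 else "ND"
```

For example, `game_marker(3)` gives `"*"` and `game_marker(40)` gives `"ND"`.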
The date on which the game was played.
The home plate umpire at the start of the game.
The team name abbreviation.
Home (Home Team)
The home team name abbreviation.
Away (Away Team)
The away team name abbreviation.
The number of games played by the team or umpired by the umpire.
The number of games won by the team.
The number of games lost by the team.
R [H] (Home Team Runs)
The runs scored by the home team.
R [A] (Away Team Runs)
The runs scored by the away team.
PC (Pitches Called)
The number of pitches called by the umpire.
IC (Incorrect Calls)
The number of pitches incorrectly called by the umpire.
CC (Correct Calls)
The number of pitches correctly called by the umpire.
xIC (Expected Incorrect Calls)
The expected number of incorrect calls made by an average umpire.
xCC (Expected Correct Calls)
The expected number of correct calls made by an average umpire.
CCAx (Correct Calls Above Expected)
The difference between actual correct calls and expected correct calls.
The percent of called pitches called correctly by the umpire.
minAcc (Minimum Accuracy)
The minimum percent of called pitches called correctly by the umpire in a single game, across all games umpired by the umpire.
maxAcc (Maximum Accuracy)
The maximum percent of called pitches called correctly by the umpire in a single game, across all games umpired by the umpire.
xAcc (Expected Accuracy)
The expected number of correct calls divided by the total number of calls made by an umpire.
AAx (Accuracy Above Expected)
The difference between actual accuracy and expected accuracy.
The percent of calls inconsistent with the established umpire zone.
avgCon (Average Consistency)
The average percent of calls inconsistent with the established umpire zone in games umpired by the umpire.
Fav[H] (Favor [Home])
The difference between the home team's run expectancy impact and the away team's run expectancy impact for a given game.
avgFav (Team Average Favor)
Total Favor divided by number of games played by the team.
avgFav (Umpire Average Favor)
The sum of the absolute value of Favor across all games umpired by the umpire, divided by the number of games umpired.
avgLev (Average Leverage)
The average absolute value of the run impact of each missed call by the umpire, across all games umpired by the umpire.
avgPI (Average Pitcher Impact)
Total Pitcher Impact divided by number of games played by the team.
avgBI (Average Batter Impact)
Total Batter Impact divided by number of games played by the team.
avgRI (Average Run Impact)
The sum of the absolute value of the run impact of each missed call by the umpire, across all games umpired by the umpire, divided by the number of games umpired by the umpire.
pctFav (Percent Favored)
The percent of games played by the team in which the team had a greater total run impact than the team's opponent.
totFav (Total Favor)
The sum of the Total Batter Impact for the team and Total Pitcher Impact for the team.
totRI (Total Run Impact)
The sum of the favor of every missed call.
totBI (Total Batter Impact)
The sum of the run impact of each missed call when the team is batting, for all games played by the team.
totPI (Total Pitcher Impact)
The sum of the run impact of each missed call when the team is pitching, for all games played by the team.
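The arithmetic behind several of the definitions above can be sketched in Python. This is only an illustration under stated assumptions, not the site's actual code. The `UmpireGame` and `TeamGame` classes and their field names are hypothetical, the per-game inputs (correct calls, expected correct calls, run impacts) are assumed to be computed elsewhere, and a positive run impact is assumed to mean the team benefited. Each list of games is assumed to be non-empty.

```python
from dataclasses import dataclass


@dataclass
class UmpireGame:
    pitches_called: int      # PC for one game
    correct_calls: int       # CC for one game
    expected_correct: float  # xCC for one game


@dataclass
class TeamGame:
    batter_impact: float    # run impact of missed calls while the team bats
    pitcher_impact: float   # run impact of missed calls while the team pitches
    opponent_impact: float  # opponent's total run impact in the same game


def accuracy_summary(games: list[UmpireGame]) -> dict[str, float]:
    """Aggregate an umpire's accuracy metrics over a non-empty list of games."""
    pc = sum(g.pitches_called for g in games)
    cc = sum(g.correct_calls for g in games)
    xcc = sum(g.expected_correct for g in games)
    per_game = [100 * g.correct_calls / g.pitches_called for g in games]
    acc = 100 * cc / pc     # percent of called pitches called correctly
    x_acc = 100 * xcc / pc  # expected correct calls over total calls
    return {
        "PC": pc, "CC": cc, "IC": pc - cc, "xCC": xcc,
        "CCAx": cc - xcc,    # correct calls above expected
        "Acc": acc, "xAcc": x_acc,
        "AAx": acc - x_acc,  # accuracy above expected
        "minAcc": min(per_game), "maxAcc": max(per_game),
    }


def team_favor_summary(games: list[TeamGame]) -> dict[str, float]:
    """Aggregate a team's favor metrics over a non-empty list of games."""
    n = len(games)
    tot_bi = sum(g.batter_impact for g in games)
    tot_pi = sum(g.pitcher_impact for g in games)
    tot_fav = tot_bi + tot_pi
    favored = sum(
        1 for g in games
        if g.batter_impact + g.pitcher_impact > g.opponent_impact
    )
    return {
        "totBI": tot_bi, "totPI": tot_pi, "totFav": tot_fav,
        "avgBI": tot_bi / n, "avgPI": tot_pi / n, "avgFav": tot_fav / n,
        "pctFav": 100 * favored / n,
    }
```

For instance, an umpire with games of 141 correct out of 150 called pitches and 95 correct out of 100 has 236 correct calls on 250 pitches, for an accuracy of 94.4 percent.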