Individual Pitcher Platoon Splits: How Real?

Jul 14, 2023

Setting the Stage

I am lucky never to lack for material, not just in terms of things to say, but subjects to tackle. So, no matter what it might look like, I am actually not trying to milk this platoon thing for all it is worth, cycling through regular hitters, switch hitters, and now pitchers the way one might preview different divisions or demographics. After finding that individual platoon splits for regular hitters are flimsy, studying their basis with pitchers seemed nothing less than a moral obligation. You can review the hitters’ study here, including technicalities about how various statistics were calculated.

The main reason I imagined hitters would differ from one another in the size of their platoon differences, namely that they would differ in how well they “saw the ball” and in how well they “hung in there,” doesn’t seem to apply to pitchers, and from that standpoint we might expect pitchers to show smaller individual platoon differences. On the other hand, pitchers may be able to affect how well same-side batters “hang in there” through their arm angle. Pitchers also seem to feature very different repertoires depending on whether they’re facing righties or lefties; to name just one element, some reserve their change-up for batters from the opposite side. The same pitch may also play very differently against righties and lefties. Think of how effective inside sliders are when they come from a pitcher throwing from the opposite side of the batter, for instance. The nuances that could create individual platoon profiles for pitchers are seemingly endless, and even someone like me, somewhat lazy about taking in these types of details, can build a strong case.

To be clear, the hitters’ study actually suggested that individual platoon effects do in fact permeate the individual record, but, with the exception of strikeouts, they are much smaller than most would believe. Much of the variation in performance from player to player in baseball is random, but the surprise was learning that individual platoon effects are mostly random. I combined the 2012 - 2015 seasons and the 2016 - 2019 seasons and found that platoon batting average differences in the two periods correlated at just .17. For Isolated Power, the platoon correlation was .15. Walk rate platoon difference had a higher correlation, .28. Strikeouts were the only category with the kind of reliable player-to-player difference I had expected: a .50 correlation.

Who Is in This Study?

In studying pitchers, I kept the template I used for hitters, combining the 2012 - 2015 seasons and the 2016 - 2019 seasons and correlating the statistics between the halves. One concern I had was that pitchers seem to work so hard at shaping their pitches and integrating analytics that they can metamorphose from season to season. Would putting four seasons together then be problematic? To get the requisite sample size in plate appearances, however, I doubted I had much choice. And 2012 - 2015 really precedes the current analytics age, even if 2016 - 2019 doesn’t. There didn’t use to be so much evolution in pitch shaping. I was also worried about how many pitchers I would be able to find who compiled records in both halves, as it is my perception that pitching arms are regularly snapping in two, and pitchers who last more than four seasons are lucky to do so.

Arguing in favor of keeping the same definition of periods I used in the hitting study was that I had gone to great lengths to analyze whether righties and lefties faced different platoon environments in the two halves. I concluded that they did not, to any real extent. This therefore allowed me to augment my sample size and consider platoon changes from right-handers and left-handers as one and the same. Changing the periods even a little would have meant that I would have needed to examine this assumption with the same thoroughness this time around, and if I had found a difference between righties and lefties comparing the periods, I would not have been able to combine them in analysis.

Then, too, the size of the correlations is almost certainly arbitrary, and partly a function of the number of plate appearances that go into each half. The idea of keeping the arbitrariness the same and facilitating comparison between hitters and pitchers as much as possible was appealing.

Having said this, while I did mandate at least 400 plate appearances facing right and left per time period, just as I had with hitters, the size of the correlations is affected by the actual number of plate appearances involved. A 400 minimum in both cases does not mean that the hitter and pitcher datasets were equally robust. We know that starting pitchers outrun hitters in plate appearances when both are playing regularly, for instance, although this fact does not mean that the pitchers who qualified pitched to more hitters on average than the hitters who qualified faced pitchers, if pitchers tended not to stay around as much as hitters, or if relievers entered the dataset, etc.

In any event, the net caught 118 pitchers this time around, compared to just 75 in the hitters’ study. In my mind, this is good first and foremost because it means that the study truly represents Major League Baseball at this time. We could be said to be looking at a set of pitchers, not a subset. It is also true that the higher your n is, the more precise your correlations are: they have narrower confidence intervals and more of a chance to be significant, which never hurts. When we’re talking about 118 vs. 75, though, the gain is modest, because confidence intervals narrow in proportion not to sample size but to the square root of sample size. But if 75 players were sufficient before, the sample is certainly more than adequate this time.
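The square-root point can be checked with the standard Fisher z-transformation for a correlation’s confidence interval. The sketch below is an illustration, not part of the original analysis; it plugs in r = .24 (the OPS consistency correlation reported later in this post) and uses 1.96 as the two-sided 95% critical value. The standard error, 1/√(n - 3), shrinks only with the square root of the sample, so going from 75 to 118 pitchers cuts it by only about a fifth.

```python
import math

def r_confidence_interval(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)          # Fisher z of the observed correlation
    se = 1 / math.sqrt(n - 3)  # standard error falls with the square root of n
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

for n in (75, 118):
    lo, hi = r_confidence_interval(0.24, n)
    print(f"n={n}: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```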

It is inevitably a very right-handed dataset. To begin with, while left-handed hitters accounted for 42% of plate appearances from 2012 - 2019, left-handed pitchers were on the mound for only 28% of plate appearances. Compounding the difficulty of studying left-handed pitchers, the hitting team has more flexibility to control platoon match-ups, and when left-handed pitchers pitch, the rate of left-handed hitters drops to 29%. So, with the requirement of 400 plate appearances per period in each platoon situation treated as absolute, only 15% of qualifiers (18 of the 118) ended up left-handed. They tripped on that lefty-lefty requirement. The plate appearance minimum was a foundational piece of this study and a carryover from the one before, however, so I did not entertain modifying it.
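The qualification rule amounts to a simple filter. In this sketch the pitcher records and the nested dictionary layout are hypothetical, invented for illustration; the only thing taken from the post is the rule itself: at least 400 plate appearances against batters of each hand in each of the two periods. A pitcher missing a period or a platoon side is treated as having zero plate appearances there.

```python
MIN_PA = 400  # minimum plate appearances per period against each batter hand

# Hypothetical pitcher records: plate appearances by period and batter hand
pitchers = {
    "Righty A": {"2012-15": {"L": 1900, "R": 2100}, "2016-19": {"L": 1500, "R": 1700}},
    "Lefty B": {"2012-15": {"L": 380, "R": 1600}, "2016-19": {"L": 450, "R": 1800}},
    "Lefty C": {"2012-15": {"L": 520, "R": 2000}, "2016-19": {"L": 410, "R": 1500}},
}

def qualifies(record: dict, periods=("2012-15", "2016-19"), min_pa: int = MIN_PA) -> bool:
    """True only if the pitcher faced at least min_pa batters of each hand in every period."""
    return all(
        record.get(period, {}).get(hand, 0) >= min_pa
        for period in periods
        for hand in ("L", "R")
    )

qualifiers = [name for name, rec in pitchers.items() if qualifies(rec)]
print(qualifiers)
```

The hypothetical “Lefty B” shows how left-handers tend to fall out: plenty of right-handed batters, but one period just short on the lefty-lefty side.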

My fear that pitching in two four-year periods was unrealistic appears to have been unfounded. Verlander, Scherzer, Darvish, Greinke, and Wheeler were prominent names during both periods and were qualifiers, but so too were Kyle Gibson, Carlos Carrasco, Chris Archer, Jordan Lyles, Collin McHugh, and Jake Odorizzi. They don’t always float above the fray, but might well have qualified as well had the study expanded to the most recent time period. I also noted relievers Fernando Rodney, Kenley Jansen, and Tyler Clippard, and there were probably others.

Just as the hitting team’s hunting of the platoon advantage leaves lefty-lefty encounters rare, it pushes lefty-hitter-versus-righty-pitcher match-ups nearly to parity with righty-hitter-versus-righty-pitcher match-ups. Right-handed pitchers, therefore, qualified easily. Many surely met the platoon requirements of a period in just one season. Thirty-seven right-handed pitchers faced 400 left-handed batters in 2012, for instance. Even in 2019, when workloads had decreased, fourteen did so. With the two periods, the study does require that the player pitched extensively in at least two seasons. From a statistical standpoint, I am actually not the least bothered by some qualifiers pitching in just two seasons. If there is a problem, it is that pitchers pitched different numbers of seasons, and more consistency could likely be expected from those whose halves spanned less time. So, when I get to the summary analysis, I won’t be able to label the results and identify their exact meaning in the way I would like to. Luckily, I did stumble onto evidence that this variation in active years might not have made the difference I feared, and I will discuss that in the analysis.

Study Categories

My six holdover categories from the hitting study were Batting Average, OPS, Isolated Power, Home Runs per At-Bat, Walk Percentage, and Strikeout Percentage. (Before doing this work, by the way, it had never occurred to me that platoon pitching analysis forces us to analyze from the hitter’s perspective and to discard traditional pitching stats. What kind of sense does “E.R.A. versus left-handers” make, for instance? This must have been a bit of a leap for some fans when the need to analyze pitcher platoon performance first arose.) In the interest of being able to analyze at the most fundamental level, I added to the mix Batting Average on Balls in Play (BAbip), conveniently available on Baseball Reference. Then the FanGraphs Splits tool enabled me to download GB/FB ratio by platoon situation. I converted that to Groundball Percentage (a 2/1 ratio equals 66.7) and went on to figure the platoon differences just as I did with the other statistics.
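The conversion from GB/FB ratio to Groundball Percentage follows from dividing both grounders and fly balls by the fly ball count: GB/(GB + FB) = ratio/(ratio + 1). The function names below are hypothetical, and the platoon difference is assumed, for illustration, to be taken as same-side minus opposite-side, so a positive value means more ground balls against same-side batters.

```python
def gb_pct(gb_fb_ratio: float) -> float:
    """Groundball Percentage from a GB/FB ratio: GB / (GB + FB), scaled to 0-100."""
    # Dividing GB and (GB + FB) by FB turns GB / (GB + FB) into ratio / (ratio + 1)
    return 100 * gb_fb_ratio / (gb_fb_ratio + 1)

def gb_platoon_diff(ratio_vs_same: float, ratio_vs_opposite: float) -> float:
    """Hypothetical helper: ground ball % vs same-side batters minus vs opposite-side batters."""
    return gb_pct(ratio_vs_same) - gb_pct(ratio_vs_opposite)

print(round(gb_pct(2.0), 1))  # → 66.7
```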

The trend is that pitchers get more ground balls against same-side batters. MLB-wide, the differences by platoon situation and period ranged from 6.1% more ground balls by lefties against lefties from 2012 - 2015, to 1.3% more ground balls by righties against righties from 2012 - 2015.

The Platoon Terrain, and Issues in Combining

I had forgotten that the average platoon difference is considerably larger for left-handers than for right-handers. This holds whether we are talking about pitchers or hitters. Framed in terms of OPS from 2012 - 2019, the root cause is the .669 posted by left-handed hitters against left-handed pitching. Right-handed hitters fared much better against right-handed pitching, at .711. For right-handed hitters versus left-handed pitching and left-handed hitters versus right-handed pitching, there is essentially no OPS difference (.753 versus .751). (All of these numbers are for all of MLB, not just my dataset.)
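The size of the gap for each pitcher hand can be worked out directly from the four MLB-wide figures above. The arithmetic below is only a check on those quoted numbers: right-handed pitchers allow .751 to left-handed hitters versus .711 to right-handed hitters, a .040 gap, while left-handed pitchers allow .753 versus .669, a .084 gap, about twice as large.

```python
# MLB-wide OPS, 2012-2019, keyed by (batter hand, pitcher hand), as quoted above
ops = {("L", "L"): .669, ("R", "R"): .711, ("R", "L"): .753, ("L", "R"): .751}

# Platoon gap for each pitcher hand: OPS allowed to opposite-side hitters minus same-side hitters
rhp_gap = ops[("L", "R")] - ops[("R", "R")]  # right-handed pitchers: .751 - .711
lhp_gap = ops[("R", "L")] - ops[("L", "L")]  # left-handed pitchers: .753 - .669
print(f"RHP gap {rhp_gap:.3f}, LHP gap {lhp_gap:.3f}")
```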

Almost to a man (87% over 2012 - 2015, 85% over 2016 - 2019), right-handers walk left-handers more often than they walk right-handers (and I’ve excluded intentional walks). Judging by the weaker walk platoon tendency among left-handed pitchers, I think this is partly because the left-handed hitters who reach the majors are better at drawing walks than the right-handed ones. Walks are rather an exception, though. Right-handed pitchers do not generally show such reliable platoon splits. Their differences are particularly small when it comes to allowing home runs. The 100 right-handers allowed a home run percentage to left-handers just 0.1% higher than the home run percentage they allowed to right-handers over 2012 - 2015; from 2016 - 2019, the percentages were even. By contrast, the left-handed pitchers in the study showed a HR% platoon split of 1.0 over 2012 - 2015, and 0.9 over 2016 - 2019.

Ideally, right-handers and left-handers would have the same platoon differences, but this is a study correlating two periods, so the key to possible bias lies in whether the difference between the two types changes from one period to the next. For the most part, it does not. The home run statistics cited above are a good example. In all, for the eight categories under study, there was an average effect size difference of 0.50 between right-handers and left-handers over 2012 - 2015, a number that increased to 0.57 over 2016 - 2019. But the average difference between the 2012 - 2015 and 2016 - 2019 effect sizes, category by category, was just 0.24.
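The post does not say which effect size measure was used. As an assumption, the sketch below uses Cohen’s d with a pooled standard deviation, a common choice for comparing two groups, applied to hypothetical platoon differences for right- and left-handed pitchers. Computing d separately for each period, and then comparing the two values category by category, is the kind of check described above.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference (group_a minus group_b) using a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical platoon differences (e.g., strikeout rate) for each pitcher hand in one period
rh_diffs = [0.02, 0.03, 0.01, 0.04, 0.02]
lh_diffs = [0.05, 0.06, 0.04, 0.07]

d = cohens_d(lh_diffs, rh_diffs)
print(round(d, 2))
```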

As a fallback, I did all analyses separately for each hand anyway. I also took a category where the relative platoon difference between the hands changed somewhat from 2012 - 2015 to 2016 - 2019, strikeouts, and systematically adjusted the strikeout rates to eliminate the change. This moved the strikeout consistency correlation by only .01.

First Results, Consistency Correlations. Discussion, Stat by Stat.

Here are the consistency correlations, also shown for right-handers and left-handers when the groups are left separate. With the numbers established, I will then engage in a general discussion, liberally referencing the respective numbers in the hitting categories for comparison.

Batting Average .17 (RH .15, LH .20)

OPS .24 (RH .25, LH .02)

ISO .38 (RH .32, LH .37)

HRAB .33 (RH .28, LH .14)

BB Rate .34 (RH .32, LH .32)

SO Rate .58 (RH .58, LH .64)

BAbip .00 (RH .02, LH -.04)

Ground Ball Rate .74 (RH .72, LH .74)

Spurred by the surprising hitting results, my great desire was to learn whether individual pitcher platoon splits really exist as they are assumed to. Even with the benefit of these results, I think it’s a question that can be hotly debated. The answer requires a careful reading of the data. I may change my mind in the writing of this, but my current answer is that while pitcher platoon statistics appear only a tad more formful than hitter platoon statistics, that is basically a validation of them, since I think of hitters as having more control over outcomes.

Platoon OPS for pitchers carried over with only a .24 correlation. That’s both a fairly low correlation and essentially the same as for hitters (.23). OPS was the “bottom line” statistic I used here, so some would insist that the same randomness that dominates hitting platoon stats drives pitcher platoon stats as well.

I do see a real difference between pitching and hitting platoon consistency in terms of power, though. Isolated Power for hitters carried over at just .15; for home runs divided by at-bats, the correlation was .20. By comparison, the two power platoon correlations with pitchers were over .30.

Not to reprise the other post, but the power numbers with hitters are just very surprising. They suggest hitters keep the same swing against both types of pitchers, and home runs are just a matter of “running into one.” Or maybe what the numbers say is more nuanced, because we do see general power platoon differences; they are just fairly uniform from player to player. It’s a weird picture; I don’t see any way around that.

We get a big clue as to what may be behind the pitcher power platoon correlations looking at the gargantuan ground ball platoon correlation of .74. Within a period, the correlations between the two power stats and ground ball percentage varied from .46 to .57. A home run is by definition “not a ground ball,” but I assume the correlation between the categories would remain fairly large even with the actual home runs removed. The platoon ground ball correlation probably reflects the different pitch mixes pitchers use against left and right. We know from the means that they are generally more successful at getting same-side hitters to hit ground balls, although this is a safer bet for left-handers than right-handers. But the consistency correlation in fact speaks to something else: particular pitchers differentiate themselves by having bigger ground ball platoon splits than others. The correlation coefficient compares pitcher to pitcher, and doesn’t care what the means are. So, some pitchers have repertoires that cause big changes in their ground ball rate depending on whether they’re facing lefties or righties, and some don’t. Or, perhaps it’s more accurate to say that some pitchers change their repertoires more than other pitchers, and the result is a bigger difference in ground ball rate. Or it may sometimes be true that pitchers change their repertoires equally, but, starting with different offerings, the changes have different or even opposite effects. Regardless, it all results in big and consistent differences from pitcher to pitcher.

One would certainly have thought the difference in power platoon consistency would have led to more OPS consistency for pitchers than hitters, since power is a part of OPS, but it didn’t. It’s puzzling, but perhaps just one of those things. The platoon batting average consistency for pitchers was just .17, and that certainly hurt platoon OPS consistency. The .17 was exactly the same as for batters. When it comes to BAbip and pitchers, it’s a given that there just isn’t any juice to squeeze. Add this .00 correlation to your anti-BAbip ammunition. Since BAbip Allowed is just a function of luck, it stands to reason that it didn’t show up filtered through the nuance of the platoon statistic, either.

There is much commonality in the hitting and pitching platoon correlations. That wouldn’t be a surprise if the numbers were all around 0, but that’s not the case. Because the mechanisms behind hitter and pitcher platoon consistency seem very different, I find the similar numbers by category surprising. It’s possible the commonality says something about which statistics are generally reliable, and not about hitter- and pitcher-platoon dynamics being similar. This is all a transition to noting that the strikeout percentage correlation for pitchers was .58. The category had been the one ray of light for the theory of big individual hitter platoon differences, with a .50 correlation. Any of the basic arguments for individual pitcher platoon differentials in general could apply to strikeouts, with different arsenals depending on the hitter’s side of the plate and different effects of arm angle coming to mind.

Walk rate was formful for pitchers: a .34 correlation. You’ve probably never seen this cited as part of a platoon split, but perhaps you should have. From 2012 - 15, Lance Lynn walked 5.5% of righties but 12.1% of lefties. From 2016 - 19, his righty/lefty percentages were 6.4% and 11.4%. Zack Wheeler’s splits were right behind Lynn’s. The separate pitching playbooks likely factor in here: different pitches are used, and some are easier to control than others. A pitcher works one side of the plate against one type of batter and the other side against the other type, and he may command one side better than the other. There are a million possible theories (although most are probably bogus, since we’re not looking at a huge correlation). Some pitchers may make more of the platoon difference than others and nibble more (although that doesn’t necessarily strike me as something that would hold up over time). It’s also logical that walk rate could simply reflect a pitcher’s general mastery in pitching to left and right, and not his control per se; part of not walking hitters is getting them to chase pitches. But because walk platoon differential has little to do with Isolated Power differential (correlations of .09 and .10 in the two periods), and a general mastery factor for limiting batting average is hardly emerging at all, I don’t think this idea has much credence. Platoon walk rate seems close to separate for pitchers. Note that, again, this is a correlation very much in line with its hitting counterpart, which was .28. But with walk consistency, it again seems to me like a case of hitters and pitchers reaching the same point by different paths.

Comparing Correlations for Righties and Lefties

A comparison of the separate righty/lefty correlations suggests that these processes, whatever exactly they are, work very much the same for both types of pitchers. And given that the left-handers essentially had to be active for more years than the right-handers, because of the number of innings needed to face 400 left-handers, yet produced eerily similar correlations, the case can be made that a pitcher’s platoon profile endures. I should say that the correlations are peculiarly similar, and almost surely only coincidentally so; statistics tell us that an n of 18 does not allow for any kind of precision. For instance, we cannot conclude at the .05 significance level, even using a one-tailed test, that the platoon Isolated Power correlation of .37 for left-handers is above 0. So, to focus on the overlapping numbers is to capitalize on coincidence. Guilty as charged.
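The significance claim can be verified with the usual t test for a correlation, t = r√(n - 2)/√(1 - r²), with n - 2 degrees of freedom. The critical value of 1.746 (one-tailed, α = .05, 16 degrees of freedom) is hard-coded from a standard t table rather than computed, to keep the sketch dependency-free. For r = .37 and n = 18, t comes out near 1.59, short of the cutoff; the smallest r that would clear it is about .40.

```python
import math

def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT = 1.746  # one-tailed alpha = .05, df = 16, taken from a standard t table
n_lefties = 18

t = t_for_r(0.37, n_lefties)
# Smallest r that would reach significance: solve t_for_r(r, n) = T_CRIT for r
r_min = T_CRIT / math.sqrt(T_CRIT ** 2 + (n_lefties - 2))
print(round(t, 2), t > T_CRIT, round(r_min, 3))
```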

Predicting Platoon OPS from Other Platoon Differences, Intro

The picture that emerged in the hitters’ study was so different from what is assumed that it would have been natural to characterize it as “Individual platoon effects are bullshit.” Bill James must have accomplished dozens of these takedowns in the Baseball Abstracts. I don’t know that he ever did one precisely on clutch hitting, but that’s everyone’s go-to for research raising the specter of illusion. Maybe “hot hands” is a better example; James definitely did that one, and he was always overlooked in my academic textbooks in favor of later, redundant work.

But my critique was really a qualified one. My study could be more accurately characterized as “Individual platoon effects are bullshit EXCEPT for strikeouts.” The analogue with Voros McCracken is “Pitcher Hits Allowed are bullshit, except for strikeouts and home runs.” McCracken found that strikeout and home run rates did repeat from year to year, while BAbip didn’t. This line of thought holds that a whole category can be better understood by focusing on a smaller portion of it.

It seems a steeper hill to climb to show that strikeout platoon differential, standing up even while its statistical peers collapse, goes on to predict general platoon offensive performance as measured by OPS. The grounds for skepticism are, first, that strikeouts don’t have a ton to do with what kind of hitter someone is in general, and second, that individual platoon effects just have to be weaker than standard individual effects (McCracken wasn’t dealing with anything like a platoon effect). So the idea that strikeout platoon differential might be the key to future OPS platoon differential is dubious, but necessary to look into nonetheless. At least this time, for pitchers, I have found another platoon variable with even greater consistency than strikeouts: ground ball rate. So there are two candidates for predicting general platoon performance, or the possibility of using both.

Simple Correlations between Platoon OPS and Variables from Different Time Periods

Before working with the pitchers’ data, I revisited the hitters’ database, and I found that strikeout platoon differential was quite effective at predicting OPS platoon differential. The straightforward test is to correlate strikeout platoon differential compiled over 2012 to 2015 with OPS platoon differential compiled over 2016 to 2019. That produced a .35 r. But since this is essentially a laboratory, and not the real world, we are also allowed to go back in time. The correlation of strikeout platoon differential from ‘16 to ‘19 with OPS platoon differential from ‘12 to ‘15 came out similarly: .37. Translating the size of that correlation, hitter strikeout platoon differential can be characterized as providing useful information. Useful, as in predictive, and prediction is often the name of the game. And what really stands out is strikeouts are a whole lot better at predicting OPS than OPS itself, where the periods only correlated .23.

Since evaluation of pitchers often begins with strikeouts, while with hitters strikeouts are less a mark of quality than a description of style, one would guess strikeouts would be even more predictive with pitchers, even filtered through the platoon lens. But the most basic overview, the correlation of 2012 - 2015 strikeout platoon differential with 2016 - 2019 OPS platoon differential, did not conform to this expectation. Also included below are the correlations with walk rate, ground ball rate, and OPS itself from 2012 - 2015.

2012 - 2015 Variable Correlations with 2016 - 2019 OPS Platoon Differential

OPS .24

Ground Ball -.24

Strikeout -.18

Walk .01

I wasn’t sure whether to include the negative signs here, as sign can be confusing, and depends on how the variables are defined. Suffice it to say that, while the correlations were small, all were in the expected direction, or “right sign.”

These results seemed unlikely to be anomalous, but to be thorough I also ran the prediction in reverse, correlating statistics from the later period with OPS platoon differential from 2012 - 2015.

2016 - 2019 Variable Correlations with 2012 - 2015 OPS Platoon Differential

OPS .24

Ground Ball -.20

Strikeout -.20

Walk .10

So the finding was that, while platoon strikeout rate and platoon ground ball rate are very consistent, they are not powerful in predicting OPS platoon differential. Although the correlation is weak, you might as well stay with OPS platoon differential itself as a basis for prediction.

Modeling Platoon OPS, and Conclusion

But I couldn’t put a wrap on things until I had run a model with some or all of the variables in tandem. The advantage of models is related to the refrain of “correlation is not causation.” Even in the unusually precisely defined world of baseball, a model is not sufficient grounds for declaring causation, but modeling multiple variables together does let us estimate each variable’s independent relationship with the outcome rather than a possibly spurious one. For instance, it’s easy to imagine strikeouts seeming to have a positive correlation with runs scored, just because players who strike out a lot also hit a lot of home runs. But if you have home runs and strikeouts together in a model as predictors, the boost from home runs is removed from the strikeout effect.

I modeled 2016 - 2019 platoon OPS for pitchers, keeping the same four predictor variables I’d examined in isolation. I could begin the summary from any number of places, but will start by listing the Betas, which are the rough equivalent of correlations in models. In order from largest to smallest:

Ground Ball -.316

Strikeout -.248

OPS .108

Walk -.062

So Ground Ball and Strikeout were no longer lagging behind OPS — they were trouncing her!

A hypothesis that could explain the change was quickly evident to me, sparked by the observation that Ground Ball and Strikeout correlated at -.33 with each other. Ground balls are good and strikeouts are good, and ground ball rate is very consistent, as is strikeout rate. But pitchers who get a lot of one tend to get a below-average number of the other. So, when you just go with zero-order correlations, the unseen related element exerts a negative pressure and cancels out the OPS benefit of the variable directly measured. When we model both variables, on the other hand, we can use the facts of the exact cases and distinguish between a high-strikeout, high-ground-ball pitcher and a high-strikeout, low-ground-ball pitcher, etc. When this is done, both variables grade out as a lot more useful. Meanwhile, the Beta weight for OPS, of a size that is not statistically significant, says we no longer need OPS at all.
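The same two-predictor formula can show how a negative correlation between two “good” predictors makes each look stronger in a joint model. The sketch below plugs in the rounded correlations reported in this post (ground ball -.24 and strikeout -.18 with 2016 - 2019 platoon OPS, and -.33 between the two predictors, assuming that figure refers to the 2012 - 2015 predictors). It is a two-variable approximation, not the four-variable model whose Betas are listed above, so its weights (about -.336 and -.291) should not be expected to match those exactly; what it does show is both weights growing larger in magnitude than their zero-order correlations.

```python
def two_predictor_betas(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Standardized regression weights for two predictors, computed from correlations alone."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

# Rounded correlations reported in this post (assumed: 2012-15 predictors, 2016-19 platoon OPS)
r_gb_ops, r_so_ops, r_gb_so = -0.24, -0.18, -0.33
beta_gb, beta_so = two_predictor_betas(r_gb_ops, r_so_ops, r_gb_so)

# Variance explained by the two-predictor model: sum of beta times zero-order r
r_squared = beta_gb * r_gb_ops + beta_so * r_so_ops
print(round(beta_gb, 3), round(beta_so, 3), round(r_squared, 3))
```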

Note that I defined strikeouts and ground balls so that they are not mechanically linked. It is not the case that they have a negative correlation because all outs come out of the same pie, so that more of one kind means less of another. Ground ball percentage, first of all, includes all batted balls, including base hits. Second, it is just ground balls/fly balls, with no reference to strikeouts. Felix Bautista, with his 18 strikeouts per 9 innings, could theoretically have a 100% ground ball rate, the way I define it.

Strikeout rate and ground ball rate would not have supplanted OPS if they did not work better, but is there a way we can know just how much better? The model has a Multiple r. That can be compared to the simple r of OPS, which was .24. However, the Multiple r is slightly inflated because it has the benefit of an extra predictor behind it. The baseline or expected r of any model is not 0, and it grows with every predictor. So, the way to compare a model’s r to the simple r of OPS in the two time periods is through Adjusted R Square, which is adjusted for number of predictors. In this case, that translates to an r of .349. Given the strong repeatability factors found for both variables, and the .35+ correlation between strikeout rate alone and future OPS with hitters, one might have expected something higher, but the improvement over OPS alone is still about 50%.
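Two formulas sit behind this paragraph. For a model with k predictors and n cases, the R Square expected from pure noise is k/(n - 1), which is why a model’s baseline is not 0 and grows with every predictor. Adjusted R Square, 1 - (1 - R²)(n - 1)/(n - k - 1), is built so that this chance-level R Square adjusts to exactly 0. The sketch uses n = 118 from this study; the choice of k values is just for illustration.

```python
import math

def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2: penalizes R^2 for the number of predictors k, given n cases."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

def expected_null_r_squared(n: int, k: int) -> float:
    """Expected R^2 when none of the k predictors is truly related to the outcome."""
    return k / (n - 1)

n = 118
for k in (1, 2, 4):
    baseline = expected_null_r_squared(n, k)
    # Baseline R^2, the multiple r it corresponds to, and its adjusted value (which is 0)
    print(k, round(baseline, 3), round(math.sqrt(baseline), 3),
          round(adjusted_r_squared(baseline, n, k), 6))
```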

Sticking with my philosophy of running the models in each time direction to double my data, I found that ground balls and strikeouts also emerged as the most informative variables when 2012 - 2015 OPS was modeled from the later predictor variables. Strikeouts had the higher Beta this time, and the Betas were a bit lower. In the two-variable model, the Adjusted R came out to .283, a dip from .349 when the model was run in the opposite time direction. The t for OPS was .64, signaling that it was completely superfluous. The key is that it is below 1: a predictor whose t is smaller than 1 in absolute value lowers Adjusted R Square when it is added to a model.

I had found before that one could not stop with the zero-order correlations and dismiss variables on that account. Why was ground ball rate predictive of success, anyway, I wondered? I had assumed the reason was its inverse link to extra-base hits, which made pitchers who induced ground balls generally good pitchers. I therefore realized that I should run a model with Isolated Power and strikeouts, rather than just assuming ground ball rate would outperform ISO because of its higher repeatability.

After having done this, I determined that Isolated Power can indeed be a useful variable. It is definitely more useful than OPS itself in models, and it is particularly useful if you don’t have ground ball rate. Whether I used OPS 2012 - 2015 as the outcome variable or OPS 2016 - 2019, Isolated Power when paired with strikeout rate was a statistically significant predictor. Strikeouts and ISO actually worked a tiny bit better in predicting OPS 2012 - 2015 than strikeouts and ground balls did. But in comparing the OPS 2016 - 2019 models, the one with ground balls had a decided advantage, and overall the average ground ball model had an Adjusted R of .316, compared to .283 for the average ISO model. I found as well that the influence of ISO vanished when it was paired with both strikeouts and ground balls in predicting OPS 2016 - 2019.

After reviewing the ISO models and collecting more correlations, I actually don’t think ISO and ground balls are effective variables for the same reason. They don’t both belong as interchangeable choices in a list of sides, you might say. Although the two correlate with each other at about .5, ISO, unlike ground ball rate, hardly correlates with strikeouts at all. ISO seems to be helping as a predictor partly because of its zero-order correlation with OPS in the other period, which averaged .25. That’s right: Isolated Power, which as a part of Slugging Average is just a part of OPS, predicts future OPS better than OPS itself. That doesn’t seem to make sense, and it may suggest that the OPS repeatability correlation is anomalously low. If you can look past that correlation and its parity with its hitting counterpart, you confront a mountain of evidence in favor of a greater role for individual pitcher platoon effects.

Even being of this opinion, it is not an accident that I confined my headline for this post to the topic of individual pitching platoon splits instead of offering a perspective or finding. I knew full well that a headline that individual pitcher platoon splits should be ignored or greatly preferred to hitter platoon splits might have arrested those yawns set off by an academic title before they reached full dentist-cooperating dimensions, but to provide one would not have been true to the data. For their complexity, the data are no less interesting, however.

Baseball Math, Baseball History, and Whatever Else