Updated: August 29, 2010

A new debate (updated)

by Luke Jackson

My buddy and I were having what 99.4% of the world population would categorize as an extremely bizarre debate between a 19-year-old and a 20-year-old. In the wake of Trevor Cahill suddenly being thrown into the Cy Young conversation, we were discussing the merits of traditional pitching metrics (such as wins, ERA, BAA and WHIP) and more advanced metrics (such as FIP, xFIP and WAR).

Dave Cameron of FanGraphs weighed in on Thursday about the Cahill-for-Cy talk:

“After Trevor Cahill lowered his ERA to 2.43 last night, Buster Olney tweeted that his numbers made Cahill a top contender for the AL Cy Young award. Keith Law quickly responded, noting that Cahill was 31st among AL starters in WAR and had a 4.07 FIP, suggesting that Cahill was in no way a Cy Young candidate despite the shiny low ERA.

Olney and Law clearly approach the award from different angles. Buster is more traditional, and prefers to use the numbers that have always been the standard for evaluating pitchers. Keith just wants to reward the guy that he thinks pitched the best, and doesn’t care about the way things have always been done. But their discussion raises an interesting question: what role should our stats have in the Cy Young discussion?”

I should mention that I love Cahill. He’s a big guy, works very quickly, has plus command and induces a ton of ground balls. I saw Cahill, one of Oakland’s many young arms, at Camden Yards this year in May. I didn’t know much of anything about Cahill before the game started, but I sure knew about him by the third inning. I was impressed. So even if I mention that advanced metrics aren’t incredibly kind to Cahill, remember that I really like this kid now and going forward.

But what should determine a Cy Young Award winner? Surely, there’s no one overarching statistic that can tell us who should win the Cy Young, although I suppose WAR attempts to do just that. In the argument that I had with my buddy, I was much more for the advanced pitching metrics (to go along with ERA), while my buddy had a fascination with traditional pitching metrics, especially WHIP.

I told him repeatedly that WHIP, and BAA for that matter, are both meaningless because each weighs a swinging bunt single the same as a home run, and each statistic is heavy on luck: much of the time, it’s no fault of the pitcher that a batted ball fell in for a hit. WHIP is also a poor metric for pitching performance because its numerator and denominator are measured on different scales: walks and hits accrue on a per-batter basis, but the denominator counts innings. Needless to say, my buddy refuses to think WHIP is meaningless. In fact, he feels like it’s very meaningful, even though all of the evidence says otherwise.
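To make the bunt-single-versus-homer point concrete, here is a minimal Python sketch of WHIP (walks plus hits, divided by innings pitched). The two starts are made up for illustration, and `innings_pitched` is assumed to be a true decimal (so one-third of an inning is 0.333, not the box-score ".1").

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits, divided by innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (walks + hits) / innings_pitched


# Two hypothetical seven-inning starts, identical except for the kind of hit.
# Start A: two walks and five bunt singles.
# Start B: two walks and five home runs.
# WHIP only counts hits, so it cannot tell the two starts apart.
start_a = whip(walks=2, hits=5, innings_pitched=7.0)
start_b = whip(walks=2, hits=5, innings_pitched=7.0)

print(start_a, start_b)  # 1.0 1.0
```

The function never receives the type of hit, which is exactly the complaint: the stat has no way to weight a homer more heavily than a dribbler.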

My buddy then asked me if I thought OPS-against was a decent tool to use when evaluating pitchers as opposed to non-weighted statistics such as BAA. I said it was a lot better than BAA, because at least OPS-against begins to weigh positive outcomes for a hitter, but even then, there are much better tools to use to evaluate pitchers.

When I told him about FIP and how it theorizes that pitchers only have reasonable control over walks, homers allowed and strikeouts, he scoffed at me. He couldn’t believe that of all a batter’s base hits, only homers were accounted for. I told him that a ball in play falling in for a hit — or falling into a defender’s glove — has a ton to do with luck and the quality of the defense behind the pitcher. Therefore, by taking balls in play out of the equation, you can have a much better feel for how a pitcher has actually pitched rather than how well his defense fielded or how lucky the pitcher was. One can evaluate how well a pitcher has controlled the things he can control, or at least reasonably so.
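For anyone who wants to see the arithmetic, here is a rough Python sketch of the commonly published FIP formula: 13 times home runs, plus 3 times walks (and hit batters), minus 2 times strikeouts, all divided by innings pitched, plus a constant that puts it on the ERA scale. The constant changes from season to season; the 3.20 below is only a ballpark assumption, and the pitcher's line is invented.

```python
def fip(home_runs: int, walks: int, strikeouts: int, innings_pitched: float,
        hit_by_pitch: int = 0, constant: float = 3.20) -> float:
    """Fielding Independent Pitching.

    `constant` is an assumed ballpark value; the real one is recalculated
    for each season so that league FIP matches league ERA.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    weighted = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return weighted / innings_pitched + constant


# Hypothetical season: 15 HR, 50 BB, 150 K over 200 innings.
example = fip(home_runs=15, walks=50, strikeouts=150, innings_pitched=200.0)
print(round(example, 3))  # 3.425
```

Notice that singles, doubles and triples never appear as inputs. That is the part my buddy couldn't get past.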

My buddy said FIP is bull crap because to ignore all hits but home runs is idiotic. He also mentioned that taking home runs into account wouldn’t be fair, either, because of the difference in home parks around the league — which is when I mentioned that xFIP normalizes a pitcher’s home run rates. While FIP and xFIP theorize that pitchers have little to no control over whether balls in play become hits or not, my buddy thinks that pitchers definitely have reasonable control over whether a ball in play becomes a hit.
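As a rough sketch of what "normalizing home run rates" means: xFIP, as it is usually published, swaps a pitcher's actual home runs for the number he would be expected to allow given his fly balls and a league-average home-run-per-fly-ball rate. Strictly speaking that is a league-rate adjustment rather than an explicit park factor. The 10.5% league rate and the 3.20 constant below are assumptions for illustration, and the pitcher's line is invented.

```python
def xfip(fly_balls: int, walks: int, strikeouts: int, innings_pitched: float,
         hit_by_pitch: int = 0, league_hr_per_fb: float = 0.105,
         constant: float = 3.20) -> float:
    """Expected FIP: FIP with home runs replaced by expected home runs.

    `league_hr_per_fb` and `constant` are assumed values; both vary by season.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    expected_home_runs = fly_balls * league_hr_per_fb
    weighted = (13 * expected_home_runs
                + 3 * (walks + hit_by_pitch)
                - 2 * strikeouts)
    return weighted / innings_pitched + constant


# Hypothetical season: 180 fly balls, 50 BB, 150 K over 200 innings.
example = xfip(fly_balls=180, walks=50, strikeouts=150, innings_pitched=200.0)
print(round(example, 2))  # 3.68
```

Actual home runs allowed are not an input at all, so two pitchers with the same fly balls, walks and strikeouts get the same xFIP no matter how many balls actually left the yard.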

Personally, I’m not sure where I fall right now. I’m not sure if I’m on the sabermetric level of thinking, which seems to hold that no pitcher has any control over whether a ball in play becomes a hit, but I certainly believe luck and defense play a significant role in whether a ball in play becomes a hit. I just don’t know if luck and defense play the entire role. So let’s examine, shall we?

For example, a big ground ball pitcher like Cahill (56.0% groundball rate) would logically seem to induce more outs on balls in play than a pitcher who didn’t have as high of a groundball rate, right? Groundball pitchers would seemingly have a lot more success than someone living up in the zone, right? Wrong.

Cameron had another post about Cahill later Thursday:

“One comment that keeps arising, however, is about the correlation between Trevor Cahill’s BABIP and his sinker, specifically his ground ball rate. Several people assert that Cahill is inducing weak, easy to field contact by pounding his sinker at the bottom of the strike zone, and that’s why his BABIP is just .217. There are a few problems with this assertion, though.

We know that BABIP on groundballs is higher than on flyballs, as a ball is more likely to sneak between two infielders than it is to fall in front of an outfielder. In general, groundball pitchers will post higher than average BABIPs, not the other way around, though the effect is generally pretty small.

The other problem… well, we’ll just demonstrate it this way.

Trevor Cahill: 56% GB%, 14.9% LD%, 29.1% FB%, .217 BABIP
Justin Masterson: 62.3% GB%, 14.9% LD%, 22.8% FB%, .344 BABIP

The argument that this particular skillset is the driver of a low batting average on balls in play falls apart when you consider that Masterson, who gets more groundballs and has an identical line drive rate, is posting one of the highest BABIPs in all of baseball. We cannot just see two variables and assume that one is the cause of the other. Cahill has a high groundball rate, and he has a low BABIP, but there’s just no evidence that the former is driving the latter.”

So maybe having a dominating sinker doesn’t mean you’ll definitely have a lower than normal BABIP. Maybe whether a given ball in play becomes a hit really is all about luck and defense. But a pitcher like Roy Halladay, with the quality of his repertoire, would surely have more control over whether balls in play become hits than, say, Jeremy Guthrie would, right? Seems logical enough. Halladay has a ridiculous repertoire; Guthrie, average. Hitters would seemingly put much better swings on a Guthrie fastball than a Halladay cutter in on their fists, right? And those balls put in play off of Guthrie would then be harder hit balls than off of Halladay, increasing the chance that these balls fall in for hits off Guthrie, right?

Well, I just looked up the respective BABIP numbers for Halladay and Guthrie. Halladay sits at .301, right around where pitchers usually sit. Guthrie is, surprisingly, at a very low .269.

I’m blown away. Maybe it really is all about luck and defense. How else can you possibly explain Guthrie having a lower BABIP than Halladay, and by such a margin? How can one possibly say that a pitcher can influence whether balls in play become hits when Guthrie has a BABIP 32 points lower than Halladay’s, and when we have that Cahill/Masterson comparison?

The answer? We probably can’t. So does it mean that we look solely at the things pitchers can control when deciding the Cy Young winner? Do we just look at homers allowed, walks and strikeouts? It seems logical, but it also seems like a very incomplete picture.
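For reference, BABIP is usually computed as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. Here is a minimal Python sketch of that formula, plus the "points" arithmetic behind the comparisons above. The BABIP figures are the ones quoted in this post; the counting stats in the example call are made up.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int,
          sacrifice_flies: int = 0) -> float:
    """Batting average on balls in play (home runs are not 'in play')."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def gap_in_points(babip_a: float, babip_b: float) -> int:
    """Difference between two BABIPs in 'points' (thousandths)."""
    return round((babip_a - babip_b) * 1000)


# Hypothetical counting stats, just to exercise the formula.
sample = babip(hits=150, home_runs=15, at_bats=600, strikeouts=120,
               sacrifice_flies=5)

# BABIP figures quoted in this post.
halladay_vs_guthrie = gap_in_points(0.301, 0.269)
masterson_vs_cahill = gap_in_points(0.344, 0.217)

print(halladay_vs_guthrie, masterson_vs_cahill)  # 32 127
```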

I love FIP and xFIP when evaluating pitchers because they normalize luck, defense and, for xFIP, home run rates, but they don’t show us how many runs a pitcher has actually given up, just how many runs he should be giving up, or is likely to give up in the future. We get to know about a pitcher’s end results with good ol’ ERA, which everyone loves. I like to take a pitcher’s ERA and stack it up against his FIP and xFIP, where I can tell right away whether that ERA is likely to shoot up in the future or not. And we can also examine a pitcher’s BABIP when predicting his future performance.
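Stacking ERA against FIP is just a subtraction. Here is a tiny Python sketch using the Cahill numbers quoted at the top of this post (2.43 ERA, 4.07 FIP). How big a gap has to be before it means anything is a judgment call, so no cutoff is assumed here.

```python
def era_minus_fip(era: float, fip: float) -> float:
    """Negative means the ERA is lower than the peripherals suggest."""
    return era - fip


# Cahill's ERA and FIP as quoted earlier in this post.
cahill_gap = era_minus_fip(era=2.43, fip=4.07)
print(round(cahill_gap, 2))  # -1.64
```

A negative number like that is the sort of thing that makes me expect an ERA to climb; a positive one would suggest the opposite.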

I like to use WAR, as well, in order to study a pitcher’s value to his team, and here’s another stat I like to use: plain old innings pitched. In this day and age, with the heavy use of bullpens and the lack of starters like Halladay who can consistently go deep into games, those pitchers who can log seven to nine innings per start become uber-valuable to their team (and especially their team’s bullpen, which gets a break from having to pitch three or four innings). To me, the worthless statistics in examining pitchers are wins, WHIP and BAA.

But this brings up the biggest question of them all — should we just look at what should have happened in regards to these pitchers’ results when evaluating the Cy Young winner, or do we just look at what we know has happened (in terms of runs allowed)? Olney took a shot at this over on Twitter on Thursday after he tweeted on Wednesday that Cahill was a legitimate Cy Young candidate. In a series of five tweets from Olney:

“The premise that Cy Young candidates should be judged on what their numbers should be, rather than what they actually are, is amusing. That thought process certainly would have altered the results in 1961 — because I guess Roger Maris wasn’t supposed to hit 61 homers. The Dodgers, I guess, should not have won the ’88 World Series; Bucky Dent should have flied out to left. Lucky? Really? They did it. The # say Orel Hershiser wasn’t supposed to pitch 59 consecutive scoreless innings. DiMaggio wasn’t supposed to hit in 56 straight. Were the ’07 Rockies not supposed to make the WS, based on the numbers before their streak? C’mon. The # are what they are, until they aren’t.”

I’m not sure what Maris, the 1988 Dodgers, Bucky Dent, Orel Hershiser, DiMaggio or the 2007 Colorado Rockies have to do with the 2010 Cy Young race, but his point is clear. Olney feels like we should only measure a pitcher’s performance with what we know has occurred, not what may have occurred if that pitcher had a better defense or if that pitcher played in a friendlier home park. He feels like we should only go by what has actually happened. And by those standards, sure, Cahill, judging by how few runs he’s given up, should be an AL Cy Young contender.

But, personally, I can’t just ignore FIP, xFIP and WAR. I can’t penalize a Cy Young contender other than Cahill who may not have the kind of defense the Athletics have. I can’t penalize a contender whose luck obviously isn’t anywhere near Cahill’s, and I also can’t penalize another contender who doesn’t have the privilege to play in that pitcher’s paradise in Oakland. We know based on research that pitchers only have reasonable control over homers allowed, walks and strikeouts, so we can’t just ignore the advanced metrics that emphasize the aspects of the game that a pitcher can reasonably control. And one can’t ignore WAR, which attempts to measure the value of a pitcher to his team.

I was also having this Cy Young discussion with Daniel Moroz of Camden Crazies on Twitter. He said the following when evaluating Cy Young candidates:

“Use everything; weight appropriately.”

And you know what? As simplistic as it seems, he’s correct. But how much weight does a voter give to FIP and xFIP, as opposed to ERA? Do wins, WHIP and BAA hold any weight at all? (I hope not.) This is the battle Cy Young voters will have to grapple with. Cameron, in his FanGraphs article that I started this blog post with, wrote of the difficulty in weighing these metrics:

“So, just like I would not rely solely on ERA to make a judgment about who deserves the Cy Young award, neither would I rely solely on FIP. When trying to evaluate how a pitcher did in the past, ERA includes too many things that aren’t under his control, while FIP strips out too much. If I had to choose one or the other, I’d go with FIP over ERA, because I think it gets you closer to reality, but we don’t have to choose. We can look at the whole picture, and that’s what I suggest people do with their Cy Young picks.”

Do you know what the beautiful thing is? That we’re actually having this discussion. It’s the beauty of baseball. We can argue and argue and argue some more over statistics, and we can all come to different conclusions. It’s part of what makes baseball so special. And with time, more statistics are making their way into the mainstream, and we can have more to argue about.

For what it’s worth, the top three pitchers on my AL Cy Young ballot would be: 1) Felix Hernandez, 2) Cliff Lee, and 3) Francisco Liriano. My NL Cy Young ballot would be: 1) Roy Halladay, 2) Josh Johnson, and 3) Adam Wainwright.

UPDATE: I had Daniel Moroz of Camden Crazies look over this piece and tell me what he thought of it. He corrected my assertion that WHIP and BAA are worthless; he told me both stats are just flawed. Also, he corrected me when I wrote that pitchers have zero control over balls in play becoming hits. He said that one pitcher just can’t really control what becomes a hit any better than another pitcher. I just wanted to clear those things up.

Moroz also told me that I should have explained my Cy Young rankings, and he’s definitely right. One of the biggest reasons why I picked Halladay and Hernandez as my Cy Youngs was that both have pitched a tremendous number of innings (Halladay, 207.0; Hernandez, 204.1) and lead their leagues in that category by significant margins. The quality of those innings is also tremendous. Halladay is second in the NL in both ERA (2.22) and FIP (2.75), and is first in the entire major leagues in xFIP (2.89). Hernandez is third in the AL in both ERA (2.47) and FIP (3.03), and is tied for second in the AL in xFIP (3.26). Other contenders in some cases are better than Halladay and Hernandez in the categories of ERA, FIP and xFIP, but not significantly enough for me to outweigh the value of the number of innings that Halladay and Hernandez have logged.
