Sunday, December 30, 2007

Pitching Statistic

A while ago I wrote about a stat I created called Runs Created Average (RCA). Looking back, the post I made was pretty rambling, and I've tinkered with the stat since then. Judging by the poll I made about it, a lot of you weren't reading this blog then either.

It has been shown that ERA, WHIP, hits, and other traditional pitching statistics are poor indicators of a pitcher's true skill and poor predictors of a pitcher's future performance. This is because hits heavily affect these stats, and hits are largely determined by the defense and by luck in where batted balls land. The writer of that article, Voros McCracken, then proceeded to figure out which stats were not affected by defense or batted-ball luck. He decided to use SO's, BB's, and HR's to create a new stat called DIPS ERA.

I first read about this in Baseball Between the Numbers. It showed how consistent several statistics were, and compared to the other stats used in DIPS (SO's and BB's), the consistency of HR's was very low. Later I also learned about another theory: that pitchers can't control their HR/FB%. That is, pitchers can control whether a ball is a FB or a GB, but not what that GB or FB ends up being. (The HR/FB% theory is not universally accepted.)

I thought it would make more sense to use batted ball types (GB's, FB's, LD's, and IFFB's) instead of HR's. So after looking around a little, I found the percentage of the time each batted ball type was an out, along with run values for the batted ball types, BB's, SO's, and HBP's.

Here is the percentage of the time each of these events was an out:

SO= 100%

BB= 0%

HBP= 0%

GB= 77.2%

FB= 81.2%

IFFB= 96.2%

LD= 18.8%


This can be used to find out how many outs the pitcher should have gotten. Convert the percentages to decimals and multiply each one by the number of times the corresponding event happened. This gives the number of outs that should have resulted from each event. Add these up and you'll get the total number of outs the pitcher should have gotten.

Here's the formula for Expected Outs Pitched:

EOP= (SO*1.00)+(BB*0.00)+(HBP*0.00)+(GB*.772)+(LD*.188)+(FB*.812)+(IFFB*.962)

You then divide this by three to convert it to Expected Innings Pitched:

EIP= EOP/3
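
For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the two formulas above. The out rates come straight from the list; the function names and the sample season line are made up purely for illustration and do not belong to any real pitcher.

```python
# Out rates per event, taken from the list above (as fractions).
OUT_RATE = {
    "SO": 1.000,
    "BB": 0.000,
    "HBP": 0.000,
    "GB": 0.772,
    "FB": 0.812,
    "IFFB": 0.962,
    "LD": 0.188,
}


def expected_outs(counts):
    """EOP: each event count times its out rate, summed over all events."""
    return sum(OUT_RATE[event] * n for event, n in counts.items())


def expected_innings(counts):
    """EIP: expected outs divided by three."""
    return expected_outs(counts) / 3


# Hypothetical season line, invented for illustration only.
sample = {"SO": 100, "BB": 50, "HBP": 5, "GB": 300, "FB": 200, "IFFB": 20, "LD": 100}

eop = expected_outs(sample)
eip = expected_innings(sample)
print(f"EOP = {eop:.1f}, EIP = {eip:.1f}")
```

Keeping the rates in one dictionary means that if better out percentages turn up, only that table has to change.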

Next you have to figure out how many runs should have been given up, and to do this you need the run value of each event. These values were determined from play-by-play data by finding how many runs a team would be expected to score before one of these events happened and how many after. The value of an event is the average change in expected runs that it causes.

Here is how many runs each event adds to the average situation:

SO= -0.287 runs
BB= 0.315 runs
HBP= 0.342 runs
GB= -0.101 runs
FB= 0.035 runs
IFFB= -0.243 runs
LD= 0.356 runs

You can use this to figure out how many runs above average a pitcher should have given up by multiplying each event's value by the number of times that event happened with that pitcher pitching. You then add them all up. I call this Expected Runs Surrendered Above Average.

Here's the formula with the values of each event put in:

ERSAA= (SO* -.287)+(IFFB*-.243)+(GB* -.101)+(FB*0.035)+(LD*0.356)+(BB*0.315)+(HBP*0.342)
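
The same calculation can be sketched in Python. The run values are the ones listed above; the function name and the sample season line are hypothetical and only there to show the arithmetic.

```python
# Run value of each event relative to the average situation (from the list above).
RUN_VALUE = {
    "SO": -0.287,
    "IFFB": -0.243,
    "GB": -0.101,
    "FB": 0.035,
    "LD": 0.356,
    "BB": 0.315,
    "HBP": 0.342,
}


def expected_runs_above_average(counts):
    """ERSAA: each event count times its run value, summed over all events."""
    return sum(RUN_VALUE[event] * n for event, n in counts.items())


# Hypothetical season line, invented for illustration only.
sample = {"SO": 100, "BB": 50, "HBP": 5, "GB": 300, "FB": 200, "IFFB": 20, "LD": 100}

ersaa = expected_runs_above_average(sample)
print(f"ERSAA = {ersaa:.2f}")
```

A negative result means the pitcher should have allowed fewer runs than an average pitcher facing the same number of batters.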

This doesn't give you the actual number of runs the pitcher should have given up, though. You need to add the ERSAA to the number of runs the average pitcher would give up in as many innings as the pitcher whose RCA you are figuring out. So next you multiply the Expected Innings Pitched by the Average Runs Scored (ARS) and add the Expected Runs Surrendered Above Average. Since EIP is measured in innings, ARS has to be expressed per inning (average runs per game divided by nine) for the units to work out. This gets you Expected Runs Surrendered.

ERS= (EIP*ARS)+ERSAA

Now finally you can figure out Runs Created Average by using what is essentially the formula for ERA.

RCA= (ERS/EIP)*9
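
Here is a rough Python sketch of these last two steps. It assumes ARS is given in runs per inning, which is what the units of the ERS formula require, and it uses 0.5 runs per inning as a placeholder because that makes an exactly average pitcher come out at 4.50. The function names and the example inputs are hypothetical.

```python
def expected_runs(eip, ersaa, ars_per_inning):
    """ERS = (EIP * ARS) + ERSAA, with ARS assumed to be runs per inning."""
    return eip * ars_per_inning + ersaa


def runs_created_average(eip, ersaa, ars_per_inning):
    """RCA = (ERS / EIP) * 9, the same shape as the ERA formula."""
    if eip <= 0:
        raise ValueError("expected innings must be positive")
    return expected_runs(eip, ersaa, ars_per_inning) / eip * 9


# Placeholder league average: 0.5 runs per inning, i.e. 4.50 per nine innings.
ARS = 0.5

# Hypothetical pitcher: 180 expected innings, 10 runs better than average.
rca = runs_created_average(eip=180, ersaa=-10, ars_per_inning=ARS)
print(f"RCA = {rca:.2f}")
```

Written this way, RCA is just the league average per nine innings plus the pitcher's ERSAA scaled to nine innings, so a pitcher with an ERSAA of zero lands exactly on the league average.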


As an example I'll compare the Yankees' and Red Sox's projected starting rotations for 2008. Unfortunately, I don't know of a site that tracks batted ball data for minor leaguers, and I don't want to use Joba's stats from the bullpen, so Clay Buchholz, Ian Kennedy, and Joba Chamberlain won't be used. I can only find RCA's for four Yankees starters, so I'll just compare the top four (according to each team's official website) for both teams. Keep in mind that 4.50 is average.


Yankees:

1) Chien-Ming Wang- 184 Expected Innings, 90 Expected Runs, 4.34 RCA

2) Andy Pettitte- 209 Expected Innings, 103 Expected Runs, 4.43 RCA

3) Phil Hughes- 72 Expected Innings, 35 Expected Runs, 4.38 RCA

4) Mike Mussina- 149 Expected Innings, 80.5 Expected Runs, 4.86 RCA

Team- 614 Expected Innings, 308.5 Expected Runs, 4.52 RCA


Red Sox:

1) Josh Beckett- 207 Expected Innings, 68 Expected Runs, 2.96 RCA

2) Curt Schilling- 154 Expected Innings, 73 Expected Runs, 4.27 RCA

3) Daisuke Matsuzaka- 206 Expected Innings, 92 Expected Runs, 4.02 RCA

4) Tim Wakefield- 187 Expected Innings, 94 Expected Runs, 4.52 RCA

Team- 754 Expected Innings, 327 Expected Runs, 3.90 RCA
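
The Team lines are just the individual lines added together, with RCA recomputed from the totals rather than averaged across the four pitchers. A short Python sketch of that step, using the (rounded) expected innings and expected runs listed above; the names `ROTATIONS` and `team_line` are made up for this example.

```python
# (expected innings, expected runs) copied from the rotation lists above.
ROTATIONS = {
    "Yankees": [(184, 90), (209, 103), (72, 35), (149, 80.5)],
    "Red Sox": [(207, 68), (154, 73), (206, 92), (187, 94)],
}


def team_line(pitchers):
    """Return (total innings, total runs, team RCA) for a list of pitchers."""
    innings = sum(eip for eip, _ in pitchers)
    runs = sum(ers for _, ers in pitchers)
    return innings, runs, runs / innings * 9


for team, pitchers in ROTATIONS.items():
    innings, runs, team_rca = team_line(pitchers)
    print(f"{team}: {innings} innings, {runs} runs, {team_rca:.2f} RCA")
```

Summing first and dividing afterward weights each pitcher by his innings, so a 72-inning pitcher counts for less than a 209-inning one.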


This obviously shows that the Yankees have basically an average rotation while the Red Sox have a very good one. In fact, it's not even close. However, Joba Chamberlain, Ian Kennedy, Clay Buchholz, and Jon Lester were all left out, either because they don't make the cut for the four-man rotation or because they don't have enough experience as starters in the majors, and they will likely be important players. That might help the Yankees a bit, but not enough.

Thanks for reading all this. I'd love some feedback on it :)

9 comments:

SG said...

I like it. I use linear weights as part of my pitching projections when figuring out how many runs we'd expect a pitcher to allow going forward, and I think this includes more useful information than that. You should run a weighted average of the last three years for a few people and see how predictive it ends up being.

SG said...

If I wasn't clear, I meant use 2004-2006 with a weight like 3/2/1 and see how closely it modeled what happened in 2007.

Rebecca said...

That is way past my powers in math, but I am very, very impressed.

Anonymous said...

Mike NYY - Very nice. Does the runs scored per batted ball type include errors made and other defense-related plays? Also, I have some stats issues I'm working on that I would love some feedback on. I actually have some posted on a website. Would you be up for a 'stats chat' with me?

OldYanksFan
(singledddd -at- Yahoo)

Anonymous said...

anonymous, yes it does.

Anonymous said...

Thanks SG, I'll do that as soon as I get a chance.

Anonymous said...

Mike, this is a much better statistic to use than ERA and, if you've read anything I've done over at Statistically Speaking (MVN) you'll see that I just completely disregard ERA as well as breaking down W-L records into four sub-categories.

The only "issue" I have with RCA is that it is a season-as-a-whole evaluator. For instance, if Tim Wakefield gives up 5 runs in three straight games it would be considered 3 average-bad starts. However, if he gives up 10 runs in G1, 0 runs in G2, and 5 runs in G3, he actually made 1 great start, one average-bad start, and one terrible start.

That being said, it is not really an "issue" but rather it all depends on how you like to evaluate. With my SP-Effectiveness Points system I am more about looking at each individual tree within the forest to properly evaluate a pitcher. You might be more interested in just looking at a season as a whole, which is fine, but just know that there will be things (more than some) that get lost in the equation.

End result, though, great math work, great idea, and a barometer much better than ERA.

Pizza Cutter said...

Mike: a good use of the data which fits with a lot of things that have been found out there, including my own stuff. Good work.

Mike N. said...

Thanks for the feedback. I appreciate it.