Saturday, January 5, 2008

More randumb thoughts…

I’ll open with some thoughts about Thursday’s points. To begin with, I’d like to expand on some things discussed with Crashburn Alley’s always erudite Bill Baer both here and at Ball-Hype. Bill is my unofficial mentor to Phillies phandom (and helpful guide filling gaps in my various sabermetric deficiencies) since I’m a recent convert. Suffice it to say, I value his opinion and want to re-post some of the points under consideration.

On TPoSGD:

He: Did I just have a fun time reading about why I'm wrong?

To defend Sabermetrics, though, I don't think they aim to establish a certain formula for success, it's simply a byproduct. Sabermetrics are just more advanced ways of discerning value in a player, and noticing the qualities of the most valuable players (note: not Most Valuable Players, tee hee) gives us an indicator on what may be the most successful way to build a ball club.

And Billy Beane's A's were built not around using numbers to get the best players; rather, they were built with the restriction of a low budget and the idea of finding value in areas other teams tended to ignore, such as on-base percentage. That's why you see the A's of the early 2000's stocked with OBP's between .350 and .400 -- all above league average.

You're too right when you say that there are events that go unnoticed in the box scores. But two items about that:

1) Sabermetrics don't claim to take into account all factors of a baseball game. Simply, they just use what is logged either in the official box scores or in the accumulated data from a host of sources.

2) The items that don't get logged in statistical tables are few and far between, especially with the advent of the Pitch F/X system.


Me: There’s no need to defend sabermetrics. You’re not wrong. I’m a big believer in it myself, and I agree with most of its precepts. It has increased understanding of the game, and I’m hugely grateful for the new frontiers of research it is traversing. My only real issue with it is that baseball is not entirely quantifiable. You acknowledge this yourself--a lot of sabermetricians do not. To me, it’s not unlike the classic gag ‘Who are you going to believe--me or your own lying eyes?’

I know what I saw over 162 games. When people who didn’t have the same point of reference tell me that you should always let guys like Sal Fasano, Jason Phillips, Hector Luna etc. swing away in any and all situations, it doesn’t ring true. I noticed a problem in May, and folks were telling me early, often and repeatedly that the Jays were handling things correctly, only for that approach to cost the Jays runs and games from June through the end of the season.

When the Jays finished as one of the lowest scoring teams in the AL and I was still being told that it was the correct approach … well, you can see where I might have a problem with that. As I wrote throughout last season, the philosophy utilized by the Jays was based on a certain set of circumstances: a batting order of a healthy and productive Reed Johnson, Lyle Overbay, Vernon Wells, Troy Glaus, Frank Thomas, Alex Rios, Aaron Hill, Gregg Zaun and Royce Clayton (heh). They had the personnel at the beginning of the season for that. When the roster and expectations changed due to injuries and slumps, Toronto never stopped to reassess things. Even though they no longer had the pieces in place, they continued as if they had. It would be like the 1985 Cardinals losing Willie McGee, Vince Coleman, Andy Van Slyke, and Ozzie Smith to injury, replacing them with Matt Stairs type fill-ins, and still playing the speed game.

As I wrote, baseball is probably the game with the most variables to it. There is no one matrix of any kind that can capture all of them.

Mathematics is the ultimate truth: 2+2 will always equal four. However, to paraphrase Jimmy Dugan (with a slight alteration) “There is no ultimate truth in baseball.” It’s the classic square peg in a round hole conundrum.

As I said, I’m a big believer in sabermetrics, but when you’re using the perfect, flawless system that is the study of mathematics to assess a uniquely human endeavour in all its flaws and quirks, there is going to be a lot of leakage around its boundaries.


You can see here why I like as many data points as possible. As I mentioned, I’m not a traditional stats guy. I do understand and appreciate the study of sabermetrics. Like many, I grew up reading Bill James’ abstracts in the 1980’s and while I do not fully comprehend all of it (due to insufficient math skills) I do understand what James was (and is) trying to do.

Bill James’ work back then (and today) wasn’t about studying statistics as much as trying to glean as much extra data about the game and quantifying it in an understandable fashion. James was looking at numbers through the prism of the game--not looking at the game through the lens of statistics.

Put another way, he was letting baseball define the numbers--not allowing the numbers to define the game.

For example, all a run means is that a base runner crossed home plate. An RBI means that the batter made contact (or reached base) and in doing so, allowed a base runner to cross home plate.

As we all know, the game is much more complicated than that. There are three other bases where significant activity takes place, and what happens there affects run scoring. It’s the same defensively: while it all boils down to catching and throwing, a lot of different things can occur while this is going on, and these are the things to which James was trying to assign appropriate value.

James (and others) view themselves as not just teachers--but students of the game. Unfortunately, some of James’ disciples demonstrate through their actions that they view themselves only as teachers--setting the ignorant masses straight on matters pertaining to baseball. The thing is, numbers tell us more about what has happened than about what will happen.

As Yoda once said: “Always in motion the future is.”

While the numbers do have a degree of predictive value, the accuracy of these predictions depends on how many variables have been introduced into the situation under consideration since it occurred on earlier occasions. This is why sample size is so important. The more times a given situation occurs, the more examples become available for study (including the inevitable variables that can potentially come into play) and the more predictive value it will possess.
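For the numerically inclined, here is a rough sketch of why sample size matters so much. It assumes each occurrence of a situation is independent and uses a made-up spread of 1.0 runs per occurrence (`ASSUMED_STD_DEV` is purely illustrative, not a measured figure). Under those assumptions, the uncertainty in an observed average shrinks with the square root of the number of occurrences.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of an average taken over n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    return std_dev / math.sqrt(n)


# Hypothetical spread in runs scored after a given base/out situation.
# This value is an assumption for illustration only.
ASSUMED_STD_DEV = 1.0

for n in (10, 100, 1000, 10000):
    se = standard_error(ASSUMED_STD_DEV, n)
    print(f"{n:>6} occurrences -> standard error of about {se:.3f} runs")
```

A situation seen only a handful of times leaves a wide margin around its average; one seen thousands of times leaves a narrow one.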

On Ball-Hype, Bill dropped another well-thought-out post:

John, let me throw some numbers at you using BaseballProspectus.com's Run Expectancy Matrix.

Runners on first and second, no outs: 1.51044 runs are likely to result based on every exact situation in 2007.

Runners on second and third, one out: 1.44328 runs are likely to result...

It is technically a bad idea to give up an out here, but if you break it down a bit, it becomes clearer.

Runner on first, no outs: .92599 runs.
Runner on second, one out: .72842 runs.

Runner on second, no outs: 1.18953 runs.
Runner on third, one out: .98694 runs.

However, this doesn't take into account who is batting. There is little difference between expected run production bunting versus not bunting.

When you have Sal Fasano batting, though, the odds of him getting a productive non-out hit are lower than the odds of making a non-productive out. Thus, bunting is probably the correct situation here.

So, when you say that Sabermetric philosophy states that bunting is the wrong idea here, it is with the caveat that it is not taking into consideration many factors, including who is batting, who is hitting behind the proposed bunter, who is pitching, who may be pitching, etc.

If there is some mathematical formula that can accurately show when intentional out-making is beneficial, I'm sure sabermetricians will adapt and include that.

Even as a big-time promoter of Sabermetrics, I pretty much agreed with everything you wrote. And I put on a pot of water to boil for Ramen noodles, and I was so into reading this article, that I nearly forgot about it and that outcome wouldn't have been so pleasant.
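
To make the comparison in Bill's numbers explicit, here is a small sketch that subtracts the "before" run expectancy from the "after" one for each successful sacrifice he lists. The values are exactly the ones quoted above; the dictionary layout and the `bunt_cost` helper are my own naming for illustration, not anything from Baseball Prospectus.

```python
# Run expectancy values as quoted in Bill's post (2007 season).
# Keys are (runners on base, outs); the layout is an illustrative assumption.
RUN_EXPECTANCY = {
    ("1st and 2nd", 0): 1.51044,
    ("2nd and 3rd", 1): 1.44328,
    ("1st", 0): 0.92599,
    ("2nd", 1): 0.72842,
    ("2nd", 0): 1.18953,
    ("3rd", 1): 0.98694,
}


def bunt_cost(before, after):
    """Change in expected runs when a sacrifice moves the state from before to after."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before]


SACRIFICES = [
    (("1st and 2nd", 0), ("2nd and 3rd", 1)),
    (("1st", 0), ("2nd", 1)),
    (("2nd", 0), ("3rd", 1)),
]

for before, after in SACRIFICES:
    change = bunt_cost(before, after)
    print(f"{before[0]}, {before[1]} out -> {after[0]}, {after[1]} out: {change:+.5f} runs")
```

Every one of these changes is negative for an average lineup, but as Bill says, none of them accounts for who is actually at the plate.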


Now you know why I respect Baer’s opinion--he knows far more about sabermetrics than a poseur like myself yet comprehends that there are still frontiers to be explored. I do respect and understand the importance of Run Expectancy. As Bill notes, there are variables that have to be considered. Not too many teams have had the misfortune of trying to contend in the DH league with a great many games where the bottom of the lineup is replacement-level or lower.

This is what was lost on many last summer: your traditional Run Expectancy studies were not conducted assuming a potentially contending American League club (read: with a DH available) with three batters whose aggregate on-base and slugging averages were south of .300 batting 7-8-9.

I was of the opinion that Run Expectancy was of little use in getting a grasp on the Blue Jays’ 2007 season due to the variables involved. By watching the games I could see what was not working, and I was trying to find something that might work. The Jays had three outmakers; outs only produce runs when there is a runner at third with fewer than two out and the out is the result of (certain types of) contact or a passed ball third strike. That being the case, when the bottom of the lineup was due up, the Jays’ best bet to get run production out of it was to make sure there was a man on third (or second and third) with fewer than two out. All of the fireworks of ‘07 surrounded trying to find the best way to reduce the damage the bottom of the lineup was inflicting on run scoring.

Sadly, many trotted out Run Expectancy as the miracle cure, not realizing that it is based on what has happened--and not necessarily what will happen. Had they pointed to a study breaking down man on second (or first and second), nobody out, with three .229/.280/.295 batters due up, that demonstrated swinging away maximizes scoring, with a sufficient sample size to have a degree of predictive value--that would be one thing. Had I insisted they produce such a thing, they would have rightly told me that there haven’t been enough examples of this to have such information handy.

This is precisely why I felt a reassessment in offensive philosophy by the Blue Jays in 2007 was warranted--and that the answer didn't lie in Run Expectancy in this particular instance.

Hawk up a HOFer…

As I mentioned during last week’s segment on ESPN 1450’s Mike Gill Show, I recently came around on Andre Dawson’s Hall of Fame worthiness. During my work with “the Dweeb Team” (BTW … Mr. Tango re-did the front page--check it out) I spent a lot of time looking back at those old Expos teams.

It brought back so many memories of him that I took another hard look at his career. For me, the No. 1 barrier I had regarding Dawson was his sub-par OBP. I guess I felt a bit of a hypocrite for minimizing Raines not reaching 3000 hits since he was looking to get on base by any means possible. He was a leadoff man doing his job--that job being getting on base and not worrying about how it looked on his stats.

Well, Dawson understood his job as a middle-of-the-order hitter to be driving in runs. Everybody looked to Dawson to hit, to drive in runs--not to wait out walks. I find it hard to fault Dawson’s OBP since it wasn’t viewed as a serious part of a run producer’s responsibilities.

Raines was supposed to get on base--that meant he walked on 1330 occasions and reached base 3977 times. It would have been selfish of Raines to have a mindset where he concentrated on hitting .300 and reaching 3000 hits even if it meant his career OBP ended up at .355 instead of .385. Dawson scored 1373 runs, drove in 1591, with 2774 hits--over 1000 of them for extra bases. He produced 2526 ‘eyeball runs’ while launching 438 HR, so it can be said he was an excellent run producer--he did his job.
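
For anyone who wants to check the arithmetic, here is a quick sketch. The 2526 figure matches runs plus RBI minus home runs (1373 + 1591 - 438), so the code assumes that is the definition of ‘eyeball runs’. The Raines estimate is a rough one that assumes the same number of opportunities at either OBP; the function names are mine, for illustration only.

```python
def eyeball_runs(runs: int, rbi: int, home_runs: int) -> int:
    """Runs scored plus runs driven in, counting each home run only once.

    Assumed definition: it reproduces the 2526 figure quoted for Dawson.
    """
    return runs + rbi - home_runs


def times_on_base_at(obp_actual: float, obp_other: float, times_on_base: int) -> float:
    """Rescale times on base to a different OBP over the same opportunities."""
    opportunities = times_on_base / obp_actual  # rough estimate, not official PA
    return opportunities * obp_other


# Dawson's career totals as quoted above.
dawson = eyeball_runs(runs=1373, rbi=1591, home_runs=438)
print(f"Dawson eyeball runs: {dawson}")

# Raines reached base 3977 times with a .385 OBP; what would .355 have cost?
raines_actual = 3977
raines_at_355 = times_on_base_at(0.385, 0.355, raines_actual)
print(f"Times on base lost at .355: about {round(raines_actual - raines_at_355)}")
```

Under those assumptions, a .355 career OBP would have cost Raines roughly 310 trips to the bases.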

The final point came about when I happened onto Richie Ashburn's page of Baseball-Reference. Fellow aficionados of Sean Forman’s magnum opus know how you can start clicking comparable players of somebody whose numbers you decided to look up. That’s what I was doing (heck, I can't even remember who I initially was looking up) when I came to Ashburn’s page. I felt he was overlooked for Cooperstown for a long time and was glad when he made it. Anyway, I never really looked hard at his career SLG before and it dawned on me it was below league average.

However, he was a leadoff batter so SLG wasn’t too important to his job description. Ashburn was a gifted fielder, a career .308 hitter, reached base 3815 times (.396 OBP in 9736 PA), and scored 1322 runs. He had enough plusses to outweigh that one minus. Well, like Ashburn, Dawson was a gifted fielder winning Gold Gloves in center and right field, topped 300 stolen bases, had five seasons of 25 HR/25 SB, with all those runs, RBI and extra base hits. So many positives and only one negative--it was then I felt that I had been too hard on Dawson.

I still feel Raines was the better player, but I’ve moved “Hawk” from my borderline HOFer (I tend to put borderline guys on my “nay” list) to HOF status. Should Raines make it on Tuesday, expect some noises about Dawson in future (assuming he falls short this year).

Best Regards

John

1 comment:

Bill B. said...

You speak too kindly of me. But it is true that I will never attempt to cook and read your articles at the same time.

With the variables, you can kind of educated-guess your way into logging which decisions are statistically favorable in each situation, but nothing specific.

You might be able to say that with a .750 OPS #7 hitter and a .690 OPS #8 hitter, you don't bunt with the #7 hitter unless it's late, you're down by one run, and there's a runner on second base with no outs, assuming that your #8 hitter is good at hitting fly balls.

But I don't think we'll ever be able to say definitively that you don't give up outs in specific situations.