Monday, October 29, 2012

Giants prove best club rarely wins World Series

PBR - This should be obvious to even the most casual of fans, but the best team in baseball does not always win the World Series. In fact, since 2000 only two teams have led the majors in wins and gone on to win the World Series - the 2009 Yankees (103) and the 2007 Red Sox (96). 

If you haven't yet, I encourage you to take a look at my recent article about using mathematics and specific metrics to compare clubs across generations. True, winning the World Series is a major accomplishment, but a champagne-covered trophy and a parade are not requisites for greatness, nor do they guarantee a spot among baseball's all-time elite.

Not taking anything away from San Francisco, but do we firmly believe the Giants were the best team in baseball this year? Yes, the club may have been the "best" in the sense that they performed better than anyone else following the regular season, but were they the most talented club in baseball this year? 

Put another way, did the team with the greatest statistical ability win the World Series? 

The Greatness Number ranks the Giants as the 11th best club in the majors this season, meaning ten other teams were statistically stronger (NYY, WSH, TB, TEX, StL, ATL, OAK, CIN, LAA, CWS). If you ranked clubs by win totals the Giants ranked sixth.

It's a fact that the best team (statistically) rarely wins the World Series. Probability and chance variability play a significant role in determining the outcome of a playoff series. Knowing that, take a look at how the Giants compare to World Series winners dating back to 2000. The clubs are ranked according to their Greatness Number - the number in parentheses is the club's rank in baseball history.

  1. 2007 Boston Red Sox (54)
  2. 2002 Anaheim Angels (74)
  3. 2004 Boston Red Sox (100)
  4. 2009 New York Yankees (164)
  5. 2001 Arizona Diamondbacks (214)
  6. 2008 Philadelphia Phillies (361)
  7. 2010 San Francisco Giants (374)
  8. 2005 Chicago White Sox (431)
  9. 2011 St. Louis Cardinals (629)
  10. 2012 San Francisco Giants (654)
  11. 2003 Florida Marlins (758)
  12. 2000 New York Yankees (765)
  13. 2006 St. Louis Cardinals (1036)
This list provides context as to where the 2012 Giants belong in baseball history. Of more than 2,300 clubs, the '12 Giants rank 654th, good enough to belong in the top 30% but nowhere near good enough to belong in the discussion of an all-time great club.

- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.

Tuesday, October 23, 2012

A look at the worst season in Phillies history

PBR - If you've been following along with my recent posts, you may know I've established a mathematical formula that allows for the easy comparison of baseball clubs across eras using three important metrics. If you haven't been following, I strongly suggest you check out these two articles to get up to speed: [Part I | The myth behind earned run average and batting average] and [Part II | The creation of the Greatness Number].

Using the Greatness Number formula, .503+(OPS*.097)+(RDiff*.006)+(WHIP*-.053), I've ranked every Phillies team from 1900 through 2012 and found the 1945 club to be the worst in franchise history.

A two-team town in the 1940s, Philadelphia baseball fans were familiar with the bottom of the standings. The Athletics and Phillies finished last in their respective leagues in 1938, 1940, 1941, 1942, and 1945. Attendance floundered and interest, particularly in the Phillies, waned.

Sensing a need to win back fans, owner Bob Carpenter sponsored a contest in 1944 to give the Phillies a new nickname and by the start of the 1945 season the Phillies were alternately known as the Blue Jays. The franchise changed the color of the script on their uniforms and added a Blue Jay patch on their sleeve (see image above), yet the team continued to struggle and the new nickname failed to take off. The Blue Jays experiment concluded after the 1946 season.

The 1945 Phillies finished the season 52 games behind the Chicago Cubs and pieced together a woeful .299 winning percentage. The club ranked last, or next to last, in every relevant offensive and pitching category in the National League with Vince DiMaggio and his 18 homers the lone bright spot on the roster. The team finished with a -316 run differential.

To truly understand how bad the 1945 Phillies were let's look at how the club stacked up against every other club in major league history. To do this, we will look at how many standard deviations the club fell from every other club in the three metrics used in the Greatness Number:

  • OPS: .634 = -1.5 S.D.
  • WHIP: 1.59 = -2.01 S.D.
  • RDiff: -316 =  -2.7 S.D.
Combining the standard deviation totals the 1945 Phillies had a standard deviation score of -6.19, marking the club as one of the worst in major league history. The club's Greatness Number is .285, also among the worst in major league history.
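For readers who want to reproduce this kind of score, here is a minimal sketch of the calculation. It assumes a plain z-score (value minus the all-club mean, divided by the population standard deviation), with the sign flipped for WHIP so that a negative number always means "worse than average." The function name and the five-club samples are hypothetical and are not the real 2,300-club dataset. Note too that the three rounded values printed above sum to -6.21, so the -6.19 total presumably comes from unrounded figures.

```python
from statistics import mean, pstdev

def sd_score(value, population, higher_is_better=True):
    """Standard deviations from the mean, signed so negative means worse."""
    z = (value - mean(population)) / pstdev(population)
    return z if higher_is_better else -z

# Hypothetical five-club samples, for illustration only.
sample_ops = [0.650, 0.700, 0.750, 0.800, 0.850]
sample_whip = [1.20, 1.30, 1.40, 1.50, 1.60]

ops_score = sd_score(0.650, sample_ops)                           # low OPS is bad
whip_score = sd_score(1.60, sample_whip, higher_is_better=False)  # high WHIP is bad

print(f"OPS score:  {ops_score:.2f}")
print(f"WHIP score: {whip_score:.2f}")
print(f"Combined:   {ops_score + whip_score:.2f}")
```

Whether the study used the population or the sample standard deviation is not stated; with more than 2,300 clubs the difference would be negligible.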

One more funny note about the Blue Jays nickname - nearly 5,100 entries were received in the nickname contest, yet fans remained uninterested in the club and the new look. Ironically, some students at Johns Hopkins University in Baltimore took offense at the Phillies using Blue Jays as a nickname and logo because it had been their nickname and mascot for nearly 70 years. "It is a reprehensible act which brings disgrace and dishonor to the good name of Johns Hopkins University," read a letter written by members of the student body.

Over the next few weeks I'll be releasing more info from my Greatness Number study.

- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.

Monday, October 22, 2012

Phillies spring training details released; time to start booking flights to Clearwater

PBR - The Phillies' spring training schedule features 16 games at Bright House Field in Clearwater and a pair of exhibition contests at Citizens Bank Park against the Toronto Blue Jays on Friday, March 29 and Saturday, March 30.
 
The club unveiled their 2013 spring training schedule on Monday.
 
Pitchers and catchers report to Clearwater on Tuesday, February 12, with the full squad reporting Friday, February 15.
 
The Grapefruit League season begins with a home contest against Houston on February 23. Other opponents include Detroit, New York, Atlanta, Minnesota, Pittsburgh, Toronto, Tampa Bay, Baltimore and Boston.
 
According to the Phillies, spring-training 3-game packs will go on sale Wednesday, December 12. Individual ticket sales for spring training games will begin on Thursday, January 10. Tickets can be purchased at phillies.com or by phone at 215-463-1000.
 
All home game times are 1:05 p.m.
 
If you plan on making the trek to Florida, remember you may miss some players as the World Baseball Classic coincides with spring training, running from March 2 through March 19.
 
The Phillies open the regular season in Atlanta on Monday, April 1.
 
- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.
 

Saturday, October 20, 2012

The Greatness Number: Using mathematics and regression to compare clubs across eras

PBR - Through the previous regression analysis [click here to view the article] with wins as the dependent variable we now know WHIP, OPS, and Run Differential all have a strong relationship to success. Using these three metrics, it is my belief that a mathematical formula can be constructed to definitively determine the best club in baseball history.

I used wins in the previous analysis as the dependent variable because I simply wanted to see what metrics correlated strongest to winning. Ironically, there is a major drawback in using wins as a means to compare clubs across eras. Why? Think of the several ways the sport has changed over the last century, specifically with regard to the number of teams in each league and scheduling. MLB introduced the 162-game schedule in the early 1960s, but prior to that the schedule was 154 games and before that it was 140 games. Win totals may be greater now because clubs have more games to play, meaning a team with more wins isn't necessarily better than a team with fewer wins. The equalizer is winning percentage.

Winning percentage is found by dividing the games a club has won by the total number of games played. For example, say Team A won 90 games in 1928 (140-game schedule) and Team B won 92 games in 1955 (154-game schedule). Looking purely at the win total, some fans may say Team B was a better club, but is that true? If you divide 90 wins by 140 you get a winning percentage of .643 - if you divide 92 by 154 you have a .597 winning percentage. This example demonstrates why, when comparing clubs across eras, more wins does not necessarily mean a better club.
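The arithmetic in that example is easy to check. A quick sketch follows; the function name is mine, and Team A and Team B are the hypothetical clubs from the paragraph above.

```python
def winning_percentage(wins: int, games: int) -> float:
    """Wins divided by total games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games

team_a = winning_percentage(90, 140)  # 140-game schedule
team_b = winning_percentage(92, 154)  # 154-game schedule

print(f"Team A: {team_a:.3f}")
print(f"Team B: {team_b:.3f}")
print("Team A has the better percentage despite fewer wins:", team_a > team_b)
```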

To find the coefficients for the model I ran a regression analysis and used winning percentage as the dependent variable and WHIP, OPS, and RDiff as the independent variables. Remember, these coefficients will come from data from over 2,300 teams dating from 1900 through 2011. This means there is no bias towards clubs that played in a specific era.

The model: .503+(OPS*.097)+(RDiff*.006)+(WHIP*-.053).

The result of the equation, which I'll call a Greatness Number, is a number similar to winning percentage and a figure that normalizes teams across eras by focusing specifically on skills (OPS, WHIP, RDiff) and not simply wins. Once you run the equation you can take the results and fairly compare clubs across eras.

As an example, let's compare the 1935 Detroit Tigers and the 1955 Brooklyn Dodgers. For the Tigers, the equation is .503+(.801*.097)+(254*.006)+(1.44*-.053) = .660. For the Dodgers, the equation is .503+(.804*.097)+(207*.006)+(1.29*-.053) = .640. Given the results, the '35 Tigers have a higher Greatness Number than the '55 Dodgers, despite having fewer wins and a lower winning percentage. This means the '35 Tigers can be considered better than the '55 Dodgers.
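If you want to run the numbers yourself, here is a rough sketch of the formula as a function. One caution: taken literally, the printed run-differential coefficient of .006 pushes the Tigers' result above 2.0, so the sketch assumes a coefficient of .0006 instead. That is my assumption, not something stated in the study. With it, the Tigers and Dodgers come out near .657 and .637, close to but not exactly the .660 and .640 shown above, presumably because the published coefficients are rounded.

```python
def greatness_number(ops, rdiff, whip, rdiff_coef=0.0006):
    """Greatness Number sketch.

    rdiff_coef=0.0006 is an ASSUMPTION; the text prints .006, which
    does not reproduce the worked examples.
    """
    return 0.503 + (ops * 0.097) + (rdiff * rdiff_coef) + (whip * -0.053)

tigers_1935 = greatness_number(ops=0.801, rdiff=254, whip=1.44)
dodgers_1955 = greatness_number(ops=0.804, rdiff=207, whip=1.29)

print(f"1935 Tigers:  {tigers_1935:.3f}")
print(f"1955 Dodgers: {dodgers_1955:.3f}")

# The coefficient exactly as printed, for comparison.
as_printed = greatness_number(0.801, 254, 1.44, rdiff_coef=0.006)
print(f"1935 Tigers with .006: {as_printed:.3f}")
```

Either way, the ordering is the same: the Tigers finish ahead of the Dodgers.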

The relationship between the Greatness Number and winning percentage is 94.6% (per a correlation analysis).

Top 10 clubs in history ranked by Greatness Number

Top 10 clubs in history ranked by Winning Percentage
Over the next few days I'll unveil more details from my analysis of over 2,300 clubs, specifically information related to Philadelphia teams.

In the meantime, the 1939 Yankees carry the crown of the greatest baseball team to ever have played the game.

If we treat the Greatness Number as a winning percentage and assume the '39 Yankees and '27 Yankees played similar 162-game seasons, we can predict the '39 Yanks would finish ahead of the '27 Yanks by two games.

Formula: wins = GN*162, losses = 162 - wins

For the '39 Yankees: .765*162 = 124 wins and 38 losses.
For the '27 Yankees: .750*162 = 122 wins and 40 losses.
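
The same projection can be sketched in a few lines. The function name is mine, and rounding half up to the nearest whole win is an assumption that matches the totals above.

```python
import math

def projected_record(gn: float, games: int = 162) -> tuple[int, int]:
    """Treat the Greatness Number as a winning percentage over `games`."""
    wins = math.floor(gn * games + 0.5)  # round half up to a whole win
    return wins, games - wins

yankees_1939 = projected_record(0.765)
yankees_1927 = projected_record(0.750)

print("1939 Yankees:", yankees_1939)
print("1927 Yankees:", yankees_1927)
print("Gap in wins:", yankees_1939[0] - yankees_1927[0])
```
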
- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.

Tuesday, October 16, 2012

Exposing the myth of earned run average and batting average

By PATRICK GORDON | Managing Editor
@Philabaseball

Advanced statistical research has grown over the past decade with the progression of sabermetrics, yet traditional news outlets and baseball broadcasters remain reluctant to present new-age metrics. Paradoxically, the statistics mainstream baseball journalists and broadcasters interpret as important (i.e. batting average and earned run average) are mathematically insignificant in predicting team success. The simplicity of these timeless metrics distorts their actual relevancy. Unfortunately, journalists and broadcasters rely on the public’s understanding of rudimentary baseball statistics, so the faux-importance of nonessential metrics is prolonged.

The purpose of this article is twofold. First, using a regression analysis I will examine the relationship between wins, earned run average, and batting average. Second, I will identify specific metrics that correlate significantly to wins and team winning percentage. Once complete, I plan to use the findings of this study to construct a mathematical model using metrics strongly related to wins to appropriately rank baseball teams across leagues and generations.

Henry Chadwick, a New York-based sportswriter, created the earned run average statistic in the early 1900s. Chadwick’s goal was to devise a metric that apportioned responsibility for earned runs that scored during a contest. The growth of relief pitching in the early portion of the century furthered Chadwick’s cause, allowing him to create a statistic that accounted for multiple pitchers performing in a single game. Prior to earned run average, pitching effectiveness was simply accounted for by tabulating wins and losses.

The notion of a good earned run average has changed over time and is dependent on the era. For example, today an ERA below 2.00 is stellar, but it was a common mark during the dead-ball era of the 1910s. Other factors, such as the designated hitter and ballpark effects, influence how earned run average is observed.

Traditional journalists and broadcasters use earned run average as a means to indicate success for two primary reasons – familiarity and ease of calculation. As stated, baseball statisticians have been calculating earned run average since the early 1900s. Fans understand what constitutes a good ERA and a poor ERA. The formula is also simple to understand and compute: earned runs allowed multiplied by nine, divided by innings pitched ((ER*9)/IP).

Chadwick also created the batting average statistic. Working from a formula used in cricket, Chadwick altered it to better measure individual batting ability and found hits divided by at bats to be a suitable statistic (H/AB). Similar to earned run average, what constitutes a solid batting average is often reliant upon the specific generation. Today, a .300 batting average is considered excellent while .230 is regarded as poor.
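
Both formulas are simple enough to sketch in code. The function names and the sample inputs below are hypothetical, chosen only to show the arithmetic.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """(ER * 9) / IP."""
    return earned_runs * 9 / innings_pitched

def batting_average(hits: int, at_bats: int) -> float:
    """H / AB."""
    return hits / at_bats

# Hypothetical pitcher: 60 earned runs over 200 innings.
era = earned_run_average(60, 200.0)
# Hypothetical hitter: 150 hits in 500 at bats.
avg = batting_average(150, 500)

print(f"ERA: {era:.2f}")
print(f"AVG: {avg:.3f}")
```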

Similar to earned run average, forward-thinking baseball statisticians argue batting average is a poor metric in measuring offensive performance because it has a significantly weak relationship to runs scored.

Theoretically, the team with the highest winning percentage should rank among the top of the league in most statistical categories. Generally speaking, the majority of baseball fans look at batting average to indicate offensive success, and earned run average to indicate pitching success. True, both statistics can provide insight as to how a team performs, but analytically neither is a relevant predictor of winning. Both remain relevant because of their simplicity and familiarity. 

The purpose of this study is to disprove the myth that batting average and earned run average are strong predictors of wins. Mathematically eliminating these two metrics as variables will allow for the strengthening of game prediction and team-ranking models.

Using statistical data from FanGraphs, I will import statistics for every Major League Baseball club from 1900 through 2011. In total, the dataset will include information for 2,310 different teams. Using Excel, I will run a regression analysis using the following empirical model:

Dependent variable:
  • Wins

Independent variables:
  • OPS (+): on-base percentage plus slugging
  • Rdiff (+): run differential
  • ISO (+): isolated slugging
  • SPD (+): speed score
  • AVG (+): batting average
  • WHIP (-): walks plus hits divided by innings pitched
  • SO-P (+): strikeouts by pitchers
  • K/9 (+): strikeouts per nine innings
  • BB/9 (-): walks per nine innings
  • K/BB (+): strikeouts per walk
  • HR/9 (-): home runs per nine innings
  • FIP (-): fielding independent pitching
 
I will examine the regression analysis and use the t-statistic and significance value of each metric to gauge how strongly it relates to wins. Fielding statistics are not included in the analysis because the available statistics are questionable. Fielding percentage does not include range and other such factors, nor are accurate fielding statistics available before the 1950s.
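
To give a feel for what a t-statistic measures, here is a stripped-down sketch: a one-variable least-squares fit of wins on run differential, not the multi-variable Excel regression used in the study. The five data points and the function name are hypothetical. The t-statistic is simply the fitted slope divided by its standard error.

```python
import math

def simple_regression(x, y):
    """Fit y = intercept + slope * x; return (slope, intercept, slope t-statistic)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope, with n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope

# Hypothetical clubs: run differential and win totals.
rdiff = [-100, -50, 0, 50, 100]
wins = [70, 76, 81, 85, 93]

slope, intercept, t_stat = simple_regression(rdiff, wins)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, t={t_stat:.1f}")
```

An absolute t-statistic above roughly two is the usual rule of thumb for a relationship unlikely to be due to chance.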



Looking at the p-values, it is clear OPS, Rdiff and WHIP have a strong relationship to team wins. By contrast, AVG and ERA both have p-values significantly removed from zero.

You can also see OPS, Rdiff and WHIP are the only categories with absolute t-statistics greater than two, demonstrating a strong relationship between the three categories and team win totals. Confirming the relationship, they are also the only categories with p-values below .05. A p-value of .05 or less is conventionally taken to mean the observed relationship is unlikely to have occurred by chance.

To reiterate, the absolute t-statistics for OPS, Rdiff and WHIP are each greater than two and the p-values are below .05, indicating a likely relationship with wins. None of the other variables indicated a strong statistical relationship (i.e., p < .05) with wins. Ironically, ERA and AVG contribute little to the prediction of a team's win/loss record.    

With these results we now know two unique pieces of information. First, AVG and ERA are nothing more than functional statistics that simply carry the benefit of familiarity. Neither metric correlates strongly to winning. Secondly, OPS, Rdiff and WHIP should be emphasized by traditional media outlets as statistics that relate to success.

Compared to earned run average, WHIP more directly measures a pitcher’s effectiveness against the batters faced. The metric was invented by Daniel Okrent in 1979 and serves as a staple in fantasy baseball. Like ERA, WHIP is not adjusted for ballpark effects.

On-base plus slugging (OPS) is a metric rooted in sabermetrics that combines on-base percentage and slugging average. The statistic measures a player’s ability to reach base and hit for power. Unlike the traditional batting average metric that does not differentiate between a single and a home run, OPS accounts for power. The statistic became popular in the mid-1980s and has become more recognized within the past decade.

Run differential is exactly what it sounds like. This statistic is relevant only for a team and is calculated by subtracting total runs allowed from total runs scored.
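
For reference, the three metrics can be sketched as follows. The function names and the sample team totals are hypothetical and only illustrate the arithmetic.

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits, divided by innings pitched."""
    return (walks + hits) / innings_pitched

def ops(on_base_pct: float, slugging_avg: float) -> float:
    """On-base percentage plus slugging average."""
    return on_base_pct + slugging_avg

def run_differential(runs_scored: int, runs_allowed: int) -> int:
    """Runs scored minus runs allowed."""
    return runs_scored - runs_allowed

# Hypothetical team totals for one season.
team_whip = whip(walks=450, hits=1350, innings_pitched=1440.0)
team_ops = ops(on_base_pct=0.330, slugging_avg=0.420)
team_rdiff = run_differential(runs_scored=750, runs_allowed=680)

print(f"WHIP: {team_whip:.2f}")
print(f"OPS:  {team_ops:.3f}")
print(f"Run differential: {team_rdiff:+d}")
```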

Baseball is a game of tradition followed by self-proclaimed purists. Earned run average and batting average are simplistic metrics embedded in the fabric of the sport. I am not trying to render these statistics meaningless, but I do implore mainstream media outlets and broadcasters to cease the perpetuation of earned run average and batting average as vitally important statistics when better alternatives exist. 

- Patrick Gordon is the editor of the Philadelphia Baseball Review. Contact him at pgordon@philadelphiabaseballreview.com or @Philabaseball on Twitter.