Deep, Backward, and Square: November 2009

30 November 2009

Going downhill quickly

This post started off as quite a mundane analysis of workaday stats, but I think that analysis led to a reasonably interesting conclusion.

The catalyst was the recent match between India and Sri Lanka in Kanpur and, in particular, India's innings. They amassed 642, which proved quite enough to see Sri Lanka off by an innings and plenty; nevertheless, that impressive total was a bit of a comedown, given they had been 613/4 a dozen or so overs before they were bowled out. That's quite a collapse. A contributor to the Cricinfo text commentary summed it up rather neatly:

I have never seen such an unbalanced score card. When we look at the runs it is so lopsided on the the top that I feel my Cricinfo page will roll over by 180°.

I was party to some related discussions on both TMSB Exiles and Grockles. On the latter, Frome Exile came up with an interesting way of looking at the collapse: he worked out that there had only been one instance in test history of a larger absolute difference between the contributions of the first five partnerships and the last five. That was England's monstrous first innings of 849ao v. West Indies in Kingston, 1930, in which the fifth wicket fell at 720, meaning the first five wickets added 591 more than the last five (11 more than the corresponding discrepancy in India's Kanpur innings). In case anyone's interested, I've put a full list of test innings sorted in this way here.

A similar – though, I think, ever-so-slightly more informative – way of looking at the question is to concentrate on the relative difference between the number of runs scored for the first five wickets and the number scored for the second five – in other words, the proportion of the final, all-out total that was contributed by the first five partnerships. The table below shows score at the fall of the fifth wicket, final total, and the relationship between the two for all all-out innings in test history (obviously, it doesn't make sense to ask the same question of innings that were declared or otherwise prematurely curtailed). The match at the top of the list is a famous, though statistically anomalous, one, which only technically counts as an all-out innings.

The Sabina Park massacre aside, the most lopsided innings is one in which Australia's first 5 partnerships scored 61 times as many runs as the rest. In fact, the collapse was more dramatic than even that stat suggests, because the fifth-wicket partnership also realised no runs, so Australia – chasing 382 to win the match – fell from 305/3 to 310ao. The prime architect of Australia's demise was Sarfraz Nawaz, who took 7/1 in a spell of 33 balls, to finish with career-best figures of 9/86. Some sources refer to Sarfraz's feat as the first great spell of reverse-swing bowling in tests; others note that he took the new ball just before the rampage began, which would make such an interpretation unlikely. One way or another, it was a sensationally effective burst.
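For anyone who wants to check the arithmetic behind that ranking, here is a minimal sketch. The function name `first_five_share` is my own invention for illustration; the only inputs are the score at the fall of the fifth wicket and the all-out total, as in the Australian innings described above.

```python
def first_five_share(score_at_fifth_wicket, all_out_total):
    """Return (share of the total made by the first five wickets,
    ratio of first-five runs to last-five runs)."""
    last_five = all_out_total - score_at_fifth_wicket
    share = score_at_fifth_wicket / all_out_total
    # The ratio is undefined if the last five wickets added nothing at all.
    ratio = score_at_fifth_wicket / last_five if last_five else float("inf")
    return share, ratio


# Australia fell from 305/5 to 310 all out.
share, ratio = first_five_share(305, 310)
print(f"{share:.1%} of the total; ratio {ratio:.0f}:1")
```

The share is the figure the table is sorted by; the ratio is just another way of expressing the same thing.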

India's Kanpur innings is 24th on the list, one of 30 test innings in which the last five wickets contributed less than five percent of the all-out total. At the other end of the table, there are seven instances of the first five partnerships providing 10% or less of the final score. That entry at the bottom isn't, as I immediately imagined, Trueman's debut, when India were famously reduced to 0/4; it's the fourth match of that series, in which they managed a whole 6 runs before losing their 4th wicket, but lost their 5th the next ball. On this occasion, it was Trueman's opening partner, Alec Bedser, who was the main destroyer; Norman Preston's Wisden write-up describes the carnage – and the subsequent revival led by Indian skipper Vijay Hazare – in detail.

Here's where the story begins to get a bit interesting. Having assembled all these stats, I casually had a glance at the typical relationship between these two variables (the amount of runs scored for the first five wickets, and the amount of runs scored in the remainder of the innings). The results were nothing like what I was expecting.

It turns out that score at the fall of the 5th wicket is a terrible predictor of the amount the last five wickets will contribute. I would have imagined that instances in which the first half of an innings was high-scoring would – noteworthy collapses aside – have been those in which the second half also went well for the batting team. Similarly, you'd guess that, if the first 5 fall over cheaply, the tail are unlikely to contribute much. It turns out that you can't make those sorts of assumption at all.

The graph below shows every all-out innings in test history, with runs for the first five wickets on the x-axis plotted against runs for the last 5 on the y. Before generating this plot, I expected to see a fairly noticeable positive correlation, with the datapoints lining up from the origin of the graph in a positive trend up and right. No such thing. It's all scatter and no plot, and r² (which quantifies the strength of association between two variables – see this earlier post) is a dismal 0.0041.

Figure 1: All-out test innings – runs for the first five wickets and runs for the last five

If you stick a linear regression line through the dataset (as I have, above), you get y = 95.5 + 0.0414x, which means that, at the fall of the fifth wicket, our best guess of what the all-out total will be is

runs scored so far + (0.0414 × runs scored so far) + 95.5

... but what this analysis shows very clearly is that you'd be an idiot to head off down the bookie's armed with that equation, because our best guess is dreadful. In fact, if it tells us anything, it suggests that dramatic collapses and dramatic tail-wagging are much more likely than you might imagine (maybe there's some value to be had there!). The lesson is clear: what happens in the first half of an innings tells us nothing about what we can expect in the second half. For example, on average throughout test history, whenever the fifth wicket has fallen at a score between 50 and 99, the remaining batsmen have added a further 95.7; whenever the first five partnerships have realised between 400 and 449, the last five wickets have typically amassed... 95.3.
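To see how little the fitted line actually says, here is a small sketch that plugs scores into it. The coefficients are the ones quoted above; the function names are mine, and the sample scores of 75 and 425 are simply assumed midpoints of the two bands just mentioned (305 is where Australia's fifth wicket fell in the innings discussed earlier).

```python
INTERCEPT = 95.5   # constant term of the fitted line quoted above
SLOPE = 0.0414     # extra runs expected per run already on the board


def expected_further_runs(score_at_fifth_wicket):
    """Runs the fitted line expects from the last five wickets."""
    return INTERCEPT + SLOPE * score_at_fifth_wicket


def predicted_total(score_at_fifth_wicket):
    """Best-guess all-out total: runs so far plus expected further runs."""
    return score_at_fifth_wicket + expected_further_runs(score_at_fifth_wicket)


for score in (75, 305, 425):
    print(score, round(expected_further_runs(score), 1), round(predicted_total(score), 1))
```

Going from 75 to 425 on the board moves the line's guess for the last five wickets by fewer than 15 runs, and the observed band averages quoted above differ by even less than that.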

Further investigation shows that this finding is not confined to the fifth wicket. At any stage of a test innings, what has happened up until the fall of a given wicket is a useless predictor of what's going to happen afterwards. Figure 2 shows analogous graphs to that shown above for all other wickets. In every case, there's a whole lot of noise and no noticeable signal. The highest r² for any of these analyses is that calculated for the ninth wicket – just 0.0053 (that is: only half a percent of the variance in the tenth-wicket partnership scores is explained by variability in totals at the fall of the ninth).
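For reference, this is roughly how an r² of that kind can be computed for any one wicket. It is a sketch in plain Python rather than the code I actually ran; for a straight-line fit with one predictor, r² is just the square of the ordinary correlation coefficient. The two small lists at the bottom are invented numbers to show the call, not real innings.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented illustration: score at the fall of a wicket vs. runs added afterwards.
score_at_fall = [50, 120, 200, 310, 450]
runs_added_after = [110, 60, 130, 40, 100]
print(round(r_squared(score_at_fall, runs_added_after), 4))
```

A value near 1 would mean the points sit close to a straight line; values like the ones reported here mean the line explains almost none of the variation.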

Figure 2: All-out test innings – correlations between scores before and after the loss of each wicket

I don't quite know what to make of these findings. At extreme ends of the spectrum, one can just about understand how two halves of an innings might compensate for each other. As Wickham observed on Grockles,

Part of the explanation may be that tail-enders are more likely to dig in if the top-order batsmen have scored relatively few runs and that this tendency helps to counteract the impact of wickets which are more difficult to bat on.

I am sure this is a useful observation. It seems to me that the reverse may be true, as well: if the top order have scored heavily, maybe the tail play with abandon in search of the quick runs that the match situation is likely to demand, and thereby score less heavily than they might have done. I don't know that these explanations help us with the majority of test innings, however. After all, most times, the tail are neither digging in for grim death nor swinging with carefree abandon.

The alternative explanation is that we massively overinterpret those factors we identify as significant in shaping an innings. We watch five batsmen fall quickly, and we conclude that the wicket is unreliable, or the bowlers irresistible; when the runs have come easily for the top order, we imagine that the conditions are favourable, or the attack toothless. But maybe we've got a bad appreciation of the random and – pace Louis MacNeice – cricket is crazier and more of it than we think. This is far from the first time that, having had a good dig into the evidence, I've reached the conclusion that the game is far more susceptible to dumb luck than we ever acknowledge.

Nervous noughties

Over on TMSB Exiles, groovyshortlegs noted

[Doug] Bollinger faces his first ball in ODIs, in his 5th match, and [he's] run out without facing another. He has one test, one innings 0*. 6 internationals, no runs, what's the longest streak I wonder?

Turns out the answer to groovy's question is Venkatesh Prasad. He had played in 12 internationals (all ODIs) before he registered his first international run. That barren stretch consisted of 2 ducks and 10 DNBs. Nathan Bracken played 11 ODIs before he even got a bat, but got off the mark at the first time of asking. Roger Harper played in 9 matches (3 ducks) before he scored his first run.

The players who actually got to the crease most times without scoring at the beginning of their careers were Brendon Bracewell (who started out with 5 0s - 4 in tests, 1 in an ODI) and Commandur Rangachari (who had 5 ducks before finally scoring in his fourth test).

At least Bollinger's matches have all come in 2009. Some players have taken up to a dozen years to get off the mark in international cricket. Top of the lot is Barbadian George Carew, who made his test debut for the West Indies v. England in Bridgetown on 8th January 1935. He was dismissed for nought in the first innings, and did not get the chance to bat in the second (due to what looks, from this distance, like a rather generous declaration). It was 1948 before he got another chance to break his duck, but it was an opportunity he seized, scoring 107 v. England in Trinidad. History has paid rather more attention to his opening partner, Andy Ganteaume (whose century remains the only one scored in a batsman's only test innings), but Carew set a record of his own by taking so long between his international bow and his first run.

Altogether, 65 players have seen at least a year go by between their international debut and the match in which they finally broke their duck:

Name  InternationalDebut  FirstRun  Days
1. GM Carew  08/01/1935  11/02/1948  4,782
2. WH Copson  24/06/1939  16/08/1947  2,975
3. GP Swann  23/01/2000  01/10/2007  2,808
4. CM Bandara  27/05/1998  10/12/2005  2,754
5. TWJ Goddard  25/07/1930  24/07/1937  2,556
6. A Mishra  13/04/2003  06/11/2008  2,034
7. WCA Ganegama  08/04/2001  26/01/2006  1,754
8. Saiful Islam  31/12/1990  05/04/1995  1,556
9. G Duckworth  26/07/1924  11/08/1928  1,477
10. GRA de Silva  14/06/1975  09/06/1979  1,456
11. Sarfraz Nawaz  06/03/1969  29/12/1972  1,394
12. Shabbir Ahmed  19/09/1999  18/05/2003  1,337
13. RA Gaunt  24/01/1958  17/08/1961  1,301
14. GJ Hopkins  29/06/2004  20/12/2007  1,269
15. RJ Bright  30/03/1974  04/06/1977  1,162
16. PR Downton  23/12/1977  13/02/1981  1,148
17. MG Webb  05/03/1971  01/03/1974  1,092
18. DJG Sammy  08/07/2004  07/06/2007  1,064
19. AS Luseno  06/04/2003  01/03/2006  1,060
20. HT Davis  18/04/1994  14/02/1997  1,033
21. JP Yadav  06/11/2002  26/08/2005  1,024
22. NW Bracken  11/01/2001  26/10/2003  1,018
23. Munir Malik  04/12/1959  05/07/1962  944
24. TG Shaw  10/11/1991  08/04/1994  880
25. GB Troup  16/10/1976  16/02/1979  853
26. A Nehra  24/02/1999  07/06/2001  834
27. Arshad Khan  01/02/1993  11/04/1995  799
28. RD Jackman  13/07/1974  28/08/1976  777
29. HAPW Jayawardene  28/06/2000  21/07/2002  753
30. GB Legge  24/12/1927  10/01/1930  748
31. Tauseef Ahmed  27/02/1980  05/03/1982  737
32. BW Hilfenhaus  14/01/2007  16/01/2009  733
33. G Geary  26/07/1924  10/07/1926  714
34. ML Patel  14/09/2004  15/08/2006  700
35. PJ Ongondo  30/09/1999  18/08/2001  688
36. CPH Ramanayake  08/03/1986  02/01/1988  665
37. G Noblet  03/03/1950  22/12/1951  659
38. Shakeel Ahmed  01/05/1993  07/02/1995  647
39. PE McIntyre  26/01/1995  10/10/1996  623
=39. GN de Silva  30/04/1983  12/01/1985  623
41. JM Patel  26/02/1955  19/10/1956  601
42. W Mwayenga  24/11/2002  29/05/2004  552
=42. ML Nkala  27/09/1998  01/04/2000  552
44. Abdur Rauf  02/02/2008  04/07/2009  518
45. DB Close  23/07/1949  22/12/1950  517
46. H Verity  29/07/1931  02/12/1932  492
47. TB Mitchell  10/02/1933  08/06/1934  483
48. M Hayward  18/08/1998  09/12/1999  478
49. CL White  09/10/2005  14/01/2007  462
50. PK Lee  18/12/1931  23/02/1933  433
51. AL Logie  19/12/1981  23/02/1983  431
52. RS Ghai  05/12/1984  26/01/1986  417
53. PS Vaidya  22/02/1995  12/04/1996  415
54. KS More  05/12/1984  18/01/1986  409
55. Alamgir Kabir  21/07/2002  27/08/2003  402
56. GI Allott  13/01/1996  06/02/1997  390
57. Sarandeep Singh  25/11/2000  19/12/2001  389
58. SP Gupte  30/12/1951  21/01/1953  388
59. WB Phillips  22/10/1982  11/11/1983  385
60. E Otieno  18/10/2007  31/10/2008  379
61. MG Hughes  13/12/1985  26/12/1986  378
62. DI Joyce  13/06/2006  23/06/2007  375
63. VM Muddiah  12/12/1959  16/12/1960  370
64. GP Wickramasinghe  31/12/1990  02/01/1992  367
65. AP Grayson  10/10/2000  10/10/2001  365
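The Days column is simply the gap between the two dates. A minimal sketch of that calculation, assuming the DD/MM/YYYY format used in the list (the function name `days_between` is my own):

```python
from datetime import datetime


def days_between(debut, first_run, fmt="%d/%m/%Y"):
    """Number of days from international debut to the first run scored."""
    start = datetime.strptime(debut, fmt)
    end = datetime.strptime(first_run, fmt)
    return (end - start).days


print(days_between("08/01/1935", "11/02/1948"))  # GM Carew: 4782
print(days_between("10/10/2000", "10/10/2001"))  # AP Grayson: 365
```

Letting the date library do the subtraction takes care of leap years, which matter over a gap as long as Carew's.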

By the way, the player who had most FC games under his belt without scoring had the misfortune never to record that elusive first run. John Howarth played 13 matches for Nottinghamshire in 1966 & 1967; in that time he registered 7 ducks (3 not-out) and 16 DNBs, but he never saw the scoreboard tick over next to his number. A similar fate was met by Somerset's Seymour Clark, who may have only played 5 FC games but, in that time, he managed 2 more innings than Howarth without ever scoring a run. It's probably not surprising that he got dropped after that.

UPDATE: Doug Bollinger finally got off the mark in the second innings of Australia's third test v. West Indies at Perth. That was his fourth visit to the crease in his tenth international game, meaning he's ended up in equal-third place with Roger Harper and RL Sanghvi.

Anni mirabiles

This series of posts was inspired by the notable exploits of Marcus Trescothick in 2009 (well, we Somerset supporters think they're notable, whether you do or not). He was the English season's leading scorer, amassing 2,934 runs across all competitions, and deserved winner of the PCA's 2009 MVP award as well as their Players' Player of the Year.

One particular milestone is that Trescothick's career batting average has, at the end of the current season, risen above 40, for the first time since he opened his first-class account (in a remarkable 1993 match that Somerset fans remember fondly for reasons that have nothing to do with the debutant). Admittedly, this doesn't sound like so momentous a watershed for a performer of supposed world class, especially in this run-heavy era. These days, after all, there are players who consistently average over 40 without causing much more than a faint blip on the international radar. However, to understand quite what an achievement it has been for Trescothick to drag his average above this level, it needs to be emphasised quite how poor – and quite how prolongedly poor – the first part of his career was.

When he made his test debut – v. West Indies in 2000 – Trescothick had a batting average of 30.60 from 168 first-class innings. (The identification of a true talent hiding beneath that less-than-mediocre record is one of the few shreds of evidence that Duncan Fletcher knows the first thing about cricket, if you ask me.) Trescothick, of course, made a confident 50 in his first test innings, and never looked back. For the next few years, his cricket was played almost exclusively for England (in 2004, he didn't play a single game for Somerset) and his successful international career provided a substantial fillip to his average. By the time he played his last first-class game in England colours (that infamous encounter with Pakistan at the Oval in 2006), his average had risen to 35.75 from 360 FC knocks.

Since the beginning of 2007 – for reasons that have been documented, where they have not been fabricated, elsewhere – Trescothick has been a Somerset cricketer exclusively. Undoubtedly, England have been the poorer for his absence; you'll forgive us Taunton regulars a little parochialism if we don't entirely share the regret felt by cricket fans in other corners of the country. Across those three seasons, he has scored 4,418 runs in 48 FC matches at an average of 60.52.

2009 has been Trescothick's best season yet. He has scored 1,817 FC runs at 75.70, with 8 centuries. Only Jimmy Cook (11 in 1991 & 9 in 1990), Bill Alley (10 in 1961), and Viv Richards (9 in 1985) have scored more tons in a season for Somerset, and each of them had more matches in which to do so. In the process, he has passed 10,000 FC runs for the county. And when he was dismissed for 102 in the match against Lancashire at Taunton earlier this month, his average – at long last, 436 innings into his FC career – rose above 40.

So what may be remarkable about Trescothick's career trajectory is how he has slowly-but-steadily dragged himself up by the pad-straps from a distinctly unpromising beginning to arrive at a record that begins to do justice to his recognised abilities. Over on (everyone's favourite Somerset CCC messageboard) www.grockles.com, Loyal of Lhasa commented:

MT has now played for almost seventeen seasons and it is in this season that he has scored 10% of his career aggregate. I'm not sure how significant that is...

This struck me as a rather interesting question, so I did a bit of research on it, which led on to a bunch of additional questions.

There are two important notes to make about the analyses that follow. Firstly, they all revolve around calendar years rather than seasons. This is partially for the bad reason that it's much easier to extract years from my database, but also for the good reason that it makes observations slightly less noisy (lessening the possibility that a few freak innings might dominate the period in question). Secondly, all these analyses look at run aggregates alone when, in several instances, we might more naturally think in terms of batting averages. After all, no two years provide exactly the same opportunity to score runs, which is where the average comes in handy. My reliance on year-end aggregates isn't because I believe them to be some sort of superior metric for measuring a batsman's quality; it's just because they're what I'm interested in at the moment, in the light of Trescothick's high-scoring year. Most of the analyses could be rerun using averages, etc.; if anyone's interested to see them, let me know.

A very good year

It turns out that the fact that Trescothick scored 10% of his career runs in a single season may not be very significant at all. More than half of the 838 batsmen with 10,000 FC runs or more have gone through a calendar year in which they scored a higher proportion of their career runs than Trescothick has this year. Top of the lot is David Hussey, who racked up over a quarter of his life's runs in 2007. Below are the top few and the bottom few and a few notable anni mirabiles ('scuse slight Somerset bias) from in-between:

Table 1: The proportion of a batsman's career runs scored in his highest-scoring calendar year

Name  CareerYrs  CareerRuns  BestYr  BestYrRuns  %
1. DJ Hussey  7  10,048  2007  2,722  27.1%
2. AR Morris  14  12,614  1948  3,149  25.0%
3. FMM Worrell  22  15,025  1950  3,738  24.9%
4. VM Merchant  21  13,470  1946  3,224  23.9%
5. J Darling  14  10,635  1899  2,530  23.8%
6. CF Walters  13  12,145  1933  2,832  23.3%
7. PA Jaques  9  11,707  2005  2,728  23.3%
8. ED Weekes  19  12,010  1950  2,749  22.9%
9. BC Booth  14  11,265  1964  2,536  22.5%
10. RM Cowper  11  10,595  1964  2,344  22.1%
11. CJL Rogers  12  12,464  2006  2,705  21.7%
12. J Ryder  21  10,501  1921  2,247  21.4%
13. JC Adams  18  11,234  1994  2,375  21.1%
14. WJ Cronje  13  12,103  1995  2,551  21.1%
15. CG Macartney  21  15,019  1921  3,147  21.0%
16. CH Gayle  12  11,256  2001  2,325  20.7%
17. MD Crowe  16  19,608  1987  4,045  20.6%
18. GHG Doggart  14  10,054  1949  2,063  20.5%
19. KR Stackpole  14  10,100  1972  2,053  20.3%
20. WA Brown  16  13,838  1938  2,793  20.2%
...
31. VT Trumper  19  16,939  1902  3,220  19.0%
...
56. BC Lara  20  22,156  1994  3,828  17.3%
...
68. WH Ponsford  14  13,819  1930  2,311  16.7%
...
84. RN Harvey  18  21,699  1953  3,506  16.2%
...
108. DG Bradman  20  28,067  1930  4,368  15.6%
...
118. SJ Cook  24  21,143  1991  3,234  15.3%
...
172. MEK Hussey  16  19,242  2001  2,711  14.1%
...
265. DCS Compton  22  38,942  1947  4,962  12.7%
...
456. ME Trescothick  17  16,645  2009  1,817  10.9%
...
522. L Hutton  19  40,140  1948  4,167  10.4%
...
595. GS Sobers  22  28,314  1968  2,745  9.7%
...
701. WR Hammond  26  50,551  1933  4,422  8.7%
...
714. H Sutcliffe  22  50,670  1932  4,373  8.6%
...
716. GA Hick  26  41,112  1988  3,540  8.6%
717. SR Waugh  21  24,052  1988  2,071  8.6%
...
725. IVA Richards  22  36,212  1976  3,080  8.5%
...
827. G Boycott  25  48,426  1970  3,109  6.4%
828. PA Perrin  29  29,709  1906  1,893  6.4%
829. EM Grace  33  10,025  1883  638  6.4%
830. CT Radley  24  26,441  1975  1,667  6.3%
831. A Jones  27  36,049  1963  2,159  6.0%
832. KWR Fletcher  27  37,665  1968  2,248  6.0%
833. G Gunn  27  35,208  1908  2,032  5.8%
834. FE Woolley  29  58,959  1929  3,389  5.7%
835. WG Quaife  31  36,012  1905  2,060  5.7%
836. JB Hobbs  26  61,760  1914  3,524  5.7%
837. DB Close  35  34,994  1959  1,990  5.7%
838. WG Grace  44  54,211  1871  2,739  5.1%

figures correct at end of English FC season 2009; full list available here

The problem with this mode of analysis is that it doesn't take account of the length of a batsman's career. As Loyal of Lhasa responded,

at the end of one's first season one has by definition scored 100% of one's career runs in a single season.

Accordingly, it is no surprise to see some of the longest FC careers at the bottom of the list (to have joined the 10% club, Grace, Hobbs, or Woolley would have had to amass more than 5,000 of their runs in a single year). Back on www.grockles.com, Frome Exile made some similar comments:

[David] Hussey has a quarter of his runs in one seventh of his "years" which is hardly as remarkable as [Arthur] Morris' quarter of his runs in one fourteenth of his years.

Perhaps a more interesting analysis, then, would be one that attempted to capture the extent to which a batsman's best season was truly exceptional, when compared with the rest of his career. One option is to weight the percentage figure according to the number of seasons played; mathematically, this is identical to finding the batsman's mean year score (i.e. what you'd typically expect him to score in any given year), and calculating the ratio of the exceptional year to the typical one. For example, when WG Grace scored 2,739 runs at the height of his early-career rampage in 1871, that was over twice as much as the 1,232 runs he scored per year on average throughout his career; in contrast, Hobbs's 1914 aggregate was only 48% higher than the 2,375 runs you could expect from him in any given year. So, of course, Frome Exile is entirely right: David Hussey's 2007 runfest amounted to a total that was 90% higher than his typical year's tally, whereas Morris's 1948 aggregate was three-and-a-half times what he managed, on average, through his career.
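Here is a quick sketch of that equivalence, using the career figures from Table 1. The function names are mine, chosen only for illustration; the point is that multiplying the best-year percentage by the number of years played gives exactly the same number as dividing the best year by the mean year.

```python
import math


def best_to_mean_ratio(career_runs, career_years, best_year_runs):
    """Best calendar year expressed as a multiple of the mean year."""
    mean_year = career_runs / career_years
    return best_year_runs / mean_year


def weighted_share(career_runs, career_years, best_year_runs):
    """Best-year share of career runs, weighted by years played."""
    return (best_year_runs / career_runs) * career_years


# (career runs, career years, best-year runs), copied from Table 1
careers = {
    "WG Grace": (54211, 44, 2739),
    "JB Hobbs": (61760, 26, 3524),
    "DJ Hussey": (10048, 7, 2722),
    "AR Morris": (12614, 14, 3149),
}

for name, figures in careers.items():
    ratio = best_to_mean_ratio(*figures)
    # The two routes agree (up to floating-point rounding).
    assert math.isclose(ratio, weighted_share(*figures))
    print(f"{name}: best year = {ratio:.2f} x mean year")
```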

There's a problem with taking a simple ratio of extreme:mean, though, and that's that what makes an extreme observation unusual is not just its relationship to the mean, but how it corresponds to the whole set of observations. Statisticians would say we are interested in how the variable is distributed. To illustrate the importance of this, consider the following: The mean weight of males aged 20 or over is 13st 12lb, so someone who is 20% above average would be 16st 9lb (i.e. a bit of a porker, but nothing outlandish: the kind of bloke you pass in the street every day without thinking anything of it). On the other hand, the mean height of males aged 20 or over is 5'9", but someone who is 20% above average would be 6'11" (i.e. really quite unusually monstrous: the kind of bloke you certainly do notice when you pass him in the street). So the same 20% increment can produce results that are more or less exceptional, depending on the way the observations are distributed.

Fortunately, there is a well established way to quantify the exceptional-ness of a particular observation within a distribution. That method is the z-score. To calculate a z-score, one needs to know the mean and the standard deviation of the distribution in question; the z-score is simply the number of standard deviations between the extreme observation and the mean.

(Note for stats-heads: I did a bit of analysis, and it turns out that calendar-year aggregates tend to be pretty normally distributed. Although there's no reason why z-scores can't be calculated for asymmetric – and other non-normal – distributions, it's reassuring to know that we're not dealing with anything kinky, here.)
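As a worked illustration of the definition, here is a minimal sketch. The function name is mine, and the means and standard deviations are the rounded per-year figures shown in Table 2, so the results agree with the table only to the precision of that rounding.

```python
def z_score(best_year_runs, mean_year_runs, sd_year_runs):
    """Standard deviations between a batsman's best year and his mean year."""
    return (best_year_runs - mean_year_runs) / sd_year_runs


# (best-year runs, mean runs per year, SD of runs per year), from Table 2
examples = {
    "CK Nayudu 1932": (1893, 288, 346),
    "BC Lara 1994": (3828, 1108, 811),
    "ME Trescothick 2009": (1817, 979, 447),
    "A Young 1930": (1219, 877, 390),
}

for label, figures in examples.items():
    print(f"{label}: z = {z_score(*figures):.2f}")
```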

Calculating the z-score for each batsman's most productive year gives the following list:

Table 2: Batsmen's highest-scoring calendar year, compared to the rest of their careers, in terms of z-score

Name  CareerYrs  CareerRuns  Avg. ± SD  BestYr  BestYrRuns  z
1. CK Nayudu  41  11,825  288 ± 346  1932  1,893  4.64
2. B Sutcliffe  25  17,447  698 ± 723  1949  3,493  3.87
3. FMM Worrell  22  15,025  683 ± 792  1950  3,738  3.86
4. Mansoor Akhtar  23  13,804  600 ± 494  1987  2,328  3.50
5. Saleem Malik  22  16,586  754 ± 555  1991  2,693  3.49
6. J Ryder  21  10,501  500 ± 514  1921  2,247  3.40
7. BC Lara  20  22,156  1108 ± 811  1994  3,828  3.35
8. MH Mankad  29  11,593  400 ± 300  1946  1,402  3.34
9. ED Weekes  19  12,010  632 ± 641  1950  2,749  3.30
10. RF Pienaar  21  10,896  519 ± 468  1989  2,061  3.29
11. KD Mackay  19  10,823  570 ± 459  1956  2,079  3.29
12. VM Merchant  21  13,470  641 ± 790  1946  3,224  3.27
13. DP Hughes  23  10,419  453 ± 270  1982  1,303  3.15
14. JC Adams  18  11,234  624 ± 569  1994  2,375  3.08
15. P Roy  22  11,868  539 ± 470  1959  1,979  3.06
16. RN Harvey  18  21,699  1206 ± 753  1953  3,506  3.06
17. A Shrewsbury  27  26,505  982 ± 505  1887  2,520  3.05
18. RG Pollock  28  20,940  748 ± 338  1965  1,757  2.98
19. AW Nourse  30  14,216  474 ± 538  1924  2,076  2.98
20. HW Taylor  23  13,105  570 ± 501  1924  2,042  2.94
...
56. VT Trumper  19  16,939  892 ± 879  1902  3,220  2.65
...
68. SJ Cook  24  21,143  881 ± 913  1991  3,234  2.58
...
81. DCS Compton  22  38,942  1770 ± 1272  1947  4,962  2.51
82. AR Morris  14  12,614  901 ± 896  1948  3,149  2.51
...
87. GA Hick  26  41,112  1581 ± 791  1988  3,540  2.48
88. DG Bradman  20  28,067  1403 ± 1202  1930  4,368  2.47
...
125. IT Botham  20  19,399  970 ± 469  1982  2,056  2.32
...
145. WG Grace  44  54,211  1232 ± 669  1871  2,739  2.25
...
156. SM Gavaskar  22  25,834  1174 ± 771  1971  2,890  2.23
...
171. H Sutcliffe  22  50,670  2303 ± 941  1932  4,373  2.20
...
204. FE Woolley  29  58,959  2033 ± 638  1929  3,389  2.13
...
239. GA Gooch  26  44,846  1725 ± 871  1990  3,523  2.07
...
359. MEK Hussey  16  19,242  1203 ± 798  2001  2,711  1.89
...
369. ME Trescothick  17  16,645  979 ± 447  2009  1,817  1.87
...
402. WH Ponsford  14  13,819  987 ± 727  1930  2,311  1.82
...
410. GS Sobers  22  28,314  1287 ± 804  1968  2,745  1.81
...
462. G Boycott  25  48,426  1937 ± 674  1970  3,109  1.74
...
479. IVA Richards  22  36,212  1646 ± 833  1976  3,080  1.72
...
514. WR Hammond  26  50,551  1944 ± 1468  1933  4,422  1.69
...
576. L Hutton  19  40,140  2113 ± 1275  1948  4,167  1.61
...
581. DJ Hussey  7  10,048  1435 ± 803  2007  2,722  1.60
...
609. KS Ranjitsinhji  15  24,692  1646 ± 1044  1899  3,284  1.57
...
704. JB Hobbs  26  61,760  2375 ± 798  1914  3,524  1.44
...
819. KP Pietersen  12  11,026  919 ± 564  2004  1,567  1.15
820. FC Holland  15  10,384  692 ± 381  1903  1,129  1.15
821. W Bates  11  10,249  932 ± 440  1885  1,433  1.14
822. E Cooper  10  13,304  1330 ± 515  1949  1,916  1.14
823. AJW Croom  18  17,692  983 ± 529  1931  1,584  1.14
824. HE Dollery  17  24,414  1436 ± 576  1949  2,084  1.13
825. KL Hutchings  11  10,054  914 ± 744  1908  1,744  1.12
826. B Lilley  15  10,496  700 ± 338  1928  1,074  1.11
827. RDB Croft  21  12,365  562 ± 216  1995  801  1.10
828. MS Nichols  16  17,823  1114 ± 501  1933  1,661  1.09
829. HW Stephenson  17  13,195  776 ± 336  1953  1,143  1.09
830. SA Marsh  18  10,098  561 ± 322  1990  911  1.09
831. R Kilner  13  14,707  1131 ± 420  1913  1,586  1.08
832. G Barker  18  22,286  1238 ± 468  1960  1,741  1.07
833. JT Brown  16  17,920  1120 ± 707  1896  1,873  1.07
834. D Brookes  21  30,874  1470 ± 720  1952  2,229  1.05
835. A Hamer  12  15,465  1289 ± 575  1959  1,850  0.98
836. LTA Bates  19  19,380  1020 ± 511  1926  1,518  0.97
837. C Lee  13  12,129  933 ± 601  1962  1,503  0.95
838. A Young  15  13,159  877 ± 390  1930  1,219  0.88

figures correct at end of English FC season 2009; full list available here

CK Nayudu's place at the top of this list is easily explained: for the majority of his career (a lengthy one: he played his last FC match aged 68!), Nayudu played a handful of games per year in India, amassing no more than a few hundred runs each time. Then, in 1932, he took part in India's inaugural test tour of the British Isles (during which he had the honour of captaining them in their first test match). That year, he played 55 FC innings, accumulating almost 2,000 runs. This single year is completely inconsistent with Nayudu's otherwise gentle career (it would be even more incongruous if he had not returned to England four years later, picking up just over 1,000 runs along the way).

Not far below Nayudu, we find Brian Lara's astonishing 1994 (the year during which, most memorably, he broke the records for both test and FC high-scores). Altogether, that year, he amassed almost 4,000 runs and, though the rest of his career was hardly a bust, that aggregate is more than twice as many as he managed in any other calendar year.

The player at the bottom of the list happens to be a Somerset player: "Tom" Young. Once he had established himself in the side of the 1920s, his season's aggregate was always very close to 1,000 runs so, when he recorded his personal best of 1,219 in 1930, it was the least exceptional best year of any collected here.

Figure 1: BC Lara and A Young – first-class runs per calendar year

So what about Trescothick? His 2009 is in the top half of the list, but there's nothing too remarkable about it. It is quite obvious that, impressive though it was, his best year does not share the sore-thumb status we saw in Lara's record.

Figure 2: ME Trescothick – first-class runs per calendar year

If we make one key assumption about batsmen's calendar-year aggregates (that they are normally distributed – which, by and large, they appear to be), we can quantify just how unexpected it was. A z-score of 1.87 corresponds to a cumulative probability of 0.969, i.e. about 97% of years would be expected to fall below that mark (see here for an online calculator). In turn, this means that – in a career like Trescothick's, with a mean year's aggregate of 979 ± 447 – we would expect a batsman to score 1,817 runs or more one year in every [1/(1-p)=] 33 he played, as a result of simple random variation (i.e. even if his opportunities to accumulate runs – as well as his ability to score them – had remained constant over time). If Nayudu's career had followed a consistent course (without incongruous trips abroad), we would have expected one year like his 1932 every 575,753 years!
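The same conversion can be done without an online calculator. This is a sketch under the normality assumption stated above, using the standard library's complementary error function; the function names are mine. Because it starts from z-scores rounded to two decimal places, the answer for Nayudu comes out close to, but not exactly at, the figure quoted above.

```python
from math import erfc, sqrt


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable, i.e. 1 - p."""
    return 0.5 * erfc(z / sqrt(2))


def years_per_occurrence(z):
    """Expected number of years between seasons at least this extreme."""
    return 1 / upper_tail_probability(z)


print(round(1 - upper_tail_probability(1.87), 3))  # cumulative probability p
print(round(years_per_occurrence(1.87)))           # Trescothick's 2009
print(round(years_per_occurrence(4.64)))           # Nayudu's 1932
```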

So Trescothick's 2009 falls into the category of good but not stand-out exceptional. To the extent that it fails to confirm our hypothesis about his apparently extraordinary year, this is a disappointing finding. On the other hand, it is arguably to Trescothick's credit: after all, it is only possible to set a conspicuous high-water mark if it stands in contrast to a typically lower level of achievement.

Indian summers

The next development came, once more, as a result of a comment from Loyal of Lhasa. He said,

I thought it interesting that Trescothick had probably taken rather a long time to get to [his best season], for he was hardly a speedy starter....

So I rejigged the stats, to see how far into each player's career his best calendar year came. The table below shows how many years each batsman had been playing for – BestYrNo – when he compiled the largest aggregate of his career (this is a count of years in which the player in question actually played FC cricket so, e.g., anyone whose career straddled a war does not have those years added to his total). As before, the dataset is limited to those amassing at least 10,000 FC runs.

Table 3: How late in his career each batsman's highest-scoring calendar year came

Name CareerYrs CareerRuns BestYr BestYrRuns BestYrNo
1. HTW Hardinge 28 33,519 1928 2,446 23
=2. EM Grace 33 10,025 1883 638 22
=2. JA Newman 24 15,364 1928 1,773 22
=4. R Abel 24 33,128 1901 3,309 21
=4. JH Board 25 15,674 1911 1,184 21
=4. GAR Lock 26 10,342 1966 831 21
=7. CP Mead 28 55,061 1928 3,745 20
=7. FE Woolley 29 58,959 1929 3,389 20
=7. CA Milton 27 32,150 1967 2,089 20
=7. JC Balderstone 24 19,034 1982 1,482 20
=7. SJ Cook 24 21,143 1991 3,234 20
=12. S Coe 24 17,438 1914 1,258 19
=12. AW Nourse 30 14,216 1924 2,076 19
=12. A Ducat 20 23,373 1930 2,067 19
=12. B Mitchell 22 11,395 1947 2,243 19
=12. DR Turner 24 19,005 1984 1,365 19
=12. AR Butcher 22 22,667 1990 2,116 19
=18. PF Warner 26 29,028 1911 2,274 18
=18. FA Pearson 23 18,734 1921 1,498 18
=18. JR Freeman 20 14,602 1926 1,958 18
=18. WRD Payton 23 22,132 1926 1,864 18
=18. AS Kennedy 26 16,586 1928 1,437 18
=18. Imtiaz Ahmed 24 10,393 1962 1,646 18
=18. EJ Barlow 25 18,212 1976 1,965 18
=18. JH Hampshire 24 28,059 1978 2,105 18
=18. P Willey 26 24,361 1983 2,036 18
=18. M Amarnath 24 13,747 1983 1,620 18
=18. GA Gooch 26 44,846 1990 3,523 18
=18. AJ Stewart 23 26,165 1998 1,986 18
=18. SC Ganguly 19 14,933 2007 1,391 18
=31. J Vine 23 25,171 1912 1,887 17
=31. EJ Smith 22 16,997 1925 1,477 17
=31. WE Astill 30 22,735 1926 2,218 17
=31. JWH Makepeace 21 25,799 1926 2,340 17
=31. EH Hendren 27 57,611 1928 4,024 17
=31. AE Dipper 21 28,075 1928 2,365 17
=31. CK Nayudu 41 11,825 1932 1,893 17
=31. VM Merchant 21 13,470 1946 3,224 17
=31. RWV Robins 25 13,884 1946 1,397 17
=31. GO Dawkes 18 11,411 1960 964 17
=31. B Constable 20 18,849 1961 1,799 17
=31. MH Denness 22 25,886 1975 1,904 17
=31. RW Taylor 27 12,065 1976 837 17
=31. JA Ormrod 24 23,206 1978 1,535 17
=31. BF Davison 21 27,453 1983 2,341 17
=31. P Carrick 22 10,300 1988 815 17
=31. NE Briers 22 18,726 1990 1,996 17
=31. HP Tillakaratne 22 13,258 2001 1,554 17
=31. S Chanderpaul 18 17,569 2008 1,709 17
=31. ME Trescothick 17 16,645 2009 1,817 17
...
=51. GS Sobers 22 28,314 1968 2,745 16
...
=109. H Sutcliffe 22 50,670 1932 4,373 14
...
=109. WR Hammond 26 50,551 1933 4,422 14
...
=272. DB Close 35 34,994 1959 1,990 11
...
=340. JB Hobbs 26 61,760 1914 3,524 10
...
=340. WH Ponsford 14 13,819 1930 2,311 10
...
=340. L Hutton 19 40,140 1948 4,167 10
...
=423. G Boycott 25 48,426 1970 3,109 9
...
=514. DCS Compton 22 38,942 1947 4,962 8
...
=514. RN Harvey 18 21,699 1953 3,506 8
...
=514. MEK Hussey 16 19,242 2001 2,711 8
...
=597. WG Grace 44 54,211 1871 2,739 7
...
=597. VT Trumper 19 16,939 1902 3,220 7
...
=597. BC Lara 20 22,156 1994 3,828 7
...
=687. GA Hick 26 41,112 1988 3,540 6
...
=746. IVA Richards 22 36,212 1976 3,080 5
...
=746. SR Waugh 21 24,052 1988 2,071 5
...
=795. DG Bradman 20 28,067 1930 4,368 4
...
=820. EW Dillon 15 11,006 1902 1,655 3
=820. R Kilner 13 14,707 1913 1,586 3
=820. TC Dodds 17 19,407 1947 2,147 3
=820. WGA Parkhouse 17 23,508 1950 2,284 3
=820. FA Lowson 10 15,321 1951 2,373 3
=820. RG Broadbent 14 12,800 1952 1,556 3
=820. AA Baig 20 12,367 1959 1,890 3
=820. JM Parker 14 11,254 1973 1,847 3
=820. DW Hookes 17 12,671 1977 1,634 3
=820. BA Edgar 15 11,304 1978 1,392 3
=820. RJ Blakey 19 14,674 1987 1,456 3
=820. UC Hathurusingha 16 10,862 1991 1,325 3
=820. ML Hayden 19 24,603 1993 2,760 3
=820. DR Martyn 16 14,630 1993 1,617 3
=834. GHG Doggart 14 10,054 1949 2,063 2
=834. AJ Moles 12 15,305 1987 2,238 2
=834. DP Ostler 14 10,856 1991 1,284 2
=834. AD Brown 18 15,806 1993 1,382 2
=834. A Symonds 16 14,477 1995 1,724 2

figures correct at end of English FC season 2009; full list available here

Trescothick is, indeed, one of the higher entries, in equal thirty-first place out of 838, amongst those who waited until their 17th year to post their highest aggregate. There are some good players who took even longer, though, including Bobby Abel, Frank Woolley, Arthur Milton, Graham Gooch, and Alec Stewart. Another of Somerset's star openers, Jimmy Cook, is in the upper reaches of the list, due to the obvious circumstances of his career.

At the very top of the list is one of Kent's greatest runscorers (and one of Tottenham Hotspur's shortest-lived managers), Wally Hardinge. He first appeared for Kent in 1902 and, from 1911 to 1931, never failed to pass 1,000 runs per summer for them. However, it wasn't until 1928 – the 23rd of his 28 years playing FC cricket – that he reached his zenith, with 2,446 runs at 59.65, including the highest score of his FC career, 263* v. a pretty strong Gloucestershire attack.

It seems that good batsmen very seldom give up when their runscoring powers are in the ascendancy. Amongst the 10,000 club, there's only one player who stopped playing immediately after his highest-scoring year, and that's Derbyshire stalwart Alan Hill. Like Hardinge, his best season also included his highest FC innings, 172* v. Yorkshire at Sheffield.

At the bottom of the list, it turns out that no one who has gone on to rack up 10,000 FC runs has failed to surpass his debut year, but there are a few who never got past their sophomore effort. For some reason, this group appears to have a strong association with Birmingham.

Getting better all the time?

So much for viewing Trescothick's 2009 as a one-off burst of brilliance; what about the alternative view that it was merely the latest manifestation of an inexorable rise? Looking back at figure 2, it appears to be the case that his runscoring capacity has increased over the course of his career. But can we do something a bit more informative than eyeballing an apparent trend?

To anyone with a bit of stats about them, the answer is pretty obvious. We have two variables – years and year-by-year aggregates – and we want to know the extent to which one predicts the other: how much is the passage of time reflected in a batsman's year-end aggregates? The most common way of estimating this is to calculate a value that is sometimes called the coefficient of determination, but is more commonly known as r². This figure estimates the amount of variation in one variable that is explained by the other; the higher the r², the closer the correlation. If a batsman's totals went up (or down) by exactly the same amount every year, then his r² would be 1: 100% of the variance in the observed aggregates is explained by the year-on-year trend. If, on the other hand, there were absolutely no evidence of a (linear) relationship between calendar year aggregates and time, then r² would be 0.
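To make the two extremes concrete, here is a minimal sketch of the r² calculation for a simple two-variable case. The function name and both sets of yearly aggregates are invented for illustration; they are not taken from any real batsman's record.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when either variable is constant")
    # With one predictor, r-squared is the squared correlation coefficient
    return sxy ** 2 / (sxx * syy)


# Hypothetical six-year careers (year number, runs that year)
year_numbers = [1, 2, 3, 4, 5, 6]
steady_rise = [500, 600, 700, 800, 900, 1000]   # exactly 100 more every year
no_trend = [800, 600, 800, 800, 600, 800]       # ups and downs, no linear drift

print(r_squared(year_numbers, steady_rise))
print(r_squared(year_numbers, no_trend))
```

The first series gives an r² of 1 and the second an r² of (effectively) 0, which are the two limiting cases described above. Note that a perfectly steady decline would also score 1, since r² says nothing about direction.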

The next table shows this value for the set of batsmen with 10,000 FC runs, including all those with an r² of 0.5 or higher (indicating a reasonably clear correlation between time and runs). Note that, at this stage, we do not care whether the association is a positive or a negative one; for now, we are only seeking to identify the batsmen with the most consistent trends to their careers, regardless of whether they were getting better or worse.

Table 4: Batsmen with consistent trends in calendar-year aggregates, sorted by r²

Name CareerYrs Debut M I R Ave r²
1. WW Whysall 16 1910 371 601 21,592 38.76 0.939
2. A Flower 20 1986 223 372 16,379 54.06 0.797
3. JA Rudolph 13 1997 160 272 11,371 44.77 0.789
4. CJL Rogers 12 1998 149 263 12,865 52.30 0.781
5. M van Jaarsveld 16 1994 222 373 15,587 45.98 0.765
6. W Bates 11 1877 299 495 10,249 21.58 0.764
7. FA Lowson 10 1949 277 449 15,321 37.19 0.755
8. AG Prince 15 1995 166 265 10,204 44.75 0.739
9. EJ Smith 22 1904 496 814 16,997 22.39 0.712
10. G Dews 16 1946 376 642 16,803 28.53 0.704
11. TT Samaraweera 15 1995 207 285 11,233 48.00 0.674
12. JF Parker 15 1932 340 523 14,272 31.58 0.654
13. WE Bates 18 1907 406 684 15,964 24.41 0.652
14. WN Slack 12 1977 237 398 13,950 38.97 0.637
15. HFT Buse 17 1929 304 523 10,623 22.70 0.634
16. GE Tribe 14 1945 308 454 10,177 27.36 0.629
17. CS Elliott 14 1932 275 468 11,965 27.26 0.616
18. JC Balderstone 24 1961 390 619 19,034 34.11 0.605
19. AC Smith 18 1958 428 612 11,027 20.92 0.602
20. CC Inman 15 1956 255 422 13,113 34.51 0.595
21. C Charlesworth 20 1898 372 632 14,289 23.62 0.571
22. KS Duleepsinhji 9 1924 205 333 15,485 49.95 0.552
23. ME Trescothick 17 1993 254 438 16,645 40.11 0.549
24. TH Clark 13 1947 263 426 11,490 29.39 0.544
25. AJ Strauss 12 1998 181 321 13,090 42.92 0.543
26. MJ Di Venuto 18 1992 298 528 22,751 46.43 0.539
27. A Young 15 1911 311 539 13,159 25.45 0.539
28. H Morris 17 1981 314 544 19,785 40.30 0.532
29. JA Jameson 17 1960 361 611 18,941 33.35 0.528
30. MS Nichols 16 1924 483 756 17,823 26.56 0.525
31. GHG Doggart 14 1948 210 347 10,054 31.52 0.521
32. JL Hopwood 15 1923 400 575 15,548 29.90 0.512
33. HE Dollery 17 1933 436 717 24,414 37.50 0.508
34. N Pothas 17 1993 201 310 10,604 42.25 0.501
...
96. KP Pietersen 12 1998 140 233 11,026 50.81 0.322
...
169. GS Sobers 22 1953 383 609 28,314 54.87 0.225
...
204. SJ Cook 24 1972 270 475 21,143 50.46 0.183
...
253. DCS Compton 22 1936 515 839 38,942 51.85 0.146
...
283. GA Hick 26 1983 526 871 41,112 52.24 0.126
...
302. FE Woolley 29 1906 978 1,530 58,959 40.77 0.115
...
307. WH Ponsford 14 1921 162 235 13,819 65.18 0.114
...
361. SM Gavaskar 22 1966 348 563 25,834 51.36 0.087
...
389. IVA Richards 22 1972 507 796 36,212 49.40 0.075
...
395. MEK Hussey 16 1994 225 401 19,242 53.30 0.073
...
451. WG Grace 44 1865 868 1,478 54,211 39.45 0.056
...
495. GA Gooch 26 1973 580 990 44,846 48.85 0.043
...
510. JB Hobbs 26 1905 832 1,325 61,760 50.66 0.038
...
521. DG Bradman 20 1927 232 338 28,067 95.14 0.036
...
528. IT Botham 20 1974 401 617 19,399 33.97 0.035
...
571. H Sutcliffe 22 1919 754 1,098 50,670 52.02 0.026
...
594. AR Morris 14 1940 162 250 12,614 53.68 0.022
...
630. DJ Hussey 7 2003 132 203 10,048 55.21 0.015
...
669. L Hutton 19 1934 513 814 40,140 55.29 0.009
...
674. G Boycott 25 1962 609 1,014 48,426 56.77 0.009
...
679. KS Ranjitsinhji 15 1893 307 500 24,692 56.37 0.008
...
736. CK Nayudu 41 1916 207 344 11,825 35.94 0.004
...
759. BC Lara 20 1988 261 440 22,156 51.89 0.002
...
778. VT Trumper 19 1895 255 401 16,939 44.58 0.001
...
791. WR Hammond 26 1920 634 1,005 50,551 56.04 0.001
...
827. FMM Worrell 22 1942 207 326 15,025 54.24 0.000

figures correct at end of English FC season 2009; full list available here

As predicted, Trescothick is amongst those with the most obvious trends to their careers: an r² of 0.549 is suggestive of a fairly clear correlation between time and runs, with over half of the variance in Trescothick's year-end aggregates explained by his year-on-year improvement.

The player at the top of this list is Nottinghamshire's leading runscorer of the 1920s, William "Dodger" Whysall. Whysall's Wisden obituary notes that he "matured slowly as a cricketer", which rather underplays the way in which he very gradually but very assuredly developed from a fairly ordinary performer into his county's most reliable batsman. Below, his year-by-year aggregates are plotted, with a regression line indicating the trend (ordinary least squares linear regression).

Figure 3: WW Whysall – first-class runs per calendar year, with fitted regression line

If you're wondering why his career came to an end despite showing such an encouraging trend, the sad answer is that Whysall died in late 1930. Alongside his status as the most consistently improving batsman in the record-books, he is probably also the only player to die of complications of an injury sustained on the dancefloor.

The figure below shows scatterplots for the batsmen at nos. 2–5 on the list. It is no surprise to see several current players, here: obviously, to begin a career with a positive trend is a more easily achievable feat than it is to sustain a year-on-year improvement from the beginning to the end of one's time in the game.

Figure 4: First-class runs per calendar year, with fitted regression lines, for selected batsmen

Because all the plots have been standardised to the same scale, the extent of year-on-year improvement can be seen in the gradient of the regression lines (the steeper the line, the more dramatic the improvement). So, of the four players whose career trends are pictured, Chris Rogers's career to date has shown the most meteoric rise, whereas Andy Flower's improvement was more gradual and sustained. This immediately suggests another question: whose career has shown the greatest year-on-year increase (i.e. who has the steepest regression line)? Before asking this, it is necessary to limit the dataset to those players for whom we can be reasonably confident that there is some sort of trend there at all. A straightforward way to do this is to exclude all the observations for which the regression model did not estimate a significant gradient (conventionally, statisticians tend to set the threshold for significance at p≤0.05 – this is loosely equivalent to saying we'll accept 1 false-positive per 20 analyses we conduct, although the precise definition of a p-value is not quite as intuitive to non-statisticians). It turns out that 177 of our 838 batsmen meet this criterion.

Having assembled a set of players whose careers appear to follow some sort of trend, we can sort it according to our best estimate of the average number of runs by which each player's year-end aggregates went up (or down) each year (technically, this is the beta coefficient from the regression model). This value, marked YrOnYr in the table, provides the gradient of the slopes seen in the scatterplots. Note that negative values (indicating that the player's year-end aggregates showed a decreasing trend) are possible.

Table 5: Batsmen with the most consistently positive trends in calendar-year aggregates

Name CareerYrs Debut M I R Ave r² YrOnYr (95%CI) p
1. KS Duleepsinhji 9 1924 205 333 15,485 49.95 0.552 280.5 (54.6, 506.4) 0.022
2. CJL Rogers 12 1998 149 263 12,865 52.30 0.781 231.9 (145.5, 318.3) <0.001
3. WW Whysall 16 1910 371 601 21,592 38.76 0.939 190.8 (163.0, 218.7) <0.001
4. CF Walters 13 1923 245 427 12,145 30.75 0.492 157.8 (51.4, 264.2) 0.008
5. WN Slack 12 1977 237 398 13,950 38.97 0.637 155.7 (72.9, 238.5) 0.002
6. JA Rudolph 13 1997 160 272 11,371 44.77 0.789 129.9 (85.3, 174.4) <0.001
7. CC Inman 15 1956 255 422 13,113 34.51 0.595 124.0 (62.6, 185.3) 0.001
8. W Bates 11 1877 299 495 10,249 21.58 0.764 116.0 (67.3, 164.7) <0.001
9. M van Jaarsveld 16 1994 222 373 15,587 45.98 0.765 112.4 (76.6, 148.1) <0.001
10. CS Elliott 14 1932 275 468 11,965 27.26 0.616 111.7 (56.2, 167.2) 0.001
11. TH Clark 13 1947 263 426 11,490 29.39 0.544 110.7 (43.4, 178.0) 0.004
12. LB Fishlock 16 1931 417 699 25,376 39.34 0.306 106.8 (14.5, 199.1) 0.026
13. JF Parker 15 1932 340 523 14,272 31.58 0.654 105.2 (59.3, 151.1) <0.001
14. CL Smith 14 1978 269 466 18,028 44.40 0.362 100.4 (16.6, 184.2) 0.023
15. CAG Russell 19 1908 436 717 27,354 41.57 0.366 100.0 (32.7, 167.3) 0.006
16. C Lee 13 1952 271 472 12,129 26.60 0.415 99.5 (21.1, 177.9) 0.017
17. H Morris 17 1981 314 544 19,785 40.30 0.532 98.9 (47.8, 149.9) 0.001
18. JA Jameson 17 1960 361 611 18,941 33.35 0.528 97.2 (46.6, 147.7) 0.001
19. MJ Di Venuto 18 1992 298 528 22,751 46.43 0.539 97.1 (49.5, 144.8) 0.001
20. G Dews 16 1946 376 642 16,803 28.53 0.704 95.9 (60.2, 131.6) <0.001
...
26. A Flower 20 1986 223 372 16,379 54.06 0.797 88.9 (66.6, 111.2) <0.001
...
57. ME Trescothick 17 1993 254 438 16,645 40.11 0.549 65.6 (32.9, 98.4) 0.001
...
74. GS Sobers 22 1953 383 609 28,314 54.87 0.225 58.8 (7.9, 109.6) 0.026
...
85. SJ Cook 24 1972 270 475 21,143 50.46 0.183 55.2 (3.6, 106.8) 0.037
...
149. AP Lucas 29 1874 256 435 10,263 26.38 0.187 -13.6 (-24.8, -2.4) 0.019
150. KJ Key 26 1882 367 567 13,008 26.23 0.158 -21.2 (-41.9, -0.6) 0.044
151. FH Gillingham 22 1903 210 352 10,050 30.64 0.353 -25.6 (-41.8, -9.5) 0.004
152. FR Brown 25 1930 355 536 13,325 27.36 0.174 -25.7 (-49.8, -1.5) 0.038
153. PA Perrin 29 1896 537 918 29,709 35.92 0.297 -26.0 (-41.9, -10.2) 0.002
154. DB Close 35 1949 785 1,225 34,994 33.23 0.188 -26.2 (-45.6, -6.9) 0.009
155. Hanif Mohammad 25 1951 238 370 17,059 52.33 0.178 -29.0 (-55.9, -2.1) 0.036
156. KWR Fletcher 27 1962 730 1,167 37,665 37.78 0.192 -30.1 (-55.6, -4.7) 0.022
157. CP McGahey 24 1894 437 751 20,723 30.21 0.178 -30.8 (-60.2, -1.5) 0.040
158. AF Wensley 20 1922 399 595 10,875 20.48 0.212 -33.0 (-64.5, -1.5) 0.041
159. NH Fairbrother 20 1983 366 580 20,612 41.22 0.327 -33.7 (-57.6, -9.7) 0.008
160. JF Steele 17 1970 379 605 15,054 28.95 0.241 -41.2 (-81.5, -0.9) 0.046
161. AA Baig 20 1955 235 391 12,367 34.07 0.235 -42.7 (-80.9, -4.5) 0.030
162. LH Tennyson 22 1913 477 759 16,828 23.34 0.397 -44.6 (-70.2, -19.0) 0.002
163. JP Stephenson 21 1985 303 512 14,773 32.40 0.258 -45.5 (-82.6, -8.4) 0.019
164. APF Chapman 20 1920 392 554 16,309 31.98 0.380 -48.5 (-79.1, -17.8) 0.004
165. AC Smith 18 1958 428 612 11,027 20.92 0.602 -50.0 (-71.5, -28.4) <0.001
166. FA Tarrant 24 1899 329 541 17,952 36.41 0.203 -50.6 (-95.0, -6.3) 0.027
167. KJ Hughes 17 1975 216 368 12,711 36.53 0.236 -52.3 (-104.1, -0.5) 0.048
168. PJ Watts 18 1959 375 607 14,449 27.95 0.356 -53.7 (-91.9, -15.4) 0.009
169. TG Evans 21 1939 465 753 14,882 21.23 0.388 -55.4 (-88.8, -22.0) 0.003
170. JR Mason 22 1893 339 557 17,337 33.28 0.427 -57.7 (-88.9, -26.6) 0.001
171. ERT Holmes 22 1924 301 465 13,598 32.85 0.287 -57.8 (-100.2, -15.3) 0.010
172. RR Relf 18 1905 302 529 14,522 28.42 0.304 -65.3 (-117.7, -12.9) 0.018
173. RES Wyatt 30 1923 737 1,141 39,405 40.05 0.378 -73.2 (-109.5, -36.8) <0.001
174. LJ Lenham 14 1956 300 539 12,796 26.17 0.433 -83.2 (-143.0, -23.3) 0.010
175. AJ Moles 12 1986 230 416 15,305 40.70 0.474 -89.7 (-156.2, -23.1) 0.013
176. GHG Doggart 14 1948 210 347 10,054 31.52 0.521 -101.7 (-163.0, -40.4) 0.004
177. FA Lowson 10 1949 277 449 15,321 37.19 0.755 -177.4 (-259.8, -95.0) 0.001

figures correct at end of English FC season 2009; full list available here
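The columns in this table hang together in a way that can be checked by hand. In a one-predictor regression, the t-statistic for the gradient can be recovered from r² and the number of observations alone, which gives the standard error and, from there, the confidence interval. The sketch below does this for three rows of the table. It rests on two assumptions of mine: that the number of observations equals each player's CareerYrs, and that the critical values (copied from standard t tables, not calculated) are appropriate. The function and dictionary names are hypothetical.

```python
import math

# Two-sided 95% critical values of Student's t, copied from standard tables,
# keyed by degrees of freedom (n - 2 for a straight-line regression).
T_CRIT_95 = {8: 2.306, 14: 2.145, 15: 2.131}


def slope_interval(slope, r2, n_years):
    """Approximate 95% CI for a regression gradient, given r-squared and n."""
    df = n_years - 2
    # For one predictor: t = sqrt(r2 * (n - 2) / (1 - r2))
    t_stat = math.sqrt(r2 * df / (1.0 - r2))
    std_err = abs(slope) / t_stat
    half_width = T_CRIT_95[df] * std_err
    return slope - half_width, slope + half_width


# (YrOnYr, r-squared, CareerYrs) as printed in the table
whysall = slope_interval(190.8, 0.939, 16)
trescothick = slope_interval(65.6, 0.549, 17)
lowson = slope_interval(-177.4, 0.755, 10)

for name, (low, high) in [("Whysall", whysall),
                          ("Trescothick", trescothick),
                          ("Lowson", lowson)]:
    print(f"{name}: ({low:.1f}, {high:.1f})")
```

Because the inputs are rounded to the precision shown in the table, the reconstructed intervals can differ from the published ones in the first decimal place, but they should land very close.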

Several of the names at the head of this list are familiar from our previous analysis, but our leader is someone to whom we have not paid particular attention, as yet. Kumar Shri Duleepsinhji's career only spanned nine seasons, but that was long enough for him to rack up over 15,000 FC runs (including 50 centuries) at an average a hair under 50. Like his uncle, Ranji, he was also extremely successful in his few test matches in England colours. One reason why his career shows such a dramatic upward trend is that he was cut off just as he was reaching his peak. The innings that turned out to be his last in FC cricket was the 90 he made – we can assume, with his renowned elegance – at Taunton in 1932. Afterwards, he collapsed, and was forced to withdraw from England's forthcoming tour to Australia (which would, of course, be remembered for other reasons entirely). Ultimately, he was compelled to give the game up completely, on his doctors' orders. He was only 27 at the time.

At the other end of the spectrum, we find Frank Lowson, best known as Len Hutton's opening partner at Yorkshire in the last part of the great man's career. Lowson was in his thirties by the time he made his FC debut in 1949. His first three seasons were all extremely successful, culminating in selection for England in 1951, followed by a tour to the subcontinent that winter. Unhappily, he did not distinguish himself in his test outings, and his career began a slide that was never arrested until he bowed out, aged 43, in 1958. To be sure, amassing 10,000 FC runs is an achievement of which any batsman can be proud; nevertheless, amongst those to achieve this landmark, none had so steep a decline as Lowson, who scored an average of 177 fewer runs each year of his career.

Figure 5: First-class runs per calendar year, with fitted regression lines, for KS Duleepsinhji and FA Lowson

The apparent declines of two notable England captains – Brian Close and Bob Wyatt – need to be taken with a pinch of salt. For almost a decade after his competitive career came to an end, Close had one FC outing per year, leading his own XI at the Scarborough festival. Similarly, Wyatt's career had a six-year coda, featuring very sporadic appearances for the MCC and – at a time when some of their fixtures attracted FC status – the Free Foresters. For each player, if analysis is confined to the competitive portion of their careers, the regression model no longer identifies a significant negative trend.

To conclude, we should return to the player who sparked off this line of enquiry in the first place. According to this analysis, Marcus Trescothick is what an American sportscaster might call the 57th improvingest batsman in FC history. He has tended to total 65.6 additional runs per year he has been in the game, which means that, if the linear trend is continued, we would expect him to amass 1,569 FC runs in 2010. Of course, Somerset's fans would be delighted if this trend – or something like it – extends for many more years to come.
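The 2010 projection can be reproduced from figures already quoted, without the raw data. A least-squares line always passes through the point defined by the two means, so it is enough to know the mean yearly aggregate (979, from earlier in the post), the gradient (65.6), and the midpoint of the career. The sketch assumes the 17 seasons from 1993 to 2009 are consecutive and numbered 1 to 17; the variable names are mine.

```python
mean_runs = 979   # mean calendar-year aggregate, as quoted earlier in the post
slope = 65.6      # YrOnYr estimate from Table 5
n_years = 17      # seasons played, 1993-2009, assumed consecutive

# Midpoint of year numbers 1..17; the fitted line passes through
# (mean_year_no, mean_runs).
mean_year_no = (1 + n_years) / 2

# 2010 would be year number 18
next_year_no = n_years + 1

# Move along the line from the midpoint to year 18
predicted_2010 = mean_runs + slope * (next_year_no - mean_year_no)

print(f"projected 2010 aggregate: {predicted_2010:.0f}")
```

This is nine years' worth of gradient added to the career mean, and it agrees with the 1,569 given above.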

Figure 6: ME Trescothick – first-class runs per calendar year, with fitted regression line

Introduction

Hello. After several years of doing sporadic work on cricket stats – occasionally posting bits and pieces here and there on various messageboards – I've decided it's time I started to collect my output in a slightly more structured way. This blog is an attempt to do that.

I think the posts are likely to fall into three broad categories:

The first type will use conventional cricket stats to answer simple questions (please feel free to ask me simple questions). These are the kinds of query you might normally put to your favourite online stats engine (e.g. Cricinfo's statsguru). Obviously, if one of those resources can give you your answer, then there's no point asking me, but there are questions that might be a bit more stubborn – perhaps because they don't quite fit into the parameters provided by those online engines, or perhaps because they relate to another kind of cricket (above all, I'm not aware of any online source that enables you to query FC records). I'm probably unlikely to come up with many of these posts on my own initiative, so how often they appear will depend on how frequently I get asked these kinds of questions. The kind of thing I have in mind is very much like the BBC used to do with Ask Bearders or Cricinfo still do with Ask Steven. Heck, let's call it Ask Gabe.

The second type of post will be one that breaks out some slightly more advanced stats to try and get beyond what normal cricket statistics can tell you. With very few exceptions (some are listed in the sidebar), cricket analysts have been dreadful at mobilising anything like the full range of tools at their disposal. Statisticians tend to make a distinction between descriptive statistics (those that simply present empirical data) and inferential statistics (those that seek to make sense of it). Cricket analysis is dominated by an awful lot of the former, and tends not to feature much of the latter. It is especially embarrassing to compare attempts at quantitative analysis of our game with those that innovative baseball statisticians have been producing for decades. I'm interested in applying a similar approach to cricket. I'm going to tag such posts as Going Deep (or, at least, I am unless I can think of something a bit better to call them).

The third type of post will take the second approach to its logical conclusion, to attempt to specify and answer some Big Questions. The kind of thing I have in mind is the sort of analysis that David Barry's really good at – an attempt to characterise and make transparent some of cricket's innermost dynamics. To be honest, I'm not absolutely certain I'll ever manage to get any of these together, 'cause they require... y'know... thinking really hard, and that. I do have a couple of questions at the back of my mind that may make it to the surface one of these days.

Some warnings:

The resources I've built for myself so far relate only to test, ODI, and first-class cricket (from 1850). I haven't, as yet, built anything to look at domestic limited-overs ("List A") or Twenty20 cricket, though it's on my mind to do so when I get a quiet moment (perhaps over Christmas).
Similarly, I don't have access to any ball-by-ball stats. Much as we'd all love to get our hands on that kind of data, it's pretty hard to come by (and, of course, anything you might be able to derive only relates to the last decade or so).
Australia v. ICC World XI, 14–17 October 2005, wasn't a test match, and anyone who says otherwise is a liar. Similarly, these games ain't proper ODIs.
I'd love to promise frequent updates, but I'm known to be unreliable in such regards, so stuff will go up as and when I can find the time. Unless anyone wants to pay me a salary to do this shit.

