Deep, Backward, and Square: Going downhill quickly

This post started off as quite a mundane analysis of workaday stats, but I think that analysis led to a reasonably interesting conclusion.

The catalyst was the recent match between India and Sri Lanka in Kanpur and, in particular, India's innings. They amassed 642, which proved quite enough to see Sri Lanka off by an innings and plenty; nevertheless, that impressive total was a bit of a comedown, given they had been 613/4 a dozen or so overs before they were bowled out. That's quite a collapse. A contributor to the Cricinfo text commentary summed it up rather neatly:

I have never seen such an unbalanced score card. When we look at the runs it is so lopsided on the top that I feel my Cricinfo page will roll over by 180°.

I was party to some related discussions on both TMSB Exiles and Grockles. On the latter, Frome Exile came up with an interesting way of looking at the collapse: he worked out that there had only been one instance in test history of a larger absolute difference between the contributions of the first five partnerships and the last five. That was England's monstrous first innings of 849ao v. West Indies in Kingston, 1930, in which the fifth wicket fell at 720, meaning the first five wickets added 591 more than the last five (11 more than the corresponding discrepancy in India's Kanpur innings). In case anyone's interested, I've put a full list of test innings sorted in this way here.

A similar – though, I think, ever-so-slightly more informative – way of looking at the question is to concentrate on the relative difference between the number of runs scored for the first five wickets and the number scored for the second five – in other words, the proportion of the final, all-out total that was contributed by the first five partnerships. The table below shows the score at the fall of the fifth wicket, the final total, and the relationship between the two for all all-out innings in test history (obviously, it doesn't make sense to ask the same question of innings that were declared or otherwise prematurely curtailed). The match at the top of the list is a famous, though statistically anomalous one, which only technically counts as an all-out innings.

The Sabina Park massacre aside, the most lopsided innings is one in which Australia's first 5 partnerships scored 61 times as many runs as the rest. In fact, the collapse was more dramatic than even that stat suggests, because the fifth-wicket partnership also realised no runs, so Australia – chasing 382 to win the match – fell from 305/3 to 310ao. The prime architect of Australia's demise was Sarfraz Nawaz, who took 7/1 in a spell of 33 balls, to finish with career-best figures of 9/86. Some sources refer to Sarfraz's feat as the first great spell of reverse-swing bowling in tests; others note that he took the new ball just before the rampage began, which would make such an interpretation unlikely. One way or another, it was a sensationally effective burst.
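The two measures discussed above (the absolute gap between the halves of an innings, and the share of the total contributed by the first five wickets) can be sketched in a few lines of Python. This is only an illustration: the function name `lopsidedness` is hypothetical, and the only figures used are the two innings quoted in the text.

```python
def lopsidedness(score_at_fifth_wicket, all_out_total):
    """Return (absolute difference, share of total from the first five wickets).

    Hypothetical helper for illustration; both arguments are runs.
    """
    if all_out_total <= 0 or not 0 <= score_at_fifth_wicket <= all_out_total:
        raise ValueError("need 0 <= score at fifth wicket <= all-out total, total > 0")
    first_five = score_at_fifth_wicket
    last_five = all_out_total - score_at_fifth_wicket
    difference = first_five - last_five   # the absolute measure
    share = first_five / all_out_total    # the relative measure
    return difference, share


# Figures quoted in the post: England's 849ao (fifth wicket at 720)
# and Australia's 310ao (fifth wicket at 305).
for label, fifth, total in [("England, Kingston", 720, 849),
                            ("Australia v. Sarfraz", 305, 310)]:
    difference, share = lopsidedness(fifth, total)
    print(f"{label}: difference {difference}, share {share:.1%}")
```

The absolute measure rewards big totals, while the share treats a 310 and an 849 on the same footing, which is why the two lists come out in a different order.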

India's Kanpur innings is 24th on the list, one of 30 test innings in which the last five wickets contributed less than five percent of the all-out total. At the other end of the table, there are seven instances of the first five partnerships providing 10% or less of the final score. That entry at the bottom isn't, as I immediately imagined, Trueman's debut, when India were famously reduced to 0/4; it's the fourth match of that series, in which they managed a whole 6 runs before losing their 4th wicket, but lost their 5th the next ball. On this occasion, it was Trueman's opening partner, Alec Bedser, who was the main destroyer; Norman Preston's Wisden write-up describes the carnage – and the subsequent revival led by Indian skipper Vijay Hazare – in detail.

Here's where the story begins to get a bit interesting. Having assembled all these stats, I casually had a glance at the typical relationship between these two variables (the number of runs scored for the first five wickets, and the number of runs scored in the remainder of the innings). The results were nothing like what I was expecting.

It turns out that the score at the fall of the 5th wicket is a terrible predictor of the amount the last five wickets will contribute. I would have imagined that instances in which the first half of an innings was high-scoring would – noteworthy collapses aside – have been those in which the second half also went well for the batting team. Similarly, you'd guess that, if the first 5 fall over cheaply, the tail are unlikely to contribute much. It turns out that you can't make that sort of assumption at all.

The graph below shows every all-out innings in test history, with runs for the first five wickets on the x-axis plotted against runs for the last 5 on the y. Before generating this plot, I expected to see a fairly noticeable positive correlation, with the datapoints lining up from the origin of the graph in a positive trend up and right. No such thing. It's all scatter and no plot, and r² (which quantifies the strength of association between two variables – see this earlier post) is a dismal 0.0041.

Figure 1: All-out test innings – runs for the first five wickets and runs for the last five

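[Figure 1 image: scatter plot of runs for the first five wickets (x-axis) against runs for the last five (y-axis), with fitted regression line]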

If you stick a linear regression line through the dataset (as I have, above), you get y = 95.5 + 0.0414x, which means that, at the fall of the fifth wicket, our best guess of what the all-out total will be is

runs scored so far + (0.0414 × runs scored so far) + 95.5

... but what this analysis shows very clearly is that you'd be an idiot to head off down the bookie's armed with that equation, because our best guess is dreadful. In fact, if it tells us anything, it suggests that dramatic collapses and dramatic tail-wagging are much more likely than you might imagine (maybe there's some value to be had there!) The lesson is clear: what happens in the first half of an innings tells us nothing about what we can expect in the second half. For example, on average throughout test history, whenever the fifth wicket has fallen at a score between 50 and 99, the remaining batsmen have added a further 95.7; whenever the first five partnerships have realised between 400 and 449, the last five wickets have typically amassed... 95.3.
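For readers who want to try the calculation themselves, here is a minimal sketch, assuming the line was fitted by ordinary least squares (the post does not say how it was done). The names `fit_line` and `predicted_total` are hypothetical, and the five data points are invented toy values rather than real innings; only the coefficients 95.5 and 0.0414 come from the text.

```python
from math import fsum


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variance in y explained by x
    return intercept, slope, r_squared


def predicted_total(score_at_fifth_wicket, intercept=95.5, slope=0.0414):
    """Best-guess all-out total, using the coefficients quoted in the post."""
    return score_at_fifth_wicket + intercept + slope * score_at_fifth_wicket


# Invented toy data (NOT real innings): runs for the first five wickets
# and runs for the last five.
toy_first_five = [50, 120, 200, 310, 420]
toy_last_five = [110, 60, 140, 70, 100]

intercept, slope, r_squared = fit_line(toy_first_five, toy_last_five)
print(f"toy fit: y = {intercept:.1f} + {slope:.4f}x, r squared = {r_squared:.4f}")
print(f"predicted total from 200/5: {predicted_total(200):.1f}")
```

With a slope as small as 0.0414, almost all of the predicted second-half score comes from the intercept, which is another way of saying the prediction barely depends on what came before.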

Further investigation shows that this finding is not confined to the fifth wicket. At any stage of a test innings, what has happened up until the fall of a given wicket is a useless predictor of what's going to happen afterwards. Figure 2 shows analogous graphs to that shown above for all other wickets. In every case, there's a whole lot of noise and no noticeable signal. The highest r² for any of these analyses is that calculated for the ninth wicket – just 0.0053 (that is: only half a percent of the variance in the tenth-wicket partnership scores is explained by variability in totals at the fall of the ninth).

Figure 2: All-out test innings – correlations between scores before and after the loss of each wicket

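[Figure 2 image: one scatter plot per wicket, showing the score at the fall of that wicket against the runs added afterwards]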
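The per-wicket version of the analysis can be sketched along the same lines. This is a rough illustration under stated assumptions: each innings is represented as a list of the ten cumulative scores at which wickets fell (so innings that were all out with fewer than ten wickets down are excluded), the function names are hypothetical, and the four innings below are invented toy data, not real scorecards.

```python
from math import fsum


def split_at_wicket(fall_of_wickets, wicket):
    """Split an all-out innings at the fall of the given wicket (1 to 9).

    fall_of_wickets holds ten cumulative scores; the last is the all-out total.
    Returns (runs before the wicket fell, runs added afterwards).
    """
    if len(fall_of_wickets) != 10 or not 1 <= wicket <= 9:
        raise ValueError("need ten fall-of-wicket scores and a wicket from 1 to 9")
    before = fall_of_wickets[wicket - 1]
    return before, fall_of_wickets[-1] - before


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    return sxy ** 2 / (sxx * syy)


def r_squared_by_wicket(innings):
    """Map each wicket (1 to 9) to r squared of runs before vs. runs after."""
    results = {}
    for wicket in range(1, 10):
        pairs = [split_at_wicket(fow, wicket) for fow in innings]
        befores = [before for before, _ in pairs]
        afters = [after for _, after in pairs]
        results[wicket] = r_squared(befores, afters)
    return results


# Invented toy fall-of-wicket lists (NOT real scorecards).
toy_innings = [
    [10, 45, 80, 150, 210, 230, 260, 275, 300, 310],
    [0, 5, 30, 42, 60, 120, 180, 200, 240, 251],
    [55, 90, 190, 250, 400, 410, 415, 430, 431, 440],
    [20, 21, 60, 100, 130, 200, 205, 280, 290, 330],
]

for wicket, value in r_squared_by_wicket(toy_innings).items():
    print(f"wicket {wicket}: r squared = {value:.4f}")
```

Four innings are far too few to say anything; the point is only to show the shape of the calculation that would be run over the full set of all-out test innings.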

I don't quite know what to make of these findings. At extreme ends of the spectrum, one can just about understand how two halves of an innings might compensate for each other. As Wickham observed on Grockles,

Part of the explanation may be that tail-enders are more likely to dig in if the top-order batsmen have scored relatively few runs and that this tendency helps to counteract the impact of wickets which are more difficult to bat on.

I am sure this is a useful observation. It seems to me that the reverse may be true, as well: if the top order have scored heavily, maybe the tail play with abandon in search of the quick runs that the match situation is likely to demand, and thereby score less heavily than they might have done. I don't know that these explanations help us with the majority of test innings, however. After all, most times, the tail are neither digging in for grim death nor swinging with carefree abandon.

The alternative explanation is that we massively overinterpret those factors we identify as significant in shaping an innings. We watch five batsmen fall quickly, and we conclude that the wicket is unreliable, or the bowlers irresistible; when the runs have come easily for the top order, we imagine that the conditions are favourable, or the attack toothless. But maybe we've got a bad appreciation of the random and – pace Louis MacNeice – cricket is crazier and more of it than we think. This is far from the first time that, having had a good dig into the evidence, I've reached the conclusion that the game is far more susceptible to dumb luck than we ever acknowledge.

2 comments:

Anonymous, 1 December 2009 at 08:08
"A similar – though, I think, ever-so-slightly more informative – way of looking at the question"

Well, you would say that, wouldn't you! ;-)
Frome Exile.

But seriously, keep up the very impressive work!

Declaration Game, 25 August 2015 at 15:10
This is excellent. I've been pondering it all day and will probably continue to do so for days to come. Amongst the many things that interest me about it is that if you polled Test cricketers about whether they thought the runs scored by the top 5 batsmen would be a good predictor of the runs scored by the rest of the line-up, I'm sure they would say 'yes'. In other words, they are ignorant of a truth about the game they play.

Are you writing anywhere else now?

Chris (Declaration Game cricket blog) @chrisps01

Deep, Backward, and Square

30 November 2009

Going downhill quickly

