In my previous article, How Many Rating Points Is That?, I introduced a method for estimating my tactical rating point improvement from my improvement in solution times. After applying this method to the results of my tactics training experiments, it has become clear that the method can be improved upon.
In my earlier article, I used a scoring graph that closely followed that used by Chess Tactics Server (CTS). With CTS (and other tactical servers), the problems are given ratings and treated as opponents. Solving a problem quickly counts as a win for the user, and a failure or a slow success counts as a loss. For a correct solution, the scoring graph provides a score between 0 and 1, depending on the time spent solving the problem:
I found that I got similar results when, in place of this scoring graph, I simply scored 1 whenever I solved a problem in under 5 seconds and 0 otherwise. Superficially, just counting the number of solution times that fall within a time limit should be less accurate than making use of the precise values of all those solution times. In practice, however, the standard deviations given by the simple time limit method were often smaller (relative to the rating improvement) than those obtained using the scoring graph. The main problem with the CTS method (and those used by other tactics servers) is that the resulting score does not relate directly to what happens in a real game. The score given by the simple time limit method, on the other hand, does have a direct relationship with what happens in a real game.
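As a concrete illustration, the simple time limit method amounts to a few lines of code (a sketch, with solution times in seconds and `None` recording a failed attempt; the function name is my own):

```python
def time_limit_score(solve_times, limit=5.0):
    """Fraction of problems solved within the time limit.

    solve_times: solution time in seconds per problem, or None for a failure.
    Each in-time solve scores 1; everything else scores 0.
    """
    solved = sum(1 for t in solve_times if t is not None and t <= limit)
    return solved / len(solve_times)
```

For example, `time_limit_score([3, 6, 4, None])` gives 0.5: two of the four problems were solved within 5 seconds.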
The score given by the simple time limit method estimates the probability that you will find the tactic within the time available. If there is a single win-or-lose tactic per game at the level of the tactics problems, this probability is the same as your probability of winning the game (provided that the time limit matches the time available in a game). In practice, it is more likely that if you fail to spot a tactic, you will lose (or fail to gain) half a point. If there is only one such tactical chance per game, the score given by the time limit method will, in this case, overestimate your game score. However, if there are two tactical chances per game (attacking or defensive), and spotting each tactic in time earns you half a point, the time limit method gives a realistic estimate of the probability of winning the game.
(Suppose that there is a probability p that you will spot a tactic and earn half a point, and hence a probability 1 - p that you will miss it. Suppose also that there is the same probability p that you will spot a second tactic worth another half point. The probability of earning two half points is p ^ 2, the probability of earning exactly one half point is 2 * p * (1 - p), and the probability of earning nothing is (1 - p) ^ 2. On average, you will therefore score 1 * p ^ 2 + 0.5 * 2 * p * (1 - p) = p ^ 2 + p * (1 - p) = p points.)
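The calculation in the parenthesis above can be checked numerically (a sketch; the function name is my own):

```python
def expected_score_two_chances(p):
    """Expected game score with two independent half-point tactical
    chances, each spotted with probability p."""
    both = p * p                # two half points earned -> 1.0 point
    just_one = 2 * p * (1 - p)  # one half point earned  -> 0.5 points
    return 1.0 * both + 0.5 * just_one  # = p^2 + p*(1 - p) = p
```

For any p between 0 and 1 this returns p itself, confirming that the time limit score matches the expected game score in the two-chance case.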
If there is more than one tactical chance per player per game, the time limit method underestimates the rating benefit of spotting those chances. The number of tactical chances per game will clearly depend on how sharp the positions are. You can get a feel for the numbers here by analysing your own games on a computer, or simply by playing against a computer. Taking the score given by the time limit method as the average number of points that you can expect to win tactically per game (at the level of tactical difficulty of the problems concerned) is likely to be conservative for lower rated players.
Previously, I converted my scores for solving chess problems into rating points using the English Chess Federation (ECF) method. This was adequate when the scores were near 0.5, but the Elo method gives good results over a wider range:
For this method, your expected score s is given by:
s = 1 / (1 + 10 ^ -(d/400))
where d is your rating minus that of your opponent (or, here, of the problem set).
(The ECF method approximates this curve with a straight line from the bottom-left corner (-400, 0) to the top-right corner (400, 1).) Solving the Elo equation for d gives:
d = -400 log10(1/s - 1)
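Both the Elo curve and its inverse are straightforward to compute (a sketch; the logarithm is to base 10, and s must be strictly between 0 and 1 for the inverse to be defined):

```python
import math

def elo_expected_score(d):
    """Expected score for a rating difference d (Elo formula)."""
    return 1 / (1 + 10 ** (-d / 400))

def elo_difference(s):
    """Inverse of the above: the rating difference implied by a
    score s, where 0 < s < 1."""
    return -400 * math.log10(1 / s - 1)
```

As a check, `elo_difference(0.5)` is 0, and a score of about 0.64 corresponds to a difference of roughly 100 points.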
In this context, the score s is taken to be the fraction of the problems that you were able to solve within the time limit. To use this result, we can:
(1). Time ourselves solving a series of equally difficult problem batches.
(2). Calculate the values of s for each batch.
(3). Calculate the values of d for each batch.
(4). Plot the values of d on a graph.
(5). Fit a straight line to the graph.
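Steps (2) to (5) can be sketched as follows (the batch data format and function names are my own, and a simple least-squares fit stands in for plotting the points and fitting a line by eye):

```python
import math

def batch_elo_differences(batches, limit=55.0):
    """Steps (2) and (3): for each batch of solve times (None = failure),
    compute the fraction s solved within the limit, then the implied
    Elo difference d.  Requires 0 < s < 1 for every batch."""
    ds = []
    for times in batches:
        s = sum(1 for t in times if t is not None and t <= limit) / len(times)
        ds.append(-400 * math.log10(1 / s - 1))
    return ds

def fit_line(ds):
    """Steps (4) and (5): least-squares line d = slope * i + intercept
    over batch indices i = 0, 1, 2, ..."""
    n = len(ds)
    mean_x = (n - 1) / 2
    mean_y = sum(ds) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (d - mean_y) for i, d in enumerate(ds))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The slope then estimates the rating gain per batch, and slope times the number of batches gives the overall improvement over the experiment.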
(N.B. I assume that we time ourselves on our first pass through each problem batch, that there are no duplicate problems, and that the batches are all representative of the tactics that we will meet in real games.) Here is the graph for the Ivashchenko 1b Experiment, with a time limit of 55 seconds:
The line extends from the first problem of batch A to the last problem of batch D, and the red dots mark the midpoint of each problem batch. The graph suggests that I improved by about 100 Elo points, but the standard deviation is also about 100 Elo points (due to the large scatter), so we cannot draw any firm conclusions here.
(N.B. In my experiments, I stop the clock as soon as I believe that I have found the solution. This protocol enables me to estimate the number of problems that I can solve at different time limits, but I would get a higher score if I continued checking until the time limit expired. However, the resulting underestimation of my performance is probably not significant, given all the other uncertainties.)
I believe that this method is an improvement on my previous one, and on those used by the online tactical servers. However, it is clear from the discussion above that all these methods have serious limitations. The only really sound approach here is to test a large number of accurately rated players at solving the problem set, as discussed in my earlier article: Tactics Performance Measurement.