One method of measuring your performance at solving a set of chess problems is to measure the average time spent solving each problem and the percentage score achieved, but there are difficulties with this method:
* If you solve the problems faster, but your score goes down, you do not know whether you are doing better or worse.
* Similarly, if you solve the problems more slowly, but your score goes up, you do not know whether you are doing better or worse.
* You are rewarded for giving up quickly whenever you encounter a difficult problem.
(You might say that you are being rewarded for good time management, but the time management skill here is different from that in a game, and your skill at time management really ought to be measured separately. What we really want to measure here is just your ability to get the right answer quickly.)
These difficulties can be avoided by measuring the solution times for individual problems. The smallest time limit, applied to each problem individually, that would still have allowed you to score 50% (the median solution time) does not suffer from the difficulties identified above, provided that you are always able to get at least 50% right. The smallest time limit that would still have allowed you to solve at least 85% would also work, provided that you scored at least 85%, and similarly for any other percentage. The number of problems that can be solved within a fixed time limit applied to each problem individually also does not suffer from the difficulties identified above. The cumulative distribution of the solution times gives us the percentage that were solved with any such time limit that we may choose, assuming that failures are counted as infinitely long solution times. Here is the cumulative distribution for my first pass through batches E+F in the Bain Experiment:
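These per-problem statistics can be computed directly from the raw solution times. A minimal sketch, using hypothetical times (in seconds) with failures recorded as `None` and treated as infinitely long:

```python
# Hypothetical solution times; None marks a failed problem
# (equivalent to an infinitely long solution time).
solve_times = [4.0, 7.5, 3.2, None, 12.0, 5.1, None, 8.8, 2.9, 6.3]

finite = sorted(t for t in solve_times if t is not None)
n = len(solve_times)

def percent_solved_within(limit):
    """Percentage solved when `limit` is applied to each problem individually."""
    return 100.0 * sum(1 for t in finite if t <= limit) / n

def time_limit_for_score(target_percent):
    """Smallest per-problem time limit that still yields at least target_percent.
    At 50% this is the median solution time. Returns None if the target
    is unachievable because too many problems were failed outright."""
    k = -(-target_percent * n // 100)  # ceil(target_percent * n / 100)
    if k > len(finite):
        return None
    return finite[k - 1]
```

With these hypothetical times, `time_limit_for_score(50)` returns 6.3 seconds (the median solution time), while `time_limit_for_score(85)` returns `None` because two failures out of ten cap the achievable score at 80%.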
Measuring performance at solving a set of chess problems is all very well, but it does not necessarily have a direct relationship with your tactical ability in a chess game. The problem here is that the tactics in the chess problems may not be representative of what you are likely to meet in practice. You might do very well at solving the chess problems, but this skill might turn out to be of little practical value. Alternatively, you might do badly at solving the chess problems, but this might not matter much in practice. The solution to this problem is to construct a set of problems that is statistically representative of what you are likely to meet in practice.
There are computer programs that will automatically extract tactics problems from the games in a chess database. The problems on the Chess Tempo tactics server were constructed in this way, see: http://chesstempo.com/faq.html#tactics. In principle, we could use one of these programs to construct random samples of chess tactics as they occur in practice, and use these to measure our tactical ability. (We need a collection of tactics exams to measure our progress, because, for this purpose, we can only use each exam once.) The difficulty here is the same one that opinion pollsters face: you need a very large sample to achieve an acceptable level of accuracy. The solution to this difficulty is statistical profiling. We can reduce the sample size needed by ensuring that each sample of tactical problems has the same statistical profile as the whole population of chess tactics as they occur in practice.
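The pollster's difficulty can be made concrete with the standard margin-of-error formula for a simple random sample. This is a sketch of the textbook calculation, not anything specific to chess problems:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error (in percentage points) when estimating a
    success rate p from a simple random sample of n problems.
    p = 0.5 is the worst case; z = 1.96 is the 95% confidence factor."""
    return 100 * z * math.sqrt(p * (1 - p) / n)
```

A 100-problem exam only pins your true score down to within about ±9.8 percentage points; getting to ±3 points takes roughly 1,000 problems. Stratifying the sample so its profile matches the population (rather than sampling purely at random) is what lets a smaller set do the same job.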
We could, in principle, use problem sets that have the same statistical profile as the whole population of chess tactics for training - but the sets would contain a very high proportion of trivial tactics, and tactics so difficult that they can only be found by a computer. It makes more sense to construct samples in which the level of difficulty is restricted to a narrow band, so that we can choose problem sets that are at an appropriate level of difficulty for us. The level of difficulty could be assessed by a computer program, which could tell us how many half moves deep the solution is - or better, perhaps, it could tell us the total number of half moves in all the variations of the solution. Alternatively, the difficulty for human players could be assessed by carrying out tests.
If we are going to statistically profile our problem sets, we also need to classify the problems by type, e.g. by primary and secondary motif, or something more sophisticated. We could, in principle, program a computer to carry out this task. Alternatively, we could use human assessment. We need to ensure that each set of sample problems has the same distribution of problem types as the whole population, and the same distribution of difficulty within each problem type. This not only ensures that each set of problems is representative, but also avoids the practical difficulty that some players might be better at some types of problems than others.
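The construction described above is stratified sampling: group the problems by (type, difficulty) stratum, then deal each stratum out evenly across the sample sets. A minimal sketch, where the problem records and the `key` function are illustrative assumptions:

```python
import random
from collections import defaultdict

def stratified_samples(problems, num_sets, key):
    """Split `problems` into `num_sets` samples so that each sample has
    (almost) the same number of problems from each stratum.
    `key(problem)` returns the stratum, e.g. (motif, difficulty_band)."""
    strata = defaultdict(list)
    for p in problems:
        strata[key(p)].append(p)
    sets = [[] for _ in range(num_sets)]
    for stratum in strata.values():
        random.shuffle(stratum)          # randomise within each stratum
        for i, p in enumerate(stratum):
            sets[i % num_sets].append(p) # deal round-robin across the sets
    return sets
```

Because every set draws the same number of problems from every stratum, the sets have matching distributions of both problem type and difficulty, which is exactly the property needed to make scores on different sets comparable.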
If we measure our performance at solving these problem sets, we are measuring our absolute performance at finding chess tactics as they occur in practice, rather than our relative performance against the competition. Clearly, we would also like to know how well the competition does for each problem set as a whole - and for each problem type / level of difficulty within each set - so that we can identify our relative strengths and weaknesses. We can relate our scores (e.g. median solution time) to those of other players by plotting them on a scatter graph against their ratings. A typical scatter graph might look like this:
Clearly, there is not going to be a one-to-one relationship between tactics performance and rating (even for players with reliable real world ratings), because some players will be better at tactics, and others at other aspects of the game.
The method of constructing the problem sets described above is essentially the same as the one that I used to construct the batches of problems in the Bain Experiment. Bain is one of many problem books in which each chapter contains problems of a different type, and the problems within each chapter are sorted into ascending order of difficulty. If we want to divide the problems into two representative sets, which both have the same distribution of difficulty for each type of problem, we can take the first set to be the odd numbered problems, and the second set to be the even numbered problems. Alternatively, if we want six sets, and there are six diagrams per page, we can take the first set to be the first diagram on each page, the second set to be the second diagram on each page, and so on.
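The odd/even split and the one-diagram-per-page split are both the same round-robin deal, and can be sketched in one line. This assumes the problems are already listed by type and in ascending order of difficulty, as in a book like Bain:

```python
def split_book(problems, num_sets):
    """Deal a problem list round-robin into num_sets sets: set k gets
    problems k, k + num_sets, k + 2*num_sets, ... Because the source is
    sorted by type and difficulty, each set inherits the same profile."""
    return [problems[k::num_sets] for k in range(num_sets)]
```

For example, `split_book` with `num_sets=2` on problems numbered 1-12 yields the odd-numbered problems and the even-numbered problems; with `num_sets=6` and six diagrams per page, set k collects the k-th diagram from every page.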
If we carry out this process with a problem book, the statistical distribution of problems will not necessarily reflect that of tactics as they occur in real games. As we saw in the Bain Experiment, this makes it difficult to relate an improvement at solving those problems to an improvement at finding tactics in real games. However, this would be less of a concern with a larger problem set, and it is possible that problems that have been selected for their instructional value are more effective for training purposes than problems that have been randomly extracted from games.
The approach outlined here is different from that typically adopted by tactical training software, which usually gives tactical ratings to its users based on their performance at solving problems. The tactical ratings assigned by the online tactical servers usually use the Glicko rating system, with the problems given ratings and treated as opponents. Solving a problem in a timely fashion is counted as a win for the user, and a failure or a slow success is counted as a loss. For a correct solution, Chess Tactics Server assigns the user a result between 0 and 1, according to the time they spent solving the problem:
(The Chess Tactics Server website says that this graph was chosen to make their rating system work, and that the short time limits discourage cheating, which affects the rating of the problems, see: http://chess.emrald.net/time.php.)
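The core of this scheme can be sketched with a simplified Elo-style update that accepts a fractional result. This is only an illustration of the idea: the servers actually use Glicko, which additionally tracks a rating deviation for each player and problem, and the exact time-to-score curve used by Chess Tactics Server is the one in their graph, not anything shown here:

```python
def rating_update(user_rating, problem_rating, score, k=32):
    """Simplified Elo-style update with a fractional result.
    `score` is the result in [0, 1] awarded for the attempt (1 = fast
    solve, 0 = failure, intermediate values for slower solves).
    k = 32 is an illustrative sensitivity constant, not the servers' value."""
    expected = 1 / (1 + 10 ** ((problem_rating - user_rating) / 400))
    return user_rating + k * (score - expected)
```

Treating a slow solve as, say, `score=0.3` against an equally rated problem lowers the user's rating, which is how "a failure or a slow success is counted as a loss" in practice.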
With tactical training software, you are invariably allowed to solve the problems repeatedly - which will improve your performance at solving those problems - but this improvement will not be fully reflected in your ability to solve fresh problems. Consequently, you are given a false impression of progress. (A worse problem is that you often do not get the opportunity to repeat the same problems to your chosen schedule.) I did a Google search and found forum posts that said that one of the online tactical servers gives average players International Master ratings, whereas another gives International Masters the ratings of average players!
See my later articles Rating Points Revisited and Rethinking Problem Server Ratings for further discussion.