Chapter 8 Correctness and Performance

The effort in design and implementation is aimed at producing a working and maintainable program with minimal effort. But how do we check that a program is actually working?

We typically define what a correct program should do only in an informal way, stating constraints that should be satisfied, data formats that should be accepted, etc. Proving correctness would require a much more formal approach, where we first agree on a specification in a formal language and then prove that the implementation satisfies that specification. But even that does not protect us from a misunderstanding in defining the spec itself, which may not reflect the wishes of the user.

Lacking this formal approach, the best hope of checking correctness of a program lies in exercising it with different types of tests.

Related to the question of correctness is the issue of performance. The program should not only produce correct results, but must produce them within a limited time. The correct result which is available after five years of computation is (most of the time) as useless as the wrong result obtained in five seconds.

8.1 Testing

Testing is often one of the activities that gets dropped when delivery time points get closer. This is a sad mistake, as any problem that is not recognized immediately will take much more effort to find and correct later on. In an ideal world, tests for each component are defined before implementation and cover each level of the design. In the real world we must find the right compromise between spending time on defining tests in the beginning and time spent on developing the application core.

A test may have different objectives:

The weakest form is a test that is run to check if a program works with the test data, i.e. produces some answer without generating an error. We have suggested this form of test above to check an (incomplete) early implementation.
A more complex test can be used to exercise all (or nearly all) of the program code. The test data must contain enough variation to force all alternative parts in the program. This can be achieved either by accumulating large number of tests or by designing particular tests to handle special cases.
Another test form is regression testing. This form of testing is used to evaluate results of a program run after each modification of the code. Results of previous runs are used to check the current results. This does not check for correctness, but rather for the same answer before and after a change.
Correctness of a program can only be checked if we have obtained the expected results of a test in an independent way, either by hand or by a trusted program, or simply by re-using benchmark sets from a third party.
We may also be using some tests to do performance testing, i.e. to check that the program finds the solution within given limits on execution time and/or system resources. Clearly, this only makes sense if we also check the result for correctness.

It is important to realize the limitations of the tests that we perform. If we have never checked that the solutions produced by a regression test are correct, then they will be most likely not correct. We only know that they are still the same solutions that the program has found before.

Unfortunately, testing of combinatorial problem solvers is even more complex than testing of “normal” programs. Often, a given problem has more than one solution, and it is not clear in which order the answers will be produced. Providing one solution may not be enough, but there may be millions of solutions in total.

8.2 Profiling

The line profiling¹ tool that we already mentioned above can be used to check the coverage of the program with some query. We can easily find lines that are not executed at all, indicating the need for another test case. If we cannot construct a test which executes a program segment, then this indicates some problem.

We can use the same profile to find program parts which are executed very often and this can provide hints for optimization. Normally it is better not just to concentrate on those parts that are called very often, but on those which are calling these predicates.

Figure 8.1: Profiling Example

Figure 8.1 shows the output of the profiling tool. Each line is marked with the number of times it is executed (the first number in green) and the number of times we backtrack over this line (the second number in red). In this example shown there are two parts of if-then-else predicates which have not been selected by the test example.

8.3 Reviewing

A standard technique to check for consistency in the development is a reviewing process. Each module of an application goes through a review process, where persons not connected with the development check the deliverables (source code and documentation) for completeness and consistency. This review process serves multiple purposes:

It forces the developer to finish a version of the program with a certain polish.
It helps to find inconsistencies or missing explanations in the documentation.
It encourages “best practice” in the ECLiPSe application development, bringing together experts from different application teams.
It helps spread knowledge about applications and their sub-systems, so that re-use opportunities are recognized earlier.

On the other hand, a number of problems are normally not recognized by a review:

The review checks one version of an application at a given time point. It does not guarantee that changes and modifications after the review are performed to the same standard.
A successful code review does not imply that the application code is correct. Reviewers might sometimes note suspect code, but a review cannot replace proper testing.
If nobody actually checks the code, then the whole process becomes useless overhead. This means that resources must be properly allocated to the review, it is not a task that reviewers can undertake in their spare time.
Comments and change requests in the review must be recorded and acted on. A formal review comment form may be used, alternatively we might work with detailed and complete minutes.

8.4 Issues to check for

8.4.1 Unwanted choicepoints

It is important to remove all unwanted choicepoints in an application, since they are a common source of errors. In addition, a choicepoint requires a significant amount of memory, so that leaving unwanted choicepoints is also a performance problem.

For most predicates, in particular those following one of the programming concepts in chapter 5, it is quite simple to avoid unwanted choicepoints. Other predicates may need more effort. We can use the ECLiPSe debugger to detect if there are any unwanted choicepoints. In the trace output for the EXIT port ECLiPSe will print a * if the predicate leaves a choicepoint. We can easily check a complete query by skipping over its execution and checking the exit port. If a choicepoint is indicated, we can re-run the query to locate the missed choicepoint.

8.4.2 Open streams

A common error is to open some file without closing it before leaving. This typically happens if it is opened in one part of the program and closed in another part. Normally everything works fine, but under some exceptional circumstances the second part is not executed, leaving the file open. This can consume system resources quite quickly, leading to follow-on problems. It is a good idea to verify that every call to open/3 is followed by a close/1, even if some other part of the program unexpectedly fails.

8.4.3 Modified global state

We have already stated that changing global state should be used as a last resort, not as a common practice. But if the program modifies dynamic predicates, creates global variables or uses record, then we must be very careful to restore the state properly, i.e. remove dynamic predicates after use, reset global variables etc.

8.4.4 Delayed goals

Normally, a solver should not leave delayed goals after it is finished. For some solvers, we can enforce this by instantiating all solver variables in a solution, others require more complex actions. If a query returns with delayed goals, this should be seen as an error message that needs to be investigated.

1: The profiling facility is now available as one of the ECLiPSe libraries in the library coverage. The actual output looks slightly different.