KtJet is hosted by Hepforge, IPPP Durham

More Profiling Studies

A large number of studies were carried out to find the best optimisation and setup for KtJet.

Initial studies showed that KtJet in double precision mode gave identical outputs to the Fortran ktclus. Furthermore, KtJet in single precision mode gave a large increase in efficiency and running speed with no significant change to the output (all jets matched the Fortran ktclus output to at least 5 decimal places).

Optimised compilation of KtJet is vital. The nature of kt clustering means that the run time scales as n**3 with the number of input particles n. With no optimisation, KtJet can look like a slow dinosaur in comparison to the Fortran (see figure 1.0). However, as soon as the optimisation flags are switched on in the compiler, the KtJet executable outperforms the Fortran in CPU time used to reconstruct jets. Even the double precision version of KtJet is significantly faster than the Fortran ktclus (see figure 1.1). For gcc-2.96, optimisation flags -O1 (figure 1.1), -O2 (figure 1.2) and -O3 (figure 1.3) all give results where KtJet is faster than the Fortran ktclus.
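To see where the n**3 behaviour comes from, the following is a minimal Python sketch of a naive inclusive kt clustering loop. It is not KtJet's or ktclus's actual code: the function names (`massless`, `naive_kt`), the four-vector layout (E, px, py, pz), the choice of pt**2 as the beam distance, the Delta R pair distance and the simple four-vector sum for merging are all assumptions made for illustration. Each pass scans every pair of remaining objects, and there is one pass per object, which is what gives the cubic growth.

```python
import math

def massless(pt, eta, phi):
    """Hypothetical helper: massless four-vector (E, px, py, pz)."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def pt2(p):
    """Squared transverse momentum."""
    return p[1] ** 2 + p[2] ** 2

def rapidity(p):
    """Rapidity; assumes E > |pz|, i.e. the object is not along the beam."""
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r2(a, b):
    """Squared Delta R separation in (rapidity, phi)."""
    dy = rapidity(a) - rapidity(b)
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return dy * dy + dphi * dphi

def naive_kt(particles, r=1.0):
    """Naive inclusive kt clustering.

    Returns (jets, pair_evals), where pair_evals counts how many
    pair distances were computed in total.
    """
    objs = [tuple(p) for p in particles]
    jets = []
    pair_evals = 0
    while objs:
        # Start from the smallest beam distance d_iB = pt_i^2.
        best_d, best_i, best_j = float("inf"), None, None
        for i, p in enumerate(objs):
            if pt2(p) < best_d:
                best_d, best_i, best_j = pt2(p), i, None
        # Scan all pairs: d_ij = min(pt_i^2, pt_j^2) * dR_ij^2 / r^2.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                pair_evals += 1
                d = (min(pt2(objs[i]), pt2(objs[j]))
                     * delta_r2(objs[i], objs[j]) / r ** 2)
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        if best_j is None:
            # A beam distance is smallest: the object becomes a jet.
            jets.append(objs.pop(best_i))
        else:
            # A pair distance is smallest: merge by adding four-vectors.
            merged = tuple(a + b for a, b in zip(objs[best_i], objs[best_j]))
            objs.pop(best_j)  # best_j > best_i, so best_i stays valid
            objs[best_i] = merged
    return jets, pair_evals
```

Every pass removes exactly one object, so for n input particles this sketch evaluates n(n**2 - 1)/6 pair distances in total, for example 165 for n = 10. That count grows as n**3, and the inner loop is exactly the kind of tight arithmetic loop on which compiler optimisation has a large effect.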

Figure 1.0: Average time to cluster jets in an event versus number of input particles. No optimisation flags used in compilation. (Black - Fortran, Blue - KtJet double precision, Red - KtJet single precision.)

Figure 1.1: Average time to cluster jets in an event versus number of input particles. -O1 gcc optimisation flag used in compilation. (Black - Fortran, Blue - KtJet double precision, Red - KtJet single precision.)

Figure 1.2: Average time to cluster jets in an event versus number of input particles. -O2 gcc optimisation flag used in compilation. (Black - Fortran, Blue - KtJet double precision, Red - KtJet single precision.)

Figure 1.3: Average time to cluster jets in an event versus number of input particles. -O3 gcc optimisation flag used in compilation. (Black - Fortran, Blue - KtJet double precision, Red - KtJet single precision.)

Which optimisation level is best depends on the compiler, the platform and the clustering flags set. Results for gcc-2.96 clustering PP events are shown in tables 1.0, 1.1 and 1.2 for -O1, -O2 and -O3 respectively. For the Fortran, the most efficient optimisation flag depends strongly on the clustering scheme chosen. For KtJet there is no obvious variation with scheme, and -O1 seems to give the best performance in all cases (-O2 gives the smallest executable).

For each optimisation flag, studies were carried out using 100 PP events, each run 10 times to obtain the average call time per event, with the DeltaR distance scheme and the E recombination scheme. This was repeated for various numbers of input particles. The results for the Fortran ktclus are shown in figure 2.0: optimisation gives some improvement in performance, but the effect is fairly small. The results for KtJet (figures 2.1 and 2.2), on the other hand, show both the importance of optimisation for performance and the importance of choosing the right optimisation flag. gcc-2.96 achieves its best results with -O1.
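The measurement procedure described above (a fixed set of events, clustered repeatedly, averaged per event) can be sketched roughly as follows in Python. This is an illustration, not the harness used for these studies: the function name `average_call_time`, its parameters and the use of the sample standard deviation across repeats as the quoted spread are assumptions, since the text does not say how the "+/-" values were obtained.

```python
import statistics
import time

def average_call_time(cluster, events, repeats=10, clock=time.perf_counter):
    """Average time per event for cluster(event), with a spread.

    cluster : any callable taking one event (hypothetical stand-in
              for a call into the clustering code)
    events  : sequence of events, e.g. 100 of them
    repeats : how many times the whole set is re-run
    clock   : timer function, replaceable for testing

    Returns (mean, spread) of the per-event time over the repeats.
    The spread is the sample standard deviation (an assumption).
    """
    per_event = []
    for _ in range(repeats):
        start = clock()
        for event in events:
            cluster(event)
        per_event.append((clock() - start) / len(events))
    spread = statistics.stdev(per_event) if len(per_event) > 1 else 0.0
    return statistics.mean(per_event), spread
```

Called as `average_call_time(my_cluster, events)` with 100 events and the default 10 repeats, this mirrors the setup described in the text. Repeating the whole set, rather than timing single calls, averages out timer granularity and other activity on the machine.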

Process          KtJet (double)       Fortran (double)     KtJet (single)
4 1 1            1.4234 +/- 0.0265    1.7229 +/- 0.0130    0.8836 +/- 0.0195
4 1 2            1.429 +/- 0.0314     1.6979 +/- 0.0264    0.8943 +/- 0.0276
4 1 3            1.4203 +/- 0.0241    1.6613 +/- 0.0094    0.8802 +/- 0.0073
4 2 1            1.4591 +/- 0.0155    1.6593 +/- 0.0169    0.9143 +/- 0.0091
4 2 2            1.4627 +/- 0.0134    1.6444 +/- 0.0088    0.9146 +/- 0.0070
4 2 3            1.4521 +/- 0.0164    1.6461 +/- 0.0121    0.9314 +/- 0.0205
4 3 1            1.4519 +/- 0.0100    1.9574 +/- 0.2157    0.9599 +/- 0.0183
4 3 2            1.4523 +/- 0.0164    2.1435 +/- 0.0197    0.9667 +/- 0.0115
4 3 3            1.4727 +/- 0.0138    2.152 +/- 0.0131     0.9587 +/- 0.0130
Executable size  841959               146685               842267

Table 1.0: Average time to cluster jets in an event. Inclusive mode, PP events, gcc optimisation flag -O1.

Process          KtJet (double)       Fortran (double)     KtJet (single)
4 1 1            1.7892 +/- 0.0340    1.6923 +/- 0.0101    1.0178 +/- 0.0261
4 1 2            1.6194 +/- 0.0712    1.7398 +/- 0.0074    1.0356 +/- 0.0310
4 1 3            1.6793 +/- 0.0613    1.7437 +/- 0.0097    1.1117 +/- 0.0700
4 2 1            1.7993 +/- 0.0296    1.7455 +/- 0.0253    1.0468 +/- 0.0114
4 2 2            1.7671 +/- 0.1002    1.7131 +/- 0.0079    1.0479 +/- 0.0101
4 2 3            1.6132 +/- 0.0208    1.7127 +/- 0.0104    1.0393 +/- 0.0152
4 3 1            1.6249 +/- 0.0102    1.7436 +/- 0.0304    1.058 +/- 0.0136
4 3 2            1.6195 +/- 0.0158    1.8799 +/- 0.0953    1.0485 +/- 0.0131
4 3 3            1.6346 +/- 0.0156    1.7348 +/- 0.0114    1.0818 +/- 0.0603
Executable size  775283               146013               776295

Table 1.1: Average time to cluster jets in an event. Inclusive mode, PP events, gcc optimisation flag -O2.

Process          KtJet (double)       Fortran (double)     KtJet (single)
4 1 1            1.6074 +/- 0.0418    1.7161 +/- 0.0074    1.0524 +/- 0.0317
4 1 2            1.5912 +/- 0.0244    1.7166 +/- 0.0052    1.0467 +/- 0.0196
4 1 3            1.5951 +/- 0.0289    1.7125 +/- 0.0094    1.0453 +/- 0.0315
4 2 1            1.6298 +/- 0.0142    1.732 +/- 0.0105     1.0595 +/- 0.0149
4 2 2            1.6374 +/- 0.0109    1.7332 +/- 0.0085    1.058 +/- 0.0144
4 2 3            1.6337 +/- 0.0119    1.7288 +/- 0.0129    1.0581 +/- 0.0143
4 3 1            1.6469 +/- 0.0141    1.7471 +/- 0.0120    1.0714 +/- 0.0155
4 3 2            1.6515 +/- 0.0170    1.7555 +/- 0.0143    1.0612 +/- 0.0178
4 3 3            1.6413 +/- 0.0181    1.7515 +/- 0.0131    1.0574 +/- 0.0208
Executable size  803779               146045               804939

Table 1.2: Average time to cluster jets in an event. Inclusive mode, PP events, gcc optimisation flag -O3.
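As a worked example of reading these tables, the short Python sketch below turns one row into a speed ratio with an uncertainty. The three input values are copied from the "4 1 1" row of table 1.0 (-O1). The helper name `ratio_with_error` is hypothetical, and combining the relative uncertainties in quadrature assumes the quoted "+/-" values are independent, which the text does not state.

```python
import math

# (mean, uncertainty) for the "4 1 1" row of table 1.0 (-O1).
KTJET_DOUBLE = (1.4234, 0.0265)
FORTRAN_DOUBLE = (1.7229, 0.0130)
KTJET_SINGLE = (0.8836, 0.0195)

def ratio_with_error(numerator, denominator):
    """Return numerator/denominator and its propagated uncertainty.

    Assumes the two uncertainties are uncorrelated, so the relative
    uncertainties add in quadrature.
    """
    (a, da), (b, db) = numerator, denominator
    value = a / b
    error = value * math.sqrt((da / a) ** 2 + (db / b) ** 2)
    return value, error

for label, ktjet in (("double", KTJET_DOUBLE), ("single", KTJET_SINGLE)):
    value, error = ratio_with_error(FORTRAN_DOUBLE, ktjet)
    print(f"Fortran / KtJet ({label}): {value:.2f} +/- {error:.2f}")
```

For this row the Fortran takes roughly 1.2 times as long as double precision KtJet and roughly 1.95 times as long as single precision KtJet. The same helper can be applied to any other row. At -O2 (table 1.1) some double precision rows give a ratio below 1, meaning the Fortran is faster there.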

Figure 2.0: Average time to cluster jets in an event versus number of input particles for the Fortran ktclus program using various optimisation flags. (Black - no optimisation, Blue - -O1, Red - -O2, Green - -O3.)

Figure 2.1: Average time to cluster jets in an event versus number of input particles for double precision KtJet using various optimisation flags. (Black - no optimisation, Blue - -O1, Red - -O2, Green - -O3.)

Figure 2.2: Average time to cluster jets in an event versus number of input particles for single precision KtJet using various optimisation flags. (Black - no optimisation, Blue - -O1, Red - -O2, Green - -O3.)

Different compiler versions

Studies were also made using different gcc compiler versions to reconstruct the jets in events with the single precision KtJet. The results are shown below. The best optimisation level follows the same general trend for all compilers: -O1 seems to give the fastest executable, and the speeds at this optimisation level are comparable for all three compiler versions tested.

Figure 3.0: Average time to cluster jets in an event versus number of input particles for single precision KtJet using various gcc compiler versions with optimisation flag set to -O1. (Black - gcc-2.95.2, Blue - gcc-2.96, Green - gcc-3.1.1.)

Figure 3.1: Average time to cluster jets in an event versus number of input particles for single precision KtJet using various gcc compiler versions with optimisation flag set to -O2. (Black - gcc-2.95.2, Blue - gcc-2.96, Green - gcc-3.1.1.)

Figure 3.3: Average time to cluster jets in an event versus number of input particles for single precision KtJet using various gcc compiler versions with optimisation flag set to -O3. (Black - gcc-2.95.2, Blue - gcc-2.96, Green - gcc-3.1.1.)