The time taken to process events grows with the multiplicity n of the input objects as n^3. As the multiplicity grows with energy, and current and future colliders have ever higher event rates, timing issues are of increasing practical importance.
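As a rough illustration of what cubic scaling implies, the sketch below extrapolates a measured per-event time to a different multiplicity. The helper name extrapolate_cubic_time is hypothetical and not part of KtJet, and the sketch assumes that the n^3 term dominates at both multiplicities.

```cpp
#include <cassert>

// Hypothetical helper (not part of KtJet): estimate the processing time at
// multiplicity n from a time t_ref measured at multiplicity n_ref, assuming
// the time scales purely as n^3.
double extrapolate_cubic_time(double t_ref, double n_ref, double n) {
    assert(n_ref > 0.0);  // the reference multiplicity must be positive
    const double ratio = n / n_ref;
    return t_ref * ratio * ratio * ratio;
}
```

Under this assumption, doubling the multiplicity multiplies the processing time by a factor of eight.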
A few compilation options are worth considering if speed is critical.
A compile option to switch from single to double precision arithmetic is available, but single precision is used by default as this increases the speed by much more than 20%.
In gcc, setting the -O option is essential; the level of optimisation (1, 2 or 3) that gives the best results varies between platforms and compiler versions. We found that both -O1 and -O2 gave massive increases in speed for KtJet. The choice of architecture is also important. For example, using the gcc compiler flag -march=athlon on an Athlon processor gives a performance gain of around 5%.
Studies have shown that using single precision greatly increases the overall execution speed of KtJet and has no significant effect on the final output jets. All profiling studies were done using gprof in conjunction with Python. All tests were run on Red Hat 7.* using gcc with -O2 optimisation. The processor was an 866 MHz PIII. More studies on profiling can be found here.
Comparisons were made for all settings, and single precision gave significantly improved performance in every case with no degradation in physics results. An example for PP events using inclusive mode is shown in Table 1.0, which gives the time spent in each of the most CPU-intensive functions.
Table 1.0: Profiling of KtJet: Average call time to functions for PP events (flags 4 1 1).
Profiling in Inclusive mode
The average call times for single and double precision in KtJet, and for Fortran ktclus (double precision), were calculated for all combinations of flags and for all types of input (PP, Pe, eP, ee). For inclusive mode, the overall call time in KtJet was calculated as the time to call the KtEvent constructor plus the call to KtEvent::GetJets(); for Fortran ktclus, it was the call to ktclus plus the call to ktincl. For each collision type, 10 events were generated and each was run 10 times to calculate the average call time. For inclusive mode, Table 2.0 shows the results for PP events, Table 2.1 for Pe events and Table 2.2 for eP events.
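The averaging procedure described above can be sketched as follows. This is a minimal illustration and not the authors' profiling setup: it uses wall-clock timing with std::chrono rather than gprof, and the name average_call_time and its parameters are hypothetical. The callable passed in stands for whatever is being timed, for example a wrapper that constructs a KtEvent and then retrieves its jets.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness (not part of KtJet): returns the average wall-clock
// time in seconds of one call to `reconstruct`, taken over every event in
// `events`, with the whole set processed `repeats` times.
template <typename Event, typename Callable>
double average_call_time(const std::vector<Event>& events,
                         std::size_t repeats,
                         Callable reconstruct) {
    assert(repeats > 0);  // at least one pass over the events is required
    using clock = std::chrono::steady_clock;
    const std::size_t calls = events.size() * repeats;
    if (calls == 0) return 0.0;  // nothing to time

    const auto start = clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {
        for (const Event& ev : events) {
            reconstruct(ev);
        }
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(calls);
}
```

With 10 events and 10 repeats, as in the study, the callable is invoked 100 times and the total elapsed time is divided by 100.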
Table 2.0: Average call time (seconds) to reconstruct jets in inclusive mode from PP events.