2 .ie t .ds , \h'\w'\ 'u/2u'
4 .TH bench 3 "9 March 2024" "Straylight/Edgeware" "mLib utilities library"
13 .B "#include <mLib/bench.h>"
16 .B "struct bench_time {"
23 .B "struct bench_timing {"
30 .B "struct bench_timerops {"
31 .BI " void (*describe)(struct bench_timer *" bt ", dstr *" d );
32 .BI " void (*now)(struct bench_timer *" bt ", struct bench_time *" t_out );
33 .BI " void (*destroy)(struct bench_timer *" bt );
35 .B "struct bench_timer {"
36 .B " const struct bench_timerops *ops;"
39 .B "struct bench_state {"
41 .B " double target_s;"
45 .BI "typedef void bench_fn(unsigned long " n ", void *" ctx );
47 .B "#define BTF_TIMEOK ..."
48 .B "#define BTF_CYOK ..."
49 .B "#define BTF_CLB ..."
50 .B "#define BTF_ANY (BTF_TIMEOK | BTF_CYOK)"
52 .BI "struct bench_timer *bench_createtimer(const char *" config );
54 .BI "int bench_init(struct bench_state *" b ", struct bench_timer *" tm );
55 .BI "void bench_destroy(struct bench_state *" b );
56 .BI "int bench_calibrate(struct bench_state *" b );
57 .ta \w'\fBint bench_measure('u
58 .BI "int bench_measure(struct bench_state *" b ", struct bench_timing *" t_out ,
59 .BI " double " base ", bench_fn *" fn ", void *" ctx );
65 provides declarations and definitions
66 for performing low-level benchmarks.
70 This function will be described in detail later,
72 it calls a caller-provided function,
73 instructing it to run adaptively chosen numbers of iterations,
74 in order to get a reasonably reliable measurement of its running time,
75 and then reports its results by filling in a structure.
77 With the goal of understanding this function,
78 we must examine all of the pieces involved in making it work.
83 is a gadget which is capable of reporting the current time,
84 in seconds (ideally precise to tiny fractions of a second),
86 A timer is represented by a pointer to an object of type
87 .BR "struct bench_timer" .
88 This structure has a single member,
91 .BR "struct bench_timerops" ,
92 which is a table of function pointers;
93 typically, a timer has more data following this,
94 but this fact is not exposed to applications.
96 The function pointers in
97 .B "struct bench_timerops"
102 must always point to the timer object itself.
104 .IB tm ->ops->describe( tm ", " d )
105 Write a description of the timer to the dynamic string
108 .IB tm ->ops->now( tm ", " t_out )
109 Store the current time in
113 used to represent the time reported by a timer
114 is described in detail below.
116 .IB tm ->ops->destroy( tm )
118 releasing all of the resources that it holds.
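.PP
As an illustration only,
the object layout just described can be sketched
with simplified, hypothetical stand-in types
.RB ( "struct demo_timer" ,
.BR "struct demo_timerops" ,
and
.BR "struct demo_time" )
rather than the real mLib declarations.
The
.B describe
operation is left out, since it needs mLib's
.B dstr
type.
A concrete timer embeds the public structure as its first member
and keeps its private data after it,
and every operation is handed the timer itself as its first argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins mirroring the shape described above;
 * these are not the mLib declarations. */
#define DEMO_TIMEOK 1u

struct demo_time { unsigned f; long s, ns; };

struct demo_timer;
struct demo_timerops {
  void (*now)(struct demo_timer *tm, struct demo_time *t_out);
  void (*destroy)(struct demo_timer *tm);
};

/* The public part: just the operations table. */
struct demo_timer { const struct demo_timerops *ops; };

/* A concrete timer: the public part comes first and private data
 * follows, so a pointer to one converts to a pointer to the other. */
struct clock_timer {
  struct demo_timer pub;
  unsigned long nreads;         /* private: number of readings taken */
};

static void clock_now(struct demo_timer *tm, struct demo_time *t_out)
{
  struct clock_timer *ct = (struct clock_timer *)tm;
  clock_t c = clock();
  double secs;

  ct->nreads++;
  if (c == (clock_t)-1) { t_out->f = 0; return; }  /* no valid time */
  secs = (double)c/CLOCKS_PER_SEC;
  t_out->s = (long)secs;
  t_out->ns = (long)((secs - t_out->s)*1e9);
  t_out->f = DEMO_TIMEOK;
}

static void clock_destroy(struct demo_timer *tm)
  { free((struct clock_timer *)tm); }

static const struct demo_timerops clock_ops = { clock_now, clock_destroy };

/* Return a new timer, or a null pointer if allocation fails. */
static struct demo_timer *clock_timer_create(void)
{
  struct clock_timer *ct = malloc(sizeof(*ct));

  if (!ct) return 0;
  ct->pub.ops = &clock_ops;
  ct->nreads = 0;
  return &ct->pub;
}
```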
120 A time, as reported by a timer, is represented by the
121 .BR "struct bench_time" .
122 A passage-of-time measurement is stored in the
126 members, holding seconds and nanoseconds respectively.
127 (A timer need not have nanosecond precision.
128 The exact interpretation of the time \(en
129 e.g., whether it measures wallclock time,
131 or total thread CPU time \(en
132 is a matter for the specific timer implementation.)
133 A cycle count is stored in the
140 is set if the passage-of-time measurement
146 is set if the cycle count
149 Neither the time nor the cycle count need be measured
150 relative to any particular origin.
161 if the timer returned any valid timing information.
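.PP
Since neither quantity has a fixed origin,
only the difference between two readings is meaningful,
and such a difference can be trusted
only if both readings set the corresponding flag.
A minimal sketch of that rule follows;
.B struct demo_time
and the
.B DEMO_*
flags are hypothetical stand-ins,
with member names and values invented for the example,
and
.B DEMO_ANY
plays the role of
.BR BTF_ANY .

```c
#include <assert.h>

/* Hypothetical stand-in for a timer reading; the member names and
 * flag values are invented for this sketch. */
#define DEMO_TIMEOK 1u
#define DEMO_CYOK 2u
#define DEMO_ANY (DEMO_TIMEOK | DEMO_CYOK)

struct demo_time {
  unsigned f;                   /* which parts below are valid */
  long s, ns;                   /* seconds and nanoseconds */
  unsigned long long cy;        /* cycle count */
};

/* Compute t1 - t0 for each part that both readings report.  Only the
 * outputs whose flags appear in the return value are written; if no
 * bit of DEMO_ANY is set, nothing useful was measured. */
static unsigned demo_time_diff(const struct demo_time *t0,
                               const struct demo_time *t1,
                               double *secs_out, double *cy_out)
{
  unsigned f = t0->f & t1->f & DEMO_ANY;

  if (f & DEMO_TIMEOK)
    *secs_out = (double)(t1->s - t0->s) + (double)(t1->ns - t0->ns)/1e9;
  if (f & DEMO_CYOK)
    *cy_out = (double)(t1->cy - t0->cy);
  return f;
}
```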
163 .SS The built-in timer
166 constructs and returns a timer.
167 It takes a single argument,
170 from which it reads configuration information.
173 fails, it returns a null pointer.
177 pointer may safely be null,
178 in which case a default configuration will be used.
181 set this pointer to a value supplied by a user,
182 e.g., through a command-line argument,
183 environment variable, or
186 The built-in timer makes use of one or two
188 a `clock' subtimer to measure the passage of time,
189 and possibly a `cycle' subtimer to count CPU cycles.
191 The configuration string consists of a sequence of words
192 separated by whitespace.
193 There may be additional whitespace at the start and end of the string.
194 The words recognized are as follows.
197 Prints a list of the available clock and cycle subtimers
201 Use the first listed clock subtimer
202 that initializes successfully
203 as the clock subtimer.
204 If none of the subtimers can be initialized,
205 then construction of the timer as a whole fails.
208 Use the first listed cycle subtimer
209 that initializes successfully
210 as the cycle subtimer.
211 If none of the subtimers can be initialized,
212 then construction of the timer as a whole fails.
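.PP
The two rules above,
splitting the configuration string into whitespace-separated words
and choosing the first listed subtimer that initializes successfully,
can be sketched as follows.
The names
.BR demo_each_word ,
.BR demo_pick ,
and
.B struct demo_subtimer
are hypothetical,
and the sketch deliberately stops short of interpreting the words themselves.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Call fn (if non-null) on each whitespace-separated word of config,
 * ignoring leading and trailing whitespace.  Returns the word count. */
static size_t demo_each_word(const char *config,
                             void (*fn)(const char *w, size_t len, void *p),
                             void *p)
{
  const char *w;
  size_t n = 0;

  for (;;) {
    while (isspace((unsigned char)*config)) config++;
    if (!*config) break;
    w = config;
    while (*config && !isspace((unsigned char)*config)) config++;
    if (fn) fn(w, (size_t)(config - w), p);
    n++;
  }
  return n;
}

/* A hypothetical subtimer candidate: init returns zero on success. */
struct demo_subtimer { const char *name; int (*init)(void); };

/* Return the first candidate that initializes successfully, or a null
 * pointer if none does; the caller should then fail as a whole. */
static const struct demo_subtimer *
demo_pick(const struct demo_subtimer *v, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (!v[i].init()) return &v[i];
  return 0;
}

/* Two hypothetical initializers, for trying demo_pick out. */
static int demo_init_fails(void) { return -1; }
static int demo_init_works(void) { return 0; }
```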
214 The clock subtimers are as follows.
215 Not all of them will be available on every platform.
217 .B posix-thread-cputime
218 Measures the passage of time using
219 .BR clock_gettime (2),
221 .B CLOCK_\%THREAD_\%CPUTIME_\%ID
225 Measures the passage of time using
229 is part of the original ANSI\ C standard,
230 this subtimer should always be available.
231 However, since it measures the processor time of the whole process,
232 it may produce unhelpful results if other threads are running.
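.PP
The practical difference between these two clock subtimers
can be seen in a small sketch,
assuming a POSIX system,
which tries
.BR clock_gettime (2)
with
.B CLOCK_\%THREAD_\%CPUTIME_\%ID
first and falls back to
.BR clock (3).
The function name
.B demo_cputime
is hypothetical,
and the labels it reports are the underlying API names,
not subtimer names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read CPU time in seconds: per-thread via clock_gettime() if the
 * platform offers CLOCK_THREAD_CPUTIME_ID, otherwise whole-process
 * via ANSI C clock().  Returns 0 on success, -1 if neither works. */
static int demo_cputime(double *secs_out, const char **which_out)
{
#ifdef CLOCK_THREAD_CPUTIME_ID
  struct timespec ts;

  if (!clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts)) {
    *secs_out = (double)ts.tv_sec + ts.tv_nsec/1e9;
    *which_out = "CLOCK_THREAD_CPUTIME_ID";
    return 0;
  }
#endif
  {
    clock_t c = clock();

    if (c != (clock_t)-1) {
      *secs_out = (double)c/CLOCKS_PER_SEC;
      *which_out = "clock()";
      return 0;
    }
  }
  return -1;
}
```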
234 The cycle subtimers are as follows.
235 Not all of them will be available on every platform.
238 Counts CPU cycles using the Linux-specific
239 .BR perf_event_open (2)
241 .BR PERF_\%COUNT_\%HW_\%CPU_\%CYCLES
243 Only available on Linux.
244 It will fail to initialize
245 if access to performance counters is restricted,
247 .B /proc/sys/kernel/perf_event_paranoid
251 Counts CPU cycles using the x86
254 This instruction is not really suitable for performance measurement:
255 it gives misleading results on CPUs with variable clock frequency.
258 A dummy cycle counter,
259 which will initialize successfully
260 and then fail to report cycle counts.
261 This is a reasonable fallback in many situations.
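.PP
The sketch below,
built on hypothetical, simplified subtimer interfaces,
shows how a clock subtimer and a cycle subtimer
might combine into a single reading,
and why the dummy cycle counter is a safe last resort:
each part of the reading gets its flag
only if its subtimer actually produced a value,
so with the dummy counter the cycle flag is simply never set.
All names and flag values here are invented for the example.

```c
#include <assert.h>
#include <time.h>

#define DEMO_TIMEOK 1u
#define DEMO_CYOK 2u

struct demo_time { unsigned f; double s; unsigned long long cy; };

/* Hypothetical subtimer interfaces: each function returns 0 on success. */
struct demo_clocksub { int (*init)(void); int (*now)(double *s_out); };
struct demo_cyclesub {
  int (*init)(void);
  int (*now)(unsigned long long *cy_out);
};

/* A dummy cycle subtimer: initializing always works, reading never does. */
static int dummy_init(void) { return 0; }
static int dummy_now(unsigned long long *cy_out) { (void)cy_out; return -1; }
static const struct demo_cyclesub dummy_cycles = { dummy_init, dummy_now };

/* A clock subtimer based on ANSI C clock(). */
static int ansi_init(void) { return clock() == (clock_t)-1 ? -1 : 0; }
static int ansi_now(double *s_out)
{
  clock_t c = clock();

  if (c == (clock_t)-1) return -1;
  *s_out = (double)c/CLOCKS_PER_SEC;
  return 0;
}
static const struct demo_clocksub ansi_clock = { ansi_init, ansi_now };

/* Take one combined reading, flagging each part that was obtained. */
static void demo_now(const struct demo_clocksub *clk,
                     const struct demo_cyclesub *cyc,
                     struct demo_time *t_out)
{
  t_out->f = 0;
  if (!clk->now(&t_out->s)) t_out->f |= DEMO_TIMEOK;
  if (!cyc->now(&t_out->cy)) t_out->f |= DEMO_CYOK;
}
```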
263 The built-in preference order for clock subtimers,
264 from most to least preferred, is
265 .B posix-thread-cputime
268 The built-in preference order for cycle subtimers,
269 from most to least preferred, is
276 .SS The benchmark state
279 tracks the information needed to measure performance of functions.
280 It is represented by a
281 .B struct bench_state
284 The benchmark state is initialized by calling
286 passing the address of the state structure to be initialized,
287 and a pointer to a timer.
290 is called with a non-null timer pointer,
291 then it will not fail;
292 the benchmark state will be initialized,
293 and the function returns zero.
294 If the timer pointer is null,
297 attempts to construct a timer for itself
299 .BR bench_createtimer .
301 then the benchmark state will be initialized,
302 and the function returns zero.
304 the timer becomes owned by the benchmark state:
307 on the benchmark state will destroy the timer.
310 is called with a null timer pointer,
311 and its attempt to create a timer for itself fails,
315 the benchmark state is not initialized
316 and can safely be discarded;
320 on the unsuccessfully initialized benchmark state is safe and has no effect.
325 releases any resources it holds,
326 most notably its timer, if any.
329 .B struct bench_state
330 is defined in the header file,
331 only two members are available for use by applications.
334 A word containing flags.
337 The target time for which to try to run a benchmark, in seconds.
338 After initialization, this is set to 1.0,
339 though applications can override it.
341 Before the benchmark state can be used in measurements,
344 This is performed by calling
346 on the benchmark state.
347 Calibration takes a noticeable amount of time
348 (currently about 0.25\*,s),
349 so it makes sense to defer it until it's known to be necessary.
351 Calibration is carried out separately, but in parallel,
352 for the timer's passage-of-time measurement and cycle counter.
353 Either or both of these calibrations can succeed or fail;
354 if passage-of-time calibration fails,
355 then cycle count calibration is impossible.
359 sets flags in the benchmark state's
362 if passage-of-time calibration succeeded,
365 if cycle-count calibration succeeded,
370 is set unconditionally,
371 as a persistent indication that calibration has been attempted.
375 function returns zero if it successfully calibrated
376 at least the passage-of-time measurement;
377 otherwise, it returns \-1.
380 is called for a second or subsequent time on the same benchmark state,
381 it returns immediately,
382 either returning 0 or \-1
383 according to whether passage-of-time calibration previously succeeded.
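.PP
The once-only rule and the return value described above
can be summarized in a short sketch.
The
.B DEMO_*
flag values,
.BR "struct demo_state" ,
and the calibration callbacks are hypothetical stand-ins;
the actual calibration work is not shown.

```c
#include <assert.h>

/* Hypothetical flag values; they are not the real BTF_* values. */
#define DEMO_TIMEOK 1u
#define DEMO_CYOK 2u
#define DEMO_CLB 4u

struct demo_state { unsigned f; };

/* Calibrate at most once.  cal stands in for the real work and reports
 * which parts succeeded; a cycle-count result is discarded if the
 * passage-of-time calibration failed, since it depends on it.
 * Returns 0 if passage-of-time calibration succeeded, else -1. */
static int demo_calibrate(struct demo_state *b, unsigned (*cal)(void))
{
  unsigned got;

  if (!(b->f & DEMO_CLB)) {
    got = cal();
    if (!(got & DEMO_TIMEOK)) got &= ~DEMO_CYOK;
    b->f |= (got & (DEMO_TIMEOK | DEMO_CYOK)) | DEMO_CLB;
  }
  return (b->f & DEMO_TIMEOK) ? 0 : -1;
}

/* Two hypothetical calibration outcomes, for trying the above out. */
static unsigned demo_cal_both(void) { return DEMO_TIMEOK | DEMO_CYOK; }
static unsigned demo_cal_cyonly(void) { return DEMO_CYOK; }
```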
387 .I benchmark function
390 .BI "void " fn "(unsigned long " n ", void *" ctx );
392 When called, it should perform the operation to be measured
397 argument is a pointer passed into
399 for the benchmark function's own purposes.
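.PP
For example,
a benchmark function which hashes an array of integers
might look like the sketch below.
The context structure and its members are hypothetical,
and
.B bench_fn
is redeclared under the hypothetical name
.B demo_bench_fn
so that the sketch compiles without mLib.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as bench_fn in the synopsis, redeclared under a
 * hypothetical name so that this sketch compiles without mLib. */
typedef void demo_bench_fn(unsigned long n, void *ctx);

/* Hypothetical context: the data the measured operation works on. */
struct hash_ctx {
  const unsigned *v;            /* array to hash */
  size_t len;                   /* number of elements */
  unsigned long sink;           /* final hash, kept so the work is used */
};

/* Perform n iterations, each hashing the whole array once.  The hash is
 * chained across iterations, which makes it harder for the compiler to
 * compute it once and skip the remaining iterations. */
static void hash_bench(unsigned long n, void *ctx)
{
  struct hash_ctx *c = ctx;
  unsigned long acc = 0;
  size_t i;

  while (n--)
    for (i = 0; i < c->len; i++)
      acc = acc*31 + c->v[i];
  c->sink = acc;
}
```

Since each iteration processes
.I len
elements,
.I len
would be a natural value for the
.I base
argument of
.BR bench_measure .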
403 receives five arguments.
406 points to the benchmark state to be used.
410 .BR struct bench_timing
411 in which the measurement should be left.
412 This structure is described below.
415 is a count of the number of operations performed
416 by each iteration of the benchmark function.
419 is a benchmark function, described above.
422 is a pointer to be passed to the benchmark function.
424 does not interpret this pointer in any way.
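.PP
One plausible way of using a benchmark function and its context pointer
is sketched below:
call it with a growing iteration count
until a single call lasts at least a target time.
This is only an illustration under stated assumptions;
it uses
.BR clock (3)
rather than a
.BR "struct bench_timer" ,
simply doubles the count each time,
and its names are hypothetical.
It does not describe how
.B bench_measure
itself chooses iteration counts.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

typedef void demo_bench_fn(unsigned long n, void *ctx);

/* Time one call of fn with n iterations using ANSI C clock().
 * Returns elapsed CPU seconds, or a negative value if clock() fails. */
static double demo_time_call(demo_bench_fn *fn, void *ctx, unsigned long n)
{
  clock_t c0, c1;

  c0 = clock(); fn(n, ctx); c1 = clock();
  if (c0 == (clock_t)-1 || c1 == (clock_t)-1) return -1.0;
  return (double)(c1 - c0)/CLOCKS_PER_SEC;
}

/* Double n until one call takes at least target_s seconds.  On success,
 * store the iteration count and time and return 0; return -1 if the
 * clock fails or n cannot grow any further. */
static int demo_measure(demo_bench_fn *fn, void *ctx, double target_s,
                        unsigned long *n_out, double *t_out)
{
  unsigned long n = 1;
  double t;

  for (;;) {
    t = demo_time_call(fn, ctx, n);
    if (t < 0) return -1;               /* the clock failed */
    if (t >= target_s) break;           /* long enough: done */
    if (n > ULONG_MAX/2) return -1;     /* cannot double again */
    n *= 2;                             /* too short: double and retry */
  }
  *n_out = n; *t_out = t;
  return 0;
}

/* A toy benchmark function: busy work that the compiler must keep. */
static void demo_spin(unsigned long n, void *ctx)
{
  volatile unsigned long x = 0;

  (void)ctx;
  while (n--) x += n;
}
```

In this sketch,
.I n_out
counts iterations,
so if each iteration performs
.I base
operations,
the time per operation is
.IR t_out /( n_out \(mu base ).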
428 function calls its benchmark function repeatedly
429 with different iteration counts
431 with the objective that the call take approximately
433 seconds, as established in the benchmark state.
440 is satisfied when a call takes at least
442 Once the function finds a satisfactory number of iterations,
443 it stores the results in
445 If measurement succeeds, then
449 most likely because the timer failed \(en
454 structure reports the outcome of a successful measurement.
460 is set if the passage-of-time measurement in
464 is set if the cycle count in
469 The number of iterations performed by the benchmark function
470 on its satisfactory run,
475 The time taken for the satisfactory run of the benchmark function,
483 The number of CPU cycles used
484 in the satisfactory run of the benchmark function,
495 Mark Wooding, <mdw@distorted.org.uk>