.nf
.B "#include <mLib/bench.h>"
.PP
-.ta 2n
+.ta 2n +2n +2n
.B "struct bench_time {"
.B " unsigned f;"
-.B " kludge64 s;"
-.B " uint32 ns;"
+.B " union {"
+.B " struct { kludge64 s; uint32 ns; } ts;"
+.B " clock_t clk;"
+.B " kludge64 rawns;"
+.B " } t;"
.B " kludge64 cy;"
.B "};"
.PP
.B " double cy;"
.B "};"
.PP
+.B "#define BTF_T0 0u"
+.B "#define BTF_T1 ..."
.B "struct bench_timerops {"
.BI " void (*describe)(struct bench_timer *" bt ", dstr *" d );
-.BI " void (*now)(struct bench_timer *" bt ", struct bench_time *" t_out );
+.ta 2n +\w'\fBint (*now)('u
+.BI " int (*now)(struct bench_timer *" bt ,
+.BI " struct bench_time *" t_out ", unsigned " f );
+.ta 2n +\w'\fBvoid (*diff)('u
+.BI " void (*diff)(struct bench_timer *" bt ,
+.BI " struct bench_timing *" delta_out ,
+.BI " const struct bench_time *" t0 ,
+.BI " const struct bench_time *" t1 );
.BI " void (*destroy)(struct bench_timer *" bt );
.B "};"
.B "struct bench_timer {"
Write a description of the timer to the dynamic string
.IR d .
.TP
-.IB tm ->ops->now( tm ", " t_out)
+.IB tm ->ops->now( tm ", " t_out ", " f )
Store the current time in
-.IR t_out .
+.BI * t_out \fR.
The
-.B struct bench_time
-used to represent the time reported by a timer
-is described in detail below.
+.B BTF_T1
+flag should be set in
+.I f
+to indicate that this is the second call in a pair;
+leave it clear for the first call.
+(A fake
+.B BTF_T0
+flag is defined to be zero for symmetry.)
+Return zero on success
+.I or
+permanent failure;
+return \-1 if timing failed but
+trying again immediately has a reasonable chance of success.
+.TP
+.IB tm ->ops->diff( tm ", " delta_out ", " t0 ", " t1 )
+Store in
+.BI * delta_out
+the difference between the two times
+.I t0
+and
+.IR t1 .
.TP
.IB tm ->ops->destroy( tm )
Destroy the timer,
releasing all of the resources that it holds.
.PP
-A time, a reported by a timer, is represented by the
-.BR "struct bench_time" .
-A passage-of-time measurement is stored in the
-.B s
-and
-.B ns
-members, holding seconds and nanoseconds respectively.
-(A timer need not have nanosecond precision.
-The exact interpretation of the time \(en
-e.g., whether it measures wallclock time,
-user-mode CPU time,
-or total thread CPU time \(en
-is a matter for the specific timer implementation.)
-A cycle count is stored in the
-.B cy
-member.
-The
+A
+.B bench_timing
+structure reports the difference between two times,
+as determined by a timer's
+.B diff
+function.
+It has four members.
+.TP
.B f
-member stores flags:
+A flags word.
.B BTF_TIMEOK
-is set if the passage-of-time measurement
-.B s
-and
-.B ns
-are valid; and
+is set if the passage-of-time measurement in
+.B t
+is valid;
.B BTF_CYOK
-is set if the cycle count
+is set if the cycle count in
.B cy
is valid.
-Neither the time nor the cycle count need be measured
-relative to any particular origin.
The mask
.B BTF_ANY
covers the
.B BTF_CYOK
bits:
hence,
-.IB f &BTF_ANY
+.B f&BTF_ANY
is nonzero (true)
if the timer returned any valid timing information.
+.TP
+.B n
+The number of iterations performed by the benchmark function
+on its satisfactory run,
+multiplied by
+.IR base .
+.TP
+.B t
+The time taken for the satisfactory run of the benchmark function,
+in seconds.
+Only valid if
+.B BTF_TIMEOK
+is set in
+.BR f .
+.TP
+.B cy
+The number of CPU cycles used
+in the satisfactory run of the benchmark function.
+Only valid if
+.B BTF_CYOK
+is set in
+.BR f .
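As a rough illustration of how the four members fit together, the sketch below turns a timing into per-iteration figures. It is not taken from the library: the flag values, the types of the n and t members, and the helper name per_iteration are all assumptions made so that the fragment compiles on its own; real code would include <mLib/bench.h> instead of the stand-in definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions so that the sketch is self-contained.  The real
 * ones come from <mLib/bench.h>; the flag values and the types of `n'
 * and `t' are assumptions made for this example only.
 */
#define BTF_TIMEOK 1u
#define BTF_CYOK 2u
#define BTF_ANY (BTF_TIMEOK | BTF_CYOK)

struct bench_timing {
  unsigned f;                   /* flags: BTF_TIMEOK and/or BTF_CYOK */
  double n;                     /* iterations, multiplied by base */
  double t;                     /* elapsed time, in seconds */
  double cy;                    /* elapsed CPU cycles */
};

/* Hypothetical helper: compute nanoseconds and cycles per iteration.
 * Returns the subset of BTF_TIMEOK | BTF_CYOK whose outputs were written;
 * outputs for measurements that are not valid are left untouched.
 */
static unsigned per_iteration(const struct bench_timing *tm,
                              double *ns_out, double *cy_out)
{
  unsigned ok = 0;

  assert(tm != NULL && ns_out != NULL && cy_out != NULL);

  /* Nothing to report if the timer produced no valid measurement. */
  if (!(tm->f & BTF_ANY) || tm->n <= 0) return (0);

  if (tm->f & BTF_TIMEOK) { *ns_out = 1e9*tm->t/tm->n; ok |= BTF_TIMEOK; }
  if (tm->f & BTF_CYOK) { *cy_out = tm->cy/tm->n; ok |= BTF_CYOK; }
  return (ok);
}
```

A caller would test the returned flags before printing either figure, in the same way that it would test f&BTF_ANY on the timing itself.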
+.PP
+A
+.B "struct bench_time"
+represents a single instant in time,
+as captured by a timer's
+.B now
+function.
+The use of this structure is a private matter for the timer:
+the only hard requirement is that the
+.B diff
+function should be able to compute the difference between two times.
+However, the intent is that
+a passage-of-time measurement is stored in the
+.B t
+union,
+a cycle count is stored in the
+.B cy
+member, and
+the
+.B f
+member stores flags
+.B BTF_TIMEOK
+and/or
+.B BTF_CYOK
+if the passage-of-time measurement or the cycle count, respectively, is valid.
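The division of labour between now and diff might look something like the following sketch, which captures a time with clock_gettime(2) and subtracts two captures. It is an illustration under stated assumptions, not the built-in timer: uint64_t and uint32_t stand in for kludge64 and uint32, the flag values are invented, the functions sketch_now and sketch_diff are hypothetical, and no cycle count is taken.

```c
#ifndef _POSIX_C_SOURCE
#  define _POSIX_C_SOURCE 200809L       /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-ins for the declarations in <mLib/bench.h>.  The flag values are
 * assumptions, and uint64_t/uint32_t replace mLib's kludge64/uint32.
 */
#define BTF_TIMEOK 1u
#define BTF_CYOK 2u

struct bench_timer;                     /* opaque in this sketch */

struct bench_time {
  unsigned f;
  union {
    struct { uint64_t s; uint32_t ns; } ts;
    clock_t clk;
    uint64_t rawns;
  } t;
  uint64_t cy;
};

struct bench_timing { unsigned f; double n, t, cy; };

/* Hypothetical `now': record the monotonic clock in the `ts' arm of the
 * union.  A failed clock read is treated as permanent, so the function
 * returns zero with BTF_TIMEOK clear rather than asking for a retry.
 */
static int sketch_now(struct bench_timer *bt,
                      struct bench_time *t_out, unsigned f)
{
  struct timespec ts;

  (void)bt; (void)f;
  assert(t_out != NULL);
  t_out->f = 0;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
    t_out->t.ts.s = (uint64_t)ts.tv_sec;
    t_out->t.ts.ns = (uint32_t)ts.tv_nsec;
    t_out->f |= BTF_TIMEOK;
  }
  return (0);
}

/* Hypothetical `diff': the elapsed time is valid only if both endpoints
 * are.  Convert to double before subtracting so that a smaller
 * nanoseconds field in `t1' does not wrap around.
 */
static void sketch_diff(struct bench_timer *bt,
                        struct bench_timing *delta_out,
                        const struct bench_time *t0,
                        const struct bench_time *t1)
{
  (void)bt;
  assert(delta_out != NULL && t0 != NULL && t1 != NULL);
  delta_out->f = 0;
  if (t0->f & t1->f & BTF_TIMEOK) {
    delta_out->t = ((double)t1->t.ts.s - (double)t0->t.ts.s) +
      ((double)t1->t.ts.ns - (double)t0->t.ts.ns)/1e9;
    delta_out->f |= BTF_TIMEOK;
  }
}
```

Only the pair of functions needs to agree on which arm of the union is in use; a timer based on clock(3) could equally store its reading in clk and have its diff divide by CLOCKS_PER_SEC.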
.
.SS The built-in timer
The function
The clock subtimers are as follows.
Not all of them will be available on every platform.
.TP
+.B linux-x86-perf-rdpmc-hw-cycles
+This is a dummy companion to the similarly named cycle subtimer;
+see its description below.
+.TP
.B posix-thread-cputime
Measures the passage of time using
.BR clock_gettime (2),
The cycle subtimers are as follows.
Not all of them will be available on every platform.
.TP
-.B linux-perf-event
-Counts CPU cycles using the Linux-specific
+.B linux-perf-read-hw-cycles
+Counts CPU cycles using the Linux-specific
.BR perf_event_open (2)
function to read the
.BR PERF_\%COUNT_\%HW_\%CPU_\%CYCLES
.B /proc/sys/kernel/perf_event_paranoid
level is too high.
.TP
-.B x86-rdtsc
-Counts CPU cycles using the x86
+.B linux-x86-perf-rdpmc-hw-cycles
+Counts CPU cycles using the Linux-specific
+.BR perf_event_open (2)
+function,
+as for
+.B linux-perf-read-hw-cycles
+above,
+except that it additionally uses the i386/AMD64
.B rdtsc
+and
+.B rdpmc
+instructions,
+together with information provided by the kernel
+through a memory-mapped page
+to do its measurements without any system call overheads.
+It does passage-of-time and cycle counting in a single operation,
+so no separate clock subtimer is required:
+the similarly named clock subtimer does nothing
+except check that the
+.B linux-x86-perf-rdpmc-hw-cycles
+cycle subtimer has been selected.
+This is almost certainly the best choice if it's available.
+.TP
+.B x86-rdtscp
+Counts CPU cycles using the x86
+.B rdtscp
instruction.
This instruction is not really suitable for performance measurement:
it gives misleading results on CPUs with variable clock frequency.
.TP
+.B x86-rdtsc
+Counts CPU cycles using the x86
+.B rdtsc
+instruction.
+This has the downsides of
+.B rdtscp
+above,
+but also fails to detect when the thread has been suspended
+or transferred to a different CPU core
+and gives misleading answers in this case.
+Not really recommended.
+.TP
.B null
A dummy cycle counter,
which will initialize successfully
.PP
The built-in preference order for clock subtimers,
from most to least preferred, is
-.B posix-thread-cputime
+.BR linux-x86-perf-rdpmc-hw-cycles ,
followed by
+.BR posix-thread-cputime ,
+and finally
.BR stdc-clock .
The built-in preference order for cycle subtimers,
from most to least preferred, is
-.B linux-perf-event
+.BR linux-x86-perf-rdpmc-hw-cycles ,
+then
+.BR linux-perf-read-hw-cycles ,
followed by
+.BR x86-rdtscp ,
+and
.BR x86-rdtsc ,
-and then
+and finally
.BR null .
.
.SS The benchmark state
If it fails \(en
most likely because the timer failed \(en
then it returns \-1.
-.PP
-A
-.B bench_timing
-structure reports the outcome of a successful measurement.
-It has four members.
-.TP
-.B f
-A flags word.
-.B BTF_TIMEOK
-is set if the passage-of-time measurement in
-.B t
-is valid;
-.B BTF_CYOK
-is set if the cycle count in
-.B cy
-is valid.
-.TP
-.B n
-The number of iterations performed by the benchmark function
-on its satisfactory run,
-multiplied by
-.IR base .
-.TP
-.B t
-The time taken for the satisfactory run of the benchmark function,
-in seconds.
-Only valid if
-.B BTF_TIMEOK
-is set in
-.BR f .
-.TP
-.B cy
-The number of CPU cycles used
-in the satisfactory run of the benchmark function,
-in seconds.
-Only valid if
-.B BTF_CYOK
-is set in
-.BR f .
.
.\"--------------------------------------------------------------------------
.SH "SEE ALSO"