7.8 KiB
Benchmarks
The results of these benchmarks suggest that building this bc
with
optimization at -O3
with link-time optimization (-flto
) will result in the
best performance. However, using -march=native
can result in WORSE
performance.
Note: all benchmarks were run four times, and the fastest run is the one
shown. Also, [bc]
means whichever bc
was being run, and the assumed working
directory is the root directory of this repository. Also, this bc
was at
version 3.0.0
while GNU bc
was at version 1.07.1
, and all tests were
conducted on an x86_64
machine running Gentoo Linux with clang
9.0.1
as
the compiler.
Typical Optimization Level
These benchmarks were run with both bc
's compiled with the typical -O2
optimizations and no link-time optimization.
Addition
The command used was:
tests/script.sh bc add.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.54
user 1.21
sys 1.32
For this bc
:
real 0.88
user 0.85
sys 0.02
Subtraction
The command used was:
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.51
user 1.05
sys 1.45
For this bc
:
real 0.91
user 0.85
sys 0.05
Multiplication
The command used was:
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
For GNU bc
:
real 7.15
user 4.69
sys 2.46
For this bc
:
real 2.20
user 2.10
sys 0.09
Division
The command used was:
tests/script.sh bc divide.bc 1 0 1 1 [bc]
For GNU bc
:
real 3.36
user 1.87
sys 1.48
For this bc
:
real 1.61
user 1.57
sys 0.03
Power
The command used was:
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
For GNU bc
:
real 11.30
user 11.30
sys 0.00
For this bc
:
real 0.73
user 0.72
sys 0.00
Scripts
This file was downloaded, saved at ../timeconst.bc
and the following
patch was applied:
--- ../timeconst.bc 2018-09-28 11:32:22.808669000 -0600
+++ ../timeconst.bc 2019-06-07 07:26:36.359913078 -0600
@@ -110,8 +110,10 @@
print "#endif /* KERNEL_TIMECONST_H */\n"
}
- halt
}
-hz = read();
-timeconst(hz)
+for (i = 0; i <= 50000; ++i) {
+ timeconst(i)
+}
+
+halt
The command used was:
time -p [bc] ../timeconst.bc > /dev/null
For GNU bc
:
real 16.71
user 16.06
sys 0.65
For this bc
:
real 13.16
user 13.15
sys 0.00
Because this bc
is faster when doing math, it might be a better comparison to
run a script that is not running any math. As such, I put the following into
../test.bc
:
for (i = 0; i < 100000000; ++i) {
y = i
}
i
y
halt
The command used was:
time -p [bc] ../test.bc > /dev/null
For GNU bc
:
real 16.60
user 16.59
sys 0.00
For this bc
:
real 22.76
user 22.75
sys 0.00
I also put the following into ../test2.bc
:
i = 0
while (i < 100000000) {
i += 1
}
i
halt
The command used was:
time -p [bc] ../test2.bc > /dev/null
For GNU bc
:
real 17.32
user 17.30
sys 0.00
For this bc
:
real 16.98
user 16.96
sys 0.01
It seems that the improvements to the interpreter helped a lot in certain cases.
Also, I have no idea why GNU bc
did worse when it is technically doing less
work.
Recommended Optimizations from 2.7.0
Note that, when running the benchmarks, the optimizations used are not the ones
I recommended for version 2.7.0
, which are -O3 -flto -march=native
.
This bc
separates its code into modules that, when optimized at link time,
removes a lot of the inefficiency that comes from function overhead. This is
most keenly felt with one function: bc_vec_item()
, which should turn into just
one instruction (on x86_64
) when optimized at link time and inlined. There are
other functions that matter as well.
I also recommended -march=native
on the grounds that newer instructions would
increase performance on math-heavy code. We will see if that assumption was
correct. (Spoiler: NO.)
When compiling both bc
's with the optimizations I recommended for this bc
for version 2.7.0
, the results are as follows.
Addition
The command used was:
tests/script.sh bc add.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.44
user 1.11
sys 1.32
For this bc
:
real 0.59
user 0.54
sys 0.05
Subtraction
The command used was:
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.42
user 1.02
sys 1.40
For this bc
:
real 0.64
user 0.57
sys 0.06
Multiplication
The command used was:
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
For GNU bc
:
real 7.01
user 4.50
sys 2.50
For this bc
:
real 1.59
user 1.53
sys 0.05
Division
The command used was:
tests/script.sh bc divide.bc 1 0 1 1 [bc]
For GNU bc
:
real 3.26
user 1.82
sys 1.44
For this bc
:
real 1.24
user 1.20
sys 0.03
Power
The command used was:
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
For GNU bc
:
real 11.08
user 11.07
sys 0.00
For this bc
:
real 0.71
user 0.70
sys 0.00
Scripts
The command for the ../timeconst.bc
script was:
time -p [bc] ../timeconst.bc > /dev/null
For GNU bc
:
real 15.62
user 15.08
sys 0.53
For this bc
:
real 10.09
user 10.08
sys 0.01
The command for the next script, the for
loop script, was:
time -p [bc] ../test.bc > /dev/null
For GNU bc
:
real 14.76
user 14.75
sys 0.00
For this bc
:
real 17.95
user 17.94
sys 0.00
The command for the next script, the while
loop script, was:
time -p [bc] ../test2.bc > /dev/null
For GNU bc
:
real 14.84
user 14.83
sys 0.00
For this bc
:
real 13.53
user 13.52
sys 0.00
Link-Time Optimization Only
Just for kicks, let's see if -march=native
is even useful.
The optimizations I used for both bc
's were -O3 -flto
.
Addition
The command used was:
tests/script.sh bc add.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.41
user 1.05
sys 1.35
For this bc
:
real 0.58
user 0.52
sys 0.05
Subtraction
The command used was:
tests/script.sh bc subtract.bc 1 0 1 1 [bc]
For GNU bc
:
real 2.39
user 1.10
sys 1.28
For this bc
:
real 0.65
user 0.57
sys 0.07
Multiplication
The command used was:
tests/script.sh bc multiply.bc 1 0 1 1 [bc]
For GNU bc
:
real 6.82
user 4.30
sys 2.51
For this bc
:
real 1.57
user 1.49
sys 0.08
Division
The command used was:
tests/script.sh bc divide.bc 1 0 1 1 [bc]
For GNU bc
:
real 3.25
user 1.81
sys 1.43
For this bc
:
real 1.27
user 1.23
sys 0.04
Power
The command used was:
printf '1234567890^100000; halt\n' | time -p [bc] -q > /dev/null
For GNU bc
:
real 10.50
user 10.49
sys 0.00
For this bc
:
real 0.72
user 0.71
sys 0.00
Scripts
The command for the ../timeconst.bc
script was:
time -p [bc] ../timeconst.bc > /dev/null
For GNU bc
:
real 15.50
user 14.81
sys 0.68
For this bc
:
real 10.17
user 10.15
sys 0.01
The command for the next script, the for
loop script, was:
time -p [bc] ../test.bc > /dev/null
For GNU bc
:
real 14.99
user 14.99
sys 0.00
For this bc
:
real 16.85
user 16.84
sys 0.00
The command for the next script, the while
loop script, was:
time -p [bc] ../test2.bc > /dev/null
For GNU bc
:
real 14.92
user 14.91
sys 0.00
For this bc
:
real 12.75
user 12.75
sys 0.00
It turns out that -march=native
can be a problem. As such, I have removed the
recommendation to build with -march=native
.
Recommended Compiler
When I ran these benchmarks with my bc
compiled under clang
vs. gcc
, it
performed much better under clang
. I recommend compiling this bc
with
clang
.