gh-140009: Optimize dict object by replacing PyTuple_Pack with PyTuple_FromArray#144531
gh-140009: Optimize dict object by replacing PyTuple_Pack with PyTuple_FromArray#144531rashes2006 wants to merge 3 commits intopython:mainfrom
Conversation
|
@rashes2006 Do you have any benchmarks showing the performance gain from these changes? |
|
Also don't update the branch if nothing needs to be updated: https://devguide.python.org/getting-started/pull-request-lifecycle/#update-branch-button. |
|
Tested on CPython 3.15.0a0 (macOS arm64) Test Case:
By switching from the variadic |
|
Please show us the benchmark script itself as well. Is it on a PGO+LTO build? |
|
@picnixz The benchmarks were run on a standard development build of CPython, not a PGO+LTO build. While PGO+LTO may change absolute timings, the relative speedup is expected to be similar, since the improvement comes from reducing per-iteration C call overhead rather than from whole-program optimizations. |
|
Benchmarks on DEBUG builds are not relevant to us. Please do so on a PGO+LTO. There can be lots of changes between them, especially since the functions being invoked are different and because users won't have a DEBUG build. |
|
Also, we need to see the stdev and the geometric mean. Look at #140010 for how we want benchmarks to be reported. |
Benchmarks (PGO+LTO)All performance results below were collected on a full PGO+LTO build of Platform
Benchmark Scriptimport timeit
import sys
def run_benchmark(setup_stmt, run_stmt, label):
n = 5_000_000
t = timeit.Timer(run_stmt, setup=setup_stmt)
results = t.repeat(repeat=5, number=n)
best = min(results) / n
print(f"{label:10} | Best of 5: {best * 1e9:6.2f} nsec per loop")
if __name__ == "__main__":
print(f"Python Build: {sys.version}")
run_benchmark("d = {0:0}", "for _ in d.items(): pass", "Size 1")
run_benchmark("d = {i: i for i in range(10)}", "for _ in d.items(): pass", "Size 10")##Result
|
|
Can you use |
|
Please pay attention to review comments: "Also, we need to see the stdev and the geometric mean." I suggest you using pyperf instead of reinventing own poor framework for benchmarking. |
gh-140009: Optimize dict object by replacing PyTuple_Pack with PyTuple_FromArray
Summary
This PR replaces
PyTuple_PackwithPyTuple_FromArrayinObjects/dictobject.cfor creating small tuples (size 2).PyTuple_FromArrayis more efficient thanPyTuple_Packbecause it avoids the overhead of variadic arguments (va_args) processing by taking a pointer to a pre-allocated array ofPyObject*.Changes
PyTuple_Pack(2, Py_None, Py_None)withPyTuple_FromArrayusing a stack-allocated array.PyTuple_Pack(2, key, val2)withPyTuple_FromArray.Performance Impact
This is part of a general effort to optimize small tuple creation across the codebase. Replacing
PyTuple_PackwithPyTuple_FromArrayfor small, fixed-size tuples reduces call overhead.