Skip to content

expand: cgu=3 for binary size#10863

Open
oech3 wants to merge 1 commit intouutils:mainfrom
oech3:expand3
Open

expand: cgu=3 for binary size#10863
oech3 wants to merge 1 commit intouutils:mainfrom
oech3:expand3

Conversation

@oech3
Copy link
Contributor

@oech3 oech3 commented Feb 10, 2026

1349360 -> 1210472 byte. cgu=2 caused performance regression.
#10863 (comment)

@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/follow-name (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/pr/bounded-memory is no longer failing!

@oech3 oech3 marked this pull request as ready for review February 10, 2026 16:53
@xtqqczze
Copy link
Contributor

why not codegen-units = 1?

@oech3
Copy link
Contributor Author

oech3 commented Feb 11, 2026

see comment at Cargo.toml

@xtqqczze
Copy link
Contributor

xtqqczze commented Feb 11, 2026

Looking at rust-lang/rust#148670, it looks like the issue is due to stack spill on x86? if so can we use codegen-units = 1 for aarch64?

@oech3
Copy link
Contributor Author

oech3 commented Feb 11, 2026

I cannot bench it. We should fix perf regression with cgu=1 for all targets instead.

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/pr/bounded-memory is no longer failing!

@oech3
Copy link
Contributor Author

oech3 commented Feb 11, 2026

I tried putting #inline(never) to some funcs, but it regressed performance at most cases.

@xtqqczze
Copy link
Contributor

if so can we use codegen-units = 1 for aarch64?

Unfortunately it doesn't appear to be possible to specify target_arch cfg with profile.release

@oech3
Copy link
Contributor Author

oech3 commented Feb 13, 2026

Also we can't have cgu=N workaround for coreutils binary.

@xtqqczze
Copy link
Contributor

I tried putting #inline(never) to some funcs, but it regressed performance at most cases.

We would likely need to manually outline cold code.

@xtqqczze
Copy link
Contributor

xtqqczze commented Feb 13, 2026

// Fast path: if there are no tabs, backspaces, and (in UTF-8 mode or no carriage returns),

Here: might be worth moving cold code into an inner slow_path function.

@ChrisDryden
Copy link
Collaborator

I'm trying to go through these compiler flag changing PR's and my general take is that it is just a outcome driven decision where if we have measurements an benchmarks that show an improvement should always go for it, but if we don't have benchmarks its probably not the best use of time at the moment relative to other things to spend time on these flags because we are still changing a bunch of functionality. I'm seeing in a bunch of PR's now where code is changing in a function that has compiler hints and there's no easy way to tell whether they should stay or not since its not benchmarked.

@xtqqczze
Copy link
Contributor

Also to outline cold code, we should should refactor error handling, for example:

map_err_context(|| translate!("expand-error-failed-to-write-output"))

should be:

map_err_context(ExpandError::FailedToWriteOutput)

@oech3
Copy link
Contributor Author

oech3 commented Feb 13, 2026

This reduces binary size.

@xtqqczze
Copy link
Contributor

This reduces binary size.

what is binary size with codegen-units = 1?

@oech3
Copy link
Contributor Author

oech3 commented Feb 13, 2026

16: 1211048 byte
3: 1210536 byte
1: 1208152 byte (bad perf)

@oech3
Copy link
Contributor Author

oech3 commented Feb 13, 2026

Performance with cgu=1 is important because #10863 (comment)

@github-actions
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/factor/t27 is no longer failing!
Congrats! The gnu test tests/factor/t29 is no longer failing!
Congrats! The gnu test tests/factor/t34 is no longer failing!
Congrats! The gnu test tests/pr/bounded-memory is no longer failing!
Note: The gnu test tests/cut/cut-huge-range is now being skipped but was previously passing.
Note: The gnu test tests/dd/no-allocate is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/csplit/csplit-heap is now passing!
Congrats! The gnu test tests/cut/bounded-memory is now passing!

@ChrisDryden
Copy link
Collaborator

Its more of a question of where to focus the effort on, some of the fixes to reduce the need for a dependency especially for the locale stuff would probably have orders of magnitude more impact on binary sizes than working on compiler flags. The flag stuff is more fragile in the long run than the dependency stuff because things can change in the compiler and when we update rust where the fixes may no longer apply whereas the work on streamlining the dependency chain will be the same across versions. Just as an example, according the cargo bloat, only 2.6% of the release size is coming from uu_expand and the rest is coming from other dependencies.

@oech3
Copy link
Contributor Author

oech3 commented Feb 13, 2026

If someone found the code imprving both of readability and cgu1, is it OK for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants