Skip to content

Commit

Permalink
Merge branch 'sharkdp:master' into master
Browse files Browse the repository at this point in the history
  • Loading branch information
deep-soft authored Dec 19, 2023
2 parents cbcb440 + 07343b5 commit 3310826
Show file tree
Hide file tree
Showing 8 changed files with 91 additions and 83 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CICD.yml
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,7 @@ jobs:
DPKG_BASENAME=${{ needs.crate_metadata.outputs.name }}
DPKG_CONFLICTS=${{ needs.crate_metadata.outputs.name }}-musl
case ${{ matrix.job.target }} in *-musl) DPKG_BASENAME=${{ needs.crate_metadata.outputs.name }}-musl ; DPKG_CONFLICTS=${{ needs.crate_metadata.outputs.name }} ;; esac;
case ${{ matrix.job.target }} in *-musl*) DPKG_BASENAME=${{ needs.crate_metadata.outputs.name }}-musl ; DPKG_CONFLICTS=${{ needs.crate_metadata.outputs.name }} ;; esac;
DPKG_VERSION=${{ needs.crate_metadata.outputs.version }}
unset DPKG_ARCH
Expand Down
42 changes: 37 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,20 +2,52 @@

## Features


## Bugfixes


## Changes


## Other




# v9.0.0

## Performance

- Performance has been *significantly improved*, both due to optimizations in the underlying `ignore`
crate (#1429), and in `fd` itself (#1422, #1408, #13620) - @tavianator.
[Benchmarks results](https://gist.github.com/tavianator/32edbe052f33ef60570cf5456b59de81) show gains
of 6-8x for full traversals of smaller directories (100k files) and up to 13x for larger directories (1M files).

- The default number of threads is now constrained to be at most 64. This should improve startup time on
systems with many CPU cores. (#1203, #1410, #1412, #1431) - @tmccombs and @tavianator

- New flushing behavior when writing output to stdout, providing better performance for TTY and non-TTY
use cases, see #1452 and #1313 (@tavianator).

## Features

- Support character and block device file types, see #1213 and #1336 (@cgzones)
- Breaking: `.git/` is now ignored by default when using `--hidden` / `-H`, use `--no-ignore` / `-I` or
`--no-ignore-vcs` to override, see #1387 and #1396 (@skoriop)


## Bugfixes

- Fix `NO_COLOR` support, see #1421 (@acuteenvy)

## Changes
## Other

- Fixed documentation typos, see #1409 (@marcospb19)

## Thanks

Special thanks to @tavianator for his incredible work on performance in the `ignore` crate and `fd` itself.

- The default number of threads is now constrained to be at most 16. This should improve startup time on
systems with many CPU cores. (#1203)

## Other

# v8.7.1

Expand Down
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ license = "MIT OR Apache-2.0"
name = "fd-find"
readme = "README.md"
repository = "https://github.com/sharkdp/fd"
version = "8.7.1"
version = "9.0.0"
edition= "2021"
rust-version = "1.70.0"

Expand Down
90 changes: 36 additions & 54 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -314,8 +314,8 @@ Options:
-d, --max-depth <depth> Set maximum search depth (default: none)
-E, --exclude <pattern> Exclude entries that match the given glob pattern
-t, --type <filetype> Filter by type: file (f), directory (d), symlink (l),
executable (x), empty (e), socket (s), pipe (p),
block-device (b), char-device (c)
executable (x), empty (e), socket (s), pipe (p), char-device
(c), block-device (b)
-e, --extension <ext> Filter by file extension
-S, --size <size> Limit results based on the size of files
--changed-within <date|dur> Filter by file modification time (newer than)
Expand All @@ -331,64 +331,57 @@ Options:

## Benchmark

Let's search my home folder for files that end in `[0-9].jpg`. It contains ~190.000
subdirectories and about a million files. For averaging and statistical analysis, I'm using
Let's search my home folder for files that end in `[0-9].jpg`. It contains ~750.000
subdirectories and about a 4 million files. For averaging and statistical analysis, I'm using
[hyperfine](https://github.com/sharkdp/hyperfine). The following benchmarks are performed
with a "warm"/pre-filled disk-cache (results for a "cold" disk-cache show the same trends).

Let's start with `find`:
```
Benchmark #1: find ~ -iregex '.*[0-9]\.jpg$'
Time (mean ± σ): 7.236 s ± 0.090 s
Range (min … max): 7.133 s … 7.385 s
Benchmark 1: find ~ -iregex '.*[0-9]\.jpg$'
Time (mean ± σ): 19.922 s ± 0.109 s
Range (min … max): 19.765 s … 20.065 s
```

`find` is much faster if it does not need to perform a regular-expression search:
```
Benchmark #2: find ~ -iname '*[0-9].jpg'
Time (mean ± σ): 3.914 s ± 0.027 s
Range (min … max): 3.876 s … 3.964 s
```

Now let's try the same for `fd`. Note that `fd` *always* performs a regular expression
search. The options `--hidden` and `--no-ignore` are needed for a fair comparison,
otherwise `fd` does not have to traverse hidden folders and ignored paths (see below):
Benchmark 2: find ~ -iname '*[0-9].jpg'
Time (mean ± σ): 11.226 s ± 0.104 s
Range (min … max): 11.119 s … 11.466 s
```
Benchmark #3: fd -HI '.*[0-9]\.jpg$' ~

Time (mean ± σ): 811.6 ms ± 26.9 ms
Range (min … max): 786.0 ms … 870.7 ms
Now let's try the same for `fd`. Note that `fd` performs a regular expression
search by defautl. The options `-u`/`--unrestricted` option is needed here for
a fair comparison. Otherwise `fd` does not have to traverse hidden folders and
ignored paths (see below):
```
For this particular example, `fd` is approximately nine times faster than `find -iregex`
and about five times faster than `find -iname`. By the way, both tools found the exact
same 20880 files :smile:.

Finally, let's run `fd` without `--hidden` and `--no-ignore` (this can lead to different
search results, of course). If *fd* does not have to traverse the hidden and git-ignored
folders, it is almost an order of magnitude faster:
Benchmark 3: fd -u '[0-9]\.jpg$' ~
Time (mean ± σ): 854.8 ms ± 10.0 ms
Range (min … max): 839.2 ms … 868.9 ms
```
Benchmark #4: fd '[0-9]\.jpg$' ~
For this particular example, `fd` is approximately **23 times faster** than `find -iregex`
and about **13 times faster** than `find -iname`. By the way, both tools found the exact
same 546 files :smile:.

Time (mean ± σ): 123.7 ms ± 6.0 ms
Range (min … max): 118.8 ms … 140.0 ms
```

**Note**: This is *one particular* benchmark on *one particular* machine. While I have
performed quite a lot of different tests (and found consistent results), things might
be different for you! I encourage everyone to try it out on their own. See
**Note**: This is *one particular* benchmark on *one particular* machine. While we have
performed a lot of different tests (and found consistent results), things might
be different for you! We encourage everyone to try it out on their own. See
[this repository](https://github.com/sharkdp/fd-benchmarks) for all necessary scripts.

Concerning *fd*'s speed, the main credit goes to the `regex` and `ignore` crates that are also used
in [ripgrep](https://github.com/BurntSushi/ripgrep) (check it out!).
Concerning *fd*'s speed, a lot of credit goes to the `regex` and `ignore` crates that are
also used in [ripgrep](https://github.com/BurntSushi/ripgrep) (check it out!).

## Troubleshooting

### `fd` does not find my file!

Remember that `fd` ignores hidden directories and files by default. It also ignores patterns
from `.gitignore` files. If you want to make sure to find absolutely every possible file, always
use the options `-u`/`--unrestricted` option (or `-HI` to enable hidden and ignored files):
``` bash
> fd -u …
```

### Colorized output

`fd` can colorize files by extension, just like `ls`. In order for this to work, the environment
Expand All @@ -402,15 +395,6 @@ for alternative, more complete (or more colorful) variants, see [here](https://g

`fd` also honors the [`NO_COLOR`](https://no-color.org/) environment variable.

### `fd` does not find my file!

Remember that `fd` ignores hidden directories and files by default. It also ignores patterns
from `.gitignore` files. If you want to make sure to find absolutely every possible file, always
use the options `-u`/`--unrestricted` option (or `-HI` to enable hidden and ignored files):
``` bash
> fd -u …
```

### `fd` doesn't seem to interpret my regex pattern correctly

A lot of special regex characters (like `[]`, `^`, `$`, ..) are also special characters in your
Expand Down Expand Up @@ -543,7 +527,7 @@ Make sure that `$HOME/.local/bin` is in your `$PATH`.
If you use an older version of Ubuntu, you can download the latest `.deb` package from the
[release page](https://github.com/sharkdp/fd/releases) and install it via:
``` bash
sudo dpkg -i fd_8.7.1_amd64.deb # adapt version number and architecture
sudo dpkg -i fd_9.0.0_amd64.deb # adapt version number and architecture
```

### On Debian
Expand Down Expand Up @@ -677,7 +661,7 @@ With Rust's package manager [cargo](https://github.com/rust-lang/cargo), you can
```
cargo install fd-find
```
Note that rust version *1.64.0* or later is required.
Note that rust version *1.70.0* or later is required.

`make` is also needed for the build.

Expand Down Expand Up @@ -708,8 +692,6 @@ cargo install --path .

## License

Copyright (c) 2017-2021 The fd developers

`fd` is distributed under the terms of both the MIT License and the Apache License 2.0.

See the [LICENSE-APACHE](LICENSE-APACHE) and [LICENSE-MIT](LICENSE-MIT) files for license details.
2 changes: 1 addition & 1 deletion doc/release-checklist.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ necessary changes for the upcoming release.
- [ ] Update version in `Cargo.toml`. Run `cargo build` to update `Cargo.lock`.
Make sure to `git add` the `Cargo.lock` changes as well.
- [ ] Find the current min. supported Rust version by running
`grep '^\s*MIN_SUPPORTED_RUST_VERSION' .github/workflows/CICD.yml`.
`grep rust-version Cargo.toml`.
- [ ] Update the `fd` version and the min. supported Rust version in `README.md`.
- [ ] Update `CHANGELOG.md`. Change the heading of the *"Upcoming release"* section
to the version of this release.
Expand Down
26 changes: 8 additions & 18 deletions src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -715,24 +715,14 @@ impl Opts {
fn default_num_threads() -> NonZeroUsize {
// If we can't get the amount of parallelism for some reason, then
// default to a single thread, because that is safe.
// Note that the minimum value for a NonZeroUsize is 1.
// Unfortunately, we can't do `NonZeroUsize::new(1).unwrap()`
// in a const context.
const FALLBACK_PARALLELISM: NonZeroUsize = NonZeroUsize::MIN;
// As the number of threads increases, the startup time suffers from
// initializing the threads, and we get diminishing returns from additional
// parallelism. So set a maximum number of threads to use by default.
//
// This value is based on some empirical observations, but the ideal value
// probably depends on the exact hardware in use.
//
// Safety: The literal "20" is known not to be zero.
const MAX_DEFAULT_THREADS: NonZeroUsize = unsafe { NonZeroUsize::new_unchecked(20) };

std::cmp::min(
std::thread::available_parallelism().unwrap_or(FALLBACK_PARALLELISM),
MAX_DEFAULT_THREADS,
)
let fallback = NonZeroUsize::MIN;
// To limit startup overhead on massively parallel machines, don't use more
// than 64 threads.
let limit = NonZeroUsize::new(64).unwrap();

std::thread::available_parallelism()
.unwrap_or(fallback)
.min(limit)
}

#[derive(Copy, Clone, PartialEq, Eq, ValueEnum)]
Expand Down
8 changes: 6 additions & 2 deletions src/walk.rs
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,6 @@ impl<'a, W: Write> ReceiverBuffer<'a, W> {
}
ReceiverMode::Streaming => {
self.print(&dir_entry)?;
self.flush()?;
}
}

Expand All @@ -232,6 +231,11 @@ impl<'a, W: Write> ReceiverBuffer<'a, W> {
}
}
}

// If we don't have another batch ready, flush before waiting
if self.mode == ReceiverMode::Streaming && self.rx.is_empty() {
self.flush()?;
}
}
Err(RecvTimeoutError::Timeout) => {
self.stream()?;
Expand Down Expand Up @@ -285,7 +289,7 @@ impl<'a, W: Write> ReceiverBuffer<'a, W> {

/// Flush stdout if necessary.
fn flush(&mut self) -> Result<(), ExitCode> {
if self.config.interactive_terminal && self.stdout.flush().is_err() {
if self.stdout.flush().is_err() {
// Probably a broken pipe. Exit gracefully.
return Err(ExitCode::GeneralError);
}
Expand Down

0 comments on commit 3310826

Please sign in to comment.