NOTE: This project is still being developed. At the moment, as shown in the screenshot below, deduplicator is able to scan through and list duplicates with and without caching. Contributions are welcome.
## Usage

```bash
Usage: deduplicator [OPTIONS]
  ...
  -V, --version          Print version information
```
## Installation

### Cargo Install

#### Stable

```bash
$ cargo install deduplicator
```
#### Nightly
If you'd like to install with nightly features, you can use the nightly build.

Please note that if you use a version manager to install rust (like asdf), you need to reshim (`asdf reshim rust`).
### Linux (Pre-built Binary)
You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.
Download the `deduplicator-x86_64-unknown-linux-gnu.tar.gz` tarball for Linux. Once you have the tarball file with the executable, you can follow these steps to install:
```bash
$ tar -zxvf deduplicator-x86_64-unknown-linux-gnu.tar.gz
$ sudo mv deduplicator /usr/bin/
```

### MacOS (Pre-built Binary)

You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.
Download the `deduplicator-x86_64-apple-darwin.tar.gz` tarball for MacOS. Once you have the tarball file with the executable, you can follow these steps to install:

```bash
$ tar -zxvf deduplicator-x86_64-apple-darwin.tar.gz
$ sudo mv deduplicator /usr/local/bin/
```
### Windows (Pre-built Binary)
You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page.
Download the `deduplicator-x86_64-pc-windows-msvc.zip` zip file for Windows. Unzip the `zip` file and move `deduplicator.exe` to a location in the `PATH` system environment variable.
Note: If you run into an MSVC error, please install the MSVC redistributable from [here](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170).
## Performance
Deduplicator uses size comparison and fxhash (a non-cryptographic hashing algorithm) to quickly scan through large numbers of files and find duplicates. It is also highly parallel (built on rayon and dashmap). In local testing, I was able to scan through 120GB of files (videos, PDFs, images) in ~300ms. Check out the benchmarks below.
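Conceptually, the size-then-hash pipeline looks something like the sketch below. This is an illustrative approximation only, not deduplicator's actual code: it uses just the Rust standard library, substituting `DefaultHasher` for fxhash and a sequential loop for the rayon/dashmap parallelism, and the `find_duplicates` function name is invented for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io::Read;
use std::path::PathBuf;

/// Group files by size, then hash only groups with more than one member;
/// files sharing both size and content hash are reported as duplicates.
fn find_duplicates(paths: &[PathBuf]) -> Vec<Vec<PathBuf>> {
    // Pass 1: bucket by file size (cheap metadata lookup, no file reads).
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        if let Ok(meta) = fs::metadata(p) {
            by_size.entry(meta.len()).or_default().push(p.clone());
        }
    }
    // Pass 2: hash file contents only where sizes collide.
    let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for group in by_size.values().filter(|g| g.len() > 1) {
        for p in group {
            let mut buf = Vec::new();
            if fs::File::open(p).and_then(|mut f| f.read_to_end(&mut buf)).is_ok() {
                let mut h = DefaultHasher::new();
                buf.hash(&mut h);
                by_hash.entry(h.finish()).or_default().push(p.clone());
            }
        }
    }
    // Keep only hash buckets that actually contain duplicates.
    by_hash.into_values().filter(|g| g.len() > 1).collect()
}

fn main() {
    let paths: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    for group in find_duplicates(&paths) {
        println!("{:?}", group);
    }
}
```

Bucketing by size first means most files are never read at all; only size collisions pay the cost of hashing the full contents.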
## Benchmarks
| Command | Dirsize | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|---:|
* The last entry is slower because of the number of files deduplicator had to go through (~660895 files). The average size of the files rarely affects the performance of deduplicator.
These benchmarks were run using [hyperfine](https://github.com/sharkdp/hyperfine). Here are the specs of the machine used to benchmark deduplicator: