deduplicator: commit 31d35aae
    .github/workflows/build_and_release.yml
-name: Build and release
-
-on:
-  push:
-    branches:
-      - main
-  release:
-    types: [created]
-
-jobs:
-  build:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v2
-
-      - name: Setup Rust
-        uses: actions-rs/toolchain@v1
-        with:
-          toolchain: stable
-          profile: minimal
-          override: true
-
-      - name: Build for Windows
-        run: |
-          cargo build --release --target x86_64-pc-windows-gnu
-
-      - name: Build for Linux
-        run: |
-          cargo build --release --target x86_64-unknown-linux-gnu
-
-      - name: Build for MacOS
-        run: |
-          cargo build --release --target x86_64-apple-darwin
-
-      - name: Create release
-        uses: actions/create-release@v2
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        with:
-          tag_name: ${{ github.ref }}
-          release_name: Release ${{ github.ref }}
-          draft: false
-          prerelease: false
-
    .github/workflows/release.yml
+name: Release
+
+env:
+  PROJECT_NAME: deduplicator
+  PROJECT_DESC: "Filter, Sort & Delete Duplicate Files Recursively"
+  PROJECT_AUTH: "sreedevk"
+
+on:
+  release:
+    types:
+      - created
+
+jobs:
+  upload-assets:
+    strategy:
+      matrix:
+        os:
+          - ubuntu-latest
+          - macos-latest
+          - windows-latest
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v3
+      - uses: taiki-e/upload-rust-binary-action@v1
+        with:
+          bin: deduplicator
+          tar: unix
+          zip: windows
+          token: ${{ secrets.GITHUB_TOKEN }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
    .gitignore
 /target
 /test_data
+.envrc
 
    README.md
    skipped 3 lines
   Find, Sort, Filter & Delete duplicate files
 </p>
 
-<p align="center">
-NOTE: This project is still being developed. At the moment, as shown in the screenshot below, deduplicator is able to scan through and list duplicates with and without caching. Contributions are welcome.
-</p>
-
-<h2 align="center">Usage</h2>
+## Usage
 
 ```bash
 Usage: deduplicator [OPTIONS]
    skipped 7 lines
   -V, --version Print version information
 ```
 
-<h2 align="center">Installation</h2>
+## Installation
 
-<p align="center">Currently, deduplicator is only installable via rust's cargo package manager</p>
+### Cargo Install
+
+#### Stable
 
+```bash
+$ cargo install deduplicator
 ```
-cargo install deduplicator
+
+#### Nightly
+
+If you'd like to install deduplicator with nightly features, you can use:
+
+```bash
+$ cargo install --git https://github.com/sreedevk/deduplicator
 ```
-<p align="center">
- note that if you use a version manager to install rust (like asdf), you need to reshim (`asdf reshim rust`).
-</p>
+Please note that if you use a version manager to install Rust (like asdf), you need to reshim (`asdf reshim rust`).
 
-<h2 align="center">Performance</h2>
+### Linux (Pre-built Binary)
 
-<p align="center">
- Deduplicator uses fxhash (a non-cryptographic hashing algorithm) which is extremely fast. As a result, deduplicator is able to process huge amounts of data in a <del>couple of seconds.</del> few milliseconds.</p>
+You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page:
+grab `deduplicator-x86_64-unknown-linux-gnu.tar.gz` for Linux. Once you have the tarball with the executable,
+follow these steps to install:
 
-<p align="center">
- <del>While testing, Deduplicator was able to go through 8.6GB of pdf files and detect duplicates in 2.9 seconds</del>
- As of version 0.1.1, on testing locally, deduplicator was able to process and find duplicates in 120GB of files (Videos, PDFs, Images) in ~300ms
-</p>
+```bash
+$ tar -zxvf deduplicator-x86_64-unknown-linux-gnu.tar.gz
+$ sudo mv deduplicator /usr/bin/
+```
 
-<h2 align="center">Screenshots</h2>
+### macOS (Pre-built Binary)
 
-<img src="https://user-images.githubusercontent.com/36154121/213618143-e5182e39-731e-4817-87dd-1a6a0f38a449.gif" />
+You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page:
+grab the `deduplicator-x86_64-apple-darwin.tar.gz` tarball for macOS. Once you have the tarball with the executable, follow these steps to install:
+
+```bash
+$ tar -zxvf deduplicator-x86_64-apple-darwin.tar.gz
+$ sudo mv deduplicator /usr/local/bin/
+```
+
+### Windows (Pre-built Binary)
+
+You can download the pre-built binary from the [Releases](https://github.com/sreedevk/deduplicator/releases) page:
+grab the `deduplicator-x86_64-pc-windows-msvc.zip` file for Windows. Unzip it and move `deduplicator.exe` to a location on the `PATH` system environment variable.
+
+Note: if you run into an MSVC error, please install the MSVC redistributable from [here](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170).
+
+## Performance
+
+Deduplicator uses size comparison and fxhash (a fast non-cryptographic hashing algorithm) to quickly scan through large numbers of files and find duplicates. It is also highly parallel (it uses rayon and dashmap). In local testing, it processed 120GB of files (videos, PDFs, images) in ~300ms. Check out the benchmarks below.
+
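The size-then-hash strategy described above can be sketched in a few lines of Rust. This is a minimal illustration, not deduplicator's actual code: std's `HashMap` and `DefaultHasher` stand in for the dashmap and fxhash crates the real tool uses, and files are modeled as in-memory byte slices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

// Group candidates by size first; only same-sized files can be duplicates,
// so files with a unique size are never hashed at all.
fn find_duplicates(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Pass 1: bucket file indices by size.
    let mut by_size: HashMap<usize, Vec<usize>> = HashMap::new();
    for (i, (_, data)) in files.iter().enumerate() {
        by_size.entry(data.len()).or_default().push(i);
    }

    // Pass 2: hash only buckets with more than one member, grouping by digest.
    let mut by_hash: HashMap<u64, Vec<String>> = HashMap::new();
    for idxs in by_size.values().filter(|v| v.len() > 1) {
        for &i in idxs {
            let (name, data) = files[i];
            let mut h = DefaultHasher::new();
            h.write(data);
            by_hash.entry(h.finish()).or_default().push(name.to_string());
        }
    }

    // Only hash-groups with 2+ entries are duplicate sets.
    by_hash.into_values().filter(|g| g.len() > 1).collect()
}

fn main() {
    let files: Vec<(&str, &[u8])> = vec![
        ("a.txt", b"hello"),
        ("b.txt", b"hello"),   // duplicate of a.txt
        ("c.txt", b"world"),   // same size, different content
        ("d.txt", b"unique!"), // unique size, never hashed
    ];
    for group in find_duplicates(&files) {
        println!("duplicates: {:?}", group);
    }
}
```

In the real tool, the second pass is where rayon's parallel iterators and dashmap's concurrent map pay off, since hashing file contents dominates the runtime.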
+## Benchmarks
+
+| Command | Dir size | Mean [ms] | Min [ms] | Max [ms] | Relative |
+|:---|:---|---:|---:|---:|---:|
+| `deduplicator --dir ~/Data/tmp` | (~120G) | 27.5 ± 1.0 | 26.0 | 32.1 | 1.70 ± 0.09 |
+| `deduplicator --dir ~/Data/books` | (~8.6G) | 21.8 ± 0.7 | 20.5 | 24.4 | 1.35 ± 0.07 |
+| `deduplicator --dir ~/Data/books --minsize 10M` | (~8.6G) | 16.1 ± 0.6 | 14.9 | 18.8 | 1.00 |
+| `deduplicator --dir ~/Data/ --types pdf,jpg,png,jpeg` | (~290G) | 1857.4 ± 24.5 | 1817.0 | 1895.5 | 115.07 ± 4.64 |
+
+* The last entry is slower because of the number of files deduplicator had to go through (~660,895 files); the average size of the files rarely affects deduplicator's performance.
+
+These benchmarks were run using [hyperfine](https://github.com/sharkdp/hyperfine). Here are the specs of the machine used to benchmark deduplicator:
+
+```
+OS: Arch Linux x86_64
+Host: Precision 5540
+Kernel: 5.15.89-1-lts
+Uptime: 4 hours, 44 mins
+Shell: zsh 5.9
+Terminal: kitty
+CPU: Intel i9-9880H (16) @ 4.800GHz
+GPU: NVIDIA Quadro T2000 Mobile / Max-Q
+GPU: Intel CoffeeLake-H GT2 [UHD Graphics 630]
+Memory: 31731MiB (~32GiB)
+```
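For reference, a hyperfine run over the commands in the table above might look something like the following sketch. The exact flags used for these numbers are not recorded in the source; the `--warmup` count here is an assumption, and the directories are the author's own paths.

```shell
# Illustrative hyperfine invocation; adjust the directories to local paths.
# --warmup runs each command a few times first so filesystem caches are hot.
hyperfine --warmup 3 \
  'deduplicator --dir ~/Data/books' \
  'deduplicator --dir ~/Data/books --minsize 10M'
```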
+
+## Screenshots
+
+![](https://user-images.githubusercontent.com/36154121/213618143-e5182e39-731e-4817-87dd-1a6a0f38a449.gif)
 