









Find and remove duplicate files


How to compare multiple CRC MD5 SHA hash, checksum values
Software to find redundant data to remove (deduplicate)




Data deduplication, which identifies and optionally removes duplicate content, reduces disk usage without losing information, since the data being removed still exists in other copies. It helps keep backups small, which can speed up the backup process and save space on backup media, and it reduces the final size of compressed archives. Some compressors push the principle further and integrate mechanisms to identify and remove duplicate data blocks in order to improve the compression ratio.



Search for duplicate files


When browsing a filesystem, the file browser can show each file's checksum or hash value on demand in the last column. Binary identical files have the same checksum/hash value, so this makes them easy to identify.
Right-click the file manager's column header and click the name of the function: PeaZip file manager will display the hash or checksum value for all (or selected) files. Click "Find duplicates" and PeaZip file manager will work as a duplicate finder utility, displaying size and hash or checksum value only for duplicate files (the same binary identical content featured in two or more distinct files) and reporting the number of non-unique files identified.


In both cases, sorting by the CRC column groups all files (in the same folder, or matching the same search filter) with identical hash or checksum, making it easier to detect and, if necessary, remove binary identical files.
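The grouping described above can be illustrated outside PeaZip with a short Python sketch. It is not PeaZip's implementation: the function names `file_digest` and `find_duplicates`, the default of SHA-256, and the size pre-filter are assumptions made for illustration only.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root, algorithm="sha256"):
    """Map (size, digest) to a sorted list of paths, for groups of 2+ files."""
    # Step 1: group by size, since files of different size cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only the files whose size is shared with another file.
    groups = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            groups[(size, file_digest(path, algorithm))].append(path)

    # Step 3: keep only non-unique content, as a duplicate finder would.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates("some/folder")` (a placeholder path) returns one entry per group of identical files, which is the same view obtained by sorting on the hash column.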


Set the algorithm to detect duplicates
The default verification function used to deduplicate files can be set in the main application's menu (Organize, Browser, Checksum/hash). A wide selection of algorithms is available, ranging from simple checksum functions such as Adler32 and the CRC family (CRC16/24/32 and CRC64), to older hash functions like eDonkey/eMule, MD4, MD5 and SHA-1 (no longer considered collision resistant), to cryptographically strong hashes such as Ripemd160, SHA-2 (SHA256 and SHA512), SHA-3 (256 and 512 bit), BLAKE2S and BLAKE2B, and Whirlpool512.
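To show what choosing an algorithm means in practice, here is a minimal Python sketch that computes a value with either a simple checksum or a hash function. The helper name `compute` and the set of supported names are assumptions for this example. It covers only functions shipped in the Python standard library, so eDonkey, Ripemd160 and Whirlpool are left out.

```python
import hashlib
import zlib

# Hash functions from the list above that Python's hashlib always provides.
HASHES = ("md5", "sha1", "sha256", "sha512",
          "sha3_256", "sha3_512", "blake2s", "blake2b")


def compute(name, data):
    """Return the hex value of `data` under the named checksum or hash."""
    if name == "crc32":
        # 32-bit checksum: fast, but accidental collisions are plausible.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if name == "adler32":
        return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")
    if name in HASHES:
        return hashlib.new(name, data).hexdigest()
    raise ValueError(f"unsupported algorithm: {name}")


if __name__ == "__main__":
    sample = b"123456789"
    for algorithm in ("crc32", "adler32", "md5", "sha256", "blake2b"):
        print(algorithm, compute(algorithm, sample))
```

Longer values cost more time to compute but make it far less likely that two different files share the same value.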


Detect duplicate files in archives


When browsing an archive this on-demand verification is not available, but some archive types provide the same integrity-checking information: for each archived object they store a pre-computed checksum or hash value, which depends on the archive format and on the archival settings employed (e.g. CRC32 in ZIP archives). Sorting the archive content by the CRC column then groups identical files and reveals duplicates.
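As an illustration of reusing the values stored in an archive, the following Python sketch reads the CRC32 recorded for each ZIP entry and groups entries with the same size and CRC, without extracting anything. The function name `zip_duplicate_candidates` is hypothetical. Because CRC32 is only 32 bits wide, the groups are candidates rather than proven duplicates.

```python
import zipfile
from collections import defaultdict


def zip_duplicate_candidates(archive_path):
    """Group ZIP entries by (uncompressed size, stored CRC32)."""
    groups = defaultdict(list)
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            groups[(info.file_size, info.CRC)].append(info.filename)
    # Keep only the groups that contain two or more entries.
    return {key: names for key, names in groups.items() if len(names) > 1}
```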


Find similar images


When browsing a filesystem, PeaZip file manager can display image thumbnails to help deduplication: in the context menu, under Organize, check "Show picture thumbnails", or select one of the file browser's preset styles that shows thumbnails.
While checksum/hash based inspection searches for exactly identical files (and images), thumbnails let the user visually detect similar images (e.g. the same picture or graphic saved in different formats, with different color depth or compression settings, or scaled to different sizes), to help decide whether the (pseudo) duplication is acceptable and which copy (or version) to keep or delete.
As a rule of thumb for deleting extra versions, the best quality image (larger resolution, lower compression, or possibly a lossless format such as RAW, BMP, TIFF or PNG) should be kept and the lower quality copies discarded: once lost, information and quality cannot be recreated.


Compare multiple checksum and hash values at once


Check files, available from the "File tools" submenu (context menu) or from the "Test" button dropdown, launches a separate duplicate finder utility, which can compute multiple hash and checksum algorithms for multiple files at once.


Employing multiple functions, and relying on cryptographically strong hash algorithms such as Ripemd, SHA-2 and Whirlpool, can expose even malicious attempts to forge identical-looking files, detecting differences that would go undetected by weaker algorithms, for which collisions are easier to find.
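The idea of checking several functions at once can be sketched in Python as below. The names `multi_digest` and `same_content` and the default algorithm set are assumptions for illustration, not PeaZip's internals. The file is read once and every hasher is fed the same chunks, so adding algorithms does not add disk reads.

```python
import hashlib

DEFAULT_ALGORITHMS = ("md5", "sha256", "sha3_256")


def multi_digest(path, algorithms=DEFAULT_ALGORITHMS, chunk_size=1 << 20):
    """Return {algorithm: hex digest} for one file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def same_content(path_a, path_b, algorithms=DEFAULT_ALGORITHMS):
    """True only if every selected algorithm yields the same value."""
    return multi_digest(path_a, algorithms) == multi_digest(path_b, algorithms)
```

Two files crafted to collide under one weak function would still have to collide under every other selected function to pass this check.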


Byte-to-byte comparison (alternative deduplication method)


The Compare files utility in the "File tools" submenu performs a byte-to-byte comparison between two files. Unlike the checksum/hash method it is not subject to collisions under any circumstance, and it can report exactly which bytes differ, so it tells not only whether two files are identical, but also what changes were made to the content between the two versions.
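A byte-to-byte comparison of this kind can be sketched in Python as follows. The function name `compare_files`, the `limit` parameter and the tuple format of the report are assumptions made for the example and do not describe PeaZip's own output.

```python
def compare_files(path_a, path_b, limit=10, chunk_size=1 << 16):
    """Return up to `limit` differences as (offset, byte_a, byte_b) tuples.

    A byte is reported as None where one file is shorter than the other.
    An empty list means the two files are identical.
    """
    differences = []
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                return differences  # both files fully read
            if chunk_a != chunk_b:
                for i in range(max(len(chunk_a), len(chunk_b))):
                    byte_a = chunk_a[i] if i < len(chunk_a) else None
                    byte_b = chunk_b[i] if i < len(chunk_b) else None
                    if byte_a != byte_b:
                        differences.append((offset + i, byte_a, byte_b))
                        if len(differences) >= limit:
                            return differences
            offset += max(len(chunk_a), len(chunk_b))
```

Since the actual bytes are compared, no hash collision can make two different files appear identical. The cost is that both files must be read in full for every pair compared.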

Read more: checksum (validate data integrity) and hash function (find hash value) definitions on Wikipedia.

Synopsis: Detect duplicate files with PeaZip file manager. Search for identical content. How to compare multiple CRC, MD5, SHA hash and checksum values at once. Free software to find and remove (deduplicate) redundant files.

Topics: find duplicate files, detect duplicate content by hash
