Searching for duplicate Office documents on SharePoint
Question / Problem
I want to search for duplicate files on a SharePoint site. I know there are duplicate Office files, but TreeSize doesn't show them.
Duplicates of other file types are found as expected.
Answer / Solution
When an Office file is uploaded to SharePoint, SharePoint itself alters the file. As a result, two uploads of the same file differ at the binary level and are no longer recognized as duplicates when their MD5 checksums are compared.
This can be verified by uploading a file twice, downloading both copies again, and comparing their binary content and/or checksums.
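The comparison of the two downloaded copies can be scripted. The following is a minimal Python sketch, not part of TreeSize. The helper names md5_of_file and are_binary_duplicates and the file names in the usage comment are hypothetical and chosen for illustration only.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large documents are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_binary_duplicates(first_path, second_path):
    """True if both files have the same MD5 checksum."""
    return md5_of_file(first_path) == md5_of_file(second_path)


# Usage (hypothetical file names for the two downloaded copies):
# print(are_binary_duplicates("Report_copy1.docx", "Report_copy2.docx"))
```

For two downloaded copies of the same Office file, such a check would be expected to report a mismatch, in line with the behavior described above.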
Other file formats (e.g. PDF, PNG) are not altered by SharePoint by default, so duplicates of these files are detected as expected.