How to search by file hash using OneDrive SDK


Part of a program I'm writing needs to connect to OneDrive and de-duplicate files (there's a folder that has lots of files, many of which are also found elsewhere in the user's OneDrive, possibly under a different file name).

So given a particular file, I need a way to search for duplicates (and if they exist, I'll delete the first file). OneDrive provides a file hash; I just need to be able to search by that to find dupes.

The OneDrive Explorer C# sample (https://github.com/OneDrive/onedrive-explorer-win) shows how to implement search, but the search seems to index only file names, contents, and tags, not hashes.

Any way to search by hash? Otherwise I suppose I'll need to recursively go through every item in the user's OneDrive and compare the hash...

There is 1 best solution below

OneDrive does not support searching by hash.

If you need to de-dupe more than once, I would recommend using the view.changes API to enumerate the files and then see which ones have been updated since.

GET https://api.onedrive.com/v1.0/drive/root/view.changes?select=id,file

The select parameter limits the response to the item IDs and the file facets for all items in the drive:

{
  ...
  "value": [
    { "id": "DA56136E!124" },
    {
      "id": "DA56136E!178",
      "file": {
        "hashes": {
          "crc32Hash": "838920AE",
          "sha1Hash": "23DCC6D4B5BFE00357FD0248BB5955B8E36BB8F1"
        },
        "mimeType": "image/gif"
      }
    },
    ...

Follow the @odata.nextLink in each response until you've enumerated the entire set of files. At that point you should have every item ID in the drive along with the SHA-1 or CRC32 hash of each file. Then you can run your cleanup process to remove the files that have been duplicated.
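
As a rough illustration of the enumeration described above, here is a minimal Python sketch. It assumes a caller-supplied fetch_page function (a hypothetical name, not part of any OneDrive SDK) that performs an authenticated GET and returns the parsed JSON page as a dict. The field names (value, file.hashes.sha1Hash, @odata.nextLink) are the ones shown in the sample response.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def iter_items(fetch_page: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every item, following @odata.nextLink until no link is returned.

    fetch_page is a hypothetical helper supplied by the caller; it should
    return the decoded JSON body for the given URL.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


def find_duplicates(items: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each SHA-1 hash shared by two or more items to their item IDs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        # Items without a file facet (the first item in the sample) are skipped.
        sha1 = item.get("file", {}).get("hashes", {}).get("sha1Hash")
        if sha1:
            groups[sha1].append(item["id"])
    return {sha1: ids for sha1, ids in groups.items() if len(ids) > 1}
```

Calling find_duplicates(iter_items(fetch_page, first_url)) gives one entry per hash that occurs more than once. Which copy to keep and which to delete is left to the cleanup step, since that depends on which folder you treat as the one to clean out.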

If you preserve the @changes.token, future calls will return only the files that have changed since the last time you performed a de-dupe, so you can be confident of the clean state of the drive.
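
One way to keep the token between runs is sketched below in Python. The helper names and the JSON state file are illustrative choices, and passing the saved value back as a token query parameter is an assumption about the view.changes endpoint that you should check against the API documentation. The sketch also assumes the token is read from the last page of an enumeration.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from the request shown earlier in this answer.
CHANGES_URL = "https://api.onedrive.com/v1.0/drive/root/view.changes"


def load_token(path: Path) -> Optional[str]:
    """Return the token saved by a previous run, or None on the first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("token")


def save_token(path: Path, last_page: dict) -> None:
    """Persist the @changes.token found on the final page of a run."""
    token = last_page.get("@changes.token")
    if token:
        path.write_text(json.dumps({"token": token}))


def build_changes_url(token: Optional[str]) -> str:
    """Build the request URL; the 'token' parameter name is an assumption."""
    params = {"select": "id,file"}
    if token:
        params["token"] = token
    # safe="," keeps the select list readable as id,file.
    return CHANGES_URL + "?" + urlencode(params, safe=",")
```

On the first run load_token returns None and the URL matches the full-enumeration request above. On later runs the saved token is appended, so only changed items should come back.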