BitBucket remote repo larger than clone


I am cleaning the history of a private repo:

$ git gc
$ git filter-repo --replace-refs delete-no-add --strip-blobs-bigger-than 10M
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git commit
$ git push origin master --force
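Before running a filter like the one above, it can help to confirm which blobs actually exceed the 10M threshold. A minimal sketch using git cat-file --batch-check --batch-all-objects (the helper names are mine, not part of any tool):

```python
import subprocess

def parse_batch_check(lines, limit=10 * 1024 * 1024):
    """Pick out blob entries larger than `limit` bytes from
    `git cat-file --batch-check` output lines ("<sha> <type> <size>")."""
    big = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] == "blob" and int(parts[2]) > limit:
            big.append((parts[0], int(parts[2])))
    # Largest first, like the support listing below
    return sorted(big, key=lambda e: e[1], reverse=True)

def largest_blobs(limit=10 * 1024 * 1024):
    """Enumerate every object in the repo and return blobs above `limit`."""
    out = subprocess.run(
        ["git", "cat-file", "--batch-check", "--batch-all-objects"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return parse_batch_check(out, limit)
```

Running largest_blobs() inside the repo before and after filter-repo should show the oversized blobs disappearing locally.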

du -sh .git returns 2 GB before and 25 MB after the operation, but the size shown on Bitbucket remains 2 GB.

When I git clone --mirror repo.git, the resulting repo.git folder is also only 25 MB.
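To compare the local figure with what Bitbucket reports without opening the web UI, the repository resource of the Bitbucket Cloud 2.0 API exposes a size field in bytes (assumed here; check the response of your own call). A sketch, with placeholder credentials:

```python
import requests
from requests.auth import HTTPBasicAuth

def format_bytes(n):
    """Render a byte count roughly the way `du -sh` would."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PB"

def remote_repo_size(workspace, repo_slug, username, app_password):
    """Fetch the `size` field (bytes) of the repository resource."""
    url = f"https://api.bitbucket.org/2.0/repositories/{workspace}/{repo_slug}"
    resp = requests.get(url, auth=HTTPBasicAuth(username, app_password))
    resp.raise_for_status()
    return resp.json()["size"]
```

For the situation described above, this is what would keep returning ~2 GB even though the fresh mirror clone is 25 MB.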

Questions:

  1. What am I doing wrong?
  2. Why is there a size difference between the local and remote repo?

There are 2 best solutions below


I needed to go through support, as indicated by @phd; there was nothing I could do on my own. Sharing the reply from support in case it helps someone at some point:


Large files with a total size of 4.3 GB are preserved as part of dark references. We preserve them due to recent changes made in the PR diff view (https://bitbucket.org/blog/improving-performance-on-complex-diffs).

 4.3GB  dark-refs/pull-requests/52/from
 4.0GB  dark-refs/pull-requests/81/from
 4.0GB  dark-refs/pull-requests/56/from
 3.9GB  dark-refs/pull-requests/75/from
 3.9GB  dark-refs/pull-requests/73/from
 3.9GB  dark-refs/pull-requests/89/from
 3.9GB  dark-refs/pull-requests/88/from
 3.9GB  dark-refs/pull-requests/105/from
 3.9GB  dark-refs/pull-requests/85/from
 3.9GB  dark-refs/pull-requests/80/from
 3.9GB  dark-refs/pull-requests/79/from
 3.9GB  dark-refs/pull-requests/70/from
 3.9GB  dark-refs/pull-requests/90/from
 3.9GB  dark-refs/pull-requests/83/from
 3.9GB  dark-refs/pull-requests/82/from

These cannot be deleted with a GC run; we need to manually delete the PRs, after which the files will be deleted and the repository size will be reduced.

Please provide your approval so that we can go ahead and delete the pull requests, which should reduce the repository size.

You can export the pull request data using the API before we delete them. API: https://developer.atlassian.com/cloud/bitbucket/rest/api-group-pullrequests/#api-repositories-workspace-repo-slug-pullrequests-get

You can also use the Python script below to export all your PRs.

Script:

import csv
import requests
from requests.auth import HTTPBasicAuth

# Login
username = 'your_bitbucket_username'
password = 'your_bitbucket_app_password_here'
repository = 'repo/repo'

# Request pull requests page by page (only the fields we need), plus the next page URL
next_page_url = 'https://api.bitbucket.org/2.0/repositories/%s/pullrequests?fields=next,values.id,values.created_on,values.state,values.author,values.source.branch.name,values.destination.branch.name,values.source.commit.hash,values.destination.commit.hash,values.description&&q=created_on+%%3E+2022-07-01T00%%3A00%%3A00-07%%3A00&&pagelen=20' % repository

# Use the csv module so descriptions containing commas or newlines stay intact
with open('pr_stats.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["PR Author", "PR Status", "PR Number", "PR created date",
                     "PR Source Branch", "PR Destination Branch",
                     "PR Source Branch Commit", "PR Destination Branch Commit",
                     "PR Description"])

    # Keep fetching pages while there's a page to fetch
    while next_page_url is not None:
        response = requests.get(next_page_url, auth=HTTPBasicAuth(username, password))
        page_json = response.json()

        # Parse pull requests from the JSON
        for pr in page_json['values']:
            writer.writerow([
                pr['author']['display_name'],
                pr['state'],
                str(pr['id']),
                str(pr['created_on']),
                pr['source']['branch']['name'],
                pr['destination']['branch']['name'],
                pr['source']['commit']['hash'],
                pr['destination']['commit']['hash'],
                pr['description'],
            ])
        next_page_url = page_json.get('next', None)

First of all, you have to know the actual size of your repository:

  git count-objects -v

This displays your repository size. After force-pushing the rewritten history, run gc to drop the old objects:

  git gc --aggressive --prune=all # remove the old files

Git stores every commit, every related file, and then some overhead data on top.
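The count-objects output can also be read programmatically; its size and size-pack fields are reported in KiB. A small sketch (the helper names are assumptions, not part of git):

```python
import subprocess

def parse_count_objects(output):
    """Turn `git count-objects -v` output ("key: value" lines)
    into a dict mapping each key to its integer value."""
    stats = {}
    for line in output.splitlines():
        key, _, value = line.partition(": ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats

def repo_disk_usage_kib():
    """Loose objects plus packed objects, in KiB."""
    out = subprocess.run(["git", "count-objects", "-v"],
                         capture_output=True, text=True, check=True).stdout
    stats = parse_count_objects(out)
    return stats.get("size", 0) + stats.get("size-pack", 0)
```

Comparing repo_disk_usage_kib() before and after the gc run gives a cleaner number than du -sh .git, which also counts refs, hooks, and other metadata.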