Getting started with Git Python


My goal is to access existing Git repositories from Python. I want to get the repository history and on-demand diffs.

To do that, I started with dulwich and tried:

from dulwich.repo import Repo
Repo.init('/home/umpirsky/Projects/my-exising-git-repo')

and got OSError: [Errno 17] File exists: '/home/umpirsky/Projects/my-exising-git-repo/.git'

The docs say: "You can open an existing repository or you can create a new one."

Any idea how to open an existing repository? Can I fetch history and diffs with dulwich? Can you recommend any other library for Git access? I am developing an Ubuntu app, so an Ubuntu package would be appreciated for easier deployment.

I will also check periodically for new changes in the repository, so I would rather work with the remote, so that I can detect changes that have not been pulled locally yet. I'm not sure how this should work, so any help will be appreciated.

Thanks in advance.


Accepted answer:

Most of Dulwich's documentation assumes a fair bit of knowledge of the Git file formats and protocols.

You should be able to open an existing repository with Repo:

from dulwich.repo import Repo
x = Repo("/path/to/git/repo")

or create a new one:

x = Repo.init("/path/to/new/repo")
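The error in the question comes from Repo.init, which tries to create a .git directory and fails with errno 17 (File exists) when one is already there. A minimal sketch that opens the repository when it already exists and only initialises it otherwise could look like this. The helper names looks_like_git_repo and open_or_init are hypothetical, the check only recognises non-bare repositories (a .git subdirectory), and dulwich is imported lazily inside open_or_init so the path check itself needs only the standard library.

```python
import os


def looks_like_git_repo(path: str) -> bool:
    """Return True if `path` contains a .git directory (non-bare repositories only)."""
    return os.path.isdir(os.path.join(path, ".git"))


def open_or_init(path: str):
    """Open the repository at `path`, or create one there if none exists.

    Hypothetical helper; requires dulwich to be installed.
    """
    # Imported here so looks_like_git_repo works without dulwich installed.
    from dulwich.repo import Repo

    if looks_like_git_repo(path):
        # Existing repository: just open it, nothing is created on disk.
        return Repo(path)
    # Repo.init expects the directory to exist; it then creates path/.git.
    os.makedirs(path, exist_ok=True)
    return Repo.init(path)
```

With this, calling open_or_init on the path from the question returns the existing repository instead of raising the "File exists" error.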

To get the diff for a particular commit (that is, the diff against its first parent):

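import sys
from dulwich.patch import write_tree_diff
commit = x[commit_id]  # commit_id is the commit's hex SHA, as bytes
parent_commit = x[commit.parents[0]]  # first parent (fails for a root commit)
# write_tree_diff writes bytes, so use the binary stdout buffer on Python 3
write_tree_diff(sys.stdout.buffer, x.object_store, parent_commit.tree, commit.tree)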

The Git protocol only allows fetching and sending packs; it doesn't allow direct access to specific objects in the remote database. This means that to inspect a remote repository, you first have to fetch the relevant commits from it into a local repository and then view them there:

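from dulwich.client import get_transport_and_path
client, path = get_transport_and_path(remote_url)
result = client.fetch(path, x)  # copies the missing objects into x's object store
remote_refs = result.refs  # newer dulwich returns a FetchPackResult; ref names are bytes
print(x[remote_refs[b"refs/heads/master"]])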
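If the goal is only to notice that the remote has moved (the periodic check from the question), comparing the remote's ref SHAs with those seen on the previous check may be enough, and objects only need to be fetched when something changed. The sketch below swaps in the git command-line tool (git ls-remote via subprocess) instead of a dulwich call, and assumes git is installed and on PATH; parse_ls_remote, remote_refs and changed_refs are hypothetical helper names.

```python
import subprocess


def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote` output ("<sha>\\t<refname>" per line) into {refname: sha}."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, refname = line.split("\t", 1)
        refs[refname] = sha
    return refs


def remote_refs(remote_url: str) -> dict:
    """List the remote's current refs without downloading any objects (needs git on PATH)."""
    out = subprocess.run(
        ["git", "ls-remote", remote_url],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)


def changed_refs(old: dict, new: dict) -> dict:
    """Refs that are new or point at a different commit than last time.

    Refs deleted on the remote are not reported by this sketch.
    """
    return {name: sha for name, sha in new.items() if old.get(name) != sha}
```

In a polling loop you would keep the dict from the previous remote_refs(url) call and run the fetch shown above only when changed_refs returns something non-empty.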
Another answer:

I think the init method is used to create a new repository; to open an existing one, you just pass its path to Repo:

from dulwich.repo import Repo
repo = Repo("/path/to/existing/repo")

For a summary of alternative libraries, please have a look at this answer. Basically, it suggests using the subprocess module to call the git command-line tool, because that lets you use the interface you already know.
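A minimal sketch of the subprocess approach for the two things the question asks for, history and on-demand diffs, assuming the git command-line tool is installed and on PATH. The LOG_FORMAT string, its separators and the helper names (git, parse_log, history, commit_diff) are choices made for this example, not part of any library.

```python
import subprocess

# One record per commit: fields separated by NUL (%x00), each record ended by
# 0x1E (%x1e), so subjects containing spaces or tabs cannot break the parsing.
LOG_FORMAT = "%H%x00%an%x00%at%x00%s%x1e"


def git(repo_path: str, *args: str) -> str:
    """Run a git command inside repo_path and return its stdout as text."""
    return subprocess.run(
        ["git", "-C", repo_path, *args],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_log(output: str) -> list:
    """Turn `git log --format=LOG_FORMAT` output into a list of dicts."""
    commits = []
    for record in output.split("\x1e"):
        record = record.strip("\n")  # git adds a newline after each record
        if not record:
            continue
        sha, author, timestamp, subject = record.split("\x00", 3)
        commits.append({"sha": sha, "author": author,
                        "time": int(timestamp), "subject": subject})
    return commits


def history(repo_path: str, max_count: int = 50) -> list:
    """Newest-first history of the currently checked-out branch."""
    return parse_log(git(repo_path, "log", f"--max-count={max_count}",
                         f"--format={LOG_FORMAT}"))


def commit_diff(repo_path: str, sha: str) -> str:
    """Patch introduced by a commit, as printed by `git show` without its header.

    Root commits work too; merge commits get git's combined diff.
    """
    return git(repo_path, "show", "--format=", sha)
```

For example, looping over history("/path/to/git/repo") and printing each entry's "sha" and "subject" gives a one-line log, and commit_diff fetches a diff only when one is requested.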