I'm using Git version 2.28.0.windows.1 within a Cygwin shell on Windows 10. After cloning my repository, I see this:

$ cat .gitattributes 
* text=auto
*.sh text eol=lf

I set this up thinking that it would correct bad line endings (I want to stop "\r" line endings from being included automatically). However, after I do my clone

git clone https://github.com/chicommons/maps.git
cd maps

I can still see the line endings I don't want ...

$ grep '\r' web/entrypoint.sh
python manage.py migrate
python manage.py migrate directory
python manage.py docker_init_db_data

What can I do with my ".gitattributes" (or possibly another file?) to prevent these line endings from appearing?

There are 2 best solutions below

The eol=lf directive will prevent Git from adding carriage returns, but it won't prevent Git from keeping existing carriage returns.

Truly understanding what's going on here requires a little bit of knowledge about how Git stores files inside commits. The keys to this are:

  • Each commit stores a full snapshot of every file, in a read-only, compressed, Git-only, and de-duplicated format. This means the files that you actually see and work on / with are not in the repository: the files you use are in your working tree or work-tree.

  • All parts of any commit, including all of its files, are literally unchangeable. If you take a Git internal object (including a commit) out of the repository, modify it somehow, and put it back, you haven't changed the original; instead, you have just added another one, and this new one gets a different hash ID.

  • To get files from a commit into your work-tree, Git must copy them out. That much is obvious; what's not obvious is that there's a third "copy" of each file. This third—or middle, really—copy lives in what Git calls the index, or the staging area, or—rarely these days—the cache. All three names refer to the same entity.
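The "different content gets a different hash ID" point can be checked without touching any repository. This is only a small sketch: the sample line is borrowed from the question, and the variable names are made up for illustration. git hash-object --stdin just reports the ID Git would assign to those bytes.

```shell
# Hash three byte sequences the way Git would name them as blobs.
# Nothing is written to any repository here.
lf_id=$(printf 'python manage.py migrate\n' | git hash-object --stdin)
crlf_id=$(printf 'python manage.py migrate\r\n' | git hash-object --stdin)
again_id=$(printf 'python manage.py migrate\n' | git hash-object --stdin)

echo "LF content:   $lf_id"
echo "CRLF content: $crlf_id"    # one extra \r byte, so a different ID
echo "LF again:     $again_id"   # identical bytes, identical ID (this is what allows de-duplication)
```

The same bytes always map to the same ID, which is why a "modified" object is really a new object.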

That is, suppose HEAD is attached to the branch name master and master currently represents a commit whose hash ID is a123456.... In other words, this commit (with its big ugly hash ID) is your current commit. Inside this commit, we have files named README.md and main.py and web/migrate.sh (standing in here for your web/entrypoint.sh). There are three "copies" of each of these files. "Copies" here is in quotes because two of them are in the automatically-de-duplicated format, so those two actually share a single underlying copy.

We can illustrate these three copies in a table, using the special name HEAD to refer to commit a123456... (the current commit):

    HEAD              index           work-tree
--------------    --------------    --------------
README.md         README.md         README.md
main.py           main.py           main.py
web/migrate.sh    web/migrate.sh    web/migrate.sh
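If you want to look at the three columns yourself, something like the following should do it. It is a sketch in a throwaway repository: the temporary directory, the file contents, and the demo identity passed to git commit are all placeholders, not anything from your project.

```shell
# Build a throwaway repository (path and contents are illustrative only).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf '# demo\n' > README.md
printf 'print("hi")\n' > main.py
git add README.md main.py
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'initial'

head_files=$(git ls-tree -r --name-only HEAD)   # column 1: files in the current commit
index_files=$(git ls-files)                     # column 2: files in the index
tree_files=$(ls)                                # column 3: ordinary files in the work-tree

printf 'HEAD:\n%s\n\nindex:\n%s\n\nwork-tree:\n%s\n' "$head_files" "$index_files" "$tree_files"
```

Right after a commit (or a fresh clone) the first two lists match, and the work-tree holds the expanded versions of the same files.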

Where did these files come from? Well, when you first clone the repository, your Git gets all the commits from some other Git. Those commits are exactly the same in every Git, and have the same hash IDs across every Git. Your Git then copies one of those commits—the one you're checking out—to your Git's index, and copies the files from its index to your work-tree. So that's where you got three copies of each file.

The work-tree files are ordinary everyday files, which you can read and write with any program on your computer. The other files are not. When (or after) you have done some work on the work-tree copy of one of your files, you run git add on it. The reason for this is that git add tells Git: make the index copy match the work-tree copy. So if you have changed main.py and run git add main.py, for instance, the version of main.py in the index is now different from the version of main.py in the current commit:

      HEAD                 index             work-tree
-----------------    -----------------    --------------
README.md(1)         README.md(1)         README.md
main.py(1)           main.py(2)           main.py
web/migrate.sh(1)    web/migrate.sh(1)    web/migrate.sh

The copy that's in a commit is literally unchangeable, so HEAD (which is short for commit a123456... at the moment) is always going to contain these three versions of the files. But the index, while it uses the internal format, is not a commit[1] and is not read-only. So git add can replace the index copy.

(Running git commit takes whatever is in the index and uses that to make a new commit. The new commit then becomes the current commit, so that the name HEAD, and the current branch name, now refer to the new commit, instead of commit a123456.... But we don't need to go that far yet.)
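Here is a rough way to watch the index copy change while the commit stays put. Again this is a sketch in a scratch repository with made-up contents and a demo identity. git rev-parse :main.py prints the blob ID of the index copy, and git rev-parse HEAD:main.py prints the blob ID of the copy in the current commit.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf 'print("v1")\n' > main.py
git add main.py
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'version 1'

head_before=$(git rev-parse HEAD:main.py)      # blob stored in the current commit
printf 'print("v2")\n' > main.py               # edit the work-tree copy only
index_unstaged=$(git rev-parse :main.py)       # index copy still matches the commit
git add main.py                                # make the index copy match the work-tree copy
index_staged=$(git rev-parse :main.py)         # index now refers to a new blob
head_after_add=$(git rev-parse HEAD:main.py)   # the commit itself has not changed

git -c user.name=demo -c user.email=demo@example.com commit -q -m 'version 2'
head_after_commit=$(git rev-parse HEAD:main.py)   # HEAD now names the new commit

echo "commit copy before:   $head_before"
echo "index after git add:  $index_staged"
echo "commit copy after:    $head_after_commit"
```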


[1] What it is is a bit complicated, but to a first approximation, you can think of the index as holding your proposed next commit. Every time you check out some commit, Git must set up the index to be ready for the next commit: normally, by filling it in from the commit you just checked out.


Copying from or to the index is when Git adjusts line endings

A copy of a file in Git's index is in the compressed, Git-only, de-duplicated format. A copy of a file in your work-tree is in ordinary everyday computer format. So any time Git copies from Git's index to your work-tree, it has to expand the file; and any time Git copies from your work-tree to its index, it has to compress and de-duplicate the file.

This copying process is the ideal time to make any changes you want to the file. So this is where .gitattributes and line-ending stuff comes into play. Suppose the file in the index, which got there by being in the repository, has newline-terminated lines, with \n only. Suppose though that you'd like your work-tree copy of the file to have \r\n or CRLF line endings.

If Git turns \n into \r\n on the way out of the index, and turns \r\n into \n on the way into the index, this accomplishes your goal. That's what * text eol=crlf will do.
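A small experiment along these lines shows the eol=crlf round trip. It is a sketch, assuming a scratch repository with core.autocrlf turned off so that only .gitattributes is in play. The file name notes.txt and the demo identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config core.autocrlf false               # keep the demo independent of any global setting
printf 'line one\nline two\n' > notes.txt    # LF-only content
printf '* text eol=crlf\n' > .gitattributes
git add .gitattributes notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'LF in the repository'

rm notes.txt
git checkout -- notes.txt                    # copy out of the index: \n becomes \r\n

cr=$(printf '\r')                            # a literal carriage-return byte
worktree_cr_lines=$(grep -c "$cr" notes.txt)                   # lines containing \r in the work-tree file
index_cr_lines=$(git cat-file -p :notes.txt | grep -c "$cr")   # lines containing \r in the index copy
echo "work-tree lines with CR: $worktree_cr_lines"
echo "index lines with CR:     $index_cr_lines"
```

The work-tree file picks up carriage returns while the index copy stays LF-only.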

But what if you don't want that? What if you want \n endings to remain \n endings? That's what * text eol=lf will do. How do \n endings remain \n endings? By not making any changes on the way out of the index.

So on checkout, * text eol=lf means do not make changes. (On the way into the index, the text attribute still turns \r\n into \n.) But what if the file that's inside the repository, which is therefore copied into the index, has \r\n (CRLF) line endings? Well, then, so does your work-tree file.
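The same kind of experiment reproduces what you are seeing. This is a sketch with a scratch repository, a one-line stand-in for entrypoint.sh, and a demo identity, all of them assumptions for illustration. The CRLF file is committed first, and the eol=lf attribute is only added afterwards.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config core.autocrlf false
printf 'python manage.py migrate\r\n' > entrypoint.sh   # CRLF content, committed as-is
git add entrypoint.sh
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'CRLF slipped into the repository'

printf '*.sh text eol=lf\n' > .gitattributes   # attribute added after the bad commit exists
rm entrypoint.sh
git checkout -- entrypoint.sh                  # copy out of the index under eol=lf

cr=$(printf '\r')
cr_lines=$(grep -c "$cr" entrypoint.sh)        # the \r is still there after checkout
echo "lines still containing CR: $cr_lines"
```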

To make some files in the repository have \n-only line endings, you will need to:

  1. remove the \r from the work-tree copies;
  2. git add the resulting files; and
  3. git commit to make a new commit.

This new commit can then be distributed out to all other copies of this repository, and used in place of the existing (bad) commit that has \r\n (CRLF) endings for those files.
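The three steps can be sketched as follows. This runs in a scratch repository with a one-line stand-in file and a demo identity, none of which come from your project. Note that tr -d '\r' removes every carriage return in the file, not only those at line ends, which is normally what you want for a shell script.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config core.autocrlf false
printf 'python manage.py migrate\r\n' > entrypoint.sh
git add entrypoint.sh
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'bad commit: CRLF endings'
bad_commit=$(git rev-parse HEAD)

# 1. remove the \r characters from the work-tree copy
tr -d '\r' < entrypoint.sh > entrypoint.sh.tmp && mv entrypoint.sh.tmp entrypoint.sh
# 2. stage the cleaned file
git add entrypoint.sh
# 3. record the fix as a new commit
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'fix: use LF endings'

cr=$(printf '\r')
fixed_cr_lines=$(git cat-file -p HEAD:entrypoint.sh | grep -c "$cr")
bad_cr_lines=$(git cat-file -p "$bad_commit":entrypoint.sh | grep -c "$cr")
echo "CR lines in new commit: $fixed_cr_lines"
echo "CR lines in old commit: $bad_cr_lines"   # the old commit is untouched
```

On Git 2.16 or later, git add --renormalize . may be able to stand in for steps 1 and 2 once the text attribute is in place. I have not shown that here, so check the git add documentation before relying on it.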

Note that the bad commit will continue to exist: that's what revision control is all about. We don't eliminate the bad commits, because everyone else has them too, and we want to remember that they were using a bad one.

Now, if no one else has a copy of this repository, or has the bad commit, then we're in a special-case situation. In this case we can drop the bad commit in favor of a new and improved commit. (Precisely how to do this in Git is a topic for another answer.) But in general, we just add a fix, and keep the original.

0
On

grep doesn't do the C-language escapes in its search patterns; it has a very restricted set. With '\r' you're finding the literal letter r in those lines, not carriage returns.

Try grep $'\r' web/entrypoint.sh to see what's up, or grep --color=always '\r' web/entrypoint.sh to see which characters your original pattern is actually matching.
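If your shell lacks the $'...' quoting, a portable variant is to build the carriage return with printf first. This is a sketch on a made-up two-line sample file, not your real entrypoint.sh: the first line ends in CRLF and the second in plain LF.

```shell
sample=$(mktemp)
printf 'python manage.py migrate\r\npython manage.py migrate directory\n' > "$sample"

cr=$(printf '\r')                     # a real carriage-return byte; works in plain sh
cr_lines=$(grep -c "$cr" "$sample")   # counts only lines that truly contain \r
r_lines=$(grep -c 'r' "$sample")      # counts lines containing the letter r
echo "lines with a carriage return: $cr_lines"
echo "lines with the letter r:      $r_lines"
```

Both lines contain the letter r, but only one contains a carriage return, which is the difference the original grep '\r' was hiding.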

Full disclosure: it's been so long since this tripped me up that I forgot it too. I had to beat on this a bit before the bulb lit up.