How are Short File Names generated in Windows?


I am currently using the following P/Invoke signature to get the short filename of a regular Windows file:

[DllImport("kernel32.dll", CharSet = CharSet.Auto)]
public static extern int GetShortPathName([MarshalAs(UnmanagedType.LPTStr)] string path,
                                          [MarshalAs(UnmanagedType.LPTStr)] StringBuilder shortPath,
                                          int shortPathLength);

It currently works without any problems, but I noticed something rather peculiar.
I know that Windows uses the following short filename convention:

Cut the name to 6 characters (without extension)
Append the tilde (~)
Append an unsigned integer number which indicates the match index (starting with 1)
Append the original file extension

Thus, the file name C:\abcdefghijklmn.txt should be accessible under the short name C:\abcdef~1.txt. (This works perfectly fine.)

Now the strange part: I recently performed a small search inside my music directory for specific audio files. This was the result:

.\Rammstein & Tatu - Moscow.mp3
.\Rammstein - Asche zu Asche.mp3
.\Rammstein - Der Meister.mp3
.\Rammstein - Du Hast.mp3
.\Rammstein - Eifersucht.mp3
.\Rammstein - Feuer Frei.mp3
.\Rammstein - Führe Mich.mp3
.\Rammstein - Haifisch.mp3
...

And the same search in short notation:

.\RA8E17~1.MP3
.\RA23A6~1.MP3
.\RAMMST~1.MP3
.\RA0CAE~1.MP3
.\RAMMST~2.MP3
.\RAMMST~3.MP3
.\RAMMST~4.MP3
.\RA6BAA~1.MP3
...

My question is: Why is Windows generating such "random" prefixes before the tilde (like RA23A6 or RA0CAE)?


There are 2 best solutions below

Best answer

Microsoft does not document this, but Wikipedia does:

8.3 filename:

Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:

1. If the LFN is 8.3 uppercase, no LFN will be stored on disk at all.

  • Example: TEXTFILE.TXT

2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.

  • Example: TextFile.Txt becomes TEXTFILE.TXT.

3. If the filename contains characters not allowed in an 8.3 name (including space which was disallowed by convention though not by the APIs) or either part is too long, the name is stripped of invalid characters such as spaces and extra periods. Other characters such as + are changed to the underscore _, and uppercased. The stripped name is then truncated to the first 6 letters of its basename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile1.Mine.txt becomes TEXTFI~1.TXT (or TEXTFI~2.TXT, should TEXTFI~1.TXT already exist). ver +1.2.text becomes VER_12~1.TEX.

4. Beginning with Windows 2000, if at least 4 files or folders already exist with the same initial 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.

  • Example: TextFile.Mine.txt becomes TE021F~1.TXT.

As Joey mentioned, the undocumented hash of the filename has been reverse engineered.
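To make rules 1 to 3 of the quoted convention concrete, here is a minimal C# sketch of the cleaning and truncation step only. It is an illustration of the description above, not Windows' actual implementation: the class and method names are made up, the set of characters mapped to an underscore is an assumption (the quoted text only names +), and non-ASCII characters get no special treatment.

```csharp
using System;
using System.Text;

static class ShortNameSketch
{
    // Characters replaced by '_' in this sketch. Only '+' comes from the
    // quoted text; the rest of this set is an assumption.
    const string ReplacedChars = "+,;=[]";

    // Drops spaces and periods, maps the characters above to '_', uppercases the rest.
    static string Clean(string part)
    {
        var sb = new StringBuilder();
        foreach (char c in part)
        {
            if (c == ' ' || c == '.') continue;
            sb.Append(ReplacedChars.IndexOf(c) >= 0 ? '_' : char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }

    // Builds the short name for a long name, ignoring collisions.
    // index is the digit placed after the tilde.
    public static string MakeShortName(string longName, int index = 1)
    {
        // Only the last period separates the basename from the extension.
        int dot = longName.LastIndexOf('.');
        string stem = dot > 0 ? longName.Substring(0, dot) : longName;
        string ext = dot > 0 ? longName.Substring(dot + 1) : "";

        string cleanStem = Clean(stem);
        string cleanExt = Clean(ext);

        // Rules 1 and 2: a name that already fits 8.3 is only uppercased.
        bool alreadyFits = cleanStem.Length > 0 && cleanStem.Length <= 8
            && cleanExt.Length <= 3
            && cleanStem == stem.ToUpperInvariant()
            && cleanExt == ext.ToUpperInvariant();

        if (!alreadyFits)
        {
            // Rule 3: six characters, a tilde, a digit, three extension characters.
            if (cleanStem.Length > 6) cleanStem = cleanStem.Substring(0, 6);
            cleanStem += "~" + index;
            if (cleanExt.Length > 3) cleanExt = cleanExt.Substring(0, 3);
        }
        return cleanExt.Length > 0 ? cleanStem + "." + cleanExt : cleanStem;
    }
}
```

With this sketch, ShortNameSketch.MakeShortName("TextFile1.Mine.txt") gives TEXTFI~1.TXT and ShortNameSketch.MakeShortName("ver +1.2.text") gives VER_12~1.TEX, matching the examples in the quote. Rule 4 (the hash prefix) is deliberately left out here.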

Second answer

That's because the very primitive scheme of using a counter and a prefix only works up to a certain number of files. With increasing numbers of files Windows switches to a shorter prefix and a hash. Someone actually reverse-engineered the hash along with a bit of explanation:

In case you aren’t aware of how 8.3 file names work, here’s a quick run-down.

  • All periods other than the one separating the filename from the extension are dropped - a.testing.file.bat turns into atestingfile.bat.
  • Certain special characters like + are turned into underscores, and others are dropped. The file name is upper-cased. 1+2+3 Hello World.exe turns into 1_2_3HELLOWORLD.EXE.
  • The file extension is truncated to 3 characters, and (if longer than 8 characters) the file name is truncated to 6 characters followed by ~1. SomeStuff.aspx turns into SOMEST~1.ASP.
  • If these would cause a collision, ~2 is used instead, followed by ~3 and ~4.
  • Instead of going to ~5, the file name is truncated down to 2 characters, with the next four characters replaced by a hexadecimal checksum of the long filename - SomeStuff.aspx turns into SOBC84~1.ASP, where BC84 is the result of the (previously-)undocumented checksum function.
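The switch from the counter to the checksum prefix described in the last two bullets can be sketched in C# as follows. This is only a rough model under stated assumptions: the names are hypothetical, the input stem and extension are expected to be already cleaned and uppercased, and the checksum is passed in as a delegate because the real (reverse-engineered) function is not reproduced here.

```csharp
using System;
using System.Collections.Generic;

static class ShortNameCollisionSketch
{
    // Picks a free short name for an already cleaned, uppercased stem and extension.
    // existing: short names already present in the directory.
    // checksum: hypothetical stand-in for the undocumented hash of the long name.
    public static string Pick(string longName, string cleanStem, string cleanExt,
                              ISet<string> existing, Func<string, ushort> checksum)
    {
        string ext = cleanExt.Length > 3 ? cleanExt.Substring(0, 3) : cleanExt;
        string suffix = ext.Length > 0 ? "." + ext : "";

        // First try the six-character prefix with ~1 through ~4.
        string six = cleanStem.Length > 6 ? cleanStem.Substring(0, 6) : cleanStem;
        for (int n = 1; n <= 4; n++)
        {
            string candidate = six + "~" + n + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }

        // Then fall back to two characters plus four hex digits of the checksum.
        string two = cleanStem.Length > 2 ? cleanStem.Substring(0, 2) : cleanStem;
        string hex = checksum(longName).ToString("X4");
        for (int n = 1; n <= 9; n++)
        {
            string candidate = two + hex + "~" + n + suffix;
            if (!existing.Contains(candidate)) return candidate;
        }
        throw new InvalidOperationException("No free short name found in this sketch.");
    }
}
```

If RAMMST~1.MP3 through RAMMST~4.MP3 are already taken, a further "Rammstein - ..." file falls through to the second loop and gets a name of the form RAxxxx~1.MP3, which is the pattern seen in the question. Which files end up with the RAMMST~N names then depends on the order in which they were created, not on alphabetical order.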