When does the kernel decide to have multiple vnode entries for a single file?


I found this code here and have several follow-up questions.

#include <fcntl.h>
#include <unistd.h>

int main()
{
  // have the kernel open two connections to the file alphabet.txt, which contains the letters a to z
  int fd1 = open("alphabet.txt",O_RDONLY);
  int fd2 = open("alphabet.txt",O_RDONLY);
  if (fd1 < 0 || fd2 < 0) return 1; // stop if either open() failed

  // read a char & write it to stdout alternately from connections fd1 & fd2
  while(1)
  {
    char c;
    if (read(fd1,&c,1) != 1) break;
    write(1,&c,1);
    if (read(fd2,&c,1) != 1) break;
    write(1,&c,1);
  }

  return 0;
}
  1. If I open the same file (under the same directory) independently in two processes, will it create two vnode entries? Or does it depend on the OS?
  // PID 1
  int fd1 = open("alphabet.txt",O_RDONLY);
  // PID 2
  int fd2 = open("alphabet.txt",O_RDONLY);
  2. How does read() know where the file is? Does C first ask for the vnode entry of fd1 and return the address of it?

  3. Following 2., does read() create a new vnode entry or does it use the same one?


There are 2 best solutions below

Kozydot On
  1. Opening the same file in two processes does not create two vnode entries. The vnode is a data structure in the kernel that represents a file. If you open the same file twice, either within the same process or in two different processes, you get two different file descriptors. These file descriptors are independent of each other, but they both point to the same vnode.
  2. The read() function finds the file through the file descriptor. When you call open(), the operating system returns a file descriptor, which is a small integer. This integer is an index into the calling process's file descriptor table, which the kernel maintains. Each entry in that table leads to the kernel's record of the open file, which in turn references the file's vnode. So when read() is called with a file descriptor, the kernel follows that chain to find the vnode and thus the file's data.
  3. The read() function does not create a new vnode entry. The vnode is created when the file is first opened, and subsequent calls to open() on the same file will return a new file descriptor pointing to the same vnode. The read() function only reads data from the file; it does not create or replace the vnode or any file descriptor table entry.
Luis Colorado On

When you open(2) a file, the kernel creates a new file table entry, which includes a file pointer to know where the next read(2)/write(2) will go.

When you dup(2) a file descriptor, the second descriptor points to the same file table entry, so both share the same file pointer, and reads/writes through either one advance that single pointer.

This is not only a Linux characteristic; it is part of POSIX and belongs to the original UNIX design. When a process fork(2)s, the descriptors are dup(2)ed, so parent and child share the same file table entries and file pointers.

The per-process file descriptor table holds only pointers to file entries, while each file entry holds the file open flags, the file pointer and a reference to the inode (which is unique in the kernel per file).

As a result, two processes that open the same file independently have different file pointers, while a parent process that passes an open file to a child process actually shares the file pointer with it, because the file descriptor has been dup()ed instead.

When I was first studying the system, I learned about a system call, whose name I no longer remember, that is similar to dup(2) but actually reopens the file descriptor and so obtains a different file pointer. The idea is to get a completely independent open file (as if I had opened it myself) using the descriptor as input rather than the filename, which, because of redirections, could be unknown to the process. I thought it was something like int reopen(int fd, int open_flags); but I am not sure of the name. If somebody knows about it, please drop me a comment.