Does sleep/nanosleep work by utilizing a busy wait scheme?

I am wondering how sleep/nanosleep is implemented internally. Consider this code:

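#include <unistd.h>

/* Thread function, running on a thread other than the main() thread. */
void *worker(void *arg)
{
  (void)arg;
  while (1)
  {
    /* do something */
    sleep(1);
  }
  return NULL;
}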

Would the CPU be constantly context switching to check whether the 1-second sleep is done (i.e. an internal busy wait)?

I doubt it works this way, since it would be far too inefficient. But then how does it work?

Same question applies to nanosleep.

Note: If this is implementation- or OS-specific, then how can I implement a more efficient scheme that doesn't lead to constant context switching?

Best answer

The exact implementation is not guaranteed, but you can expect some properties.

Usually sleep(3) is quite inaccurate, and as the Linux sleep(3) man page notes, on some systems it may even be implemented using alarm(2) and SIGALRM (signals). So it is definitely not designed for performance. It definitely does not use spin locks either, so it cannot be CPU intensive.

nanosleep is quite a different animal, which could even be implemented using spinlocks. More importantly, at least on Linux the nanosleep man page is in section 2, which means it is a system call, so at the very least it involves a switch to kernel mode. Do you really need its high resolution?

UPDATE

Having seen your comment, I recommend using select(), as in this example from the select man page:

   #include <stdio.h>
   #include <stdlib.h>
   #include <sys/select.h>
   #include <sys/time.h>
   #include <sys/types.h>
   #include <unistd.h>

   int
   main(void)
   {
       fd_set rfds;
       struct timeval tv;
       int retval;

       /* Watch stdin (fd 0) to see when it has input. */
       FD_ZERO(&rfds);
       FD_SET(0, &rfds);

       /* Wait up to five seconds. */
       tv.tv_sec = 5;
       tv.tv_usec = 0;

       retval = select(1, &rfds, NULL, NULL, &tv);
       /* Don't rely on the value of tv now! */

       if (retval == -1)
           perror("select()");
       else if (retval)
           printf("Data is available now.\n");
           /* FD_ISSET(0, &rfds) will be true. */
       else
           printf("No data within five seconds.\n");

       exit(EXIT_SUCCESS);
   }

It is a proven mechanism if you need a thread to sleep until some event occurs and that event can be linked to a file descriptor.

Another answer

The POSIX specifications of sleep and nanosleep say:

The sleep() function shall cause the calling thread to be suspended from execution until either the number of realtime seconds specified by the argument seconds has elapsed or a signal is delivered to the calling thread and its action is to invoke a signal-catching function or to terminate the process. The suspension time may be longer than requested due to the scheduling of other activity by the system.

(Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/sleep.html.)

and

The nanosleep() function shall cause the current thread to be suspended from execution until either the time interval specified by the rqtp argument has elapsed or a signal is delivered to the calling thread, and its action is to invoke a signal-catching function or to terminate the process. The suspension time may be longer than requested because the argument value is rounded up to an integer multiple of the sleep resolution or because of the scheduling of other activity by the system. But, except for the case of being interrupted by a signal, the suspension time shall not be less than the time specified by rqtp, as measured by the system clock CLOCK_REALTIME.

(Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/nanosleep.html.)

I read that as saying that a POSIX-compliant system cannot use a busy loop for sleep or nanosleep: the calling thread needs to be suspended from execution.

Another answer

The typical way to implement sleep() and nanosleep() is to convert the argument into whatever scale the OS's scheduler uses (while rounding up) and add the current time to it to form an "absolute wake up time"; then tell the scheduler not to give the thread CPU time until after that "absolute wake up time" has been reached. No busy waiting is involved.

Note that the scale the OS's scheduler uses typically depends on what hardware is available and/or being used for timekeeping. It can be finer than a nanosecond (e.g. the local APIC on 80x86 being used in "TSC deadline mode") or as coarse as 100 ms.

Also note that the OS guarantees that the delay won't be less than what you ask for, but there's typically no guarantee that it won't be longer, and in some cases (e.g. a low-priority thread on a heavily loaded system) the delay can be much larger than requested. For example, if you ask to sleep for 123 nanoseconds, you might sleep for 2 ms before the scheduler decides it can give you CPU time, and then it might be another 500 ms before the scheduler actually does give you CPU time (e.g. because other threads are using the CPU).

Some OSs may try to reduce this "slept much longer than requested" problem, and some OSs (e.g. those designed for hard real-time use) may provide some sort of guarantee (with restrictions, e.g. subject to thread priority) bounding the time between delay expiry and getting the CPU back. To do this, the kernel would convert the argument into the scheduler's scale while rounding down rather than up, and may subtract a tiny amount "just in case", so that the scheduler wakes the thread up just before the requested delay expires (not after). Then, when the thread is given CPU time (after the cost of the context switch to the thread, and possibly after pre-fetching various cache lines the thread is guaranteed to use), the kernel would busy wait briefly until the delay has actually expired. This allows the kernel to pass control back to the thread extremely close to delay expiry.

For example, if you ask to sleep for 123 nanoseconds, the scheduler might not give you CPU time for 100 nanoseconds, then it might spend 10 nanoseconds switching to your thread, and then it might busy wait for the remaining 13 nanoseconds. Even in this case (where busy waiting is done) it normally won't busy wait for the full duration of the delay. However, if the delay is extremely short, the kernel might do only the final busy wait.

Finally, there is a special case that may be worth mentioning. On POSIX systems, sleep(0); is often abused as a yield(). I'm not too sure how legitimate this practice is; it's impossible for a scheduler to support something like yield() unless that scheduler is willing to waste CPU time doing unimportant work while more important work waits.

Another answer

"I am wondering how is sleep/nanosleep internally implemented?"

There is no single implementation; each OS and each POSIX-compliant implementation of sleep() and nanosleep() is free to implement this feature however it likes.

So asking how it's actually done is pretty pointless without the context of a particular OS or POSIX library implementation.