I am working on a Core2Duo, 2.20 GHz system running Ubuntu 12.04 with kernel 3.18.26, and I have made some changes to the Linux kernel source code.
To log every context switch (which process gets scheduled in and which gets scheduled out), I modified the kernel: in kernel/sched/core.c I added the following print statement inside the context_switch function.
trace_printk("**$$,context_switch,%d,%llu,%llu,%d,%llu,%llu\n",
             (int)(prev->pid),
             prev->se.vruntime,
             prev->se.sum_exec_runtime,
             (int)(next->pid),
             next->se.vruntime,
             next->se.sum_exec_runtime);
I am running two different processes, P1 (which has 100 threads: T0, T1, ..., T99) and P2, on the same CPU core. P2 has already run for a long time, so its vruntime is high.
Inside P1, the 100 threads are created first; all threads except T0 then block, waiting on a semaphore.
- T0 performs some work, sets a timer with a 2000 ns duration, and voluntarily releases the CPU. As no other thread is runnable, P2 gets scheduled.
- After 2000 ns the timer expires and its handler wakes the next thread, T1, which preempts P2 immediately.
- T1 performs some work, sets a timer with a 2000 ns duration, and voluntarily releases the CPU. As no other thread is runnable, P2 gets scheduled.
- After 2000 ns the timer expires and wakes the next thread, T2, which preempts P2 immediately.
This repeats, so threads T0, T1, ..., T99 execute in round-robin fashion and the execution sequence looks like:
T0-P2-T1-P2-T2-P2-T3-...-T99-P2-T0-P2-...
My experimental results show:
- with a timer interval of 1800 ns, P2 gets on average 1450 ns per slot;
- with a timer interval of 2000 ns, P2 gets on average 1600 ns;
- with a timer interval of 2500 ns, P2 gets on average 2050 ns;
- with a timer interval of 3000 ns, P2 gets on average 2600 ns.
The gap is consistently 350-450 ns, so I conclude that on my Core2Duo system a context switch takes around 350-450 ns. Am I right to say that?
Another observation: when I set the timer interval to 1600 ns or 1700 ns, P2 does not get scheduled between two threads at all, even though the CPU is free. That means the CPU stays idle for roughly 1200-1300 ns while P2 sits in the run queue, ready to run. Why is this happening?
Here is a snippet of my code:
// Program - P2
int main(int argc, char *argv[])
{
    cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(1, &my_set);
    sched_setaffinity(0, sizeof(cpu_set_t), &my_set);  /* pin to CPU 1 */
    while (1) {
        // does some task
    }
}
// Program - P1
// timer handler awakening the next thread
static void handler(int sig, siginfo_t *si, void *uc)
{
    thread_no++;
    /* sem_post() is async-signal-safe; printf() is not, but is kept for debugging */
    ret = sem_post(&sem[thread_no % NUM_THREADS]);
    if (ret)
    {
        printf("Error in Sem Post\n");
    }
}
void *threadA(void *data_)
{
    int turn = (intptr_t)data_;
    cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(1, &my_set);
    sched_setaffinity(0, sizeof(cpu_set_t), &my_set);  /* pin to CPU 1 */
    while (1)
    {
        ret = sem_wait(&sem[turn]);
        if (ret)
        {
            printf("Error in Sem Wait\n");
        }
        // does some work here
        /* arm a one-shot timer whose handler will wake the next thread */
        its.it_value.tv_sec = 0;
        its.it_value.tv_nsec = DELAY1;
        its.it_interval.tv_sec = 0;
        its.it_interval.tv_nsec = 0;
        ret = timer_settime(timerid, 0, &its, NULL);
        if (ret < 0)
            perror("timer_settime");
    }
}
int main(int argc, char *argv[])
{
    sa.sa_flags = SA_RESTART;
    sa.sa_sigaction = handler;
    sigemptyset(&sa.sa_mask);
    err = sigaction(SIG, &sa, NULL);
    if (0 != err) {
        printf("sigaction failed\n");
    }
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIG;
    sev.sigev_value.sival_ptr = &timerid;
    ret = timer_create(CLOCKID, &sev, &timerid);
    if (ret < 0)
        perror("timer_create");
    /* T0's semaphore starts at 1 so it runs first; all others block */
    sem_init(&sem[0], 0, 1);
    for (i = 1; i < NUM_THREADS; ++i)
    {
        sem_init(&sem[i], 0, 0);
    }
    data = 0;
    while (data < NUM_THREADS)
    {
        // create our threads
        err = pthread_create(&tid[data], NULL, threadA, (void *)(intptr_t)data);
        if (err != 0)
            printf("\ncan't create thread: [%s]", strerror(err));
        data++;
    }
    /* keep the process alive; without this, main returns and kills all threads */
    for (data = 0; data < NUM_THREADS; ++data)
        pthread_join(tid[data], NULL);
}
The kernel trace shows that the CPU is free and there is sufficient time for a context switch from thread Ti to P2, yet P2 is not scheduled; the next context switch happens directly between Ti and T(i+1). Why does the Linux CFS not select P2 as the next process in this case, when the timer duration is below roughly 1700 ns?