How should I distribute cores for inter and intra parallelization in TensorFlow?


I have been trying to run my neural network training on a cluster computer system. I am given a single node with 128 cores, and I want to run 4 trainings in parallel, each using 32 cores. I used to assume that TensorFlow takes care of parallelization by itself, as it did on my desktop, where it used all 8 cores, so I had never looked at the settings 'inter_op_parallelism_threads' or 'intra_op_parallelism_threads' before. For running the code on the cluster, I do specify the number of cores per task in the job-submission script. But when I read about intra/inter parallelism, I wanted to make sure that each process actually uses all 32 of its cores, and I found that setting these values explicitly makes the program nearly twice as fast. However, I am not sure how many threads I should give to intra versus inter parallelization. I just did:

import tensorflow as tf

# Must be called before any ops are created, or the settings are ignored.
tf.config.threading.set_inter_op_parallelism_threads(32)
tf.config.threading.set_intra_op_parallelism_threads(32)

Is it okay to do it like that? Are the two thread pools shared, or are they exclusive? Should I split the cores, say 16 and 16, or give more to intra and less to inter? That is, should I divide the 32 available cores so that some do intra-op parallelization and the rest do inter-op parallelization?
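As a sketch of one common starting point (not a definitive answer): intra-op threads parallelize work inside a single op (e.g. one large matmul), while inter-op threads run independent ops concurrently, so many setups give intra-op most of the per-process core budget and keep inter-op small. The helper name `thread_split` below is hypothetical, just to make the arithmetic explicit; the TensorFlow calls are shown commented out because they must run before any ops are created:

```python
import os

def thread_split(cores_per_process, inter=2):
    """Hypothetical helper: split a per-process core budget.

    A frequently suggested heuristic is a small inter-op pool
    (independent ops to run concurrently) and an intra-op pool
    sized to the full core budget, since the pools are separate
    and intra-op threads do the bulk of the numeric work.
    """
    return {"intra": cores_per_process, "inter": inter}

settings = thread_split(32)

# Thread-count env vars for the underlying math libraries are read
# at import time, so set them before importing TensorFlow.
os.environ["OMP_NUM_THREADS"] = str(settings["intra"])

# Then, still before building any model:
# import tensorflow as tf
# tf.config.threading.set_intra_op_parallelism_threads(settings["intra"])
# tf.config.threading.set_inter_op_parallelism_threads(settings["inter"])

print(settings)
```

Whether 32/2 beats, say, 28/4 depends on the model (many small independent ops favor a larger inter-op pool), so it is worth timing a few splits on your actual workload.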
