I have a pandas DataFrame (~57 million rows of floats) to which I want to apply two transformations.
These are the two functions to apply the transformations:
import time

from pandarallel import pandarallel


def apply_feature_aggregation(df, weights, scale, shift, cores):  # This runs without a problem
    t1 = time.time()
    pandarallel.initialize(nb_workers=cores, progress_bar=False)
    new_df = df[['id']].copy()
    new_df['weight'] = df.parallel_apply(calculate_weight, axis=1, weights=weights, scale=scale, shift=shift)
    t2 = time.time()
    print(f'init weights {t2-t1}')
    return new_df


def apply_weight_scaling(df, cores):  # The parallel_apply step only completes when I click Run in PyCharm
    pandarallel.initialize(nb_workers=cores, progress_bar=False)  # initialize(36) or initialize(os.cpu_count()-1)
    t2 = time.time()
    new_df, second_min, current_max = interval_mapping_preprocessing(df, 'weight')
    t3 = time.time()
    print(f'initial mapping preprocessing finished {t3-t2}')
    # Under nohup, this call never finishes
    new_df['weight'] = new_df.parallel_apply(apply_linear_transformation, axis=1, second_min=second_min, current_max=current_max)
    t4 = time.time()
    print(f'I calculated second weights {t4-t3}')  # This is not printed under nohup
The problem: when I run the code in PyCharm by clicking Run, both transformations are applied successfully. When I run it with nohup, I can see the parallel workers start twice in top, but the second parallel_apply never finishes.
My question is: how can I run two consecutive transformations? I also tried putting both transformations in the same wrapper function, but I hit the same problem.
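A wrapper of this kind can be sketched roughly as follows. This is a simplified illustration, not the actual code: stage_one, stage_two, run_both, and the toy columns a and b are placeholder names, and plain DataFrame.apply stands in for parallel_apply so the snippet runs without pandarallel.

```python
import time

import pandas as pd


def stage_one(df):
    # Placeholder for apply_feature_aggregation: derive a 'weight' column row by row.
    new_df = df[['id']].copy()
    new_df['weight'] = df.apply(lambda row: row['a'] + row['b'], axis=1)
    return new_df


def stage_two(df):
    # Placeholder for apply_weight_scaling: rescale 'weight' by its maximum, row by row.
    new_df = df.copy()
    current_max = new_df['weight'].max()
    new_df['weight'] = new_df.apply(lambda row: row['weight'] / current_max, axis=1)
    return new_df


def run_both(df):
    # Hypothetical wrapper that runs the two transformations back to back.
    t1 = time.time()
    weighted = stage_one(df)
    t2 = time.time()
    # flush=True writes the line immediately even when stdout is redirected
    # to a file, as it is with nohup.
    print(f'stage one finished {t2 - t1}', flush=True)
    scaled = stage_two(weighted)
    t3 = time.time()
    print(f'stage two finished {t3 - t2}', flush=True)
    return scaled


if __name__ == '__main__':
    toy = pd.DataFrame({'id': [1, 2, 3], 'a': [1.0, 2.0, 3.0], 'b': [0.5, 0.5, 1.0]})
    print(run_both(toy))
```

In the real code each stage calls pandarallel.initialize and parallel_apply instead of apply, and the hang appears in the second stage.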
This is the output I get with nohup:
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
init weights 135.39842891693115
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
initial mapping preprocessing finished 2.7095065116882324
This is the output when I run it with PyCharm:
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
init weights 143.19737672805786
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
initial mapping preprocessing finished 2.6010115146636963
I calculated second weights 117.14078521728516
In the nohup case, once the parallel workers of the second function finish, only a single memory-intensive process remains and nothing else happens.
Thanks.