Running the same code but with two different datasets (inputs)


I have some code in JupyterLab that consists of several functions spread across several cells. The first function generates a dataset that is used by all the functions after it.

What I have to do is run the same code twice, but with one of the functions modified. It would look like this:

data_generating_function()  # run only once, so both trials use the same dataset
function_1()  # modified once, so there are two versions of this function
function_2()  # this function and all functions below it stay the same but are run twice
function_3()
function_4()
function_5()

So I would run data_generating_function() once to generate the dataset. Then I would run one version of function_1() and all the functions below it, and then the other version of function_1() and all the functions below it.

What would be a good way to implement this? I could obviously duplicate the code and just change some function names, or I could put everything into a single cell and use a for loop. However, is there a better way, ideally one that preserves the multiple cells?

Thank you

Best answer:

Simply iterate over your two choices for the first function:

data_generating_function()  # run once; both iterations reuse the same dataset
for func1 in (function1a, function1b):  # the two versions of function_1
    func1()
    function_2()
    function_3()
    function_4()
    function_5()
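Below is a self-contained version of the loop above. It assumes each function takes the previous step's output and returns a new value. All function bodies here are hypothetical stand-ins for the real ones. The dataset is generated once, outside the loop, and each variant's final result is stored in a dictionary keyed by the variant's function name:

```python
import random


def data_generating_function(seed=0):
    # Hypothetical stand-in: a reproducible random dataset
    rng = random.Random(seed)
    return [rng.randint(0, 100) for _ in range(10)]


def function1a(data):
    # Hypothetical first version of function_1: keep the values as-is
    return list(data)


def function1b(data):
    # Hypothetical second version of function_1: double every value
    return [x * 2 for x in data]


def function_2(data):
    # Hypothetical shared step: sort the values
    return sorted(data)


def function_3(data):
    # Hypothetical shared step: sum the values
    return sum(data)


dataset = data_generating_function()  # generated once, shared by both runs

results = {}
for func1 in (function1a, function1b):
    out = func1(dataset)      # the only step that differs between runs
    out = function_2(out)     # shared steps
    results[func1.__name__] = function_3(out)

print(results)
```

Because `dataset` is never mutated (each step returns a new list), both iterations see identical input. If your real functions modify the dataset in place, pass `copy.deepcopy(dataset)` into each iteration instead.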

Another answer:

You should try to avoid modifying or directly iterating over functions whenever possible. The best thing to do in this case would be to add a boolean parameter to function1 specifying which version of the function you want to run. It would look something like this:

def function1(isFirstTime):
    if isFirstTime:
        # do stuff the first time
        pass
    else:
        # do stuff the second time
        pass

You could then iterate over the functions:

data_generating_function()
for b in (True, False):
    function1(b)
    function2()
    function3()
    # ...
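Here is a runnable sketch of the flag approach. It assumes each step passes its result to the next, and the function bodies are hypothetical. The flag is spelled `is_first_time` (snake_case) rather than `isFirstTime`, but it plays the same role:

```python
def data_generating_function():
    # Hypothetical stand-in for the real dataset
    return [3, 1, 2]


def function1(data, is_first_time):
    # One function, two behaviours selected by a boolean flag
    if is_first_time:
        # Hypothetical first version: keep the data unchanged
        return list(data)
    # Hypothetical second version: negate every value
    return [-x for x in data]


def function2(data):
    # Hypothetical shared step: sort the values
    return sorted(data)


def function3(data):
    # Hypothetical shared step: take the largest value
    return max(data)


dataset = data_generating_function()  # generated once
outcomes = []
for b in (True, False):
    outcomes.append(function3(function2(function1(dataset, b))))

print(outcomes)  # [3, -1]
```
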

Another answer:

Apologies if I misunderstood the question, but could you not do the following:

Cell 1:

# define all functions

Cell 2:

dataset = data_generating_function()

Cell 3:

# Run version 1 of function 1 on dataset
result_1_1 = function_1_v1(dataset)
result_2_1 = function_2(result_1_1)
result_3_1 = function_3(result_2_1)
function_4(result_3_1)

Cell 4:

# Run version 2 of function 1 on dataset
result_1_2 = function_1_v2(dataset)
result_2_2 = function_2(result_1_2)
result_3_2 = function_3(result_2_2)
function_4(result_3_2)
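You can avoid repeating the chain of shared calls in Cells 3 and 4 while still keeping them as separate cells. One way is to define a small helper in Cell 1 that runs every step after function_1. The helper name `run_remaining_steps` and all function bodies below are hypothetical:

```python
def function_1_v1(data):
    # Hypothetical version 1: add 1 to each value
    return [x + 1 for x in data]


def function_1_v2(data):
    # Hypothetical version 2: multiply each value by 10
    return [x * 10 for x in data]


def function_2(data):
    # Hypothetical shared step: keep only even values
    return [x for x in data if x % 2 == 0]


def function_3(data):
    # Hypothetical shared step: sum what is left
    return sum(data)


def function_4(total):
    # Hypothetical final step: report and return the result
    print(f"total = {total}")
    return total


def run_remaining_steps(result_1):
    # Chain every step that is identical in both runs
    return function_4(function_3(function_2(result_1)))


dataset = [1, 2, 3, 4]                                  # Cell 2
total_v1 = run_remaining_steps(function_1_v1(dataset))  # Cell 3
total_v2 = run_remaining_steps(function_1_v2(dataset))  # Cell 4
```
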

This solution assumes that:

  • your functions return values
  • passing the results around is not "expensive"

If passing the results around is expensive, you can instead persist them to a file.
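Here is a minimal sketch of persisting an intermediate result with the standard-library `pickle` module. It assumes the results are picklable. The file name `result_1_1.pkl`, the temporary directory, and the example result are all illustrative:

```python
import os
import pickle
import tempfile

result_1_1 = {"values": [1, 2, 3], "label": "version 1"}  # hypothetical result

# Write the result to disk in one cell...
path = os.path.join(tempfile.gettempdir(), "result_1_1.pkl")
with open(path, "wb") as f:
    pickle.dump(result_1_1, f)

# ...and load it back in a later cell (or a later session)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == result_1_1)  # True
```
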

To reduce code duplication in function_1, you can add a parameter that switches between the two versions.