I am in very much confusion.
I have two datasets. One dataset is considered a source domain (Dataset A) and other dataset is considered a target domain (Dataset B).
First, I standardized each column of Dataset A using mean and standard deviation value of respective columns. I have 600 points in the dataset A. Then I splitted my dataset into Training, Validation and Testing dataset. I trained CNN model and then I tested model using testing dataset. I gives pretty accurate results (prediction).
I have calculated mean
and standard deviation
of each column available in Dataset A
as follow,
thicknessMean = np.mean(thick_SD)
MaxForceMean = np.mean(maxF_SD)
MeanForceMean = np.mean(meanF_SD)
thicknessstd = np.std(thick_SD)
MaxForcestd = np.std(maxF_SD)
MeanForcestd = np.std(meanF_SD)
thick_SD_scaled = (thick_SD - thicknessMean)/thicknessstd
maxF_SD_scaled = (maxF_SD - MaxForceMean)/MaxForcestd
meanF_SD_scaled = (meanF_SD - MeanForceMean)/MeanForcestd
Now, I want to make prediction from the model by feeding the Dataset B. Therefore, I saved the already trained model (with .pth file). Then I standardize the dataset B, but this time I have transformed the dataset using 'mean' and 'standard deviation' of the dataset A. After doing this, I evaluate the already trained model using dataset B. But it is giving a worse prediction.
thick_TD_scaled = (thick_TD - thicknessMean)/thicknessstd
maxF_TD_scaled = (maxF_TD - MaxForceMean)/MaxForcestd
meanF_TD_scaled = (meanF_TD - MeanForceMean)/MeanForcestd
You can see, to scale my dataset B, I have used mean value for eg.thicknessMean
and standard deviation for eg. thicknessstd
value of the Dataset A .
My question is:
(1) where I am doing wrong? What should I do to make my prediction near to accurate?
(2) When I check prediction's accuracy on two different dataset, should I standardize the second dataset at a same scaling as in the first dataset?