I have a basic question about ML.NET. Without showing a lot of code, I wonder whether it is possible to give recent data more weight (importance) during training?
For example, if our data table consists of rows that each have a date, as below, I would like to give the recent data more weight so that a training algorithm like "FastTree" or "FastForest" treats those samples as more important.
a) I know it would be possible to duplicate rows to give them more importance, but I would like to avoid that if possible because it would mean longer training time.
b) I also know about sliding windows, where the model is trained on a subset of the data that moves forward in time, but I would like to avoid that as well for this question.
Is there any method to assign an actual weight to each row in the table?
In the example table below, I show a weight between 1 and 10 purely for illustration, to make clear what I mean:
Sample code:
using Microsoft.ML;
using Microsoft.ML.Data;

var context = new MLContext(seed: 0);

// Load the data
var data = context.Data.LoadFromTextFile<Input>("C:/datafile.csv", hasHeader: true, separatorChar: ',');
var trainTestData = context.Data.TrainTestSplit(data, testFraction: 0.2, seed: 0);
var trainData = trainTestData.TrainSet;
var testData = trainTestData.TestSet;

// Define the data preprocessing pipeline. Concatenate which features to use!
// Note: FastTree() looks for a "Label" column by default; Input does not define one in this shortened sample.
var pipeline = context.Transforms.Concatenate("Features", "feature1", "feature2")
    .Append(context.Transforms.Conversion.ConvertType("Features", "Features", DataKind.Single))
    .Append(context.Transforms.NormalizeMinMax("Features"))
    .Append(context.Regression.Trainers.FastTree());
var model = pipeline.Fit(trainData);

// Only the two feature columns are loaded; the Date and weight columns (indices 0 and 1) are skipped.
public class Input
{
    [LoadColumn(2)] public float feature1;
    [LoadColumn(3)] public float feature2;
}
datafile.csv
Date,weight,feature1,feature2
06/30/2023,1,50,42
07/03/2023,2,52,45
07/05/2023,3,50,47
07/06/2023,4,54,43
07/07/2023,5,55,49
07/10/2023,6,57,44
07/11/2023,7,53,47
07/12/2023,8,52,45
07/13/2023,9,57,44
07/14/2023,10,53,42
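
To make the idea of a per-row weight concrete, here is a minimal sketch of the kind of thing I am after. The FastTree and FastForest regression trainers take an `exampleWeightColumnName` argument, so the sketch loads the `weight` column from the file above and passes its name to the trainer. The `WeightedInput` class is a hypothetical name, and using `feature2` as the label is only an assumption for illustration, since my sample has no real label column. It needs the Microsoft.ML and Microsoft.ML.FastTree NuGet packages.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input class: same file as above, but the weight column is loaded too.
public class WeightedInput
{
    [LoadColumn(1)] public float weight;    // per-row weight, must be a float (Single)
    [LoadColumn(2)] public float feature1;
    [LoadColumn(3)] public float feature2;  // ASSUMPTION: used as the label only for illustration
}

public static class Program
{
    public static void Main()
    {
        var context = new MLContext(seed: 0);

        var data = context.Data.LoadFromTextFile<WeightedInput>(
            "C:/datafile.csv", hasHeader: true, separatorChar: ',');

        // The weight column is NOT concatenated into "Features";
        // it is only handed to the trainer by name.
        var pipeline = context.Transforms.Concatenate("Features", "feature1")
            .Append(context.Regression.Trainers.FastTree(
                labelColumnName: "feature2",
                featureColumnName: "Features",
                exampleWeightColumnName: "weight"));

        var model = pipeline.Fit(data);
    }
}
```

If this is the right mechanism, the weights would not need to be the integers 1-10 shown in the table; any non-negative float derived from the date (for example a decay based on the row's age) could presumably be used in the same column.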