Good morning!
I am new to the field of AI. We are using C# and TensorFlow via TensorFlow.NET and TensorFlow.Keras.
Many examples target the old Keras, older TensorFlow versions, or Python, or they load predefined datasets, so I had trouble getting them to work for our case. The biggest challenge was creating a dataset that I can pass to the fit method. I tried a lot, but whenever I get the code to compile, I get exceptions that I don’t understand… The code at the bottom is the “best looking” version so far.
The goal is a kind of autofill functionality for our software. For this example, we want to extract information from an ID that the customer sets. Customers use typical patterns for the ID.
In the first step I want to extract a radius from the ID. Mostly they encode it as R20 or D40 (diameter) for a radius of 20.
For testing I generated 5000 typical IDs (with randomization). Some of them look like:
Drill000334
ToolR20-2023.4.2
D40Tapper
Customer1.352356-5435.R20
….
In our example I want to extract the radius r = 20 mm from the last three IDs to fill in the radius textbox. To keep it simple at first, I just want to check whether an ID contains something like D20 or R34.
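A minimal sketch of that yes/no check on the raw ID string, for illustration only. The class name RadiusLabel and the pattern (an upper-case R or D directly followed by digits) are assumptions and not the actual GetResults code:

```csharp
using System.Text.RegularExpressions;

public static class RadiusLabel
{
    // Assumed pattern: 'R' or 'D' directly followed by one or more digits.
    private static readonly Regex RadiusPattern = new Regex("[RD][0-9]+");

    // Returns 1 if the ID contains a radius/diameter token, otherwise 0.
    public static int HasRadius(string id)
    {
        return RadiusPattern.IsMatch(id) ? 1 : 0;
    }
}
```

With the sample IDs above, this would label Drill000334 as 0 (the D is not followed by a digit) and the other three as 1.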
As a first step I tried Tokenizer and TextVectorization, but I realized that these functions map each WORD to one integer. In our case we would have just 5000 samples of one word each, all different. So I created my own “text vectorization” by mapping characters: 0..9 -> 0..9 and a..z -> 10..35. Afterwards I padded each sample to the maximum number of characters in my samples (30 in one test), so they all have the same length.
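A rough sketch of such a character-level encoding, to make the mapping concrete. The class name IdEncoder, the padding value 36, the lower-casing of letters, and the skipping of characters such as '.' or '-' are all assumptions for illustration; the real implementation may handle them differently:

```csharp
public static class IdEncoder
{
    // Hypothetical padding value, chosen outside the 0..35 character codes.
    public const int Pad = 36;

    // Maps digits to 0..9 and letters (case-insensitive) to 10..35,
    // then pads the result with Pad up to maxLength entries.
    public static int[] Encode(string id, int maxLength)
    {
        var encoded = new int[maxLength];
        for (int i = 0; i < maxLength; i++)
        {
            encoded[i] = Pad;
        }

        int pos = 0;
        foreach (char raw in id)
        {
            if (pos >= maxLength) break; // truncate IDs that are too long

            char c = char.ToLowerInvariant(raw);
            if (c >= '0' && c <= '9')
            {
                encoded[pos++] = c - '0';
            }
            else if (c >= 'a' && c <= 'z')
            {
                encoded[pos++] = c - 'a' + 10;
            }
            // Assumption: any other character ('.', '-', ...) is skipped.
        }
        return encoded;
    }
}
```

For example, IdEncoder.Encode("R20", 5) gives 27, 2, 0, 36, 36 under these assumptions.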
So the “mapping” the network should learn looks like int[30] -> 0 or 1 (has a radius or not).
My samples are then a List<List<int>> with 5000 samples of 30 integers each; it is called ids in my code. I also have a labels list with 5000 entries, a List<int> containing 0 or 1.
So this is my code to get the dataset:
List<List<int>> ids = GetIDs(); //5000 entries, 30 ints per entry
List<int> results = GetResults(ids); //5000 times 0 or 1
var shape = new Shape(ids.Count, 30);
var inputNdArray = new NDArray(shape, TF_DataType.TF_INT32);
for (int i = 0; i < ids.Count; i++)
{
    NDArray nd = new NDArray(ids[i].ToArray());
    inputNdArray[i] = nd;
}
NDArray labels = np.array(results.ToArray());
IDatasetV2 dataset = tf.data.Dataset.from_tensor_slices(inputNdArray, labels);
This is my model (I don’t know whether the layers are good like that; as a first step I just want to get it running without an exception):
LayersApi layers = new LayersApi();
Tensors inputs = Tensorflow.Binding.tf.keras.Input(shape, dtype:TF_DataType.TF_INT32);
var outputs = layers.Dense(64, activation: Tensorflow.Binding.tf.keras.activations.Relu).Apply(inputs);
outputs = layers.Dense(1,activation: Tensorflow.Binding.tf.keras.activations.Relu).Apply(outputs);
Model model = new Sequential(new SequentialArgs());
model = Tensorflow.Binding.tf.keras.Model(inputs, outputs, name: "test1") as Model;
model.compile(optimizer: "adam", loss: "mse", metrics: new[] { "accuracy" });
model.fit(dataset, epochs: 50);
With this version I get: Tensorflow.InvalidArgumentError: "cannot compute MatMul as input #1(zero-based) was expected to be a int32 tensor but is a float tensor". But I really don’t know where the float tensor comes from. I marked everything as TF_INT32.
What am I doing wrong here?