In my problem I have predefined state and action spaces, but when the agent decides to take an action, one of three things can happen:
- the action takes place as desired,
- the action takes place only partially, or
- the action is not applicable at all.
So the outcome of the action at each step depends on some other parameter that the agent does not know. Is there a way to model this problem with Q-learning?
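One way to look at this setup: from the agent's point of view, an unobserved parameter that decides between full, partial, and no effect just makes the transition stochastic. Below is a minimal sketch of such an environment. The 1-D line of states, the left/right actions, the reward, the outcome probabilities (0.6 / 0.3 / 0.1), and the `step` function are all hypothetical illustrations, not my actual problem. It also assumes the hidden parameter is drawn independently at every step.

```python
import random

N_STATES = 10            # hypothetical: positions 0..9 on a line
MOVES = {0: -1, 1: +1}   # hypothetical actions: 0 = move left, 1 = move right

def step(state: int, action: int, rng: random.Random) -> tuple[int, float, str]:
    """Simulate one action whose effect depends on a hidden parameter."""
    # Hidden parameter, drawn independently at every step (an assumption);
    # the agent only ever sees the resulting next state and reward.
    hidden = rng.random()
    if hidden < 0.6:
        outcome, distance = "full", 2      # action takes place as desired
    elif hidden < 0.9:
        outcome, distance = "partial", 1   # action takes place only partially
    else:
        outcome, distance = "none", 0      # action is not applicable
    # Clip the move to the edges of the line.
    next_state = min(max(state + MOVES[action] * distance, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    # `outcome` is returned only for inspection; a learner would not use it.
    return next_state, reward, outcome
```

Repeated calls to `step(5, 1, rng)` land on state 7, 6, or 5 with frequencies near 0.6, 0.3, and 0.1. To the agent this is simply a stochastic transition, which model-free Q-learning can handle, provided the hidden draw does not depend on past steps.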
Thanks,
I chose Q-learning over a model-based MDP method such as value iteration because I do not have a predefined transition matrix.
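To illustrate why no transition matrix is needed, here is a minimal sketch of a tabular Q-learning update. It uses only one sampled transition `(s, a, r, s_next)` at a time. The function names `q_update` and `demo`, the step size `alpha`, the discount `gamma`, and the one-step demo task with rewards 1.0 / 0.5 / 0.0 (drawn with hypothetical probabilities 0.6 / 0.3 / 0.1) are illustrative assumptions.

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update from a single sampled transition.

    Only the observed (s, a, r, s_next) is used; no transition matrix is needed.
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

def demo(seed=0, n_samples=20000, alpha=0.01):
    """Hypothetical one-step task: the same action yields reward 1.0, 0.5 or 0.0
    depending on a hidden draw (probabilities 0.6 / 0.3 / 0.1, for illustration).
    The learner never sees those probabilities, only the sampled rewards."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(n_samples):
        u = rng.random()  # hidden parameter, never shown to the learner
        r = 1.0 if u < 0.6 else (0.5 if u < 0.9 else 0.0)
        q_update(Q, 0, 0, r, 0, n_actions=1, alpha=alpha, done=True)
    return Q[(0, 0)]
```

With a small constant `alpha`, `demo()` should end up close to the expected reward 0.6 * 1.0 + 0.3 * 0.5 = 0.75. The averaging over the hidden outcome happens implicitly through the step size rather than through explicit transition probabilities.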