I am trying to create a fact table and dimension tables from taxi trip records that I downloaded from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page. The data dictionary describes the meaning of the fields.
I am not quite sure which columns should be stored in the fact table and which should get their own dimension tables.
For example, the dataset I am exploring is yellow_tripdata_2023-01.parquet, which has 3,066,766 records. The column mta_tax (description: $0.50 MTA tax that is automatically triggered based on the metered rate in use.) has only 10 distinct values, which are stored as floats.
So I think it would be a good idea to store the field in a dimension table and use a foreign key to represent the value in the fact table, which could reduce disk space.
The count for each value: 
However, judging from the description of mta_tax, it seems it should be stored in the fact table.
Which would be the better way to do it?
What questions does the business need to answer that involve mta_tax? If there are none, omit the column. Otherwise, would it be more convenient to model it as a dimension, as a fact table attribute, or as an input to the calculation of some other fact table attribute?
Dimensional modeling is unlike normalization in that it's not an abstract and mathematical exercise. You need to know more than the functional dependencies to create a useful dimensional model. You need to know what information you are trying to extract from the data, and the easiest way to think about that is in terms of the questions that users are trying to answer.
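To make the trade-off concrete, here is a minimal sketch that assumes one possible business question: "how much MTA tax was collected per day?" It uses Python's built-in sqlite3 module and a handful of made-up rows. The table names (fact_trip_a, fact_trip_b, dim_mta_tax) and the surrogate key mta_tax_key are hypothetical and chosen only for illustration; your real question may point to a different design.

```python
import sqlite3

# Made-up rows standing in for the trip data: (pickup_date, fare_amount, mta_tax).
rows = [
    ("2023-01-01", 12.0, 0.5),
    ("2023-01-01", 8.5, 0.5),
    ("2023-01-02", 20.0, 0.0),
    ("2023-01-02", 15.5, 0.5),
]

con = sqlite3.connect(":memory:")

# Option A: mta_tax is kept in the fact table as an additive measure.
con.execute(
    "CREATE TABLE fact_trip_a (pickup_date TEXT, fare_amount REAL, mta_tax REAL)"
)
con.executemany("INSERT INTO fact_trip_a VALUES (?, ?, ?)", rows)

# Option B: distinct mta_tax values go to a dimension; the fact row holds a key.
con.execute(
    "CREATE TABLE dim_mta_tax (mta_tax_key INTEGER PRIMARY KEY, mta_tax REAL UNIQUE)"
)
con.execute(
    "CREATE TABLE fact_trip_b (pickup_date TEXT, fare_amount REAL, "
    "mta_tax_key INTEGER REFERENCES dim_mta_tax)"
)
for pickup_date, fare, tax in rows:
    # Add the value to the dimension only if it is not there yet.
    con.execute("INSERT OR IGNORE INTO dim_mta_tax (mta_tax) VALUES (?)", (tax,))
    key = con.execute(
        "SELECT mta_tax_key FROM dim_mta_tax WHERE mta_tax = ?", (tax,)
    ).fetchone()[0]
    con.execute("INSERT INTO fact_trip_b VALUES (?, ?, ?)", (pickup_date, fare, key))

# With option A the question is a plain aggregate over the fact table.
tax_per_day_a = con.execute(
    "SELECT pickup_date, SUM(mta_tax) FROM fact_trip_a "
    "GROUP BY pickup_date ORDER BY pickup_date"
).fetchall()

# With option B the same question needs a join back to the dimension.
tax_per_day_b = con.execute(
    "SELECT f.pickup_date, SUM(d.mta_tax) FROM fact_trip_b f "
    "JOIN dim_mta_tax d ON d.mta_tax_key = f.mta_tax_key "
    "GROUP BY f.pickup_date ORDER BY f.pickup_date"
).fetchall()

print(tax_per_day_a)
print(tax_per_day_b)
```

Both queries return the same totals, but option B needs an extra join to get a number you want to sum. If instead the typical question were "how many trips were charged each tax amount?", grouping by the dimension would read more naturally, which is why the intended questions should drive the choice.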