I am having an issue in R when importing my CSV file. When I use the read.csv function, my null values do not show up as null after I load the CSV file into a data frame. Does anyone know why read.csv does not show the null values while read_csv preserves them? NOTE: I'm a beginner and have been working with large datasets and R for about 5 weeks.
Using read.csv:
#Importing my csv file
(jan_data <- read.csv('202301-divvy-tripdata.csv'))
# Check for NULL values in the entire data frame
missing_values_df <- is.na(jan_data)
# Print the logical matrix indicating NULL values
print(missing_values_df)
# Count the number of NULL values in each column
print(colSums(missing_values_df))
The output:
           ride_id      rideable_type         started_at           ended_at start_station_name
                 0                  0                  0                  0                  0
  start_station_id   end_station_name     end_station_id          start_lat          start_lng
                 0                  0                  0                  0                  0
           end_lat            end_lng      member_casual
               127                127                  0
Using readr::read_csv():
library(readr)
jan_data <- read_csv('202301-divvy-tripdata.csv')
# Check for NULL values in the entire data frame
missing_values_df <- is.na(jan_data)
# Print the logical matrix indicating NULL values
print(missing_values_df)
# Count the number of NULL values in each column
print(colSums(missing_values_df))
Output:
           ride_id      rideable_type         started_at           ended_at start_station_name
                 0                  0                  0                  0              26721
  start_station_id   end_station_name     end_station_id          start_lat          start_lng
             26721              27840              27840                  0                  0
           end_lat            end_lng      member_casual
               127                127                  0
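The difference in character columns between the default na.strings= argument in read.csv and the default na= argument in read_csv lies in how empty fields are handled. You can check this with a small CSV in which the second field on the last row is empty.
The reason for the difference is that read.csv defaults to na.strings = "NA", so empty fields in character columns are read as zero-length strings, whereas read_csv defaults to na = c("", "NA"), so empty fields are read as NA (not NULL). Numeric columns such as end_lat and end_lng are unaffected, because read.csv always treats blank fields in numeric columns as NA, which is why both functions report 127 there.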
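A minimal sketch of that check, assuming a made-up two-column CSV supplied as inline text rather than your Divvy file. The names csv_text, base_default, base_fixed and readr_default are only illustrative, and the readr part runs only if readr (version 2.0 or later, for I() and show_col_types) is installed:

```r
# Tiny illustrative CSV: the second field on the last row is empty.
csv_text <- "id,name\n1,a\n2,b\n3,\n"

# Base R default (na.strings = "NA"): the empty field is kept as "".
base_default <- read.csv(text = csv_text)
print(colSums(is.na(base_default)))   # name: 0 missing

# Base R with "" added to na.strings: the empty field becomes NA.
base_fixed <- read.csv(text = csv_text, na.strings = c("", "NA"))
print(colSums(is.na(base_fixed)))     # name: 1 missing

# readr default (na = c("", "NA")): the empty field becomes NA.
if (requireNamespace("readr", quietly = TRUE)) {
  readr_default <- readr::read_csv(I(csv_text), show_col_types = FALSE)
  print(colSums(is.na(readr_default)))
}
```

If you want to stay with read.csv, passing na.strings = c("", "NA") when reading your own file should give the same counts for the station columns as read_csv does.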