I'm working with a massive dataset of over 5 million rows. I mutated two character columns into factor columns: one factor with 3 levels and one with 2 levels. After filtering subsets from this source, I saved them in separate .csv files to continue working on them later. Now, when reading any of those .csv files back into RStudio, it treats all of the converted columns in all the tables as characters again. Do I have to redo the factor work every time I open up RStudio?
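For context, the mutations looked roughly like this (a minimal sketch with toy data; the real column names and levels aren't shown in the question):

```r
library(dplyr)

# Toy data standing in for the real dataset -- hypothetical column names.
df <- tibble(
  ride_class  = c("short", "long", "medium"),
  member_type = c("casual", "member", "casual")
)

df <- df %>%
  mutate(
    ride_class  = factor(ride_class,  levels = c("short", "medium", "long")),  # 3 levels
    member_type = factor(member_type, levels = c("casual", "member"))          # 2 levels
  )

str(df)  # both columns now show as Factor
```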
I loaded all previously used libraries before calling read_csv(), except for library(geosphere), which created a host of conflicts while I was managing the data.
Libraries currently loaded:
library(data.table)
library(readr)
library(tidyverse)
library(lubridate)
library(dplyr)
It still keeps the dttm, dbl, and int columns, and other columns that were added with mutate() saved and read back correctly, so why did my mutated factor columns get reverted? I've been trying different ways to read the csv into RStudio (stringsAsFactors = FALSE with read.csv), but I don't know the specific way to read in these .csv files that gets me back to where I left off without falling into a redundant work routine.
I've tried the read.csv, read_csv, and data.table::fread import functions, but I feel like I'm shooting in the dark here; I thought that just importing a .csv file would get me right back to where I was when I left it. I use glimpse(df) to check whether it's being read correctly, but it's never as I left it, or it gets warped by other import functions. If there's some function to use in conjunction with stringsAsFactors = FALSE and UTF-8 encoding, or a special way I should have written the .csv file in the first place, maybe that's my answer. I'm just trying NOT to have to re-run all my factors and levels of factors on my now-separate data sets every time I open them.
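Concretely, the imports I've tried look roughly like this (a sketch with a toy file; the real filenames aren't shown):

```r
library(readr)
library(data.table)

# Toy file standing in for one of the saved subsets.
write_csv(data.frame(member_type = c("casual", "member")), "subset.csv")

df1 <- read.csv("subset.csv", stringsAsFactors = FALSE)
df2 <- read_csv("subset.csv")
df3 <- fread("subset.csv")

str(df2$member_type)  # chr -- the factor information did not survive the csv
```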
Both Phil's and Onyambu's answers make valid points, but I thought the question was how to properly read in CSV files that would be stacked and have some or all of the character-valued columns converted to factors, as you already appear to understand. The behavior of the read.* functions was formerly to bring in factors by default, but recent versions of R changed the default of the controlling parameter (stringsAsFactors) to FALSE, and character-valued columns are now read simply as character. If you are considering stacking the results of reading multiple csv files and converting to factors, then by all means do the stacking first, and only after that is successful should you convert the columns to factors. Otherwise you will experience the grief of trying to concatenate factor columns that have different labels and numbering systems. I admit that I don't know whether data.table's fread changed that default at the same time as, or later than, the change in R's read.* functions occurred. It should not be difficult to determine by experiment.
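The experiment, and the stack-first-then-convert workflow, might look like this (a sketch; the directory and column names are hypothetical):

```r
library(data.table)

# Check fread's default by experiment, as suggested above:
formals(data.table::fread)$stringsAsFactors  # FALSE in recent data.table versions

# Stack all the saved subsets first, then convert to factors once.
files   <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
stacked <- rbindlist(lapply(files, fread))

factor_cols <- c("member_type", "ride_class")  # placeholder column names
stacked[, (factor_cols) := lapply(.SD, factor), .SDcols = factor_cols]
```

Converting after stacking means every factor level is derived from the full combined data, so the levels and their underlying integer codes are guaranteed to be consistent.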