I'm trying to figure out how far I can push this command when selecting multiple files of interest. For example, I'm using the following wildcard to pick up all the files of interest across multiple directories, but I'd like to use regular expressions or something similar to restrict, say, the length of the directory name.
lines = sc.textFile("/home/spark-1.4.0/A/B_2*/Output/CSV.csv")
But instead of `*`, can I restrict the length of the directory name, for example with `^[0-9]{8}$`? Or is there any way of doing this without resorting to pre-filtering to build a list of valid directories?
Just to keep things straight: what you want here is a simple glob, not a regular expression. You can do something like this:
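A minimal sketch, assuming PySpark (as in the `sc.textFile` call from the question) and directories named `B_` followed by exactly eight digits. The name `B_20150701` and the helper `digits_glob` below are hypothetical. Hadoop-style globs have no `{8}` repetition count, but they do accept `[0-9]` character classes, so the count can be spelled out as eight classes in a row. The `fnmatch` check only demonstrates locally which names the pattern accepts; Spark itself resolves the glob through Hadoop.

```python
import fnmatch

BASE = "/home/spark-1.4.0/A"

def digits_glob(n):
    """Hypothetical helper: a glob fragment matching exactly n decimal digits."""
    return "[0-9]" * n

# Glob, not regex: eight single-digit character classes stand in for ^[0-9]{8}$.
# Assumption: directories are named B_ followed by eight digits.
path = BASE + "/B_" + digits_glob(8) + "/Output/CSV.csv"

# In PySpark the pattern would be passed to textFile unchanged:
# lines = sc.textFile(path)

# Local sanity check with fnmatchcase. No "*" is used, so fnmatch's
# treatment of "/" cannot differ from Hadoop's for this pattern.
candidates = [
    BASE + "/B_20150701/Output/CSV.csv",   # 8 digits: kept
    BASE + "/B_2015070/Output/CSV.csv",    # 7 digits: rejected
    BASE + "/B_201507011/Output/CSV.csv",  # 9 digits: rejected
    BASE + "/B_2015070x/Output/CSV.csv",   # non-digit: rejected
]
matches = [p for p in candidates if fnmatch.fnmatchcase(p, path)]
print(matches)
```

If the leading `2` from the original `B_2*` should stay fixed, the same idea gives `"B_2" + digits_glob(7)`.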