Regex: separate class days and times into individual lines

I have a ~2000 line data set of class dates and times, example:

WRF 9a-5p 
 WR 9-10:15a 
 WR 7:15-8:45p 
 WF 9a-12:10p 
 WF 9:30-11:30a 
 WF 8-10:45a 
 WF 12-2:45p
 Su 9a-6p 
 WF 10-11:50a 
 W 9a-12p 
 W 9a-12p 
 W 9a-12:20p 
 W 9a-12:20p 
 W 9a-12:20p 
 W 9:30a-4:45p 

Where the line says WRF 9a-5p, I'd like it on three lines:

W 9a-5p 
R 9a-5p 
F 9a-5p

and so on for the other lines. As you can imagine there are quite a few different combinations of days throughout the dataset.

I'm using Notepad++.

Haven't the faintest idea how to get started with this!

Cary Swoveland On

Here are two ways that can be done in Ruby. Ruby's and Notepad++'s regex engines are similar (in particular both support lookarounds), so I would think both approaches--but particularly the second--could be translated to Notepad++ without too much trouble.

I have assumed each string begins with zero or more whitespace characters followed by a substring made up of between one and three of the identifiers in the following array.

["Su", "M", "Tu", "W", "Th", "F", "S", "R"]

The approaches I have suggested do not depend on this particular array; only the alternation in the regular expressions would change. They work so long as each identifier is a capital letter followed by zero or more lowercase letters.

1. When there is more than one identifier, capture the identifiers, any whitespace that precedes them, and the characters that follow them, then construct the desired strings from the contents of the capture groups.

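# str is expected to be a single line, optionally ending in a newline.
def doit(str)
  # With /m, "." also matches a newline, so group 3 runs to the end of str.
  str.gsub(/^(\s*)((?:Su|M|Tu|W|Th|F|S|R){2,3})(.*)/m) do
    head = $1        # leading whitespace
    tail = $3.chomp  # text after the identifiers, minus a trailing newline
    # Split between identifiers, then rebuild one line per identifier.
    $2.split(/(?<=[A-Za-z])(?=[A-Z])/)
      .map { |s| head + s + tail }.join("\n")
  end
end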

Capture group 2 holds the substring that consists of 2 or 3 identifiers.

Capture group 1 holds the zero or more whitespace characters that precede the string held in Capture group 2.

Capture group 3 holds the portion of the string that follows the string held in Capture group 2.
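As a rough illustration of what the three capture groups hold, here is a small sketch run against one line from the question. The constant name APPROACH1_RGX is made up for this example, and the /m flag is left off because the sample string contains no newline.

```ruby
# Pattern from approach 1; APPROACH1_RGX is a hypothetical name for this demo.
APPROACH1_RGX = /^(\s*)((?:Su|M|Tu|W|Th|F|S|R){2,3})(.*)/

m = APPROACH1_RGX.match(" WF 9a-12:10p")
p [m[1], m[2], m[3]]                #=> [" ", "WF", " 9a-12:10p"]

# A single identifier does not satisfy {2,3}, so such lines are left alone.
p APPROACH1_RGX.match("Su 9a-6p")   #=> nil
```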

The regular expression

/(?<=[A-Za-z])(?=[A-Z])/

matches zero-width strings that are preceded by a letter (asserted by the positive lookbehind (?<=[A-Za-z])) and followed by an uppercase letter (asserted by the positive lookahead (?=[A-Z])).
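To see where those zero-width matches fall, the following sketch splits a few identifier runs on that expression. The constant name BOUNDARY and the sample strings other than "WRF" are assumptions chosen for illustration.

```ruby
# BOUNDARY is a hypothetical name for the splitting expression shown above.
BOUNDARY = /(?<=[A-Za-z])(?=[A-Z])/

p "WRF".split(BOUNDARY)    #=> ["W", "R", "F"]
p "TuTh".split(BOUNDARY)   #=> ["Tu", "Th"]
p "SuMW".split(BOUNDARY)   #=> ["Su", "M", "W"]
```

The split happens only before an uppercase letter, so two-letter identifiers such as "Tu" stay intact.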

2. Perform two successive substitutions with the same regular expression and the same expression for the substitution string.

I am using the following regular expression.

rgx = /^(\s*(?:Su|M|Tu|W|Th|F|S|R))((?:Su|M|Tu|W|Th|F|S|R){1,2})(.*)/
str = "WRF 9a-5p"
str1 = str.gsub(rgx, '\1\3' + "\n" + '\2\3')
  #=> "W 9a-5p\nRF 9a-5p"
str2 = str1.gsub(rgx, '\1\3' + "\n" + '\2\3')
  #=> "W 9a-5p\nR 9a-5p\nF 9a-5p"
puts str2
  # W 9a-5p
  # R 9a-5p
  # F 9a-5p
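For a whole data set held in one multi-line string, the same substitution could be repeated until nothing changes, rather than counting passes by hand. This is only a sketch: the helper name split_days and the constant RGX are hypothetical, and it assumes lines do not begin with whitespace (leading whitespace captured in group 1 is not copied to the second output line).

```ruby
# Same pattern as approach 2; RGX is a hypothetical constant name.
RGX = /^(\s*(?:Su|M|Tu|W|Th|F|S|R))((?:Su|M|Tu|W|Th|F|S|R){1,2})(.*)/

# Hypothetical helper: apply the substitution until the text stops changing.
def split_days(text)
  loop do
    replaced = text.gsub(RGX, "\\1\\3\n\\2\\3")
    return replaced if replaced == text
    text = replaced
  end
end

data = "WRF 9a-5p\nWR 9-10:15a\nSu 9a-6p\nW 9a-12p"
puts split_days(data)
# W 9a-5p
# R 9a-5p
# F 9a-5p
# W 9-10:15a
# R 9-10:15a
# Su 9a-6p
# W 9a-12p
```

Each pass peels the first identifier off every line that still has more than one, so lines with three identifiers need two passes and the loop ends on the first pass that changes nothing.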
