Short version:
I am having a hard time understanding two rather complex regular expressions in the ActiveSupport::Inflector::camelize method.
This is the definition of the camelize method:
def camelize(term, uppercase_first_letter = true)
  string = term.to_s
  if uppercase_first_letter
    string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
  else
    string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
  end
  string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
I have some difficulty understanding:
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
and:
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
Please explain to me what they mean. Thank you.
Long version
This shows my attempt to understand the regexes and what I interpret them to mean. It would be very helpful if you could go through it and correct my mistakes.
For the first regex
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
Based on what I am seeing, inflections.acronym_regex is from the Inflections class in the ActiveSupport::Inflector module, and in the initialize method of the Inflections class,
def initialize
  @plurals, @singulars, @uncountables, @humans, @acronyms, @acronym_regex = [], [], [], [], {}, /(?=a)b/
end
acronym_regex is assigned /(?=a)b/. From what I understand from http://www.ruby-doc.org/core-2.0.0/Regexp.html#class-Regexp-label-Anchors ,
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but doesn't include those characters in the matched text
So /(?=a)b/ ensures that character a is inside the text, but we don't include character a inside the matched text, and what immediately follows character a must be character b. In other words, "abc" would match this regex, but "bbc" would not match this regex, and the matched text for "abc" would be "b" (instead of "ab").
So combining the value of inflections.acronym_regex into this regex /^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/, I do not know which of the following two regexes results:
A. /^(?:/(?=a)b/(?=\b|[A-Z_])|\w)/
B. /^(?:(?=a)b(?=\b|[A-Z_])|\w)/
although I am thinking it is B. From what I understand, (?: provides grouping without capturing, (?= means positive lookahead assertion, \b matches word boundaries when outside brackets and matches backspace when inside brackets. So in English terms, regex B, when matching against a text, will find a string that begins with an a character, followed by a b character, and one of (1. backspace [whatever that may mean] 2. any uppercase character or underscore 3. any English alphabetic character, digit, or underscore).
However, I find it strange that passing uppercase_first_letter = false to the camelize function should cause it to match a string starting with the characters ab, given that that does not seem to be how the camelize function behaves.
For the second regex
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
The regex is:
/(?:_|(\/))([a-z\d]*)/i
I am guessing that this regex will match a substring that starts with either an _ or /, followed by 0 or more (upper or lowercase English alphabetic characters or digits). Furthermore, for the first group (?:_|(\/)), whether we match the _ or /, the ([a-z\d]*) capturing group will always be regarded as the second group. I do understand the part where the block tries to look up inflections.acronyms[$2] and on failure, does $2.capitalize.
Since (?: means grouping without capturing, what is the value of $1 when we match _? Is it still _? And for the .gsub('/', '::') portion, I am guessing that it gets applied for each match in the initial gsub, instead of being applied to the overall string after the outer gsub call is done?
Apologies for the really long post. Please point out my errors in understanding the 2 regular expressions, or explain them in a better way if you can do it.
Thank you.
?: acts like a . here and does match the string (i.e. a single character), but there is no capturing group, so the match is in $&.
It's nil, since there is no capturing. The value is in $2.
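One way to check this yourself is to record $1 and $2 for every match. This is only a minimal sketch in plain Ruby (no Rails needed); the constant CAMELIZE_PATTERN, the helper captures_for and the sample string are made up for illustration:

```ruby
# Same pattern as in the last line of camelize.
CAMELIZE_PATTERN = /(?:_|(\/))([a-z\d]*)/i

# Hypothetical helper: collects [$1, $2] for each match.
# The replacement text does not matter here, so the block returns "".
def captures_for(string)
  captures = []
  string.gsub(CAMELIZE_PATTERN) do
    captures << [$1, $2]
    ""
  end
  captures
end

p captures_for("active_model/errors")
# => [[nil, "model"], ["/", "errors"]]
```

When _ is matched, the non-capturing (?:...) group leaves $1 as nil, and "#{$1}" interpolates nil as an empty string, which is why the underscore disappears from the result while a / is kept.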
It's applied to the overall result, as gsub with a block returns a string and the gsub('/', '::') is outside of the block.
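To see the chaining in isolation, the two gsub calls can be split into separate steps. Again a rough sketch in plain Ruby: ACRONYMS is a hypothetical stand-in for inflections.acronyms (left empty, so every word is simply capitalized), and camelize_tail is a made-up name for the first gsub on its own:

```ruby
# Hypothetical stand-in for inflections.acronyms.
ACRONYMS = {}

# First gsub only: the block runs once per match and its return value
# replaces that match. What comes back is an ordinary String.
def camelize_tail(string)
  string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{ACRONYMS[$2] || $2.capitalize}" }
end

intermediate = camelize_tail("Active_model/errors")
p intermediate                  # => "ActiveModel/Errors"

# Second gsub: called once, on the whole string returned by the first.
p intermediate.gsub('/', '::')  # => "ActiveModel::Errors"
```

The / survives the first pass because it is captured in $1 and written back by the block; only the second, separate gsub turns it into ::.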