Python regular expression grouping?

I've been trying to write a regular expression that validates an email address for a project. These are the requirements:

The email must be in the form acct@domain:

  • acct is 1 or more characters, and composed of only upper or lowercase alphabetic characters, numeric characters, dashes, periods, underscores and hyphens
  • acct cannot start or end with an underscore, dash, period or hyphen. There must be at least two letters before and after every period.
  • domain is 5 or more characters, and composed of only upper or lowercase alphabetic characters, numeric characters, dashes, periods, hyphens, and underscores
  • domain must have at least one period, and cannot start or end with an underscore, dash, period or hyphen. There must be at least two letters before and after every period.
I have figured out the acct part with this code:

    if re.search(r"^[a-zA-z0-9]+[a-zA-z0-9-_]*$|^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$", email):
        print("valid!")
    

    And the domain part:

    if re.search(r"^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$", email):
        print("valid!")
    

    My problem is that I can't figure out how to combine them with an @ sign in between.

    I have tried the following, but it doesn't seem to work:

    if re.search(r"(^[a-zA-z0-9]+[a-zA-z0-9-_]*$|^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$)@(^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$)", email):
        print("valid!")

    It doesn't work! I can't ever get it to match. If you have suggestions for making the code less nooby, please let me know!


    Best answer

    Use a non-capturing group to combine the two regexes, and apply the ^ and $ anchors to the whole pattern instead of to each part:

    if re.search(r"^(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*|[a-zA-Z0-9]+[a-zA-Z0-9-]+[.][a-zA-Z0-9]{2,})@[a-zA-Z0-9]+[a-zA-Z0-9-_]+[.][a-zA-Z0-9]{2,}$", email):
        print("valid")
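The reason the question's attempt never matches: without re.MULTILINE, $ matches only at the end of the string (or just before a final newline), so an @ can never come after it. A minimal sketch, assuming Python 3 and a few made-up addresses, that runs both patterns side by side and shows that the non-capturing group leaves no capture groups in the match:

```python
import re

# The question's attempt: ^ and $ sit inside each group. Without re.MULTILINE,
# $ matches only at the end of the string (or just before a final newline),
# so "$@" can never match and this pattern rejects every input.
broken = r"(^[a-zA-z0-9]+[a-zA-z0-9-_]*$|^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$)@(^[a-zA-z0-9]+[a-zA-z0-9-_]+[\.]{1}[a-zA-z0-9]{2,}$)"

# This answer's pattern: the anchors wrap the whole expression and the two
# account alternatives are combined in a non-capturing group (?:...).
fixed = r"^(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*|[a-zA-Z0-9]+[a-zA-Z0-9-]+[.][a-zA-Z0-9]{2,})@[a-zA-Z0-9]+[a-zA-Z0-9-_]+[.][a-zA-Z0-9]{2,}$"

# Made-up sample addresses, for illustration only.
for email in ["john@example.com", "john.doe@example.com", "john@localhost"]:
    print(email, bool(re.search(broken, email)), bool(re.search(fixed, email)))

# Every group in `fixed` is non-capturing, so the match reports no groups.
print(re.search(fixed, "john.doe@example.com").groups())
```

The broken pattern should report False for every address; the fixed one accepts the first two and rejects john@localhost because its domain has no period. The last line prints an empty tuple.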
    

    Regular Expression:

    ^                        the beginning of the string
    (?:                      group, but do not capture:
      [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9' (1 or more times)
      [a-zA-Z0-9-]*            any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9', '-' (0 or more times)
     |                        OR
      [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9' (1 or more times)
      [a-zA-Z0-9-]+            any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9', '-' (1 or more times)
      [.]                      any character of: '.'
      [a-zA-Z0-9]{2,}          any character of: 'a' to 'z', 'A' to
                               'Z', '0' to '9' (at least 2 times)
    )                        end of grouping
    @                        '@'
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to 'Z',
                             '0' to '9' (1 or more times)
    [a-zA-Z0-9-_]+           any character of: 'a' to 'z', 'A' to 'Z',
                             '0' to '9', '-', '_' (1 or more times)
    [.]                      any character of: '.'
    [a-zA-Z0-9]{2,}          any character of: 'a' to 'z', 'A' to 'Z',
                             '0' to '9' (at least 2 times)
    $                        before an optional \n, and the end of the
                             string
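The breakdown above can also live in the source itself. Python's re.VERBOSE flag ignores whitespace and # comments outside character classes, so the same pattern can be split across lines and annotated. This is a sketch assuming Python 3; EMAIL_RE and is_valid are hypothetical names, not part of the answer:

```python
import re

# The accepted answer's pattern, spread over several lines with re.VERBOSE.
# Whitespace and "#" comments are ignored outside character classes.
EMAIL_RE = re.compile(r"""
    ^
    (?:                                                 # account part, not captured:
        [a-zA-Z0-9]+ [a-zA-Z0-9-]*                      #   letters/digits, then letters/digits/dashes
      | [a-zA-Z0-9]+ [a-zA-Z0-9-]+ [.] [a-zA-Z0-9]{2,}  #   OR the same with exactly one period
    )
    @
    [a-zA-Z0-9]+ [a-zA-Z0-9-_]+                         # domain name (dashes and underscores allowed)
    [.] [a-zA-Z0-9]{2,}                                 # one period, then at least 2 characters
    $
    """, re.VERBOSE)


def is_valid(email):
    """Return True if email matches EMAIL_RE (hypothetical helper)."""
    return EMAIL_RE.search(email) is not None


print(is_valid("john.doe@example.com"))  # True
print(is_valid("john@localhost"))        # False
```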
    
    Another answer

    Remove the anchors from the two groups and apply them to the whole pattern instead:

    if re.search(r"^(?:[a-zA-Z0-9]+[a-zA-Z0-9-]*|[a-zA-Z0-9]+[a-zA-Z0-9-]+\.[a-zA-Z0-9]{2,})@[a-zA-Z0-9]+[-\w]+\.[a-zA-Z0-9]{2,}$", email):
        print("valid!")
    

    Changes made

    • The anchors ^ and $ are applied to the entire regex

    • [\.]{1} can be simplified to \., since {1} is redundant: \. already matches exactly one period

    • [a-zA-z0-9-_] can be replaced with [-\w], which also fixes the A-z range (presumably a typo for A-Z)
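Two details behind the last bullet, shown as a small Python 3 sketch with made-up strings. The range A-z runs from code point 65 ('A') to 122 ('z'), so it also admits the six characters between 'Z' and 'a': [ \ ] ^ _ and `. And in Python 3, \w in a str pattern matches non-ASCII letters too, so [-\w] equals [-a-zA-Z0-9_] only when re.ASCII is passed:

```python
import re

# 'A-z' is a code-point range from 'A' (65) to 'z' (122), which also covers
# the six characters between 'Z' and 'a': [ \ ] ^ _ `
print(bool(re.fullmatch(r"[a-zA-z0-9]+", "ab^c")))  # True, probably not intended
print(bool(re.fullmatch(r"[a-zA-Z0-9]+", "ab^c")))  # False

# In Python 3, \w in a str pattern also matches non-ASCII letters such as 'é';
# re.ASCII restricts it to [a-zA-Z0-9_].
print(bool(re.fullmatch(r"[-\w]+", "café-bar")))            # True
print(bool(re.fullmatch(r"[-\w]+", "café-bar", re.ASCII)))  # False
```

Whether accented letters should count as valid is a requirements question; the list in the question only mentions upper and lowercase alphabetic characters, which suggests re.ASCII.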

    Another answer

    Below is a regex that should validate all of your criteria, and I hope it is also more efficient:

    ^(?![\W_])((?:([\w-]{2,})\.?){1,})(?<![\W_])@(?![\W_])(?=[\w.-]{5,})(?=.+\..+)((?:([\w-]{2,})\.?){1,})(?<![\W_])$
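A quick way to check this pattern against the rules in the question is to run it over a handful of inputs. The sketch below assumes Python 3.6+ (for f-strings); the sample addresses and their expected results are made up from the question's rules, not taken from the answer. One hedged caveat on efficiency: the nested repetition (?:([\w-]{2,})\.?){1,} lets a long run of word characters be split into chunks in many different ways, so on long inputs that fail to match it may backtrack considerably. The samples here are short.

```python
import re

# The third answer's pattern, copied verbatim.
pattern = re.compile(
    r"^(?![\W_])((?:([\w-]{2,})\.?){1,})(?<![\W_])@(?![\W_])(?=[\w.-]{5,})(?=.+\..+)((?:([\w-]{2,})\.?){1,})(?<![\W_])$"
)

# Made-up inputs paired with the result expected from the question's rules.
samples = {
    "john.doe@example.com": True,
    "john@localhost": False,         # domain has no period
    "_john@example.com": False,      # account starts with an underscore
    "john.@example.com": False,      # account ends with a period
    "john..doe@example.com": False,  # two periods in a row
}

for email, expected in samples.items():
    matched = pattern.search(email) is not None
    print(f"{email!r}: matched={matched}, expected={expected}")

# $ also matches just before a trailing newline, so search() accepts it;
# fullmatch() requires the pattern to consume the entire string.
print(pattern.search("john.doe@example.com\n") is not None)     # True
print(pattern.fullmatch("john.doe@example.com\n") is not None)  # False
```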
    

    And here is the regex demo.