Regular Expression to Extract Text Bounded by '/'

Question

222 Views Asked by Magic Bullet Dave At 18 February 2017 at 08:53

I need a regular expression to extract names from a GEDCOM file. The format is:

Fred Joseph /Smith/

Here the text bounded by the / characters is the surname and Fred Joseph are the forenames. The complication is that the surname could be anywhere in the text, or may not be there at all. I need something that will extract the surname and capture everything else as the forenames.

This is as far as I have got. I have tried making groups optional with the ? quantifier, but to no avail:

As you can see, it has several problems: if the surname is missing nothing gets captured, the forename(s) sometimes have leading and trailing spaces, and I have 3 capture groups when I'd really like 2. Even better would be if the capture group for the surname didn't include the '/' characters.

Any help would be much appreciated.

Sandeep Bhaskar On 18 February 2017 at 09:18

For your requirements

([A-z a-z /])+\w*

grail On 18 February 2017 at 09:29

I am not sure I follow what language is being used to extract the data, but based on what you have so far, you simply need to add '?':

(.*)(\/?.*\/?)(.*)

Note that this does not give you a separate group for EACH name, as some matches will have multiple names in a single group.
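
A quick check, sketched here in Python (an assumption, since the question names no language), suggests that the optional slashes alone may not be enough: nothing after the first greedy (.*) is mandatory, so it is free to consume the whole line and leave the other groups empty.

```python
import re

# grail's first pattern, unchanged.
pattern = re.compile(r"(.*)(\/?.*\/?)(.*)")

m = pattern.match("Fred Joseph /Smith/")
# The first (.*) is greedy and every later element can match nothing,
# so group 1 takes the entire line.
print(m.groups())  # ('Fred Joseph /Smith/', '', '')
```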

Edit:

Extending Niitaku's solution and looking at having each individual name in its own group, you could use:

^\s*(?:\/?([a-z]+)\/?)\s*(?:\/?([a-z]+)\/?)\s*(?:\/?([a-z]+)\/?)\s*$

As explained, though, if using a language like Ruby it would simply be:

ruby -pe '$_ = $_.scan(/\w+/)' file
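
For comparison, a rough Python counterpart of the same idea might look like the sketch below. The function name words is made up for illustration. Note that scanning for \w+ returns every name as a separate item but no longer records which one was the surname.

```python
import re

def words(line):
    # Every run of word characters, in order; spaces and slashes are dropped.
    return re.findall(r"\w+", line)

print(words("Fred Joseph /Smith/"))  # ['Fred', 'Joseph', 'Smith']
```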

user3507211 On 18 February 2017 at 09:57

Hope this helps: (.*?)\/(.*?)\/(.*)

Theo On 18 February 2017 at 11:16

Try this: ^([^/\n]*)(/[^/\n]+/)?([^/\n]*)$

This matches the following:

- ^ start of string (or, with the multiline modifier, start of line)
- ([^/\n]*) anything other than / or a newline, zero or more times - this is captured as group 1
- (/[^/\n]+/)? a single / followed by one or more characters that are not / or a newline, then a single / - this is captured as group 2, and is optional
- ([^/\n]*) anything other than / or a newline, zero or more times - this is captured as group 3
- $ end of string (or, with the multiline modifier, end of line)

You can see in action with your example text here: https://regex101.com/r/9kmKpy/1

To avoid capturing the slashes, make the second set of brackets non-capturing by adding ?: and add another pair of brackets between the slashes: ^([^\/\n]*)(?:\/([^\/\n]+)\/)?([^\/\n]*)$

https://regex101.com/r/9kmKpy/2
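
Since the question asks for two results rather than three groups, one option is to join groups 1 and 3 in code after matching. The sketch below does this in Python (an assumption, as the question names no language) using the non-capturing pattern above; the helper name split_name is hypothetical.

```python
import re

# Theo's second pattern: group 1 = text before the surname,
# group 2 = surname without slashes, group 3 = text after the surname.
NAME_RE = re.compile(r"^([^/\n]*)(?:/([^/\n]+)/)?([^/\n]*)$")

def split_name(line):
    """Return (forenames, surname) for one name value, or None if no match."""
    m = NAME_RE.match(line)
    if m is None:
        return None  # e.g. a single unpaired slash
    before, surname, after = m.groups()
    # Join the text on either side of the surname and normalise whitespace.
    forenames = " ".join((before + " " + after).split())
    return forenames, surname

for sample in ["Fred Joseph /Smith/", "/Smith/ Fred Joseph",
               "Fred /Smith/ Joseph", "Fred Joseph"]:
    print(sample, "->", split_name(sample))
```

When the surname is missing, the surname slot is None and the whole text is returned as forenames.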

**Niitaku** · Accepted Answer · 2017-02-18T10:01:11.020000

For your last line, I'm not sure there is a way to join group 1 with group 3 into a single group.

Here is my proposed solution. It doesn't capture spaces around forenames.

^(?:\h*([a-z\h]+\b)\h*)?(?:\/([a-z\h]+)\/)?(?:\h*([a-z\h]+\b)\h*)?$

To match the names correctly, take care to use the case-insensitive flag, and if you test all lines at once, use the multiline flag.

Explanation

^ start of the line
(?:\h*([a-z\h]+\b)\h*)? first non-capturing group that matches 0 or 1 time:
- \h* 0 or more horizontal spaces
- ([a-z\h]+\b) captures in a group letters and spaces, but stops at the end of the last word
- \h* matches the possible remaining spaces without capturing
(?:\/([a-z\h]+)\/)? second non-capturing group that matches 0 or 1 time a name in a capturing group surrounded by slashes
(?:\h*([a-z\h]+\b)\h*)? third non-capturing group doing the same as first one, capturing the names in a third group.
$ end of the line
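
The pattern relies on \h, which PCRE-style engines support but Python's built-in re module does not. As a sketch only, the adaptation below swaps \h for [ \t] (a substitution made here, not part of the original answer) and joins groups 1 and 3 after matching; the helper name parse_name is hypothetical. Each line is matched on its own, so only the case-insensitive flag is needed.

```python
import re

# Niitaku's pattern with \h replaced by [ \t], since Python's re lacks \h.
NAME_RE = re.compile(
    r"^(?:[ \t]*([a-z \t]+\b)[ \t]*)?"   # group 1: forenames before the surname
    r"(?:/([a-z \t]+)/)?"                # group 2: surname, without slashes
    r"(?:[ \t]*([a-z \t]+\b)[ \t]*)?$",  # group 3: forenames after the surname
    re.IGNORECASE,
)

def parse_name(line):
    """Return (forenames, surname), or None if the line does not match."""
    m = NAME_RE.match(line)
    if m is None:
        return None
    before, surname, after = m.groups()
    # Groups that did not take part in the match are None; skip them.
    forenames = " ".join(part for part in (before, after) if part)
    return forenames, surname

print(parse_name("Fred Joseph /Smith/"))  # ('Fred Joseph', 'Smith')
print(parse_name("Fred /Smith/ Joseph"))  # ('Fred Joseph', 'Smith')
```

Because the character class only allows letters and spaces, names containing hyphens, apostrophes or accented letters would not match this version.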
