How to split this text using JavaScript regular expression?


I would like to split this text, and I am trying to do it with a JavaScript regular expression.

(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.

I would like to parse it into groups of fragments. I am looking for either of these results:

[
  [1, "Really not."],
  [2, "Uh huh."],
  [3, "Behold Prince"],
]


[
  {id: 1, text: "Really not."},
  {id: 2, text: "Uh huh."},
  {id: 3, text: "Behold Prince"},
]

I am using this pattern:

/\(([0-9])\){1,3}(.+?)\(/g

Could you help me, please? What pattern should I use to split the text properly?

Thank you in advance!


An approach based on matchAll, using a RegExp with named capture groups and a positive lookahead: /\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g

// see ... [https://regex101.com/r/r39BoJ/1]
const regX = (/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g);

const text = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance."

console.log([
  ...text.matchAll(regX)
  ].map(
    ({groups: { id, text }}) => ({ id: Number(id), text })
  )
);
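When adapting this, note that String.prototype.matchAll throws a TypeError if the RegExp lacks the g flag, so the /g on regX is required rather than optional. A small check on a shortened, illustrative sample string:

```javascript
// Shortened sample with the same "(n) text" shape as the question's text.
const sample = "(1) Really not. (2) Uh huh.";

// With the g flag, matchAll yields one match per "(n) ..." fragment.
const withG = [...sample.matchAll(/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g)]
  .map(({ groups: { id, text } }) => ({ id: Number(id), text }));
console.log(withG); // [{ id: 1, text: "Really not." }, { id: 2, text: "Uh huh." }]

// Without the g flag, matchAll throws instead of returning a single match.
let error = null;
try {
  sample.matchAll(/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/);
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true
```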

Note

The above approach does not cover an opening paren ( that is allowed to occur within a text fragment, because the lookahead stops at the first ( it finds. Thus, in order to always be on the safe side, the OP should consider a split / reduce based approach:

const text = "  (1) Really not. (2) Uh (huh). (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, (for instance).  "

console.log(
  text
    .split(/\s*\((\d+)\)\s*/)
    .slice(1)
    .reduce((list, item, idx) => {
      if (idx % 2 === 0) {
        list.push({ id: Number(item) });
      } else {
        // or, where Array.prototype.at is available: list.at(-1).text = item.trim();
        list[list.length - 1].text = item.trim();
      }
      return list;
    }, [])
);

// test / check ...
console.log(
  'text.split(/\\s*\\((\\d+)\\)\\s*/) ...',
  text.split(/\s*\((\d+)\)\s*/)
);
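To make the note concrete, here are both approaches run on a short, made-up string whose first fragment contains (huh). This sketch swaps the reduce for a plain for loop that steps over the id/text pairs; the resulting shape is the same.

```javascript
// Illustrative sample: a fragment containing its own parenthesized text.
const tricky = "(2) Uh (huh). (3) Behold Prince";

// Lookahead approach: the text stops at the first "(", so "(huh)." is lost.
const viaLookahead = [...tricky.matchAll(/\((?<id>\d+)\)\s*(?<text>.*?)\s*(?=$|\()/g)]
  .map(({ groups: { id, text } }) => ({ id: Number(id), text }));

// Split approach: only "(digits)" acts as a separator, so "(huh)" stays in the text.
// split with a capture group yields ["", id, text, id, text, ...]; drop the leading "".
const parts = tricky.split(/\s*\((\d+)\)\s*/).slice(1);
const viaSplit = [];
for (let i = 0; i < parts.length; i += 2) {
  viaSplit.push({ id: Number(parts[i]), text: parts[i + 1].trim() });
}

console.log(viaLookahead); // [{ id: 2, text: "Uh" }, { id: 3, text: "Behold Prince" }]
console.log(viaSplit);     // [{ id: 2, text: "Uh (huh)." }, { id: 3, text: "Behold Prince" }]
```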


You can use a regex with String.prototype.matchAll in JavaScript to do what you want:

const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;

let array = [...str.matchAll(/\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g)].map(a=>[+a[1],a[2]])

console.log(array)

I updated my answer to use The fourth bird's regex because it is a lot cleaner than the one I wrote.
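Because the id group here is [0-9]+ rather than the single [0-9] from the question, ids with two or more digits also work. A variant of the same idea, using for...of with array destructuring instead of spread and map, on an illustrative string:

```javascript
// Ids above 9 need more than one digit, which [0-9]+ allows.
const longer = "(9) Nine. (10) Ten. (11) Eleven.";

const pairs = [];
// Each match is an array [fullMatch, idString, text]; the leading comma skips fullMatch.
for (const [, id, text] of longer.matchAll(/\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g)) {
  pairs.push([+id, text]); // unary + converts "10" to 10
}

console.log(pairs); // [[9, "Nine."], [10, "Ten."], [11, "Eleven."]]
```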


Instead of matching the (, you can assert it (or the end of the string) with a lookahead, so it is not consumed and can start the next match.
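To see the effect of consuming the (, here is the pattern from the question run on a shortened copy of the sample text (the shortening is only for illustration). Each match eats the ( of the following fragment, so that fragment is skipped, and the last fragment never matches because no ( follows it.

```javascript
// Shortened copy of the question's text.
const question = "(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key. (5) See me.";

// The pattern from the question: it ends by matching (and consuming) a "(".
const ids = [...question.matchAll(/\(([0-9])\){1,3}(.+?)\(/g)].map(m => m[1]);

// "(2" and "(4" were swallowed by the previous match; "(5) See me." has no "(" after it.
console.log(ids); // ["1", "3"]
```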

The part \){1,3} in your pattern repeats the closing parenthesis 1-3 times; it does not limit the number of digits.
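A few quick checks show what that quantifier does and does not match, compared with putting {1,3} on the digit class instead (the test strings are illustrative):

```javascript
// In \(([0-9])\){1,3} the {1,3} applies only to the \) right before it.
const asked = /\(([0-9])\){1,3}/;
console.log(asked.test("(1)"));      // true
console.log(asked.exec("(1)))")[0]); // "(1)))": up to three ")" are consumed
console.log(asked.test("(12)"));     // false: [0-9] matches a single digit

// Quantifying the digit class instead limits the id to 1-3 digits.
const limited = /\(([0-9]{1,3})\)/;
console.log(limited.exec("(123)")[1]); // "123"
console.log(limited.test("(1234)"));   // false
```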

If you want to match 1 or more digits (use [0-9]{1,3} instead of [0-9]+ to allow only 1-3 digits):

\(([0-9]+)\)\s*(.*?)\s*(?=$|\()
  • \( Match (
  • ([0-9]+) Capture 1+ digits in group 1 (Denoted by m[1] in the code)
  • \) Match )
  • \s* Match optional whitespace chars
  • (.*?) Capture as few chars as possible in group 2 (Denoted by m[2] in the code)
  • \s* Match optional whitespace chars
  • (?=$|\() Assert either the end of string or ( to the right
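The lookahead is what lets consecutive fragments match: it checks for ( without consuming it. Stepping through a tiny illustrative string with RegExp.prototype.exec, which advances lastIndex on a g regex, shows where each match ends:

```javascript
// Same pattern as in this answer; with the g flag, exec resumes at lastIndex.
const fragmentRe = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;
const short = "(1) a (2) b";

const first = fragmentRe.exec(short);
// first[0] is "(1) a " and lastIndex is 6: the "(" at index 6 was only asserted.
console.log(JSON.stringify(first[0]), fragmentRe.lastIndex);

const second = fragmentRe.exec(short);
// The next match starts at that "(": second[1] is "2", second[2] is "b".
console.log(second[1], second[2]);
```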

Regex demo

const regex = /\(([0-9]+)\)\s*(.*?)\s*(?=$|\()/g;
const str = `(1) Really not. (2) Uh huh. (3) Behold Prince (4) are key in his natural element, cowering at the mercy of the women in his life. (5) See me perhaps you'd like to spout with my daughters and teach them some combination. (6) No doubt you are the best teacher, your majesty. (7) It is my daughter's that teach me in the languages of the modern world, for instance.`;
console.log(Array.from(str.matchAll(regex), m => [Number(m[1]), m[2]]));