Suppose I have the following Markdown:
# Comman mark is **just great**
You can try CommonMark here. This dingus is powered by
[commonmark.js](https://github.com/commonmark/commonmark.js), the
JavaScript reference implementation.
## Try CommonMark
1. item one
2. item two
- sublist
- sublist
I want to get the first h1 tag and the first p tag, to use them as the title and description of the post, respectively.
I cannot use browser APIs, because this runs on a Node server.
To get the first h1 tag, I used commonmark.js.
document.getElementById('btn').addEventListener('click', function (e) {
  let parsed = reader.parse(md);
  let result = writer.render(parsed);
  let walker = parsed.walker();
  let event, node;
  while ((event = walker.next())) {
    node = event.node;
    // h1 tags
    if (event.entering && node.type === 'heading' && node.level == 1) {
      console.log('h1', '--', node?.firstChild?.literal);
    }
    // p tags
    if (event.entering && node.type === 'text') {
      console.log('p', '--', node?.literal);
    }
  }
});
For the above Markdown, the console output shows that the first h1 returned is "Common mark is", but it should actually be "# Comman mark is **just great**".
The same thing happens for the p tag. How can I solve this problem?
Live demo: https://stackblitz.com/edit/js-vegggl?file=index.js
Since you are already in the Node.js world, I suggest you check out the unified collective's remark and rehype processors. These processors parse Markdown and HTML, respectively, to and from syntax trees. All processors in the unified collective support custom and third-party plugins that let you inspect and manipulate the intermediate syntax trees. Powerful stuff. Bit of a learning curve, though. At some point, regular expressions break down on non-regular languages like Markdown, and that is where syntax trees can save the day.
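To make the syntax-tree idea concrete, here is a minimal sketch of pulling a title and description out of a tree. To keep it self-contained, the tree is written by hand in the mdast shape that remark works with (`heading` nodes with a `depth`, `text` nodes with a `value`); it is my assumption of roughly what the parser would give you for the start of your Markdown. In a real project you would get the tree from the parser instead, for example with something like `unified().use(remarkParse).parse(md)`. The helpers `textOf` and `findFirst` are hypothetical names for illustration, not part of any library.

```javascript
// Hand-written tree in the mdast shape (assumption: this approximates what a
// remark parser would produce for the first lines of the question's Markdown).
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [
        { type: 'text', value: 'Comman mark is ' },
        { type: 'strong', children: [{ type: 'text', value: 'just great' }] },
      ],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'You can try CommonMark here. This dingus is powered by\n' },
        {
          type: 'link',
          url: 'https://github.com/commonmark/commonmark.js',
          children: [{ type: 'text', value: 'commonmark.js' }],
        },
        { type: 'text', value: ', the\nJavaScript reference implementation.' },
      ],
    },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Try CommonMark' }] },
  ],
};

// Hypothetical helper: concatenate the text of a node and all of its
// descendants, so nested inline nodes (strong, link, ...) are not lost.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Hypothetical helper: depth-first search for the first node that matches.
function findFirst(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children || []) {
    const hit = findFirst(child, predicate);
    if (hit) return hit;
  }
  return null;
}

const h1 = findFirst(tree, (n) => n.type === 'heading' && n.depth === 1);
const p = findFirst(tree, (n) => n.type === 'paragraph');

// Collapse soft line breaks in the paragraph into single spaces.
const title = h1 ? textOf(h1) : '';
const description = p ? textOf(p).replace(/\s+/g, ' ').trim() : '';

console.log(title);       // Comman mark is just great
console.log(description); // You can try CommonMark here. This dingus is powered by commonmark.js, the JavaScript reference implementation.
```

The key difference from reading only the first child of the heading is that `textOf` gathers text from every descendant, so the bold part of the title is included. The same idea should carry over to commonmark.js if you prefer to stay with it: collect the `literal` of every text node under the heading rather than only `firstChild`.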