How to replace a reference to a file with its contents?


Suppose that I have a directory containing, among others, two markdown files

a note.md
another one.md

with a note.md containing

Here is a description of an idea.

![[another one.md]]

and another one.md containing

Here is another idea related to it.

I am looking for a command in bash that would

  1. take a note.md,
  2. replace the reference ![[another one.md]] in a note.md with the actual contents of another one.md, and
  3. return the result (so that I could pipe it to Pandoc).

The output in this example would contain

Here is a description of an idea.

Here is another idea related to it.

Why? The Obsidian markdown note-taking app allows embedding file contents into markdown files using ![[]] as described above. However, when such files are converted with Pandoc, the references are treated as plain text. So I am looking for a way to inline the embedded content before the Pandoc conversion.


3
Philippe On

If you can use Perl:

perl -i -pe 's/!\[\[(.*?)]]/`cat "$1"`/eg' "a note.md"

The -i option edits "a note.md" in place, so you may want to make a backup before running the command.

@jhnc provided a much safer version:

perl -0777pe 's/!\[\[([^]]+)]]/ -f $1 && `cat \Q$1` =~ s#^\s*(.*?)\s*$#$1#sr || $& /eg' 'a note.md' > 'result.md'
0
jhnc On

Code based on my comment, formatted for legibility, with some explanatory notes:

# -0777 - read entire file as single "line"
# -p - modify each "line" then print
# -e - commands to perform the modification

perl -0777pe '
    # s/regex/rhs/egsr
    #   /e - rhs is command to produce replacement, not a string
    #   /g - global search (replace all matches)
    #   /s - let . in regex match newline
    #   /r - return result; do not modify in-place

    # locate macros to expand; extract the filename
    s{!\[\[([^]]+\.md)]]}{

        # if file exists
        -f $1

        # then return its content with whitespace trimmed
        #     \Q protects from double-expansions like:
        #         ![[`cmd1` $var;cmd2;" wordsplit ".md]]
        ? `cat \Q$1` =~ s/^\s*(.*\S)\s*$/$1/sr

        # else return original string
        : $&

    }eg
' 'a note.md' >'result.md'

Instead of saving to a file (... >file), the output can be piped to another command (... |cmd).

Obsidian's embed files syntax appears to accept more than just simple filenames; the code above doesn't. Requiring the .md extension may help limit interpolation to markdown files and exclude pdf, image blobs, etc.

cat is short but there are better/safer ways to slurp contents of a file. (For example, code above doesn't complain when a file exists but is unreadable.)
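As one illustration of stricter checking, here is a sketch in plain bash rather than Perl. It is not the method from this answer: the function name expand_embeds is hypothetical, and it assumes each embed sits on a single line and names a .md file relative to the current directory. It reads the file without starting a subshell command built from the file name, leaves unknown references untouched, and warns on stderr when a file exists but cannot be read.

```shell
#!/bin/bash
# Hypothetical helper: expand ![[name.md]] embeds in the file given as $1.
expand_embeds() {
    local line out match name
    local re='!\[\[([^]]+\.md)]]'    # same pattern as the Perl version
    while IFS= read -r line || [[ -n $line ]]; do
        out=
        while [[ $line =~ $re ]]; do
            match=${BASH_REMATCH[0]}
            name=${BASH_REMATCH[1]}
            out+=${line%%"$match"*}      # text before the embed
            line=${line#*"$match"}       # text after the embed
            if [[ -f $name && -r $name ]]; then
                out+=$(<"$name")         # contents, trailing newlines trimmed
            else
                if [[ -f $name ]]; then
                    printf 'warning: cannot read %s\n' "$name" >&2
                fi
                out+=$match              # keep the original reference
            fi
        done
        printf '%s\n' "$out$line"
    done < "$1"
}
```

Because already-expanded text is moved into out and never rescanned, embeds inside an included file are not expanded recursively, which also rules out endless loops on self-references.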


Obsidian Export claims it supports ![[note]] file includes

Perhaps also: https://github.com/bingryan/obsidian-markdown-export-plugin (See: Pull request #13)

There is a commercial plugin: Export Markdown with Embeds

0
JayCravens On

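An awk solution. It needs GNU awk (gawk), because the three-argument form of match() is a gawk extension.

gawk '{
    while (match($0, /!\[\[([^]]*)\]\]/, capture)) {
        cmd = "cat \"" capture[1] "\""
        # Read every line of the embedded file, not only the first
        file_contents = ""; n = 0
        while ((cmd | getline line) > 0)
            file_contents = file_contents (n++ ? "\n" : "") line
        close(cmd)
        # Splice by position: gsub() would treat "&" in the contents specially
        $0 = substr($0, 1, RSTART - 1) file_contents substr($0, RSTART + RLENGTH)
    }
    print
}' "a note.md"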

My goal is to provide something that you can paste into a terminal, with viewable results. Without the jq it outputs what you're expecting. I was under the false impression that pandoc wanted a single line.

It sounds like there are more files and you were going for structure or, at least, the capability for it. What came to mind upon reading the question was something like this:

#!/bin/bash

markdown_files=(*.md)

# Generate checklist dynamically
options=()
index=1
for file in "${markdown_files[@]}"; do
    options+=("$index" "$file" "off")
    ((index++))
done

md_checklist() {
    selected_files=($(dialog --checklist 'checklist' 15 40 10 "${options[@]}" 2>&1 >/dev/tty))

    # Handles cancellation
    if [ $? -eq 1 ]; then
        echo "User canceled."
        exit 1
    fi

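    # Concatenate selected files
    concat_content=""
    for index in "${selected_files[@]}"; do
        file="${markdown_files[index - 1]}"
        # $(...) strips trailing newlines, so add one back between files
        concat_content+="$(cat "$file")"$'\n'
    done

    # Hand the content to awk unchanged (echo -e would also interpret
    # backslash escapes inside the notes); the here-string re-adds the final newline
    awk_input=${concat_content%$'\n'}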
}

md_checklist

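# gawk is required: the three-argument match() is a GNU extension
gawk '{
    while (match($0, /!\[\[([^]]*)\]\]/, capture)) {
        file_path = capture[1]
        gsub(/\\/, "", file_path)  # Remove any backslashes
        cmd = "cat \"" file_path "\""
        # Read every line of the embedded file, not only the first
        file_contents = ""; n = 0
        while ((cmd | getline line) > 0)
            file_contents = file_contents (n++ ? "\n" : "") line
        close(cmd)
        # Splice by position: gsub() would treat "&" in the contents specially
        $0 = substr($0, 1, RSTART - 1) file_contents substr($0, RSTART + RLENGTH)
    }
    print
}' <<< "$awk_input"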

exit 0

If you add | jq -sR '.' after $awk_input, the output becomes a single quoted, character-escaped string; it's the easiest way I know of to produce one. My apologies: I had the idea in my head, incorrectly, that pandoc wouldn't accept newlines.
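To make the effect of that jq filter concrete, here is a small sketch, assuming jq is installed; the wrapper name json_quote is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn everything on stdin into one JSON string.
#   -R  read the input as raw text instead of parsing it as JSON
#   -s  slurp the whole input into a single value
json_quote() {
    jq -sR '.'
}
```

For example, printf 'first line\nsecond line\n' | json_quote prints "first line\nsecond line\n" on one line, with the newlines escaped. Pandoc does not need this; it is only useful when the text has to be embedded in JSON.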