Load Quarto html map data from json for Leaflet map generated in R


I have created a Quarto blog post which contains many leaflet maps, generated in R. As the data for each map is embedded within the html file, the file itself is very large. This is causing problems on the server which hosts the file.

I want to make the html file smaller. The `embed-resources: false` YAML option in Quarto means that the libraries (e.g. leaflet.js) are stored in separate files. This helps, but the data is still stored within the html (once per map). I am trying to load the data itself from a separate file. Here is a minimal example of a qmd file:

---
format:
  html:
    embed-resources: false  
---

```{r}
leaflet::leaflet(elementId = "map1") |>
    leaflet::addTiles() |>
    leaflet::addMarkers(lng = 174.768, lat = -36.852, popup = "The birthplace of R")
```

When I `quarto render` this, it creates an html file which shows a map when opened in a browser. The file includes the data for the map in the following <div>:

<div class="leaflet html-widget html-fill-item-overflow-hidden html-fill-item" id="map1" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="map1">{**json**}</script>

Where I have written {**json**} there is one long line of json with the map coordinates, CRS and various options.

It seemed to me that I might be able to copy the json content to a file and then change the <script> tag to load the data from that file:

<script src="page_data/map1.json" type="application/json" data-for="map1"></script>

However, I now know this is not possible: for a data block like type="application/json", the browser ignores the src attribute. Instead, I have tried adding a script to inject the json into the innerHTML of the required element (using Live Server for testing):

<script>
  fetch('./page_data/map1.json')
    .then((response) => response.json())
    .then((json) => (
      document.querySelectorAll('[data-for="map1"]')[0].innerHTML = 
        JSON.stringify(json).replaceAll("\/", "/"))
    );
</script>

This works in that it loads the exact json content into the tag as when it was hard-coded into the html file (the replaceAll() is required to make it identical, as a couple of escape characters are added before backslashes).

However, this alone does not display the map and the console throws this error:

Uncaught SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at htmlwidgets.js:646:27

The relevant line of htmlwidgets.js is:

var scriptData = document.querySelector("script[data-for='" + el.id + "'][type='application/json']");
var data = JSON.parse(scriptData.textContent || scriptData.text);

i.e. at the point where the loaded script looks for the data, the fetch() request has not yet updated the innerHTML of the <script data-for="map1"></script> tag, so there is nothing to parse.

With this in mind, in addition to the fetch() request, I moved the htmlwidgets.js and other <script> tags to try to delay their loading. Currently there are about 10 lines of tags in the <head>, like:

<script src="page_files/libs/htmlwidgets-1.6.2/htmlwidgets.js"></script>
<script src="page_files/libs/jquery-1.12.4/jquery.min.js"></script>

If I move these from the <head> to between the </body> and </html> tags, the map renders around half the time. So it looks like there is a race between those scripts loading and the script which injects the json into the <script data-for="map1"></script> tag.

To ensure the loading happened in the right order, I removed the scripts from the html <head> and used an async loadScript() function to load them dynamically, so that they only load after the data does:

fetch('./map1.json')
  .then((response) => response.json())
  .then((json) => (
    document.querySelectorAll('[data-for="map1"]')[0].innerHTML =
      JSON.stringify(json).replaceAll("\/", "/")))
  .then(() => loadScript("page_files/libs/htmlwidgets-1.6.2/htmlwidgets.js"))
  .then(() => loadScript("page_files/libs/jquery-1.12.4/jquery.min.js"));
/* etc - for all scripts on the page, in the order they appear in the html */

The scripts now only load after the json is injected into the <script data-for="map1"></script> tag. However, it does not render the map at all and the html widget is not registered (i.e. document.getElementById("map1").htmlwidget_data_init_result in the console returns undefined).

Am I missing something about the order that events are supposed to happen on a static Quarto-generated web page with htmlwidgets?

Is there a way for a Quarto html file to load the data for a leaflet map generated in R from a json file and render the map?

1 Answer
Inevitably, two days after placing the bounty, I have found a solution. This approach reduces my real html file from 22.5 MB to 165 kB. The steps are:

  1. Remove all the <script src="*.js"> tags from the <head> of the html, storing the URLs to be loaded after the data in step 4 (to avoid the issue on load where there is no data to parse).
  2. Find the JS in <script> tags hardcoded into the body of the html and move the code to separate *.js files to be loaded in step 4 (to prevent errors caused by loading them before the scripts which had been loaded in the <head>).
  3. Remove the hardcoded map (and any other htmlwidgets) json data from within the <script type="application/json"> tags and save to separate json files in a ./page_files/data/ folder.
  4. Insert a JS script into the <head> which uses Promises with chained .then() statements to do the following (in this order):
    • Inject the relevant data from each json file back into the html.
    • Dynamically load the scripts that were in the <head>.
    • Dynamically load the scripts that were in the <body>.
    • Render all elements with HTMLWidgets.staticRender().
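Step 3 can be sketched with just the standard library; the full script below uses Beautiful Soup, but the core idea is simply to pull the payload out of each `<script data-for="...">` data block. The class and element names here are illustrative, not part of the actual script:

```python
# Minimal stdlib-only sketch of step 3: collect the JSON payloads of
# htmlwidgets data blocks so they can be written to separate files.
from html.parser import HTMLParser

class DataForExtractor(HTMLParser):
    """Collect the text content of each <script data-for=...> tag."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.payloads = {}  # element id -> raw json text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "data-for" in attrs:
            self._current = attrs["data-for"]

    def handle_data(self, data):
        if self._current is not None:
            self.payloads[self._current] = self.payloads.get(self._current, "") + data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current = None

html = '<script type="application/json" data-for="map1">{"x": 1}</script>'
p = DataForExtractor()
p.feed(html)
print(p.payloads["map1"])  # -> {"x": 1}
```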

I have written a Python script to automate this for any html file. This can be added as a post-render option to a Quarto project YAML, e.g.:

project:
  type: website
  post-render: remove_hardcoded_data.py

This will convert all html files in the folder. Alternatively, if using Quarto outside a project it can be placed in the folder with one or more html files and run from the terminal with ./remove_hardcoded_data.py.
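As a possible refinement (not used in the script below): per Quarto's project-scripts documentation, a post-render script receives the newline-separated list of files Quarto just rendered in the QUARTO_PROJECT_OUTPUT_FILES environment variable, so the script could target only those files instead of globbing every html file in the folder. A sketch:

```python
import os
from pathlib import Path

def rendered_html_files(env=None):
    """html outputs Quarto reports having just rendered.

    QUARTO_PROJECT_OUTPUT_FILES is a newline-separated list that Quarto
    sets for project pre/post-render scripts.
    """
    env = os.environ if env is None else env
    out = env.get("QUARTO_PROJECT_OUTPUT_FILES", "")
    return [Path(p) for p in out.splitlines() if p.endswith(".html")]
```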

Python script

This requires Beautiful Soup 4. It will create a minimal html file for all html files in the folder and append "_min" to the output (e.g. if the input is "./page.html", the output will be "./page_min.html"). Quarto already creates a "./page_files/" folder that needs to be uploaded to the server, which is where the script copies the json data.

Files can be excluded from this script by adding them to the files_to_exclude list in the make_all_html_min() function.

#!/usr/bin/env python3
# coding: utf-8

from bs4 import BeautifulSoup
from pathlib import Path
import re

def load_page(page_path):
    with open(page_path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    return soup

# 1. Remove all script tags but keep their src
def get_script_links(soup):
    script_links = []
    for script in soup.find_all("script"):
        if script.has_attr("src"):
            script_links.append(script.attrs["src"])
            script.decompose()
    return script_links

# 2. Move quarto-html-after-body and any other scripts to files
#    so they're not loaded before htmlwidgets etc. are loaded
def get_body_scripts(soup, page_name):
    body_scripts = []
    for i, script in enumerate(soup.html.body.find_all("script")):
        # don't copy the data scripts here
        if not script.has_attr("data-for"):
            if script.has_attr("id"):
                out_file = f"./{page_name}_files/libs/{script.attrs['id']}.js"
            else:
                out_file = f"./{page_name}_files/libs/body_script_{i}.js"
            with open(out_file, "w", encoding="utf-8") as f:
                f.write(script.get_text())             
            body_scripts.append(out_file)
            script.decompose()
    return body_scripts

# 3. Remove the hardcoded json data and write to file
def remove_json_data(json_tag, page_name):
    Path(f"./{page_name}_files/data/").mkdir(exist_ok=True)
    el_id = json_tag.attrs['data-for']
    with open(f"./{page_name}_files/data/{el_id}.json", "w", encoding="utf-8") as f:
        f.write(json_tag.get_text()) 
    json_tag.string.replace_with("")
    return el_id

# 4. Create the javascript to load the data and scripts
def create_load_data_js(soup, page_name):
  script_links = get_script_links(soup)
  body_scripts = get_body_scripts(soup, page_name)
  json_tags = [script for script in soup.find_all("script") if script.has_attr("data-for")]
  el_ids = [remove_json_data(json_tag, page_name) for json_tag in json_tags]   

  load_function = """
    const loadScript = (file_url, async = true, type = "text/javascript", appendToHead = true) => {
        return new Promise((resolve, reject) => {
            try {
                const scriptEle = document.createElement("script");
                scriptEle.type = type;
                scriptEle.async = async;
                scriptEle.src = file_url;
                scriptEle.addEventListener("load", (ev) => {
                    resolve({ status: true });
                });
                scriptEle.addEventListener("error", (ev) => {
                    reject({
                        status: false,
                        message: `Failed to load the script ${file_url}`
                    });
                });
                appendToHead ? document.head.appendChild(scriptEle) : document.body.appendChild(scriptEle);
            } catch (error) {
                reject(error);
            }
        });
    };
  """

  load_data_first_element = f"""
  fetch("./{page_name}_files/data/{el_ids[0]}.json")
    .then((response) => response.json())
    .then(
      (json) =>
        (document.querySelectorAll('[data-for="{el_ids[0]}"]')[0].innerHTML =
          JSON.stringify(json).replaceAll("/", "/"))
    )
  """

  load_data_all_elements = [f"""
      .then(() => fetch("./{page_name}_files/data/{el_id}.json"))
      .then((response) => response.json())
      .then(
        (json) =>
          (document.querySelectorAll('[data-for="{el_id}"]')[0].innerHTML =
            JSON.stringify(json).replaceAll("/", "/"))
      )
    """ for el_id in el_ids]

  if len(el_ids) > 1:
    load_data_all_elements.pop(0)
    load_data_next_elements = "".join(load_data_all_elements)
  else:
    load_data_next_elements = ""

  then_load_scripts = "\n".join([f'.then(() => loadScript("{script}"))' for script in script_links])
  then_body_scripts = "\n".join([f'.then(() => loadScript("{script}"))' for script in body_scripts])
  then_render_mermaid = ".then(() => window.mermaid && window.mermaid.init())" # mermaid charts will not render otherwise; the guard avoids an error on pages without mermaid
  then_render_html = ".then(() => window.HTMLWidgets.staticRender());"

  script_content = f"""
  {load_function}
  {load_data_first_element}
  {load_data_next_elements}
  {then_load_scripts}
  {then_body_scripts}
  {then_render_mermaid}
  {then_render_html}
  """
  return script_content

def insert_main_js_script(soup, page_name):
    load_data_js = create_load_data_js(soup, page_name)
    s = soup.new_tag("script")
    s.string = load_data_js 
    soup.html.head.append(s)   

def save_new_html(soup, page_name):
    outfile = f"{page_name}_min.html"
    with open(outfile, "w", encoding='utf-8') as file:
        file.write(str(soup))
    print(f"File created: {outfile}")

def create_page_min(page_path):
    soup = load_page(page_path)
    page_name = re.sub("\\.html$", "", page_path.name)
    print(f"Converting {page_path}")
    insert_main_js_script(soup, page_name)
    save_new_html(soup, page_name)

def make_all_html_min(files_to_exclude = ["example_file_to_exclude.html"]):
    # skipping names ending in "min.html" is a quick and dirty shortcut to not apply this script to files it creates
    files_to_make_min = [f for f in Path("./").glob("*.html") if not f.name.endswith("min.html")]
    files_to_make_min = list(set(files_to_make_min) - set([Path(f) for f in files_to_exclude]))
    for page_path in files_to_make_min:
        create_page_min(page_path)

make_all_html_min()
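The file-selection logic at the end (skipping the `*_min.html` outputs and the exclusion list) can be factored into a pure function if you want to test it in isolation. This is a sketch with a hypothetical name, not part of the script above:

```python
from pathlib import Path

def files_to_convert(html_files, files_to_exclude=()):
    """Mirror the selection in make_all_html_min(): skip files whose
    name ends in "min.html" and anything explicitly excluded."""
    excluded = {Path(f) for f in files_to_exclude}
    return sorted(
        f for f in html_files
        if not f.name.endswith("min.html") and f not in excluded
    )

files = [Path("a.html"), Path("a_min.html"), Path("b.html")]
print(files_to_convert(files, files_to_exclude=["b.html"]))  # -> [PosixPath('a.html')]
```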

I arrived at this after continuing to try in the absence of any answers. Once I worked it out, I decided to answer this myself in case anyone else faces this issue. However, I cannot award the bounty to myself, so I remain very open to alternative solutions.