This is the data that I need:
I already imported the table into R:
library(tidyverse)
library(rvest)
webpage <- read_html("https://www.lpi.usra.edu/meteor/metbull.php?sea=%2A&sfor=names&ants=&nwas=&falls=&valids=&stype=contains&lrec=200&map=ge&browse=&country=All&srt=name&categ=Ungrouped+achondrites&mblist=All&rect=&phot=&strewn=&snew=0&pnt=Normal%20table&dr=&page=0")
tbls <- html_nodes(webpage, "table")
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  .[5] %>%
  html_table(fill = TRUE)
# as.tibble() is deprecated; as_tibble() is the current name
data <- as_tibble(tbls_ls[[1]])
Yet, I need to add one more thing to the table. For some meteorites, there are oxygen isotope values available. One can see this when clicking on the name of the meteorite under the section "plots". When clicking on the plot, we get redirected to a page where we have the three isotope values. What I want to do is to add three columns to my table, containing the respective isotope values for each meteorite. I tried writing code for each "plot" section separately, but I feel like there could be a much more elegant solution for this.
You could grab the table without isotopes, then mimic the POST request the page makes if you decide to go with isotopes, and left-join the two on the Name column. You will get more rows back than were in the left table (no isotopes) because there are multiple Change values, but this matches what you see in the method of viewing isotopes you describe, where there are comma-separated lists of values against isotopes, within plots, rather than split out by rows.
I go for a more selective CSS selector to target the specific table of interest initially, rather than indexing into lists.
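As a rough illustration of those two ideas (a targeted CSS selector and a left join on Name), here is a minimal sketch on toy data. The HTML, the `maintable` id, and the isotope column names `d17O`/`d18O` are made up for the example and are not taken from the real page; the real selector and the real POST response will differ.

```r
library(rvest)
library(dplyr)

# Toy page standing in for the real one; ids and values are invented.
html <- '
<html><body>
  <table id="layout"><tr><td>navigation</td></tr></table>
  <table id="maintable">
    <tr><th>Name</th><th>Year</th></tr>
    <tr><td>Alpha 001</td><td>2001</td></tr>
    <tr><td>Beta 002</td><td>2005</td></tr>
  </table>
</body></html>'

page <- read_html(html)

# Select the table by a CSS selector instead of a positional index like .[5]
main_table <- page %>%
  html_node("table#maintable") %>%
  html_table()

# Hypothetical isotope data: one meteorite has two measurements, one has none
isotopes <- tibble(
  Name = c("Alpha 001", "Alpha 001"),
  d17O = c(1.1, 1.3),
  d18O = c(3.2, 3.6)
)

# Left join keeps every row of main_table and repeats rows with several matches
joint_table <- left_join(main_table, isotopes, by = "Name")

nrow(main_table)   # 2
nrow(joint_table)  # 3
```

The meteorite with two isotope measurements appears twice in the joined table, and the one without measurements is kept with NA in the isotope columns.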
I use write_excel_csv to preserve the character encoding of headers on write out (an idea I got from @stefan).
You can remove columns you don't want in the output from joint_table before writing out (subset/select etc.).
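A minimal sketch of that write-out step, assuming a small stand-in for joint_table. The column names, including the hypothetical `Notes` column being dropped, are invented for illustration. `write_excel_csv()` from readr writes a UTF-8 byte order mark at the start of the file, which is what lets Excel detect the encoding of non-ASCII headers.

```r
library(dplyr)
library(readr)

# Stand-in for joint_table with one non-ASCII header (Greek delta)
joint_table <- tibble(
  Name = c("Alpha 001", "Beta 002"),
  d18O = c(3.2, NA),
  Notes = c("x", "y")
)
names(joint_table)[2] <- paste0("\u03b4", "18O")

out_file <- tempfile(fileext = ".csv")

# Drop unwanted columns, then write with a UTF-8 byte order mark
joint_table %>%
  select(-Notes) %>%
  write_excel_csv(out_file)

# The first three bytes of the file are the BOM: ef bb bf
readBin(out_file, "raw", n = 3)
```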
Example output:
Edit:
Adding in the additional information that comes from other URLs, as per your request in the comments. I had to dynamically determine which table number to pick up, as well as handle cases where no table is present.
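One way such a lookup could work is sketched below. It is not the code from this answer: `find_table()` is a hypothetical helper, and the toy pages and the `d17O` header are invented. The idea is to scan every table on a page for an expected column name instead of hard-coding a table index, and to return NULL when no table matches so that pages without data are skipped.

```r
library(rvest)
library(dplyr)

# Hypothetical helper: return the first table whose header contains `col`,
# or NULL when the page has no such table.
find_table <- function(page, col) {
  tables <- html_nodes(page, "table")
  for (i in seq_along(tables)) {
    tbl <- html_table(tables[[i]])
    if (col %in% names(tbl)) {
      return(as_tibble(tbl))
    }
  }
  NULL
}

# Toy pages keyed by meteorite name; the second has no isotope table
pages <- list(
  "Alpha 001" = '<html><body>
    <table><tr><td>nav</td></tr></table>
    <table><tr><th>d17O</th><th>d18O</th></tr>
           <tr><td>1.1</td><td>3.2</td></tr></table>
  </body></html>',
  "Beta 002" = '<html><body>
    <table><tr><td>nav</td></tr></table>
  </body></html>'
)

results <- lapply(names(pages), function(nm) {
  tbl <- find_table(read_html(pages[[nm]]), "d17O")
  if (is.null(tbl)) return(NULL)  # no matching table on this page
  tbl$Name <- nm
  tbl
})

# Drop the pages that had no table, then stack the rest
isotopes <- bind_rows(Filter(Negate(is.null), results))
isotopes
```

The resulting tibble has one row per isotope record found and can be left-joined onto the main table by Name in the same way as before.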
Created on 2021-02-27 by the reprex package (v0.3.0)
N.B.
The OP had problems with the lookups variable for some reason, so here is an alternative I wrote that worked for them: