Merging two datasets from different indexation plataforms (same variables with different headers)

Question

Merging two datasets from different indexation plataforms (same variables with different headers)

59 Views Asked by Valério Souza Neto At 20 December 2023 at 04:33

It might be silly, but I am having trouble merging two datasets (in a new file); I want to have the structure of Excel A (which I am calling BehaviouralEconomicsTourism.xlsx) with the information from the variables (same variables with different spelling from scopus.csv). So, I want them to appear as new rows, adding the information from these variables in common. Does someone have any suggestions? I tried to follow the following code (I tried some other variations before but did not save). But it did not work for hte purpose intended

# Install and load necessary packages if not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("readxl", quietly = TRUE)) install.packages("readxl")
if (!requireNamespace("writexl", quietly = TRUE)) install.packages("writexl")
if (!requireNamespace("janitor", quietly = TRUE)) install.packages("janitor")

library(tidyverse)
library(readxl)
library(writexl)
library(janitor)

# Load datasets from Sheet2 in BehaviouralEconomicsTourism.xlsx
behavioral_economics_df <- read_excel("/Users/s5320381/Downloads/BehaviouralEconomicsTourism.xlsx", sheet = "Sheet2")
scopus_df <- read.csv("/Users/s5320381/Downloads/scopus.csv")

# Clean column names
scopus_df <- janitor::clean_names(scopus_df)
behavioral_economics_df <- janitor::clean_names(behavioral_economics_df)

# Mapping columns between the two datasets
column_mapping <- c(
  "authors" = "authors",
  "author_full_names" = "author_full_names",
  "article_title" = "title",
  "publication_year" = "year",
  "volume" = "volume",
  "issue" = "issue",
  "publisher" = "publisher",
  "source_title" = "source_title",
  "author_keywords" = "author_keywords",
  "abstract" = "abstract",
  "issn" = "issn",
  "isbn" = "isbn",
  "start_page" = "page_start",
  "end_page" = "page_end",
  "doi" = "doi",
  "language" = "language_of_original_document",
  "doi_link" = "link",
  "affiliations" = "affiliations"
)

# Convert relevant columns to character in scopus_df
scopus_df <- scopus_df %>%
  mutate(across(c(volume, issue, start_page, end_page), as.character))

# Convert relevant columns to double in behavioral_economics_df
behavioral_economics_df <- behavioral_economics_df %>%
  mutate(across(c(volume, issue, start_page, end_page), as.double))

# Merge datasets based on common columns
merged_df <- bind_rows(behavioral_economics_df, scopus_df %>% select(column_mapping, references))

# Save the merged dataset to a new Excel file
writexl::write_xlsx(merged_df, "/Users/s5320381/Downloads/test201223.xlsx")

I tried to merge two datasets with same variables that possess different header for these variables.

Original Q&A

There are 1 best solutions below

**Jannn** · Answer 1 · 2023-12-20T06:25:12.880000

Assuming you have the following dataframes (given you didn't share a piece of code):

  behavioral_economics_df = tibble(article_title = letters[1:3],
                                   publication_year = seq(1990, 1992, 1),
                                   start_page = seq(1, 3, 1),
                                   end_page = seq(4, 6, 1),
                                   language = letters[4:6],
                                   doi_link = letters[7:9])
  
  
  scopus_df = tibble(title = letters[10:12],
                     year = seq(1993, 1995, 1),
                     page_start = seq(10, 12, 1),
                     page_end = seq(13, 15, 1),
                     language_of_original_document = letters[13:15],
                     link = letters[16:18],
                     other_column = c("A", 1, NA))

Then you can use dplyr::select to select and rename the variables you need as follows:

 merge_df = bind_rows(behavioral_economics_df,
                       scopus_df %>% 
                         select(article_title = title, 
                                publication_year = year,
                                start_page = page_start,
                                end_page = page_end,
                                language = language_of_original_document,
                                doi_link = link))

Note that for bind_rows to work you need that variables in both dataframes have same names and types so I believe the error might be related to this detail.

Merging two datasets from different indexation plataforms (same variables with different headers)

There are 1 best solutions below

Related Questions in R

Related Questions in DPLYR

Related Questions in JANITOR

Trending Questions

Popular # Hahtags

Popular Questions