library(readxl)
Data <- read_excel("20240214.xlsx")
library(tidyverse)
Data_frame <- separate(Data, col = "FUN", into = c("Bacteria", "Protein"), remove = TRUE, sep = ";" )
View(Data_frame)
Bacteria Protein Log2quantity Sample 1 Log2quantity Sample 2 Log2quantity Sample 3 ...
clostridium ABC transporter 15.2 5.2 2.1
clostridium Kinase1 8.2 1.2 8.2
bacillus ABC transporter 5.5 8.8 24.2
bacillus Oxidoreductase 3.2 10.2 12.2
bacillus Kinase1 2.1 1.2 42.2
firmicutes Kinase1 9.9 9.2 22.2
... ... ... ... ...
I have this protein data that shows (1) The protein name (2) the Bacteria name (deduced from the protein) (3) The amount of protein.
I would like to summarize the proportions of protein and bacteria per each sample. After, since there are too many different names for bacteria and proteins, I want to see the short data table containing only the top 20 most abundant in both bacteria and proteins per sample.
For instance, it would look like -
clostridium= 15%, bacillus = 5%, firmicutes= 2% ... in sample 1
clostridium= 2%, bacillus = 15%, firmicutes= 42% ...in sample 2
clostridium= 12%, bacillus = 11%, firmicutes= 6% ...in sample 3
ABC transporter= 30%, Kinase1= 15%, Oxidoreductase= 3% ...in sample 1
ABC transporter= 10%, Kinase1= 11%, Oxidoreductase= 12% ...in sample 2
ABC transporter= 20%, Kinase1= 55%, Oxidoreductase= 21% ...in sample 3
How should I do it in R and export the data in an Excel file? Because I expect the summary data will be huge with the long list. Therefore, I would like to view it more readable in Excel.
I'm a beginner in R and just know a few visualization methods with ggplot2. It seems like it could work with dplyr, but I'm not sure how should I start with it. Especially, I would like to know if it is even possible to export relatively huge data in an Excel file with R.