Run external program inside a conda environment in R

I am trying to run stitchr in R. For programs that run in Python, I use reticulate. I create a conda environment named r-reticulate, where I want to install stitchr and run it.

I tried the following:

if (!('r-reticulate' %in% reticulate::conda_list()[,1])){
  reticulate::conda_create(envname = 'r-reticulate', packages = 'python=3.10')
}
reticulate::use_condaenv('r-reticulate')
reticulate::py_install("stitchr", pip = TRUE)

system("stitchr -h") # this does not work

Unsurprisingly, the system() call fails with the message "error in running command".

What would be the right way to do this?

I had success in the past with anndata, for example. But anndata has an R wrapper package, so I can just do:

reticulate::use_condaenv('r-reticulate')
reticulate::py_install("anndata", pip = TRUE)

data_h5ad <- anndata::read_h5ad("file.h5ad")

How can I approach the stitchr case?

EDIT:

So I retrieved stitchr.py location during the package installation: /usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py

I tried all the following but nothing works (see error messages):

pyloc="/usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py"
reticulate::source_python(pyloc)

Error in py_run_file_impl(file, local, convert) : ImportError: attempted relative import with no known parent package Run reticulate::py_last_error() for details.

reticulate::py_run_file(pyloc)

Error in py_run_file_impl(file, local, convert) : ImportError: attempted relative import with no known parent package Run reticulate::py_last_error() for details.

reticulate::py_run_string(paste(pyloc, "-h"))

Error in py_run_string_impl(code, local, convert) : File "", line 1 /usr/local/Caskroom/miniconda/base/envs/r-reticulate/lib/python3.10/site-packages/Stitchr/stitchr.py -h SyntaxError: invalid syntax Run reticulate::py_last_error() for details.

I am absolutely clueless on how to proceed here.

Answer by phili_b (best answer)

Here is an approach that may do what you want.

shell:

conda create --name=testenv python
# or conda create --name=testenv python==3.10.13 if you want a specific version for jupyter for example
conda activate testenv
# check which pip is active:
which pip

~/anaconda3/envs/testenv/bin/pip

Shell, stitchr part (taken from the stitchr documentation):

pip install stitchr IMGTgeneDL

stitchrdl
stitchr -v TRBV7-3*01 -j TRBJ1-1*01 -cdr3 CASSYLQAQYTEAFF

This works from the command line.

shell

cd ~
cp ~/anaconda3/envs/testenv/bin/stitchr ~/teststitchr.py
./teststitchr.py -v TRBV7-3*01 -j TRBJ1-1*01 -cdr3 CASSYLQAQYTEAFF

The copied entry-point script also works from the command line.
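
The stitchr command used above is a console script that pip places in the environment's bin directory. A shell that has not activated the environment does not have that directory on its PATH, which is presumably why a bare stitchr call fails from R's system(). As a minimal sketch (assuming a Unix-like conda layout of <env>/bin; find_env_script is a hypothetical helper name, not part of stitchr or reticulate), the script can be located from the environment prefix in Python:

```python
import shutil
from pathlib import Path
from typing import Optional


def find_env_script(env_prefix: str, name: str) -> Optional[str]:
    """Return the full path of an executable inside <env_prefix>/bin, or None.

    Assumes a Unix-like environment layout; Windows environments keep
    console scripts in a Scripts directory instead.
    """
    bin_dir = Path(env_prefix).expanduser() / "bin"
    # Restrict the lookup to the environment's own bin directory.
    return shutil.which(name, path=str(bin_dir))


# Example call (the prefix is the one used in this answer; adjust as needed):
# find_env_script("~/anaconda3/envs/testenv", "stitchr")
```

The returned full path can then be passed to system() or system2() in R instead of the bare command name.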

Create ~/teststitchr2.py with the content from https://jamieheather.github.io/stitchr/importing.html

~/teststitchr2.py:

# import stitchr
from Stitchr import stitchrfunctions as fxn
from Stitchr import stitchr as st

# specify details about the locus to be stitched
chain = 'TRB'
species = 'HUMAN'

# initialise the necessary data
tcr_dat, functionality, partial = fxn.get_imgt_data(chain, st.gene_types, species)
codons = fxn.get_optimal_codons('', species)

# provide details of the rearrangement to be stitched
tcr_bits = {'v': 'TRBV7-3*01', 'j': 'TRBJ1-1*01', 'cdr3': 'CASSYLQAQYTEAFF',
            'l': 'TRBV7-3*01', 'c': 'TRBC1*01',
            'skip_c_checks': False, 'species': species, 'seamless': False,
            '5_prime_seq': '', '3_prime_seq': '', 'name': 'TCR'}

# then run stitchr on that rearrangement
stitched = st.stitch(tcr_bits, tcr_dat, functionality, partial, codons, 3, '')

print(stitched)
# which produces:
# (['TCR', 'TRBV7-3*01', 'TRBJ1-1*01', 'TRBC1*01', 'CASSYLQAQYTEAFF', 'TRBV7-3*01(L)'],
#  'ATGGG snip snip snip snip snip snip TTC',
#  0)

Run it with Python in the shell:

python ./teststitchr2.py

(['TCR', 'TRBV7-3*01', 'TRBJ1-1*01', 'TRBC1*01', 'CASSYLQAQYTEAFF', 'TRBV7-3*01(L)'], 'ATG snip snip snip snip TTC', 0)

In R:

library(reticulate)
reticulate::use_condaenv('testenv')
py_run_file(file.path(path.expand('~'),'teststitchr2.py'))
names(py)

reticulate::py_run_file() populates the variable py: https://rstudio.github.io/reticulate/articles/calling_python.html#executing-code

names(py) lists all the Python functions and variables that are now accessible from R with the py$ prefix:

c("chain", "codons", "functionality", "fxn", "partial", "r", "species", "st", "stitched", "tcr_bits", "tcr_dat")

In R:

print(py$stitched )

It works :)

[[1]]
[1] "TCR"             "TRBV7-3*01"      "TRBJ1-1*01"      "TRBC1*01"       
[5] "CASSYLQAQYTEAFF" "TRBV7-3*01(L)"  

[[2]]
[1] "ATGGGCAC snip snip snip snip "

[[3]]
[1] 0

You can assign myvar = py$stitched to keep the result in an R variable and use it later.

You can also call the Python function directly from R:

tcr_bits2= list(v = "TRBV7-3*01", j = "TRBJ1-1*01", cdr3 = "CASSYLQAQYTEAFF", 
    l = "TRBV7-3*01", c = "TRBC1*01", skip_c_checks = FALSE, 
    species = "HUMAN", seamless = FALSE, `5_prime_seq` = "", 
    `3_prime_seq` = "", name = "TCR")

py$st$stitch(tcr_bits2, py$tcr_dat,py$functionality, py$partial, py$codons, 3, '')
  • 'TCR' 'TRBV7-3*01' 'TRBJ1-1*01' 'TRBC1*01' 'CASSYLQAQYTEAFF' 'TRBV7-3*01(L)'
  • 'ATG snip snip snip snip ATTTC'
  • 0

Note that this call mixes an R variable (tcr_bits2) with objects from the reticulate Python environment (py$). You can assign myvar2 = py$st$stitch(...) to keep the result in an R variable and use it later.

It works again :)

Edit:

And a hacky workaround on the Python side, if you hit an import error: add this before the "from Stitchr import" lines:

import os
os.chdir(os.path.join(os.path.expanduser('~'), 'anaconda3/envs/testenv/lib/python3.12/site-packages'))

But see also "How can I import a module dynamically given the full path?"

The os.chdir() trick is only for testing; try not to use it otherwise.
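
For reference, importing by full path without changing the working directory can be sketched with the standard importlib machinery. This is a minimal sketch, not something taken from the stitchr documentation: load_package_from_path is a hypothetical helper name, and it assumes the installed package directory contains an __init__.py. Loading the whole package (rather than a single file such as stitchr.py) is what lets relative imports inside it resolve.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_package_from_path(name: str, package_dir: str) -> ModuleType:
    """Import the package located at package_dir under the given name."""
    pkg_path = Path(package_dir).expanduser()
    spec = importlib.util.spec_from_file_location(
        name,
        pkg_path / "__init__.py",
        # Marks the module as a package so its submodules can be found.
        submodule_search_locations=[str(pkg_path)],
    )
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {pkg_path}")
    module = importlib.util.module_from_spec(spec)
    # Register the package before executing it so that relative imports
    # inside its submodules can find their parent package.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised package registered.
        sys.modules.pop(name, None)
        raise
    return module
```

After calling this with the Stitchr directory inside site-packages, importlib.import_module("Stitchr.stitchr") should be able to resolve the submodule, under the assumption stated above.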

Answer by merv

Use Conda Run

AFAIK, reticulate can't handle arbitrary execution in the environment - it only handles Python code, plus some wrappers around conda and pip. However, with a system() call one can use conda run directly, using the environment:

## same as in OP, but instead:
system('conda run -n r-reticulate stitchr -h')

Resulting Output:

usage: stitchr [-h] -v V -j J -cdr3 CDR3 [-s SPECIES] [-c C] [-l L] [-aa AA] [-n NAME] [-sl] [-5p 5_PRIME_SEQ] [-3p 3_PRIME_SEQ] [-xg]
               [-p PREFERRED_ALLELES_PATH] [-m MODE] [-cu CODON_USAGE_PATH] [-jt J_WARNING_THRESHOLD] [-sc] [-sw] [--version] [--cite] [-dd]

stiTChR v1.1.2 : Stitch together a coding TCR nucleotide sequence from V, J, and CDR3 info. Use IMGT gene names, and include terminal CDR3
residues (C/F). E.g. 'python stitchr.py -v TRBV20-1 -j TRVJ1-2 -cdr3 CASWHATEVERF'. See https://github.com/JamieHeather/stitchr and
https://doi.org/10.1093/nar/gkac190.

options:
  -h, --help            show this help message and exit
  -v V, --v V           V gene name. Required. Specific allele not required, will default to prototypical (*01)
  -j J, --j J           J gene name. Required. Specific allele not required, will default to prototypical (*01)
  -cdr3 CDR3, --cdr3 CDR3
                        CDR3 amino acid sequence. Required. Must include terminal residues (e.g. C/F)
  -s SPECIES, --species SPECIES
                        Species (common name). Optional: see data directory for all possible options. Default = HUMAN
  -c C, --c C           Constant gene. Optional. Specific allele not required, will default to prototypical (*01). See README re: alternative
                        TRGC exon configurations.
  -l L, --l L           Leader region. Optional. Will default to match the appropriate V gene.
  -aa AA, --aa AA       Partial amino acid sequence, if known. Optional. Can be used to check stitching success.
  -n NAME, --name NAME  Name for TCR sequence. Optional. Will be added to output FASTA header.
  -sl, --seamless       Optional flag to integrate known nucleotide sequences seamlessly. NB: nucleotide sequences covering the CDR3 junction
                        with additional V gene context required.
  -5p 5_PRIME_SEQ, --5_prime_seq 5_PRIME_SEQ
                        Optional sequence to add to the 5' of the output sequence (e.g. a Kozak sequence).
  -3p 3_PRIME_SEQ, --3_prime_seq 3_PRIME_SEQ
                        Optional sequence to add to the 3' out the output sequence (e.g. a stop codon).
  -xg, --extra_genes    Optional flag to use additional (non-database/-natural) sequences in the 'additional-genes.fasta' file.
  -p PREFERRED_ALLELES_PATH, --preferred_alleles_path PREFERRED_ALLELES_PATH
                        Path to a file of preferred alleles to use when no allele specified (instead of *01). Optional.
  -m MODE, --mode MODE  Standard out output mode. Options are 'BOTH_FA' (default), 'AA_FA', 'NT_FA', 'AA', 'NT'.
  -cu CODON_USAGE_PATH, --codon_usage_path CODON_USAGE_PATH
                        Path to a file of Kazusa-formatted codon usage frequencies. Optional.
  -jt J_WARNING_THRESHOLD, --j_warning_threshold J_WARNING_THRESHOLD
                        J gene substring length warning threshold. Default = 3. Decrease to get fewer notes on short J matches.
  -sc, --skip_c_checks  Optional flag to skip usual constant region gene checks.
  -sw, --suppress_warnings
                        Optional flag to suppress warnings.
  --version             Print current stitchr version.
  --cite                Print citation details.
  -dd, --data_dir       Print installed stitchr data directory path.
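
The same conda run call can also be issued from Python, for example from a helper script. The following is a minimal sketch, assuming the conda executable is on PATH; conda_run_command and run_in_env are hypothetical helper names introduced only for illustration.

```python
import subprocess
from typing import List, Sequence


def conda_run_command(env_name: str, args: Sequence[str]) -> List[str]:
    """Build the argument list for: conda run -n <env_name> <args...>"""
    return ["conda", "run", "-n", env_name, *args]


def run_in_env(env_name: str, args: Sequence[str]) -> str:
    """Run a command inside a conda environment and return its stdout.

    Assumes `conda` is on PATH; raises CalledProcessError if the
    command exits with a non-zero status.
    """
    completed = subprocess.run(
        conda_run_command(env_name, args),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout


# Example call, mirroring the system() line above:
# print(run_in_env("r-reticulate", ["stitchr", "-h"]))
```

Passing the arguments as a list means Python starts conda without an intermediate shell, which sidesteps quoting issues on the Python side for arguments such as TRBV7-3*01.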

Additional Suggestion

I would encourage you not to use r-reticulate as your environment name. Others use that, and one should maintain separation of concerns. Create a dedicated environment per task or project.