I have a data set of enzyme sequences and a target variable to predict.
My approach is to convert the sequences into SMILES and then derive numerical inputs for machine learning models.
Problem is: rdkit fails to transform some of the sequences but not all of them. In this case the transformation was stopped for index = 5 which corresponds to the following sequence: 'PQITLWQRPIVTIKIGGQLIEALLDTGADDTVLEXXNLPGRWKPKXIGGIGGFXKVRQYDQVPIEIXGHKTXSTVLVGPTPVNIIGRNLMTQIGCTLNFPISPIETVPVKLKPGMDGPKXKQWPLTEEKIKALMEICKELEEEGKISKIGPENPYNTPVFAIKKKNSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKRKKSVTVLDVGDAYFSIPLDKDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYVDDLYVGSDLEIEQHRTKIKELRQYLWKWGFYTPDXKHQEEPPFHWXGYELHPDKWTVQPIVLPEKESWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRG'

Problem transforming a SEQUENCE into SMILES with RDKit
Asked by Triki Sadok
There is 1 answer below.
It looks like the issue is that your sequence contains X. This is not an amino acid code but a placeholder for an unknown or atypical amino acid, and it seems that RDKit cannot process it.
When we removed the Xs, RDKit parsed the sequence correctly. I am not saying that merely removing them is the correct solution, just highlighting the issue. There is probably a much better method for handling these cases.
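To check this on your own data, here is a minimal sketch that flags residues outside the 20 standard one-letter codes before handing a sequence to RDKit. The helper names (`find_nonstandard`, `strip_nonstandard`, `sequence_to_smiles`) are hypothetical and not part of RDKit. The sketch assumes that `Chem.MolFromSequence` accepts only the standard codes and returns `None` when parsing fails. The example is just the first part of the sequence from the question.

```python
# Assumption: only the 20 standard one-letter amino acid codes are treated as valid.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_nonstandard(seq):
    """Return (index, character) pairs for residues outside STANDARD_AA."""
    return [(i, aa) for i, aa in enumerate(seq.upper()) if aa not in STANDARD_AA]


def strip_nonstandard(seq):
    """Drop non-standard residues. This only illustrates the issue and changes the sequence."""
    return "".join(aa for aa in seq.upper() if aa in STANDARD_AA)


def sequence_to_smiles(seq):
    """Convert a protein sequence to SMILES, or return None if RDKit cannot parse it."""
    from rdkit import Chem  # imported here so the helpers above work without RDKit

    mol = Chem.MolFromSequence(seq)
    return None if mol is None else Chem.MolToSmiles(mol)


# First part of the failing sequence from the question.
example = "PQITLWQRPIVTIKIGGQLIEALLDTGADDTVLEXXNLPGRW"
```

Calling `find_nonstandard(example)` lists the positions of the X placeholders, so you can decide per sequence whether to drop the row, strip the residues with `strip_nonstandard`, or handle them some other way before calling `sequence_to_smiles`.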