11---
2- category : Research
32id: chembl-database
43name: ChEMBL Database
5- description : Guidance and answers for chembl database.
4+ description: Guidance for retrieving compound information, bioactivity data, and identifier mapping in ChEMBL.
5+ category: Research
6+ requires: []
7+ examples:
8+ - Search for bioactivity data of Geldanamycin in the ChEMBL database.
9+ - Map this KEGG compound ID to its corresponding ChEMBL ID.
610---
711
812# BioServices
@@ -24,334 +28,31 @@ This skill should be used when:
2428- Mining genomic data (BioMart, ArrayExpress, ENA)
2529- Integrating data from multiple bioinformatics resources in a single workflow
2630
27- ## Core Capabilities
28-
29- ### 1. Protein Analysis
30-
31- Retrieve protein information, sequences, and functional annotations:
32-
33- ``` python
34- from bioservices import UniProt
35-
36- u = UniProt(verbose = False )
37-
38- # Search for protein by name
39- results = u.search(" ZAP70_HUMAN" , frmt = " tab" , columns = " id,genes,organism" )
40-
41- # Retrieve FASTA sequence
42- sequence = u.retrieve(" P43403" , " fasta" )
43-
44- # Map identifiers between databases
45- kegg_ids = u.mapping(fr = " UniProtKB_AC-ID" , to = " KEGG" , query = " P43403" )
46- ```
47-
48- ** Key methods:**
49- - ` search() ` : Query UniProt with flexible search terms
50- - ` retrieve() ` : Get protein entries in various formats (FASTA, XML, tab)
51- - ` mapping() ` : Convert identifiers between databases
52-
53- Reference: ` references/services_reference.md ` for complete UniProt API details.
54-
55- ### 2. Pathway Discovery and Analysis
56-
57- Access KEGG pathway information for genes and organisms:
58-
59- ``` python
60- from bioservices import KEGG
61-
62- k = KEGG()
63- k.organism = " hsa" # Set to human
64-
65- # Search for organisms
66- k.lookfor_organism(" droso" ) # Find Drosophila species
67-
68- # Find pathways by name
69- k.lookfor_pathway(" B cell" ) # Returns matching pathway IDs
70-
71- # Get pathways containing specific genes
72- pathways = k.get_pathway_by_gene(" 7535" , " hsa" ) # ZAP70 gene
73-
74- # Retrieve and parse pathway data
75- data = k.get(" hsa04660" )
76- parsed = k.parse(data)
77-
78- # Extract pathway interactions
79- interactions = k.parse_kgml_pathway(" hsa04660" )
80- relations = interactions[' relations' ] # Protein-protein interactions
81-
82- # Convert to Simple Interaction Format
83- sif_data = k.pathway2sif(" hsa04660" )
84- ```
85-
86- ** Key methods:**
87- - ` lookfor_organism() ` , ` lookfor_pathway() ` : Search by name
88- - ` get_pathway_by_gene() ` : Find pathways containing genes
89- - ` parse_kgml_pathway() ` : Extract structured pathway data
90- - ` pathway2sif() ` : Get protein interaction networks
91-
92- Reference: ` references/workflow_patterns.md ` for complete pathway analysis workflows.
93-
94- ### 3. Compound Database Searches
95-
96- Search and cross-reference compounds across multiple databases:
97-
98- ``` python
99- from bioservices import KEGG , UniChem
100-
101- k = KEGG()
102-
103- # Search compounds by name
104- results = k.find(" compound" , " Geldanamycin" ) # Returns cpd:C11222
105-
106- # Get compound information with database links
107- compound_info = k.get(" cpd:C11222" ) # Includes ChEBI links
108-
109- # Cross-reference KEGG → ChEMBL using UniChem
110- u = UniChem()
111- chembl_id = u.get_compound_id_from_kegg(" C11222" ) # Returns CHEMBL278315
112- ```
113-
114- ** Common workflow:**
115- 1 . Search compound by name in KEGG
116- 2 . Extract KEGG compound ID
117- 3 . Use UniChem for KEGG → ChEMBL mapping
118- 4 . ChEBI IDs are often provided in KEGG entries
119-
120- Reference: ` references/identifier_mapping.md ` for complete cross-database mapping guide.
121-
122- ### 4. Sequence Analysis
123-
124- Run BLAST searches and sequence alignments:
125-
126- ``` python
127- from bioservices import NCBIblast
128-
129- s = NCBIblast(verbose = False )
130-
131- # Run BLASTP against UniProtKB
132- jobid = s.run(
133- program = " blastp" ,
134- sequence = protein_sequence,
135- stype = " protein" ,
136- database = " uniprotkb" ,
137- email = " your.email@example.com" # Required by NCBI
138- )
139-
140- # Check job status and retrieve results
141- s.getStatus(jobid)
142- results = s.getResult(jobid, " out" )
143- ```
144-
145- ** Note:** BLAST jobs are asynchronous. Check status before retrieving results.
146-
147- ### 5. Identifier Mapping
148-
149- Convert identifiers between different biological databases:
150-
151- ``` python
152- from bioservices import UniProt, KEGG
153-
154- # UniProt mapping (many database pairs supported)
155- u = UniProt()
156- results = u.mapping(
157- fr = " UniProtKB_AC-ID" , # Source database
158- to = " KEGG" , # Target database
159- query = " P43403" # Identifier(s) to convert
160- )
161-
162- # KEGG gene ID → UniProt
163- kegg_to_uniprot = u.mapping(fr = " KEGG" , to = " UniProtKB_AC-ID" , query = " hsa:7535" )
164-
165- # For compounds, use UniChem
166- from bioservices import UniChem
167- u = UniChem()
168- chembl_from_kegg = u.get_compound_id_from_kegg(" C11222" )
169- ```
170-
171- ** Supported mappings (UniProt):**
172- - UniProtKB ↔ KEGG
173- - UniProtKB ↔ Ensembl
174- - UniProtKB ↔ PDB
175- - UniProtKB ↔ RefSeq
176- - And many more (see ` references/identifier_mapping.md ` )
177-
178- ### 6. Gene Ontology Queries
179-
180- Access GO terms and annotations:
181-
182- ``` python
183- from bioservices import QuickGO
184-
185- g = QuickGO(verbose = False )
186-
187- # Retrieve GO term information
188- term_info = g.Term(" GO:0003824" , frmt = " obo" )
189-
190- # Search annotations
191- annotations = g.Annotation(protein = " P43403" , format = " tsv" )
192- ```
193-
194- ### 7. Protein-Protein Interactions
195-
196- Query interaction databases via PSICQUIC:
197-
198- ``` python
199- from bioservices import PSICQUIC
200-
201- s = PSICQUIC(verbose = False )
202-
203- # Query specific database (e.g., MINT)
204- interactions = s.query(" mint" , " ZAP70 AND species:9606" )
205-
206- # List available interaction databases
207- databases = s.activeDBs
208- ```
209-
210- ** Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others.
211-
212- ## Multi-Service Integration Workflows
213-
214- BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:
215-
216- ### Complete Protein Analysis Pipeline
217-
218- Execute a full protein characterization workflow:
219-
220- ``` bash
221- python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com
222- ```
223-
224- This script demonstrates:
225- 1 . UniProt search for protein entry
226- 2 . FASTA sequence retrieval
227- 3 . BLAST similarity search
228- 4 . KEGG pathway discovery
229- 5 . PSICQUIC interaction mapping
230-
231- ### Pathway Network Analysis
232-
233- Analyze all pathways for an organism:
234-
235- ``` bash
236- python scripts/pathway_analysis.py hsa output_directory/
237- ```
238-
239- Extracts and analyzes:
240- - All pathway IDs for organism
241- - Protein-protein interactions per pathway
242- - Interaction type distributions
243- - Exports to CSV/SIF formats
244-
245- ### Cross-Database Compound Search
246-
247- Map compound identifiers across databases:
248-
249- ``` bash
250- python scripts/compound_cross_reference.py Geldanamycin
251- ```
252-
253- Retrieves:
254- - KEGG compound ID
255- - ChEBI identifier
256- - ChEMBL identifier
257- - Basic compound properties
258-
259- ### Batch Identifier Conversion
260-
261- Convert multiple identifiers at once:
262-
263- ``` bash
264- python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
265- ```
266-
267- ## Best Practices
268-
269- ### Output Format Handling
270-
271- Different services return data in various formats:
272- - ** XML** : Parse using BeautifulSoup (most SOAP services)
273- - ** Tab-separated (TSV)** : Pandas DataFrames for tabular data
274- - ** Dictionary/JSON** : Direct Python manipulation
275- - ** FASTA** : BioPython integration for sequence analysis
276-
277- ### Rate Limiting and Verbosity
278-
279- Control API request behavior:
280-
281- ``` python
282- from bioservices import KEGG
283-
284- k = KEGG(verbose = False ) # Suppress HTTP request details
285- k.TIMEOUT = 30 # Adjust timeout for slow connections
286- ```
287-
288- ### Error Handling
289-
290- Wrap service calls in try-except blocks:
291-
292- ``` python
293- try :
294- results = u.search(" ambiguous_query" )
295- if results:
296- # Process results
297- pass
298- except Exception as e:
299- print (f " Search failed: { e} " )
300- ```
301-
302- ### Organism Codes
303-
304- Use standard organism abbreviations:
305- - ` hsa ` : Homo sapiens (human)
306- - ` mmu ` : Mus musculus (mouse)
307- - ` dme ` : Drosophila melanogaster
308- - ` sce ` : Saccharomyces cerevisiae (yeast)
309-
310- List all organisms: ` k.list("organism") ` or ` k.organismIds `
311-
312- ### Integration with Other Tools
313-
314- BioServices works well with:
315- - ** BioPython** : Sequence analysis on retrieved FASTA data
316- - ** Pandas** : Tabular data manipulation
317- - ** PyMOL** : 3D structure visualization (retrieve PDB IDs)
318- - ** NetworkX** : Network analysis of pathway interactions
319- - ** Galaxy** : Custom tool wrappers for workflow platforms
320-
321- ## Resources
322-
323- ### scripts/
324-
325- Executable Python scripts demonstrating complete workflows:
326-
327- - ` protein_analysis_workflow.py ` : End-to-end protein characterization
328- - ` pathway_analysis.py ` : KEGG pathway discovery and network extraction
329- - ` compound_cross_reference.py ` : Multi-database compound searching
330- - ` batch_id_converter.py ` : Bulk identifier mapping utility
331-
332- Scripts can be executed directly or adapted for specific use cases.
333-
334- ### references/
335-
336- Detailed documentation loaded as needed:
337-
338- - ` services_reference.md ` : Comprehensive list of all 40+ services with methods
339- - ` workflow_patterns.md ` : Detailed multi-step analysis workflows
340- - ` identifier_mapping.md ` : Complete guide to cross-database ID conversion
341-
342- Load references when working with specific services or complex integration tasks.
343-
344- ## Installation
345-
346- ``` bash
347- uv pip install bioservices
348- ```
349-
350- Dependencies are automatically managed. Package is tested on Python 3.9-3.12.
351-
352- ## Additional Information
353-
354- For detailed API documentation and advanced features, refer to:
355- - Official documentation: https://bioservices.readthedocs.io/
356- - Source code: https://github.com/cokelaer/bioservices
357- - Service-specific references in ` references/services_reference.md `
31+ ## Instruction
32+ You are a Chemical Informatics and Bioactivity Specialist. When this skill is activated, you must guide the user through the retrieval and cross-referencing of compound data using the following behavioral logic:
33+
34+ 1. **Compound Identification Logic**: Guide the user in searching the ChEMBL repository for bioactive molecules using common names or structural identifiers.
35+ 2. **Bioactivity & Assay Analysis**:
36+    - Instruct the user on how to retrieve quantitative data, such as IC50, Ki, and EC50 values, from specific assays.
37+    - Explain the logic of filtering results by Target Type and Organism to ensure scientific relevance.
38+ 3. **Cross-Database Mapping**:
39+    - Use the logic of UniChem to map identifiers between ChEMBL and other chemical repositories like KEGG, ChEBI, or PubChem.
40+    - Describe the workflow for mapping KEGG compound IDs to ChEMBL IDs to bridge pathway analysis with drug discovery data.
41+ 4. **Data Integration Flow**:
42+    - Guide the user in integrating ChEMBL data with protein repositories like UniProt to link ligands with their target receptors.
43+    - Explain how to handle tabular data (TSV) or structured JSON returns for downstream analysis.
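
The retrieval and filtering steps above can be sketched in Python. This is a minimal illustration and not part of the skill itself: `build_activity_query` and `filter_activities` are hypothetical helpers, the query parameters (`molecule_chembl_id`, `standard_type__in`, `target_organism`) are assumed to match the public ChEMBL REST activity resource, and the `target_type` key is assumed to have been merged into each record from the target resource by the caller. The ChEMBL ID used in the demo is the Geldanamycin ID quoted in the removed BioServices example.

```python
from urllib.parse import urlencode

# Public ChEMBL REST endpoint for activity records, JSON variant.
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def build_activity_query(molecule_chembl_id, standard_types=("IC50", "Ki", "EC50"),
                         organism=None, limit=100):
    """Build a URL asking for activities of one compound (hypothetical helper)."""
    params = {
        "molecule_chembl_id": molecule_chembl_id,
        # Django-style "__in" lookup: match any of the listed activity types.
        "standard_type__in": ",".join(standard_types),
        "limit": limit,
    }
    if organism:
        params["target_organism"] = organism
    return f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}"


def filter_activities(records, target_type=None, organism=None):
    """Keep records matching the requested target type and organism.

    Assumes each record is a dict; "target_type" is assumed to have been
    merged in from the target resource by the caller.
    """
    kept = []
    for rec in records:
        if target_type and rec.get("target_type") != target_type:
            continue
        if organism and rec.get("target_organism") != organism:
            continue
        kept.append(rec)
    return kept


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_activity_query("CHEMBL278315", organism="Homo sapiens"))
```

Building the URL separately from fetching it keeps the query easy to inspect and log before any request is sent; the filter runs on already-parsed JSON records, so it works the same way whichever client retrieved them.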
44+
45+ ## Output
46+ Your response must be structured to provide a professional chemoinformatics report:
47+
48+ ### 1. Compound & Bioactivity Summary
49+ - **Target Compound**: Standard name and ChEMBL identifier.
50+ - **Activity Profile**: A summary of key bioactivity metrics and the associated targets.
51+
52+ ### 2. Implementation Logic (Natural Language)
53+ - **Search Workflow**: Step-by-step guidance on querying compounds and filtering assay results.
54+ - **Mapping Logic**: A natural language description of how to bridge identifiers across databases.
55+
56+ ### 3. Best Practices & Data Interpretation
57+ - **Data Quality Warnings**: Reminders to check the "Confidence Score" of assays.
58+ - **Unit Standardization**: Advice on ensuring concentration units are consistent across data sets.
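
The two reminders above can be combined into one cleaning pass. The sketch below is illustrative only: `to_nanomolar` and `standardize` are hypothetical helpers, the record keys (`standard_value`, `standard_units`, `confidence_score`) are assumed to follow ChEMBL field naming, the unit table covers only simple molar units, and the default threshold of 8 is an arbitrary example rather than a recommended cutoff.

```python
# Multipliers that convert common molar units to nanomolar (nM).
TO_NANOMOLAR = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def to_nanomolar(value, units):
    """Convert a concentration to nM; raise ValueError for unknown units."""
    if units not in TO_NANOMOLAR:
        raise ValueError(f"Cannot convert units {units!r} to nM")
    return float(value) * TO_NANOMOLAR[units]


def standardize(records, min_confidence=8):
    """Add a 'value_nM' field; drop low-confidence or unconvertible records."""
    cleaned = []
    for rec in records:
        # Records without a score are treated as score 0 and dropped.
        if rec.get("confidence_score", 0) < min_confidence:
            continue
        try:
            value_nm = to_nanomolar(rec["standard_value"], rec["standard_units"])
        except (KeyError, TypeError, ValueError):
            # Missing fields, null values, or unsupported units are skipped.
            continue
        cleaned.append({**rec, "value_nM": value_nm})
    return cleaned
```

Dropping unconvertible rows silently is a choice made for brevity; a stricter report might instead collect them and list them under the data quality warnings.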