Chapter 5 Construction and Content
5.1 Input Spreadsheet
NOTE: This section describes input spreadsheet hosted on Google Drive.
Self-hosted runs use target_list.tsv
located in data
directory.
Each entry obtained from the search results in the interactive online
platform of EZCancerTarget
is referenced and has at least one scientific
piece of evidence. Figure 5.1 shows a flowchart on
the processing steps and workflow of EZCancerTarget
.
First, users of
EZCancerTarget
can start their workflow by opening the project’s starting page on
GitHub (https://cycle20.github.io/EZCancerTarget/). Then, users can upload their target
list with three pieces of information into a Google Spreadsheet
(Target INPUT).
The first piece (Fig. 5.2 Column A), asks for the HUGO ID.
In Column B, users can give a “Label” for every target for classification and
clustering useful in later work.
The third piece (Fig. 5.2 column C) is the UniProtKB ID of
the searched gene. Inputs for the HUGO
and Label
columns are limited to
12 characters. On the right side of the spreadsheet (columns E-K
), hyperlinks
provide access to the results page on GitHub, and
in the “Results of Update Request” box, users can check the query’s status.
Hitting the “Start Rendering” button located on columns E-F
starts the query.
The area within E1-K8
are protected and automatically overwritten if edited
(Fig. 5.2).
5.2 Result Page
NOTE: This section describes result page made public on GitHub or
located in OUTPUT
directory of EZCancerTarget in case of self-hosted run
index.target.with.data.html
.
By clicking on the “Result page” link on the target spreadsheet, we can access
the results of our query within approximately 10 minutes. Clicking on
the hyperlink in cell H6
we can follow the progress of the query
(Fig. 5.2, arrow). A new query overwrites the earlier one in
the web application, but every previous version is saved on GitHub under the
“Result of Update Request” link
(https://github.com/cycle20/EZCancerTarget/actions/workflows/clue.yml).
A scrollable panel displays all the targets on the left with at least one valid
drug compound available. The software automatically excludes entries where no
drug or small molecule inhibitor/agonist is available according to the clue.io
repurposing hub.
In the first entry of the results list (“Summary”) the evaluation report on the search is accessible. In the “Overview” section, it displays the total number of found compounds for all listed targets and the average number of compounds per target. The amount of found compounds are classified according to their pre- and clinical phase as well. In the “Molecular Background” section details from the retrieved molecular background data is evaluated according to the number of found Reactome or KEGG pathways - Gillespie et al. (2021), Kanehisa (2002)] -, STRING interactors and GO molecular functions, subcellular localizations, biological processes. Finally (“Compound” section), every listed target is separately detailed of their compound entries in PubMed, PubChem, ChEMBL and DrugBank.
The platform creates a table for every target, where different columns indicate the mechanism of action (MoA), clinical status (preclinical, phase 1, phase 2, phase 3, or launched), and the search resources from PubMed, EMA and the direct entry from clue.io. Furthermore, the query table includes hyperlinks with DrugBank, PubChem, and ChEMBL IDs to quickly access the compounds’ chemical and pharmacological properties (Fig. 5.3).
EZCancerTarget
also gives a comprehensive, highly structured overview of the
selected targets regarding their molecular biology data,
including molecular function (Gene Ontology), their connectome (STRING),
participation in pathways, and cellular localization (Reactome and KEGG)
retrieved from various databases. Hyperlinks to GeneCards and
DrugBank Target Search are also available but differently structured as for
UniProt entries. The “STRING” entry opens a static STRING map for
the target and provides a hyperlink to https://string-db.org/ (Fig. 5.4).
The following entry carries the “Molecular Functions / Subcellular Localizations” title, where the two main hyperlinks (source) lead to UniProt’s “Function” and “Subcellular Localization” pages. Molecular function entries and target localizations are also provided as text separately, where hyperlinks lead to the QuickGO platform to obtain further information about relevant compartment-specific molecular pathways (Fig. 5.4). The last entry named “Pathways” provides links to every Reactome database, where the target’s participation is visualized in every relevant metabolic pathway (Fig. 5.4). The hyperlink to the entry of the KEGG database is also displayed here, without being broken down to individual links to pathways. Supplementary Video 1 shows a short tutorial about the functionality of the program and the main steps to generate a query.
References
Gillespie, Marc, Bijay Jassal, Ralf Stephan, Marija Milacic, Karen Rothfels, Andrea Senff-Ribeiro, Johannes Griss, et al. 2021. “The reactome pathway knowledgebase 2022.” Nucleic Acids Research 50 (D1): D687–D692. https://doi.org/10.1093/nar/gkab1028.
Kanehisa, M. 2002. “The KEGG Databases at GenomeNet.” Nucleic Acids Research 30 (1): 42–46. https://doi.org/10.1093/nar/30.1.42.