Fix references

maxim-k · maxim-k · commit 59cdcba3dc58 · 2022-02-16T22:53:34.000-05:00
diff --git a/appyters/Bulk_RNA_seq/RNA_seq_Analysis_Pipeline.ipynb b/appyters/Bulk_RNA_seq/RNA_seq_Analysis_Pipeline.ipynb
@@ -479,7 +479,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Load datasets"
+    "# Loaded datasets"
    ]
   },
   {
@@ -630,7 +630,7 @@
    "source": [
     "%%appyter markdown\n",
     "{% if visualization_method.value == \"PCA\" %}\n",
-    "Principal Component Analysis (PCA) (Clark et al. 2011) is a statistical technique used to identify global patterns in high-dimensional datasets. It is commonly used to explore the similarity of biological samples in RNA-seq datasets. To achieve this, gene expression values are transformed into Principal Components (PCs), a set of linearly uncorrelated features which represent the most relevant sources of variance in the data, and subsequently visualized using a scatter plot.\n",
+    "Principal Component Analysis (PCA) [1] is a statistical technique used to identify global patterns in high-dimensional datasets. It is commonly used to explore the similarity of biological samples in RNA-seq datasets. To achieve this, gene expression values are transformed into Principal Components (PCs), a set of linearly uncorrelated features which represent the most relevant sources of variance in the data, and subsequently visualized using a scatter plot.\n",
     "{% endif %}"
    ]
   },
@@ -658,7 +658,6 @@
     "# Display results\n",
     "plot_name = \"{}_plot_of_samples.png\".format(method)\n",
     "figure_counter, notebook_metadata = plot_samples(results[method], meta_class_column_name=meta_class_column_name, counter=figure_counter, plot_name=plot_name, notebook_metadata=notebook_metadata, plot_type=plot_type)\n"
-
    ]
   },
   {
@@ -672,7 +671,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Clustergrammer (Fernandez et al. 2017) is a web-based tool for visualizing and analyzing high-dimensional data as interactive and hierarchically clustered heatmaps. It is commonly used to explore the similarity between samples in an RNA-seq dataset. In addition to identifying clusters of samples, it also allows to identify the genes which contribute to the clustering."
+    "Clustergrammer [2] is a web-based tool for visualizing and analyzing high-dimensional data as interactive and hierarchically clustered heatmaps. It is commonly used to explore the similarity between samples in an RNA-seq dataset. In addition to identifying clusters of samples, it also allows users to identify the genes which contribute to the clustering."
    ]
   },
   {
@@ -720,7 +719,6 @@
     "    fig.show(renderer=\"png\")\n",
     "else:\n",
     "    fig.show()\n",
-
     "plot_name = \"library_size_plot.png\"\n",
     "fig.write_image(plot_name)\n",
     "figure_counter, notebook_metadata = display_object(figure_counter, \"Histogram of the total number of reads mapped for each sample. The figure contains an interactive bar chart which displays the number of samples according to the total number of reads mapped to each RNA-seq sample in the dataset. Additional information for each sample is available by hovering over the bars.\", notebook_metadata, saved_filename=plot_name, istable=False)"
@@ -737,7 +735,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Gene expression signatures are alterations in the patterns of gene expression that occur as a result of cellular perturbations such as drug treatments, gene knock-downs or diseases. They can be quantified using differential gene expression (DGE) methods (Ritchie et al. 2015, Clark et al. 2014), which compare gene expression between two groups of samples to identify genes whose expression is significantly altered in the perturbation. "
+    "Gene expression signatures are alterations in the patterns of gene expression that occur as a result of cellular perturbations such as drug treatments, gene knock-downs or diseases. They can be quantified using differential gene expression (DGE) methods [3, 4], which compare gene expression between two groups of samples to identify genes whose expression is significantly altered in the perturbation. "
    ]
   },
   {
@@ -798,7 +796,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Enrichment analysis is a statistical procedure used to identify biological terms which are over-represented in a given gene set. These include signaling pathways, molecular functions, diseases, and a wide variety of other biological terms obtained by integrating prior knowledge of gene function from multiple resources. Enrichr (Kuleshov et al. 2016) is a web-based application which allows to perform enrichment analysis using a large collection of gene-set libraries and various interactive approaches to display enrichment results."
+    "Enrichment analysis is a statistical procedure used to identify biological terms which are over-represented in a given gene set. These include signaling pathways, molecular functions, diseases, and a wide variety of other biological terms obtained by integrating prior knowledge of gene function from multiple resources. Enrichr [5] is a web-based application which allows users to perform enrichment analysis using a large collection of gene-set libraries and various interactive approaches to display enrichment results."
    ]
   },
   {
@@ -831,7 +829,6 @@
     "    case_label = label.split(\" vs. \")[1]\n",
     "    # Run analysis\n",
     "    results['enrichr'][label] = run_enrichr(signature=signature, signature_label=label, fc_colname=fc_colname,geneset_size=gene_topk, sort_genes_by = sort_genes_by,ascending=ascending)\n",
-
     "    tmp_enrichr_link_dict = dict()\n",
     "    title_up = f\"Enrichment Analysis Result: {label} (up-regulated in {case_label})\"  \n",
     "    title_down = f\"Enrichment Analysis Result: {label} (down-regulated in {case_label})\"  \n",
@@ -842,7 +839,6 @@
     "\n",
     "enrichr_link_df = pd.DataFrame.from_dict(enrichr_link_dict).T\n",
     "table_counter, notebook_metadata = display_object(table_counter, \"The table displays links to Enrichr containing the results of enrichment analyses generated by analyzing the up-regulated and down-regulated genes from a differential expression analysis. By clicking on these links, users can interactively explore and download the enrichment results from the Enrichr website.\", notebook_metadata=notebook_metadata, saved_filename=\"enrichr_links.csv\", df=enrichr_link_df, ishtml=True)"
-
    ]
   },
   {
@@ -925,7 +921,7 @@
     "    for gene_set_library in libraries:\n",
     "        # Display results\n",
     "        plot_name = \"{}_barchart_{}.png\".format(gene_set_library, label)\n",
-    "        plot_library_barchart(enrichment_results, gene_set_library, enrichment_results['signature_label'], enrichment_results['sort_results_by'], nr_genesets=nr_genesets, plot_type=plot_type)\n",
+    "        plot_library_barchart(enrichment_results, gene_set_library, enrichment_results['signature_label'], enrichment_results['sort_results_by'], nr_genesets=nr_genesets, plot_type=plot_type, plot_name=plot_name)\n",
     "        figure_counter, notebook_metadata = display_object(figure_counter, \"Enrichment Analysis Results for {} in {}. The figure contains interactive bar charts displaying the results of the pathway enrichment analysis generated using Enrichr. The x axis indicates the -log10(P-value) for each term. Significant terms are highlighted in bold. Additional information about enrichment results is available by hovering over each bar.\".format(label, gene_set_library), notebook_metadata, saved_filename=plot_name, istable=False)\n",
     "{% endif %}"
    ]
@@ -1124,45 +1120,46 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
+    "1. Clark, N.R. and Ma’ayan, A. (2011) Introduction to statistical methods to analyze large data sets: principal components analysis. Sci. Signal., 4, tr3-tr3.\n",
+    "<br>\n",
+    "2. Fernandez, Nicolas F., et al. \"Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data.\" Scientific data 4 (2017): 170151.\n",
+    "<br>\n",
+    "3. Ritchie, Matthew E., et al. \"limma powers differential expression analyses for RNA-sequencing and microarray studies.\" Nucleic acids research 43.7 (2015): e47-e47.\n",
+    "<br>\n",
+    "4. Clark, Neil R., et al. \"The characteristic direction: a geometrical approach to identify differentially expressed genes.\" BMC bioinformatics 15.1 (2014): 79.\n",
+    "<br>\n",
+    "5. Kuleshov, M.V., Jones, M.R., Rouillard, A.D., Fernandez, N.F., Duan, Q., Wang, Z., Koplev, S., Jenkins, S.L., Jagodnik, K.M. and Lachmann, A. (2016) Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research, 44, W90-W97.\n",
+    "<br>\n",
+    "\n",
     "Agarwal, Vikram, et al. \"Predicting effective microRNA target sites in mammalian mRNAs.\" elife 4 (2015): e05005.\n",
     "<br>\n",
     "Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S. and Eppig, J.T. (2000) Gene Ontology: tool for the unification of biology. Nature genetics, 25, 25.\n",
     "<br>\n",
     "Chou, Chih-Hung, et al. \"miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database.\" Nucleic acids research 44.D1 (2016): D239-D247.\n",
     "<br>\n",
-    "Clark, N.R. and Ma’ayan, A. (2011) Introduction to statistical methods to analyze large data sets: principal components analysis. Sci. Signal., 4, tr3-tr3.\n",
-    "<br>\n",
-    "Clark, Neil R., et al. \"The characteristic direction: a geometrical approach to identify differentially expressed genes.\" BMC bioinformatics 15.1 (2014): 79.\n",
-    "<br>\n",
     "Consortium, E.P. (2004) The ENCODE (ENCyclopedia of DNA elements) project. Science, 306, 636-640.\n",
     "<br>\n",
     "Croft, David, et al. \"The Reactome pathway knowledgebase.\" Nucleic acids research 42.D1 (2014): D472-D477.\n",
     "<br>\n",
     "Duan, Q., et al. \"L1000cds2: Lincs l1000 characteristic direction signatures search engine. NPJ Syst Biol Appl. 2016; 2: 16015.\" (2016).\n",
     "<br>\n",
-    "Fernandez, Nicolas F., et al. \"Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data.\" Scientific data 4 (2017): 170151.\n",
-    "<br>\n",
     "Kanehisa, M. and Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28, 27-30.\n",
     "<br>\n",
     "Kelder, Thomas, et al. \"WikiPathways: building research communities on biological pathways.\" Nucleic acids research 40.D1 (2012): D1301-D1307.\n",
     "<br>\n",
-    "Kuleshov, M.V., Jones, M.R., Rouillard, A.D., Fernandez, N.F., Duan, Q., Wang, Z., Koplev, S., Jenkins, S.L., Jagodnik, K.M. and Lachmann, A. (2016) Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic acids research, 44, W90-W97.\n",
-    "<br>\n",
     "Lachmann, A., Xu, H., Krishnan, J., Berger, S.I., Mazloom, A.R. and Ma'ayan, A. (2010) ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics, 26, 2438-2444.\n",
     "<br>\n",
     "Lachmann, Alexander, and Avi Ma'ayan. \"KEA: kinase enrichment analysis.\" Bioinformatics 25.5 (2009): 684-686.\n",
     "<br>\n",
-    "Ritchie, Matthew E., et al. \"limma powers differential expression analyses for RNA-sequencing and microarray studies.\" Nucleic acids research 43.7 (2015): e47-e47.\n",
-    "<br>\n",
     "Wang, Zichen, et al. \"L1000FWD: fireworks visualization of drug-induced transcriptomic signatures.\" Bioinformatics 34.12 (2018): 2150-2152."
    ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "py39",
+   "display_name": "Python 3",
    "language": "python",
-   "name": "py39"
+   "name": "python3"
   },
   "language_info": {
    "codemirror_mode": {
@@ -1174,9 +1171,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.5"
+   "version": "3.9.2"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
-}
+ "nbformat_minor": 4
+}