New to Python?

What is Python?

Python is a general-purpose programming language that is used in many settings, from website development to robotics. For our purposes, its most relevant uses are data analysis and machine learning.

While many researchers use R, a different language, for data analysis, Python also has important strengths in data analytics, especially in image analysis and natural language processing. Often, data analysis in Python takes place in a special document format called a "notebook". If you've heard of "IPython" notebooks or "Jupyter" notebooks or labs, that's the format we're talking about. You may have also seen a notebook format in Google Colaboratory (or Colab) notebooks. These notebooks allow your code to be interspersed with formatted text that is intended to communicate with other humans, not with a computer. In Arcus, we offer the JupyterLab environment to give users notebook functionality.
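As a rough illustration of the kind of code a single notebook cell might hold, here is a minimal sketch that summarizes a handful of measurements using only Python's standard library. The readings and variable names are invented for this example and do not come from any Arcus dataset.

```python
import statistics

# Hypothetical heart-rate readings (beats per minute), invented for this example
heart_rates = [72, 85, 90, 66, 78]

# Summarize the readings with two common descriptive statistics
mean_rate = statistics.mean(heart_rates)
median_rate = statistics.median(heart_rates)

print(f"Mean: {mean_rate:.1f} bpm, median: {median_rate} bpm")
```

In a notebook, a cell like this would typically sit between blocks of formatted text that explain what the numbers mean and why they were calculated.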


Additionally, Python is widely used in machine learning, a computational method for developing models that can classify data and make predictions on new data. Machine learning models are often developed in notebooks, which make trial and error and human intervention easy. Once a model is successful, it is scaled up for production use in an automated form that doesn't use notebooks but rather plain Python code, for speed and efficiency.
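To make "classify data and make predictions on new data" a little more concrete, here is a minimal sketch of one very simple approach, a nearest-centroid classifier, written with only the standard library. The function names (`fit_centroids`, `predict`) and the temperature data are hypothetical and chosen for illustration; real projects usually rely on a dedicated machine learning library and far more data.

```python
from statistics import mean


def fit_centroids(values, labels):
    """Learn one average value (a centroid) per label from the training data."""
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(label, []).append(value)
    return {label: mean(group) for label, group in grouped.items()}


def predict(centroids, value):
    """Assign a new value to the label whose centroid is closest to it."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical training data: body temperatures in degrees Celsius
temperatures = [36.5, 36.8, 37.0, 38.6, 39.1, 38.9]
labels = ["no fever", "no fever", "no fever", "fever", "fever", "fever"]

# "Training" here just means computing the average temperature for each label
centroids = fit_centroids(temperatures, labels)

# Predict the label for a temperature the model has not seen before
print(predict(centroids, 38.4))
```

The same two steps, fitting a model on labeled examples and then predicting labels for new data, appear in most machine learning workflows, whether they run in a notebook or in production code.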

What Makes Python Popular?

Like R, Python is free and open source, and promotes research reproducibility.

Arcus-Specific Python Training

Arcus On-Ramp

If you're already an Arcus user (you've signed our Terms of Use and completed CITI training), you can sign up for our Arcus On-Ramp webinars. In these webinars, you work in a real Arcus lab analyzing CHOP's electronic health record (EHR) to replicate an actual published study. Workshops focus either on exploring the data and defining a query for your study using SQL, or running the analysis in R/Python. No coding experience is required to attend. Registration closes one week before each workshop so we have time to add registered attendees as users in the webinar training lab. To sign up, please visit https://arcus.chop.edu/education/webinar-signup/. This link is only available for Arcus customers on the CHOP network.

Lab Training Videos


For an example of how to use Python / JupyterLab in your Arcus lab, start with the training videos on your lab's landing page.

These videos are very introductory, but they help you understand specifically how to work with your Arcus lab.

We strongly encourage you to watch all of the videos, in order, even the ones that don't refer to Python specifically. It's only about an hour of your time, and we think it will answer many of your questions and save time in the long run.

Additional Resources

Arcus training is a great place to get started with your Python education, but you will probably want to continue your education on your own, growing in skills that are specific to your own research goals or career needs.

You have several options for building your Python skills.

There are a number of university classes, online courses and live workshops that go in depth about how to use Python. Simply search for courses at the university or MOOC (e.g. Coursera) you prefer to use.

When looking for a course in Python, it's important to search specifically for phrases like "Python for data analysis" or "Python notebook". Python is a broad language with many kinds of training courses associated, and there are many courses out there that won't make sense for your use case. You don't want to waste hours or days learning the kind of Python that's best for building desktop applications, for example, if what you really want to learn is how to write the kind of Python that's used for data analysis.

If you prefer something a bit more "just in time", however, we suggest the Python modules from the DART (Data and Analytics for Research Training) program.

DART includes dozens of data science modules. Each takes 1 hour or less and has a narrow focus and clear learning objectives. They are asynchronous, so you can take them at any time!

Arcus Education's DART modules are the result of a study funded by an NIH grant aimed at educating biomedical researchers. The active research phase of this program is complete, so we are no longer recruiting learners to be our subjects. However, if you'd like to receive updates about publications or applications of this research, please email us at dart@chop.edu.

Training modules:

To begin learning Python, there are a couple of options with regard to the DART self-guided tutorial modules.

If you want a comprehensive curriculum of nearly twenty modules, you might enjoy our Suggested Pathway 5: Analysis in Python curriculum, which includes overview materials about reproducible research and data organization, introductory material in Python, and some advanced topics you'll need as a biomedical researcher. While you're there, check out the other suggested pathways, too!


Here is a sneak preview of Suggested Pathway 5: Analysis in Python:


| Order | Module | Description | Estimated Time |
| --- | --- | --- | --- |
| 1 | Reproducibility, Generalizability, and Reuse | This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable. | 60 min |
| 2 | How to Troubleshoot | Learning to use technical methods like coding and version control in your research inevitably means running into problems. Learn practical methods for troubleshooting and moving past error codes and other difficulties. | 30 min |
| 3 | Learning to Learn Data Science | Discover how learning data science is different than learning other subjects. | 20 min |
| 4 | Demystifying Python | This module introduces the Python programming language, explores why Python is useful in research, and describes how to download Python and Jupyter. | 20 min |
| 5 | Directories and File Paths | In this module, learners will explore what a directory is and how to describe the location of a file using its file path. | 15 min |
| 6 | Python Basics: Functions, Methods, and Variables | Learn the foundations of writing Python code, including the use of functions, methods, and variables. | 20 min |
| 7 | Python Basics: Lists and Dictionaries | Learn about collection objects, specifically lists and dictionaries, in Python. | 15 min |
| 8 | Python Basics: Loops and Conditionals | Learn how to use loops and conditional statements in Python. | 20 min |
| 9 | Python Basics: Exercise | Practice the skills acquired in the Python Basics sequence by working through an exercise. | 30 min |
| 10 | Transform Data with pandas | This is an introduction to transforming data using a Python library named pandas. | 60 min |
| 11 | Tidy Data | Tidy is a technical term in data analysis and describes an optimal way for organizing data that will be analyzed computationally. | 45 min |
| 12 | Data Visualization in Open Source Software | Introduction to principles of data visualization and typical data visualization workflows using two common open source libraries: ggplot2 and seaborn. | 20 min |
| 13 | Data Visualization in seaborn | This module includes code and explanations for several popular data visualizations using Python's seaborn library. It also includes examples of how to modify seaborn plots to customize them for different uses. | 60 min |
| 14 | Introduction to Null Hypothesis Significance Testing | This is an introduction to NHST for biomedical researchers. | 40 min |
| 15 | Statistical Tests in Open Source Software | This module provides an overview of the most commonly used kinds of statistical tests and links to code for running many of them in both R and Python. | 20 min |
| 16 | Python Practice | Use the basics of Python coding, data transformation, and data visualization to work with real data. | 60 min |
| 17 | Demystifying Machine Learning | An approachable and practical introduction to machine learning for biomedical researchers. | 60 min |
| 18 | Understanding the Bias-Variance Tradeoff | The bias-variance tradeoff is a central issue in nearly all machine learning analyses. This module explains what the tradeoff is, why it matters for machine learning, and what you can do to manage it in your own analyses. | 20 min |




If these pathways are close, but not quite right, you can also build your own pathway through these materials using our prototype curriculum development tool at https://learn.arcus.chop.edu.

If you're in a hurry and just want a bit of specific Python instruction, we recommend starting with these modules:

Beyond the NIH-funded DART modules, we also suggest other articles and miscellany, both resources we've created in Arcus and materials we recommend from the larger Python community.

Compendia of Resources:

  • Our "Python 101" Guide includes links to articles, webinars, and other materials on a variety of topics.