Environment Setup

If you want to start using Python for data-science work quickly, we recommend installing Anaconda directly and writing code in Notebook or JupyterLab, both of which come bundled with Anaconda. For beginners, installing the official Python interpreter first and then installing every third-party library the work needs one by one is rather troublesome. On Windows in particular, installations often fail because of missing build tools or DLL files, and beginners usually find it hard to work out the right fix from the error message, which quickly becomes frustrating. If a Python interpreter is already installed on the computer, you can instead use Python's package-management tool pip to install Jupyter and then install third-party libraries as your work requires. This approach suits users who already have some experience.

Installing and Using Anaconda

Individual users can download the "Individual Edition" installer from the official Anaconda website. After installation, your computer will have a Python environment and Spyder, an integrated development environment similar to PyCharm. It will also have nearly 200 packages for data-science work, including the three great tools of Python data analysis mentioned earlier (NumPy, pandas, and Matplotlib). Anaconda also provides a package-management tool named conda, which can manage Python packages and create virtual environments for running Python programs.

As shown in the picture above, the Anaconda official website provides download links for each operating system. Choose the installer that matches yours; the graphical installer is recommended. After the download finishes, double-click the installer to start the installation. The default settings are usually fine. Once installation is complete, macOS users can find an application named Anaconda-Navigator in Applications or Launchpad. Running it opens an interface like the one below, where you can choose the action you want to perform.

Windows users can simply follow the installation wizard and accept its suggested options. Apart from the installation path, there is essentially nothing to choose. After installation, you can find Anaconda3 in the Start Menu.

Tip: Miniconda is an alternative to Anaconda. It installs only the Python interpreter and a few essential tools, and users install other third-party libraries themselves when they need them. Personally, I am not a big fan of Anaconda, because it is aimed at beginners. Once we have a Python environment, we can install exactly the third-party libraries we need.
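Whichever route you take (Anaconda, Miniconda, or a plain Python install plus pip), it helps to confirm which interpreter you are actually running and which of the three libraries it can import. The sketch below does this with the standard library only. Two parts of it are assumptions. Treating a conda-meta folder under sys.prefix as the sign of a conda environment is a heuristic. describe_environment is a hypothetical helper name and is not part of any tool.

```python
import importlib.util
import os
import sys


def describe_environment(packages=("numpy", "pandas", "matplotlib")):
    """Report which interpreter is running and which packages it can import."""
    # Heuristic (assumption): conda environments, including Anaconda's base
    # environment, contain a conda-meta folder under their prefix.
    is_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # find_spec returns None when a top-level module cannot be found.
    available = {name: importlib.util.find_spec(name) is not None for name in packages}
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "conda": is_conda,
        "packages": available,
    }


if __name__ == "__main__":
    info = describe_environment()
    print(f"Python {info['python']} at {info['executable']}")
    print("Conda environment:", "yes" if info["conda"] else "no")
    for name, ok in info["packages"].items():
        print(f"  {name:<12}{'installed' if ok else 'missing'}")
```

Run it with the Python you intend to use, for example from Anaconda Prompt on Windows. A package reported as missing can then be installed with conda install or pip install.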

conda commands

Non-beginners who want to use conda to manage project dependencies or create virtual environments can run conda commands in a terminal or command prompt. On Windows, find Anaconda3 in the Start Menu and click Anaconda Prompt or Anaconda PowerShell to open a command prompt that supports conda. Beginners who want to create virtual environments or manage third-party libraries (dependencies) are better off using Environments in Anaconda-Navigator, which manages virtual environments and dependencies visually.

  1. Version and help information.

    • Check version: conda -V or conda --version
    • Get help: conda -h or conda --help
    • Show information about the current conda installation: conda info
  2. Virtual-environment related commands.

    • Show all virtual environments: conda env list
    • Create a virtual environment: conda create --name venv
    • Create a virtual environment with a specified Python version: conda create --name venv python=3.7
    • Create a virtual environment with a specified Python version and install specified dependencies: conda create --name venv python=3.7 numpy pandas
    • Create a virtual environment by cloning an existing virtual environment: conda create --name venv2 --clone venv
    • Export the active virtual environment to a file for sharing: conda env export > environment.yml
    • Create a virtual environment from a shared virtual environment file: conda env create -f environment.yml
    • Activate a virtual environment: conda activate venv
    • Leave a virtual environment: conda deactivate
    • Delete a virtual environment: conda remove --name venv --all

    Note: In the commands above, venv and venv2 are the names of the virtual environments (and of their folders). You can replace them with any name you like, but it is strongly recommended to use English letters and avoid special characters.

  3. Package management (packages being third-party libraries or tools).

    • Show installed packages: conda list
    • Search for a specified package: conda search matplotlib
    • Install a specified package: conda install matplotlib
    • Update a specified package: conda update matplotlib
    • Remove a specified package: conda remove matplotlib

    Note: By default, searching, installing, and updating packages connects to the official Anaconda servers. If downloads are too slow, you can replace the default source with a mirror in China; the Tsinghua University open-source mirror is recommended. The commands for switching to the mirror are conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ and conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main. To switch back to the default source, use the command conda config --remove-key channels.
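conda activate sets several environment variables in the shell, among them CONDA_DEFAULT_ENV (the environment name) and CONDA_PREFIX (its folder). A Python program started from that shell can therefore tell which conda environment it belongs to. The sketch below relies on those two variables. active_conda_env is a hypothetical helper. It takes an optional mapping so it can be tried without conda installed.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None.

    Assumption: CONDA_DEFAULT_ENV and CONDA_PREFIX are set by `conda activate`
    and are normally absent when no conda environment is active.
    """
    env = os.environ if environ is None else environ
    name = env.get("CONDA_DEFAULT_ENV")
    prefix = env.get("CONDA_PREFIX")
    if not name and not prefix:
        return None
    return name, prefix


if __name__ == "__main__":
    current = active_conda_env()
    if current is None:
        print("No conda environment is active.")
    else:
        name, prefix = current
        print(f"Active conda environment: {name} ({prefix})")
```

For example, active_conda_env({"CONDA_DEFAULT_ENV": "venv", "CONDA_PREFIX": "/opt/anaconda3/envs/venv"}) returns ("venv", "/opt/anaconda3/envs/venv"). The path there is only a hypothetical illustration.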

Installing and Using JupyterLab

Installation and startup

If Anaconda is installed, you can start Notebook or JupyterLab directly from Anaconda-Navigator, as mentioned above. According to the official description, JupyterLab is the next generation of Notebook, with a friendlier interface and more powerful features, so we recommend JupyterLab. Windows users can also open Anaconda Prompt or Anaconda PowerShell from the Start Menu. Because Anaconda's default (base) virtual environment is already activated there, entering the command jupyter lab starts JupyterLab. On macOS, Anaconda's default virtual environment is activated automatically every time you open a terminal after installation, so entering jupyter lab starts JupyterLab there as well.

Users who have Python installed but not Anaconda can install JupyterLab with Python's package-management tool pip. Once it is installed, run the command jupyter lab in a terminal or command prompt to start JupyterLab, as shown below.

Install JupyterLab:

pip install jupyterlab

Install the three great tools of Python data analysis:

pip install numpy pandas matplotlib

Start JupyterLab:

jupyter lab
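After running the pip commands above, you can check from Python which versions actually landed in the current interpreter. This sketch uses importlib.metadata from the standard library (Python 3.8 or later). installed_versions is a hypothetical helper, and the names passed to it are pip distribution names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions=("jupyterlab", "numpy", "pandas", "matplotlib")):
    """Map each pip distribution name to its installed version, or None if absent."""
    result = {}
    for dist in distributions:
        try:
            result[dist] = version(dist)
        except PackageNotFoundError:
            # Not installed in the interpreter running this script.
            result[dist] = None
    return result


if __name__ == "__main__":
    for dist, ver in installed_versions().items():
        print(f"{dist:<12}{ver or 'not installed'}")
```

Sometimes a package shows as not installed even though pip reported success. pip then probably belongs to a different interpreter. Running python -m pip install ... ties the installation to the python you invoke.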

JupyterLab is a web-based application for interactive computing that covers code development, document writing, code execution, and result presentation. Simply put, you write and run code on a web page, and the output of each code cell appears directly below it. If you need explanatory text alongside the code, you can write it on the same page in Markdown and see the rendered result immediately. In addition, Notebook was originally designed as a working environment for many programming languages. It currently supports more than 40 programming languages, including Python, R, Julia, and Scala.

First, we can create a Notebook for writing Python code, as shown below.

Next, we can write code, write documents, and run programs, as shown below.
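Behind the page, a notebook is stored as an .ipynb file. This is a JSON document whose cells are either Markdown or code. The sketch below builds a minimal notebook by hand to show that structure. It assumes the nbformat 4.5 layout, in which each cell carries an id. make_notebook is a hypothetical helper. In practice, JupyterLab (or the nbformat package) writes these files for you.

```python
import json
import uuid


def make_notebook(markdown_text, code_text):
    """Build a minimal nbformat 4.5 notebook with one Markdown cell and one code cell."""
    def cell_id():
        # nbformat 4.5 expects each cell to carry a short unique id.
        return uuid.uuid4().hex[:8]

    return {
        "cells": [
            {"cell_type": "markdown", "id": cell_id(), "metadata": {},
             "source": markdown_text},
            # A code cell that has never been run: no execution count, no outputs.
            {"cell_type": "code", "id": cell_id(), "metadata": {},
             "execution_count": None, "outputs": [], "source": code_text},
        ],
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3",
                           "language": "python"},
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }


if __name__ == "__main__":
    nb = make_notebook("# Environment check\nA first notebook.", "print(1 + 1)")
    print(json.dumps(nb, indent=1))
```

You can save the printed JSON as, say, hello.ipynb and open it in JupyterLab. It should then show the rendered Markdown cell above an unexecuted code cell.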

Usage tips

If you use Python for engineering-style project development, PyCharm is definitely the best choice. It provides everything an integrated development environment should have, and features such as smart hints, code completion, and automatic error correction make development very comfortable. For data-science work, however, JupyterLab is no worse than PyCharm and is even better at presenting data and charts. This is why JetBrains developed a new tool, DataSpell, to compete with JupyterLab; interested readers can explore it on their own. Below are some tips for using JupyterLab that we hope will help you work more efficiently.

  1. Auto-completion. When writing code in JupyterLab, pressing the Tab key gives code hints and completion.

  2. Getting help. To learn about an object such as a variable, class, or function, or how to use it, append ? to the object and run the cell. The matching information then appears at the bottom of the window to help you understand the object, as shown below.

  3. Searching names. If you only remember part of the name of a class or function, you can use the wildcard * together with ? to search, as shown below.

  4. Running commands. In JupyterLab, you can run system commands by putting ! before the system command.

  5. Magic commands. JupyterLab offers many interesting and useful magic commands. For example, %timeit measures the running time of a statement, and %pwd shows the current working directory. %lsmagic lists all magic commands, and %magic explains how to use them, as shown below.

    Common magic commands are shown below.

    Magic command | Description
    %pwd | Show the current working directory
    %ls | List the contents of the current or specified folder
    %cat | Show the contents of the specified file
    %hist | Show input history
    %matplotlib inline | Embed matplotlib charts in the page
    %config InlineBackend.figure_format = 'svg' | Render charts in SVG format
    %run | Run the specified program
    %load | Load the specified file into a cell
    %quickref | Show the IPython quick reference
    %timeit | Run code many times and report its execution time
    %prun | Run code under cProfile and show the profiler output
    %who / %whos | Show the variables in the namespace
    %xdel | Delete an object and clear all references to it
  6. Shortcuts. Many JupyterLab operations have keyboard shortcuts, and using them improves efficiency. JupyterLab shortcuts fall into two groups: command-mode shortcuts and edit-mode shortcuts. Edit mode is the mode in which you are typing code or text in a cell. Press Esc to return from edit mode to command mode, and press Enter in command mode to enter edit mode.

    Shortcuts in command mode:

    Shortcut | Description
    Alt + Enter | Run the current cell and insert a new cell below
    Shift + Enter | Run the current cell and select the cell below
    Ctrl + Enter | Run the current cell
    j / k, Shift + j / Shift + k | Select the cell below / above; extend the selection to the cell below / above
    a / b | Insert a new cell above / below
    c / x | Copy / cut the selected cell
    v / Shift + v | Paste a cell below / above
    dd / z | Delete a cell / undo the deletion
    Shift + l | Show or hide line numbers for the current / all cells
    Space / Shift + Space | Scroll the page down / up

    Shortcuts in edit mode:

    Shortcut | Description
    Shift + Tab | Show hint information
    Ctrl + ] / Ctrl + [ | Increase / decrease indentation
    Alt + Enter | Run the current cell and insert a new cell below
    Shift + Enter | Run the current cell and select the cell below
    Ctrl + Enter | Run the current cell
    Ctrl + Left / Right | Move the cursor to the start / end of the line
    Ctrl + Up / Down | Move the cursor to the start / end of the code
    Up / Down | Move the cursor up / down one line, or to the previous / next cell

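    Note: On macOS, use the Option key in place of Alt and the Command key in place of Ctrl. The cursor-movement entries (Ctrl + Left / Right, Ctrl + Up / Down) describe this macOS behavior. On Windows and Linux, Ctrl + Left / Right typically moves the cursor by word. There, Home / End move to the start / end of the line, and Ctrl + Home / Ctrl + End move to the start / end of the cell.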
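Several of the tips above have plain-Python counterparts that also work outside JupyterLab. Seeing them can make it clearer what ?, the * wildcard, ! and %timeit are doing. The sketch below approximates them with the inspect, fnmatch, subprocess, and timeit modules from the standard library. The helper names are hypothetical. IPython's own implementations show more detail; %timeit, for example, repeats the measurement and reports the spread.

```python
import fnmatch
import inspect
import math
import subprocess
import sys
import timeit


def get_help(obj):
    """Roughly what `obj?` shows: the name, signature (if any), and docstring."""
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        signature = ""  # some built-ins expose no signature
    name = getattr(obj, "__name__", repr(obj))
    return f"{name}{signature}\n{inspect.getdoc(obj) or ''}"


def search_names(module, pattern):
    """Roughly what `module.pattern?` does, e.g. `math.*sin*?` (case-sensitive)."""
    return sorted(n for n in dir(module) if fnmatch.fnmatchcase(n, pattern))


def run_command(args):
    """Roughly what `!command` does: run a system command and return its output."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


def time_statement(stmt, number=10_000):
    """A bare-bones %timeit: average seconds per execution of `stmt`."""
    return timeit.timeit(stmt, number=number) / number


if __name__ == "__main__":
    print(get_help(fnmatch.fnmatch))
    print(search_names(math, "*sin*"))
    print(run_command([sys.executable, "--version"]))
    print(f"{time_statement('sum(range(100))') * 1e6:.2f} µs per loop")
```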