Setting Up Jupyter Notebook

Introduction

My research work involves using simulations to verify the theoretical calculations I make. These verifications take the form of Monte Carlo simulations for probability values, and graphical plots comparing theoretical and observed functions. In order to keep the data generated through these experiments, as well as the source code, accessible to all, I organise them into Jupyter notebooks ⟨jupyter-nb⟩. Jupyter Notebooks are interactive, browser-based documents backed by an interactive Python shell (an IPython kernel). They store Python code (and Markdown for documentation) in “cells”. The kernel executes these cells, and the results are saved alongside the corresponding cells. These results can be text, tables, images, and more, which makes this technology stack extremely useful for researchers.
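To make the idea of cells and stored results concrete: a notebook is saved as a .ipynb file, which is plain JSON. The following is a minimal sketch, assuming the nbformat 4 layout (a top-level "cells" list whose entries have a "cell_type", with an "outputs" list on code cells), that counts the cells in a notebook and how many code cells carry saved output. The helper names summarise_cells and summarise_notebook are hypothetical; for real work, the official nbformat package is the proper way to read notebooks.

```python
import json
from pathlib import Path


def summarise_cells(notebook: dict) -> dict:
    """Count markdown cells, code cells, and code cells that have saved output.

    Assumes the nbformat 4 layout: a top-level "cells" list whose entries carry
    a "cell_type" key; code cells additionally carry an "outputs" list.
    """
    summary = {"markdown": 0, "code": 0, "code_with_output": 0}
    for cell in notebook.get("cells", []):
        kind = cell.get("cell_type")
        if kind in ("markdown", "code"):
            summary[kind] += 1
        # A code cell's results are stored in the same JSON object as its source.
        if kind == "code" and cell.get("outputs"):
            summary["code_with_output"] += 1
    return summary


def summarise_notebook(path) -> dict:
    """Load a .ipynb file (plain JSON) and summarise its cells."""
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarise_cells(notebook)
```

For example, summarise_notebook("analysis.ipynb") (a hypothetical filename) returns a dictionary of counts, which shows at a glance whether a notebook was saved with its results.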

Prerequisites

This tutorial assumes that we are able to use the Terminal on our machine. If one is a beginner, I suggest these resources:

  1. The Command Line for Complete Beginners by Flavio Copes. ⟨copes-cli⟩
  2. Official Microsoft blog post on the Windows Terminal. ⟨windows-terminal⟩
  3. Command Line Crash Course by freeCodeCamp. ⟨fcc-cli⟩

The next requirement is the ability to install software, either through the CLI using a package manager (such as Scoop ⟨scoop-win⟩ on Windows, Homebrew ⟨homebrew-macos⟩ on macOS, or the distribution's default package manager ⟨linux-cli⟩ on Linux) or from the official websites. We will need the following software:

  1. Git - the version control system that we will use to obtain repositories from GitHub. ⟨git-scm⟩
  2. Python - the language that Jupyter Notebook is built on. It is recommended to use the latest version available on the official website (or in the package manager’s listing). ⟨python-web⟩
  3. pip - the package manager for Python, which comes pre-installed with recent versions of Python. If it does not come out of the box, we can install it through our package manager.

After this, make sure that these programs are on our PATH environment variable; that is, we should be able to run git, python and pip from any directory. (On many macOS and Linux systems the last two are named python3 and pip3.) If we have installed Git and Python correctly, we should not face this problem. In case there are any errors, feel free to contact me.
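Once Python itself runs, a short script can check all three programs at once. This is a minimal sketch built on the standard library's shutil.which, which returns the full path of a command found on PATH, or None. The alternative names python3 and pip3 are an assumption covering systems that only provide those; the names REQUIRED and find_tools are hypothetical.

```python
import shutil

# Candidate command names for each tool. Many macOS/Linux setups only
# provide python3/pip3, while Windows installers usually provide python/pip.
REQUIRED = {
    "git": ["git"],
    "python": ["python", "python3"],
    "pip": ["pip", "pip3"],
}


def find_tools(required=REQUIRED):
    """Map each tool to the first matching executable on PATH, or None."""
    found = {}
    for tool, candidates in required.items():
        found[tool] = None
        for name in candidates:
            path = shutil.which(name)
            if path:
                found[tool] = path
                break
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:>6}: {path or 'NOT FOUND on PATH'}")
```

Any tool reported as NOT FOUND still needs to be installed or added to PATH.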

Obtaining The Repository

The research work that I do contains the data and figures, as well as the source code, organised into Jupyter notebooks. I group all relevant notebooks of the same project into a repository (essentially, a folder) and upload it to GitHub ⟨github⟩ (for the time being). To look at the pre-stored data in these notebooks, or to run the simulations again to generate new tables and graphs, one needs to copy this source code from GitHub to one's local machine. We achieve this using Git.

Suppose the repository we are interested in is root-from-parameters. Follow these steps to “clone” the repository onto our local machine:

Step 1: Navigating to the Desired Directory

Using the command line, we navigate to the folder in which we want to store our work, most likely with the cd command.

Step 2: Cloning the Repository

We type the following command and hit Return⏎:

> git clone https://github.com/hungrybluedev/root-from-parameters.git

Note that this URL is the address of the repository's GitHub page with .git added at the end. We can also obtain the correct link by navigating to the GitHub page of the repository, clicking on the Code button, then on HTTPS, and copying the link shown.
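The relationship between the page address and the clone URL is simple enough to express as a tiny helper. This is a sketch that assumes a plain https://github.com/&lt;owner&gt;/&lt;repo&gt; page address; clone_url is a hypothetical name, and copying the link from the Code button remains the reliable route.

```python
def clone_url(page_url: str) -> str:
    """Turn a GitHub repository page URL into its HTTPS clone URL.

    Assumes the input is a plain https://github.com/<owner>/<repo> address.
    """
    url = page_url.strip().rstrip("/")
    # The clone URL is the page URL with ".git" appended (unless already present).
    return url if url.endswith(".git") else url + ".git"


print(clone_url("https://github.com/hungrybluedev/root-from-parameters"))
# https://github.com/hungrybluedev/root-from-parameters.git
```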

Step 3: Moving into the Root of the Repository

We then cd into the newly cloned folder, which git names after the repository:

> cd root-from-parameters

We are now in the root directory of the repository.
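A quick way to confirm that we are at the root of the clone is to look for the .git entry that git creates there. This is a minimal sketch; is_repository_root is a hypothetical helper, and it accepts .git as either a folder (an ordinary clone) or a file (as used by worktrees and submodules).

```python
from pathlib import Path


def is_repository_root(directory=".") -> bool:
    """Return True if `directory` looks like the root of a git working tree.

    An ordinary clone keeps its history in a .git folder at the root; worktrees
    and submodules use a .git file instead, so either form is accepted here.
    """
    return (Path(directory) / ".git").exists()
```

Called with no argument from inside root-from-parameters, it should return True; from any subfolder it returns False.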

Creating a Virtual Environment

I highly recommend creating a virtual environment for Python projects. I learned the importance of isolation and proper dependency management the hard way. Often, older packages work flawlessly until Python or some of their dependencies get updated, thereby breaking everything. It is also not prudent to install many packages globally on our system: this paves the way for the common “but it works on my machine” problem, and hours of debugging and dependency tracking ensue. Virtual environments avoid much of this by giving each project its own isolated set of packages.

Step 1: Installing virtualenv

First, we need to make sure that we have virtualenv installed. Type the following command into the terminal and hit Return⏎:

> pip install virtualenv

This should be the only global Python package we ever need to install. We will isolate all other packages for every project from now on. For any other repositories that we clone, we do not need to repeat this step.

Step 2: Creating a Virtual Environment for our Repository

Now we create a virtual environment specifically for the project we have just cloned. Here we call it statsenv; virtualenv creates a folder with that name inside the current directory:

> virtualenv statsenv
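As an alternative that needs no global install, Python 3.3 and later ship the venv module in the standard library; from the terminal, python -m venv statsenv creates a comparable environment. The same can be done from Python, as in this sketch. create_env is a hypothetical helper; with_pip=True bootstraps pip into the new environment via ensurepip.

```python
import venv
from pathlib import Path


def create_env(env_dir="statsenv", with_pip=True) -> Path:
    """Create a virtual environment with the standard-library venv module.

    Roughly what `python -m venv statsenv` does from the terminal.
    """
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)
```

Either way, the result is a statsenv folder containing a pyvenv.cfg file and its own interpreter, which is then activated as described next.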

Step 3: Activating the Virtual Environment

This step is important. If we skip it, the packages in the next step will be installed globally rather than in our virtual environment.

For Windows (PowerShell):

> ./statsenv/Scripts/activate

In the classic Command Prompt (cmd.exe), run statsenv\Scripts\activate.bat instead.

For MacOS/Linux:

> . statsenv/bin/activate

The terminal prompt usually changes as well, typically by prefixing the environment name, e.g. (statsenv), to indicate that we have entered the virtual environment.

Step 4: Downloading the Required Packages

This step might vary from project to project. However, there is a high chance these specific packages will be used over and over:

> pip install scipy numpy jupyter pandas seaborn statsmodels

These packages will be installed in our virtual environment and this will leave other projects undisturbed.
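To confirm that the installation worked, and to see which versions ended up in the environment, a sketch using importlib.metadata (standard library, Python 3.8+) can look up each package by its distribution name. It should be run with the virtual environment active, so that it inspects that environment rather than the global one. PACKAGES and installed_versions are hypothetical names; the list simply mirrors the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# The packages installed in Step 4; adjust this list for each project.
PACKAGES = ["scipy", "numpy", "jupyter", "pandas", "seaborn", "statsmodels"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<12} {ver or 'not installed'}")
```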

Using the Notebooks

Now, we’re all set up; we can proceed to view the notebooks and run the simulations ourselves if necessary.

Step 0: Make Sure Virtual Environment Is Active

First, we make sure that we are indeed inside the virtual environment. If not, refer to Step 3: Activating the Virtual Environment.
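Besides looking at the prompt, we can ask Python itself. This sketch relies on the fact that, inside a venv-style environment, sys.prefix points at the environment folder while sys.base_prefix points at the base installation; the sys.real_prefix check is an assumption meant to cover environments created by old (pre-20) virtualenv releases. Saved as, say, check_env.py (a hypothetical name) and run with python check_env.py, it should report True once statsenv is activated.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter is running inside a virtual environment."""
    # venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
    # Set by the activate scripts; None if no environment was activated in this shell.
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
```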

Step 1: Starting the Jupyter Notebook Server

From the root directory, enter the command:

> jupyter notebook

This starts a local Jupyter server, and a page listing the contents of the directory opens in the default browser. The terminal running the server needs to stay open while we work.

Step 2: Getting Around

Clicking on any notebook in the list opens its contents in a new tab. We can read the notebook as is, or run its cells to regenerate the results. The menus at the top (in particular, the Help menu) provide more information and a list of keyboard shortcuts.
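To try running a cell, we could paste something like the following into an empty code cell and press Shift+Enter. This is a hypothetical example, not taken from root-from-parameters: a small Monte Carlo estimate, in the spirit of the simulations mentioned in the introduction, of the probability that two fair dice sum to 7, compared against the exact value 1/6. The function name and the fixed seed are illustrative choices.

```python
import random


def estimate_probability(trials=100_000, seed=0):
    """Monte Carlo estimate of P(two fair dice sum to 7); the exact value is 1/6."""
    rng = random.Random(seed)  # seeded so re-running the cell gives the same estimate
    hits = sum(
        1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7
    )
    return hits / trials


estimate = estimate_probability()
print(f"simulated: {estimate:.4f}  theoretical: {1 / 6:.4f}")
```

The printed line appears directly below the cell and is saved with the notebook, which is exactly how the stored results in the research notebooks are kept.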

A rudimentary knowledge of Python is also recommended. I suggest CrashPy ⟨crashpy⟩ by Sourav Sen Gupta for learning the basics; the README of that repository lists several other resources as well.

Conclusion

In this tutorial, we set up a robust, isolated environment to clone repositories containing Jupyter Notebooks, and we are now able to reproduce experiments on our local machines. I hope this tutorial was simple enough to follow. In case there are any errors, please contact me and I will get them corrected.

Resources

  1. Jupyter - The project that maintains Jupyter Notebook, among other useful Python-based open-source tools. ⟨jupyter-nb⟩
  2. The Command Line for Complete Beginners - An overview of what a CLI is, and how to get started. ⟨copes-cli⟩
  3. Getting started with Windows Terminal - An official blog post from Microsoft to help people get started with the Windows Terminal. ⟨windows-terminal⟩
  4. Command Line Crash Course - A talk by freeCodeCamp introducing the UNIX-based CLI. Recommended for beginners. ⟨fcc-cli⟩
  5. Scoop - A popular “package manager” for Windows, started in 2013. ⟨scoop-win⟩
  6. Homebrew - Advertised as the missing package manager for macOS. ⟨homebrew-macos⟩
  7. Installing Software on Linux - An in-depth tutorial demonstrating how to install software through package managers. ⟨linux-cli⟩
  8. Git - The most popular version control system/source code management software used on the internet. ⟨git-scm⟩
  9. Python - An easy to learn and use programming language that is widely used in academia. It is also the basis for Jupyter. ⟨python-web⟩
  10. My GitHub Profile - Here one can find all the repositories I have created in which I store Jupyter Notebooks. ⟨github⟩
  11. CrashPy - A Crash Course in Python by Sourav Sen Gupta. ⟨crashpy⟩