My research work involves the use of simulation to verify the theoretical calculations I make. These verifications take the form of Monte Carlo simulations for probability values, and multiple graphical plots for comparison of theoretical and observed functions. In order to keep the data generated through these experiments, as well as the source code, accessible to all, I organise them into Jupyter notebooks 〈jupyter-nb〉. Jupyter Notebooks are interactive Web GUIs that utilise the interactive Python Shell. They store Python code (and Markdown for documentation) in “cells”. These cells can be executed by the backend, and the results are associated with the corresponding cells. These results can be text, tables, images, and more, which makes this technology stack extremely useful for researchers.
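To give a flavour of what such a verification looks like, here is a minimal sketch of a notebook cell. It is an illustrative toy example, not taken from any of my repositories: it estimates the probability that two fair dice sum to 7 and compares the observed value with the theoretical one. The function name estimate_probability and the fixed seed are assumptions made for this illustration.

```python
import random


def estimate_probability(trials, seed=0):
    """Estimate P(sum of two fair dice == 7) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials


theoretical = 6 / 36  # six favourable outcomes out of 36 equally likely ones
observed = estimate_probability(100_000)
print(f"theoretical = {theoretical:.4f}, observed = {observed:.4f}")
```

With enough trials, the observed value should settle close to the theoretical one. The notebooks in a real project apply the same idea to more involved quantities.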
This tutorial assumes that we are able to use the Terminal on our machine. If one is a beginner, I suggest these resources:
- The Command Line for Complete Beginners by Flavio Copes. 〈copes-cli〉
- Official Microsoft blogpost on the Windows Terminal. 〈windows-terminal〉
- Command Line Crash Course by Free Code Camp. 〈fcc-cli〉
The next requirement is the ability to install software either through the CLI using package managers (like Scoop 〈scoop-win〉 for Windows, Homebrew 〈homebrew-macos〉 on macOS, or the default package manager 〈linux-cli〉 available on Linux), or through the official websites. We will need the following software:
- Git - the version control system that we will use to obtain repositories from GitHub. 〈git-scm〉
- Python - the language that Jupyter Notebook is built on. It is recommended to use the latest version available at the time on the official website (or in the package manager’s listing). 〈python-web〉
- Pip - the package manager for Python that comes pre-installed with the latest versions of Python. In case it does not come out of the box, we can find it through our package manager.
After this, make sure that these programs are available on our PATH variable. That is, we should be able to run git, python, and pip from any directory. If we have installed Python correctly, we should not face any problems here. In case there are any errors, feel free to contact me.
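One quick way to confirm this is a small Python script that looks each command up on the PATH. This is only a sketch: the helper name find_on_path is hypothetical, and on some systems the commands are called python3 and pip3 instead, in which case the names passed in should be adjusted.

```python
import shutil


def find_on_path(commands=("git", "python", "pip")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


for name, location in find_on_path().items():
    print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Any command reported as not found either needs to be installed or needs its installation directory added to the PATH variable.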
Obtaining The Repository
The research work that I do contains the data, figures, and source code, all organised into Jupyter notebooks. I group all relevant notebooks of the same project into a repository (essentially, a folder) and upload it to GitHub 〈github〉 (for the time being). One needs to obtain this source code from GitHub to their local machine to be able to look at the pre-stored data in these notebooks, or to run the simulations again to generate new tables and graphs. We achieve this using Git.
Suppose the repository we are interested in is root-from-parameters. Follow these steps to “clone” the repository on our local machine:
Step 1: Navigating to the Desired Directory
We navigate to the folder we want to store our work in. This should be done on the command line, most likely with the cd command.
Step 2: Cloning the Repository
We type the following command and hit Return⏎ :
> git clone https://github.com/hungrybluedev/root-from-parameters.git
Note that the URL is almost the same as that of the repository's GitHub page. There is just an added .git at the end. We can also obtain the correct link by navigating to the GitHub page of the repository, clicking on the Code button, then on HTTPS, and copying the link generated.
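The relationship between the two URLs can be sketched in a couple of lines of Python. The helper name clone_url is hypothetical and exists only for this illustration; it assumes the HTTPS form of the link.

```python
def clone_url(page_url):
    """Turn a GitHub repository page URL into its HTTPS clone URL."""
    url = page_url.rstrip("/")  # ignore a trailing slash, if any
    return url if url.endswith(".git") else url + ".git"


print(clone_url("https://github.com/hungrybluedev/root-from-parameters"))
```

The printed link is the one used in the git clone command above.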
Step 3: Moving into the Root of the Repository
cd into this newly downloaded folder. We are now in the root directory of the repository.
Creating a Virtual Environment
I highly recommend creating a virtual environment for Python projects. I learned the importance of isolation and proper dependency management the hard way. Often, older packages work flawlessly until Python or a few other dependency packages get updated, thereby breaking everything. Also, it is probably not prudent to have several packages installed globally on our system; it paves the way for the common “but it works on my machine” problem and hours of debugging and dependency tracking ensue. Virtual environments help remove some of these difficulties.
Step 1: Installing
First, we need to make sure that we have virtualenv installed. Type the following command into the terminal and hit Return⏎ :
> pip install virtualenv
This should be the only global Python package we ever need to install. We will isolate all other packages for every project from now on. For any other repositories that we clone, we do not need to repeat this step.
Step 2: Creating a Virtual Environment for our Repository
Now we create a virtual environment specifically for the project we have just cloned:
> virtualenv statsenv
Step 3: Activating the Virtual Environment
This step is important. If we skip this, the dependencies will be installed globally and not in our virtual environment.
> . statsenv/bin/activate
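There might be a visual change to the terminal prompt, suggesting that we have entered a virtual environment. The command above is for Linux and macOS shells; on Windows, the activation script is located at statsenv\Scripts\activate instead.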
Step 4: Downloading the Required Packages
This step might vary from project to project. However, there is a high chance these specific packages will be used over and over:
> pip install scipy numpy jupyter pandas seaborn statsmodels
These packages will be installed in our virtual environment and this will leave other projects undisturbed.
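To confirm that the installation worked, we can ask Python whether each package can be found in the current environment. This is a minimal sketch: the helper name missing_packages is hypothetical, and it assumes that each package's import name matches the name used with pip, which may not hold for other packages.

```python
import importlib.util

# Assumption: these import names are the same as the pip package names.
REQUIRED = ("scipy", "numpy", "jupyter", "pandas", "seaborn", "statsmodels")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

If any package is listed as missing, the most likely cause is that the virtual environment was not active when pip install was run.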
Using the Notebooks
Now, we’re all set up; we can proceed to view the notebooks and run the simulations ourselves if necessary.
Step 0: Make Sure Virtual Environment Is Active
First, we make sure that we are indeed inside the virtual environment. If not, refer to Step 3: Activating the Virtual Environment.
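If the terminal prompt gives no visual hint, we can also ask the Python interpreter itself. The following is a sketch under the assumption that the environment was created with virtualenv or the built-in venv module; the function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """Report whether this interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should point inside the statsenv folder.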
Step 1: Starting the Jupyter Notebook Server
From the root directory, enter the command:
> jupyter notebook
A web page will open in the default browser, listing the files in the repository.
Step 2: Getting Around
Click on any notebook from the list and a new tab will open containing its contents. We can read the contents of the notebook as is, or run the cells. We can refer to the context menus at the top for more information and keyboard shortcuts.
It is recommended to have a rudimentary idea about Python as well. I recommend CrashPy 〈crashpy〉 by Sourav Sen Gupta to learn about the basics. There are several other resources available in the README of that repository as well.
In this tutorial, we set up a robust, isolated environment to clone repositories containing Jupyter Notebooks, and we are now able to reproduce experiments on our local machines. I hope this tutorial was simple enough to follow. In case there are any errors, please contact me and I will get them corrected.