Matplotlib

In this lecture we will talk about how to produce scientific graphs using the Python library Matplotlib. Matplotlib provides a number of functions that allow you to quickly and easily produce a variety of useful, good-looking graphs.

Matplotlib is not included by default with Python. It is a separate Python library that is downloaded and installed alongside Python. If you installed Python using Anaconda, you probably already have it.

In order to use Matplotlib you need to import it like this:

import matplotlib.pyplot as plt

The name after as is just an abbreviation for the library. That means that whenever we want to use a function from matplotlib.pyplot, we prefix it with plt, as in plt.plot.

DISCLAIMER: The graphs we will plot here are highly customizable, and we are far from exhausting every configuration option. We encourage you to check matplotlib's documentation and the gallery of examples to find out more.

Prelude

The code below is shared by all the examples that follow, so we keep it here once to avoid copying and pasting it everywhere.
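A minimal prelude might look like the sketch below. It assumes that the examples only need Matplotlib's pyplot module and the standard math module; the math import is an assumption, included because later examples use the sine function.

```python
# Assumed prelude: the imports the later examples rely on.
import math                      # math.sin and math.cos for the sine/cosine plots
import matplotlib.pyplot as plt  # plotting functions, used as plt.<function>
```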

Plot

The plot function simply plots whatever coordinates we pass to it.

If we pass coordinates separately, it plots the points:
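As a small sketch, the three points below are made-up sample values. The "o" format string asks for a circular marker; without a marker, a single point would not be visible, because plot draws lines by default and one point has no line segment.

```python
import matplotlib.pyplot as plt

# Each call passes one x value and one y value (hypothetical sample points).
plt.plot(1, 2, "o")
plt.plot(2, 4, "o")
plt.plot(3, 1, "o")
plt.show()
```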

Alternatively, we can pass the x coordinates as a list (first parameter) and the y coordinates as a list (second parameter). In this case, Matplotlib will connect the dots. Observe that the lists must be in the correct order: the x and y coordinates of the same point need to be at the same position in both lists.
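For instance, with two short lists of made-up values (the names xs and ys are just illustrative):

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4]    # x coordinates
ys = [1, 4, 9, 16]   # y coordinates; ys[i] belongs to the same point as xs[i]

# Draws line segments connecting (1, 1), (2, 4), (3, 9), (4, 16) in that order.
plt.plot(xs, ys)
plt.show()
```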

The closer together the x coordinates are, the smoother the line will be. The function below plots a graph of the sine function, using x coordinates spaced 0.1 apart.
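One possible version of such a function is sketched here. The function name plot_sine and the range of x values (0 to 6.2, roughly one full period) are assumptions made for illustration.

```python
import math
import matplotlib.pyplot as plt

def plot_sine():
    # x values 0.0, 0.1, 0.2, ..., 6.2 (63 values, 0.1 apart)
    xs = [i / 10 for i in range(63)]
    # the sine of each x value, in the same order
    ys = [math.sin(x) for x in xs]
    plt.plot(xs, ys)
    plt.show()

plot_sine()
```

Try a larger step, such as 1.0, to see how the curve turns into visible straight segments.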

We can plot more than one graph in the same figure by calling plot several times before showing the result.
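A short sketch, drawing sine and cosine on the same axes. The label parameter and the legend function are optional extras used here only to tell the two curves apart.

```python
import math
import matplotlib.pyplot as plt

xs = [i / 10 for i in range(63)]  # 0.0, 0.1, ..., 6.2

# Two calls to plot before show: both curves end up on the same axes.
plt.plot(xs, [math.sin(x) for x in xs], label="sin(x)")
plt.plot(xs, [math.cos(x) for x in xs], label="cos(x)")
plt.legend()  # draws a box mapping each curve to its label
plt.show()
```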

Pie charts

The pie function from matplotlib plots pie charts. It expects a list of values as a parameter, and it will create one pie slice for each one of those values. The whole pie corresponds to the sum of values.

Optionally, we can pass a named parameter to this function to determine the label of each slice. The name of this parameter is labels, and each label must be at the same position in its list as the corresponding value.

Another useful optional parameter for the pie function is autopct="%.1f%%", which writes the percentage of each slice on the pie.
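Putting the three pieces together, with made-up values and labels:

```python
import matplotlib.pyplot as plt

values = [50, 30, 20]              # hypothetical data; the whole pie is 50 + 30 + 20
names = ["Cats", "Dogs", "Birds"]  # names[i] labels the slice for values[i]

# autopct="%.1f%%" writes each slice's share with one decimal place, e.g. 50.0%
plt.pie(values, labels=names, autopct="%.1f%%")
plt.show()
```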

Recall: named parameters

Functions can have named parameters. When arguments are passed by name, their order can be changed, and some of them can be omitted (taking their default value). Many functions in the Python libraries have named parameters. For example, when you use sorted(L, reverse=True), you are using the named parameter reverse from the sorted function.
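A small illustration follows. The function greet is hypothetical and exists only to show a default value and arguments passed by name.

```python
L = [3, 1, 2]
ascending = sorted(L)                 # reverse is omitted, so it takes its default (False)
descending = sorted(L, reverse=True)  # reverse is passed by name

def greet(name, greeting="Hello"):
    # greeting has a default value, so callers may leave it out
    return greeting + ", " + name + "!"

short = greet("Ada")                        # uses the default greeting
swapped = greet(greeting="Hi", name="Ada")  # by name, the order does not matter

print(ascending, descending)
print(short, swapped)
```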

Histograms

The hist function plots histograms. A histogram takes a list of values, organizes them into bins, and then graphs how many items are in each bin. Note that the order of this list of values does not matter.

Let's start with a basic example that takes a small list of numbers and generates a histogram using the bins [0,2) (2 is not included in the bin), [2,4) (4 is not included in the bin), and [4,6]. Observe that the bin list specifies where each bin begins, and where the last bin ends.
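A sketch of such an example is given here; the data list is made up so that it matches the counts discussed next.

```python
import matplotlib.pyplot as plt

# Hypothetical data: 1 is in [0,2); 2, 2, 3, 3 are in [2,4); 4, 5, 6 are in [4,6]
data = [1, 2, 2, 3, 3, 4, 5, 6]

# Bin edges 0, 2, 4, 6 define three bins: [0,2), [2,4) and [4,6]
plt.hist(data, bins=[0, 2, 4, 6])
plt.show()
```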

As you can see from the output, there is one number in the range [0,2), four numbers in the range [2,4), and three numbers in the range [4,6].

Instead of including a list indicating the start and end point of the bins, you can also just specify the total number of bins you want, and Matplotlib will automatically generate the bin ranges. Matplotlib generates the bins by taking the lowest and highest values in the list, and splitting this interval into the requested number of equal-width bins.
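For example, with the made-up data below the lowest value is 1 and the highest is 6, so asking for 5 bins gives bins of width 1: [1,2), [2,3), [3,4), [4,5) and [5,6].

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 4, 5, 6]  # hypothetical data, minimum 1 and maximum 6

# bins is a number here, not a list: the interval [1, 6] is split into 5 equal bins
plt.hist(data, bins=5)
plt.show()
```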

There are a number of simple arguments you can pass to hist in order to improve the appearance of the histogram. From the example below, can you figure out what rwidth and color do? Change the values and experiment to see what happens to the graph.
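Here is one such example to experiment with. The data and the particular values chosen for rwidth and color are arbitrary.

```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 4, 5, 6]  # hypothetical data

# Try other values, e.g. rwidth=0.5 or color="red", and compare the pictures.
plt.hist(data, bins=[0, 2, 4, 6], rwidth=0.8, color="green")
plt.show()
```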

Bar charts

The function bar is used to plot bar charts. A bar chart looks a lot like a histogram, but the difference is that the "bins" on the x-axis may be completely unrelated categories, while in histograms they are ranges of numeric values.

The bar function takes as parameters a list of the positions of the bars on the x-axis, and a list of the heights of each bar. Optionally you can define the named parameter tick_label as the list of labels for the bars to be placed on the x-axis.
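A minimal sketch with made-up categories. Note that in Matplotlib the named parameter that puts labels under the bars is called tick_label.

```python
import matplotlib.pyplot as plt

positions = [0, 1, 2]                 # where each bar sits on the x-axis
heights = [12, 7, 9]                  # hypothetical counts, one per bar
names = ["Apples", "Pears", "Plums"]  # names[i] labels the bar at positions[i]

plt.bar(positions, heights, tick_label=names)
plt.show()
```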

Subplots

Sometimes you want to produce multiple different plots and display all of them at the same time, but not on the same set of axes. We can accomplish this with subplots.

The subplot function allows multiple plots to be displayed in the same figure. It takes three numbers as arguments: the number of rows, the number of columns, and the index of the subplot you want to activate. When all three are single digits, they can be written together as one three-digit number. For example, plt.subplot(121) arranges the subplots in a grid with 1 row and 2 columns, and then activates the first subplot (in this case, the left one). plt.subplot(122) arranges the subplots the same way, but activates the second subplot.

Consider the following example that graphs both sine and cosine in the same figure, but on different subplots.
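One way to write it is sketched below; the range of x values and the titles are arbitrary choices.

```python
import math
import matplotlib.pyplot as plt

xs = [i / 10 for i in range(63)]  # 0.0, 0.1, ..., 6.2

plt.subplot(121)                  # 1 row, 2 columns, activate the left subplot
plt.plot(xs, [math.sin(x) for x in xs])
plt.title("sin(x)")

plt.subplot(122)                  # same grid, activate the right subplot
plt.plot(xs, [math.cos(x) for x in xs])
plt.title("cos(x)")

plt.show()
```

Every plotting call applies to whichever subplot was activated most recently, which is why each plot and title lands in its own panel.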

Exercise

Use the file https://web2.qatar.cmu.edu/cs/15110/resources/zoo.csv containing information about animals to build a pie chart showing the division of animals per class. The class is indicated in the type column by a number ranging from 1 to 7. The mapping of numbers to classes is:

  1. Mammals
  2. Birds
  3. Reptiles
  4. Fish
  5. Amphibians
  6. Insects
  7. Others