5. Plot, plot, plots

Let’s jump right in: plotting. In this section, we will look at THE python plotting package: matplotlib. We will also learn how to import some common data formats and turn measurement data into nice-looking plots. We will get to know three external python libraries in this chapter:

  1. matplotlib: for plotting

  2. numpy: for new data types that can handle arrays of data

  3. scipy: for advanced operations on such arrays

“External libraries” are libraries that do not come with python by default. They need to be installed first. If you are using Anaconda, then you can install them via the “Anaconda” package manager (in the default Anaconda environment, these three libraries are typically already installed).

5.1. import

In the previous chapter, we already saw how packages/libraries are imported. Below, we see two extensions of the basic import syntax. First, packages can have sub-packages. In the case of matplotlib, we use the pyplot sub-package, which gives us easy access to plotting commands. Second, we can rename imports to save time on typing. In this case, we import matplotlib.pyplot and rename it as plt. Now, to call a function from matplotlib.pyplot, we can just type plt. and then the function name.

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp

To enable interactive plotting inside the jupyter notebook, we use the “magic” command %matplotlib widget (this requires the ipympl package to be installed). This automatically makes plots zoomable and draggable.

# this is a magic command, that only works in Jupyter
# it tells Jupyter how to display the plots
# widgets are interactive: plots can be zoomed, dragged and so on
#%matplotlib widget

# an alternative is 
# %matplotlib inline
# which creates rendered plots, that can't be modified

5.2. The structure of a matplotlib Figure

In matplotlib, all our plots are contained within a figure. Each figure can hold one or more Axes that we can plot data to. The typical steps to create a plot are:

  1. create figure

  2. create one or more axes

  3. plot data to axes

  4. adjust layout

  5. export


# 1. create figure
fig = plt.figure()

# 2. create Axes

ax1 = fig.add_subplot()

# 3. plot data: first argument here is the data for x, 
# second for y

ax1.plot([1,2,3,4], [1,2,2,1], label="This is the first line")

# 4. adjust layout
ax1.set_xlabel("this is the x-axis")
ax1.set_ylabel("this is the y-axis")

# add a legend, automatically filled with everything we labeled
ax1.legend(loc="best")

# 5. export

fig.savefig("our_first_fig.png", dpi=300)

Instead of creating a single Axes, we can also put several into the same figure by using the subplots() method. With nrows=2, we get two Axes stacked on top of each other:

fig = plt.figure()

# The new step, two subplots on top of each other
axs = fig.subplots(nrows=2)

# We can select plots using indexing

axs[0].plot([1,2,3,4], [1,2,2,1])
axs[1].plot([1,1,0,0], [1,0,0,1]);

# And you can link the x-axes of the Axes in a figure together

axs[0].sharex(axs[1])

# if you are running this sheet in widget mode, 
#try to move one of the plots around
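As an aside, matplotlib also offers a shortcut, plt.subplots(), that creates the figure and its Axes in a single call. The sketch below repeats the two stacked plots that way; passing sharex=True links the x-axes at creation time. The label text is made up for illustration.

```python
import matplotlib.pyplot as plt

# create the figure and two stacked Axes in one call;
# sharex=True links the x-axes of both Axes right away
fig, axs = plt.subplots(nrows=2, sharex=True)

# axs is an array of Axes, so we can select them by indexing
axs[0].plot([1, 2, 3, 4], [1, 2, 2, 1])
axs[1].plot([1, 1, 0, 0], [1, 0, 0, 1])

# with a shared x-axis, labeling the bottom Axes is usually enough
axs[1].set_xlabel("shared x-axis")
```

Whether to use plt.figure() followed by fig.subplots() or the plt.subplots() shortcut is mostly a matter of taste; both give the same kind of figure.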

5.3. Loading data from a file

So far, so good. Of course, we likely don’t just want to display data we’ve typed in manually. Likely, the data is stored in some file on the disk. If the data is stored in a CSV file, we can easily load it using np.genfromtxt.

In a CSV file, data is saved in a table of human-readable numbers. Values in a row are delimited using some character that is not found in the rest of the data. This could be a comma (,), a semicolon (;), a space or a TAB. The minimum information genfromtxt needs is the path and the delimiter. By default, any series of whitespace (spaces or TABs) is considered the delimiter. genfromtxt returns the contents of the file as a numpy array.
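To make the role of the delimiter concrete, here is a minimal sketch that reads the same small table twice: once separated by whitespace (the default) and once separated by semicolons. The numbers are made up, and StringIO is only used so that a string can stand in for a file on disk.

```python
from io import StringIO
import numpy as np

# the same made-up table, once with whitespace and once with semicolons
whitespace_text = "1.0 2.0   3.0\n4.0\t5.0 6.0"
semicolon_text = "1.0;2.0;3.0\n4.0;5.0;6.0"

# default delimiter: any run of spaces or TABs separates two values
from_whitespace = np.genfromtxt(StringIO(whitespace_text))

# any other delimiter has to be passed explicitly
from_semicolons = np.genfromtxt(StringIO(semicolon_text), delimiter=";")

print(from_whitespace.shape)  # (2, 3)
print(np.array_equal(from_whitespace, from_semicolons))  # True
```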

We want to open data tabulated as in the .csv file “data/gaussians.csv”.

genfromtxt is a function provided by the numpy package that converts tabular text data into a numpy array. It has quite a few parameters to allow us to accommodate a wide range of different formats. Let’s have a look at the genfromtxt documentation to see which parameters it has:

help(np.genfromtxt)
Help on function genfromtxt in module numpy:

genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+,-./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, like=None)
    Load data from a text file, with missing values handled as specified.
    
    Each line past the first `skip_header` lines is split at the `delimiter`
    character, and characters following the `comments` character are discarded.
    
    Parameters
    ----------
    fname : file, str, pathlib.Path, list of str, generator
        File, filename, list, or generator to read.  If the filename
        extension is `.gz` or `.bz2`, the file is first decompressed. Note
        that generators must return byte strings. The strings
        in a list or produced by a generator are treated as lines.
    dtype : dtype, optional
        Data type of the resulting array.
        If None, the dtypes will be determined by the contents of each
        column, individually.
    comments : str, optional
        The character used to indicate the start of a comment.
        All the characters occurring on a line after a comment are discarded
    delimiter : str, int, or sequence, optional
        The string used to separate values.  By default, any consecutive
        whitespaces act as delimiter.  An integer or sequence of integers
        can also be provided as width(s) of each field.
    skiprows : int, optional
        `skiprows` was removed in numpy 1.10. Please use `skip_header` instead.
    skip_header : int, optional
        The number of lines to skip at the beginning of the file.
    skip_footer : int, optional
        The number of lines to skip at the end of the file.
    converters : variable, optional
        The set of functions that convert the data of a column to a value.
        The converters can also be used to provide a default value
        for missing data: ``converters = {3: lambda s: float(s or 0)}``.
    missing : variable, optional
        `missing` was removed in numpy 1.10. Please use `missing_values`
        instead.
    missing_values : variable, optional
        The set of strings corresponding to missing data.
    filling_values : variable, optional
        The set of values to be used as default when the data are missing.
    usecols : sequence, optional
        Which columns to read, with 0 being the first.  For example,
        ``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
    names : {None, True, str, sequence}, optional
        If `names` is True, the field names are read from the first line after
        the first `skip_header` lines.  This line can optionally be proceeded
        by a comment delimiter. If `names` is a sequence or a single-string of
        comma-separated names, the names will be used to define the field names
        in a structured dtype. If `names` is None, the names of the dtype
        fields will be used, if any.
    excludelist : sequence, optional
        A list of names to exclude. This list is appended to the default list
        ['return','file','print']. Excluded names are appended an underscore:
        for example, `file` would become `file_`.
    deletechars : str, optional
        A string combining invalid characters that must be deleted from the
        names.
    defaultfmt : str, optional
        A format used to define default field names, such as "f%i" or "f_%02i".
    autostrip : bool, optional
        Whether to automatically strip white spaces from the variables.
    replace_space : char, optional
        Character(s) used in replacement of white spaces in the variables
        names. By default, use a '_'.
    case_sensitive : {True, False, 'upper', 'lower'}, optional
        If True, field names are case sensitive.
        If False or 'upper', field names are converted to upper case.
        If 'lower', field names are converted to lower case.
    unpack : bool, optional
        If True, the returned array is transposed, so that arguments may be
        unpacked using ``x, y, z = genfromtxt(...)``.  When used with a
        structured data-type, arrays are returned for each field.
        Default is False.
    usemask : bool, optional
        If True, return a masked array.
        If False, return a regular array.
    loose : bool, optional
        If True, do not raise errors for invalid values.
    invalid_raise : bool, optional
        If True, an exception is raised if an inconsistency is detected in the
        number of columns.
        If False, a warning is emitted and the offending lines are skipped.
    max_rows : int,  optional
        The maximum number of rows to read. Must not be used with skip_footer
        at the same time.  If given, the value must be at least 1. Default is
        to read the entire file.
    
        .. versionadded:: 1.10.0
    encoding : str, optional
        Encoding used to decode the inputfile. Does not apply when `fname` is
        a file object.  The special value 'bytes' enables backward compatibility
        workarounds that ensure that you receive byte arrays when possible
        and passes latin1 encoded strings to converters. Override this value to
        receive unicode arrays and pass strings as input to converters.  If set
        to None the system default is used. The default value is 'bytes'.
    
        .. versionadded:: 1.14.0
    like : array_like
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
    
        .. note::
            The ``like`` keyword is an experimental feature pending on
            acceptance of :ref:`NEP 35 <NEP35>`.
    
        .. versionadded:: 1.20.0
    
    Returns
    -------
    out : ndarray
        Data read from the text file. If `usemask` is True, this is a
        masked array.
    
    See Also
    --------
    numpy.loadtxt : equivalent function when no data is missing.
    
    Notes
    -----
    * When spaces are used as delimiters, or when no delimiter has been given
      as input, there should not be any missing data between two fields.
    * When the variables are named (either by a flexible dtype or with `names`),
      there must not be any header in the file (else a ValueError
      exception is raised).
    * Individual values are not stripped of spaces by default.
      When using a custom converter, make sure the function does remove spaces.
    
    References
    ----------
    .. [1] NumPy User Guide, section `I/O with NumPy
           <https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
    
    Examples
    --------
    >>> from io import StringIO
    >>> import numpy as np
    
    Comma delimited file with mixed dtype
    
    >>> s = StringIO(u"1,1.3,abcde")
    >>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
    ... ('mystring','S5')], delimiter=",")
    >>> data
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    
    Using dtype = None
    
    >>> _ = s.seek(0) # needed for StringIO example only
    >>> data = np.genfromtxt(s, dtype=None,
    ... names = ['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    
    Specifying dtype and names
    
    >>> _ = s.seek(0)
    >>> data = np.genfromtxt(s, dtype="i8,f8,S5",
    ... names=['myint','myfloat','mystring'], delimiter=",")
    >>> data
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    
    An example with fixed-width columns
    
    >>> s = StringIO(u"11.3abcde")
    >>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
    ...     delimiter=[1,3,5])
    >>> data
    array((1, 1.3, b'abcde'),
          dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', 'S5')])
    
    An example to show comments
    
    >>> f = StringIO('''
    ... text,# of chars
    ... hello world,11
    ... numpy,5''')
    >>> np.genfromtxt(f, dtype='S12,S12', delimiter=',')
    array([(b'text', b''), (b'hello world', b'11'), (b'numpy', b'5')],
      dtype=[('f0', 'S12'), ('f1', 'S12')])

In the docstring above, all arguments - except for fname - are marked as optional, meaning we only need to pass them if the default value doesn’t work for us. Let’s see what happens if we only pass the filename.

raw_data = np.genfromtxt("data/gaussians.csv")
raw_data
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

Well, that doesn’t look great. All these “nan”s (“not a number”) show us that genfromtxt wasn’t able to convert anything in the file to a number.

Let’s look at the first few lines of the csv file to figure out what went wrong. You could also open the file in a text reader, or even in Jupyter, but to keep everything in one place, we will use Python here.

We open the file as a text file and print out the first few lines. The syntax for safely opening a file in python uses a so-called “context manager” or with block. While discussing the intricacies of context managers goes beyond the scope of this course, we don’t really need to understand them to use them. Just remember that files are opened in python in the following way:

with open("data/gaussians.csv", "r") as f:
    # we only have access to the file in this block
    # it is closed, as soon as we leave it
    print("first line:", f.readline())
    print("second line:",f.readline())
first line: # first column contains x values, other columns y values

second line: -5.000000000000000000e+01,1.663668552769293465e-44,2.107693412240174823e-27,9.230723952874861570e-19,8.396961391172795506e-14,1.064264733787643876e-10,1.242708024595863142e-08,3.458050799114254380e-07,3.858869813994044932e-06,2.341178771255181268e-05,9.315909810606924179e-05,2.743190629113324212e-04,6.478774868841780934e-04,1.297114829420015344e-03,2.289187047352660240e-03,3.663127777746836081e-03

Looking back at the docstring for genfromtxt, we see that the default value for the delimiter is any whitespace character. However, our file uses commas. Hence, we need to adapt the parameters we pass to the function call. Since the first line begins with a #, genfromtxt treats it as a comment and automatically disregards it.

We will take another look at loading data in the next section.

raw_data = np.genfromtxt("data/gaussians.csv", 
                         delimiter=",")
raw_data
array([[-5.00000000e+01,  1.66366855e-44,  2.10769341e-27, ...,
         1.29711483e-03,  2.28918705e-03,  3.66312778e-03],
       [-4.90000000e+01,  8.72716035e-43,  2.31304455e-26, ...,
         1.58733595e-03,  2.73567600e-03,  4.29184782e-03],
       [-4.80000000e+01,  4.22605890e-41,  2.41848193e-25, ...,
         1.93458473e-03,  3.25750237e-03,  5.01241265e-03],
       ...,
       [ 4.80000000e+01,  4.22605890e-41,  2.41848193e-25, ...,
         1.93458473e-03,  3.25750237e-03,  5.01241265e-03],
       [ 4.90000000e+01,  8.72716035e-43,  2.31304455e-26, ...,
         1.58733595e-03,  2.73567600e-03,  4.29184782e-03],
       [ 5.00000000e+01,  1.66366855e-44,  2.10769341e-27, ...,
         1.29711483e-03,  2.28918705e-03,  3.66312778e-03]])

This already looks much better. Now, luckily, we also know how to interpret the data. The first column is the x-axis, and the following columns represent y-values.

raw_data is a numpy ndarray, an efficient way to store multidimensional data of one type, in this case floats.

type(raw_data)
numpy.ndarray

The .shape attribute of the numpy array shows the shape/size of the array:

raw_data.shape
(101, 16)

This array has 101 rows and 16 columns. To select a specific row or column, we use indexing (as with tuples and lists). The difference here is that ndarrays support multidimensional indexing and slicing.

For example, to select the first column, we use a colon for the first index (to select all rows) and 0 for the second index. That returns our x-axis:

raw_data[:,0]
array([-50., -49., -48., -47., -46., -45., -44., -43., -42., -41., -40.,
       -39., -38., -37., -36., -35., -34., -33., -32., -31., -30., -29.,
       -28., -27., -26., -25., -24., -23., -22., -21., -20., -19., -18.,
       -17., -16., -15., -14., -13., -12., -11., -10.,  -9.,  -8.,  -7.,
        -6.,  -5.,  -4.,  -3.,  -2.,  -1.,   0.,   1.,   2.,   3.,   4.,
         5.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,  14.,  15.,
        16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.,  25.,  26.,
        27.,  28.,  29.,  30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,
        38.,  39.,  40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,
        49.,  50.])
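The same indexing rules are easier to follow on a small array. The sketch below uses a made-up 3x4 array (not the course data) to show how rows, columns, single values and sub-blocks are selected.

```python
import numpy as np

# a small array with 3 rows and 4 columns, filled with the numbers 0..11
small = np.arange(12).reshape(3, 4)

first_row = small[0, :]       # row 0, all columns
second_column = small[:, 1]   # all rows, column 1
single_value = small[2, 3]    # row 2, column 3
sub_block = small[:2, 1:]     # rows 0 and 1, columns 1 to 3

print(first_row)        # [0 1 2 3]
print(second_column)    # [1 5 9]
print(single_value)     # 11
print(sub_block.shape)  # (2, 3)
```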

Let’s write a for loop that plots all the columns of the dataset. First, we create a figure and add axes. Then we loop over all columns, starting at column 1. Column 0 contains the x-axis values, so we pass it as x to all plot commands. The column selected by the current index is passed for y.

We also label the plot, using .format string formatting to label each line. And then display the legend using ax.legend().

# Create figure and axes
fig = plt.figure()
ax_data = fig.add_subplot()

#we start at 1 because column 0 is the x-axis
#we use the shape of the array as the upper end of the range
for col_idx in range(1,raw_data.shape[1]):
    ax_data.plot(raw_data[:,0], 
                 raw_data[:,col_idx], 
                 label="Col {}".format(col_idx))
ax_data.legend();

In case we don’t care about the labels, we can also take a shortcut to get to this plot, by using a slice for the second argument. In that case, every column is plotted as one line.

fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);

5.4. Plot Layout

5.4.1. Colors

Now that these basics are out of the way, we can go back to our plot. Let’s talk about lines. By default, matplotlib cycles through a series of colors when plotting. This can be overridden by using the color argument. Possible choices for colors are listed in the matplotlib docs. In short: RGB values can be given as hex codes starting with a “#”, e.g. the TUW color is “#006699”. Some oft-used colors are available as single letter abbreviations (‘b’, ‘g’, ‘r’, ‘c’, ‘m’, ‘y’, ‘k’, ‘w’), and there are lists of named colors from the CSS4 standard and from the xkcd color survey (the latter need to be prefixed with “xkcd:”).

We can also label lines by using the label argument. This argument takes a string that is then displayed, together with the line, when we create a legend. Let’s draw some lines and try out these options:

# create figure and add axes
fig = plt.figure()
ax_data = fig.add_subplot()

# let's make colorful lines
ax_data.plot(raw_data[:,0],raw_data[:,2], # x and y values
             color="r", # set the line color
             label="this line is red") # and add a label

ax_data.plot(raw_data[:,0],raw_data[:,3], # again x and y
             color="#006699", # Actual RGB code from TUW CI guideline
             label="this line is TUW blue")
ax_data.plot(raw_data[:,0],raw_data[:,4], # x and y
             color="xkcd:banana", # using color names from the xkcd survey
             label="this line is banana")

# and adding a legend
ax_data.legend()
<matplotlib.legend.Legend at 0x7fdc7d605eb0>

5.4.2. Linestyles and Markers

Let’s look at the docstring of ax_data.plot to see what else we can do:

help(ax_data.plot)
Help on method plot in module matplotlib.axes._axes:

plot(*args, scalex=True, scaley=True, data=None, **kwargs) method of matplotlib.axes._subplots.AxesSubplot instance
    Plot y versus x as lines and/or markers.
    
    Call signatures::
    
        plot([x], y, [fmt], *, data=None, **kwargs)
        plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
    
    The coordinates of the points or line nodes are given by *x*, *y*.
    
    The optional parameter *fmt* is a convenient way for defining basic
    formatting like color, marker and linestyle. It's a shortcut string
    notation described in the *Notes* section below.
    
    >>> plot(x, y)        # plot x and y using default line style and color
    >>> plot(x, y, 'bo')  # plot x and y using blue circle markers
    >>> plot(y)           # plot y using x as index array 0..N-1
    >>> plot(y, 'r+')     # ditto, but with red plusses
    
    You can use `.Line2D` properties as keyword arguments for more
    control on the appearance. Line properties and *fmt* can be mixed.
    The following two calls yield identical results:
    
    >>> plot(x, y, 'go--', linewidth=2, markersize=12)
    >>> plot(x, y, color='green', marker='o', linestyle='dashed',
    ...      linewidth=2, markersize=12)
    
    When conflicting with *fmt*, keyword arguments take precedence.
    
    
    **Plotting labelled data**
    
    There's a convenient way for plotting objects with labelled data (i.e.
    data that can be accessed by index ``obj['y']``). Instead of giving
    the data in *x* and *y*, you can provide the object in the *data*
    parameter and just give the labels for *x* and *y*::
    
    >>> plot('xlabel', 'ylabel', data=obj)
    
    All indexable objects are supported. This could e.g. be a `dict`, a
    `pandas.DataFrame` or a structured numpy array.
    
    
    **Plotting multiple sets of data**
    
    There are various ways to plot multiple sets of data.
    
    - The most straight forward way is just to call `plot` multiple times.
      Example:
    
      >>> plot(x1, y1, 'bo')
      >>> plot(x2, y2, 'go')
    
    - Alternatively, if your data is already a 2d array, you can pass it
      directly to *x*, *y*. A separate data set will be drawn for every
      column.
    
      Example: an array ``a`` where the first column represents the *x*
      values and the other columns are the *y* columns::
    
      >>> plot(a[0], a[1:])
    
    - The third way is to specify multiple sets of *[x]*, *y*, *[fmt]*
      groups::
    
      >>> plot(x1, y1, 'g^', x2, y2, 'g-')
    
      In this case, any additional keyword argument applies to all
      datasets. Also this syntax cannot be combined with the *data*
      parameter.
    
    By default, each line is assigned a different style specified by a
    'style cycle'. The *fmt* and line property parameters are only
    necessary if you want explicit deviations from these defaults.
    Alternatively, you can also change the style cycle using
    :rc:`axes.prop_cycle`.
    
    
    Parameters
    ----------
    x, y : array-like or scalar
        The horizontal / vertical coordinates of the data points.
        *x* values are optional and default to ``range(len(y))``.
    
        Commonly, these parameters are 1D arrays.
    
        They can also be scalars, or two-dimensional (in that case, the
        columns represent separate data sets).
    
        These arguments cannot be passed as keywords.
    
    fmt : str, optional
        A format string, e.g. 'ro' for red circles. See the *Notes*
        section for a full description of the format strings.
    
        Format strings are just an abbreviation for quickly setting
        basic line properties. All of these and more can also be
        controlled by keyword arguments.
    
        This argument cannot be passed as keyword.
    
    data : indexable object, optional
        An object with labelled data. If given, provide the label names to
        plot in *x* and *y*.
    
        .. note::
            Technically there's a slight ambiguity in calls where the
            second label is a valid *fmt*. ``plot('n', 'o', data=obj)``
            could be ``plt(x, y)`` or ``plt(y, fmt)``. In such cases,
            the former interpretation is chosen, but a warning is issued.
            You may suppress the warning by adding an empty format string
            ``plot('n', 'o', '', data=obj)``.
    
    Returns
    -------
    list of `.Line2D`
        A list of lines representing the plotted data.
    
    Other Parameters
    ----------------
    scalex, scaley : bool, default: True
        These parameters determine if the view limits are adapted to the
        data limits. The values are passed on to `autoscale_view`.
    
    **kwargs : `.Line2D` properties, optional
        *kwargs* are used to specify properties like a line label (for
        auto legends), linewidth, antialiasing, marker face color.
        Example::
    
        >>> plot([1, 2, 3], [1, 2, 3], 'go-', label='line 1', linewidth=2)
        >>> plot([1, 2, 3], [1, 4, 9], 'rs', label='line 2')
    
        If you make multiple lines with one plot call, the kwargs
        apply to all those lines.
    
        Here is a list of available `.Line2D` properties:
    
        Properties:
        agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array
        alpha: float or None
        animated: bool
        antialiased or aa: bool
        clip_box: `.Bbox`
        clip_on: bool
        clip_path: Patch or (Path, Transform) or None
        color or c: color
        contains: unknown
        dash_capstyle: {'butt', 'round', 'projecting'}
        dash_joinstyle: {'miter', 'round', 'bevel'}
        dashes: sequence of floats (on/off ink in points) or (None, None)
        data: (2, N) array or two 1D arrays
        drawstyle or ds: {'default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'}, default: 'default'
        figure: `.Figure`
        fillstyle: {'full', 'left', 'right', 'bottom', 'top', 'none'}
        gid: str
        in_layout: bool
        label: object
        linestyle or ls: {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
        linewidth or lw: float
        marker: marker style string, `~.path.Path` or `~.markers.MarkerStyle`
        markeredgecolor or mec: color
        markeredgewidth or mew: float
        markerfacecolor or mfc: color
        markerfacecoloralt or mfcalt: color
        markersize or ms: float
        markevery: None or int or (int, int) or slice or List[int] or float or (float, float) or List[bool]
        path_effects: `.AbstractPathEffect`
        picker: unknown
        pickradius: float
        rasterized: bool or None
        sketch_params: (scale: float, length: float, randomness: float)
        snap: bool or None
        solid_capstyle: {'butt', 'round', 'projecting'}
        solid_joinstyle: {'miter', 'round', 'bevel'}
        transform: `matplotlib.transforms.Transform`
        url: str
        visible: bool
        xdata: 1D array
        ydata: 1D array
        zorder: float
    
    See Also
    --------
    scatter : XY scatter plot with markers of varying size and/or color (
        sometimes also called bubble chart).
    
    Notes
    -----
    **Format Strings**
    
    A format string consists of a part for color, marker and line::
    
        fmt = '[marker][line][color]'
    
    Each of them is optional. If not provided, the value from the style
    cycle is used. Exception: If ``line`` is given, but no ``marker``,
    the data will be a line without markers.
    
    Other combinations such as ``[color][marker][line]`` are also
    supported, but note that their parsing may be ambiguous.
    
    **Markers**
    
    =============    ===============================
    character        description
    =============    ===============================
    ``'.'``          point marker
    ``','``          pixel marker
    ``'o'``          circle marker
    ``'v'``          triangle_down marker
    ``'^'``          triangle_up marker
    ``'<'``          triangle_left marker
    ``'>'``          triangle_right marker
    ``'1'``          tri_down marker
    ``'2'``          tri_up marker
    ``'3'``          tri_left marker
    ``'4'``          tri_right marker
    ``'s'``          square marker
    ``'p'``          pentagon marker
    ``'*'``          star marker
    ``'h'``          hexagon1 marker
    ``'H'``          hexagon2 marker
    ``'+'``          plus marker
    ``'x'``          x marker
    ``'D'``          diamond marker
    ``'d'``          thin_diamond marker
    ``'|'``          vline marker
    ``'_'``          hline marker
    =============    ===============================
    
    **Line Styles**
    
    =============    ===============================
    character        description
    =============    ===============================
    ``'-'``          solid line style
    ``'--'``         dashed line style
    ``'-.'``         dash-dot line style
    ``':'``          dotted line style
    =============    ===============================
    
    Example format strings::
    
        'b'    # blue markers with default shape
        'or'   # red circles
        '-g'   # green solid line
        '--'   # dashed line with default color
        '^k:'  # black triangle_up markers connected by a dotted line
    
    **Colors**
    
    The supported color abbreviations are the single letter codes
    
    =============    ===============================
    character        color
    =============    ===============================
    ``'b'``          blue
    ``'g'``          green
    ``'r'``          red
    ``'c'``          cyan
    ``'m'``          magenta
    ``'y'``          yellow
    ``'k'``          black
    ``'w'``          white
    =============    ===============================
    
    and the ``'CN'`` colors that index into the default property cycle.
    
    If the color is the only part of the format string, you can
    additionally use any  `matplotlib.colors` spec, e.g. full names
    (``'green'``) or hex strings (``'#008000'``).

First, I want to focus on linestyle, linewidth and marker.

  • linestyle selects if the line is solid (default), dashed (--), dotted (:) or dash-dotted (-.)

  • linewidth is the width of the line in points

  • marker places a symbol at every point of your line. The full list of possible symbols is given in the docstring.

Here are some examples:

fig = plt.figure()
ax_data = fig.add_subplot()

# let's make colorful lines

ax_data.plot(raw_data[:,0],raw_data[:,2], 
             color="xkcd:ugly pink", #using xkcd names
             # this sets the linewidth to 10pts
             linewidth=10,
             label="a fat, ugly pink line")
ax_data.plot(raw_data[:,0],raw_data[:,5], 
             # color entered here as a name
             color="purple", 
             # add markers at each datapoint
             marker="x",
             label="this is a line with crosses in purple")
ax_data.plot(raw_data[:,0],raw_data[:,9], 
             # set the linestyle to dashed "--"
             linestyle="--",
             # and the linewidth to 4pts
             linewidth=4,
             # the order of keyword arguments doesn't matter
             color="xkcd:banana",
             label="this is a dashed, banana line")
ax_data.plot(raw_data[:,0],raw_data[:,15],
             # remove the linestyle to only show markers
             linestyle="none",
             # asterisk markers
             marker="*",
             color="gold",
             label="gold stars")

# and add a legend
ax_data.legend();
_images/Plots_42_0.svg

5.4.3. Formatting axis

We already know how to label the axes of our plots. However, to produce publication-ready plots, we also need to control axis limits, tick label positions and spacings, and the axis scale (linear scale, log scale).

Axis limits are set by using ax.set_xlim(<left edge>,<right edge>) and ax.set_ylim(<bottom edge>,<top edge>). If you don’t care what the axis limits are but just want them to be inverted (IR spectroscopists, represent!), you can use ax.set_xlim(ax.get_xlim()[::-1]).
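
As a minimal sketch of both ways to reverse an axis: the data below is a made-up placeholder (not the tutorial's raw_data), and the variable names are only chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# placeholder data, only for illustration
x = np.linspace(400, 4000, 50)
y = np.exp(-((x - 1700) / 300) ** 2)

fig = plt.figure()
ax = fig.add_subplot()
ax.plot(x, y)

# explicit limits: passing the larger value first reverses the axis
ax.set_xlim(4000, 400)
explicit_limits = ax.get_xlim()

# keep whatever limits are currently set, but swap their order
ax.set_ylim(ax.get_ylim()[::-1])
```

The first variant fixes the limits, the second keeps the automatically chosen ones and only flips their direction.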

You can either place axis ticks manually by using ax.set_xticks([<tick positions>]) or you can use the automatic formatters:

  • plt.MultipleLocator(<number>): places one tick at every multiple of <number>

  • plt.LinearLocator(): places evenly spaced ticks between the axis limits; the optional numticks argument can be used to set the number of ticks

  • plt.MaxNLocator(<nbins>,<steps>): splits the axis into at most <nbins> intervals (i.e. at most <nbins> + 1 ticks) and picks “nice” tick spacings from the multiples listed in <steps>

  • plt.LogLocator(<base>): places ticks for a logarithmic axis

A locator can be assigned either as major or as minor locator. Major ticks are by default labeled with numbers; minor ticks are smaller and unlabeled. They are set using the set_major_locator and set_minor_locator methods of xaxis and yaxis.

Let’s first start with the manual way using set_xticks:

# create figure and axes, then plot one line. There is nothing new here.
fig = plt.figure()
ax_data = fig.add_subplot()

ax_data.plot(raw_data[:,0],raw_data[:,2], 
             color="xkcd:ugly pink", 
             linewidth=10,
             label="a fat, ugly pink line")
# this command sets the ticks
ax_data.set_xticks([-20,0,20]);
_images/Plots_45_0.svg

The manual way of setting tick locations has the advantage of allowing us to get fancy with tick labels (because we already know where the ticks are). Here is an example:

# setting tick labels

ax_data.set_xticklabels(["minus twenty", "center", "plus twenty"])
fig
_images/Plots_47_0.svg

Now, let’s look at the more automated way: locators. They give us less fine-grained control over the layout, but make it very easy to set up nice looking ticks. A MultipleLocator puts a tick at every multiple of a given number. A MaxNLocator splits the axis into at most N intervals and puts the ticks at “nice” locations along the axis.

The major locator of an axis gets larger ticks and tick labels, the minor locator only gets smaller ticks.

# create a new figure again
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,2], 
             color="xkcd:ugly pink", 
             linewidth=10,
             label="a fat, ugly pink line")
[<matplotlib.lines.Line2D at 0x7fdc7d4f3520>]
_images/Plots_49_1.svg
# we add locators to the xaxis
ax_data.xaxis.set_major_locator(plt.MultipleLocator(10))
ax_data.xaxis.set_minor_locator(plt.MultipleLocator(1))

# and to the yaxis
ax_data.yaxis.set_major_locator(plt.MaxNLocator(5))
ax_data.yaxis.set_minor_locator(plt.MaxNLocator(50))

#and then display the figure
fig
_images/Plots_50_0.svg
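
The LogLocator from the list above is not used in this example. A minimal sketch of how it could be applied to a logarithmic y axis is given here; the data is made up for illustration, and the choice of subs for the minor ticks is just one possible setting.

```python
import matplotlib.pyplot as plt
import numpy as np

# placeholder data spanning five orders of magnitude
x = np.linspace(0, 5, 100)
y = 10 ** x

fig = plt.figure()
ax = fig.add_subplot()
ax.plot(x, y)

# switch the y axis to a logarithmic scale
ax.set_yscale("log")

# major ticks at every power of 10
ax.yaxis.set_major_locator(plt.LogLocator(base=10))
# minor ticks at 2, 3, ..., 9 times every power of 10
ax.yaxis.set_minor_locator(plt.LogLocator(base=10, subs=np.arange(2, 10)))
```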

The above mentioned set_xlim and set_ylim methods are used to change the limits of our axis. If you are happy with the limits of an axis but want to invert it, you can use .invert_yaxis() (IR spectroscopy represent!).

# shifting the limits
ax_data.set_xlim(-20,10)

# flipping the direction of the yaxis
ax_data.invert_yaxis()

fig
_images/Plots_52_0.svg

If you want to change the color scheme or line style in a plot, without setting the color manually in every .plot call, you can define a cycler to do it for you.

When you want to modify different parameters, you can “add” cyclers together. Here, for example, we create a cycle that goes through the TU Wien corporate design colors once with solid lines and then a second time with dotted lines:

# import the cycler
from cycler import cycler

# create a figure and add a subplot (we know this)
figure_TU_colors = plt.figure()
ax_data = figure_TU_colors.add_subplot()

# create lists for colors and styles
TU_colors = ["#5485AB", "#ba4682", "#E18922", "#646363", 
             "#72add5", "#cd81a8", "#eeb473", "#9d9d9c"]
linestyles = ["-"]*len(TU_colors) + [":"]*len(TU_colors)

# list multiplication is used to make both lists the same length 
my_prop_cycler = (cycler(color=TU_colors*2)+\
                 cycler(linestyle=linestyles))

# the new step: set the prop cycler
ax_data.set_prop_cycle(my_prop_cycler)

# plot all data in one go
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
_images/Plots_54_0.svg
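
To see what a cycler actually contains, it can be converted to a list. The sketch below uses two placeholder colors instead of the TU Wien palette. It also shows multiplication of cyclers, which builds every combination and is one possible way to avoid repeating the lists by hand.

```python
from cycler import cycler

colors = ["tab:blue", "tab:orange"]

# addition pairs the entries one by one, so both cyclers need the same length
added = cycler(color=colors) + cycler(linestyle=["-", ":"])

# multiplication combines every linestyle with every color;
# the right-hand cycler changes fastest
multiplied = cycler(linestyle=["-", ":"]) * cycler(color=colors)

added_styles = list(added)
multiplied_styles = list(multiplied)

for style in multiplied_styles:
    print(style)
```

Each entry is a dictionary of keyword arguments that is applied to one line after the other.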

5.4.4. Annotating plots

The ax.annotate function places a text label and an optional arrow in your plot. We can choose two coordinates for this function: the location of the point we want to annotate, xy, passed as a tuple, and the location of the text, xytext. If we only pass xy, then the text is placed right at that point. If we also pass xytext and arrowprops, an arrow is drawn from the text to the point.

For both parameters, we can also choose a coordinate system, passed in the xycoords and textcoords parameters (see docstring):

Value                Description
‘figure points’      Points from the lower left of the figure
‘figure pixels’      Pixels from the lower left of the figure
‘figure fraction’    Fraction of figure from lower left
‘axes points’        Points from lower left corner of axes
‘axes pixels’        Pixels from lower left corner of axes
‘axes fraction’      Fraction of axes from lower left
‘data’               Use the coordinate system of the object being annotated (default)

For textcoords, we can also choose:

Value                Description
‘offset points’      Offset (in points) from the xy value
‘offset pixels’      Offset (in pixels) from the xy value

Furthermore, we can choose the horizontal or vertical alignment of the text, i.e. how it is positioned relative to its anchor point.

Here are some examples for annotations:

# create the figure
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.set_ylim(0,.5)
ax_data.plot(raw_data[:,0],raw_data[:,1], 
             color="xkcd:ugly pink", 
             linewidth=10,
             label="a fat, ugly pink line")
[<matplotlib.lines.Line2D at 0x7fdc7d709e20>]
_images/Plots_60_1.svg
# basic annotation placed at xy
ax_data.annotate("basic annotation" , xy=(-20, .3))
fig
_images/Plots_61_0.svg
# horizontal and vertical alignment position the text on top of the point
ax_data.annotate("annotation placed on top of point" , 
                 xy=(0, .45), 
                 verticalalignment="bottom",
                 horizontalalignment="center")
fig
_images/Plots_62_0.svg
# when we add an xytext argument,
# then text is placed at xytext 
# and the arrow points towards xy
ax_data.annotate("annotation with arrow" , 
                 xy=(0, .45), 
                 xytext=(0.1, 0.5),
                 textcoords="axes fraction", # this text will always be in the same position of the axes
                 arrowprops={"arrowstyle":"->"})
fig
_images/Plots_63_0.svg
# arrowprops allow bent arrows to point from xytext to xy
ax_data.annotate("annotation with arrow and fancier line" , 
                 xy=(0, .45), 
                 xytext=(0.6, 0.5),
                 textcoords="axes fraction",
                 arrowprops={"arrowstyle":"->", "connectionstyle":"angle3"})
fig
_images/Plots_64_0.svg

5.4.5. Mathematical expressions in plot labels/legends

You can use \(\LaTeX\) expressions to label plots in matplotlib. Whenever you pass a string that will be displayed as text in matplotlib, you can use $...$ to indicate the parts that are supposed to be rendered as a \(\LaTeX\) formula.

In \(\LaTeX\) formulas, you can write subscripts and superscripts using _ and ^ respectively. If you want to sub/superscript multiple symbols, you can put them in {} curly braces. Lower-case Greek letters are available as \alpha to \omega; upper-case Greek letters use a capitalized name, e.g. \Gamma, \Delta or \Omega.

There is one caveat: backslashes \ in python strings are interpreted as escape characters that can be used to encode e.g. newlines \n or tabs \t. You can tell python not to treat \ as an escape character by prefixing the string with an r (a “raw” string).

The available \(\LaTeX\) syntax for matplotlib can be found in the docs as well.
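
A short sketch of why the r prefix matters, using \nu as an example because it starts with the newline escape \n (the variable names are only for illustration):

```python
# without the r prefix, "\n" is turned into a single newline character
normal = "$\nu$"

# with the r prefix, the backslash and the n stay two separate characters
raw = r"$\nu$"

# doubling the backslash has the same effect as the r prefix
escaped = "$\\nu$"

print(len(normal), len(raw), raw == escaped)
```

Only the raw (or escaped) version reaches matplotlib as the \(\LaTeX\) command \nu.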

# create another figure
fig = plt.figure()
ax_data = fig.add_subplot()

# set limits because matplotlib doesn't autoscale to annotations
ax_data.set_ylim(-.5,.5)
ax_data.set_xlim(-.5,.5)

#add some annotations with LaTeX
ax_data.annotate(r"$\int_0^{\infty}x dx$" , xy=(0, 0))
ax_data.annotate(r"$\cos(y)$" , xy=(.1, .1))
ax_data.annotate(r"$\Im(y)$" , xy=(-.1, .1))
ax_data.annotate(r"$\Re(x)$" , xy=(-.1, -.1))
ax_data.annotate(r"$\frac{\partial x}{\partial t}$" , xy=(.1, -.1))

ax_data.set_xlabel(r"x / 1")
ax_data.set_ylabel(r"$\cos(x)$ / 1")

ax_data.set_title(r"$\sum \left(\frac{1}{x}\right)$");
_images/Plots_67_0.svg

5.4.6. Example: Smoothing, peak-picking, plotting

One of the most powerful aspects of using matplotlib for visualization is that we can generate plot inputs from python code. That means anything we can calculate from our data can be integrated into the plot with a few lines of code. For example, we can load an IR spectrum, display it and then use a scipy peak picking function to find peaks.

First, this is what the spectrum looks like (with the traditional reversed x-axis so the gods of FTIR don’t get angry):

# load data (more on that in the next section)
EtOH = np.genfromtxt(fname="data/64-17-5-IR.csv",
                            skip_header=1,
                            unpack=True)
# create figure and axes
EtOH_fig = plt.figure()
EtOH_ax = EtOH_fig.add_subplot()

# and plot the x and y data
EtOH_ax.plot(EtOH[0], EtOH[1])

# set labels
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")

# invert xaxis
EtOH_ax.invert_xaxis()
_images/Plots_70_0.svg

The data looks a bit noisy, so step 1 is smoothing. Typically, IR people like to use a Savitzky-Golay filter to smooth spectra. Conceptually, the Savitzky-Golay filter performs a least squares fit of a polynomial to data points inside a small window and uses that polynomial to determine the smoothed value at the center point of the window.

The two parameters we need to choose are the degree of the polynomial and the size of the window. A higher degree leads to more noise in the output but better adherence to the spectrum, whereas a larger window size leads to a smoother spectrum with the potential of “ironing out” sharp spectral features.

The scipy.signal function savgol_filter implements this filter. We can use matplotlib to determine the right window size (we will set the degree of the polynomial to 2, which seems to be a commonly accepted default value).
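
To make the idea of the filter concrete, here is a minimal sketch that reproduces one smoothed value by hand. It uses made-up, equally spaced data (not the EtOH spectrum), fits a polynomial to a single window with np.polyfit and compares the result with savgol_filter. This only covers points that have a full window around them; the edges are handled separately by the library.

```python
import numpy as np
from scipy.signal import savgol_filter

# placeholder data: a noisy sine on an equally spaced grid
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 101)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

window_length = 11
polyorder = 2
half = window_length // 2
center = 50  # an index far enough from the edges to have a full window

# least-squares polynomial fit to the points inside the window
window = slice(center - half, center + half + 1)
coefficients = np.polyfit(x[window], y[window], deg=polyorder)

# the smoothed value is the fitted polynomial evaluated at the window center
manual_value = np.polyval(coefficients, x[center])

library_value = savgol_filter(y, window_length=window_length,
                              polyorder=polyorder)[center]
print(manual_value, library_value)
```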

In the plot below, we first plot the raw spectrum and then plot spectra with increasing smoothing on top:

from scipy.signal import savgol_filter
# create figure and plot original spectrum
EtOH_fig = plt.figure()
EtOH_ax = EtOH_fig.add_subplot()
EtOH_ax.plot(EtOH[0], EtOH[1], color="black", linewidth=4, label="original")
[<matplotlib.lines.Line2D at 0x7fdc62ad81c0>]
_images/Plots_73_1.svg
# for loop to iterate over window sizes
for window_length in range(3,17,2):
    EtOH_ax.plot(EtOH[0], 
                 savgol_filter(EtOH[1],window_length=window_length, polyorder=2), 
                 label="window: {}".format(window_length))
# and one extreme candidate
EtOH_ax.plot(EtOH[0], 
         savgol_filter(EtOH[1],window_length=55, polyorder=2), 
         label="window: {}".format(55))

# add legend and labels
EtOH_ax.legend()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel("Absorption / 1")
EtOH_ax.invert_xaxis()
EtOH_fig
_images/Plots_74_0.svg

It looks like smoothing up to window lengths of 13 is still ok. The very large window size of 55 was added to show what too much smoothing looks like. Is the smoothing enough to remove the noise in the baseline e.g. between 2000 cm-1 and 1500 cm-1? We can zoom our plot in that range to check it out.

When we zoom in by setting the x limits, matplotlib will still keep the y axis scaled to the maximum extension of the data. Hence we need to set the y limits as well:

EtOH_ax.set_xlim([2000, 1500])
EtOH_ax.set_ylim(0,.05)
EtOH_fig
_images/Plots_76_0.svg

A window length of 11 seems to be a good compromise. We will store a smoothed version of this spectrum:

# important note: numpy arrays are mutable 
# -> if you assign the same array to multiple variables
# changing one, changes all of them
# copy prevents that
EtOH_smooth = EtOH.copy()

# store smoothed spectrum in second row of array:
EtOH_smooth[1] = savgol_filter(EtOH[1], window_length=11, polyorder=2)
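
The difference between assigning an array to a second name and copying it can be checked with a tiny example (the arrays are placeholders, chosen only for illustration):

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])

# plain assignment: both names refer to the same array
alias = original
# copy: a new, independent array with the same values
independent = original.copy()

alias[0] = -1.0         # also changes `original`
independent[1] = 99.0   # leaves `original` untouched

print(original)
```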

And now we can plot our smoothed spectrum again:

EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1],  label="smoothed")
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
EtOH_ax.invert_xaxis()
_images/Plots_80_0.svg

Next, we want to mark the peak positions in this spectrum. The function find_peaks from scipy.signal does that for us. Let’s import it.

from scipy.signal import find_peaks

find_peaks gives us peak positions in terms of data indices. We can use these to index the numpy arrays for wavenumbers and intensity to get the peak locations in x and y. Then we will plot the peak positions as crosses by calling the .plot method with linestyle="none" and marker="x".

# the usual: create a plot and plot the spectrum
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])

EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")

EtOH_ax.invert_xaxis()
_images/Plots_84_0.svg
# this returns a list of peak indices and associated properties
peaks, properties = find_peaks(EtOH_smooth[1])

# we index the x and y arrays to get the x and y coordinates of the peaks
peaks_x = EtOH_smooth[0][peaks]
peaks_y = EtOH_smooth[1][peaks]

# plot as x markers
EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="x", label="peak positions")
EtOH_ax.legend()
EtOH_smooth_fig
_images/Plots_85_0.svg

It seems that find_peaks has marked many parts of the spectrum as peaks that we probably wouldn’t consider to be peaks. Luckily, we can use optional arguments to fine tune it. For example, we can set a minimum width to reject single point local maxima, and a minimum peak height:

peaks, properties = find_peaks(EtOH_smooth[1], 
                               width=1.5, 
                               height=.05)

peaks_x = EtOH_smooth[0][peaks]
peaks_y = EtOH_smooth[1][peaks]

EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="o", label="better peak positions")

EtOH_ax.legend()
EtOH_smooth_fig
_images/Plots_87_0.svg
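
The behaviour of find_peaks is easiest to follow on a very small, made-up signal. The sketch below is independent of the spectrum above and only illustrates that the function returns indices and that optional arguments such as height filter the result:

```python
import numpy as np
from scipy.signal import find_peaks

# placeholder signal with three local maxima of increasing height
signal = np.array([0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0])

# without further arguments, every local maximum counts as a peak
all_peaks, _ = find_peaks(signal)

# with a minimum height, only the taller maxima remain
tall_peaks, tall_properties = find_peaks(signal, height=1.5)

print(all_peaks)                        # indices into `signal`
print(tall_peaks)
print(tall_properties["peak_heights"])  # heights of the remaining peaks
```

The properties dictionary only contains entries for the criteria that were requested, here the peak heights.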

Next, we don’t just want the peaks to be marked with crosses, but also display their wavenumber using .annotate. We already have positions of the peaks, so we just need to write a bit of code to loop over all positions and put annotations there:

# create another figure
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])
EtOH_ax.invert_xaxis()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
Text(0, 0.5, 'Absorption / 1')
_images/Plots_89_1.svg
# plot x markers
EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="x", label="peak positions")

# iterate over all peak indices, 
for peak in peaks:
    peak_x = EtOH_smooth[0][peak]
    peak_y = EtOH_smooth[1][peak]
    an = EtOH_ax.annotate(text="{:.0f}".format(peak_x),
                     xy=(peak_x, peak_y),
                     xycoords = "data", # this is the default anyway
                     xytext = (peak_x, 20), 
                     textcoords = ("data", 'offset pixels'), #move label a bit away from data
                     rotation=90, # rotate by 90 degrees
                     ha="center") # center

# increase the top limit a bit to fit all labels
EtOH_ax.set_ylim(top=1.35)
EtOH_smooth_fig
_images/Plots_90_0.svg

Or we can add the peak labels at the top of the figure by changing the y coordinate of textcoords to “axes fraction” and the vertical alignment va to “top”. In this mode, it also makes sense to add a line between the peak and the label.

# create another figure
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])
EtOH_ax.invert_xaxis()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
EtOH_ax.set_ylim(top=1.5)
(-0.05475795580419598, 1.5)
_images/Plots_92_1.svg
for peak in peaks:
    peak_x = EtOH_smooth[0][peak]
    peak_y = EtOH_smooth[1][peak]
    an = EtOH_ax.annotate(text="{:.0f}".format(peak_x),
                     xy=(peak_x, peak_y),
                     xycoords = "data", # this is the default anyway
                     xytext = (peak_x, 1),
                     textcoords = ("data", 'axes fraction'), # x in data units, y as fraction of the axes height
                     rotation=90,
                     ha="center",
                     va="top",
                     arrowprops={"arrowstyle":"-",
                                "shrinkA":5,
                                "shrinkB":20})
EtOH_smooth_fig
_images/Plots_93_0.svg

5.5. More plot types?

Matplotlib can do significantly more than just create line plots. First, you can look at additional ways to plot into cartesian coordinates. Some examples:

5.5.1. Pie charts

A pie chart takes fractional sizes of the pie slices. When the normalize argument is set to True, the sizes of the slices are normalized to a full circle. Pie charts also allow a significant amount of customization. Read the docstring to learn more. We can set labels, change colors, change how the slices are oriented and so on:

# create figure
fig = plt.figure()
ax_box= fig.add_subplot()
ax_box.set_title("Programming is:")

#create pie chart
ax_box.pie([2,2,96], 
           labels=["knowing syntax", 
                   "experience", 
                   "Googling 'How do I X in python?'"],
          normalize=True);
_images/Plots_98_0.svg

For pie charts, too, there are quite a few arguments for customization. Setting labeldistance to None removes the labels, and startangle sets the rotation of the pie chart. Also, legend works on pie charts too.

fig = plt.figure()
ax_pie= fig.add_subplot()
ax_pie.pie([.8, .2], 
           labels=["Pac man", "sometimes Pac man"],
                 normalize=False,
          colors=["xkcd:banana", "lightgrey"],
           labeldistance=None,
          startangle=30)
ax_pie.legend()
<matplotlib.legend.Legend at 0x7fdc6071d790>
_images/Plots_100_1.svg

5.5.2. Visualizing distributions: histogram, box plot, violin plot

These three plot types are useful for understanding the distribution of data, e.g. for exploratory data analysis. They take an array (or a 2D array) of data as input, but they don’t depict the individual data points. Instead, they show their distribution, i.e. the density of values within the array.

We will use the scipy.stats package to generate datasets with values of three different distributions: exponential, normal and a bimodal normal distribution.

# we use the scipy.stats package to create some randomly distributed data
# following three different distributions

import scipy.stats as sps

# 200 samples of normally distributed data:
norm = sps.norm(4).rvs(200)
# 200 samples of exponentially distributed data:
expo = sps.expon(1).rvs(200)
# 100 samples each of two normally distributed datasets
bimod = (np.hstack([sps.norm(loc=-2).rvs(100), sps.norm(loc=2).rvs(100)]))

#shuffle to mix the two distributions
np.random.shuffle(bimod)

If we just plot these samples, we have a hard time seeing how the data is distributed:

fig = plt.figure()
ax_pointcloud = fig.add_subplot()

ax_pointcloud.plot(norm, label="norm",linestyle="", marker="x")
ax_pointcloud.plot(expo, label="expo",linestyle="", marker="x")
ax_pointcloud.plot(bimod, label="bimod",linestyle="", marker="x")
ax_pointcloud.legend()
<matplotlib.legend.Legend at 0x7fdc606e04f0>
_images/Plots_105_1.svg

The hist plot creates histograms by putting the samples into several bins and then drawing bars with heights corresponding to the number of samples per bin. This makes it much easier to see the distribution:

fig = plt.figure()
ax3 = fig.add_subplot()

# we pass the three datasets in a list
ax3.hist([norm,expo, bimod],
         bins=20, # the number of bins to use
         label=["norm", "expo", "bimod"])

ax3.legend()
<matplotlib.legend.Legend at 0x7fdc609eaac0>
_images/Plots_107_1.svg
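
The binning step itself can be reproduced with np.histogram. The sketch below uses freshly generated placeholder samples (not the arrays from above) and only illustrates how counts and bin edges relate to each other:

```python
import numpy as np

# placeholder samples from a normal distribution
rng = np.random.default_rng(1)
samples = rng.normal(loc=4, scale=1, size=200)

# sort the samples into 20 equally wide bins
counts, edges = np.histogram(samples, bins=20)

print(len(counts))   # one count per bin
print(len(edges))    # one more edge than bins
print(counts.sum())  # every sample ends up in exactly one bin
```

The bar heights in a histogram plot correspond to counts, and the bar borders to edges.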

The boxplot is often used to check if datasets have similar distributions. By default, it shows the median of the dataset as a horizontal line, the quartiles as a box, whiskers that extend to the most extreme data points within 1.5 times the interquartile range from the box, and outliers beyond the whiskers as points (also called fliers). To quickly check if two variables are potentially significantly different, you can compare their boxplots. If they show little overlap (i.e. the median of one lies outside the box of the other), then that is a good sign of a significant difference.

fig = plt.figure()
ax3 = fig.add_subplot()
ax3.boxplot([norm,expo, bimod])

# labeling box plots is a bit annoying
# we set ticks to have integer numbers:
ax3.set_xticks([1,2,3])
# and then label starting from tick 1
ax3.set_xticklabels(["norm", "exp", "bimod"])
[Text(1, 0, 'norm'), Text(2, 0, 'exp'), Text(3, 0, 'bimod')]
_images/Plots_109_1.svg
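
The numbers behind a box plot can be computed by hand with numpy. This is a minimal sketch on a made-up dataset with one obvious outlier; it assumes the default whisker setting of 1.5 times the interquartile range:

```python
import numpy as np

# placeholder data with one value far away from the rest
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# median and quartiles define the line and the box
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# points beyond these fences are drawn as fliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
fliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3)
print(fliers)
```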

Violinplots are similar to boxplots, but they also allow us to gain some insight into the shape of the distribution. They work by estimating the probability density function from the samples (using a Gaussian kernel density estimate) and plotting it in place of the box:

fig = plt.figure()
ax3 = fig.add_subplot()
ax3.violinplot([norm,expo, bimod])

# labelling works like with box plots
ax3.set_xticks([1,2,3])
ax3.set_xticklabels(["norm", "exp", "bimod"])
[Text(1, 0, 'norm'), Text(2, 0, 'exp'), Text(3, 0, 'bimod')]
_images/Plots_111_1.svg
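
The density estimate behind a violin can be illustrated with scipy.stats.gaussian_kde. This is only a sketch of the idea on freshly generated placeholder data; it is an assumption that it matches the violin plot in detail, since matplotlib uses its own implementation internally.

```python
import numpy as np
import scipy.stats as sps

# placeholder bimodal data: two normal distributions centered at -2 and 2
rng = np.random.default_rng(2)
bimodal = np.hstack([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])

# estimate the probability density from the samples
kde = sps.gaussian_kde(bimodal)

# evaluate the estimated density on a grid, e.g. for plotting
grid = np.linspace(-6, 6, 241)
density = kde(grid)

# the density is lower between the two modes than at one of the modes
print(kde(0.0)[0], kde(2.0)[0])
```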

5.5.3. 2.5D plots: contour, contourf, pcolormesh, imshow

These plot types can be used to display 2 dimensional datasets of the type \(f(x,y)\).

Contour and contourf draw isolines from a 2D dataset, like you would see on a topographic map. The difference is that contourf fills the areas between the lines, while contour draws just the lines. Let’s start with a 2D dataset. We generate two vectors with equal spacing between -1 and 1 and then use the np.meshgrid function to get two 2D arrays, where one holds the values of x along each row and the other the values of y along each column. Then we use them to calculate a sum of four 2D Gaussians with alternating signs:

# linspace to create equally spaced data
x = np.linspace(-1,1,50)
y = np.linspace(-1,1,50)

# meshgrid to create 2D arrays for x and y
X, Y = np.meshgrid(x,y)
z = 1.1*np.exp(- ((X-.1)**2 +(Y-.1)**2)/.5**2) -\
    1.0*np.exp(- ((X-.1)**2 +(Y+.1)**2)/.5**2) +\
    0.9*np.exp(- ((X+.1)**2 +(Y+.1)**2)/.5**2) -\
    1.2*np.exp(- ((X+.1)**2 +(Y-.1)**2)/.5**2)

Ignore the plotting commands for now, this is what the X and Y arrays look like:

contour_fig = plt.figure()
Xax, Yax= contour_fig.subplots(ncols=2, sharex=True, sharey=True)
mappableX=Xax.pcolormesh(np.arange(len(x)), np.arange(len(y)), X, shading="nearest")
mappableY=Yax.pcolormesh(np.arange(len(x)), np.arange(len(y)), Y, shading="nearest")
Xax.set_aspect(1)
Yax.set_aspect(1)

contour_fig.colorbar(mappableX, ax=Xax)
contour_fig.colorbar(mappableY, ax=Yax)
<matplotlib.colorbar.Colorbar at 0x7fdc606066a0>
_images/Plots_116_1.svg
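
For very small inputs, the output of np.meshgrid can simply be printed. The vectors below are placeholders with deliberately different lengths, so that the shape of the result is easy to read off:

```python
import numpy as np

x_small = np.array([0.0, 1.0, 2.0])
y_small = np.array([10.0, 20.0])

X_small, Y_small = np.meshgrid(x_small, y_small)

print(X_small)  # every row is a copy of x_small
print(Y_small)  # every column is a copy of y_small

# element-wise operations now evaluate f(x, y) on the whole grid
z_small = X_small + Y_small
print(z_small.shape)  # (len(y_small), len(x_small))
```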

If we pass just the z array to contour, then it generates a plot of isolines that looks as follows:

# create figure and axes, as per usual
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# contour plot
cont_mappable=contour_ax.contour(z);
_images/Plots_118_0.svg

When x and y should have the same spacing/unit, we can set the aspect ratio to 1 using set_aspect() and get the correct shape:

contour_ax.set_aspect(1)
contour_fig
_images/Plots_120_0.svg

The x and y values here are indices. The shading of the lines corresponds to the value of z. We can add a colorbar to get an idea what those values are. Note that the method is contour_fig.colorbar and not contour_ax.colorbar.

# remember that `cont_mappable` was the output of the contour function
cbar = contour_fig.colorbar(cont_mappable)
contour_fig
_images/Plots_122_0.svg

The colorbar can be formatted like a plot axis. We can add a label to it, and we can also set tick locations and formats.

cbar.set_ticks([-.2, +.2])
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
contour_fig
_images/Plots_124_0.svg

We can also add the x and y vectors to the function call to have the correct units along the axes. contour and contourf would accept either the 2D arrays or the 1D inputs to meshgrid:

# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# plot contour, with x and y vectors
cont_mappable = contour_ax.contour(x,y,z)

# colorbar and formatting
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_126_0.svg

contourf works exactly the same way as contour. The only difference is that it fills the areas between isolines:

# create figure:
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# the same arguments as before
cont_mappable = contour_ax.contourf(x,y,z)

# colorbar and formatting
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_128_0.svg

It’s weird though, that we added four gaussians and now can only make out three in this image. It seems one of them is too low to actually show up in the plot. We can increase the number of levels to change that:

# figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# contourf plot
cont_mappable = contour_ax.contourf(x,y,z,
                                    # more levels:
                                    levels=100)

# formatting and colorbar
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_130_0.svg

We can also combine contour and contourf in a single plot. We set the colors of the contour lines to black (otherwise they take a color corresponding to the height and are invisible) and use the .add_lines method of the colorbar to add the isolines to the bar as well:

# create the figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# store outputs of contourf and contour in different variables
contf_mappable = contour_ax.contourf(x,y,z,levels=100)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

# colorbar for contourf
cbar = contour_fig.colorbar(contf_mappable)

# add_lines: isolines in colorbar
cbar.add_lines(cont_mappable)

# formatting
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
_images/Plots_132_0.svg

For large datasets, generating a smooth plot using contourf can lead to problems because matplotlib has to figure out the location of every required isoline by interpolating the dataset. If we want to generate a smooth false color image from a 2D array, then pcolormesh is more suitable: it colors a mesh defined by the grid points directly. The call signature is similar to that of contourf, but we no longer need to set a high number of levels to get a smooth image. Instead, we select how the color between data points is determined, using the shading argument. “nearest” colors each cell with the value of its grid point, which can look pixelated for images with few pixels. “gouraud” interpolates the color between grid points and makes the image smoother.

# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# pcolormesh generates the image
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                        # shading should either be nearest or gouraud
                                       shading='nearest')

# we can combine this with contour lines as well
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

# and create a colorbar
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_134_0.svg
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       # this is the only change from the previous figure
                                       shading='gouraud'
                                      )
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_135_0.svg

5.5.4. Colormaps

The choice of colormap and the extent of the color range are crucial for depicting 2D data.

The matplotlib documentation has a really helpful discussion on different types of colormaps, so I will only give a brief intro here. We will focus on these three types of colormaps:

  1. Sequential colormaps continuously change from a low value to a high value

  2. Diverging colormaps have two different colors at the edges that converge to the same color in the center

  3. Cyclic colormaps have the same color on both edges and change in between
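
A colormap is a function that maps numbers between 0 and 1 to RGBA colors, so the three types can be compared by sampling their edges and center. The sketch below picks Blues, bwr and twilight as one example per type; these are built-in matplotlib colormaps, and the choice is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# sequential: light at the low end, dark at the high end
blues_low = np.array(plt.cm.Blues(0.0))
blues_high = np.array(plt.cm.Blues(1.0))

# diverging: two different colors at the edges, white in the center
bwr_low = np.array(plt.cm.bwr(0.0))
bwr_center = np.array(plt.cm.bwr(0.5))
bwr_high = np.array(plt.cm.bwr(1.0))

# cyclic: (almost) the same color at both edges
twilight_low = np.array(plt.cm.twilight(0.0))
twilight_high = np.array(plt.cm.twilight(1.0))

# each color is given as (red, green, blue, alpha)
print(bwr_low, bwr_center, bwr_high)
```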

5.5.4.1. Sequential Colormaps

We have already been using a sequential colormap (matplotlib’s default, viridis). Simpler ones use a monotonic change between two colors, e.g. from white to blue. We select colormaps by passing them to the cmap argument of contour, contourf or pcolormesh. Matplotlib’s built-in colormaps are found (among other places) in plt.cm.

# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# create pcolormesh
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       # we select a different colormap
                                       cmap=plt.cm.Blues,
                                       shading='gouraud')

# the rest is the same as before
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_141_0.svg
# this is the copper colormap

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       shading="gouraud", 
                                       #copper
                                       cmap=plt.cm.copper)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_142_0.svg

5.5.4.2. Diverging Colormaps

Diverging colormaps are used to depict datasets where it matters whether a value lies above or below a reference value (usually zero). For example, when we want to depict charge densities it matters if we are looking at a positive or negative charge. Examples are “RdBu” or “bwr”:

#again, only the colormap changes

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       shading="gouraud",
                                       cmap=plt.cm.bwr)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_145_0.svg

Unfortunately, since it matters where exactly the white “zero” ends up in our plot for diverging colormaps, it is important that we set the edges of their range correctly. A very comfortable way to do this is the Normalize class provided by matplotlib.

This class converts input float values to values between 0 and 1. We can let it set the range automatically from our data or manually adjust it as needed. When the Normalize object is first used to scale data (in this code that happens in the call to pcolormesh, since we passed it as an optional argument there), it automatically sets its limits.

# we create a new Normalize object
norm = plt.Normalize()

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       shading="gouraud",
                                       cmap=plt.cm.bwr,
                                       # norm is passed as kw argument for norm
                                       norm=norm)

cont_mappable = contour_ax.contour(x,y,z,levels=5, 
                                   colors="black",
                                   # and also for the contour plot
                                   norm=norm)

# everything else stays the same
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_147_0.svg
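To see what Normalize does to individual numbers, here is a minimal sketch that does not need a plot at all. The sample values are made up for illustration; only plt.Normalize itself comes from the text above.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up sample values, only for illustration
values = np.array([-2.0, 0.0, 1.0, 6.0])

norm = plt.Normalize()
# before the first use, the limits are not set yet
print(norm.vmin, norm.vmax)

# the first call takes vmin/vmax from the data and rescales to 0..1
scaled = norm(values)
print(norm.vmin, norm.vmax)
print(scaled)
```

In this sketch the value 0.0 does not land at 0.5, the middle of the colormap, because the data range is not symmetric around zero.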

Passing a Normalize object does not by itself make the colormap symmetric around the zero plane. To achieve that, we need to set the minimum and maximum values of the colormap from the limits of the dataset. By using the maximum absolute value of z as vmax, we make sure that both the largest and the smallest value are included in the colormap. And by setting vmin to -vmax, we make it symmetric around 0:

# again, create the Normalize object
norm = plt.Normalize()
# set the maximum to the largest absolute value in z
norm.vmax = np.max(np.abs(z))
# make it symmetric by setting the minimum to -vmax
norm.vmin = -norm.vmax

# nothing new here (norm=norm as before)
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       shading="gouraud",
                                       cmap=plt.cm.bwr,
                                       norm=norm)
cont_mappable = contour_ax.contour(x,y,z,levels=5, 
                                   colors="black",
                                   norm=norm)

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_149_0.svg
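If we need a symmetric scale more than once, the two assignments can be wrapped in a small helper. The following is only a sketch: the function name symmetric_norm and the sample values are our own choices, and the limits are passed directly to the Normalize constructor instead of being set afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

def symmetric_norm(data):
    """Return a Normalize with limits symmetric around 0 (hypothetical helper)."""
    limit = np.max(np.abs(data))
    return plt.Normalize(vmin=-limit, vmax=limit)

# made-up sample values, only for illustration
values = np.array([[-1.0, 0.5], [3.0, -4.0]])
norm = symmetric_norm(values)

# with symmetric limits, zero ends up exactly in the middle of the colormap
zero_position = float(norm(0.0))
print(norm.vmin, norm.vmax, zero_position)
```

The resulting object can be passed as norm=norm to pcolormesh and contour, just like before.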

When we are looking at noisy measurements, it might also make sense to exclude the extremes of the dataset in the colorbar, especially when they are actually outliers. In these cases, using the quantiles of the data ensures that most of the dataset is still within the colorbar but we don’t have to manually set limits ourselves.

Let’s add some outliers and noise to our height data and store it in the variable z_noise:

#generate noisy data
z_noise = z + .001/(1-np.random.exponential(1, size=z.shape))

If we just pass this data to pcolormesh, the result is not great: a single outlier drags our scale very far from where most of the changes in the dataset happen.

# here, only the z data changed

norm = plt.Normalize()

norm.vmax = np.max(np.abs(z_noise))
norm.vmin = -norm.vmax

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z_noise,
                                       shading="gouraud",
                                       cmap=plt.cm.bwr,
                                       norm=norm)

cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_153_0.svg

Obviously, we are not interested in those spikes, but in the smaller, slower changes in the center of the data. We use np.quantile to get the 1% and 99% quantiles of our data and use them to set vmin and vmax of the Normalize instance. The data is still noisy and has spikes, but at least we now see the overall pattern and not just a few outliers.

norm = plt.Normalize()


# we first set the upper limit to the larger absolute value of the two quantiles
norm.vmax = max(abs(np.quantile(z_noise, .99)), abs(np.quantile(z_noise, .01)))
# then we again use -vmax for the lower limit
norm.vmin = -norm.vmax


contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z_noise,
                                       # nearest works better here
                                       shading="nearest",
                                       cmap=plt.cm.bwr,
                                       norm=norm)

cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_155_0.svg
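The effect of the quantiles on the scale can also be checked on plain numbers. This is a small sketch under our own assumptions: the helper name robust_symmetric_limit, the random test data and the single artificial outlier are all made up for illustration.

```python
import numpy as np

def robust_symmetric_limit(data, lower=0.01, upper=0.99):
    """Largest absolute value of the two quantiles (hypothetical helper)."""
    q_low, q_high = np.quantile(data, [lower, upper])
    return max(abs(q_low), abs(q_high))

# made-up data: normally distributed noise plus one artificial outlier
rng = np.random.default_rng(seed=0)
values = rng.normal(0.0, 1.0, size=10_000)
values[0] = 1000.0

# the plain maximum is dominated by the outlier ...
full_limit = np.max(np.abs(values))
# ... while the quantile based limit stays close to the bulk of the data
robust_limit = robust_symmetric_limit(values)
print(full_limit, robust_limit)
```

Values outside the quantile range are still drawn; they are simply shown in the color at the end of the colormap.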

Cyclical colormaps are typically used to depict information like phases and angles. An angle between two vectors can’t be larger than 360°, and 0° and 360° are identical angles. Hence, cyclical colormaps start and end on the same color.

In the example below, we calculate the gradient of our data set and, from it, the direction (angle) of the steepest increase of the values.

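# the gradient of the dataset points in the direction of the steepest increase
# rows of z run along y and columns along x (as pcolormesh expects),
# so grad[0] is the derivative along y and grad[1] the derivative along x
grad = np.gradient(z,y,x)
# this calculates the angle of the gradient in degrees
angle = np.arctan2(grad[0],grad[1])/np.pi*180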


# we make sure the min and max of the scale are +/-180
norm = plt.Normalize()
norm.vmin=-180
norm.vmax=180


# nothing new here
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
                                       shading="gouraud",
                                       cmap=plt.cm.twilight,
                                       norm=norm)
cont_mappable = contour_ax.contour(x,y,z, colors="k")

cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")

contour_ax.set_aspect(1)
_images/Plots_157_0.svg

The same plot with a sequential colormap has jumps as we go from +180 to -180.

# nothing new here
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
                                       shading="gouraud",
                                       cmap=plt.cm.Blues,
                                       norm=norm)
cont_mappable = contour_ax.contour(x,y,z, colors="k")

cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")

contour_ax.set_aspect(1)
_images/Plots_159_0.svg
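The difference between the two colormaps can also be checked numerically by comparing the colors at both ends of each map. The sketch below assumes nothing beyond the two colormaps used above; the variable names are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

# RGBA colors at the lower (0.0) and upper (1.0) end of each colormap
twilight_ends = np.array([plt.cm.twilight(0.0), plt.cm.twilight(1.0)])
blues_ends = np.array([plt.cm.Blues(0.0), plt.cm.Blues(1.0)])

# largest difference in any color channel between the two ends
twilight_jump = np.max(np.abs(twilight_ends[0] - twilight_ends[1]))
blues_jump = np.max(np.abs(blues_ends[0] - blues_ends[1]))
print(twilight_jump, blues_jump)
```

For the cyclic map the two ends are practically the same color, while the sequential map goes from nearly white to dark blue. That difference is what shows up as a jump in the plot.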

To encode direction and magnitude, we can also use the quiver plot. This plot type takes x and y coordinates and two additional parameters u and v, which are the components of a vector in x and y direction, respectively. It then draws arrows that point in the direction of those vectors and have a length corresponding to the vector’s magnitude. Typically, such a plot would be used to depict force fields or velocity fields.

# create the figure and the angle plot, as before
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
                                       shading="gouraud",
                                       cmap=plt.cm.twilight,
                                       norm=norm)

# we only add a few arrows here, by slicing x,y, and the gradient
contour_ax.quiver(x[::5],y[::5],
                  #x component of arrow (derivative along the columns of z)
                  grad[1][::5,::5],
                  #y component of arrow (derivative along the rows of z)
                  grad[0][::5,::5])


#formatting
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")
cbar.set_ticks([-180,-90,0,90,180])

contour_ax.set_aspect(1)
_images/Plots_161_0.svg
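Which gradient component belongs to which direction is easy to mix up, so here is a self-contained sketch. The test data (a single Gaussian hill on a grid with different lengths in x and y) is made up for this example. It assumes that z has one row per y value and one column per x value, which is the layout that np.meshgrid produces by default and that pcolormesh expects.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# made-up data: a Gaussian hill, with different numbers of x and y points
x = np.linspace(-2, 2, 41)
y = np.linspace(-1, 1, 21)
X, Y = np.meshgrid(x, y)          # both have shape (len(y), len(x))
z = np.exp(-(X**2 + Y**2))

# axis 0 of z runs along y, axis 1 along x, so the y derivative comes first
dz_dy, dz_dx = np.gradient(z, y, x)

fig = plt.figure()
ax = fig.add_subplot()
ax.pcolormesh(x, y, z, shading="gouraud", cmap=plt.cm.Blues)

# draw only every 4th arrow; u is the x component, v the y component
step = 4
ax.quiver(x[::step], y[::step],
          dz_dx[::step, ::step],
          dz_dy[::step, ::step])
ax.set_aspect(1)
```

Because the gradient points uphill, all arrows in this sketch should point towards the top of the hill at the origin.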

You can also quite easily create custom colormaps from lists of colors. We import LinearSegmentedColormap from matplotlib.colors and then use its .from_list classmethod to create a new colormap. The first argument is the name of the colormap, the second argument is a list of colors. Between these colors, matplotlib will interpolate linearly.

# import
from matplotlib.colors import LinearSegmentedColormap

# create colormap and store in variable
my_nice_map = LinearSegmentedColormap.from_list("fluorescence", # name of the colormap
                                                ["black", "#8ffe09"]# list of colors
                                               )


norm = plt.Normalize()
# we set the lower end of norm to 0 to give nice bright peaks
# against a dark background. Not really sensible.
norm.vmin=0

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
                                       shading="gouraud",
                                       cmap=my_nice_map,
                                       norm=norm)

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_163_0.svg
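A colormap is a callable that turns numbers between 0 and 1 into RGBA colors, so we can inspect the interpolation directly. In this sketch, the three-color variant and its name "fluorescence_white" are made up for illustration.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# the two-color map from above
two_color = LinearSegmentedColormap.from_list("fluorescence",
                                              ["black", "#8ffe09"])
# a made-up variant with a third color at the upper end
three_color = LinearSegmentedColormap.from_list("fluorescence_white",
                                                ["black", "#8ffe09", "white"])

# colors at the ends of the two-color map
start = two_color(0.0)
end = two_color(1.0)
# the colors are spread evenly, so the second color sits in the middle
middle = three_color(0.5)
print(start, end, middle)
```

With more than two colors in the list, the colors are placed at equal distances between 0 and 1.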

Finally, imshow is used to draw pixel images with equal pixel spacing in each direction. Those images can either be false color images like the previous examples or RGB images. Which one is drawn depends on the dimensions of the input array to imshow. If that array is 2D, a false color image is drawn using the selected colormap. If an array of shape MxNx3 is passed, each layer is used as one color channel.

First, the false color image:

norm = plt.Normalize()

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# imshow does not accept x and y!
contf_mappable = contour_ax.imshow(z,
                                   cmap=plt.cm.Blues,
                                   norm=norm)

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_165_0.svg

Three things immediately jump out at us:

  1. the image appears flipped relative to previous plots

  2. it also looks quite pixelated

  3. we can’t pass x and y arrays for the coordinates.

The first problem is easily solved by setting the image origin to the lower left corner. For the second, we can change the interpolation method to create a smoother image.

norm = plt.Normalize()

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()

# there are some differences in the imshow command
contf_mappable = contour_ax.imshow(z,
                                   cmap=plt.cm.Blues,
                                   norm=norm,
                                   origin="lower", # changed origin
                                   interpolation="bilinear") # interpolation

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_167_0.svg

The location of the image can be set using the extent argument. This argument requires a list that gives the left, right, bottom and top coordinates of the image edges:

norm = plt.Normalize()

contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.imshow(z,
                                   cmap=plt.cm.Blues,
                                   norm=norm,
                                   origin="lower",
                                   interpolation="bilinear",
                                   extent=(x[0],x[-1],y[0],y[-1])) # added extents

cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")

contour_ax.set_aspect(1)
_images/Plots_169_0.svg
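One detail worth knowing: extent describes the outer edges of the image, not the centers of the first and last pixel. If x and y hold the coordinates of the pixel centers, the edges lie half a pixel further out. The sketch below uses a made-up helper called centered_extent, assumes evenly spaced coordinates, and uses a tiny test image.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def centered_extent(x, y):
    """Extent that puts the pixel centers on x and y (hypothetical helper).

    Assumes evenly spaced coordinates.
    """
    dx = x[1] - x[0]
    dy = y[1] - y[0]
    return (x[0] - dx/2, x[-1] + dx/2, y[0] - dy/2, y[-1] + dy/2)

# made-up test image with 3 rows and 5 columns
x = np.arange(5.0)
y = np.arange(3.0)
X, Y = np.meshgrid(x, y)
z = X + Y

fig = plt.figure()
ax = fig.add_subplot()
im = ax.imshow(z, origin="lower", extent=centered_extent(x, y))
print(im.get_extent())
```

For large images the half pixel hardly matters, but for small arrays it can visibly shift the image relative to contour lines or markers drawn on top.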

Displaying an RGB image is straightforward. We load it using plt.imread and then pass the loaded data to imshow.

logo = plt.imread("figures/logo.png")

fig_logo = plt.figure()
ax_logo = fig_logo.add_subplot()
ax_logo.imshow(logo)
<matplotlib.image.AxesImage at 0x7fdc5fc42f10>
_images/Plots_171_1.svg
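If no image file is at hand, an RGB array can also be built by hand, which shows how the three layers are used. This is a sketch with made-up data; for float arrays, imshow expects channel values between 0 and 1.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rows, cols = 50, 100
ramp_x = np.linspace(0, 1, cols)
ramp_y = np.linspace(0, 1, rows)

# array of shape M x N x 3: one layer per color channel
rgb = np.zeros((rows, cols, 3))
rgb[:, :, 0] = ramp_x[np.newaxis, :]   # red increases from left to right
rgb[:, :, 1] = ramp_y[:, np.newaxis]   # green increases from top to bottom
rgb[:, :, 2] = 0.5                     # constant amount of blue

fig = plt.figure()
ax = fig.add_subplot()
im = ax.imshow(rgb)
```

No colormap or norm is involved here, because the colors are given directly by the array.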

5.6. Saving plots and preparing for publication

Once our plot is finished, we of course want to export it. As we already saw, this can be done using the figure.savefig method. You will notice that exported figures sometimes end up looking different from how they looked in jupyter. This typically happens when the resolution used for display in jupyter (in dpi) differs from the one used in savefig. Therefore, if you want high-res figures (>100 dpi), it is a good idea to set the resolution and the final size of the figure (in inches) in the figure call.

The next figure looks very large and blotchy in jupyter, but that is because it is meant to be displayed at a resolution of 500 dpi and will be much smaller once exported:

fig_noice = plt.figure(figsize=(3,2), dpi=500)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
fig_noice.savefig("figures/noice_figure.png",dpi="figure")
_images/Plots_174_0.svg

The figure that was exported above looks like this: noice figure

Not great, it seems, because the labels are cut off. The first question is why it looked nice in the jupyter output but terrible as a png. The reason is that when jupyter displays the image inline, it first recalculates the bounding box of the figure internally by adding bbox_inches='tight' to savefig. However, that means that the output figure no longer has the wanted size in inches. So, the better approach is to force jupyter to stop doing that. This needs a magic command:

%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}

Now, the internal display also looks wrong. Yey!

fig_noice = plt.figure(figsize=(3,2), dpi=500)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
Text(0, 0.5, 'height')
_images/Plots_179_1.svg

We can now either manually nudge the bounding box of the figure into the correct location or rely on matplotlib’s helper functions. To show the edge of the figure, I will also enable the figure edge here. Then we will see whether matplotlib is smart enough to get the shape right when we call figure.tight_layout():

fig_noice = plt.figure(figsize=(3,2), dpi=500, 
                       frameon=True, # draw the frame
                       edgecolor="black",# make it a black line
                       linewidth=.1 #give it non-zero width
                      )
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")

fig_noice.tight_layout()
_images/Plots_181_0.svg

This looks much better. However, sometimes we really want to use the full size of the figure without any border. We can use the pad argument of figure.tight_layout() to reduce the white space around the edges:

fig_noice = plt.figure(figsize=(3,2), dpi=500, 
                       frameon=True, # draw the frame
                       edgecolor="black",# make it a black line
                       linewidth=.1 #give it non-zero width
                      )
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")

fig_noice.tight_layout(pad=0)
_images/Plots_183_0.svg
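To check that an export really has the intended size, we can save the figure into memory and look at the pixel dimensions of the result. The following sketch uses made-up data, a lower resolution of 100 dpi to keep it fast, and a helper function exported_pixels that is our own invention.

```python
import io
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 2), dpi=100)
ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 4])   # made-up data
ax.set_xlabel("position")
ax.set_ylabel("height")
fig.tight_layout()

def exported_pixels(figure, **savefig_kwargs):
    """Save to an in-memory PNG and return (height, width) in pixels."""
    buffer = io.BytesIO()
    figure.savefig(buffer, format="png", dpi="figure", **savefig_kwargs)
    buffer.seek(0)
    return plt.imread(buffer, format="png").shape[:2]

# 2 inch x 100 dpi and 3 inch x 100 dpi
plain_size = exported_pixels(fig)
# bbox_inches="tight" recalculates the bounding box on export
tight_size = exported_pixels(fig, bbox_inches="tight")
print(plain_size, tight_size)
```

The first size follows directly from figsize and dpi. The second one depends on the content of the figure, which is why bbox_inches='tight' is not a good choice when a journal asks for an exact figure width.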

As a last resort, you can manually set the location of each axes using its set_position() method. Here, we pass the bounding box of the axes as a list that contains the coordinates of the bottom left corner and the width and height of the axes: [left, bottom, width, height], in fractions of the figure size.

fig_noice = plt.figure(figsize=(3,2), dpi=500, 
                       frameon=True, # draw the frame
                       edgecolor="black",# make it a black line
                       linewidth=.1 #give it non-zero width
                      )
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")

ax_data.set_position([.1, .2, .5, .5])
_images/Plots_185_0.svg

For fine tuning, you could first call .tight_layout(), then get the positions using ax.get_position() and slightly modify those values using .set_position().
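A minimal sketch of this fine tuning could look as follows. The data and the shift of 0.03 are arbitrary values chosen for illustration.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 2), dpi=100)
ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 4])   # made-up data
ax.set_xlabel("position")
ax.set_ylabel("height")

# let matplotlib find a good starting point
fig.tight_layout()
# the position is a bounding box in fractions of the figure size
pos = ax.get_position()

# move the left edge a bit to the right and keep the right edge in place
shift = 0.03   # arbitrary value
ax.set_position([pos.x0 + shift, pos.y0, pos.width - shift, pos.height])
new_pos = ax.get_position()
print(pos.bounds, new_pos.bounds)
```

Note that a later call to tight_layout() would overwrite the manual position again.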

5.7. Other things to check out in matplotlib

5.7.1. Changing styles using style sheets

Matplotlib has a kind of style sheet that allows us to define plot layouts beforehand and reuse them. The layouts can be activated using plt.style.use. Built-in styles can be found at https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html

To switch back to the default style, you can use plt.style.use('default').

plt.style.use("fivethirtyeight")

fig = plt.figure()
ax_box= fig.add_subplot()
ax_box.pie([33,33,33], 
           labels=["for the analysis", 
                   "to feel superior", 
                   "for Clare Malone's hot takes"],
          normalize=True)
ax_box.set_title("I listen to fivethirtyeight")

plt.style.use("default")
_images/Plots_190_0.svg
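If a style should only apply to a single figure, plt.style.context can be used in a with block, so that we do not have to switch back by hand. The sketch below uses the built-in "ggplot" style and made-up data; it reads one setting before, inside and after the block to show that the change is temporary.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# the names of all installed styles are stored in a list
print("fivethirtyeight" in plt.style.available)

color_before = plt.rcParams["axes.facecolor"]

# the style is only active inside the with block
with plt.style.context("ggplot"):
    color_inside = plt.rcParams["axes.facecolor"]
    fig = plt.figure()
    ax = fig.add_subplot()
    ax.plot([0, 1, 2], [1, 3, 2])   # made-up data

color_after = plt.rcParams["axes.facecolor"]
print(color_before, color_inside, color_after)
```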

5.7.2. xkcd mode

Matplotlib also has an “xkcd” mode that can be activated using plt.xkcd(). This mode makes all lines wobbly and replaces the font with a handwritten-looking font:

with plt.xkcd():
    fig = plt.figure()
    ax_bar= fig.add_subplot()
    ax_bar.bar(x=[-1,0,1], height=[1,2,3])
    ax_bar.set_title("Bars of height less than or equal to current bar")
    ax_bar.set_xticks([]);
_images/Plots_193_0.svg

5.7.3. Animations

Matplotlib plots can be animated using, for example, the FuncAnimation class. Here, we define a function that modifies our plot. FuncAnimation repeatedly calls this function with arguments taken from a list or iterator. The state of the figure after each call is recorded.

(The rc(...) command in the last line of the next cell makes sure that the animation is output to HTML.)

from matplotlib.animation import FuncAnimation


from matplotlib import rc
rc('animation', html='html5')

Our animation is pretty straightforward: we are going to create an x and y vector corresponding to the coordinates of a spiral. Our animation will plot ever longer segments of that spiral.

For each step of the animation, animfunc will be called with an incrementing integer. We will use this integer as the end of our slice into the x and y vectors and set the data of the line we are drawing to those coordinates:

# create time axis
t = np.linspace(0, 1, 500)

# x and y components of a spiral
spiral_x = t * 8 * np.cos(2*np.pi*10*t)
spiral_y = t * 8 * np.sin(2*np.pi*10*t)

# the animation function
def animfunc(idx):
    """sets line x and y data up to idx"""
    line.set_xdata(spiral_x[:idx])
    line.set_ydata(spiral_y[:idx])
    return (line,)

To run the animation, we create a new plot and add a line to it. By plotting the full vectors first, we ensure that the axes limits are big enough to hold the final spiral. Then we create the FuncAnimation object, which will start to run as soon as we display funcanim.

(fig.clf() clears the figure so it doesn’t show up under the cell twice.)

# create the figure and add a subplot
fig = plt.figure()
ax = fig.add_subplot()
# create a single line that will be animated
line = ax.plot(spiral_x,spiral_y)[0]

# square axes look nicer here
ax.set_aspect(1)

# create the animation
funcanim = FuncAnimation(fig, animfunc, range(len(t)),interval=50)

#display
display(funcanim)
fig.clf();
<Figure size 640x480 with 0 Axes>
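The 'html5' setting renders the animation as a video, which needs an external movie writer such as ffmpeg. If that is not installed, the to_jshtml() method is a possible alternative: it returns the animation as an HTML string with embedded frames and JavaScript controls. The sketch below is a shortened, self-contained variant of the spiral with fewer points and only ten frames; all numbers are chosen to keep it fast.

```python
import matplotlib
# non-interactive backend so the sketch also runs outside of Jupyter;
# inside a notebook this line can be left out
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# a shorter spiral than in the text above
t = np.linspace(0, 1, 200)
spiral_x = t * 8 * np.cos(2*np.pi*3*t)
spiral_y = t * 8 * np.sin(2*np.pi*3*t)

fig = plt.figure(figsize=(3, 3), dpi=50)
ax = fig.add_subplot()
# plot the full spiral first, so the axes limits fit the final frame
line = ax.plot(spiral_x, spiral_y)[0]
ax.set_aspect(1)

def update(idx):
    """show the spiral up to index idx"""
    line.set_data(spiral_x[:idx], spiral_y[:idx])
    return (line,)

# only every 20th index is used, which gives 10 frames
frames = range(0, len(t), 20)
anim = FuncAnimation(fig, update, frames=frames, interval=50)

# render all frames into one HTML string
html = anim.to_jshtml()
```

In a notebook, the string can be shown with HTML(html) after from IPython.display import HTML.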

5.8. Summary

5.8.1. Loading data:

Use np.genfromtxt to load data from CSV files. If it doesn’t work as expected, check the delimiters. More in the next section.
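As a quick reminder of what checking the delimiters means, here is a small sketch that reads made-up CSV text from memory instead of from a file. The column names and values are invented for illustration.

```python
import io
import numpy as np

# made-up file content with one header line
csv_text = """position,height_a,height_b
0.0,1.0,2.0
0.5,1.5,2.5
1.0,2.0,3.0
"""

# comma separated values, skipping the header line
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# the same data, but separated by semicolons
semicolon_text = csv_text.replace(",", ";")
# with the wrong delimiter, the values cannot be converted to numbers
wrong = np.genfromtxt(io.StringIO(semicolon_text), delimiter=",", skip_header=1)
# with the matching delimiter, we get the same array as before
right = np.genfromtxt(io.StringIO(semicolon_text), delimiter=";", skip_header=1)
print(raw.shape, wrong, right.shape)
```

An array full of nan after loading is therefore a typical sign that the delimiter does not match the file.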

5.8.2. Indexing

  • Square brackets indicate indexing

  • : are used to select ranges and step sizes
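The most common patterns on a small made-up array:

```python
import numpy as np

# made-up array with 4 rows and 3 columns
data = np.arange(12).reshape(4, 3)

# all rows, first column
first_column = data[:, 0]
# all rows, all columns starting from the second one
other_columns = data[:, 1:]
# every second row, all columns
every_second_row = data[::2]
# negative indices count from the end
last_row = data[-1]
print(first_column, other_columns.shape, every_second_row, last_row)
```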

5.8.3. Plotting

  1. create figure fig = plt.figure()

  2. create axes ax = fig.add_subplot()

  3. plot ax.plot() and much, much more.

  4. fig.savefig()

5.8.4. Getting help

The help() function prints the docstring.