5. Plot, plot, plots
Let’s jump right in: plotting. In this section, we will look at THE Python plotting package: matplotlib. We will also learn how to import some common data formats and turn measurement data into nice-looking plots.
We will get to know three external Python libraries in this chapter:
matplotlib: for plotting
numpy: for new data types that can handle arrays of data
scipy: for advanced operations on such arrays
“External libraries” are libraries that do not come with Python by default; they need to be installed first. If you are using Anaconda, you can install them with Anaconda’s package manager, conda (in the default Anaconda environment, these three libraries are typically already installed).
5.1. import
In the previous chapter, we already saw how packages/libraries are imported. Below, we see two extensions of the basic import syntax. First, packages can have sub-packages. In the case of matplotlib, we use the pyplot sub-package, which gives us easy access to plotting commands. Second, we can rename imports to save time on typing. In this case, we import matplotlib.pyplot and rename it as plt. Now, to call a function from matplotlib.pyplot, we can just type plt. followed by the function name.
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
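Both extensions work for any package, not only matplotlib. The following minimal sketch uses only Python’s standard library; xml.etree.ElementTree is simply a convenient sub-package chosen for illustration and has nothing to do with plotting. It shows that the short alias and the full dotted name refer to the very same module object:

```python
# import a sub-package and give it a short alias
import xml.etree.ElementTree as ET
# import the same sub-package under its full dotted name
import xml.etree.ElementTree

# both names point to the same module object
same_module = ET is xml.etree.ElementTree
# functions are then called through the alias: ET.<function name>
root = ET.fromstring("<data><point x='1'/></data>")
print(same_module, root.tag)
```

The alias is only a second name for the module, so plt.plot and matplotlib.pyplot.plot are likewise the same function.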
To enable interactive plotting inside the Jupyter notebook, we use the “magic” command %matplotlib widget. This makes plots zoomable and draggable (it requires the ipympl package to be installed).
# this is a magic command that only works in Jupyter/IPython
# it tells Jupyter how to display the plots
# widgets are interactive: plots can be zoomed, dragged and so on
# (remove the leading "#" on the next line to activate it)
#%matplotlib widget
# an alternative is
# %matplotlib inline
# which renders static images that can't be zoomed or dragged
5.2. The structure of a matplotlib Figure
In matplotlib, all our plots are contained within a figure. Each figure can hold one or more Axes that we can plot data to. The typical steps to create a plot are:
1. create figure
2. create one or more axes
3. plot data to axes
4. adjust layout
5. export
# 1. create figure
fig = plt.figure()
# 2. create Axes
ax1 = fig.add_subplot()
# 3. plot data: first argument here is the data for x,
# second for y
ax1.plot([1,2,3,4], [1,2,2,1], label="This is the first line")
# 4. adjust layout
ax1.set_xlabel("this is the x-axis")
ax1.set_ylabel("this is the y-axis")
# add a legend, automatically filled with everything we labeled
ax1.legend(loc="best")
# 5. export
fig.savefig("our_first_fig.png", dpi=300)
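The last step, savefig, chooses the output format from the file extension. Below is a minimal sketch of steps 1 to 5 outside Jupyter; it assumes the non-interactive “Agg” backend and writes into a temporary directory (both are choices made for this illustration, not part of the course setup), then checks the file signatures of a PNG and a PDF export:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot()
ax.plot([1, 2, 3, 4], [1, 2, 2, 1], label="This is the first line")
ax.legend(loc="best")

# savefig picks the output format from the file extension
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "our_first_fig.png")
pdf_path = os.path.join(out_dir, "our_first_fig.pdf")
fig.savefig(png_path, dpi=300)  # raster image: dpi sets the resolution
fig.savefig(pdf_path)           # vector graphic: dpi only matters for rasterized parts

# read the first bytes ("magic bytes") that identify each file type
with open(png_path, "rb") as f:
    png_header = f.read(8)
with open(pdf_path, "rb") as f:
    pdf_header = f.read(4)
plt.close(fig)
```

For publications, a vector format such as PDF is often the better choice, because it stays sharp at any zoom level.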
Instead of creating a single Axes, we can also place several Axes in one figure using the subplots() method. With nrows=2, the two Axes are stacked on top of each other:
fig = plt.figure()
# The new step, two subplots on top of eachother
axs = fig.subplots(nrows=2)
# We can select plots using indexing
axs[0].plot([1,2,3,4], [1,2,2,1])
axs[1].plot([1,1,0,0], [1,0,0,1]);
# We can also link (share) the x-axes of two Axes in a figure
axs[0].sharex(axs[1])
# if you are running this sheet in widget mode,
# try to move one of the plots around
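A common shortcut is plt.subplots, which creates the figure and the Axes in a single call and can share axes right away via sharex=True. The sketch below assumes the non-interactive “Agg” backend so that it runs outside Jupyter; it checks that setting the x-limits of one Axes also changes the other:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

# plt.subplots creates the figure and an array of Axes in one call;
# sharex=True links the x-axes immediately
fig, axs = plt.subplots(nrows=2, sharex=True)
axs[0].plot([1, 2, 3, 4], [1, 2, 2, 1])
axs[1].plot([1, 1, 0, 0], [1, 0, 0, 1])

# changing the x-limits of one Axes also changes the other one
axs[0].set_xlim(0, 5)
print(axs[1].get_xlim())
```

This is equivalent to the fig.subplots(nrows=2) plus sharex() approach above, just shorter.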
5.3. Loading data from a file
So far, so good. Of course, we likely don’t just want to display data we’ve typed in manually. More likely, the data is stored in some file on the disk. If the data is stored in a CSV file, we can easily load it using np.genfromtxt.
In a CSV file, data is saved as a table of human-readable numbers. Values in a row are separated (delimited) by some character that does not occur in the data itself. This could be a comma (,), a semicolon (;), a space or a TAB. genfromtxt strictly needs only the path to the file, but in most cases we also have to tell it the delimiter: by default, any run of whitespace (spaces or TABs) is considered a delimiter. genfromtxt returns the contents of the file as a numpy array.
We want to open the data tabulated in the CSV file “data/gaussians.csv”.
genfromtxt is a function provided by the numpy package that converts tabular text data into a numpy array. It has quite a few parameters that allow us to accommodate a wide range of different formats. Let’s have a look at the genfromtxt documentation to see which parameters it has:
help(np.genfromtxt)
Help on function genfromtxt in module numpy:
genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+,-./:;<=>?@[\\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes', *, like=None)
Load data from a text file, with missing values handled as specified.
Each line past the first `skip_header` lines is split at the `delimiter`
character, and characters following the `comments` character are discarded.
Parameters
----------
fname : file, str, pathlib.Path, list of str, generator
File, filename, list, or generator to read. If the filename
extension is `.gz` or `.bz2`, the file is first decompressed. Note
that generators must return byte strings. The strings
in a list or produced by a generator are treated as lines.
dtype : dtype, optional
Data type of the resulting array.
If None, the dtypes will be determined by the contents of each
column, individually.
comments : str, optional
The character used to indicate the start of a comment.
All the characters occurring on a line after a comment are discarded
delimiter : str, int, or sequence, optional
The string used to separate values. By default, any consecutive
whitespaces act as delimiter. An integer or sequence of integers
can also be provided as width(s) of each field.
skiprows : int, optional
`skiprows` was removed in numpy 1.10. Please use `skip_header` instead.
skip_header : int, optional
The number of lines to skip at the beginning of the file.
skip_footer : int, optional
The number of lines to skip at the end of the file.
converters : variable, optional
The set of functions that convert the data of a column to a value.
The converters can also be used to provide a default value
for missing data: ``converters = {3: lambda s: float(s or 0)}``.
missing : variable, optional
`missing` was removed in numpy 1.10. Please use `missing_values`
instead.
missing_values : variable, optional
The set of strings corresponding to missing data.
filling_values : variable, optional
The set of values to be used as default when the data are missing.
usecols : sequence, optional
Which columns to read, with 0 being the first. For example,
``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
names : {None, True, str, sequence}, optional
If `names` is True, the field names are read from the first line after
the first `skip_header` lines. This line can optionally be proceeded
by a comment delimiter. If `names` is a sequence or a single-string of
comma-separated names, the names will be used to define the field names
in a structured dtype. If `names` is None, the names of the dtype
fields will be used, if any.
excludelist : sequence, optional
A list of names to exclude. This list is appended to the default list
['return','file','print']. Excluded names are appended an underscore:
for example, `file` would become `file_`.
deletechars : str, optional
A string combining invalid characters that must be deleted from the
names.
defaultfmt : str, optional
A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
Whether to automatically strip white spaces from the variables.
replace_space : char, optional
Character(s) used in replacement of white spaces in the variables
names. By default, use a '_'.
case_sensitive : {True, False, 'upper', 'lower'}, optional
If True, field names are case sensitive.
If False or 'upper', field names are converted to upper case.
If 'lower', field names are converted to lower case.
unpack : bool, optional
If True, the returned array is transposed, so that arguments may be
unpacked using ``x, y, z = genfromtxt(...)``. When used with a
structured data-type, arrays are returned for each field.
Default is False.
usemask : bool, optional
If True, return a masked array.
If False, return a regular array.
loose : bool, optional
If True, do not raise errors for invalid values.
invalid_raise : bool, optional
If True, an exception is raised if an inconsistency is detected in the
number of columns.
If False, a warning is emitted and the offending lines are skipped.
max_rows : int, optional
The maximum number of rows to read. Must not be used with skip_footer
at the same time. If given, the value must be at least 1. Default is
to read the entire file.
.. versionadded:: 1.10.0
encoding : str, optional
Encoding used to decode the inputfile. Does not apply when `fname` is
a file object. The special value 'bytes' enables backward compatibility
workarounds that ensure that you receive byte arrays when possible
and passes latin1 encoded strings to converters. Override this value to
receive unicode arrays and pass strings as input to converters. If set
to None the system default is used. The default value is 'bytes'.
.. versionadded:: 1.14.0
like : array_like
Reference object to allow the creation of arrays which are not
NumPy arrays. If an array-like passed in as ``like`` supports
the ``__array_function__`` protocol, the result will be defined
by it. In this case, it ensures the creation of an array object
compatible with that passed in via this argument.
.. note::
The ``like`` keyword is an experimental feature pending on
acceptance of :ref:`NEP 35 <NEP35>`.
.. versionadded:: 1.20.0
Returns
-------
out : ndarray
Data read from the text file. If `usemask` is True, this is a
masked array.
See Also
--------
numpy.loadtxt : equivalent function when no data is missing.
Notes
-----
* When spaces are used as delimiters, or when no delimiter has been given
as input, there should not be any missing data between two fields.
* When the variables are named (either by a flexible dtype or with `names`),
there must not be any header in the file (else a ValueError
exception is raised).
* Individual values are not stripped of spaces by default.
When using a custom converter, make sure the function does remove spaces.
References
----------
.. [1] NumPy User Guide, section `I/O with NumPy
<https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html>`_.
Examples
--------
>>> from io import StringIO
>>> import numpy as np
Comma delimited file with mixed dtype
>>> s = StringIO(u"1,1.3,abcde")
>>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
... ('mystring','S5')], delimiter=",")
>>> data
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
Using dtype = None
>>> _ = s.seek(0) # needed for StringIO example only
>>> data = np.genfromtxt(s, dtype=None,
... names = ['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
Specifying dtype and names
>>> _ = s.seek(0)
>>> data = np.genfromtxt(s, dtype="i8,f8,S5",
... names=['myint','myfloat','mystring'], delimiter=",")
>>> data
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
An example with fixed-width columns
>>> s = StringIO(u"11.3abcde")
>>> data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
... delimiter=[1,3,5])
>>> data
array((1, 1.3, b'abcde'),
dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', 'S5')])
An example to show comments
>>> f = StringIO('''
... text,# of chars
... hello world,11
... numpy,5''')
>>> np.genfromtxt(f, dtype='S12,S12', delimiter=',')
array([(b'text', b''), (b'hello world', b'11'), (b'numpy', b'5')],
dtype=[('f0', 'S12'), ('f1', 'S12')])
In the docstring above, all arguments except fname are marked as optional, meaning we only need to pass them if the default value doesn’t work for us. Let’s see what happens if we only pass the filename.
raw_data = np.genfromtxt("data/gaussians.csv")
raw_data
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
Well, that doesn’t look great. All these nan (“not a number”) values show us that genfromtxt wasn’t able to convert anything into numbers.
Let’s look at the first few lines of the CSV file to figure out what went wrong. You could also open the file in a text editor, or even in Jupyter; to keep everything in one place, we will use Python here.
We open the file as a text file and print out its first two lines. The syntax for safely opening a file in Python uses a so-called “context manager”, or with block. Discussing the intricacies of context managers goes beyond the scope of this course, but we don’t need to understand them in detail to use them. Just remember that files are opened in Python in the following way:
with open("data/gaussians.csv", "r") as f:
# we only have access to the file in this block
# it is closed, as soon as we leave it
print("first line:", f.readline())
print("second line:",f.readline())
first line: # first column contains x values, other columns y values
second line: -5.000000000000000000e+01,1.663668552769293465e-44,2.107693412240174823e-27,9.230723952874861570e-19,8.396961391172795506e-14,1.064264733787643876e-10,1.242708024595863142e-08,3.458050799114254380e-07,3.858869813994044932e-06,2.341178771255181268e-05,9.315909810606924179e-05,2.743190629113324212e-04,6.478774868841780934e-04,1.297114829420015344e-03,2.289187047352660240e-03,3.663127777746836081e-03
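To see what the with block does for us, here is a small sketch that writes a made-up two-line file (a hypothetical example.csv in a temporary directory, not the course data) and then checks the file’s closed attribute inside and after the block:

```python
import os
import tempfile

# write a small, hypothetical CSV file into a temporary directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.csv")
with open(path, "w") as f:
    f.write("# header comment\n")
    f.write("1.0,2.0\n")

with open(path, "r") as f:
    first_line = f.readline()  # readline keeps the trailing newline
    inside_closed = f.closed   # False: we are still inside the block

after_closed = f.closed        # True: leaving the block closed the file
print(repr(first_line), inside_closed, after_closed)
```

The file is closed automatically when the block ends, even if an error occurs inside it, which is why this pattern is considered the safe way to open files.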
Looking back at the docstring for genfromtxt, we see that by default any run of whitespace acts as the delimiter. However, our file uses commas, so we need to pass delimiter="," to the function call. Since the first line begins with a #, genfromtxt treats it as a comment and automatically disregards it.
We will take another look at loading data in the next section.
raw_data = np.genfromtxt("data/gaussians.csv",
delimiter=",")
raw_data
array([[-5.00000000e+01, 1.66366855e-44, 2.10769341e-27, ...,
1.29711483e-03, 2.28918705e-03, 3.66312778e-03],
[-4.90000000e+01, 8.72716035e-43, 2.31304455e-26, ...,
1.58733595e-03, 2.73567600e-03, 4.29184782e-03],
[-4.80000000e+01, 4.22605890e-41, 2.41848193e-25, ...,
1.93458473e-03, 3.25750237e-03, 5.01241265e-03],
...,
[ 4.80000000e+01, 4.22605890e-41, 2.41848193e-25, ...,
1.93458473e-03, 3.25750237e-03, 5.01241265e-03],
[ 4.90000000e+01, 8.72716035e-43, 2.31304455e-26, ...,
1.58733595e-03, 2.73567600e-03, 4.29184782e-03],
[ 5.00000000e+01, 1.66366855e-44, 2.10769341e-27, ...,
1.29711483e-03, 2.28918705e-03, 3.66312778e-03]])
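The same effect can be reproduced on a tiny in-memory example. The three-row CSV text below is made up to mimic the layout of data/gaussians.csv (a comment line, then comma-separated x and y values); io.StringIO lets genfromtxt read it like a file:

```python
from io import StringIO

import numpy as np

# a tiny, made-up stand-in for a file like data/gaussians.csv
csv_text = """# first column contains x values, other columns y values
-1.0,0.1,0.2
0.0,0.5,0.6
1.0,0.1,0.2
"""

# default delimiter (whitespace): each line is one field that is not a number -> nan
wrong = np.genfromtxt(StringIO(csv_text))
# comma delimiter: the '#' line is skipped as a comment, numbers are parsed
right = np.genfromtxt(StringIO(csv_text), delimiter=",")
print(wrong)
print(right)
```

With the default delimiter we get one nan per line, just like the 101 nan values above; with delimiter="," we get a 3 x 3 array.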
This already looks much better. Luckily, the comment in the first line of the file also tells us how to interpret the data: the first column holds the x-values, and the following columns hold y-values.
raw_data is a numpy ndarray, an efficient way to store multidimensional data of a single type (in this case, floats).
type(raw_data)
numpy.ndarray
The .shape attribute of a numpy array shows its shape, i.e. its size along each dimension:
raw_data.shape
(101, 16)
This array has 101 rows and 16 columns. To select a specific row or column, we use indexing (as with tuples and lists). The difference is that ndarrays support multidimensional indexing and slicing.
For example, to select the first column, we use a colon for the first index (to select all rows) and 0 for the second index. That returns our x-axis:
raw_data[:,0]
array([-50., -49., -48., -47., -46., -45., -44., -43., -42., -41., -40.,
-39., -38., -37., -36., -35., -34., -33., -32., -31., -30., -29.,
-28., -27., -26., -25., -24., -23., -22., -21., -20., -19., -18.,
-17., -16., -15., -14., -13., -12., -11., -10., -9., -8., -7.,
-6., -5., -4., -3., -2., -1., 0., 1., 2., 3., 4.,
5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15.,
16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37.,
38., 39., 40., 41., 42., 43., 44., 45., 46., 47., 48.,
49., 50.])
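Multidimensional indexing is easiest to follow on a small array whose values we can read off directly. This sketch uses a made-up 3 x 4 array instead of raw_data; the pattern [rows, columns] is the same:

```python
import numpy as np

# a small example array with 3 rows and 4 columns
a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_col = a[:, 0]   # all rows, column 0 -> [0, 4, 8]
second_row = a[1, :]  # row 1, all columns -> [4, 5, 6, 7]
y_cols = a[:, 1:]     # all rows, columns 1 to the end, shape (3, 3)
element = a[2, 3]     # a single element   -> 11
print(first_col, second_row, y_cols.shape, element)
```

The slice a[:, 1:] is the same pattern we will use below to select all y-columns of raw_data at once.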
Let’s write a for loop that plots all the columns of the dataset. First, we create a figure and add axes. Then we loop over all columns, starting at column 1. Column 0 contains the x-axis values, so we pass it as x to all plot commands, and the column selected by the current index is passed as y.
We also label each line using .format string formatting, and then display the legend using ax.legend().
# Create figure and axes
fig = plt.figure()
ax_data = fig.add_subplot()
# we start at 1 because column 0 is the x-axis
# we use the shape of the array as the upper end of the range
for col_idx in range(1,raw_data.shape[1]):
ax_data.plot(raw_data[:,0],
raw_data[:,col_idx],
label="Col {}".format(col_idx))
ax_data.legend();
If we don’t care about the labels, we can take a shortcut to the same plot by passing a slice as the second argument. In that case, every column of the sliced 2D array is plotted as a separate line.
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
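We can check that this shortcut really creates one line per column: plot returns a list of Line2D objects. The sketch below assumes the non-interactive “Agg” backend and uses made-up data with three y-columns instead of raw_data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# made-up stand-in data: x in column 0, three y-columns after it
x = np.linspace(-5, 5, 11)
data = np.column_stack([x, x**2, np.abs(x), np.exp(-x**2)])

fig = plt.figure()
ax = fig.add_subplot()
# plot returns a list of Line2D objects, one per column of the 2D y array
lines = ax.plot(data[:, 0], data[:, 1:])
print(len(lines))  # 3
```

Because the lines are returned, we can still label or restyle them afterwards, e.g. lines[0].set_label("first column").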
5.4. Plot Layout
5.4.1. Colors
Now that these basics are out of the way, we can go back to our plot. Let’s talk about lines. By default, matplotlib cycles through a series of colors for plotting. This can be overridden with the color argument. The possible choices for colors are listed in the matplotlib docs. In short: RGB values can be given as hex strings starting with a “#”, e.g. the TUW color is “#006699”. Some frequently used colors are available as single-letter abbreviations (‘b’, ‘g’, ‘r’, ‘c’, ‘m’, ‘y’, ‘k’, ‘w’); in addition, there are named colors from the CSS4 standard and from the xkcd color survey (the latter need to be prefixed with “xkcd:”).
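To check whether a color specification is valid, or to see which RGB value it stands for, the matplotlib.colors module can be used directly. A short sketch (the specific colors are just examples):

```python
from matplotlib import colors as mcolors

# single-letter abbreviation converted to a hex string
print(mcolors.to_hex("r"))                   # #ff0000
# hex string converted to RGB fractions between 0 and 1
print(mcolors.to_rgb("#006699"))
# xkcd names are valid only with the "xkcd:" prefix
print(mcolors.is_color_like("xkcd:banana"))  # True
print(mcolors.is_color_like("banana"))       # False
```

is_color_like is handy for catching typos in color names before passing them to plot.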
We can also label lines by using the label argument. This argument takes a string that is then displayed, together with the line, when we create a legend. Let’s draw some lines and try out these options:
# create figure and add axes
fig = plt.figure()
ax_data = fig.add_subplot()
# let's make colorful lines
ax_data.plot(raw_data[:,0],raw_data[:,2], # x and y values
color="r", # set the line color
label="this line is red") # and add a label
ax_data.plot(raw_data[:,0],raw_data[:,3], # again x and y
color="#006699", # Actual RGB code from TUW CI guideline
label="this line is TUW blue")
ax_data.plot(raw_data[:,0],raw_data[:,4], # x and y
color="xkcd:banana", # using color names from the xkcd survey
label="this line is banana")
# and adding a legend
ax_data.legend()
5.4.2. Linestyles and Markers
Let’s look at the docstring of ax_data.plot to see what else we can do:
help(ax_data.plot)
Help on method plot in module matplotlib.axes._axes:
plot(*args, scalex=True, scaley=True, data=None, **kwargs) method of matplotlib.axes._subplots.AxesSubplot instance
Plot y versus x as lines and/or markers.
Call signatures::
plot([x], y, [fmt], *, data=None, **kwargs)
plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
The coordinates of the points or line nodes are given by *x*, *y*.
The optional parameter *fmt* is a convenient way for defining basic
formatting like color, marker and linestyle. It's a shortcut string
notation described in the *Notes* section below.
>>> plot(x, y) # plot x and y using default line style and color
>>> plot(x, y, 'bo') # plot x and y using blue circle markers
>>> plot(y) # plot y using x as index array 0..N-1
>>> plot(y, 'r+') # ditto, but with red plusses
You can use `.Line2D` properties as keyword arguments for more
control on the appearance. Line properties and *fmt* can be mixed.
The following two calls yield identical results:
>>> plot(x, y, 'go--', linewidth=2, markersize=12)
>>> plot(x, y, color='green', marker='o', linestyle='dashed',
... linewidth=2, markersize=12)
When conflicting with *fmt*, keyword arguments take precedence.
**Plotting labelled data**
There's a convenient way for plotting objects with labelled data (i.e.
data that can be accessed by index ``obj['y']``). Instead of giving
the data in *x* and *y*, you can provide the object in the *data*
parameter and just give the labels for *x* and *y*::
>>> plot('xlabel', 'ylabel', data=obj)
All indexable objects are supported. This could e.g. be a `dict`, a
`pandas.DataFrame` or a structured numpy array.
**Plotting multiple sets of data**
There are various ways to plot multiple sets of data.
- The most straight forward way is just to call `plot` multiple times.
Example:
>>> plot(x1, y1, 'bo')
>>> plot(x2, y2, 'go')
- Alternatively, if your data is already a 2d array, you can pass it
directly to *x*, *y*. A separate data set will be drawn for every
column.
Example: an array ``a`` where the first column represents the *x*
values and the other columns are the *y* columns::
>>> plot(a[0], a[1:])
- The third way is to specify multiple sets of *[x]*, *y*, *[fmt]*
groups::
>>> plot(x1, y1, 'g^', x2, y2, 'g-')
In this case, any additional keyword argument applies to all
datasets. Also this syntax cannot be combined with the *data*
parameter.
By default, each line is assigned a different style specified by a
'style cycle'. The *fmt* and line property parameters are only
necessary if you want explicit deviations from these defaults.
Alternatively, you can also change the style cycle using
:rc:`axes.prop_cycle`.
Parameters
----------
x, y : array-like or scalar
The horizontal / vertical coordinates of the data points.
*x* values are optional and default to ``range(len(y))``.
Commonly, these parameters are 1D arrays.
They can also be scalars, or two-dimensional (in that case, the
columns represent separate data sets).
These arguments cannot be passed as keywords.
fmt : str, optional
A format string, e.g. 'ro' for red circles. See the *Notes*
section for a full description of the format strings.
Format strings are just an abbreviation for quickly setting
basic line properties. All of these and more can also be
controlled by keyword arguments.
This argument cannot be passed as keyword.
data : indexable object, optional
An object with labelled data. If given, provide the label names to
plot in *x* and *y*.
.. note::
Technically there's a slight ambiguity in calls where the
second label is a valid *fmt*. ``plot('n', 'o', data=obj)``
could be ``plt(x, y)`` or ``plt(y, fmt)``. In such cases,
the former interpretation is chosen, but a warning is issued.
You may suppress the warning by adding an empty format string
``plot('n', 'o', '', data=obj)``.
Returns
-------
list of `.Line2D`
A list of lines representing the plotted data.
Other Parameters
----------------
scalex, scaley : bool, default: True
These parameters determine if the view limits are adapted to the
data limits. The values are passed on to `autoscale_view`.
**kwargs : `.Line2D` properties, optional
*kwargs* are used to specify properties like a line label (for
auto legends), linewidth, antialiasing, marker face color.
Example::
>>> plot([1, 2, 3], [1, 2, 3], 'go-', label='line 1', linewidth=2)
>>> plot([1, 2, 3], [1, 4, 9], 'rs', label='line 2')
If you make multiple lines with one plot call, the kwargs
apply to all those lines.
Here is a list of available `.Line2D` properties:
Properties:
agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array
alpha: float or None
animated: bool
antialiased or aa: bool
clip_box: `.Bbox`
clip_on: bool
clip_path: Patch or (Path, Transform) or None
color or c: color
contains: unknown
dash_capstyle: {'butt', 'round', 'projecting'}
dash_joinstyle: {'miter', 'round', 'bevel'}
dashes: sequence of floats (on/off ink in points) or (None, None)
data: (2, N) array or two 1D arrays
drawstyle or ds: {'default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'}, default: 'default'
figure: `.Figure`
fillstyle: {'full', 'left', 'right', 'bottom', 'top', 'none'}
gid: str
in_layout: bool
label: object
linestyle or ls: {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
linewidth or lw: float
marker: marker style string, `~.path.Path` or `~.markers.MarkerStyle`
markeredgecolor or mec: color
markeredgewidth or mew: float
markerfacecolor or mfc: color
markerfacecoloralt or mfcalt: color
markersize or ms: float
markevery: None or int or (int, int) or slice or List[int] or float or (float, float) or List[bool]
path_effects: `.AbstractPathEffect`
picker: unknown
pickradius: float
rasterized: bool or None
sketch_params: (scale: float, length: float, randomness: float)
snap: bool or None
solid_capstyle: {'butt', 'round', 'projecting'}
solid_joinstyle: {'miter', 'round', 'bevel'}
transform: `matplotlib.transforms.Transform`
url: str
visible: bool
xdata: 1D array
ydata: 1D array
zorder: float
See Also
--------
scatter : XY scatter plot with markers of varying size and/or color (
sometimes also called bubble chart).
Notes
-----
**Format Strings**
A format string consists of a part for color, marker and line::
fmt = '[marker][line][color]'
Each of them is optional. If not provided, the value from the style
cycle is used. Exception: If ``line`` is given, but no ``marker``,
the data will be a line without markers.
Other combinations such as ``[color][marker][line]`` are also
supported, but note that their parsing may be ambiguous.
**Markers**
============= ===============================
character description
============= ===============================
``'.'`` point marker
``','`` pixel marker
``'o'`` circle marker
``'v'`` triangle_down marker
``'^'`` triangle_up marker
``'<'`` triangle_left marker
``'>'`` triangle_right marker
``'1'`` tri_down marker
``'2'`` tri_up marker
``'3'`` tri_left marker
``'4'`` tri_right marker
``'s'`` square marker
``'p'`` pentagon marker
``'*'`` star marker
``'h'`` hexagon1 marker
``'H'`` hexagon2 marker
``'+'`` plus marker
``'x'`` x marker
``'D'`` diamond marker
``'d'`` thin_diamond marker
``'|'`` vline marker
``'_'`` hline marker
============= ===============================
**Line Styles**
============= ===============================
character description
============= ===============================
``'-'`` solid line style
``'--'`` dashed line style
``'-.'`` dash-dot line style
``':'`` dotted line style
============= ===============================
Example format strings::
'b' # blue markers with default shape
'or' # red circles
'-g' # green solid line
'--' # dashed line with default color
'^k:' # black triangle_up markers connected by a dotted line
**Colors**
The supported color abbreviations are the single letter codes
============= ===============================
character color
============= ===============================
``'b'`` blue
``'g'`` green
``'r'`` red
``'c'`` cyan
``'m'`` magenta
``'y'`` yellow
``'k'`` black
``'w'`` white
============= ===============================
and the ``'CN'`` colors that index into the default property cycle.
If the color is the only part of the format string, you can
additionally use any `matplotlib.colors` spec, e.g. full names
(``'green'``) or hex strings (``'#008000'``).
First, I want to focus on linestyle, linewidth and marker:
linestyle: selects whether the line is solid (default, -), dashed (--), dotted (:) or dash-dotted (-.)
linewidth: the width of the line in points
marker: places a symbol at every data point of your line. The full list of possible symbols is given in the docstring.
Here are some examples:
fig = plt.figure()
ax_data = fig.add_subplot()
# let's make colorful lines
ax_data.plot(raw_data[:,0],raw_data[:,2],
color="xkcd:ugly pink", #using xkcd names
# this sets the linewidth to 10pts
linewidth=10,
label="a fat, ugly pink line")
ax_data.plot(raw_data[:,0],raw_data[:,5],
# color entered here as a name
color="purple",
# add markers at each datapoint
marker="x",
label="this is a line with crosses in purple")
ax_data.plot(raw_data[:,0],raw_data[:,9],
# set the linestyle to dashed "--"
linestyle="--",
# and the linewidth to 4pts
linewidth=4,
# the order of keyword arguments doesn't matter
color="xkcd:banana",
label="this is a dashed, banana line")
ax_data.plot(raw_data[:,0],raw_data[:,15],
# remove the line, so that only the markers are shown
linestyle="none",
# asterisk markers
marker="*",
color="gold",
label="gold stars")
# and add a legend
ax_data.legend();
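The examples above spell out every property as a keyword argument. The docstring also describes a compact fmt string, e.g. "go--" for green circle markers on a dashed line. This sketch (assuming the non-interactive “Agg” backend and made-up data) compares the marker and line style that both spellings produce:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 2, 2, 1]

fig = plt.figure()
ax = fig.add_subplot()
# fmt shorthand: green ('g'), circle markers ('o'), dashed line ('--')
line_fmt, = ax.plot(x, y, "go--", linewidth=2, markersize=12)
# the same settings spelled out as keyword arguments
line_kw, = ax.plot(x, y, color="green", marker="o", linestyle="dashed",
                   linewidth=2, markersize=12)

print(line_fmt.get_marker(), line_fmt.get_linestyle())  # o --
print(line_kw.get_marker(), line_kw.get_linestyle())    # o --
```

The fmt string is convenient for quick plots; keyword arguments are easier to read and cover many more properties (see the list of Line2D properties above).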
5.4.3. Formatting axes
We already know how to label the axes of our plots. However, to produce publication-ready plots, we also need to control the axis limits, the tick positions and spacings, and the axis scale (linear or logarithmic).
Axis limits are set using ax.set_xlim(<left edge>, <right edge>) and ax.set_ylim(<bottom edge>, <top edge>). If you don’t care what the axis limits are but just want the axis to be inverted (IR spectroscopists, represent!), you can use ax.set_xlim(ax.get_xlim()[::-1]).
You can either place axis ticks manually using ax.set_xticks([<tick positions>]), or you can use the automatic locators:
plt.MultipleLocator(<number>): places one tick at every multiple of <number>
plt.LinearLocator(): places a fixed number of evenly spaced ticks along the axis; the optional numticks argument sets how many
plt.MaxNLocator(<nbins>, <steps>): places at most <nbins> + 1 ticks (i.e. at most <nbins> intervals) at “nice” positions, choosing the tick spacing from the multiples listed in <steps>
plt.LogLocator(<base>): places ticks for a logarithmic axis
A locator can be assigned either as the major or as the minor locator. Major ticks are labeled with numbers by default; minor ticks are smaller and unlabeled. Locators are set using the set_major_locator and set_minor_locator methods of ax.xaxis and ax.yaxis.
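Each locator can also be asked directly which tick positions it would propose for a given range, via its tick_values(vmin, vmax) method. The plt.MultipleLocator etc. used in this section are the classes from matplotlib.ticker. Since locators may also return a few ticks just outside the range, the small helper ticks_inside below (a name made up for this sketch) keeps only the visible ones:

```python
import numpy as np
from matplotlib import ticker  # plt.MultipleLocator etc. are these same classes


def ticks_inside(locator, vmin, vmax):
    """Return the tick positions a locator proposes within [vmin, vmax]."""
    ticks = np.asarray(locator.tick_values(vmin, vmax))
    return ticks[(ticks >= vmin) & (ticks <= vmax)]


multiple = ticks_inside(ticker.MultipleLocator(10), -50, 50)
linear = ticks_inside(ticker.LinearLocator(numticks=5), 0, 1)
max_n = ticks_inside(ticker.MaxNLocator(5), 0, 1)

print(multiple)  # every multiple of 10 between -50 and 50
print(linear)    # 5 evenly spaced ticks between 0 and 1
print(max_n)     # at most 5 intervals, i.e. at most 6 ticks
```

This is a quick way to try out locator settings before attaching them to an axis with set_major_locator or set_minor_locator.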
Let’s start with the manual way, using set_xticks:
# create figure and axes, then plot one line. There is nothing new here.
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,2],
color="xkcd:ugly pink",
linewidth=10,
label="a fat, ugly pink line")
# this command sets the ticks
ax_data.set_xticks([-20,0,20]);
The manual way of setting tick locations has the advantage of allowing us to get fancy with tick labels (because we already know where the ticks are). Here is an example:
# setting tick labels
ax_data.set_xticklabels(["minus twenty", "center", "plus twenty"])
fig
Now, let’s look at the automated version. Locators still give us some control over the layout, but make it very easy to set up nice-looking ticks. A MultipleLocator puts a tick at every multiple of a given number. A MaxNLocator puts at most N + 1 ticks (N intervals) at “nice” locations along the axis.
The major locator of an axis gets larger ticks and tick labels, the minor locator only gets smaller ticks.
# create a new figure again
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,2],
color="xkcd:ugly pink",
linewidth=10,
label="a fat, ugly pink line")
# we add locators to the xaxis
ax_data.xaxis.set_major_locator(plt.MultipleLocator(10))
ax_data.xaxis.set_minor_locator(plt.MultipleLocator(1))
# and to the yaxis
ax_data.yaxis.set_major_locator(plt.MaxNLocator(5))
ax_data.yaxis.set_minor_locator(plt.MaxNLocator(50))
#and then display the figure
fig
The set_xlim and set_ylim methods mentioned above change the limits of our axes. If you are happy with the limits of an axis but want to invert its direction, you can use .invert_xaxis() or .invert_yaxis() (IR spectroscopy, represent!).
# shifting the limits
ax_data.set_xlim(-20,10)
# flipping the direction of the yaxis
ax_data.invert_yaxis()
fig
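The effect of these calls can be read back with get_xlim. This sketch (assuming the non-interactive “Agg” backend and a made-up line) sets the limits, inverts the axis, and then flips it back with the manual set_xlim(get_xlim()[::-1]) trick from the beginning of this section:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot()
ax.plot([-50, 0, 50], [0, 1, 0])

ax.set_xlim(-20, 10)             # set left and right edge explicitly
limits = ax.get_xlim()           # read the current limits back
ax.invert_xaxis()                # keep the limits, flip the direction
inverted_limits = ax.get_xlim()  # now (10, -20)

# equivalent manual way: reverse the current limits (flips back here)
ax.set_xlim(ax.get_xlim()[::-1])
print(limits, inverted_limits, ax.get_xlim())
```

invert_xaxis and invert_yaxis toggle the direction, so calling them twice restores the original orientation.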
If you want to change the color scheme or line styles in a plot without setting them manually in every .plot call, you can define a cycler to do it for you.
When you want to modify several parameters, you can “add” cyclers together. Adding combines them element by element, so both cyclers must have the same length. Here, for example, we create a cycle that goes through the TU Wien corporate design colors once with solid lines and then a second time with dotted lines:
# import the cycler
from cycler import cycler
# create a figure and add a subplot (we know this)
figure_TU_colors = plt.figure()
ax_data = figure_TU_colors.add_subplot()
# create lists for colors and line styles
TU_colors = ["#5485AB", "#ba4682", "#E18922", "#646363",
"#72add5", "#cd81a8", "#eeb473", "#9d9d9c"]
linestyles = ["-"]*len(TU_colors) + [":"]*len(TU_colors)
# list multiplication is used to make both lists the same length
my_prop_cycler = (cycler(color=TU_colors*2)+\
cycler(linestyle=linestyles))
# the new step: set the prop cycler
ax_data.set_prop_cycle(my_prop_cycler)
# plot all data in one go
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
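Adding two cyclers pairs them up entry by entry, which is why both lists above must have the same length. A quick sketch with only two of the colors (the shortened lists are purely illustrative) prints the property combinations that set_prop_cycle hands out, one per plotted line:

```python
from cycler import cycler

# shortened, illustrative lists standing in for the TU colors above
colors = ["#5485AB", "#ba4682"]
styles = ["-"] * len(colors) + [":"] * len(colors)

# "+" zips the two cyclers; both must have the same length
combined = cycler(color=colors * 2) + cycler(linestyle=styles)

for props in combined:
    print(props)  # one dict of line properties per plotted line
```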
5.4.4. Annotating plots
The ax.annotate function places a text label and an optional arrow in your plot. It takes two coordinates: the location of the point we want to annotate, xy, passed as a tuple, and the location of the text, xytext. If we only pass xy, the text is placed right at that point. If we also pass xytext, the text is placed there instead, and an arrow from the text to xy is drawn when we additionally pass arrowprops.
For both parameters, we can also choose a coordinate system, passed in the xycoords and textcoords parameters (see docstring):
| Value | Description |
|---|---|
| 'figure points' | Points from the lower left of the figure |
| 'figure pixels' | Pixels from the lower left of the figure |
| 'figure fraction' | Fraction of figure from lower left |
| 'axes points' | Points from lower left corner of axes |
| 'axes pixels' | Pixels from lower left corner of axes |
| 'axes fraction' | Fraction of axes from lower left |
| 'data' | Use the coordinate system of the object being annotated (default) |
For textcoords, we can also choose:

| Value | Description |
|---|---|
| 'offset points' | Offset (in points) from the xy value |
| 'offset pixels' | Offset (in pixels) from the xy value |
Furthermore, we can choose the horizontal and vertical alignment of the text (horizontalalignment or ha, verticalalignment or va), i.e. how it is positioned relative to its anchor point.
Here are some examples for annotations:
# create the figure
fig = plt.figure()
ax_data = fig.add_subplot()
ax_data.set_ylim(0,.5)
ax_data.plot(raw_data[:,0],raw_data[:,1],
color="xkcd:ugly pink",
linewidth=10,
label="a fat, ugly pink line")
# basic annotation placed at xy
ax_data.annotate("basic annotation" , xy=(-20, .3))
fig
# horizontal and vertical alignment position the text on top of the point
ax_data.annotate("annotation placed on top of point" ,
xy=(0, .45),
verticalalignment="bottom",
horizontalalignment="center")
fig
# when we add an xytext argument,
# then text is placed at xytext
# and the arrow points towards xy
ax_data.annotate("annotation with arrow" ,
xy=(0, .45),
xytext=(0.1, 0.5),
textcoords="axes fraction", # this text will always be in the same position of the axes
arrowprops={"arrowstyle":"->"})
fig
# arrowprops allow bent arrows to point from xytext to xy
ax_data.annotate("annotation with arrow and fancier line" ,
xy=(0, .45),
xytext=(0.6, 0.5),
textcoords="axes fraction",
arrowprops={"arrowstyle":"->", "connectionstyle":"angle3"})
fig
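Whether an arrow is drawn depends on arrowprops, not on xytext. This small sketch never displays its figure; it only inspects the arrow_patch attribute of the returned annotation objects (positions copied from the examples above):

```python
from matplotlib.figure import Figure

# a figure that is never shown; we only look at the annotation objects
fig = Figure()
ax = fig.add_subplot()

# xytext but no arrowprops: the text moves, but no arrow is created
no_arrow = ax.annotate("no arrow", xy=(0, 0.45), xytext=(0.1, 0.5),
                       textcoords="axes fraction")
# the same call with arrowprops: an arrow from the text to xy is added
with_arrow = ax.annotate("with arrow", xy=(0, 0.45), xytext=(0.1, 0.5),
                         textcoords="axes fraction",
                         arrowprops={"arrowstyle": "->"})

print(no_arrow.arrow_patch)             # None
print(with_arrow.arrow_patch)           # a FancyArrowPatch
print(with_arrow.xy, with_arrow.xyann)  # annotated point, text position
```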
5.4.5. Mathematical expressions in plot labels/legends
You can use \(\LaTeX\) expressions to label plots in matplotlib. Whenever you pass a string that will be displayed as text in matplotlib, you can use $...$ to mark the parts that should be rendered as a \(\LaTeX\) formula.
In \(\LaTeX\) formulas, you can write subscripts and superscripts using _ and ^, respectively. If you want to sub/superscript multiple symbols, you can group them in {} curly braces. Greek letters are available as \alpha to \omega for lower case and \Alpha to \Omega for upper case.
There is one caveat: backslashes \ in python strings are interpreted as escape characters that encode e.g. newlines \n or tabs \t. You can tell python not to treat \ as an escape character by prefixing the string with an r (a so-called raw string), e.g. r"$\alpha$".
The \(\LaTeX\) syntax available in matplotlib can be found in the docs as well.
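A concrete case of the backslash caveat: \theta starts with \t, which python turns into a tab unless the string is raw. This pure-python check makes the difference visible:

```python
# in a normal string, "\t" is a tab character, so "\theta" loses its backslash
normal = "$\theta$"
# in a raw string, the backslash is kept, as matplotlib's math renderer needs
raw = r"$\theta$"

print(repr(normal))  # contains a tab instead of a backslash
print(repr(raw))     # the backslash survives
print(len(normal), len(raw))  # 7 8
```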
# create another figure
fig = plt.figure()
ax_data = fig.add_subplot()
# set limits because matplotlib doesn't autoscale to annotations
ax_data.set_ylim(-.5,.5)
ax_data.set_xlim(-.5,.5)
#add some annotations with LaTeX
ax_data.annotate(r"$\int_0^{\infty}x dx$" , xy=(0, 0))
ax_data.annotate(r"$\cos(y)$" , xy=(.1, .1))
ax_data.annotate(r"$\Im(y)$" , xy=(-.1, .1))
ax_data.annotate(r"$\Re(x)$" , xy=(-.1, -.1))
ax_data.annotate(r"$\frac{\partial x}{\partial t}$" , xy=(.1, -.1))
ax_data.set_xlabel(r"x / 1")
ax_data.set_ylabel(r"$\cos(x)$ / 1")
ax_data.set_title(r"$\sum \left(\frac{1}{x}\right)$");
5.4.6. Example: Smoothing, peak-picking, plotting
One of the most powerful aspects of using matplotlib for visualization is that we can generate plot inputs from python code. That means anything we can calculate from our data can be integrated into the plot with a few lines of code. For example, we can load an IR spectrum, display it and then use a scipy peak-picking function to find peaks.
First, this is what the spectrum looks like (with the traditional reversed x-axis so the gods of FTIR don’t get angry):
# load data (more on that in the next section)
EtOH = np.genfromtxt(fname="data/64-17-5-IR.csv",
skip_header=1,
unpack=True)
# create figure and axes
EtOH_fig = plt.figure()
EtOH_ax = EtOH_fig.add_subplot()
# and plot the x and y data
EtOH_ax.plot(EtOH[0], EtOH[1])
# set labels
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel("Absorption / 1")
# invert xaxis
EtOH_ax.invert_xaxis()
The data looks a bit noisy, so step 1 is smoothing. Typically, IR people like to use a Savitzky-Golay filter to smooth spectra. Conceptually, the Savitzky-Golay filter performs a least squares fit of a polynomial to data points inside a small window and uses that polynomial to determine the smoothed value at the center point of the window.
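To make the least-squares description concrete, here is a deliberately naive sketch (our own illustration, not how scipy implements the filter) that fits a polynomial with np.polyfit in every window and evaluates it at the window center. On a synthetic noisy signal, it matches savgol_filter away from the edges; near the edges, savgol_filter’s default mode="interp" does something different, so we only compare interior points:

```python
import numpy as np
from scipy.signal import savgol_filter

def naive_savgol(y, window_length, polyorder):
    """Least-squares polynomial fit per window, evaluated at the window center.

    Hypothetical helper for illustration; the first and last
    window_length // 2 points are left as NaN.
    """
    half = window_length // 2
    offsets = np.arange(-half, half + 1)  # positions relative to the window center
    smoothed = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(offsets, y[i - half:i + half + 1], polyorder)
        smoothed[i] = np.polyval(coeffs, 0.0)  # fitted polynomial at the center
    return smoothed

# synthetic noisy signal (an assumption, not the ethanol spectrum)
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.2, size=t.size)

window, order = 11, 2
ours = naive_savgol(noisy, window, order)
reference = savgol_filter(noisy, window_length=window, polyorder=order)

half = window // 2
# largest interior difference: only floating-point rounding remains
print(np.max(np.abs(ours[half:-half] - reference[half:-half])))
```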
The two parameters we need to choose are the degree of the polynomial and the size of the window. A higher degree leads to more noise in the output but better adherence to the spectrum, whereas a larger window size leads to a smoother spectrum with the potential of “ironing out” sharp spectral features.
The function savgol_filter from scipy.signal implements this filter. We can use matplotlib to determine the right window size (we will set the degree of the polynomial to 2, which seems to be a commonly accepted default value).
In the plot below, we first plot the raw spectrum and then plot spectra with increasing smoothing on top:
from scipy.signal import savgol_filter
# create figure and plot original spectrum
EtOH_fig = plt.figure()
EtOH_ax = EtOH_fig.add_subplot()
EtOH_ax.plot(EtOH[0], EtOH[1], color="black", linewidth=4, label="original")
# for loop to iterate over window sizes
for window_length in range(3,17,2):
    EtOH_ax.plot(EtOH[0],
                 savgol_filter(EtOH[1],window_length=window_length, polyorder=2),
                 label="window: {}".format(window_length))
# and one extreme candidate
EtOH_ax.plot(EtOH[0],
savgol_filter(EtOH[1],window_length=55, polyorder=2),
label="window: {}".format(55))
# add legend and labels
EtOH_ax.legend()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel("Absorption / 1")
EtOH_ax.invert_xaxis()
EtOH_fig
It looks like smoothing up to a window length of 13 is still ok. The very large window size of 55 was added to show what too much smoothing looks like. Is the smoothing enough to remove the noise in the baseline, e.g. between 2000 \(\mathrm{cm^{-1}}\) and 1500 \(\mathrm{cm^{-1}}\)? We can zoom our plot into that range to check.
When we zoom in by setting the x limits, matplotlib will still keep the y axis scaled to the full range of the data. Hence we need to set the y limits as well:
EtOH_ax.set_xlim([2000, 1500])
EtOH_ax.set_ylim(0,.05)
EtOH_fig
A window length of 11 seems to be a good compromise. We will store a smoothed version of this spectrum:
# important note: numpy arrays are mutable
# -> if you assign the same array to multiple variables,
# changing one changes all of them
# .copy() prevents that
EtOH_smooth = EtOH.copy()
# store smoothed spectrum in second row of array:
EtOH_smooth[1] = savgol_filter(EtOH[1], window_length=11, polyorder=2)
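The mutability note in the comments above is easy to verify on a small, made-up array: a plain assignment only creates a second name for the same data, while .copy() creates an independent array:

```python
import numpy as np

spectrum = np.array([[1000.0, 1500.0, 2000.0],  # illustrative wavenumbers
                     [0.10, 0.50, 0.20]])       # illustrative absorptions
alias = spectrum               # no copy: a second name for the same array
independent = spectrum.copy()  # a new array with its own data

alias[1] = 0.0                 # overwrite the second row through the alias

print(spectrum[1])     # the original changed too: all zeros
print(independent[1])  # the copy still holds 0.1, 0.5, 0.2
```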
And now we can plot our smoothed spectrum again:
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1], label="smoothed")
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
EtOH_ax.invert_xaxis()
Next, we want to mark the peak positions in this spectrum. The function find_peaks from scipy.signal does that for us. Let’s import it.
from scipy.signal import find_peaks
find_peaks gives us peak positions in terms of data indices. We can use these to index the numpy arrays for wavenumbers and intensity to get the peak locations in x and y. Then we plot the peak positions as crosses by setting linestyle="none" and marker="x" in the .plot method:
# the usual: create a plot and plot the spectrum
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
EtOH_ax.invert_xaxis()
# this returns an array of peak indices and a dict of peak properties
peaks, properties = find_peaks(EtOH_smooth[1])
# we index the x and y arrays to get the x and y coordinates of the peaks
peaks_x = EtOH_smooth[0][peaks]
peaks_y = EtOH_smooth[1][peaks]
# plot as x markers
EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="x", label="peak positions")
EtOH_ax.legend()
EtOH_smooth_fig
It seems that find_peaks has marked many parts of the spectrum as peaks that we probably wouldn’t consider to be peaks. Luckily, we can use optional arguments to fine-tune it. For example, we can set a minimum width to reject single-point local maxima, and a minimum peak height:
peaks, properties = find_peaks(EtOH_smooth[1],
width=1.5,
height=.05)
peaks_x = EtOH_smooth[0][peaks]
peaks_y = EtOH_smooth[1][peaks]
EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="o", label="better peak positions")
EtOH_ax.legend()
EtOH_smooth_fig
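Why do width and height help? A sketch on a synthetic signal (two Gaussian bands plus four single-point spikes in the baseline, all made up for illustration) shows the effect in isolation: without arguments, find_peaks reports every local maximum; with the same width and height thresholds as above, only the two bands remain:

```python
import numpy as np
from scipy.signal import find_peaks

# synthetic "spectrum": two Gaussian bands centered at indices 60 and 140 ...
idx = np.arange(200)
signal = (1.0 * np.exp(-(idx - 60)**2 / (2 * 5**2))
          + 0.5 * np.exp(-(idx - 140)**2 / (2 * 5**2)))
# ... plus small single-point spikes that mimic noise in the baseline
signal[[10, 25, 100, 180]] += 0.02

all_maxima, _ = find_peaks(signal)
real_peaks, props = find_peaks(signal, width=1.5, height=0.05)

print(all_maxima)  # every local maximum, spikes included
print(real_peaks)  # only the two bands survive
print(props["peak_heights"], props["widths"])
```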
Next, we don’t just want the peaks to be marked with crosses, but also want to display their wavenumbers using .annotate. We already have the positions of the peaks, so we just need to write a bit of code that loops over all positions and puts annotations there:
# create another figure
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])
EtOH_ax.invert_xaxis()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
# plot x markers
EtOH_ax.plot(peaks_x, peaks_y,linestyle="none", marker="x", label="peak positions")
# iterate over all peak indices
for peak in peaks:
    peak_x = EtOH_smooth[0][peak]
    peak_y = EtOH_smooth[1][peak]
    an = EtOH_ax.annotate(text="{:.0f}".format(peak_x),
                          xy=(peak_x, peak_y),
                          xycoords="data", # this is the default anyway
                          xytext=(peak_x, 20),
                          textcoords=("data", "offset pixels"), # x in data coordinates, y 20 pixels above the peak
                          rotation=90, # rotate by 90 degrees
                          ha="center") # center the text horizontally on the peak
# increase the top limit a bit to fit all labels
EtOH_ax.set_ylim(top=1.35)
EtOH_smooth_fig
Alternatively, we can place the peak labels at the top of the axes by changing the y coordinate system of textcoords to “axes fraction” and the vertical alignment va to “top”. In this mode, it also makes sense to draw a line between each peak and its label.
# create another figure
EtOH_smooth_fig = plt.figure()
EtOH_ax = EtOH_smooth_fig.add_subplot()
EtOH_ax.plot(EtOH_smooth[0], EtOH_smooth[1])
EtOH_ax.invert_xaxis()
EtOH_ax.set_xlabel(r"$\tilde{\nu}$ / $\rm{cm^{-1}}$")
EtOH_ax.set_ylabel(r"Absorption / 1")
EtOH_ax.set_ylim(top=1.5)
for peak in peaks:
    peak_x = EtOH_smooth[0][peak]
    peak_y = EtOH_smooth[1][peak]
    an = EtOH_ax.annotate(text="{:.0f}".format(peak_x),
                          xy=(peak_x, peak_y),
                          xycoords="data", # this is the default anyway
                          xytext=(peak_x, 1),
                          textcoords=("data", "axes fraction"), # x in data coordinates, y at the top edge of the axes
                          rotation=90,
                          ha="center",
                          va="top",
                          # a plain line, shortened at both ends (in points)
                          arrowprops={"arrowstyle":"-",
                                      "shrinkA":5,
                                      "shrinkB":20})
EtOH_smooth_fig
5.5. More plot types?
Matplotlib can do significantly more than just create line plots. First, you can look at additional ways to plot into cartesian coordinates. Some examples:
5.5.1. Pie charts
A pie chart takes the sizes of the pie slices. When the normalize argument is set to True, the sizes are normalized so that the slices fill a full circle; with normalize=False, the sizes are used directly as fractions of the circle. Pie charts also allow a significant amount of customization: we can set labels, change colors, change how the slices are oriented and so on (read the docstring to learn more):
# create figure
fig = plt.figure()
ax_box= fig.add_subplot()
ax_box.set_title("Programming is:")
#create pie chart
ax_box.pie([2,2,96],
labels=["knowing syntax",
"experience",
"Googling 'How do I X in python?'"],
normalize=True);
There are quite a few more arguments for customization. Setting labeldistance to None removes the labels from the chart, and startangle sets the rotation of the pie chart. legend works on pie charts as well, so the labels can be shown there instead:
fig = plt.figure()
ax_pie= fig.add_subplot()
ax_pie.pie([.8, .2],
labels=["Pac man", "sometimes Pac man"],
normalize=False,
colors=["xkcd:banana", "lightgrey"],
labeldistance=None,
startangle=30)
ax_pie.legend()
5.5.2. Visualizing distributions: histogram, box plot, violin plot
These three plot types are useful for understanding the distribution of data, e.g. for exploratory data analysis. They take an array (or a 2D array, or a list of arrays) of data as input, but instead of depicting the data points themselves, they show their distribution, i.e. the density of values within each array.
We will use the scipy.stats package to generate datasets with values of three different distributions: exponential, normal and a bimodal normal distribution.
# we use the scipy.stats package to create some randomly distributed data
# following three different distributions
import scipy.stats as sps
# 200 samples of normally distributed data (mean 4):
norm = sps.norm(4).rvs(200)
# 200 samples of exponentially distributed data (shifted to start at 1):
expo = sps.expon(1).rvs(200)
# 100 samples each of two normally distributed datasets
bimod = (np.hstack([sps.norm(loc=-2).rvs(100), sps.norm(loc=2).rvs(100)]))
#shuffle to mix the two distributions
np.random.shuffle(bimod)
If we just plot these samples, it is hard to see how the data is distributed:
fig = plt.figure()
ax_pointcloud = fig.add_subplot()
ax_pointcloud.plot(norm, label="norm",linestyle="", marker="x")
ax_pointcloud.plot(expo, label="expo",linestyle="", marker="x")
ax_pointcloud.plot(bimod, label="bimod",linestyle="", marker="x")
ax_pointcloud.legend()
The hist plot creates histograms by sorting the samples into several bins and then drawing bars with heights corresponding to the number of samples per bin. This makes it much easier to see the distribution:
fig = plt.figure()
ax3 = fig.add_subplot()
# we pass the three datasets in a list
ax3.hist([norm,expo, bimod],
bins=20, # the number of bins to use
label=["norm", "expo", "bimod"])
ax3.legend()
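The binning itself can be reproduced with np.histogram, which returns the counts per bin and the bin edges without drawing anything. A sketch with freshly generated normal data (a stand-in for norm above, not the same samples):

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(loc=4, size=200)  # similar to sps.norm(4).rvs(200)

counts, edges = np.histogram(samples, bins=20)
print(counts)            # number of samples in each of the 20 bins
print(edges[0], edges[-1])  # the 21 edges span the data from min to max
```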
The boxplot is often used to check if datasets have similar distributions. By default, it shows the median of each dataset as a horizontal line, a box spanning the first to the third quartile, whiskers that extend to the most extreme data points within 1.5 times the interquartile range of the box, and any remaining outliers as individual points (also called fliers). To quickly check if two variables are potentially significantly different, you can compare their boxplots. If they show little overlap (i.e. the median of one lies outside the box of the other), then that is a good sign for a significant difference.
fig = plt.figure()
ax3 = fig.add_subplot()
ax3.boxplot([norm,expo, bimod])
# labeling box plots is a bit annoying
# we set ticks to have integer numbers:
ax3.set_xticks([1,2,3])
# and then label starting from tick 1
ax3.set_xticklabels(["norm", "exp", "bimod"])
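The numbers a box plot draws can also be computed without plotting. matplotlib.cbook.boxplot_stats is the helper that Axes.boxplot uses to compute them; here it is applied to a small made-up dataset with one obvious outlier:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])  # 100 is the outlier

stats = boxplot_stats(data)[0]  # one dict per dataset
print("median:", stats["med"])                    # 5.5
print("box (quartiles):", stats["q1"], stats["q3"])  # 3.25 7.75
print("whiskers:", stats["whislo"], stats["whishi"]) # 1 9
print("fliers:", stats["fliers"])                 # [100]
```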
Violin plots are similar to boxplots, but they also give us some insight into the shape of the distribution. They work by estimating the probability density function with a Gaussian kernel density estimate and plotting it in place of the box:
fig = plt.figure()
ax3 = fig.add_subplot()
ax3.violinplot([norm,expo, bimod])
# labelling works just like with box plots
ax3.set_xticks([1,2,3])
ax3.set_xticklabels(["norm", "exp", "bimod"])
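The density estimate behind a violin can be reproduced with scipy. matplotlib’s violinplot computes its own Gaussian kernel density estimate internally; as a stand-in, this sketch uses scipy.stats.gaussian_kde on deterministic, clearly bimodal illustrative data:

```python
import numpy as np
from scipy.stats import gaussian_kde

# deterministic bimodal data: values between -3 and -1 and between 1 and 3
data = np.concatenate([np.linspace(-3, -1, 100), np.linspace(1, 3, 100)])

kde = gaussian_kde(data)  # bandwidth from Scott's rule by default
grid = np.array([-2.0, 0.0, 2.0])
density = kde(grid)
print(density)  # high, low, high: two modes

# a probability density integrates to (about) 1
print(kde.integrate_box_1d(-10, 10))
```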
5.5.3. 2.5D plots: contour, contourf, pcolormesh, imshow
These plot types can be used to display two-dimensional datasets of the type \(z = f(x,y)\).
Contour and contourf draw isolines from a 2D dataset, like you would see on a topographic map. The difference is that contourf fills the areas between the lines, while contour draws just the lines. Let’s start with a 2D dataset. We generate two vectors with equal spacing between -1 and 1 and then use the np.meshgrid function to get two 2D arrays: X repeats x along every row (so x varies across the columns), and Y repeats y along every column (so y varies down the rows). Then we use them to calculate a sum of four 2D Gaussians with alternating signs:
# linspace to create equally spaced data
x = np.linspace(-1,1,50)
y = np.linspace(-1,1,50)
# meshgrid to create 2D arrays for x and y
X, Y = np.meshgrid(x,y)
z = 1.1*np.exp(- ((X-.1)**2 +(Y-.1)**2)/.5**2) -\
1.0*np.exp(- ((X-.1)**2 +(Y+.1)**2)/.5**2) +\
0.9*np.exp(- ((X+.1)**2 +(Y+.1)**2)/.5**2) -\
1.2*np.exp(- ((X+.1)**2 +(Y-.1)**2)/.5**2)
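What meshgrid does is easiest to see on tiny, made-up vectors:

```python
import numpy as np

x_small = np.array([0.0, 1.0, 2.0])  # 3 x values
y_small = np.array([10.0, 20.0])     # 2 y values
X_small, Y_small = np.meshgrid(x_small, y_small)

print(X_small)        # each row is a copy of x_small
print(Y_small)        # each column is a copy of y_small
print(X_small.shape)  # (2, 3): rows follow y, columns follow x
```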
Ignore the plotting commands for now; this is what the X and Y arrays look like:
contour_fig = plt.figure()
Xax, Yax= contour_fig.subplots(ncols=2, sharex=True, sharey=True)
mappableX=Xax.pcolormesh(np.arange(len(x)), np.arange(len(y)), X, shading="nearest")
mappableY=Yax.pcolormesh(np.arange(len(x)), np.arange(len(y)), Y, shading="nearest")
Xax.set_aspect(1)
Yax.set_aspect(1)
contour_fig.colorbar(mappableX, ax=Xax)
contour_fig.colorbar(mappableY, ax=Yax)
If we pass just the z array to contour, then it generates a plot of isolines that looks as follows:
# create figure and axes, as per usual
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# contour plot
cont_mappable=contour_ax.contour(z);
When x and y should have the same spacing/unit, we can set the aspect ratio to 1 using set_aspect() and get the correct shape:
contour_ax.set_aspect(1)
contour_fig
The x and y values here are indices. The color of the lines corresponds to the value of z. We can add a colorbar to get an idea of what those values are. Note that the method is contour_fig.colorbar and not contour_ax.colorbar:
# remember that `cont_mappable` was the output of the contour function
cbar = contour_fig.colorbar(cont_mappable)
contour_fig
The colorbar can be formatted like a plot axis. We can add a label to it, and we can also set tick locations and formats.
cbar.set_ticks([-.2, +.2])
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
contour_fig
We can also pass the x and y vectors to the function call to get the correct coordinates along the axes. contour and contourf accept either the 2D arrays from meshgrid or the 1D vectors that went into meshgrid:
# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# plot contour, with x and y vectors
cont_mappable = contour_ax.contour(x,y,z)
# colorbar and formatting
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
contourf works exactly the same way as contour. The only difference is that it fills the areas between isolines:
# create figure:
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# the same arguments as before
cont_mappable = contour_ax.contourf(x,y,z)
# colorbar and formatting
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
It’s weird, though, that we added four Gaussians and can only make out three in this image. It seems one of them is too low to actually show up with the default number of levels. We can increase the number of levels to change that:
# figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# contourf plot
cont_mappable = contour_ax.contourf(x,y,z,
# more levels:
levels=100)
# formatting and colorbar
cbar = contour_fig.colorbar(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
We can also combine contour and contourf in a single plot. We set the colors of the contour lines to black (otherwise they take a color corresponding to their height and are invisible against the filled areas) and use the .add_lines method of the colorbar to add the isolines to the bar as well:
# create the figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# store outputs of contourf and contour in different variables
contf_mappable = contour_ax.contourf(x,y,z,levels=100)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
# colorbar for contourf
cbar = contour_fig.colorbar(contf_mappable)
# add_lines: isolines in colorbar
cbar.add_lines(cont_mappable)
# formatting
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
For large datasets, generating a smooth plot using contourf can lead to problems, because matplotlib figures out the location of the required isolines by interpolating the dataset.
If we want to generate a smooth false color image from a 2D array, then pcolormesh is more suitable. Rather than computing isolines, it colors a mesh of quadrilaterals defined by the grid, so we no longer need a high number of levels to get a smooth image. The call signature is similar to that of contourf. We do, however, need to select how colors are filled in between data points, using the shading argument: “nearest” gives each grid point a cell of a single color, which can look pixelated for images with few pixels, while “gouraud” interpolates the colors linearly between grid points and makes the image smoother:
# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# pcolormesh generates the image
contf_mappable = contour_ax.pcolormesh(x,y,z,
# shading should either be nearest or gouraud
shading='nearest')
# we can combine this with contour lines as well
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
# and create a colorbar
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
# this is the only change from the previous figure
shading='gouraud'
)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
5.5.4. Colormaps
The choice of colormap and the extent of the color range are crucial for depicting 2D data.
The matplotlib documentation has a really helpful discussion of different types of colormaps, so I will only give a brief intro here. We will focus on these three types of colormaps:
Sequential colormaps continuously change from a low value to a high value
Diverging colormaps have two different colors at the edges that converge to the same color in the center
Cyclic colormaps have the same color on both edges and change in between
5.5.4.1. Sequential Colormaps
We have already been using a sequential colormap (matplotlib’s default, viridis). Simpler ones, such as Blues, change monotonically from white to blue. We select colormaps by passing them to the cmap argument of contour, contourf or pcolormesh. Matplotlib’s built-in colormaps are found (among other places) in plt.cm.
# create figure
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# create pcolormesh
contf_mappable = contour_ax.pcolormesh(x,y,z,
# we select a different colormap
cmap=plt.cm.Blues,
shading='gouraud')
# the rest is the same as before
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
# this is the copper colormap
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
shading="gouraud",
#copper
cmap=plt.cm.copper)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
5.5.4.2. Diverging Colormaps
Diverging colormaps are used to depict datasets where it matters whether a value lies above or below a reference plane (often zero). For example, when we want to depict charge densities, it matters if we are looking at a positive or a negative charge. Examples are “RdBu” or “bwr”:
#again, only the colormap changes
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
shading="gouraud",
cmap=plt.cm.bwr)
cont_mappable = contour_ax.contour(x,y,z,levels=5, colors="black")
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
Since it matters where exactly the white “zero” ends up in our plot for diverging colormaps, it is important that we set the edges of their range correctly. A convenient way to do this is the Normalize class provided by matplotlib.
This class maps input float values linearly onto values between 0 and 1, with vmin mapped to 0 and vmax mapped to 1. We can let it set the range automatically from our data or adjust it manually as needed. When the Normalize object is first used to scale data (in this code, that happens in the call to pcolormesh, since we pass it there as an optional argument), it automatically sets its limits.
# we create a new Normalize object
norm = plt.Normalize()
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
shading="gouraud",
cmap=plt.cm.bwr,
# norm is passed as kw argument for norm
norm=norm)
cont_mappable = contour_ax.contour(x,y,z,levels=5,
colors="black",
# and also for the contour plot
norm=norm)
# everything else stays the same
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
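The autoscaling behavior can be seen without any plot. In this sketch (the data values are arbitrary), a fresh Normalize object has no limits until it is first called on data:

```python
import numpy as np
import matplotlib.pyplot as plt

norm_demo = plt.Normalize()
print(norm_demo.vmin, norm_demo.vmax)  # None None: no limits yet

# the first call autoscales to the data, then maps vmin -> 0 and vmax -> 1
scaled = norm_demo(np.array([-2.0, 0.0, 1.0, 2.0]))
print(norm_demo.vmin, norm_demo.vmax)  # -2.0 2.0
print(scaled)                          # values between 0 and 1
```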
That alone does not make the colormap symmetric around zero. To achieve that, we need to set the maximum and minimum values of the colormap ourselves. By using the maximum absolute value of z, we make sure that both the largest and the smallest value are included in the colormap. And by setting vmin to -vmax, we make it symmetric around 0:
# again, create the Normalize object
norm = plt.Normalize()
# set the minimum and maximum values to the absolute maximum
norm.vmax = np.max(np.abs(z))
# make it symmetric by setting the minimum to - max
norm.vmin = -norm.vmax
# nothing new here (norm=norm as before)
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
shading="gouraud",
cmap=plt.cm.bwr,
norm=norm)
cont_mappable = contour_ax.contour(x,y,z,levels=5,
colors="black",
norm=norm)
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
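The effect of the symmetric limits can be checked numerically. In this sketch, with made-up data whose positive range is larger than its negative range, autoscaling places zero off-center, while the symmetric limits put zero exactly in the middle of the colormap, where bwr is (nearly) white:

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data with a larger positive than negative range
data = np.array([-0.4, -0.1, 0.0, 0.3, 1.2])

# autoscaled: vmin = -0.4, vmax = 1.2
auto_norm = plt.Normalize()
auto_norm(data)

# symmetric: vmin = -1.2, vmax = 1.2
sym_norm = plt.Normalize()
sym_norm.vmax = np.max(np.abs(data))
sym_norm.vmin = -sym_norm.vmax

print(auto_norm(0.0))             # about 0.25: zero is off-center
print(sym_norm(0.0))              # 0.5: zero sits in the middle
print(plt.cm.bwr(sym_norm(0.0)))  # nearly white in bwr
```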
When we are looking at noisy measurements, it might also make sense to exclude the extremes of the dataset from the color range, especially when they are actually outliers. In these cases, using quantiles of the data ensures that most of the dataset is still within the colorbar, without having to set the limits manually.
Let’s add some outliers and noise to our height data and store the result in the variable z_noise:
#generate noisy data
z_noise = z + .001/(1-np.random.exponential(1, size=z.shape))
If we just pass this data to pcolormesh, the result is not great: a single outlier drags our scale very far from where most of the changes in the dataset happen:
# here, only the z data changed
norm = plt.Normalize()
norm.vmax = np.max(np.abs(z_noise))
norm.vmin = -norm.vmax
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z_noise,
shading="gouraud",
cmap=plt.cm.bwr,
norm=norm)
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
Obviously, we are not interested in those spikes, but in the smaller, slower changes in the center of the data.
We use np.quantile to get the 1% and 99% quantiles of our data and use them to set vmin and vmax of the Normalize instance. The data is still noisy and has spikes, but at least we now see the overall pattern and not just a few outliers.
norm = plt.Normalize()
# we first set the upper limit using a quantile
norm.vmax = max(abs(np.quantile(z_noise, .99)), abs(np.quantile(z_noise, .01)))
# then use -vmax again for the lower limit
norm.vmin = -norm.vmax
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z_noise,
# nearest works better here
shading="nearest",
cmap=plt.cm.bwr,
norm=norm)
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
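To see why quantiles are more robust than the maximum, we can compare both ways of choosing the limit on made-up data with two extreme outliers:

```python
import numpy as np

# made-up data: 1000 small values plus two extreme outliers
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(scale=0.2, size=1000), [50.0, -80.0]])

# limit from the absolute maximum: dominated by a single outlier
limit_from_max = np.max(np.abs(values))
# limit from the 1% and 99% quantiles, as in the cell above
limit_from_quantiles = max(abs(np.quantile(values, .99)),
                           abs(np.quantile(values, .01)))

print(limit_from_max)        # 80.0
print(limit_from_quantiles)  # far smaller, set by the bulk of the data
```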
Cyclic colormaps are typically used to depict information like phases and angles. Angles wrap around: a direction of 0° is the same as a direction of 360°. Hence, cyclic colormaps start and end on the same color.
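We can check the cyclic property directly by evaluating colormaps at both ends of their range. A colormap is callable and returns an RGBA tuple for a value between 0 and 1; for twilight, the two ends almost coincide, whereas for the sequential Blues they do not:

```python
import numpy as np
import matplotlib.pyplot as plt

for cmap in (plt.cm.twilight, plt.cm.Blues):
    start = np.array(cmap(0.0))  # RGBA at the lower end
    end = np.array(cmap(1.0))    # RGBA at the upper end
    print(cmap.name, np.round(start, 3), np.round(end, 3))
```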
In the example below, we calculate the gradient of our dataset and, from it, the direction (angle) of the steepest increase of the values.
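# the gradient of the dataset points towards the steepest increase
# axis 0 of z runs along y and axis 1 along x, so the spacings are passed as (y, x)
grad = np.gradient(z,y,x)
# angle of the gradient in degrees (grad[0] = dz/dy, grad[1] = dz/dx)
angle = np.arctan2(grad[0],grad[1])/np.pi*180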
# we make sure the min and max of the scale are +/-180
norm = plt.Normalize()
norm.vmin=-180
norm.vmax=180
# nothing new here
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
shading="gouraud",
cmap=plt.cm.twilight,
norm=norm)
cont_mappable = contour_ax.contour(x,y,z, colors="k")
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")
contour_ax.set_aspect(1)
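A note on the order of the gradient components, since it is easy to mix up: np.gradient returns one derivative per array axis, and for arrays created with meshgrid, axis 0 runs along y and axis 1 along x. In our dataset, x and y are identical, so this does not change the numbers, but it does decide which component is the x component (this matters, for example, when passing the components to quiver). A sketch on a made-up tilted plane with different x and y grids:

```python
import numpy as np

x_demo = np.linspace(-1, 1, 5)  # 5 columns
y_demo = np.linspace(0, 2, 4)   # 4 rows
X_demo, Y_demo = np.meshgrid(x_demo, y_demo)
Z_demo = 3 * X_demo             # a plane that rises only in +x direction

# one derivative per axis: axis 0 (rows, y) first, then axis 1 (columns, x)
dz_dy, dz_dx = np.gradient(Z_demo, y_demo, x_demo)
angle_demo = np.degrees(np.arctan2(dz_dy, dz_dx))

print(dz_dx[0])       # slope 3 along x everywhere
print(dz_dy[0])       # no slope along y
print(angle_demo[0])  # 0 degrees: the gradient points along +x
```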
The same plot with a sequential colormap shows an artificial jump where the angle wraps around from +180° to -180°.
# nothing new here
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
shading="gouraud",
cmap=plt.cm.Blues,
norm=norm)
cont_mappable = contour_ax.contour(x,y,z, colors="k")
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")
contour_ax.set_aspect(1)
To encode direction and magnitude, we can also use a quiver plot. This plot type takes x and y coordinates and two additional parameters u and v, which are the components of a vector in x and y direction, respectively. It then draws arrows that point in the direction of those vectors and have a length corresponding to the vector’s magnitude. Typically, such a plot is used to depict force fields or velocity fields.
# create the figure and the angle plot, as before
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,angle,
shading="gouraud",
cmap=plt.cm.twilight,
norm=norm)
# we only add a few arrows here, by slicing x,y, and the gradient
contour_ax.quiver(x[::5],y[::5],
#x component of arrow: dz/dx is grad[1]
grad[1][::5,::5],
#y component of arrow: dz/dy is grad[0]
grad[0][::5,::5])
#formatting
cbar = contour_fig.colorbar(contf_mappable)
cbar.set_label("angle / °")
cbar.set_ticks([-180,-90,0,90,180])
contour_ax.set_aspect(1)
You can also quite easily create custom colormaps from lists of colors. We import LinearSegmentedColormap from matplotlib.colors and then use its from_list method to create a new colormap. The first argument is the name of the colormap, the second argument is a list of colors. Between these colors, matplotlib interpolates linearly.
# import
from matplotlib.colors import LinearSegmentedColormap
# create colormap and store in variable
my_nice_map = LinearSegmentedColormap.from_list("fluorescence", # name of the colormap
["black", "#8ffe09"]# list of colors
)
norm = plt.Normalize()
# we set the lower end of norm to 0 to give nice bright peaks
# against a dark background. Not really sensible.
norm.vmin=0
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.pcolormesh(x,y,z,
shading="gouraud",
cmap=my_nice_map,
norm=norm)
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
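The colors passed to from_list do not have to be spaced evenly: it also accepts (value, color) pairs that pin each color to a position between 0 and 1. The sketch below builds both kinds and evaluates them; a colormap is callable and returns an RGBA tuple for a value in [0, 1]. The third "white" stop and the second colormap name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import numpy as np
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# plain list of colors: spread evenly over [0, 1] and interpolated linearly
two_color = LinearSegmentedColormap.from_list("fluorescence", ["black", "#8ffe09"])

# (value, color) pairs: each color is pinned to a chosen position in [0, 1]
three_stop = LinearSegmentedColormap.from_list(
    "fluorescence_white",  # hypothetical name for this illustration
    [(0.0, "black"), (0.8, "#8ffe09"), (1.0, "white")],
)

# calling a colormap maps a value in [0, 1] to an RGBA tuple
print(two_color(0.0))
print(three_stop(1.0))
```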
Finally, imshow is used to draw pixel images with equal pixel spacing in each direction. These images can be either false-color images like the previous examples or RGB images. Which one is drawn depends on the dimensions of the input array passed to imshow. If that array is 2D, a false-color image is drawn using the selected colormap. If an array of dimension MxNx3 is passed, each layer is used as one color channel (an MxNx4 array additionally provides an alpha channel).
First, the false color image:
norm = plt.Normalize()
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# imshow does not accept x and y!
contf_mappable = contour_ax.imshow(z,
cmap=plt.cm.Blues,
norm=norm)
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
Three things immediately jump out at us:
the image appears flipped relative to previous plots
it also looks quite pixelated
we can’t pass x and y arrays for the coordinates.
The first problem we can solve quite easily by setting the image origin to the lower left corner. We can also change the interpolation method to create a smoother image.
norm = plt.Normalize()
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
# there are some differences in the imshow command
contf_mappable = contour_ax.imshow(z,
cmap=plt.cm.Blues,
norm=norm,
origin="lower",#changed origin
interpolation="bilinear")# interpolation
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
The location of the image can be set using the extent argument. This argument takes a sequence that gives the left, right, bottom and top coordinates of the image edges:
norm = plt.Normalize()
contour_fig = plt.figure()
contour_ax = contour_fig.add_subplot()
contf_mappable = contour_ax.imshow(z,
cmap=plt.cm.Blues,
norm=norm,
origin="lower",
interpolation="bilinear",
extent=(x[0],x[-1],y[0],y[-1]))# added extents
cbar = contour_fig.colorbar(contf_mappable)
cbar.add_lines(cont_mappable)
cbar.set_label("height / 1")
contour_ax.set_aspect(1)
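Note that extent describes the outer edges of the image, not the positions of the pixel centers. With extent=(x[0],x[-1],y[0],y[-1]) the image is therefore squeezed slightly, and the pixel centers no longer sit exactly on the sample positions; on coarse grids this offset can be visible. A minimal sketch, with made-up x, y and z and assuming evenly spaced samples, that widens the extent by half a pixel on each side:

```python
import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# illustrative sample coordinates and data (not the x, y, z used above)
x = np.linspace(0, 4, 5)  # sample positions 0, 1, 2, 3, 4
y = np.linspace(0, 2, 3)  # sample positions 0, 1, 2
z = np.arange(15).reshape(len(y), len(x))  # rows follow y, columns follow x

# extent gives the outer edges of the image, so move the first and last
# sample positions outwards by half a pixel to centre each pixel on its sample
dx = x[1] - x[0]
dy = y[1] - y[0]
extent = (x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2)

fig, ax = plt.subplots()
im = ax.imshow(z, origin="lower", extent=extent)
print(im.get_extent())  # (left, right, bottom, top)
```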
Displaying an RGB image is straightforward. We load it using plt.imread and then pass the loaded data to the imshow method of our axes.
logo = plt.imread("figures/logo.png")
fig_logo = plt.figure()
ax_logo = fig_logo.add_subplot()
ax_logo.imshow(logo)
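To see the difference between a 2D array and an MxNx3 array without needing figures/logo.png, the following sketch builds a small RGB image from NumPy arrays, shows it with imshow, and round-trips it through a PNG file with plt.imsave and plt.imread. The image size, the color ramps and the temporary file location are arbitrary choices for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# build a small M x N x 3 RGB image by hand, with float values in [0, 1]
M, N = 40, 60
rgb = np.zeros((M, N, 3))
rgb[:, :, 0] = np.linspace(0, 1, N)           # red increases from left to right
rgb[:, :, 2] = np.linspace(0, 1, M)[:, None]  # blue increases from top to bottom

fig, ax = plt.subplots()
im = ax.imshow(rgb)  # three layers, so no colormap is involved

# round trip through a PNG file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "gradient.png")
plt.imsave(path, rgb)
loaded = plt.imread(path)  # PNG files are read as float arrays with values in [0, 1]
print(loaded.shape, loaded.dtype)
```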
5.6. Saving plots and preparing for publication
Once our plot is finished, we of course want to export it. As we already saw, this can be done using the figure.savefig method. You will notice that exported figures sometimes look different from how they looked in Jupyter. This typically happens when the resolution (in dpi) used for display in Jupyter differs from the one used in savefig. Therefore, if you want high-resolution figures (>100 dpi), it is a good idea to set the resolution and the final size of the figure (in inches) in the figure call.
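The number of pixels in the exported file is simply figure size times resolution: a 3 × 2 inch figure at 500 dpi becomes 1500 × 1000 pixels. A small sketch that checks this by saving to an in-memory PNG; the plotted data is a placeholder.

```python
import io

import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 2), dpi=500)  # 3 x 2 inches at 500 dots per inch
ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data

# expected size in pixels: 3 in * 500 dpi wide, 2 in * 500 dpi high
width_px, height_px = fig.get_size_inches() * fig.dpi

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi="figure")  # reuse the figure's own dpi
buf.seek(0)
pixels = plt.imread(buf)
print(pixels.shape)  # (rows = height, columns = width, color channels)
```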
This next figure looks very large and clunky in Jupyter, but that is because it is meant to be displayed at a resolution of 500 dpi and will be much smaller once exported:
fig_noice = plt.figure(figsize=(3,2), dpi=500)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
fig_noice.savefig("figures/noice_figure.png",dpi="figure")
The figure that was exported above looks like this:
Not great, it seems: the labels are cut off. The first question is why it looked fine in the Jupyter output but broken as a PNG. The reason is that when Jupyter displays a figure inline, it internally recalculates the bounding box of the figure first, by passing bbox_inches='tight' to savefig. However, that means that the displayed figure no longer has the requested size in inches. So the better approach is to stop Jupyter from doing that. This needs a magic command:
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}
Now the inline display looks wrong too. Yay! At least it now shows what the exported file will look like.
fig_noice = plt.figure(figsize=(3,2), dpi=500)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
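To see how much bbox_inches="tight" changes the exported canvas, one can save the same figure once with and once without it and compare the pixel dimensions. The helper saved_shape below is a hypothetical convenience function for this sketch, and the plotted data is a placeholder for raw_data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(3, 2), dpi=500)
ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data
ax.set_xlabel("position")
ax.set_ylabel("height")

def saved_shape(figure, **kwargs):
    """Save figure to an in-memory PNG and return the shape of the pixel array."""
    buf = io.BytesIO()
    figure.savefig(buf, format="png", **kwargs)
    buf.seek(0)
    return plt.imread(buf).shape

full = saved_shape(fig, bbox_inches=None)      # the requested 3 x 2 inch canvas
tight = saved_shape(fig, bbox_inches="tight")  # canvas cropped or grown to fit the content
print(full, tight)
```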
We can now either manually nudge the axes into the correct position or rely on matplotlib's helper functions. To show the edge of the figure, I will also enable the figure frame here. Then we will see whether matplotlib is smart enough to get the layout right when we call figure.tight_layout():
fig_noice = plt.figure(figsize=(3,2), dpi=500,
frameon=True, # draw the frame
edgecolor="black",# make it a black line
linewidth=.1 #give it non-zero width
)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
fig_noice.tight_layout()
This looks much better. However, sometimes we really want to use the full size of the figure without any border. We can use the pad argument of figure.tight_layout() to reduce the white space around the edges:
fig_noice = plt.figure(figsize=(3,2), dpi=500,
frameon=True, # draw the frame
edgecolor="black",# make it a black line
linewidth=.1 #give it non-zero width
)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
fig_noice.tight_layout(pad=0)
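The pad argument is measured in multiples of the font size. The following sketch compares the axes size (as a fraction of the figure) after tight_layout with matplotlib's default pad of 1.08 and with pad=0; the helper function and the plotted data are hypothetical stand-ins for the figure above.

```python
import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

def axes_bounds_after_tight_layout(pad):
    """Return (left, bottom, width, height) of the axes after tight_layout(pad=pad)."""
    fig = plt.figure(figsize=(3, 2), dpi=100)
    ax = fig.add_subplot()
    ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data
    ax.set_xlabel("position")
    ax.set_ylabel("height")
    fig.tight_layout(pad=pad)
    bounds = ax.get_position().bounds  # in fractions of the figure size
    plt.close(fig)
    return bounds

default_pad = axes_bounds_after_tight_layout(pad=1.08)  # matplotlib's default pad
no_pad = axes_bounds_after_tight_layout(pad=0)
print(default_pad)
print(no_pad)
```

With pad=0 the axes take up a larger share of the figure in both directions, because the margin of 1.08 font sizes on every side disappears.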
As a last resort, you can manually set the location of each axes using its set_position() method. Here, we pass the bounding box of the axes as a list that contains the coordinates of the bottom left corner followed by the width and height of the axes, [left, bottom, width, height], in fractions of the figure size.
fig_noice = plt.figure(figsize=(3,2), dpi=500,
frameon=True, # draw the frame
edgecolor="black",# make it a black line
linewidth=.1 #give it non-zero width
)
ax_data = fig_noice.add_subplot()
ax_data.plot(raw_data[:,0],raw_data[:,1:]);
ax_data.set_xlabel("position")
ax_data.set_ylabel("height")
ax_data.set_position([.1, .2, .5, .5])
For fine tuning, you could first call .tight_layout(), then get the positions using ax.get_position() and slightly modify those values using .set_position().
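A minimal sketch of that fine-tuning workflow, with placeholder data: ax.get_position() returns a Bbox whose bounds attribute is (left, bottom, width, height) in figure fractions, which can be edited and passed back to set_position. The 2 % shift is an arbitrary example value.

```python
import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(3, 2), dpi=100)
ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 0])  # placeholder data
ax.set_xlabel("position")
ax.set_ylabel("height")

fig.tight_layout()  # let matplotlib find a starting layout
left, bottom, width, height = ax.get_position().bounds  # Bbox -> (left, bottom, width, height)

# nudge the axes: move the left edge right by 2 % of the figure width
# and shrink the width by the same amount, so the right edge stays put
new_bounds = [left + 0.02, bottom, width - 0.02, height]
ax.set_position(new_bounds)
print(ax.get_position().bounds)
```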
5.7. Other things to check out in matplotlib
5.7.1. Changing styles using style sheets
Matplotlib supports style sheets, which let you define plot layouts beforehand and reuse them. A style is activated using plt.style.use. The built-in styles are shown at https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html
To switch back to the default style, you can use plt.style.use('default').
plt.style.use("fivethirtyeight")
fig = plt.figure()
ax_box= fig.add_subplot()
ax_box.pie([33,33,33],
labels=["for the analysis",
"to feel superior",
"for Clare Malone's hot takes"],
normalize=True)
ax_box.set_title("I listen to fivethirtyeight")
plt.style.use("default")
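Instead of switching back by hand with plt.style.use('default'), a style can also be applied temporarily with plt.style.context, which restores the previous settings when the with block ends. plt.style.available lists the style names installed with your matplotlib version. In this sketch, checking the axes.grid setting is just one convenient way to see the style take effect: fivethirtyeight turns the grid on, while the default style does not.

```python
import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# names of all style sheets that ship with this matplotlib installation
print(sorted(plt.style.available))

grid_before = plt.rcParams["axes.grid"]
# the style only applies inside the with block
with plt.style.context("fivethirtyeight"):
    grid_inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
grid_after = plt.rcParams["axes.grid"]  # previous settings are restored automatically
```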
5.7.2. xkcd mode
Matplotlib also has an “xkcd” mode that can be activated using plt.xkcd(). This mode makes all lines wobbly and replaces the font with a handwritten-looking font. Used in a with statement, as below, it only affects the plots created inside that block:
with plt.xkcd():
    fig = plt.figure()
    ax_bar = fig.add_subplot()
    ax_bar.bar(x=[-1,0,1], height=[1,2,3])
    ax_bar.set_title("Bars of height less than or equal to current bar")
    ax_bar.set_xticks([]);
5.7.3. Animations
Matplotlib plots can be animated using, for example, the FuncAnimation class. Here, we define a function that modifies our plot. FuncAnimation repeatedly calls this function with arguments taken from a list or iterator, and the state of the figure after each call is recorded as one frame.
(The rc... command in the last line makes sure that the animation is output as HTML5 video.)
from matplotlib.animation import FuncAnimation
from matplotlib import rc
rc('animation', html='html5')
Our animation is pretty straightforward: we are going to create x and y vectors corresponding to the coordinates of a spiral. The animation then plots ever longer segments of that spiral.
For each step of the animation, animfunc will be called with an incrementing integer. We will use this integer as the end of our slice into the x and y vectors and set the data of the line we are drawing to the coordinates up to that index:
# create time axis
t = np.linspace(0, 1, 500)
# x and y components of a spiral
spiral_x = t * 8 * np.cos(2*np.pi*10*t)
spiral_y = t * 8 * np.sin(2*np.pi*10*t)
# the animation function
def animfunc(idx):
    """sets line x and y data up to idx"""
    line.set_xdata(spiral_x[:idx])
    line.set_ydata(spiral_y[:idx])
    return (line,)
To run the animation, we create a new plot and add a line to it. By plotting the full vectors first, we ensure that the axes limits are big enough to hold the final spiral. Then we create the FuncAnimation object, which starts to run as soon as we display funcanim.
(fig.clf() clears the figure so it doesn't show up under the cell twice.)
# create the figure and add a subplot
fig = plt.figure()
ax = fig.add_subplot()
# create a single line that will be animated
line = ax.plot(spiral_x,spiral_y)[0]
# square axes look nicer here
ax.set_aspect(1)
# create the animation
funcanim = FuncAnimation(fig, animfunc, range(len(t)),interval=50)
#display
display(funcanim)
fig.clf();
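To share the animation outside the notebook, it can also be written to a file with its save method. The sketch below reuses the spiral idea with only 20 points so that it renders quickly, and uses the "pillow" writer, which produces a GIF without requiring ffmpeg; the file name and temporary directory are placeholders. It also sets both coordinates at once with line.set_data instead of separate set_xdata and set_ydata calls.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # standalone-script backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# same spiral as above, but with fewer points so the sketch renders quickly
t = np.linspace(0, 1, 20)
spiral_x = t * 8 * np.cos(2 * np.pi * 10 * t)
spiral_y = t * 8 * np.sin(2 * np.pi * 10 * t)

fig, ax = plt.subplots()
line = ax.plot(spiral_x, spiral_y)[0]  # plot everything once to fix the axes limits
ax.set_aspect(1)

def animfunc(idx):
    """Show the spiral up to (but not including) index idx."""
    line.set_data(spiral_x[:idx], spiral_y[:idx])
    return (line,)

anim = FuncAnimation(fig, animfunc, frames=range(len(t)), interval=50)

# the "pillow" writer writes a GIF using Pillow
path = os.path.join(tempfile.mkdtemp(), "spiral.gif")
anim.save(path, writer="pillow", fps=20)
plt.close(fig)
```

With interval=50 the frames are 50 ms apart, so the 500-frame animation above runs for 500 × 0.05 s = 25 s; fps=20 in the save call corresponds to the same frame spacing.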
5.8. Summary
5.8.1. Loading data
Use np.genfromtxt to load data from CSV files. If it doesn't work as expected, check the delimiters. More in the next section.
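A minimal example, using a made-up three-line CSV held in memory (io.StringIO stands in for a file name): with the correct delimiter each field becomes one column, while with the default whitespace delimiter the comma-separated lines cannot be parsed as numbers and come out as nan.

```python
import io

import numpy as np

# a tiny CSV "file" kept in memory; the values are made up for illustration
csv_text = """position,height_a,height_b
0.0,1.0,2.0
0.5,1.5,2.5
1.0,2.0,3.0
"""

data = np.genfromtxt(io.StringIO(csv_text),
                     delimiter=",",  # must match the separator used in the file
                     skip_header=1)  # skip the line with the column names
print(data.shape)  # one row per line, one column per field

# without delimiter="," each whole line is one field that is not a valid number
wrong = np.genfromtxt(io.StringIO(csv_text), skip_header=1)
print(wrong)
```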
5.8.2. Indexing
Square brackets indicate indexing.
Colons (:) are used to select ranges and step sizes.
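A few slicing patterns on illustrative arrays; the syntax is start:stop:step, where stop is excluded and any part can be left out:

```python
import numpy as np

a = np.arange(10)  # the integers 0 to 9
print(a[2])        # a single element
print(a[2:8])      # elements 2 up to (not including) 8
print(a[::3])      # every third element, from start to end
print(a[::-1])     # a negative step reverses the array

table = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns
print(table[:, 0])    # first column (all rows)
print(table[1, ::2])  # second row, every second column
```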
5.8.3. Plotting
create figure: fig = plt.figure()
create axes: ax = fig.add_subplot()
plot: ax.plot() and much, much more
save: fig.savefig()
5.8.4. Getting help
The help() function prints the docstring of an object, for example help(plt.plot).