2. And now, Python
This is the first lesson in the Python for chemists crash course. In this lesson, we will get to know a few important data types and perform some simple calculations.
2.1. Using Python as a calculator
The simplest way Python can help you is as a calculator. You can type a calculation into a cell and get the result printed in the output cell. Here, you can see the operators for all fundamental math operations.
Addition: +
Multiplication: *
Division: /
Subtraction: -
Exponentiation: **
# addition
1 + 1
2
# multiplication
2 * 3
6
# division
2.5 / 5
0.5
# subtraction
10 - 9
1
# exponentiation
2 ** 4
16
2.1.1. Order of operations
The order of operations works as you would expect from math:
** before * and / before + and -
1 + 2 * 3
7
Parentheses can be used to change the order of evaluation. The operations inside the parentheses are evaluated first.
(1 + 2) * 3
9
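As a small sketch of how the three precedence levels interact, the following cell combines exponentiation, multiplication and addition in one expression. The variable names are chosen for illustration only.

```python
# Exponentiation binds tighter than multiplication,
# which binds tighter than addition.
without_parentheses = 1 + 2 * 3 ** 2      # evaluated as 1 + (2 * (3 ** 2))
with_parentheses = ((1 + 2) * 3) ** 2     # parentheses force a different order

print(without_parentheses)  # 19
print(with_parentheses)     # 81
```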
2.1.2. More math
More complicated math operations (e.g. roots) can be imported from a package, for example the built-in math package. The import statement loads an installed package and makes it available for use. Functions from that package can be accessed using . (dot) notation.
To see which functions a package provides, you can check the documentation. In a Jupyter notebook, you can also take a shortcut: after importing the package, type math. and then press the TAB key. A list of all functions, classes and objects in the package will pop up.
import math
math.sqrt(2)
1.4142135623730951
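Other functions of the math package are used the same way. As a hedged illustration, here is a pH calculation with math.log10; the concentration is a made-up value, not taken from the lesson.

```python
import math

# Hypothetical hydronium concentration in mol/L, for illustration only.
h3o_concentration = 1e-7

# pH is the negative base-10 logarithm of the concentration.
ph = -math.log10(h3o_concentration)
print(ph)  # approximately 7

# math.sqrt always returns a float, even for perfect squares.
root = math.sqrt(16)
print(root)  # 4.0
```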
2.2. Variables and variable names
As your calculations grow more complex, it is a good idea to store intermediate results. This is what variables are for.
You store a value in a variable by using the = assignment operator. The value on the right is then stored in the variable on the left.
2.2.1. Storing values
<variable_name> = value
2.2.2. Variable naming rules
You are free to choose any variable name you want, with a few restrictions:
the name must start with a letter or _
the name may only contain letters, numbers or _
variable names are case sensitive
The code below stores an integer value in the variable radius using the assignment operator =.
radius = 20
When a variable is typed in an expression (not on the left side of an assignment operator), its value is read during execution of that expression and used instead of the variable name.
This code looks up what radius currently is, multiplies it by 2 and then stores the result in diameter.
diameter = radius * 2
The value of a variable can be displayed by putting it on the last line of a cell. This way, we can display the values of radius and diameter.
radius
20
diameter
40
Another option is to call the print() function on a variable anywhere in the code cell:
print(radius)
print(diameter)
20
40
circumference = diameter * math.pi
circumference
125.66370614359172
The value of a variable is set when the assignment is executed. If a value used on the right side is changed afterwards, the value of the variable does not change. diameter here still contains the value of 2 * 20 we assigned above, even though the value of radius has since increased to 30.
radius = 30
diameter
40
3. Data types
A computer only stores information in binary format, that is, as a series of ones and zeros. Each one or zero is called a bit. A group of eight bits is called a byte (four bits are called a nybble; this is a piece of information you will never ever need again).
The meaning of a series of bytes is entirely up to convention. Several data types have been developed and standardized (e.g. in the C standard) to fulfill most common needs. These data types include storing integers (whole numbers), fractional numbers of a given precision, true/false values, complex values, and text.
3.1. Simple data types
3.1.1. Integers
The int data type stores whole numbers …, -2, -1, 0, 1, 2, …
Integers are stored in binary format, the base-2 equivalent of the decimal (base-10) system. Hence, when we write the number 14 in decimal, what we actually mean is \( \mathbf{1} \cdot 10^1 + \mathbf{4} \cdot 10^0\). The same number in binary is 1110, which stands for \( \mathbf{1} \cdot 2^3 + \mathbf{1} \cdot 2^2 + \mathbf{1} \cdot 2^1 + \mathbf{0} \cdot 2^0\).
To encode negative numbers, binary uses “two's complement”: a binary number is converted to its negative by first replacing every 0 with a 1 and every 1 with a 0 and then adding 1.
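The flip-and-add-one rule can be tried out with Python's bit operators. This is only a sketch: Python's own int has no fixed width, so a width of 8 bits is assumed here and enforced with a mask. The names BITS and MASK are chosen for illustration.

```python
BITS = 8                 # assumed width for this illustration
MASK = 2 ** BITS - 1     # eight ones: 0b11111111

value = 14
flipped = value ^ MASK                    # XOR with all ones flips every bit
negative_pattern = (flipped + 1) & MASK   # add 1, keep only the lowest 8 bits

print(format(value, "08b"))             # 00001110
print(format(flipped, "08b"))           # 11110001
print(format(negative_pattern, "08b"))  # 11110010, the 8-bit pattern for -14
```

Adding the pattern for 14 and the pattern for -14 gives 256, which is zero once it is cut back to 8 bits. This is what makes the encoding convenient for arithmetic.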
Integer constants are typed in as numbers without a decimal point or exponent e notation.
a = 10
To determine the data type of a variable, the Python type() function can be used.
type(a)
int
For every type that we are going to discuss below, there is also a function (actually a class, but the usage here is the same) that can be used to convert from other types. For example, the int function converts other numerical types as well as strings to integers.
The single quotes ' create a string constant, which the int function then converts to an integer.
int('100001')
100001
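Two details of int are easy to miss, so here is a short sketch: converting a float cuts off the fractional part instead of rounding, and an optional second argument gives the base of a string.

```python
from_float = int(3.9)          # the fractional part is cut off, not rounded
from_negative = int(-3.9)      # truncation goes towards zero
from_binary = int("1110", 2)   # second argument: the base the string is written in

print(from_float, from_negative, from_binary)  # 3 -3 14
```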
3.1.2. Booleans
Booleans encode a single 0/1, true/false, yes/no or any other binary choice. They can either take the value of True or False.
Booleans are a fundamental part of controlling program flow (we will get to that in the next chapter). As a brief preview: we can tell Python to execute a block of code if a boolean contains the value True and do something else otherwise.
We can also use boolean math to combine several True/False values into a single one.
3.1.2.1. Boolean operators
Python (and programming languages in general) knows several operators for booleans (similar to the +, -, *, … we know for numeric types). These operators are not, and and or:
| operator | what it does |
|---|---|
| not | inverts the value: True becomes False and False becomes True |
| and | True only if both operands are True |
| or | True if at least one operand is True |
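The three boolean operators not, and and or can be tried out directly in a cell. The variable names below are hypothetical example values.

```python
flask_is_labelled = True    # hypothetical example values
flask_is_empty = False

negated = not flask_is_empty                     # inverts False to True
both = flask_is_labelled and flask_is_empty      # True only if both are True
either = flask_is_labelled or flask_is_empty     # True if at least one is True

print(negated, both, either)  # True False True
```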
3.1.2.2. Comparison operators
To get the boolean values that we need to make decisions and control program flow, we need to be able to ask questions about the content of our variables. We do this using “comparison operators”.
Comparison operators return boolean values, but don’t necessarily operate on booleans. These operators behave exactly like you would expect them to from math class:
| operator | function |
|---|---|
| == | equal to |
| != | not equal to |
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |
| is | is the same object |
| is not | is not the same object |
| in | is contained in |
| not in | is not contained in |
(is and in have been included for completeness' sake and will be discussed later)
Boolean operators and comparison operators can be combined. When they are combined, comparison operators take precedence over boolean operators.
In this code snippet, we check if the content of a variable is greater than 4 and smaller than 6. Both expressions evaluate to True, hence their combination using and also evaluates to True.
a = 5
4 < a and a < 6
True
Math operators have higher priority than either comparison or boolean:
4*a > 19 and 4*a < 21
True
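Python also accepts the chained notation known from math for range checks like the ones above. The sketch below shows that the chained form gives the same result as the explicit combination with and.

```python
a = 5

explicit = 4 < a and a < 6   # two comparisons combined with and
chained = 4 < a < 6          # the same check in chained form

print(explicit, chained)     # True True

# Arithmetic is still evaluated before the comparisons:
scaled_in_range = 19 < 4 * a < 21
print(scaled_in_range)       # True
```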
3.1.3. Floats (float type)
Fractional numbers are commonly stored in a floating point data type. These are types that store a numeric value in the form of an integer (the so-called mantissa) and a power of two by which this integer is multiplied (the exponent).
float constants have either a decimal point or an e or E to signify the exponent:
the_number_pi = 3.14
speed_of_light = 2.998E8
The type function returns float
type(the_number_pi)
float
And the constructor/converter is float():
float(4)
4.0
There are two important facts to remember about floating point data types:
floats have limited precision - there are gaps between numbers they can represent
like integers, floats are encoded in binary.
There is a smallest and a largest absolute value that a float of a given size can encode (floats also don’t grow in size!). The limits of the float type depend a bit on the platform (OS, processor type) and the version of Python you are using. You can find out the limits on your machine using sys.float_info.
We import the sys package and display the float_info structure:
import sys
sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
float_info contains the largest value (max) and the smallest positive normalized value (min) that can be encoded, as well as the maximum exponent in binary and decimal (max_exp, max_10_exp). epsilon is the distance from the float 1.0 to the next larger float; this is called the relative precision.
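The meaning of epsilon can be checked with a short experiment: adding epsilon to 1.0 reaches the next float, while adding only half of it is lost to rounding. This is a sketch assuming the usual 64-bit floats with default rounding.

```python
import sys

eps = sys.float_info.epsilon

reaches_next_float = 1.0 + eps > 1.0      # eps is just large enough to matter
half_is_lost = 1.0 + eps / 2 == 1.0       # the result is rounded back to 1.0

print(reaches_next_float, half_is_lost)   # True True
```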
The binary encoding and the limited precision taken together can have an interesting effect, which we can see by checking for equality of 3*0.1 and 0.3.
3*0.1 == 0.3
False
Of course you would expect these two expressions to be equal. What is happening?
The number 0.1 is a recurring fraction in binary and therefore can’t be represented exactly with a finite number of bits. At some point, the number is cut off, incurring a rounding error. The stored values are therefore slightly off, which we see when we use the format function to convert the binary representation back to a decimal one with many digits:
print(format(3 * 0.1, '.17f'), format(0.3, '.17f'))
0.30000000000000004 0.29999999999999999
It follows that checking floats for equality is not entirely straightforward, due to the limited precision and the chance of incurring tiny errors in our calculations. Instead of checking for equality, we check whether they are equal within a certain tolerance, e.g. using math.isclose().
import math
math.isclose(3 * 0.1, 0.3)
True
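By default, math.isclose uses a relative tolerance (rel_tol=1e-09). A relative tolerance does not help when comparing against zero, which is what the optional abs_tol parameter is for. A brief sketch with made-up numbers:

```python
import math

default_check = math.isclose(3 * 0.1, 0.3)                  # relative tolerance only
near_zero_relative = math.isclose(1e-12, 0.0)               # fails: relative to 0 nothing is close
near_zero_absolute = math.isclose(1e-12, 0.0, abs_tol=1e-9) # passes with an absolute tolerance

print(default_check, near_zero_relative, near_zero_absolute)  # True False True
```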
3.1.4. complex
Complex numbers use the complex type. Complex literals use j to denote the imaginary part. Internally, complex numbers are stored as two floats, one for the real part and one for the imaginary part.
num = 3 + 5j
num
(3+5j)
The real and imaginary parts of a complex are accessed by the .real and .imag attributes:
num.real
3.0
num.imag
5.0
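A few more operations on complex numbers, as a sketch: the built-in abs returns the magnitude, .conjugate() the complex conjugate, and the cmath package offers complex-aware versions of the math functions (math.sqrt itself raises a ValueError for negative input).

```python
import cmath

num = 3 + 4j

magnitude = abs(num)           # sqrt(3**2 + 4**2)
conjugate = num.conjugate()    # flips the sign of the imaginary part
root = cmath.sqrt(-1)          # complex square root of a negative number

print(magnitude)   # 5.0
print(conjugate)   # (3-4j)
print(root)        # 1j
```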
3.2. Compound data types
Compound data types are data types that consist of smaller subunits.
3.2.1. Strings (str)
Strings are used to store sequences of symbols and text. str constants are enclosed in a pair of single quotes '...' or a pair of double quotes "...".
They can be concatenated using the + operator.
a = "Hello "
b = "World!"
a + b
'Hello World!'
Strings can also be “multiplied” by an integer. This replicates the string multiple times:
a = "la"
3*a
'lalala'
The length of a string (and many other objects) is returned by the len() function.
len("lalala")
6
3.2.1.1. Type conversion
Python does not convert a str to a numerical type on its own. Hence, when we tell it to add a number and a string, it will raise an error. For example, here we are trying to add the integer one and the string '1'.
1 + '1'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-7ff5cb60d31b> in <module>
----> 1 1 + '1'
TypeError: unsupported operand type(s) for +: 'int' and 'str'
We need to tell python how to convert, using converter functions such as int or float:
1 + int("1")
2
Similarly, there are different ways to convert numbers to strings, the most versatile being the string .format method. Here, {} (curly braces) are used in a string as placeholders. The values passed to .format are then converted to strings and inserted in place of the placeholders. Inside the placeholders, we can fine-tune the formatting. The Python documentation lists the available format specifications.
In the example below, we format the float to two significant digits (a precision of .2 without a type character counts significant digits, not digits past the decimal point):
"{} * pi = {:.2}".format(2, 2*math.pi)
'2 * pi = 6.3'
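What the precision means depends on the type character that follows it. The sketch below formats the same value in a few common ways; the last line uses an f-string, which accepts the same format specifications.

```python
import math

value = 2 * math.pi

two_significant = "{:.2}".format(value)   # no type character: significant digits
two_decimals = "{:.2f}".format(value)     # f: digits past the decimal point
scientific = "{:.3e}".format(value)       # e: exponent notation
with_f_string = f"{value:.2f}"            # same specification inside an f-string

print(two_significant)  # 6.3
print(two_decimals)     # 6.28
print(scientific)       # 6.283e+00
print(with_f_string)    # 6.28
```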
3.2.1.2. Indexing
Strings are the first type we encounter that supports indexing. Indexing is used to pick one or several elements out of a longer sequence of values. The syntax for indexing is a pair of square brackets placed after a variable name, enclosing the index of the selected element as an integer.
Let’s first create a string.
a = "The quick brown fox jumps over the lazy dog!"
The left most element has the index 0.
a[0]
'T'
The right most element has the index len(a) - 1 or just -1 as a shortcut. Any negative index will start to count from the back of the string.
a[-1]
'!'
Using : between two indices returns multiple elements. This is called slicing. The element at the first index is included in the slice, the element at the last index isn’t:
b = "0123456789"
b[1:7]
'123456'
The first element we select (“1”) has the index 1 because we start counting at 0. The character “7” in this string has the index 7 (what a coincidence). Because the end of the range is not included, it is not selected in the slice.
A “missing” number before or after the first : is interpreted as the start or the end of the string, respectively.
b = "0123456789"
When we only set the end, Python selects everything from the first element up to (but not including) that index:
b[:2]
'01'
When we only set the beginning (and use a colon to indicate that we are slicing and not just indexing) then python selects everything starting from the given number:
b[2:]
'23456789'
A colon without any indices selects everything:
b[:]
'0123456789'
Slicing can also take steps wider than the default of 1. The step length is entered after a second, optional colon.
b = "0123456789"
b[0:10:2]
'02468'
Here we start slicing at index 0, then take steps of size 2 to select additional elements (0, 2, 4, 6, 8) until the full step width doesn’t fit into the remaining elements anymore.
We can also use negative step widths. With a step of -1 and no start or end, this reverses the string.
b[::-1]
'9876543210'
When negative numbers are used as indices, the count starts from the back end of the string, with -1 denoting the last character, “9”. In the example here, the slice starts at -3, which is the character “7” (counting backwards: 9, 8, 7), and ends at -1, which is the character “9”. Since the end of a slice is not included, only 7 and 8 are selected.
b = "0123456789"
b[-3:-1]
'78'
The integer constants in slices and indices can be replaced with int variables and keep working the same way, as the following example shows.
Here, we define three integer variables (start, stop and step) and use them for slicing the string.
b = "0123456789"
start = 3
stop = 8
step = 2
b[start:stop:step]
'357'
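Slicing is handy for pulling fixed-position fields out of a label. The following sketch uses a hypothetical sample label; the naming scheme is an assumption made for this example only.

```python
# Hypothetical label: 10 characters of date, an underscore, a 4-character
# solvent code, an underscore, and a run identifier.
sample_label = "2024-03-15_EtOH_run07"

date = sample_label[:10]         # everything before index 10
solvent = sample_label[11:15]    # indices 11, 12, 13, 14
run_number = sample_label[-2:]   # the last two characters

print(date, solvent, run_number)  # 2024-03-15 EtOH 07
```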
Finally, the in and not in operators we mentioned previously work on strings. They are used to check if a given character or substring is included in a string. If a given substring is part of a string, in evaluates to True:
"test" in "this is a test"
True
And, on the flip side, not in returns True if a value is not included in a string.
"an" not in "this is a test"
True
in and not in can be generally used to test membership of an element in a larger collection of elements.
In the case of strings, this only concerns characters and pieces of text, but for the data types we are looking at next, this could be pretty much anything.
3.2.2. tuples
Tuples can contain multiple elements, however, in contrast to strings, the type of the elements in the tuple can be arbitrary.
Tuples are created using (...) or tuple(...). Individual elements are separated by , commas.
tupi = (1,2, "abc", 2)
Indexing and slicing work the same as we saw for strings in the previous section. A single integer in square brackets selects one element, while the slicing syntax with colons returns a tuple of the selected items.
A single integer index, returns one element:
tupi[2]
'abc'
Using the slicing syntax with colons returns a tuple with multiple elements:
tupi[:3]
(1, 2, 'abc')
And like we saw with strings, we can use the + operator to concatenate two tuples. This returns one larger tuple with all elements kept in the same order:
tupi = (1,2, "abc", 2)
tupi + ("some", "more", "items")
(1, 2, 'abc', 2, 'some', 'more', 'items')
We can also put the content of a variable inside a tuple. Like in any other case where the variable is not on the left side of an assignment, the value of the variable is read and put into the tuple, not the variable itself:
a_value = 1000
another_value = 1001
mytuple = (1, a_value, another_value)
mytuple
(1, 1000, 1001)
Finally, the sorted function returns a sorted list containing the elements of the tuple.
Warning
This only works if Python actually knows how to compare the different elements in the tuple.
tupi2 = (1, 0.1, 10, -1, 0, 100)
sorted(tupi2)
[-1, 0, 0.1, 1, 10, 100]
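To illustrate the warning: numbers of different types can be compared with each other, but a number and a string cannot, so sorting a mixed tuple raises a TypeError. The sketch below catches that error using try/except, a construct that belongs to flow control and is only used here to keep the cell running.

```python
numbers = (3, 1.5, -2)
ascending = sorted(numbers)                  # ints and floats can be compared
descending = sorted(numbers, reverse=True)   # reverse=True sorts largest first

print(ascending)    # [-2, 1.5, 3]
print(descending)   # [3, 1.5, -2]

mixed = (1, "abc", 2)
mixed_is_sortable = True
try:
    sorted(mixed)                # comparing an int with a str is not defined
except TypeError as error:
    mixed_is_sortable = False
    print("cannot sort:", error)
```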
3.2.3. list
Lists are very similar to tuples, except for one important difference: they are mutable. That means, they can be changed after they’ve been created.
To create a new list, you can use square brackets [], again individual items are separated by commas.
a = []
a
[]
Like tuples, lists can contain any type.
b = [1,2,"a", "y", "coconut", .5]
b
[1, 2, 'a', 'y', 'coconut', 0.5]
It is also possible to create a new list by concatenating two lists using + or repeating them using *.
a = [ "element in the other list"]
b + a
[1, 2, 'a', 'y', 'coconut', 0.5, 'element in the other list']
The content of lists can be changed after creation by adding or removing elements. The append method adds an element at the end, the insert method inserts the element at a given index and pop removes the last element and returns it.
First, we .append the element “banana” to the end of the list:
b.append("banana") # add one element at the end of the list
b
[1, 2, 'a', 'y', 'coconut', 0.5, 'banana']
Then we insert the element “inserted” at index three into the list. This shifts the elements with index three or more one position back:
b.insert(3, "inserted") # insert one element at index 3
b
[1, 2, 'a', 'inserted', 'y', 'coconut', 0.5, 'banana']
Finally, we .pop the last element - “banana” - from the list and store it in the variable popped. The list is now shorter:
popped = b.pop() # remove and return one item
b
[1, 2, 'a', 'inserted', 'y', 'coconut', 0.5]
And variable popped contains the removed element:
popped
'banana'
Slicing a list returns another list. You can see this in the snippet below, where we call type on the returned object.
c = [1,2,3,4,5,6]
l1 = c[3:5]
l1, type(l1)
([4, 5], list)
However, indexing returns the element itself. The element at index 5 in the list c is an int. Hence, when we call the type function on it, the returned type is int.
c = [1,2,3,4,5,6]
o1 = c[5]
o1, type(o1)
(6, int)
For lists, indexing can also be used on the left side of an assignment. In that case the indexed element is replaced with the element on the right side of the assignment operator.
The snippet below creates a list of numbers, then replaces the number at index 0 with another element and finally displays the list:
list_of_numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list_of_numbers[0] = "replaced element"
list_of_numbers
['replaced element', 1, 2, 3, 4, 5, 6, 7, 8, 9]
When you assign a mutable object such as a list to a variable, the variable points to that object. Thus, when you assign the content of that variable to another variable, both now point to the same list. This provides some chances for exciting, hard-to-find bugs in programs.
Here is an example of that behavior: First, we create a list and assign it to the variable list_1. Then we assign the content of list_1 to list_2
list_1 = [1,2,3,4]
list_2 = list_1
Both list_1 and list_2 now point to the same list.
list_1
[1, 2, 3, 4]
list_2
[1, 2, 3, 4]
When we change one of the elements of list_2 the change is mirrored in list_1.
list_2[2] = "bananas"
list_2
[1, 2, 'bananas', 4]
list_1
[1, 2, 'bananas', 4]
Since list_1 and list_2 point to the same list, changing it through either name is visible through the other. However, if a new object is assigned to one of the variables, the change is not mirrored, since they now point to different objects.
list_2 = ["bananas", "coconut", "noodles"]
list_2
['bananas', 'coconut', 'noodles']
list_1
[1, 2, 'bananas', 4]
Do you still remember the is comparison operator that we met earlier in this lesson? It tells us if two variables refer to the same object. One use for it is to check if two lists just contain the same elements or are actually the same list.
Here, we create two distinct lists that have the same content, list_1 and list_2. We also create a variable that points to list_1 called list_1_again.
list_1 = [1,2,3]
list_2 = [1,2,3]
list_1_again = list_1
When we use the equality operator == on any combination of the lists we just created, it will always return True, as they all have the same length and the elements at each of the indices are equal.
list_1 == list_2
True
list_1 == list_1_again
True
However, the is operator only returns True for list_1 and list_1_again since they point to the same object, not just two equal objects.
list_1 is list_2
False
list_1 is list_1_again
True
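If an independent list is needed rather than a second name for the same one, the list has to be copied explicitly. A minimal sketch using the list method .copy() (note that this is a shallow copy: mutable elements inside the list would still be shared):

```python
original = [1, 2, 3]
alias = original                # second name for the same list
independent = original.copy()   # new list with the same elements

alias[0] = "changed"            # modifies the one list both names point to

print(original)                 # ['changed', 2, 3]
print(independent)              # [1, 2, 3]
print(alias is original)        # True
print(independent is original)  # False
```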
3.2.4. dicts
Dictionaries use nearly arbitrary keys to look up arbitrary values. Keys can be any immutable data type (numbers, strings, tuples). For numbers, equality rules are used (so 1 is the same as 1.0).
A dict can be instantiated using either the dict() function or curly braces with key - colon - value syntax:
d = {<key1>:<value1>, <key2>:<value2>,...}
a = {1:"hello", "1":"world", "eins":"!"}
a
{1: 'hello', '1': 'world', 'eins': '!'}
Dictionaries can be indexed like lists and tuples. However, selection here is done via the keys of the dictionary, not via a continuous series of integers.
a[1]
'hello'
When we use numerical keys, then python checks for equality. Hence the int 1 and the float 1.0 select the same element.
a[1.0]
'hello'
The str “1” does not:
a["1"]
'world'
Items are removed from dictionaries using del. Here first we create a dictionary a, then display its contents.
a = {1:"hello", "1":"world", "eins":"!"}
a
{1: 'hello', '1': 'world', 'eins': '!'}
After calling the del statement on the element with the key 1, that element is removed from the dictionary:
del a[1]
a
{'1': 'world', 'eins': '!'}
A final, important piece of information is that only immutable types can be used as keys in a dictionary. This includes str and tuple. Mutable types such as list and dict can’t be used for that purpose.
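The difference between a tuple key and a list key can be seen in a short sketch. The dictionary contents are made up for illustration, and try/except is only used to keep the cell running after the error.

```python
# A tuple is immutable and therefore works as a key.
positions = {(0, 0): "origin", (1, 0): "right"}
print(positions[(1, 0)])   # right

# A list is mutable; using it as a key raises a TypeError.
list_key_works = True
try:
    {[0, 0]: "origin"}
except TypeError as error:
    list_key_works = False
    print(error)           # unhashable type: 'list'
```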
3.2.5. Another look at indexing, or: {(1,2,3):[0,1,2][0], "blub":tuple("ftw"[:])[::-1]}
Try to take a second to think about what the output of the following line of code could be: {(1,2,3):[0,1,2][0], "blub":tuple("ftw"[:])[::-1]} (no cheating: don’t copy it into a notebook to check what it does).
Were you able to figure it out? I wasn’t.
There are two potential sources of confusion here:
Python re-uses square brackets to create lists and to index.
It also allows to index something that has been returned by an indexing operation, hence:
listception[0][-1][::2] is valid Python code (and also not that uncommon - on the other hand herpes is also not that uncommon, so make of that what you will).
Let’s tackle one problem after the other: how does Python decide whether [1] creates a list with one element, 1, or selects the element at index 1 from a list? The distinction is this: if whatever stands directly to the left of the opening square bracket [ is either a variable or something that could be put into a variable, Python will try to index it; otherwise it will create a list.
So, the following piece of code creates two lists (there is nothing that could be put in a variable to the left of the square brackets):
[1,2,3] + [4,5,6]
[1, 2, 3, 4, 5, 6]
While this piece of code tries to index a variable (no matter that is not defined):
is_not_defined[0]
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-8-33309220553b> in <module>
----> 1 is_not_defined[0]
NameError: name 'is_not_defined' is not defined
And, finally, this beauty creates a list (“something that could be put into a variable”) and then indexes it:
["this", "is", "a", "new", "list"][0]
'this'
While indexing a list that you just created might not make a lot of sense, indexing a dict that was just created sometimes does. For example, if you want to turn a day of the week into a number, you could write:
day_string = input("Please input the name of the day: ")
day_number = {"Monday":0, "Tuesday":1, "Wednesday":2, "Thursday":3, "Friday":4, "Saturday":5, "Sunday":6}[day_string]
day_number
Please input the name of the day: Sunday
6
We will see another way to do the same thing in the next section, but this construct using a dict makes sense if it makes your code more readable and shorter.
Now for the second question: indexing something that has already been indexed. First, let’s create a turducken of dictionaries:
turducken = {"turkey":{"duck":{"chicken":"filling"}}}
turducken
{'turkey': {'duck': {'chicken': 'filling'}}}
The dictionary turducken contains a key “turkey” that points to a dictionary. That dictionary contains a single key called “duck”, which points at another dictionary, which contains the key “chicken” that points at the string “filling”. If we want to get to the filling, we have to work our way in from the outside. To get to “duck”, we select “turkey” from turducken:
turducken["turkey"]
{'duck': {'chicken': 'filling'}}
We can store the result in a variable, like so:
probably_still_a_sign_that_cuisine_has_gone_to_far = turducken["turkey"]
probably_still_a_sign_that_cuisine_has_gone_to_far
{'duck': {'chicken': 'filling'}}
And then index again to go further into this culinary crime against humanity:
probably_still_a_sign_that_cuisine_has_gone_to_far["duck"]
{'chicken': 'filling'}
On the other hand, if we want to go right to the filling, we don’t need to take the additional step of storing the intermediate values in a variable. Instead, we can just chain several indexing operations:
turducken["turkey"]["duck"]["chicken"]
'filling'
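Chained indexing also works when different container types are nested. Here, every key of the dictionary points to a list, so a second index selects an element of that list:
dict_of_dict_of_lists = {"inner_dict_1": [1,2,3,4,5],
                         "inner_dict_2": ["a", "b", "c"],
                         "inner_dict_3": [(1,2), (9,10), "bananas"]}
dict_of_dict_of_lists["inner_dict_3"][1]
(9, 10)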
3.2.6. dicts and lists
Python dicts are a great way to store information in a nicely ordered fashion. For example, if we want to keep track of chemical flasks in our cabinet, we might be interested in recording the name of the chemical, the total volume and the remaining volume. A list can then hold one such dict per flask:
cabinet = [{"chemical":"ethanol 30%",
"volume_L":1,
"remaining_volume_L":.1},
{"chemical":"peroxide",
"volume_L":.2,
"remaining_volume_L":.2},
{"chemical":"methanol p.a.",
"volume_L":10,
"remaining_volume_L":1},]
3.3. Summary
1. data types
We have now seen the most important data types that Python knows without additional packages. To recap, these were:
bool, int, float, complex, str, list, tuple, dict
2. mutable vs. immutable
We have seen the important distinction between mutable types, which can be changed after creation, and immutable types, which cannot.
mutable
Can be changed after creation
Can’t be used as key for dictionary
Beware of multiple variables containing the same object
list, dict
immutable
Can’t be changed after creation
Can be used as key for dictionary
bool, int, float, complex, tuple, str
This concludes the first chapter in this crash course. In order to be ready for your first exercises, you should have a look at the second chapter on flow control as well.