All content here is under a Creative Commons Attribution (CC-BY 4.0) license, and all source code is released under a BSD 2-clause license.
Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
This is the first module of several (11, 12, 13, 14, 15 and 16), which refocuses the course material in the prior 10 modules in a slightly different way. It places more emphasis on
In short: *how to extract value from your data*.
This is the first of 6 modules. We cover
Requirements before starting
In all the cases below, we show an example. Copy these into the empty cell below, edit the code where necessary, then hit the Run button (or Ctrl-Enter).
print('Hi, my name is ______.')
long_string = """If you really want to write paragraphs,
and paragraphs of text, you do it with the triple quotes. Try it"""
print(long_string)
Check how long_string will be printed. Does Python put a line break where you expect it?
You can also create longer strings in Python using the bracket construction. Try this:
print('Here is my first line.',
'Then the second.',
'And finally a third.',
'But did you expect that?')
The reason for this is stylistic. Python, unlike many other languages, has a set of recommended style rules (the PEP 8 style guide), which we will introduce throughout these modules. One of these rules is that a line should not exceed 79 characters (more recently we see source code using 99 characters per line as a guide).
It helps to keep your code width narrow: you can then put two or three code files side-by-side on a widescreen monitor.
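A related construction, not shown in the text above, is worth knowing when you split long strings: adjacent string literals inside brackets are joined into a single string. A minimal sketch (the variable name `message` is only illustrative):

```python
# Adjacent string literals inside brackets are joined into ONE string.
# Nothing is added between them, so include the spaces yourself.
message = ('Here is my first line. '
           'Then the second. '
           'And finally a third.')
print(message)

# Separating the pieces with commas instead passes separate arguments
# to print(), which joins them with a single space when printing.
print('Here is my first line.', 'Then the second.')
```

Both forms keep each line of code short, but only the first one creates a single string that you can store in a variable.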
We already saw above how a variable was created: long_string = """If you really..... Try it."""
You've created variables plenty of times in other programming languages; almost always with an "=". We prefer to refer to "=" as the "assignment" operator; not as "equals".
What goes on the left hand side of the assignment must be a 'valid variable name'.
Which of the following are valid variable names, or valid ways to create variables in Python?
my_integer = 3.1428571
_my_float = 3.1428571 # variables like this have a special use in Python
__my_float__ = 3.1428571 # variables like this have a special use in Python
€value = 42.95
cost_in_€ = 42.95
cost_in_dollars = 42.95
42.95 = cost_in_dollars
dollar.price = 42.95
favourite#tag = '#like4like'
favourite_hashtag = '#일상'
x = y = z = 1
a, b, c = 1, 2, 3 # tuple assignment
a, b, c = (1, 2, 3)
i, f, s = 42, 12.94, 'spam'
from = 'my lover'
raise = 'your glass'
pass = 'you shall not'
fail = 'you will not'
True = 'not False'
pay day = 'Thursday'
NA = 'not applicable' # for R users
a = 42; # for MATLAB users: semi-colons are never required in Python
A = 13 # like most languages, Python is also case-sensitive
What's the most interesting idea/concept you learned from the above examples?
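If you are unsure about a name, you can also ask Python. The sketch below assumes two standard tools that are not introduced in this module: the string method `.isidentifier()` and the `keyword` library. The candidate names are hypothetical, chosen only for illustration.

```python
import keyword

# Hypothetical candidate names, for illustration only.
candidates = ['total_cost', '2fast', 'class', 'my value']

results = {}
for name in candidates:
    # A usable variable name must be a valid identifier
    # AND must not be one of Python's reserved keywords.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', results[name])
```

Only `total_cost` passes: `2fast` starts with a digit, `class` is a reserved keyword, and `my value` contains a space.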
Do you know C, C++ or Java? With those languages each variable must have a type, which must match what is on the right hand side of the "=" sign. In these languages, you must write something like:
int a, b; // first declare your variables
float result;
a = 5; // then you get to use them
b = 2;
result = a / b; // you will get an unexpected value if you had defined "result" as "int"
It is different in Python, where there is dynamic typing. Python figures it out from the context:
a = 5
b = 25.1
result = a / b
Repeat these lines of Python code below, then add the following:
type(a)
type(result)
What is the output you see?
Each variable always has a type. Usually you know what the type is, because you created the variable yourself at some point.
But on occasion you use someone else's code and you get back an answer that you don't know the type of. Then it is useful to check it with the type(...) function.
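As a small illustration of dynamic typing, the same name can refer to values of different types over time. This is only a sketch; the name `value` is arbitrary.

```python
value = 5
print(type(value))   # <class 'int'>

value = 5 / 2        # the same name now refers to a float
print(type(value))   # <class 'float'>

value = 'five'       # and now to a string
print(type(value))   # <class 'str'>
```

The type belongs to the value on the right hand side of the assignment, not to the name on the left.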
Try these lines in Python:
type(99)
type(99.)
type(9E9)
type('It\'s raining cats and dogs today!') # How can you rewrite this line better?
type(r'Brexit will cost you £8E8. Thank you.')
type(['this', 'is', 'a', 'vector', 'of', 7, 'values'])
type([])
type(4 > 5)
type(True)
type(False)
type(None)
type({'this': 'is what is called a', 'dictionary': 'variable!'}) # we learn about dictionaries later
type(('this', 'is', 'called', 'a', 'tuple')) # tuples are another data type in Python
You can convert most variables to a string type, as follows: str(...)
Try these conversions to make sure you get what you expect:
str(45)
type(str(45))
str(92.5)
str(None)
str(print)
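The conversion also works in the other direction, using `int(...)` and `float(...)`. A minimal sketch, with illustrative variable names:

```python
count = int('45')       # string -> integer
price = float('92.5')   # string -> floating point value
print(count + 1)        # 46
print(price * 2)        # 185.0

# Not every string can be converted: int('forty-five') raises a ValueError.
```

This is useful later on, when values typed in by a user arrive as strings.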
The next step is to perform some calculations with the variables.
The standard expressions exist in Python:
Operation | Symbol |
---|---|
Addition | + |
Subtraction | - |
Multiplication | * |
Division | / |
Power of | ** |
Please note: the "power of" operator is **, not ^. In Python the ^ symbol is the bitwise XOR operator, so it can mislead you. Try this:
print(2 ^ 4)
print(2**4)
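To see why the two results differ, it helps to look at the binary digits. A short sketch (the variable names are illustrative):

```python
# ^ is the bitwise "exclusive or" (XOR): it compares the binary digits.
print(bin(2), bin(4))    # 0b10 0b100

xor_result = 2 ^ 4       # 0b010 XOR 0b100 = 0b110
power_result = 2 ** 4    # 2 * 2 * 2 * 2

print(xor_result, power_result)   # 6 16
```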
Given the above, use Python as a calculator to find the values of these expressions, if a = 5 and b = 9:
a / b
a * b
The distance $d$ travelled by an object falling for time $t$, given in seconds, is $$d=\frac {1}{2}gt^{2}$$ where $g$ is the acceleration due to gravity = $9.8\, \text{m.s}^{-2}$. Calculate the distance that you will travel in free fall in 10 seconds:
t = ____ # seconds
d = ____ # meters
print('The distance fallen is ' + str(d) + ' meters.')
# The better way to do the above in recent versions of Python is to use an "f-string" (format string):
print(f'The distance fallen is {d} meters after {t} seconds.')
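An f-string can also control how many decimals are shown, and can contain a calculation inside the curly brackets. A minimal sketch with an arbitrary value:

```python
speed = 12.3456   # m/s, an arbitrary illustrative value

# ':.2f' means: show as a floating point number with 2 decimals.
label = f'The speed is {speed:.2f} m/s.'
print(label)      # The speed is 12.35 m/s.

# An expression may also be placed inside the curly brackets.
label_kmh = f'That is {speed * 3.6:.1f} km/h.'
print(label_kmh)  # That is 44.4 km/h.
```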
Try it now the other way around: the time taken for an object to fall is: $$ t= \sqrt {\frac {2d}{g}}$$
We will introduce the sqrt function in the next section, but for now you can also calculate the square root using a power of 0.5: as in $\sqrt{x} = x^{0.5}$.
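A quick sketch of the power-of-0.5 idea, using arbitrary numbers. Note that brackets are needed when the square root applies to a whole expression:

```python
area = 16.0
side = area ** 0.5          # same as the square root of 16
print(side)                 # 4.0

# The power operator is applied before division and multiplication.
with_brackets = (2 * 8 / 4) ** 0.5   # square root of 4.0
no_brackets = 2 * 8 / 4 ** 0.5       # 16 divided by the square root of 4
print(with_brackets, no_brackets)    # 2.0 8.0
```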
Using that knowledge, how long will it take for an object to fall from the top of the building you are currently in:
# Creates a string value in variable 'd'. Verify that it is a string type.
d = input('The height of the building, in meters, which I am currently in is: ')
d = float(d) # convert the string variable to a floating point value
t = ____ # seconds
# You might also want to investigate the "round" function at this point
# to improve the output for variable t.
print('The time for an object to fall from a building',
      'of ' + str(d) + ' meters tall is ' + str(t) +
      ' seconds.')
Python, like other languages, has the order of operations rules (same as the PEMDAS rules you might have learned in school):
So what is the result of these statements?
a = 1 + 3 ** 2 * 4 / 2
b = 1 + 3 ** (2 * 4) / 2
c = (1 + 3) ** 2 * 4 / 2
While it is good to know these rules, the general advice is to always use brackets to clearly show your actual intention.
Never leave the reader of your code guessing: someone will have to maintain your code after you; including yourself, a few years/months later 😉
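A few small cases, different from the ones above, show how brackets change the outcome. This is only an illustrative sketch:

```python
p = 2 + 3 * 4      # multiplication first: 2 + 12
q = (2 + 3) * 4    # brackets first: 5 * 4
r = -2 ** 2        # power before the minus sign: -(2 ** 2)
s = (-2) ** 2      # brackets first: (-2) * (-2)

print(p, q, r, s)  # 14 20 -4 4
```

The third line surprises many people, which is a good argument for writing the brackets explicitly.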
Test yourself: Write code for the following:
Divide the sum of a and b by the product of c and d, and store the result in x.
You can start with the code below, and edit it:
a, b, c, d = 2, 3, 5, 6
# write your code here
x = _
print(x)
The above operators return results which are either int or float.
There is another set of operators which return boolean values: True or False. We will use these frequently when we make decisions in our code. For example:
if <condition> then <action>
We cover if-statements in a later module.
But for now, try out these <condition> statements:
3 < 5
5 < 3
4 <= 4
4 <= 9.2
5 == 5
5. == 5 # float on the left, and int on the right. Does it matter?
5. != 5 # does that make sense?
True != False
False < True
Related to these operators are some others, which you can use to combine conditions: and, not, and or.
Try these out. What do you get in each case?
True and not False
True and not(False)
True and True
not(False) or False
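These operators become useful once they combine comparisons. A minimal sketch, using a made-up temperature value and illustrative variable names:

```python
temperature = 21.5   # degrees Celsius, an illustrative value

# Both comparisons must be True for "and" to return True.
is_comfortable = temperature > 18 and temperature < 25

# Only one of the comparisons needs to be True for "or".
is_extreme = temperature < 0 or temperature > 35

print(is_comfortable, is_extreme, not is_extreme)   # True False True
```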
In the quadratic equation $$ax^2 + bx + c=0$$ the short-cut solution is given by $$ x= -\frac{b}{2a}$$ but only if two conditions are met: $b^2 - 4ac=0$ and $a \neq 0$.
Verify if you can use this short-cut solution for these combinations:
a, b, c = 3, -1, 2 # using tuple-assignment here to create these 3 variables in 1 line of code!
a, b, c = 0, -1, 2
a, b, c = 3, 6, 3
Write the single line of Python code that will return True if you can use the shortcut, or False if you cannot.
You will certainly need to calculate logs, exponentials, square roots, or require the value of $e$ or $\pi$ at some point.
In this section we get a bit ahead, and load a Python library to provide this for us. Libraries - we will see later - are a collection of functions and variables that pre-package useful tools. Libraries can be large collections of code, and are for special purposes, so they are not loaded automatically when you launch Python.
In MATLAB you can think of Toolboxes as being equivalent; in R you have Packages; in C++ and Java you also use the word Library for the equivalent concept.
In Python, there are several libraries that come standard, and one is the math library. Use the import command to load the library. The math library can be used as follows:
import math
radius = 3 # cm
area_of_circle = math.pi * radius**2
print('The area of the circle is ' + str(area_of_circle))
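Functions in the library are used in the same way as `math.pi`: write the library name, a dot, and then the function name. A short sketch with a few functions from the standard `math` library (the variable names are illustrative):

```python
import math

root = math.sqrt(81)          # square root
growth = math.exp(0)          # e to the power 0
natural = math.log(math.e)    # natural logarithm (base e)
decimal = math.log10(1000)    # logarithm to base 10

print(root, growth)           # 9.0 1.0
print(natural, decimal)
```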
Now that you know how to use the math library, it is worth searching what else is in it:
All built-in Python libraries are documented in the same way. Searching this way usually brings up the link near the top. Make sure you look at the documentation for Python version 3.x.
Now that you have the documentation ready, use functions from that math library to calculate:
a = 3.7
b = 3.7
c = -2.9
d = 100
e = 100
Calculate the true value of 9! [use the math library to calculate this!] and its Stirling approximation (math.sqrt, math.pi, math.exp and math.pow will help here).
<put your code here>
print('The true value of 9! is ' + ___ + ', while the Stirling approximation is ' + ___)
Verify that the cosine of g = $2\pi$ radians is indeed 1.0
The population of a country could be approximated by the formula $$ p(t) = \dfrac{197\,273\,000}{1 + e^{-0.03134(t - 1913)}}$$ where the time $t$ is in years.
We will cover creating, adding, accessing and using lists of objects.
A list is a basic Python type: it is a collection of objects.
Create a list with the square bracket characters: [ and ].
For example: words = ['Mary', 'loved', 'chocolate.']
Try it: one of the most useful functions in Python is len(...). Verify that len(words) returns an integer value of 3. Does it have the type you expect?
The entries in the list can be mixed types (contrast this to most other programming languages where all entries in the list must have the same type!)
group = ['yeast', 'bacillus', 994, 'aspergillus' ]
An important test is to check if the list contains something. Try these pieces of code below.
'aspergillus' in group
499 in group
Like we saw with strings, you can use the * and + operators:
group * 3
group + group # might not do what you expect!
group - group # oooops
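The same behaviour on a smaller list makes the pattern easier to see. A minimal sketch with an illustrative list called `letters`:

```python
letters = ['a', 'b']

repeated = letters * 2       # repeats the entries
joined = letters + ['c']     # joins two lists end-to-end

print(repeated)   # ['a', 'b', 'a', 'b']
print(joined)     # ['a', 'b', 'c']
print(letters)    # ['a', 'b']  (the original list is unchanged)
```

Neither operator does arithmetic on the entries themselves, and subtraction is not defined for lists at all.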
And like strings, you refer to the entries by their position, counting from 0:
group[0]
# but this is also possible:
group[-3]
# however, is this expected?
group[4]
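A sketch of the indexing rules on a short, illustrative list:

```python
colours = ['red', 'green', 'blue']

first = colours[0]     # counting starts at 0
last = colours[-1]     # negative positions count from the end

print(first, last, len(colours))   # red blue 3

# Valid positions run from 0 up to len(colours) - 1,
# so colours[3] would raise an IndexError.
```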
Lists also have some methods that you can use. Lists in fact have far fewer methods than strings. To get a list of methods:
dir(___) # and then fill in an example of the object whose methods you want to see
dir('sometext')
dir([]) # even an empty list is OK
How many methods do you see which you can apply to a list?
Let's try a few of them out:
Append a new entry to the group list you created above: add the entry "Candida albicans".
Create reptiles = ['crocodile', 'turtle'] and then try: group.extend(reptiles).
Remove the crocodile entry from the list. Print it again to verify it succeeded.
Try group.reverse(), and print the group variable to the screen.
Now try group = group.reverse() and print the group variable to the screen. What happened this time?
Reset group = ['yeast', 'bacillus', 'aspergillus'] and try group.sort(). Notice that .sort(), like the .reverse() method, operates in-place: there is no need to assign the output of the action to a new variable. In fact, you should not, as you saw in the previous step.
Reset group = ['yeast', 'bacillus', 994, 'aspergillus']; and now try group.sort(). What does the error message tell you?
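The difference between changing a list in-place and creating a new list can be sketched as follows. The built-in `sorted(...)` function is not covered in this module; it is shown here only as a contrast, and the variable names are illustrative.

```python
names = ['yeast', 'bacillus', 'aspergillus']
outcome = names.sort()     # sorts the list itself; returns nothing useful

print(outcome)   # None
print(names)     # ['aspergillus', 'bacillus', 'yeast']

numbers = [3, 1, 2]
ordered = sorted(numbers)  # builds a NEW sorted list; the original is kept

print(numbers, ordered)    # [3, 1, 2] [1, 2, 3]
```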
Lists behave like a stack, or a queue: you can add things to the end of the queue using .append() and you can remove them again with .pop().
Think of a stack of plates: last appended, first removed.
Try it:
species = ['chimp', 'bacillus', 'aspergillus']
species.append('hoooman')
first_out = species.pop()
print(first_out)
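The stack and queue behaviours differ only in which end you remove from. A minimal sketch, assuming the optional position argument of `.pop()` and an illustrative list called `plates`:

```python
plates = ['bottom', 'middle']
plates.append('top')

from_stack = plates.pop()     # no argument: removes the LAST entry
from_queue = plates.pop(0)    # position 0: removes the FIRST entry

print(from_stack)   # top
print(from_queue)   # bottom
print(plates)       # ['middle']
```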
Insert arachnid between chimp and bacillus using the .insert() command. Print the list to verify it.
If you don't know how to use the .insert() method, but you know it exists, you can type help([].insert) at the command prompt to get quick help. Or you can search the web, which gives more comprehensive help, with examples.
Use the .index() function to find the index of "bacillus". Then use the .pop() method to remove it. In other words, do not directly provide .pop() the integer index to remove. Assign the popped entry to a new variable.
Comments are often as important as the code itself. But it takes time to write them.
Comments should be added in these places and cases:
The choice of variable names is related to the topic of comments. In many ways, the syntax of Python makes the code self-documenting, meaning you often do not need to add comments. But this is definitely assisted by choosing meaningful variable names:
for genome in genome_list:
    command_to_do_something_with_genome_goes_here
This quite clearly shows that we are iterating over all the genomes in some iterable container of sequenced genomes (it could be a list, tuple, or set, for example).
Now compare it with this code:
for k in seq:
    <do something with k>
It is not clear what k represents. It is also not clear what seq is. Choosing good variable names helps the reader.