All content here is under a Creative Commons Attribution CC-BY 4.0 and all source code is released under a BSD-2 clause license.
Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
We cover the following topics here:
On the side we will cover some aspects of debugging.
Also recommended:
"my_string"[0:5]
"my_string "[5:100:2]
list_A + list_B
?output
after this command: output = my_list.reverse()
?[9, 7, 5, 3, 2]
In the prior module we focused on lists and strings separately. We also saw they have a lot in common, in terms of behaviour. But there was one key difference: lists are mutable, and strings are immutable (unchangeable).
Some things are just more intuitive with lists. For example, if we have:
my_string = 'A long sentence, with text and faulty spacing.'
How many words are in that sentence? It is easier if you can convert it to a list.
String to list
my_string = 'A long sentence, with text and faulty spacing.'
string_as_list = my_string.split(' ')
print('There are {} words in the string.'.format(len(string_as_list)))
The .split(...) method is extremely useful, but we did not cover it last time. You can split on any character, or characters. Try it:
my_string.split() # mmmmm, what does this do?
my_string.split('e')
my_string.split('en')
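A quick sketch of why the first two calls above can differ, using a hypothetical string with repeated spaces (not the sentence above): splitting on ' ' keeps an empty string between every pair of adjacent spaces, while split() with no argument splits on any run of whitespace and discards the empty pieces.

```python
# Hypothetical string with irregular spacing
messy = 'text  with   extra spaces'

# Splitting on a single space leaves empty strings where spaces repeat
print(messy.split(' '))  # ['text', '', 'with', '', '', 'extra', 'spaces']

# With no argument, any run of whitespace counts as one separator
print(messy.split())     # ['text', 'with', 'extra', 'spaces']
```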
List back to a string
To recombine a split string, you can use the ''.join() method. Notice that I wrote it as ''.join(), indicating it is a method of string objects.
Try these lines of code one-by-one, and ensure you can figure out what the join method does.
my_list = ['Divided', 'we', 'fall,', 'but', 'united', 'we', 'stand.']
print(''.join(my_list))
print(' '.join(my_list))
print('\n'.join(my_list))
print('\t'.join(my_list))
print(')('.join(my_list))
Which of the above are you likely to use most often?
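Pro tip: if you are generating an automated report, you can build up your report paragraph by paragraph. Use .append(...) here: .extend(...) with a string would add the string's characters one at a time.
report = []
report.append('First section of the report.')  # add a new section (one string)
report.append('Next section of the report.')   # add the next section
# ... keep appending sections as needed

# Convert the whole report to a (long) string, one section per line
print('\n'.join(report))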
Like other languages, Python can create branches in the code.
if <condition> then <action>
They can also have an else part:
if <condition> then <action> else <some other action>
Or even multiple elif (else if) checks. These are the equivalent of the switch or case constructions found in other languages.
Indentation is important, as shown in this example.
slope = 0.25  # placeholder value: replace with code that calculates the slope
if slope > 0:
    sign_of_slope = 'positive'
elif slope < 0:
    sign_of_slope = 'negative'
else:
    sign_of_slope = 'zero'

print('The slope was observed to be {}.'.format(sign_of_slope))
Note: you can have zero or more elif sections in this if-else ladder. The else part, if present, must go at the end of the ladder.
Use the above code to create a prototype for a robotic system which will automatically titrate a solution to a neutral pH, depending on the value of the variable pH. Print the appropriate string, depending on the condition:
Run your code for some or all of these pH values to ensure all branches in your code are working as you expected: -4.2, 0, 3.721, 5.5, 5.500001, 8.5, 10.98765, 14, 140
This is called code testing. In the *Advanced section* of the course we will come back to formal methods of code testing.
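One way to organise this kind of testing is to loop over all the test values in one go. The sketch below is only an illustration: the function name titration_action, the messages, and the thresholds (0 to 14 for a valid reading, 7 for neutral) are hypothetical stand-ins, not the conditions the exercise asks for, so substitute your own. Notice that the list above places values just either side of a boundary (5.5 and 5.500001), which is a good habit when testing branches.

```python
def titration_action(pH):
    """Return a message for a pH reading.

    The thresholds and messages are HYPOTHETICAL placeholders,
    not the official conditions for the exercise.
    """
    if pH < 0 or pH > 14:
        return 'Invalid pH reading: check the sensor'
    elif pH < 7:
        return 'Solution is acidic: add base'
    elif pH > 7:
        return 'Solution is basic: add acid'
    else:
        return 'Solution is neutral: stop titrating'


# Run every test value through the function, so each branch gets exercised
test_values = [-4.2, 0, 3.721, 5.5, 5.500001, 8.5, 10.98765, 14, 140]
for pH in test_values:
    print('pH = {}: {}'.format(pH, titration_action(pH)))
```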
In the prior module we were writing code to automatically write a report for us. The code generated this output:
The regression trend of 45.9 mg/day was detected for this product, with a p-value of 0.00341. This indicates that there is a rising trend over time.
Again, use the above code as a starting point, but add to it. At the end, the code should be able to produce all 4 variants of the outputs shown below, depending on the values of slope and p_value.
slope is either considered to be rising or falling.
p_value greater than 0.20 requires that an extra phrase be added.
Variant 1: The regression trend of 12.4 mg/day was detected for this product, with a p-value of 0.0141. This indicates that there is a rising trend over time, which indicates an important influence.
Variant 2: The regression trend of 12.4 mg/day was detected for this product, with a p-value of 0.425. This indicates that there is a rising trend over time, but it likely has no impact on the system.
Variant 3: The regression trend of -5.2 mg/day was detected for this product, with a p-value of 0.142. This indicates that there is a falling trend over time, which indicates an important influence.
Variant 4: The regression trend of -5.2 mg/day was detected for this product, with a p-value of 0.209. This indicates that there is a falling trend over time, but it likely has no impact on the system.
Check that your code correctly produces the output when:
slope = 0.00542 and p_value = 0.0419
slope = -521 and p_value = 0.2000001
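Here is one possible sketch, assuming a positive slope counts as "rising" and anything else as "falling", and using the 0.20 p-value cut-off from above. The function name trend_report is hypothetical; wrapping the logic in a function makes it easy to rerun for each test case.

```python
def trend_report(slope, p_value):
    """Build one of the four report variants (a sketch)."""
    # Assumption: positive slope is "rising"; zero or negative is "falling"
    if slope > 0:
        direction = 'rising'
    else:
        direction = 'falling'

    # p-values above 0.20 get the "no impact" ending
    if p_value > 0.20:
        ending = 'but it likely has no impact on the system'
    else:
        ending = 'which indicates an important influence'

    return ('The regression trend of {} mg/day was detected for this product, '
            'with a p-value of {}. This indicates that there is a {} trend '
            'over time, {}.').format(slope, p_value, direction, ending)


# The two check cases listed above
print(trend_report(0.00542, 0.0419))
print(trend_report(-521, 0.2000001))
```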
We will cover simple reading from a text file here. The general, simple way to read a file containing any regular text is:
filename = "myfile.txt"
f = open(filename, "r")
all_lines_as_string = f.read()
print(type(all_lines_as_string))
# Do something with the string ``all_lines_as_string``.
# Close the file afterwards:
f.close()
Try these steps:
Go to the same directory as where your Python script is being saved.
Create a new text file, called myfile.txt, and write more than one line of text to your file.
Save your text file.
Run the above code, modifying it so that it will:
all_lines_as_string, in uppercase.
all_lines_as_string?
Change the above code so that the f.read() line instead contains: all_lines_as_list = f.readlines()
Verify that all_lines_as_list is indeed a list.
Write a for-loop that prints the length of each line:
Line 1 has __ characters
Line 2 has __ characters
etc
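A minimal sketch of the for-loop, assuming a small hypothetical sample file created with the tempfile module so the block runs on its own (point filename at your own myfile.txt instead). Each entry returned by readlines() still ends with the newline character '\n', so the sketch strips it before counting; leave out .rstrip('\n') if you want the newline counted too.

```python
import os
import tempfile

# Hypothetical sample file in a temporary folder, so this sketch is self-contained
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'sample.txt')
with open(filename, 'w') as f:
    f.write('First line\nSecond line is longer\n')

f = open(filename, 'r')
all_lines_as_list = f.readlines()
f.close()
print(type(all_lines_as_list))  # <class 'list'>

# enumerate(..., start=1) numbers the lines from 1 instead of 0
for number, line in enumerate(all_lines_as_list, start=1):
    length = len(line.rstrip('\n'))  # do not count the trailing newline
    print('Line {} has {} characters'.format(number, length))
```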
The above is a good start with files, but there are some shortcomings which we can improve on:
myfile.txt in the same directory as where you are running Python.
myfile.txt to a different directory on your computer.
full_filename variable:
base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname' # do you remember why we use the r'...' string?
filename = 'myfile.txt'
import os
full_filename = os.path.join(base_folder_windows, filename)
f = open(full_filename, "r")
# Do something with variable ``f``
f.close()
base_folder_mac_or_linux or base_folder_windows variable and modify it, based on the type of computer you are working on.
full_filename is indeed what you expect it to be: the full path name to your text file.
We still have not solved the last shortcoming, regarding closing the file. Read this Stack Overflow page on why you should close files. Python can automatically close the file for you if you use the structure shown below. Notice the code is essentially the same, except that you replace one line and remove the f.close() statement.
# This is the preferred way to open and use files in Python
import os

base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname'  # why the r'...' string?
filename = 'myfile.txt'
full_filename = os.path.join(base_folder_windows, filename)

with open(full_filename, "r") as f:
    # Do something with variable ``f``, for example:
    file_contents = f.readlines()
    # What "type" is ``file_contents``?
    # Other statements go in the with block, if required.

# The file will be closed at this point, when the ``with``
# block is exited.
Copy and paste your file handling code above, and modify it to use the with block instead.
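To see for yourself that the with block closes the file, you can inspect the file object's closed attribute inside and after the block. This sketch writes a throwaway file into a temporary folder (an assumption, so it does not touch your own files):

```python
import os
import tempfile

# Throwaway file in a temporary folder
folder = tempfile.mkdtemp()
full_filename = os.path.join(folder, 'myfile.txt')
with open(full_filename, 'w') as f:
    f.write('one\ntwo\n')

with open(full_filename, 'r') as f:
    file_contents = f.readlines()
    print(f.closed)          # False: we are still inside the with block

print(f.closed)              # True: closed automatically when the block exited
print(type(file_contents))   # <class 'list'>
```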
Sometimes in our code we want to check if a variable has a certain value, or a certain condition, before we continue with the rest of the code.
Try this: print(4/0) and you will get a ZeroDivisionError. We say that Python has thrown an error.
Try throwing another error:
import math
pH = -0.024
log_pH = math.log(pH)
and you will get an error, specifically an error of the type called ValueError.
[1, 2, 3].pop(4) will throw which type of error? Does it make sense that that error is thrown?
There are several ways to deal with these errors. The crudest is to simply stop the program if a condition is not met.
pH = 2.87
assert(pH > 0)
pH = -0.024
assert(pH > 0)
What difference do you see in the output between the two assert(...) statements?
A more sophisticated way to deal with this is to create an if-else branch in your code. In the above examples you can:
.pop(4).
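A sketch of those if-else guards for the two earlier examples: check that the pH is positive before taking the log, and check that the index is within range before calling .pop(...). The messages and variable names are illustrative.

```python
import math

# Guard 1: only take the log of a positive pH
pH = -0.024
if pH > 0:
    log_pH = math.log(pH)
    print('The logged pH is {}'.format(log_pH))
else:
    log_pH = None
    print('Cannot calculate the log of a non-positive pH: {}'.format(pH))

# Guard 2: only pop an index that exists (negative indices count from the end)
my_list = [1, 2, 3]
index = 4
if -len(my_list) <= index < len(my_list):
    item = my_list.pop(index)
else:
    item = None
    print('Index {} is out of range for a list of length {}'.format(index, len(my_list)))
```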
Writing an if-else for every anticipated situation can be messy, and lead to slower code.
A cleaner way to deal with this is with a try-except structure. You try to run the instructions, and should an exception occur, then you catch the error that was thrown by the software.
import math

try:
    # Some code goes here that calculates or gets the pH value
    pH = -0.024
    log_pH = math.log(pH)
    print('The logged pH is {}'.format(log_pH))
except ValueError as error:
    print('Cannot calculate a negative log.')
    print('Python returned this error: {}'.format(error))
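All the code in the try block will be executed and completed if no exceptions occur. If an exception is raised part-way through, the remaining lines of the try block are skipped. The code in the except block only runs if that type of exception (here, ValueError) is triggered.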
Try it yourself
Try writing a try-except block for the situation where you calculate the roots of the quadratic equation: $ax^2 + bx + c = 0$. In other words, what are the values of $x$ that set the equation equal to zero? We saw this in a prior session.
$$x={\frac {-b\pm {\sqrt {b^{2}-4ac\ }}}{2a}}$$
Write the code, using a try-except structure, to calculate the value of $x$ for these 2 situations:
# this will go through the 'try' block
# this will go through the 'except' block
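A possible sketch, where the function name quadratic_roots and the coefficients in the loop are illustrative choices of mine: math.sqrt raises a ValueError when $b^2 - 4ac < 0$ (no real roots), and dividing by $2a$ raises a ZeroDivisionError when $a = 0$. The first set of coefficients goes through the try block; the second goes through the except block.

```python
import math


def quadratic_roots(a, b, c):
    """Return both real roots of a*x**2 + b*x + c = 0."""
    discriminant = b**2 - 4*a*c
    root = math.sqrt(discriminant)   # ValueError if discriminant < 0
    x1 = (-b + root) / (2*a)         # ZeroDivisionError if a == 0
    x2 = (-b - root) / (2*a)
    return x1, x2


# Illustrative coefficients: real roots first, then no real roots
for a, b, c in [(1, -3, 2), (1, 0, 4)]:
    try:
        x1, x2 = quadratic_roots(a, b, c)
        print('Roots: {} and {}'.format(x1, x2))
    except ValueError as error:
        print('No real roots for a={}, b={}, c={}: {}'.format(a, b, c, error))
    except ZeroDivisionError:
        print('a must not be zero for a quadratic equation')
```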
Another try-except exercise, but this time for files.
try:
    with open('non-existent-file.txt') as file:
        read_data = file.read()
except FileNotFoundError as error_nofile:
    print("Could not open file: {}".format(error_nofile))
Using try-except structures makes your code more robust.
For later, read this page, https://realpython.com/python-exceptions/, which nicely shows how to extend your knowledge, and use try-except-else-finally structures.
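As a small preview of that page, here is a sketch of how else and finally fit around try and except. The function safe_log is hypothetical: the else block runs only when the try block raised nothing, and the finally block runs every time.

```python
import math


def safe_log(value):
    """Return math.log(value), or None if the log is undefined."""
    try:
        result = math.log(value)
    except ValueError as error:
        print('Cannot take the log of {}: {}'.format(value, error))
        result = None
    else:
        # Runs only when the try block raised no exception
        print('The log of {} is {}'.format(value, result))
    finally:
        # Runs every time, whether or not an exception occurred
        print('Finished attempting log({})'.format(value))
    return result


safe_log(1.0)
safe_log(-1)
```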
Next you should proceed to do either Challenge 1, or Challenge 2 (*or both!*).
After that you should complete Challenge 3. Challenge 4 is a variation of Challenge 3, which you can easily complete as well.
Several websites provide random DNA sequences that you can use in your code. We would like to calculate various statistics on the DNA string, e.g. dna_string = 'CGAGATCAGATACGATTCTTATATTCTCAATGAGGAGCCAT'.
It is always a good idea to develop your code on some input for which you already know the answer. So we will create a string of 4000 entries, with approximately 1000 each of C, G, T and A bases in the string. Then we know the above 5 statistics should approximately be:
To create our "known input" we will use the numpy library in Python. We will see this library later on, but for now you can just use this code as-is:
import numpy as np
bases = ['C', 'G', 'T', 'A']
length = 4000
# Uniformly select letters from the above list (creates a balanced sample)
dna_list = np.random.choice(bases, length).tolist()
dna_string = ''.join(dna_list) # see how useful the .join() function is?
# Next, write the code here to calculate the statistics on this DNA sequence
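One sketch of the counting step. It swaps numpy for the standard library's random.choices (so it runs without numpy) and fixes a seed for repeatability. Since the list of 5 statistics is not reproduced here, the counts, percentages and GC content below are example statistics only; each base percentage should come out close to 25%.

```python
import random

bases = ['C', 'G', 'T', 'A']
length = 4000
random.seed(42)  # fixed seed so reruns give the same sequence
dna_string = ''.join(random.choices(bases, k=length))

# str.count gives the number of occurrences of each base
counts = {base: dna_string.count(base) for base in bases}
fractions = {base: counts[base] / len(dna_string) for base in bases}
gc_content = (counts['G'] + counts['C']) / len(dna_string)

for base in bases:
    print('{}: {} bases ({:.1%})'.format(base, counts[base], fractions[base]))
print('GC content: {:.1%}'.format(gc_content))
```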
2. Generate a string of 10000 DNA bases from the internet (https://www.bioinformatics.org/sms2/random_dna.html) and copy/paste them directly to a file on your computer.
Open that file, using the file reading code above, to calculate various statistics on this string. Do not hard-code any variables into your code: your code should be reusable for a string of any length.
Do the statistics from the Bioinformatics site seem to come from a uniform distribution (25% chance for C, G, A or T), or does the distribution of the bases follow the distribution [discovered by Chargaff](https://en.wikipedia.org/wiki/Chargaff%27s_rules#Percentages_of_bases_in_DNA)? If the latter, which organism do they approximate?
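A sketch for reading the sequence from a file without hard-coding its length. It assumes the file may contain line breaks, and skips any line starting with '>' in case your copy is in FASTA format with a header line; the function name base_fractions and the demo file are hypothetical.

```python
import os
import tempfile


def base_fractions(filename):
    """Return the fraction of each base (C, G, T, A) in a DNA text file."""
    with open(filename, 'r') as f:
        # Skip any FASTA-style header lines, and strip newlines/whitespace
        lines = [line.strip() for line in f if not line.startswith('>')]
    dna_string = ''.join(lines).upper()
    total = len(dna_string)  # length comes from the data, not hard-coded
    return {base: dna_string.count(base) / total for base in 'CGTA'}


# Hypothetical demo file standing in for your downloaded sequence
folder = tempfile.mkdtemp()
demo_file = os.path.join(folder, 'random_dna.txt')
with open(demo_file, 'w') as f:
    f.write('CGAGATCAGA\nTACGATTCTT\n')

print(base_fractions(demo_file))
```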
This challenge is to read in a constant stream of values and calculate the moving average of them. This is, again, based on a real case that happens rather frequently.
The ammonia concentration values can be downloaded and saved to your computer. Using the code shown above, create a with block, and read the values from the file line-by-line:
You should check that what you get here in Python matches what you see in Excel, or some other software that can open the CSV file for you.
# Read the file directly from your local computer
filename = 'ammonia.csv'
with open(filename) as f:
    for index, concentration in enumerate(f.readlines()):
        # Skip the first line of the file: it is a text heading.
        if index == 0:
            continue

        # Convert the text to a float, and then do something with it ...
        print(float(concentration))
Building up the problem:
As a check on your solution: the first moving average value is 36.92, then the next one is 38.476, etc.
Modify your code only in 1 place to repeat the calculations, but with a window size of $n=15$ steps. In other words, your window size should not be hard-coded into your Python code.
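A minimal moving-average sketch, using collections.deque with maxlen so that only the most recent n values are kept. The window size appears in exactly one place. It assumes an average is only reported once the window is full, and the readings are made-up numbers; replace them with the float values read from ammonia.csv.

```python
from collections import deque


def moving_averages(values, n):
    """Return the average of each full window of n consecutive values."""
    window = deque(maxlen=n)  # oldest value drops out automatically
    averages = []
    for value in values:
        window.append(value)
        if len(window) == n:  # only average once the window is full
            averages.append(sum(window) / n)
    return averages


n = 3                                  # the one place to change the window size
readings = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values for illustration
print(moving_averages(readings, n))    # [2.0, 3.0, 4.0]
```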
This challenge is to create a very crude integrator for an ordinary differential equation. Based on Newton's law of cooling, if an object such as a bottle of water is placed in a cold environment such as a fridge, the temperature of the water, $T$, changing over time $t$, can be modeled as: $$ \dfrac{dT}{dt} = -k (T-F)$$
The fridge has a constant temperature, $F=5$°C; for this system, the value of $k = 0.08$ per minute. Over a short change in time, $\delta t = 2$ minutes, the equation can be approximated as: $$ \dfrac{\Delta T}{\delta t} = -k (T - F)$$ Writing $\Delta T = T_{i+1} - T_i$, evaluating the right-hand side at time $i$, and multiplying both sides by $\delta t$ gives: $$T_{i+1} - T_i = -k (\delta t)(T_i - F)$$
which shows how the temperature at time point $i+1$ (one step in the future) is related to the temperature now, at time $i$.
You can rewrite the equation as: $$ T_{i+1} = T_i -k (\delta t)(T_i - F)$$
In a loop, show how the temperature changes over time, starting from the temperature $T_{i=0} = 25$°C. Your output should look something like this:
At time 0 minutes the temperature is 25.0
At time 2 minutes the temperature is 21.8
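A sketch of the loop, with the number of steps (10) chosen arbitrarily as an assumption. Formatting with {:.1f} keeps the output tidy despite small floating-point rounding; the first two lines printed match the expected output above.

```python
k = 0.08       # cooling constant [1/minute]
F = 5.0        # fridge temperature [°C]
dt = 2.0       # time step [minutes]
T = 25.0       # starting temperature [°C]
n_steps = 10   # assumption: how many steps to simulate

times = []
temperatures = []
for i in range(n_steps + 1):
    t_minutes = i * dt
    times.append(t_minutes)
    temperatures.append(T)
    print('At time {:.0f} minutes the temperature is {:.1f}'.format(t_minutes, T))
    # Euler step: T_{i+1} = T_i - k * dt * (T_i - F)
    T = T - k * dt * (T - F)
```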
Like in the previous challenge we want to integrate an equation, but this time for bacteria growing on a plate.
The equation for growth is
$$ \dfrac{dP}{dt} = rP $$
where $P$ is the number of bacteria in the population, and $r$ is their rate of growth [1/minute]. Integrating this equation will show exponential growth. This is not realistic: eventually the bacteria will run out of space and food. So the equation is modified:
$$ \dfrac{dP}{dt} = rP - aP^2$$
where the growth is limited by the factor $a$ in the equation.
The differential equation can be re-written as: $$P_{i+1} - P_i = \left[\,rP_i -a\,P_i^2\,\right]\delta t$$
which shows how the population at time point $i+1$ (one step in the future) is related to the population size now, at time $i$ over a short interval of time $\delta t$ minutes. You can read more about these logistic equation models.
In a loop, show how the population changes over time, starting from an initial population of $P_{i=0} = 500$ bacteria. The growth rate for this culture is $r=0.032$ and the coefficient $a = 1.4 \times 10^{-7}$.
Your printed output should look something like this:
At time 10 minutes there are 660 bacteria
At time 20 minutes there are ___ bacteria
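A sketch of the population loop. The time step is not stated above; $\delta t = 10$ minutes is an assumption, chosen because one step then gives $500 + (0.032 \times 500 - 1.4\times10^{-7} \times 500^2)\times 10 = 659.65 \approx 660$, matching the first line of the expected output. The number of steps printed is also arbitrary.

```python
r = 0.032      # growth rate [1/minute]
a = 1.4e-7     # limiting coefficient
dt = 10.0      # assumption: time step in minutes
P = 500.0      # initial population
n_steps = 5    # assumption: number of steps to print

populations = [P]
for i in range(1, n_steps + 1):
    # Euler step: P_{i+1} = P_i + (r*P_i - a*P_i**2) * dt
    P = P + (r * P - a * P**2) * dt
    populations.append(P)
    print('At time {:.0f} minutes there are {:.0f} bacteria'.format(i * dt, P))
```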
Try integrating over
Later on in the course we will see how to collect these values and plot them. But looking at the numbers, what shape does the curve have? Is it what you expect?
To cover during the interactive session: