All content here is under a Creative Commons Attribution CC-BY 4.0 and all source code is released under a BSD-2 clause license.
Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
We cover a diverse range of topics:
They seem unrelated, but they hang together conceptually: they are all about sequences, or collections: characters in a string, items in a list, and loops to process a sequence. We will formally compare all sequence types later. For now let us just use them.
At the end, and in between these sections, we will cover some topics related to commenting.
You should cover these resources (it can take quite some time!)
Strings are some of the simplest objects in Python. In the prior module you created several strings. Now create this string in Python:
s = """Secretly under development for the past three years, Bezos said the
"Blue Moon" lander, using a powerful new hydrogen-powered engine generating up
to 10,000 pounds of thrust, will be capable of landing up to 6.5 metric tons
of equipment on the lunar surface."""
Now use the above string to perform the following actions. Look up the Standard library help files for strings
(like we showed last time) to find the methods required.
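
1. Try `print(s * 8)` and `print(s + s)`. Do these two mathematical operations make sense for strings?
2. At which position does `Secretly` appear? How does this differ from MATLAB?
3. At which position does `Bezos` appear?
4. Return `True` or `False`, depending on whether the string ends with a full stop (see `endswith`).
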
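As a hedged illustration of the kind of calls these exercises need, here is a minimal sketch on a different, shorter string. The variable `demo` and its text are made up for this example and are not part of the exercise.

```python
# A short, made-up string, so the exercise answers above are not given away.
demo = "The lander will land on the lunar surface."

# Repetition (*) and concatenation (+) both build new, longer strings.
doubled = demo * 2
joined = demo + demo
print(doubled == joined)

# .find() reports where the first match starts, counting from 0.
print(demo.find("The"))
print(demo.find("lunar"))

# .count() tallies non-overlapping matches; .endswith() tests the tail.
print(demo.count("land"))
print(demo.endswith("."))
```

Note that "land" is counted twice, because it also occurs inside "lander".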
The above are all effectively done using what are called *methods*. A method is an attribute of an object that you can call, like a function. In the above, a `string` is your object, and objects have one or more attributes.
Some tips:

You can list the attributes of an object with the `dir(...)` command.
s = """Secretly under development for ... the lunar surface."""
dir(s)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
'__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize',
'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format',
'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier',
'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']
You can ignore all the attributes beginning and ending with a double underscore, for example `__add__`. The attributes which are of practical use to you are the ones starting from `capitalize`, all the way to the end.
You don't need to create a string `s` first to get a list of the attributes. You can also use either of these shortcuts:
dir('')
dir(str)
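If the double-underscore names get in the way, one possible approach is to filter them out. This is only a sketch; the name `public_names` is an assumption made for this example.

```python
# Keep only the attribute names that do not start with an underscore.
public_names = [name for name in dir(str) if not name.startswith('_')]

print(public_names[:3])    # the first few names, in alphabetical order
print(len(public_names))   # the exact count depends on your Python version
```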
If you see an attribute that looks interesting, you can request help on it: `help(''.startswith)` or `help("".startswith)`. Notice the `''` in the brackets: it creates an empty string, then accesses the attribute `.startswith`, and then asks for help on that.
You will get a piece of help text printed to the screen. This is helpful later on when you are comfortable with Python. In the beginning it is more helpful to search in a search engine, which will give you a page with examples. The built-in Python help is usually very very brief.
Use this knowledge now to figure out what the difference is between `s.find` and `s.index`. Make sense?
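One way to investigate, sketched here on a made-up string `text`, is to call both methods once with a substring that is present and once with one that is absent. The `try`/`except` is there in case one of the calls raises an error.

```python
text = "Blue Moon lander"   # made-up example string

# A substring that is present: try both methods on the same input.
print(text.find("Moon"), text.index("Moon"))

# A substring that is absent: guard the second call in case it raises.
missing_find = text.find("Mars")
try:
    missing_index = text.index("Mars")
except ValueError as error:
    missing_index = None
    print("index() raised:", error)

print(missing_find, missing_index)
```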
You can do what is called slicing on a string. Slicing is the ability to get sub-parts of a string:
word = 'landing'
print(word[1:4])

1. What does `word[3:]` return?
2. And `word[3:99]`?
3. And `word[2:6:3]`?
4. And `word[6:2:-1]`?
5. Lastly, try `word[-4:-7:-1]`.

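The general slice form is `[start:stop:step]`, where `stop` is excluded. Below is a minimal sketch on a different word, so the answers above are left for you to find; the variable `other` is made up for this illustration.

```python
other = 'surface'   # indices 0..6: s u r f a c e

middle = other[1:4]         # start at index 1, stop before index 4
tail = other[3:]            # from index 3 to the end
clipped = other[3:99]       # a stop beyond the end is quietly clipped
every_second = other[::2]   # a step of 2 takes indices 0, 2, 4, 6
backwards = other[5:1:-1]   # a negative step walks from index 5 down to 2
last_three = other[-3:]     # negative indices count from the end

print(middle, tail, clipped, every_second, backwards, last_three)
```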
Speaking of DNA ... create this sequence in Python:
seq = """TAGGGGCCTCCAATTCATCCAACACTCTACGCCTTCTCCAAGAGCTAGTAGGGCACCCTGCAGTTGGAAAGGGAACTATTTCGTAGGGCGAGCCCATACCGTCTCTCTTGCGGAAGACTTAACACGATAGGAAGCTGGAATAGTTTCGAACGATGGTTATTAATCCTAATAACGGAACGCTGTCTGGAGGATGAGTGTGACGGAGTGTAACTCGATGAGTTACCCGCTAATCGAACTGGGCGAGAGATCCCAGCGCTGATGCACTCGATCCCGAGGCCTGACCCGACATATCAGCTCAGACTAGAGCGGGGCTGTTGACGTTTGGGGTTGAAAAAATCTATTGTACCAATCGGCTTCAACGTGCTCCACGGCTGGCGCCTGAGGAGGGGCCCACACCGAGGAAGTAGACTGTTGCACGTTGGCGATGGCGGTAGCTAACTAAGTCGCCTGCCACAACAACAGTATCAAAGCCGTATAAAGGGAACATCCACACTTTAGTGAATCGAAGCGCGGCATCAGAATTTCCTTTTGGATACCTGATACAAAGCCCATCGTGGTCCTTAGACTTCGTGCACATACAGCTGCACCGCACGCATGTGGAATTAGAGGCGAAGTACGATTCCTAGACCGACGTACGATACAACTATGTGGATGTGACGAGCTTCTTTTATATGCTTCGCCCGCCGGACCGGCCTCGCGATGGCGTAG"""

1. Is there a `GATTAG` in the sequence?
2. How many times does `TTTT` occur?
3. Replace all `A` entries with `T`'s and all `C` entries with `G`'s.
4. Change all `T` entries to `A` and all `A` entries to `T`.

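The last task is a swap, and chaining two `.replace()` calls does not swap: the second call also changes the characters the first one just produced. Here is a sketch of one possible way around that, on a short made-up sequence `short_seq`, using the built-in `str.maketrans` and `.translate()`.

```python
short_seq = "GATTAGTTTTACGT"   # made-up sequence, much shorter than seq

print("GATTAG" in short_seq)     # membership test
print(short_seq.count("TTTT"))   # number of non-overlapping matches

# Chained replacements run one after the other, so the swap is lost.
chained = short_seq.replace("A", "T").replace("T", "A")

# A translation table maps every character in a single pass.
swap_table = str.maketrans("AT", "TA")
swapped = short_seq.translate(swap_table)

print(chained)
print(swapped)
```

After the chained version there is no `T` left at all, while the translated version keeps both letters, exchanged.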
We will cover creating, adding, accessing and using lists of objects.
You have seen this before: create a list with the square bracket characters: `[` and `]`.
For example: `words = ['Mary', 'loved', 'chocolate.']`
One of the most useful functions in Python is `len(...)`. Verify that `len(words)` returns an integer value of 3. Does it have the type you expect?
The entries in the list can be mixed types (contrast this to most other programming languages!)
group = ['yeast', 'bacillus', 994, 'aspergillus' ]
An important test is to check if the list contains something:
'aspergillus' in group
499 in group
Like we saw with strings, you can use the `*` and `+` operators:
group * 3
group + group # might not do what you expect!
group - group # oooops
And like strings, you refer to entries by their position, counting from 0:
group[0]
# but this is also possible:
group[-3]
# however, is this expected?
group[4]
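The following sketch collects these list operations in one place, on a made-up list called `samples`. The `try`/`except` blocks are only there so the script keeps running when an operation is not allowed.

```python
samples = ['yeast', 'bacillus', 994, 'aspergillus']   # made-up list

print(len(samples))
print(samples[0], samples[-1])          # first and last entries
print(994 in samples, 499 in samples)   # membership tests

combined = samples + samples            # concatenation: a longer list
print(len(combined))

# Indexing one position past the end is an error.
try:
    samples[len(samples)]
except IndexError as error:
    print("IndexError:", error)

# Subtraction is not defined for lists.
try:
    samples - samples
except TypeError as error:
    print("TypeError:", error)
```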
Lists also have some methods that you can use. Lists in fact have far fewer methods than strings. Remember how to get a list of methods from the prior module?
dir(....) # what do you fill in here?
How many methods do you see which you can apply to a list?
Let's try a few of them out:

1. `append` a new entry to the `group` list you created above: add the entry "Candida albicans".
2. Create the list `reptiles = ['crocodile', 'turtle']` and then try: `group.extend(reptiles)`.
3. Remove the `crocodile` entry from the list. Print it again to verify it succeeded.
4. Try `group.reverse()`, and print the `group` variable to the screen.
5. Now try `group = group.reverse()` and print the `group` variable to the screen. What happened this time?
6. Create the list `group = ['yeast', 'bacillus', 'aspergillus' ]` and try `group.sort()`. Notice that `.sort()`, like the `.reverse()` method, operates in-place: there is no need to assign the output of the action to a new variable. In fact, you should not, because both methods return `None`.
7. Create the list `group = ['yeast', 'bacillus', 994, 'aspergillus' ]` and now try `group.sort()`. What does the error message tell you?

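A small sketch of the in-place behaviour, on a made-up list `microbes`. It also shows the built-in `sorted()` function and a reversing slice, which are offered here only as alternatives that return a new list and leave the original untouched.

```python
microbes = ['yeast', 'bacillus', 'aspergillus']   # made-up list

result = microbes.sort()   # sorts the list itself
print(result)              # the value that the method returned
print(microbes)            # the list, now in alphabetical order

# These build new lists instead of changing microbes.
descending = sorted(microbes, reverse=True)
backwards = microbes[::-1]
print(descending, backwards)

# Strings and integers cannot be ordered against each other.
mixed = ['yeast', 994]
try:
    mixed.sort()
except TypeError as error:
    print("TypeError:", error)
```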
Lists behave like a stack: you can add things to the end using `.append()` and you can remove them again with `.pop()`.
Think of a stack of plates: last appended, first removed.
Try it:
species = ['chimp', 'bacillus', 'aspergillus']
species.append('hoooman')
first_out = species.pop()
print(first_out)

1. Add the entry `arachnid` between `chimp` and `bacillus` using the `.insert()` command. Print the list to verify it.

   If you don't know how to use the `.insert()` method, but you know it exists, you can type `help([].insert)` at the command prompt to get quick help. Or you can search the web, which gives more comprehensive help, with examples.
2. Use the `.index()` function to find the index of "bacillus". Then use the `.pop()` method to remove it. In other words, do not directly provide `.pop()` the integer index to remove. Assign the popped entry to a new variable.

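The same three methods can be sketched on an unrelated, made-up list `shelf`, so the exercise above stays yours to solve.

```python
shelf = ['flask', 'beaker', 'pipette']   # made-up list

# .insert(i, item) places item before the entry currently at index i.
shelf.insert(1, 'funnel')
print(shelf)

# Look the position up first, then hand it to .pop().
position = shelf.index('beaker')
removed = shelf.pop(position)
print(position, removed, shelf)

# Without an argument, .pop() removes the last entry (stack behaviour).
shelf.append('burette')
last = shelf.pop()
print(last)
```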
The `for` loop is used to run a piece of code a certain number of times. The basic structure is shown, with an example that prints the integer values from 3 up to, and including, 8:
# This is one way to do it:
for i in range(3, 9):
    # You can have many lines of code in the for-loop.
    # As an example, two for-loop statements are shown here.
    print(i)
    print('-----')
Before the command `print(i)` is a tab character or 4 spaces. Please use spaces, and not tabs, especially if you will interact with other colleagues writing code. Therefore the letter `p` from `print` goes exactly under the `i`.
That `i` is the loop counter. The `range(3, 9)` tells how many times the loop will iterate.
Use `list(range(3, 9))` to see a list representation of the `range()` function. Try creating these ranges:

1. Use the `range` command to create the values `[-10, -40, -70]`.

Notice how the start, stop and step of a range follow the same convention as the string slices seen above.
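To make the parallel with slicing concrete, here is a short sketch with made-up values. The names `evens`, `countdown` and `letters` are chosen only for this illustration.

```python
# range(start, stop, step): stop is excluded, just as in a slice.
print(list(range(3, 9)))

evens = list(range(0, 10, 2))       # a step of 2
countdown = list(range(5, 0, -1))   # a negative step counts down
print(evens, countdown)

# The same three numbers, used as a slice on a string.
letters = 'abcdefghij'
sliced = letters[0:10:2]
picked = ''.join(letters[i] for i in range(0, 10, 2))
print(sliced, picked)
```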
Inside the for loop you can write one or more statements. In the above there are 2 statements and a comment. It is usual to start your comment, if one is required, with an indent as well. This way it is clear the comment refers to the contents of the for-loop.
You can call the loop counter anything you like, as long as it is a valid variable name. Remember those from last time?
You can loop over many types of objects in Python. Try this:
reptiles = ['crocodile', 'turtle', 12.34, 'lizard', 'snake', False]
for animal in reptiles:
    print('The "animal" object is of type ' + str(type(animal)))
And here you can see dynamic typing at its finest: the `animal` variable is dynamically changing its type in the loop.
You can also iterate over the entries of a string!
sequence = "TAGGGGCCTCCA"
number = 1
for base in sequence:
    print('Base number {} is {}'.format(number, base))
    number += 1
In the above we introduced another concept: you can build the text to print with the `.format()` method of a string. We will see more of this later, but then it won't be a surprise.
Now that you have seen how you can iterate over the items of a list, let's try to put this to use:

1. Print the 3 times table, in this form:

       3 times 1 is 3
       3 times 2 is 6
       3 times 3 is 9 ...

   Use the `.format()` command, as demonstrated above.
2. Find the entry in the list `[0, 3, 9, 12, 27, 35, 42, 50, 66]` that is closest to `19`. Note: don't worry about short code, or efficiency. Just find the answer. In the real example the list was thousands of entries long, and the task was to find the closest time within $\pm$ 5 minutes. Then you need to worry about efficiency.

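For the closest-value task, one straightforward (and deliberately not optimised) approach is a linear scan that remembers the best candidate so far. The sketch below uses made-up `readings` and a made-up `target`, not the list from the exercise.

```python
readings = [2, 8, 15, 31, 44]   # made-up values
target = 20                     # made-up target

closest = readings[0]
for value in readings:
    # Keep the value whose distance to the target is the smallest so far.
    if abs(value - target) < abs(closest - target):
        closest = value

print('The entry closest to {} is {}'.format(target, closest))
```

The built-in `min(readings, key=lambda v: abs(v - target))` gives the same result in one line, at the cost of being less explicit.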
Advanced tip: sometimes you want to iterate through a list, but also know which entry you are iterating on. You can do both simultaneously with the `enumerate` function.
names = ['Leonardo', 'Carl', 'Amiah', 'Yaretzi', 'Destiny', 'Alan']
for index, name in enumerate(names):
    print('{} is number {} in the list'.format(name, index+1))
What `enumerate` does is create a `tuple` with 2 entries on each pass through the loop. These two entries are dynamically assigned: the first one is an `integer` assigned to `index` and the second one is assigned to `name` in this example. You are free to choose both variable names.
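To see those tuples directly, you can wrap `enumerate` in `list(...)`. This is a small sketch on a shortened copy of the list of names; `pairs` and `first_pair` are names made up for this illustration.

```python
names = ['Leonardo', 'Carl', 'Amiah']

pairs = list(enumerate(names))
print(pairs)

# Unpacking one tuple by hand: this is what the for-loop does on each pass.
first_pair = pairs[0]
index, name = first_pair
print(type(first_pair), index, name)
```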
Further self-development:

- Rewrite the loop over the bases in `sequence` above using the `enumerate` function, eliminating the manual `number` tracking.
- Look up the `reversed` built-in function, which can be used inside `enumerate` to run your for-loop in reverse.

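A possible starting point for both items is sketched below on shortened, made-up inputs. It relies on the optional `start` argument of `enumerate`, and on wrapping the list in `reversed(...)` before enumerating it.

```python
sequence = "TAGG"   # shortened, made-up sequence
lines = []
# start=1 makes the counter begin at 1 instead of 0.
for number, base in enumerate(sequence, start=1):
    lines.append('Base number {} is {}'.format(number, base))
print(lines)

names = ['Leonardo', 'Carl', 'Amiah']
# The counter still runs upwards; only the order of the names is reversed.
reverse_pairs = list(enumerate(reversed(names)))
print(reverse_pairs)
```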
Comments are often as important as the code itself. But it takes time to write them.
The choice of variable names is related to the topic of comments. In many ways, the syntax of Python makes the code self-documenting, meaning you do not need to add comments at all. But it definitely is assisted by choosing meaningful variable names:
for genome in genome_list:
    <do something with genome>
This quite clearly shows that we are iterating over all the genomes in some iterable container of sequenced genomes (it could be a list, tuple, or set, for example).
But here the code structure is identical:
>```python
>for k in seq:
>    <do something with k>
>```

Later on in the code it might not be clear what `k` represents. It is also not clear what `seq` is, or contains.
Comments should be added in these places and cases:
To cover during the interactive session:
# %% text here (in Spyder)