All content here is under a Creative Commons Attribution (CC-BY 4.0) license, and all source code is released under a BSD-2-Clause license. Parts of these materials were inspired by https://github.com/engineersCode/EngComp/ (CC-BY 4.0), L.A. Barba, N.C. Clementi.
Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
In the prior module you became comfortable doing calculations with vectors, matrices and arrays.
We now take a detour to look at *Python functions*.
The calculations you wrote code for in prior modules can eventually be generalized, so that the same code can be reused in the future with other input data. Functions make this possible.
So this module introduces new ideas, but also reuses the content from the prior modules.
Start a new version-controlled repository for your work, or use an existing one. Commit your work regularly, wherever you see this icon.
You should have:
A standard feature in any programming language is the ability to write functions: a chunk of code that takes 0 or more inputs, and returns 0 or more outputs. Inputs are also sometimes referred to as function *arguments*.
Functions help you:
Please consider using this standard template for your functions:
def verb_description(...):
    """Comment of what the function does goes here."""
    # Commands for the function
    a = ...
    return ...
Start the function name with a verb, for example plot_curves(...), transform_data(...) or save_file(...). Starting with a verb is not a fixed requirement, and sometimes a clear function name does not need a verb. In a math library, the log(...) function clearly calculates the log of its input, so there is no need to call it calculate_log(...).
Describe what the function does in a docstring on the first line of the function, for example: """ Calculates the mean of ... """.
A final return True is a good signal to the user that the function did its work successfully.
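As a small illustration of these conventions, here is a sketch of a complete function that follows the template. It has a verb in the name, a docstring as its first line, and a single returned value. The name calculate_mean is our own choice for illustration; in practice np.mean(...) already does this.

```python
def calculate_mean(values):
    """Calculates the mean of a non-empty sequence of numbers."""
    total = sum(values)
    return total / len(values)

average = calculate_mean([2, 4, 9])
print(average)                  # 5.0
print(calculate_mean.__doc__)   # the docstring is available to the user
```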
Strange as it seems, some functions have no output. They are called for their side-effects. You have seen several of them already:
print(...)
a = print(...)  # of no real use: `a` will simply be None
annual_income = [40214, 66141, 8313, 97132, 8030124, 39120]
annual_income.sort()  # no output: the list is sorted in place
q = annual_income.sort()  # again of no real use: `q` will be None
These are 2 examples where the function is called for what it does, not for an output to assign to a variable on the left-hand side. Can you remember any others you have used that are like this?
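To see that such a call produces nothing useful on the left-hand side, you can print what comes back. A Python function without an explicit output returns None. The comparison with the built-in sorted(...) is our addition. It contrasts a side-effect function with one that returns its result.

```python
annual_income = [40214, 66141, 8313, 97132, 8030124, 39120]
sort_output = annual_income.sort()   # sorts the list in place, as a side effect
print(sort_output)                   # None
print(annual_income)                 # the list itself is now sorted

# The built-in sorted() works the other way round: it returns a new,
# sorted list and leaves its input untouched.
original = [3, 1, 2]
new_list = sorted(original)
print(original, new_list)            # [3, 1, 2] [1, 2, 3]
```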
Other functions take no inputs at all. This might also seem strange, but such functions are useful. They often return the state of your device, the time, or something about the object they are attached to.
import os
os.getcwd()  # cwd = current working directory
import sys
sys.getwindowsversion()  # only available on Windows
import datetime
datetime.datetime.now()
import time
cpu_time_start = time.process_time()
# do some heavy calculations
delta_time_used = time.process_time() - cpu_time_start
# also: exit() takes no input; it ends the Python session
exit()
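You can write such functions yourself. The sketch below uses a hypothetical function, get_timestamp(), that takes no inputs and returns the current date and time as text. It relies only on the datetime module shown above. The function name and the output format are our own choices.

```python
import datetime

def get_timestamp():
    """Returns the current date and time as text, e.g. '2024-01-31 14:05:09'."""
    now = datetime.datetime.now()
    return now.strftime('%Y-%m-%d %H:%M:%S')

stamp = get_timestamp()   # no inputs, but a useful output
print(stamp)
```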
Many functions in Python take a single input and return a single output. Think, for example, about a list:
numbers = [1, 2, 3, 3, 3, 3, 2, 1]
numbers.count(3)
The .count(...) function takes exactly 1 input and returns exactly 1 output. Try calling it with a different number of inputs, for example numbers.count() or numbers.count(3, 2), and see what happens.
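If you want to check your answer to the exercise, the sketch below shows the normal call. It also catches the TypeError that Python raises when .count(...) gets the wrong number of inputs. Catching and printing the errors is only there so that both cases can be seen in one run, instead of the program stopping at the first one.

```python
numbers = [1, 2, 3, 3, 3, 3, 2, 1]
n_threes = numbers.count(3)
print(n_threes)   # 4: the value 3 appears four times

try:
    numbers.count()        # 0 inputs
except TypeError as err:
    print('TypeError:', err)

try:
    numbers.count(3, 2)    # 2 inputs
except TypeError as err:
    print('TypeError:', err)
```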
Now it is your turn.
Write a function which accepts the value $t$ as an input, and returns the population size $p(t)$ as output.
If the input $t$ is not a number, your function should return None. To check this, you should use isinstance(t, (float, int)). Don't just copy/paste that: what does isinstance do? Use help(isinstance) to understand.
Alternatively, return NaN (not a number). You can obtain it either from the NumPy library as np.nan, or from the built-in float('nan').
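The population model for $p(t)$ is not repeated here. The sketch below therefore uses a stand-in calculation, squaring the input, which is purely hypothetical. It shows only the input-checking pattern, returning NaN for non-numeric input. Note that NaN is not equal to anything, not even to itself, so you should test for it with np.isnan(...) rather than with ==.

```python
import numpy as np

def square_if_number(t):
    """Returns t squared, or NaN if `t` is not a float or int.

    Hypothetical stand-in: replace the calculation with your p(t) model.
    """
    if not isinstance(t, (float, int)):
        return np.nan   # float('nan') would work equally well
    return t ** 2

print(square_if_number(3))      # 9
print(square_if_number('3'))    # nan

# NaN is not equal to itself, so test for it with np.isnan
result_for_text = square_if_number('3')
print(result_for_text == result_for_text)   # False
print(np.isnan(result_for_text))            # True
```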
Remember you can make your function provide:
no output: return, or more explicitly return None
a single output: return answer
multiple outputs: return (value_one, object_two, object_three)
In that last version we use a tuple to create a single grouped output. Recall from module 1 that you can create multiple variables in one line of code: a, b, c = (1, 2, 3).
In the same way you can make your function return multiple outputs and assign them. This code shows how the function is created and then used:
def calculate_summary_statistics(vector):
    """Calculates the mean, median, standard deviation and MAD."""
    # code goes here
    return (mean, median, stddev, mad)
x = ... # a NumPy vector
x_avg, x_median, x_std, x_mad = calculate_summary_statistics(x)
The tuple output from the function on the right-hand side is split across the 4 variables on the left-hand side.
Now it is your turn.
Complete the above code so that it will accept a NumPy vector and then return these 4 outputs: the mean, the median, the standard deviation, and the median absolute deviation (MAD).
You might need this definition of MAD, the median of the absolute deviations from the median: $$\text{MAD}(x) = \text{median}\big(\lvert x - \text{median}(x)\rvert\big)$$
First calculate the median, then the deviations from the median, then the absolute value, then the median of that.
Test it on this vector to understand the usefulness of the median and MAD:
x = [6, 9, 5, 6, 3, 8, 5, 72, 9, 6, 6, 7, 8, 0]
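The single outlier, 72, inflates the standard deviation to about 17 (using NumPy's default np.std). That is roughly 7 times the value of about 2.4 you get without the outlier. The median (6) and the MAD (1.5) barely change: without the outlier they are 6 and 1.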
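One possible completion of the template is sketched below, using np.mean, np.median and np.std. It assumes the standard deviation uses NumPy's default, the population form (ddof=0). Pass ddof=1 to np.std if you want the sample standard deviation instead.

```python
import numpy as np

def calculate_summary_statistics(vector):
    """Calculates the mean, median, standard deviation and MAD."""
    x = np.asarray(vector, dtype=float)
    mean = np.mean(x)
    median = np.median(x)
    stddev = np.std(x)   # NumPy default: population standard deviation (ddof=0)
    # MAD: median of the absolute deviations from the median
    mad = np.median(np.abs(x - median))
    return (mean, median, stddev, mad)

x = [6, 9, 5, 6, 3, 8, 5, 72, 9, 6, 6, 7, 8, 0]
x_avg, x_median, x_std, x_mad = calculate_summary_statistics(x)
print(x_avg, x_median, x_std, x_mad)   # approx. 10.71, 6.0, 17.15, 1.5
```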
For now, rather continue below. If you have time, go back to the weather data from the previous module: load the data from the Dutch meteorological service (KNMI), and use it as input for the function you wrote above.
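Functions can also take more than one input. You have just used one of these above: isinstance(...) takes 2 inputs, the object to check and the type (or tuple of types) to check against.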
t = 45
isinstance(t, (float, int))
t = '45'
isinstance(t, (float, int))
isinstance(t, (float, int, str))
isinstance(t)  # will raise an error: isinstance needs 2 inputs
Now it is your turn to create a function with more than one input.
In module 3 you saw code, similar to what is below, that reads a text file:
import os
base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname'
filename = 'myfile.txt'
full_filename = os.path.join(base_folder_windows, filename)
N_lines = 15
with open(full_filename, "r") as f:
    lines = []
    for i in range(N_lines):
        line = f.readline()
        lines.append(line)
# The file is then closed at this point.
# Show the file preview
print(lines)
Do several things with this:
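Turn the code into a function:
def preview_textfile(filename, N):
    # complete this part to return a preview in a list with `N` entries
Make the function return the list lines.
Call your function and print its output: print(preview_textfile(full_filename, 15))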
Python allows you to give function inputs default values, which makes those inputs optional. If the user does not specify a value, the default is used. The user can of course always override a default by specifying the value.
With a small change, you can modify your function above:
def preview_textfile(filename, N=10):
    """Returns the first `N` (int) lines of `filename`. By default, the first 10 lines are returned."""
and ensure that the following 3 ways of calling the function work as expected:
print(preview_textfile(full_filename))
print(preview_textfile(full_filename, 15))
print(preview_textfile(full_filename, N=5))
What do you need to change in your function to guard against user error?
print(preview_textfile(full_filename, N=5.0))
print(preview_textfile(full_filename, N=5.5))
print(preview_textfile(full_filename, '15'))
print(preview_textfile(full_filename, '5.5'))
As you can/should see, with a simple tweak, you can make your function far more tolerant of user input, and therefore more widely applicable.
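One possible way to make the function tolerant of these inputs is sketched below. The conversion policy int(float(N)) is our assumption: it truncates 5.5 to 5, and rounding would be an equally valid choice. Stopping early when the file has fewer than N lines is also our choice. The temporary file only serves to make the example self-contained.

```python
import os
import tempfile

def preview_textfile(filename, N=10):
    """Returns the first `N` (int) lines of `filename`. By default, the first 10 lines are returned."""
    # Guard against user error: convert inputs such as 5.0, '15' or '5.5'
    # to an integer. int(float(...)) truncates, so 5.5 becomes 5.
    N = int(float(N))
    lines = []
    with open(filename, "r") as f:
        for _ in range(N):
            line = f.readline()
            if line == '':   # end of file reached before N lines were read
                break
            lines.append(line)
    return lines

# Usage: write a small temporary file with 20 lines, then preview it
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(20):
        tmp.write('line {}\n'.format(i + 1))
demo_filename = tmp.name

first_10 = preview_textfile(demo_filename)
first_5 = preview_textfile(demo_filename, N='5.5')
first_15 = preview_textfile(demo_filename, 15.0)
print(first_5)
os.remove(demo_filename)   # tidy up the temporary file
```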
It is time to bring all the above together.
Start a new file in your version control repository.
*Hint*: read the entire problem first.
In module 3 you had a challenge problem related to the cooling of an object in a fridge. The temperature of the object, $T$, changing over time $t$, can be modeled as: $$ \dfrac{dT}{dt} = -k (T-F)$$
The fridge has a constant temperature, $F = 5$ °C, and for this system the value of $k = 0.08$. For a short change in time, $\delta t = 0.5$ minutes, the equation can be rewritten as: $$\dfrac{\Delta T}{\delta t} = -k (T - F)$$ Writing the temperature change over one time step as $\Delta T = T_{i+1} - T_i$ and rearranging gives: $$T_{i+1} = T_i - k\,(\delta t)\,(T_i - F)$$
which shows how the temperature at time point $i+1$ (one step in the future) is related to the temperature now, at time $i$. The object starts off with a temperature of 25 °C.
The challenge is to create a short function that the user can call. It returns 2 outputs: the time of the simulation, and the temperature of the object. All of these calls should work:
time, temp = simulate_cooling(time_final=30)
time, temp = simulate_cooling(time_final=30, initial_temp=25)
time, temp = simulate_cooling(time_final=30, initial_temp=25, delta_t=2.5)
time, temp = simulate_cooling(time_final=30, initial_temp=25, delta_t=2.5, show_plot=False)
*Hint*: to save yourself some time, you can get code that already solves this problem: https://github.com/kgdunn/python-basic-notebooks. Clone that repository, and look in the code subdirectory for the file called fridge.py. Modify that file to answer the above 4 questions.
Notice that the equation gives $T_i = T_{i=0}$ when $t = 0$ (at the start of the simulation), and that as $t$ becomes large, the object temperature tends to the fridge temperature, $F$.
3. Add a 4th output, the simulation error. As you can see, this is quickly getting "ugly": all the outputs are vectors of the same length. Why not combine them into a matrix with 4 columns? In the next module we will learn how.
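For comparison with your own solution, here is a minimal sketch of simulate_cooling. It is not the fridge.py file from the repository, which is not reproduced here. It hard-codes $F = 5$ °C and $k = 0.08$ from the problem statement, and assumes the defaults initial_temp=25, delta_t=0.5 and show_plot=True. It also compares the simulation with the exact solution of the differential equation, $T(t) = F + (T_0 - F)\,e^{-kt}$, where $T_0$ is the initial temperature.

```python
import numpy as np

def simulate_cooling(time_final, initial_temp=25, delta_t=0.5, show_plot=True):
    """Simulates the temperature of an object cooling in a fridge.

    Uses T[i+1] = T[i] - k * delta_t * (T[i] - F), with F = 5 degC and k = 0.08.
    Returns the time points and the simulated temperatures as NumPy vectors.
    """
    F = 5.0     # fridge temperature [degC]
    k = 0.08    # cooling constant, per minute
    n_steps = int(round(time_final / delta_t))
    time = np.linspace(0, n_steps * delta_t, n_steps + 1)
    temp = np.zeros(n_steps + 1)
    temp[0] = initial_temp
    for i in range(n_steps):
        temp[i + 1] = temp[i] - k * delta_t * (temp[i] - F)

    if show_plot:
        import matplotlib.pyplot as plt   # only needed when plotting
        plt.plot(time, temp)
        plt.xlabel('Time [minutes]')
        plt.ylabel('Temperature [°C]')
        plt.show()
    return time, temp

time, temp = simulate_cooling(time_final=30, show_plot=False)
# Compare against the exact solution T(t) = F + (T_0 - F) exp(-k t)
exact = 5.0 + (25 - 5.0) * np.exp(-0.08 * time)
print(temp[-1], exact[-1])
```

A smaller delta_t brings the simulated values closer to the exact solution, at the cost of more steps.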
Once a function has 2 or more inputs, the question arises whether the order of the inputs is important.
For example, if you have a function *signature* of:
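import numpy as np
def random_normal_values(size, mean=0.0, stddev=1.0):
    """ Returns a vector of length `size` with randomly
    distributed values. The values will come from a normal
    distribution with `mean` and standard deviation of `stddev`."""
    # One possible body (an assumption): NumPy's normal random generator
    return np.random.normal(loc=mean, scale=stddev, size=size)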
you can call it in several ways to get 3 values from the normal distribution $\mathcal{N}\left(\mu=6, \sigma=9 \right)$:
random_normal_values(3, 6, 9)
random_normal_values(3, 6, stddev=9)
random_normal_values(3, mean=6, stddev=9)
random_normal_values(size=3, mean=6, stddev=9)
# Yes, you can do this also!
random_normal_values(stddev=9, mean=6, size=3)
random_normal_values(mean=6, stddev=9, size=3)
random_normal_values(size=3, stddev=9, mean=6)
random_normal_values(stddev=9, size=3, mean=6)
random_normal_values(mean=6, size=3, stddev=9)
You can also use the default arguments, and specify only what you need to be different from the defaults:
random_normal_values(3, mean=6)
random_normal_values(3, stddev=9)
random_normal_values(3)
random_normal_values(size=3)
# But these will NOT work. Why?
random_normal_values(mean=6, stddev=9)
random_normal_values(mean=6)
random_normal_values(stddev=3)
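A quick way to convince yourself that the positional and named calls mean the same thing is to fix NumPy's random seed before each call, so both draw the same "random" numbers. The function body below is an assumption, using np.random.normal. The sketch also shows the error you get when the required size input is left out.

```python
import numpy as np

def random_normal_values(size, mean=0.0, stddev=1.0):
    """Returns a vector of length `size` with normally distributed values.

    Assumed body: uses NumPy's np.random.normal generator.
    """
    return np.random.normal(loc=mean, scale=stddev, size=size)

# Fix the seed before each call, so both calls draw the same numbers
np.random.seed(42)
by_position = random_normal_values(3, 6, 9)
np.random.seed(42)
by_name = random_normal_values(stddev=9, size=3, mean=6)
print(np.allclose(by_position, by_name))   # True: both calls mean the same

# Leaving out the required `size` input does not work
try:
    random_normal_values(mean=6, stddev=9)
except TypeError as err:
    print('TypeError:', err)   # the message says `size` is missing
```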
This ability to call functions with default inputs, and inputs in different order provides tremendous flexibility. You don't need to remember the order of the arguments for a function. Specify them by name, and place them in a (logical!) order. In fact, always specifying the argument names when using a function is more explicit, and makes for clearer code.
Reading random_normal_values(3, 6, 9) a few months after you wrote the code will almost certainly force you to go back to the original function to see what positions 1, 2 and 3 of the inputs were. But random_normal_values(size=3, mean=6, stddev=9) is immediately clear, saving you (and others that use your code) substantial time. A little extra typing now pays off in the future.
Even with named arguments, some orders are more logical than others. Take the NumPy np.arange(start, stop, step) function as an example:
import numpy as np
np.arange(start=3, stop=10, step=2)  # [3, 5, 7, 9]
np.arange(step=2, stop=10, start=3)  # [3, 5, 7, 9]
The second version has the same output, but it is less readable and less intuitive. Therefore, when using named input arguments, pick a logical order that aids (human) readability.
The location in computer memory of a variable in a function is different from the location of variables outside that function.
def funcA(x):
    x = x + 2
    print('In the function the value of "x" is {}'.format(x))
    return x
x = 12
out = funcA(x)
print('Outside and after the function "x" is still {}, while "out" = {}'.format(x, out))
will print
In the function the value of "x" is 14
Outside and after the function "x" is still 12, while "out" = 14
The memory location of the variable x in the main part of the program is different from the memory location of x inside the function, even though they have the same name. The identical name is coincidental. You can rename the x inside the function to y (that is, def funcA(y): with y = y + 2, and so on), and it will run exactly the same.
We say the *scope* of the variable is different inside and outside the function. The variable x inside the function is the input, which is then reassigned on the left-hand side of x = x + 2, and it exists only for the duration of the function call. We say that it has *local scope*. After the function is finished, that variable is freed up and deleted.
Scoping can be a tough concept to understand at first, and it is also more complex than described here. We mention it now so that it does not surprise you later. You can read more about scoping in that notebook.
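A variable created inside a function is not visible outside it once the function has finished. The sketch below uses a hypothetical function, add_two, to show this. Trying to use the local variable afterwards raises a NameError, which is caught here so that the message is printed instead of the program stopping.

```python
def add_two(x):
    local_total = x + 2   # local scope: exists only while add_two() runs
    return local_total

out = add_two(12)
print(out)   # 14

try:
    print(local_total)   # the local variable is gone after the function finished
except NameError as err:
    print('NameError:', err)
```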
No introduction to functions is complete without applying them to Newton's method. We already saw Newton's law of cooling in the fridge simulation above, but Newton also spent time thinking about mathematical *functions*, and where they cross the $x$-axis.
The idea is that if you have a mathematical function, for example $$f(x) = x^3 - 3x + 5$$ then the value of $x$ that makes $f(x) = 0$ can have a special meaning. This value is called a *zero* of the function. You can cheat and plot the function, but this only gives you an idea of where the zero is, not its exact value.
Newton's method shows that if you start with a reasonable estimate of where the zero is, $x_{i=0}$ (the value of $x$ for the iteration when $i=0$), then you can find that zero by successive repetition: $$x_{i+1} = x_i - \dfrac{f(x_i)}{f'(x_i)}$$
Start with $x_i$ on the right hand side, and update your estimate to $x_{i+1}$. Then repeat; every time you repeat, the value of $i$ increases, starting from zero: $i = 0, 1, 2, \ldots$.
You only need to know the original function value, and the derivative of the function, shown as $f'(x)$. You also need to know when to stop.
Complete this template code, which stops as soon as either one of 2 conditions is met. What are they?
def f_poly(x):
    # Complete this: returns f(x) = x^3 - 3x + 5
    pass
def f_derivative_poly(x):
    # Complete this: returns f'(x) = 3x^2 - 3
    pass
# Initial guess
x = ...
iterations = 0
max_error = 1.0E-8
max_iterations = 20
relative_error = 10000
while (relative_error > max_error) and (iterations <= max_iterations):
    iterations += 1
    x_prior = x
    x = x - f_poly(x) / f_derivative_poly(x)
    relative_error = abs((x - x_prior) / x)
    # Add a print statement here to track the code's progress in the loop
print('The zero of f(x) was found to be {}'.format(x))
Note that the x in those functions is different from the x outside.
Here's an interesting feature: functions in Python are also objects. (Remember, we said *everything* in Python is an object.) As such, a function can also be an input argument, since all arguments must be objects.
Do the following:
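Turn the code above into a function called newton_zero(...), and give it these inputs: the initial guess, the function $f(x)$, and its derivative $f'(x)$. Call your function several times. Use this function signature:
zero = newton_zero(guess=-3, f_x=f_poly, df_dx=f_derivative_poly)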
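Here is a minimal sketch of newton_zero(...). It assumes the signature given above plus two optional inputs, max_error and max_iterations, with defaults matching the template. The loop here allows at most max_iterations updates. The sketch also shows one possible completion of f_poly and f_derivative_poly. Because f_x and df_dx are ordinary inputs, you could pass any other pair of a function and its derivative instead.

```python
def f_poly(x):
    """Returns f(x) = x^3 - 3x + 5."""
    return x**3 - 3*x + 5

def f_derivative_poly(x):
    """Returns f'(x) = 3x^2 - 3."""
    return 3*x**2 - 3

def newton_zero(guess, f_x, df_dx, max_error=1.0E-8, max_iterations=20):
    """Returns an estimate of a zero of `f_x`, using Newton's method.

    `f_x` and `df_dx` are functions: the function itself and its derivative.
    """
    x = guess
    iterations = 0
    relative_error = 10000
    while (relative_error > max_error) and (iterations < max_iterations):
        iterations += 1
        x_prior = x
        x = x - f_x(x) / df_dx(x)
        relative_error = abs((x - x_prior) / x)
    return x

zero = newton_zero(guess=-3, f_x=f_poly, df_dx=f_derivative_poly)
print(zero, f_poly(zero))   # f_poly(zero) should be very close to 0
```

Starting from guess=1 or guess=-1 would fail with a ZeroDivisionError, because $f'(x) = 3x^2 - 3$ is zero there. A more robust version would check for that.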
In the prior module you wrote a short piece of code for linear regression. Linear regression (least squares) is a tool you might frequently use in your work.
The model is: $$y = b_0 + b_1 x + e$$ NumPy has built-in ways to fit this model, but they only do the basics. We usually want more information:
The function should accept the inputs x and y, and check that x and y are the same length, to make sure you have the right data.
Complete this template, turning it into a function, and completing all the requirements:
import numpy as np
x = np.array(...)
y = np.array(...)
# Check that lengths are the same.
if not(...):
    # Return early if lengths do not match
    return (np.nan, np.nan, None, np.nan)
X = np.vstack([np.ones(len(x)), x]).T
# b is defined as (X^T X)^{-1} X^T y
b_vector = ...
# residuals = y - y_hat
residuals = ...
# Standard error:
se = ...
return (...)
Test your code:
x = [1, 2, 3, 4]
y = [1, 0, 2, 5, 7]
you should get an output of (nan, nan, None, nan)
x = [0.019847603, 0.039695205, 0.059542808, 0.07939041, 0.099238013, 0.119085616, 0.138933218]
y = [0.2, 0.195089996, 0.284090012, 0.37808001, 0.46638, 0.561559975, 0.652559996]
you should get an intercept $b_0$ of 0.06641, a slope $b_1$ of 4.08993 and a standard error of 0.03206, with residuals ranging from -0.0336679 to 0.052417.
This last step hints at what is called test-driven development (TDD). In TDD you first write tests to check your function, then you code the function, and you stop when all tests pass. In the example above, the expected values come from another software package that is known to be well tested. In the Advanced classes we will return to TDD.
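One possible completion is sketched below as a function with the hypothetical name linear_regression(x, y), since the template does not name it. It assumes the outputs are (b0, b1, residuals, standard error), in the order the early-return line suggests. It takes the standard error as $\sqrt{\sum e_i^2 / (n-2)}$. With these assumptions it reproduces the test values above. Solving the normal equations with np.linalg.solve gives the same $b$ as $(X^T X)^{-1} X^T y$, without forming the inverse explicitly.

```python
import numpy as np

def linear_regression(x, y):
    """Fits y = b0 + b1*x + e by least squares.

    Returns (b0, b1, residuals, standard_error), or
    (nan, nan, None, nan) if `x` and `y` differ in length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Check that lengths are the same.
    if not (len(x) == len(y)):
        # Return early if lengths do not match
        return (np.nan, np.nan, None, np.nan)

    X = np.vstack([np.ones(len(x)), x]).T
    # b = (X^T X)^{-1} X^T y, found by solving (X^T X) b = X^T y
    b_vector = np.linalg.solve(X.T @ X, X.T @ y)
    # residuals = y - y_hat
    residuals = y - X @ b_vector
    # Standard error of the residuals, with n - 2 degrees of freedom
    se = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
    return (b_vector[0], b_vector[1], residuals, se)

x_data = [0.019847603, 0.039695205, 0.059542808, 0.07939041, 0.099238013, 0.119085616, 0.138933218]
y_data = [0.2, 0.195089996, 0.284090012, 0.37808001, 0.46638, 0.561559975, 0.652559996]
b0, b1, residuals, se = linear_regression(x_data, y_data)
print(b0, b1, se)
print(linear_regression([1, 2, 3, 4], [1, 0, 2, 5, 7]))   # (nan, nan, None, nan)
```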
Wrap up this section by committing all your work. Have you used a good commit message? Push your work, to refer to later, but also as a backup.
*Feedback and comments about this worksheet?* Please provide any anonymous comments, feedback and tips.