Notebook

- 0.1 Preparing for this module
1 Functions: a template
- 1.1 A template
2 Functions: no output
3 Functions: no inputs
4 Functions: one input, one output
5 Functions: one input, two outputs
- 5.1 Extend yourself (later?)
6 Functions: two or more inputs, one output
7 Functions: optional (default) inputs
8 Functions: multiple inputs and outputs
9 ➜ Challenge yourself 1: simulated cooling
- 9.1 Extend yourself (later?)
10 Functions: positional arguments
11 Functions: scope (variable values in a function)
12 ➜ Challenge yourself 2: zero of a function
13 ➜ Challenge yourself 3: linear regression

All content here is under a Creative Commons Attribution CC-BY 4.0 and all source code is released under a BSD-2 clause license. Parts of these materials were inspired by https://github.com/engineersCode/EngComp/ (CC-BY 4.0), L.A. Barba, N.C. Clementi.

Please reuse, remix, revise, and reshare this content in any way, keeping this notice.

Module 6: Overview¶

In the prior module you became comfortable doing calculations with vectors, matrices and arrays.

We now take a detour to look at *Python functions*.

Eventually the calculations you wrote code for in prior modules can be generalized. This code can be reused in the future, with other input data. Functions make this all possible.

So this module introduces new ideas, but also reuses the content from the prior modules.

Start a new (or use an existing) version controlled repository for your work. Commit your work regularly, wherever you see this icon.

Preparing for this module¶

You should have:

Completed worksheet 5
Read sections 12.1 to 12.5 from Foundations of Python Programming (FOPP)

Functions: a template¶

A standard feature in any programming language is the ability to write functions: a chunk of code that takes 0 or more inputs, and returns 0 or more outputs. Inputs are also sometimes referred to as function *arguments*.

Functions help you:

split up your code,
make it modular,
allow you to reuse these modular blocks in the future, in other code;
help you with debugging: you can then isolate bugs in your code by elimination. Functions that have been tested and checked are not likely to be the source of a problem any more.

A template¶

Please consider using this standard template for your functions:

def verb_description(...):
   """Comment of what the function does goes here."""

   # Commands for the function
   a = ...

   return ...

Making the function name start with a verb makes it clear to you - and others - what it does. E.g. plot_curves(...) or transform_data(...) or save_file(...), etc. Starting with a verb is not a fixed requirement, and sometimes a clear function name need not have a verb: in a math library, the log(...) function clearly calculates a log of the input; no need for calculate_log(...).
But a verb does prevent you from making the function do more than it should. It just does that verb.
Start the code with a triple-quote comment: """ Calculates the mean of ... """
Using the 4-space indent on the left, write the various statements and loops to do the work in the function.
It is OK if a function is just a single line! Why? In the future you might expand it a bit, e.g. to check the inputs, handle missing values, add extra inputs to the function. Don't think: "It's just one line; why bother?". If you suspect you will reuse the idea of that 1 line in several places, rather put it in a function.
Return something from your function. It is not mandatory, but it is clearer. Even return True is a good signal to the user, to indicate the function did its work successfully.
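
As a minimal sketch of the template in use, consider the hypothetical function below. The name `calculate_mean` and its empty-input check are illustrative assumptions, not part of the course material. It has a verb-based name, a docstring, a 4-space indented body and an explicit return:

```python
def calculate_mean(values):
    """Calculates the arithmetic mean of the numbers in `values`."""

    # Guard against an empty input, which has no mean
    if len(values) == 0:
        return None

    total = sum(values)
    return total / len(values)


print(calculate_mean([2, 4, 9]))   # 5.0
help(calculate_mean)               # shows the docstring
```

The docstring is what help(...) displays, which is one reason to start every function with one.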

In [ ]:

Functions: no output¶

Strange as it seems, some functions have no output. They are called for their side-effects. You have seen several of them already:

print(...)  
a = print(...)  # there is no value in assigning the output: a is None


annual_income = [40214, 66141, 8313, 97132, 8030124, 39120]
annual_income.sort()  # no output
q = annual_income.sort()  # again, no value in doing this: q is None

These are 2 examples where the function is called for what it does, not for any output it returns to the left-hand side of the assignment. Can you remember any others you have used which are like this?
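
A short check, using the same list as above, confirms what these calls return. The variable names `result` and `incomes_sorted` are only for illustration:

```python
annual_income = [40214, 66141, 8313, 97132, 8030124, 39120]

result = print("hello")        # prints hello; nothing useful is returned
print(result)                  # None

q = annual_income.sort()       # sorts the list in place
print(q)                       # None
print(annual_income[0])        # 8313: the list itself was changed

# By contrast, the built-in sorted() returns a new list as its output
incomes_sorted = sorted(annual_income, reverse=True)
print(incomes_sorted[0])       # 8030124
```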

In [ ]:

Functions: no inputs¶

This might also seem strange, but such functions are useful. They often return the state of your device, the time, or something about the object they are attached to.

import os
os.getcwd()  # cwd = current working directory

import sys
sys.getwindowsversion()  # only available on Windows

import datetime
datetime.datetime.now()

import time
cpu_time_start = time.process_time()
# do some heavy calculations
delta_time_used = time.process_time() - cpu_time_start

# also
exit()

In [ ]:

Functions: one input, one output¶

Many functions in Python are of this type. Think for example about a list:

numbers = [1, 2, 3, 3, 3, 3, 2, 1]
numbers.count(3)

The .count(...) function takes only 1 input and returns only 1 output. Try numbers.count() or numbers.count(3, 2), for example: both raise a TypeError, because the number of inputs is wrong.

Now it is your turn.

In module 1 we considered the problem where the population of a country could be approximated by the formula $$p(t) = \dfrac{197273000}{1+e^{-0.03134(t - 1913)}}$$ where $t$ was the time, measured in years.

Write a function which accepts that value $t$ as an input, and returns the population size $p(t)$ as output.

That was (or should have been) a short one-line function. Now expand your function to check that $t$ is a numeric value. If $t$ is numeric, then return $p(t)$, else return None. You should use: isinstance(t, (float, int)).

Don't just copy/paste that: what does isinstance do? Use help(isinstance) to understand.

Finally expand the function once more, to ensure that $t$ is larger than 1913. If not, return NaN (not a number), which you can obtain either from the NumPy library np.nan, or use the built-in NaN: float('nan').
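
Before writing the checks, it may help to see how the two building blocks behave on their own. This is only a sketch of isinstance and NaN, not a solution to the exercise; the variable name `not_a_number` is illustrative:

```python
import math

print(isinstance(1950, (float, int)))     # True
print(isinstance(1950.5, (float, int)))   # True
print(isinstance('1950', (float, int)))   # False: a string is not numeric
print(isinstance(True, (float, int)))     # True: bool is a subclass of int

not_a_number = float('nan')
print(not_a_number == not_a_number)       # False: NaN is never equal to itself
print(math.isnan(not_a_number))           # True: the reliable way to test for NaN
```

Because NaN does not compare equal to anything, test the output of your function with math.isnan(...) or np.isnan(...), not with ==.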

In [ ]:

Functions: one input, two outputs¶

Remember you can make your function provide:

no output: return or more explicitly return None
1 output: return answer
more than 1 output: return (value_one, object_two, object_three)

In that last version we use a tuple to create a single grouped output. Recall from module 1 where we saw you can create multiple variables in one line of code: a, b, c = (1, 2, 3).

In the same way you can make your function return multiple outputs and assign them. This code shows how the function is created and then used:

def calculate_summary_statistics(vector):
    """Calculates the mean, median, standard deviation and MAD."""
    # code goes here

    return (mean, median, stddev, mad)

x = ... # a NumPy vector
x_avg, x_median, x_std, x_mad = calculate_summary_statistics(x)

The tuple output from the function on the right-hand side is split across the 4 variables on the left-hand side.
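
The same pattern on a smaller scale, as a sketch with a hypothetical function `find_extremes` (not part of the exercise), shows both ways of receiving a tuple output:

```python
def find_extremes(values):
    """Returns the smallest and largest entries of `values`."""
    return (min(values), max(values))


# Unpack the tuple directly into two variables
lowest, highest = find_extremes([6, 9, 5, 72, 0])
print(lowest, highest)         # 0 72

# Or keep the tuple intact and index into it
both = find_extremes([6, 9, 5, 72, 0])
print(both[1])                 # 72
```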

Now it is your turn.

Complete the above code so that it will accept a NumPy vector and then return these 4 outputs, in the same order as the return statement above:

the mean
the median [a robust mean]
the standard deviation
the median absolute deviation (MAD) [a robust standard deviation]

You might need this definition for MAD: it is the median of the absolute deviations from the median: $$ \text{median} \big( | x - \text{median}(x) | \big)$$

First calculate the median, then the deviations from the median, then the absolute value, then the median of that.
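
Those four steps can be followed by hand on a tiny vector. The sketch below uses made-up data with one obvious outlier, and the variable names are illustrative; wrapping the steps into your function is still up to you:

```python
import numpy as np

v = np.array([1, 2, 3, 4, 100])

center = np.median(v)                 # 3.0
deviations = v - center               # [-2. -1.  0.  1. 97.]
abs_deviations = np.abs(deviations)   # [ 2.  1.  0.  1. 97.]
mad = np.median(abs_deviations)       # 1.0

print(center, mad)                    # 3.0 1.0
```

The outlier of 100 has no influence on either result here.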

Test it on this vector to understand the usefulness of the median and MAD:

x = [6, 9, 5, 6, 3, 8, 5, 72, 9, 6, 6, 7, 8, 0]

The standard deviation is more than twice as big as it should be, due to that single outlier.

Extend yourself (later?)¶

Rather continue below, but if you have time, return to the weather data from the previous module. Load the data from the Dutch meteorological service (KNMI), and use that as input for the above function you wrote.

How similar are the mean and the median?
How similar are the standard deviation and the MAD? Does this make sense?

In [ ]:

Functions: two or more inputs, one output¶

There are also several functions of this sort. You have just used one of these above:

t = 45
isinstance(t, (float, int))

t = '45'
isinstance(t, (float, int))

isinstance(t, (float, int, str))

isinstance(t) # will raise an error

Now it is your turn to create a function with more than one input.

In module 3 you saw code, similar to what is below, that reads a text file:

import os
base_folder_mac_or_linux = '/users/home/yourname'
base_folder_windows = r'C:\Users\home\yourname' 
filename = 'myfile.txt'
full_filename = os.path.join(base_folder_windows, filename)
N_lines = 15
with open(full_filename, "r") as f:
    lines = []
    for i in range(N_lines):
        line = f.readline()
        lines.append(line)

    # The file is then closed at this point.

# Show the file preview
print(lines)

Do several things with this:

First copy the code as it is, and ensure it actually works on a file. You will need to modify the first few lines to make this work.
Next, modify the code to make it a function: put the actual activity of opening and getting the first few lines of text into a function.

def preview_textfile(filename, N):
    # complete this part to return a preview in a list with `N` entries

Complete your function to ensure that it returns the list called lines.
Modify the rest of the code, outside the function, so you can call your new function in a single line:

print(preview_textfile(full_filename, 15))

In [ ]:

Functions: optional (default) inputs¶

Python allows you to specify the value of optional function inputs. In other words, you can specify default values if the user does not. The user can of course always override the values if they specify them.

With a small change, you can modify your function above:

def preview_textfile(filename, N=10):
    """Returns the first `N`(int) lines of `filename`.
       By default, the first 10 lines are returned."""

and ensure that the following 3 instances of calling the function work as expected:

print(preview_textfile(full_filename))
print(preview_textfile(full_filename, 15))
print(preview_textfile(full_filename, N=5))

What do you need to change in your function to guard against user error?

print(preview_textfile(full_filename, N=5.0))
print(preview_textfile(full_filename, N=5.5))
print(preview_textfile(full_filename, '15'))
print(preview_textfile(full_filename, '5.5'))

As you can/should see, with a simple tweak, you can make your function far more tolerant of user input, and therefore more widely applicable.
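
One possible tweak, sketched below, is to convert the input to an integer before using it. The helper name `to_line_count` is hypothetical, and truncating 5.5 down to 5 is a design assumption: you might prefer to raise an error for non-whole numbers instead.

```python
def to_line_count(N):
    """Converts `N` to a whole number of lines, or raises ValueError."""
    # float() accepts 5, 5.0, '15' and '5.5'; int() then truncates toward zero
    count = int(float(N))
    if count < 0:
        raise ValueError('The number of lines cannot be negative.')
    return count


for candidate in (5, 5.0, 5.5, '15', '5.5'):
    print(repr(candidate), '->', to_line_count(candidate))
```

Going through float(...) first matters: int('5.5') on its own raises a ValueError, while int(float('5.5')) gives 5.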

In [ ]:

Functions: multiple inputs and outputs¶

It is time to bring all the above together.

Start a new file in your version control repository.

➜ Challenge yourself 1: simulated cooling¶

*Hint*: read the entire problem first.

In module 3 you had a challenge problem related to the cooling of an object in a fridge. The temperature of the object, $T$, changing over time $t$, can be modeled as: $$ \dfrac{dT}{dt} = -k (T-F)$$

The fridge has a constant temperature, $F=5$°C; and for this system, the value of $k = 0.08$. For a short change in time, $\delta t = 0.5$ minutes, the equation can be approximated as: $$ \dfrac{\Delta T}{\delta t} = -k (T - F)$$ Writing $\Delta T = T_{i+1} - T_i$ and rearranging gives: $$T_{i+1} = T_i -k (\delta t)(T_i - F)$$

which shows how the temperature at time point $i+1$ (one step in the future) is related to the temperature now, at time $i$. The object starts off with a temperature of 25 °C.
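
To see the update rule at work, here are the first three steps written out with the values from the text. This is only a sketch of the arithmetic, with illustrative variable names; it is not the function asked for in the challenge:

```python
F = 5.0          # fridge temperature, in °C
k = 0.08         # cooling constant from the text
delta_t = 0.5    # step size, in minutes
T = 25.0         # initial temperature of the object, in °C

for i in range(3):
    # T_{i+1} = T_i - k * delta_t * (T_i - F)
    T = T - k * delta_t * (T - F)
    print('After step {}: T = {:.4f}'.format(i + 1, T))
```

The first step gives $25 - 0.08 \times 0.5 \times (25 - 5) = 24.2$ °C; each later step cools the object by a little less, because the difference $T_i - F$ keeps shrinking.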

The challenge is to create a short function that always returns 2 NumPy vectors: the time points of the simulation, and the temperature of the object at those times.

The user should be able to call the function in several ways:

Only specify the total simulation time:

time, temp = simulate_cooling(time_final=30)

Specify the total time, and initial temperature of the object:

time, temp = simulate_cooling(time_final=30, initial_temp=25)

Specify the total time, and initial temperature of the object, and simulation resolution:
```
time, temp = simulate_cooling(time_final=30, initial_temp=25, delta_t=2.5)
```
Future-proof your function! We will learn in a later module how you can plot data. For now though, you can add a plotting option to your function, which will optionally plot the temperature against time. But because you don't know yet how to do so, at least add it to the function *signature*, for the future. Save your code in version control, and come back and add it later.
```
time, temp = simulate_cooling(time_final=30, initial_temp=25, delta_t=2.5, show_plot=False)
```

*Hint*: to save yourself some time, you can get code to solve this problem already: https://github.com/kgdunn/python-basic-notebooks. Clone that repository, and look in the code subdirectory for the file called fridge.py. Modify that file to answer the above 4 questions.

Extend yourself (later?)¶

Does the function give you the output you expect if you put an object in the fridge which is frozen already; i.e. the initial temperature is $-18$°C?
Modify the function to return a 3rd output, the true value of simulating the fridge: $$T_i = F +(T_{i=0} - F) e^{-kt}$$

Notice that the equation gives $T_i = T_{i=0}$ when $t=0$ (at the start of the simulation), and that as $t$ gets large, the object temperature tends to the fridge temperature, $F$.

Add a 4th output, the simulation error. As you can see, this is quickly getting "ugly": all the outputs are vectors of the same length. Why not combine them into a matrix with 4 columns? In the next module we will learn how.

In [ ]:

Functions: positional arguments¶

Once 2 or more inputs are possible, it raises the question of whether the order of the inputs is important.

For example, if you have a function *signature* of:

def random_normal_values(size, mean=0.0, stddev=1.0):
    """ Returns a vector of length `size` with randomly
    distributed values. The values will come from a normal 
    distribution with `mean` and standard deviation of `stddev`."""

you can call it in several ways to get 3 values from the normal distribution $\mathcal{N}\left(\mu=6, \sigma=9 \right)$:

random_normal_values(3, 6, 9)
random_normal_values(3, 6, stddev=9)
random_normal_values(3, mean=6, stddev=9)
random_normal_values(size=3, mean=6, stddev=9)

# Yes, you can do this also! 
random_normal_values(stddev=9, mean=6, size=3)
random_normal_values(mean=6, stddev=9, size=3)
random_normal_values(size=3, stddev=9, mean=6)
random_normal_values(stddev=9, size=3, mean=6)
random_normal_values(mean=6, size=3, stddev=9)

You can also use the default arguments, and specify only what you need to be different from the defaults:

random_normal_values(3, mean=6)
random_normal_values(3, stddev=9)

random_normal_values(3)
random_normal_values(size=3)

# But these will NOT work. Why?
random_normal_values(mean=6, stddev=9)
random_normal_values(mean=6)
random_normal_values(stddev=3)

This ability to call functions with default inputs, and inputs in different order provides tremendous flexibility. You don't need to remember the order of the arguments for a function. Specify them by name, and place them in a (logical!) order. In fact, always specifying the argument names when using a function is more explicit, and makes for clearer code.

Reading random_normal_values(3, 6, 9) a few months after you wrote the code will almost certainly force you to go back to the original function to see what position 1, 2 and 3 of the inputs were. But seeing random_normal_values(size=3, mean=6, stddev=9) is immediately clear; saving you (and others that use your code) substantial time. A little bit of extra typing now pays off in the future.
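
If you would like to try these calls yourself, the signature needs a body. The sketch below is one possible body, assuming NumPy's random generator; the original function's implementation is not shown in this worksheet, so treat this as an illustration only:

```python
import numpy as np


def random_normal_values(size, mean=0.0, stddev=1.0):
    """Returns a vector of length `size` with values drawn from a normal
    distribution with `mean` and standard deviation of `stddev`."""
    rng = np.random.default_rng()
    return rng.normal(loc=mean, scale=stddev, size=size)


values = random_normal_values(size=3, mean=6, stddev=9)
print(values.shape)            # (3,)

# `size` has no default value, so leaving it out raises a TypeError
try:
    random_normal_values(mean=6, stddev=9)
except TypeError as error:
    print('TypeError:', error)
```

The values themselves differ on every run, since no random seed is fixed here.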

There are of course exceptions to the choice of argument order. Take the NumPy np.arange(start, stop, step) function as an example:

np.arange(start=3, stop=10, step=2)   # [3, 5, 7, 9]
np.arange(step=2, stop=10, start=3)   # [3, 5, 7, 9]

The second version has the same output, but it is less readable and less intuitive. Therefore when using named input arguments, pick a logical order that aids (human) readability.

In [ ]:

Functions: scope (variable values in a function)¶

A variable inside a function is separate from the variables outside that function, even when they have the same name.

def funcA(x):
    x = x + 2
    print('In the function the value of "x" is {}'.format(x))
    return x

x = 12
out = funcA(x)
print('Outside and after the function "x" is still {}, while "out" = {}'.format(x, out))

will print

In the function the value of "x" is 14

Outside and after the function "x" is still 12, while "out" = 14

The variable x in the main part of the program is not the same variable as the x inside the function, even though they have the same name. The identical name is coincidental. You can replace the x on the left-hand side inside the function with y (and print and return y instead), and it will still run the same: y = x + 2.

We say the *scope* of the variable is different inside and outside the function. That is because variable x inside the function on the left-hand side of x = x + 2 is created for the duration of the function. We say that it has *local scope*. After the function is finished, that variable is freed up and deleted.
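
A quick way to convince yourself that a local variable disappears is to ask for it after the function has finished. In this sketch the names `add_two` and `local_result` are illustrative:

```python
def add_two(x):
    """Returns `x` plus 2, using a local variable for the result."""
    local_result = x + 2       # exists only while the function runs
    return local_result


out = add_two(12)
print(out)                     # 14

try:
    print(local_result)        # the name is not defined out here
except NameError as error:
    print('NameError:', error)
```

Only the returned value survives, and only because it was assigned to `out`.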

Scoping can be a tough concept to understand at first, and is also more complex than described here. We mention it here for now, so it is not surprising. You can read more about scoping in that notebook.

In [ ]:

➜ Challenge yourself 2: zero of a function¶

No introduction to functions is complete without applying it to Newton's method. We already saw Newton's law of cooling in the fridge simulation above, but he also spent his time thinking about mathematical *functions*, and when they cross the zero line on the $x$-axis.

The idea is if you have a math function, for example: $$f(x) = x^3 - 3x + 5$$ that the value of $x$ that makes $f(x)=0$ can have special meaning. It is also called the *zero* of a mathematical function. You can cheat, and plot the function, but this just gives you an idea of where the zero is, not the exact value.

Newton's method shows that if you start with a reasonable estimate of where the zero is $x_{i=0}$ (the value of $x$ for the iteration when $i=0$), that you can find that zero, by successive repetition: $$ x_{i+1} = x_i - \dfrac{f(x_i)}{f'(x_i)}$$

Start with $x_i$ on the right hand side, and update your estimate to $x_{i+1}$. Then repeat; every time you repeat, the value of $i$ increases, starting from zero: $i = 0, 1, 2, \ldots$.

You only need to know the original function value, and the derivative of the function, shown as $f'(x)$. You also need to know when to stop.
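
To see the update formula in action before writing the loop, here are two steps done by hand on a simpler function than the one in the challenge: $f(x) = x^2 - 2$, whose positive zero is $\sqrt{2} \approx 1.41421$. The function names are illustrative assumptions:

```python
def f_square(x):
    """Returns f(x) = x^2 - 2, which has a zero at the square root of 2."""
    return x**2 - 2


def f_derivative_square(x):
    """Returns f'(x) = 2x."""
    return 2 * x


x0 = 1.0                                           # initial guess
x1 = x0 - f_square(x0) / f_derivative_square(x0)   # 1 - (-1)/2 = 1.5
x2 = x1 - f_square(x1) / f_derivative_square(x1)   # 1.5 - 0.25/3
print(x0, x1, x2)
```

After only two updates the estimate is about 1.41667, already close to the true zero.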

Complete this template code, which stops as soon as either one of 2 conditions is met. What are they?

def f_poly(x):
    # Complete this: returns f(x) = x^3 - 3x + 5
    pass

def f_derivative_poly(x):
    # Complete this: returns f'(x) = 3x^2 - 3
    pass

# Initial guess
x = ...
iterations = 0
max_error = 1.0E-8
max_iterations = 20
relative_error = 10000
while (relative_error > max_error) and (iterations <= max_iterations):
    iterations += 1
    x_prior = x

    x = x - f_poly(x) / f_derivative_poly(x)
    relative_error = abs((x - x_prior) / x)

    # Add a print statement here to track the code's progress in the loop

print('The zero of f(x) was found to be {}'.format(x))

Use NumPy to complete the two sub-functions. Understand that due to local scoping, the value of x in those functions is different to the x outside.
How many iterations did you require to converge to a zero?
Change parameters such as the maximum number of iterations, or the relative error tolerance to get a feel for the performance.
Try different initial guesses. Can you make the method crash? (For example, try an initial guess of 1.0).

Here's an interesting feature. Functions in Python are also objects. (Remember, we said *everything* in Python is an object). As such, a function can also be an input argument, since all arguments must be objects.
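
A small sketch makes this concrete. The hypothetical function `apply_twice` below receives another function as its first input, and calls it:

```python
import math


def apply_twice(func, value):
    """Applies `func` to `value`, then applies `func` to that result."""
    return func(func(value))


def add_three(x):
    """Returns `x` plus 3."""
    return x + 3


print(apply_twice(add_three, 10))    # 16
print(apply_twice(math.sqrt, 16.0))  # 2.0
print(type(add_three))               # <class 'function'>
```

Note that the function is passed by name, without parentheses: add_three, not add_three().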

Do the following:

Check your existing code into version control, so you can always come back to this point.
Modify the code so you can call a function newton_zero(...) and give it these inputs:

an initial guess [required]
a function that calculates $f(x)$ [required]
a function that calculates $f'(x)$ [required]
the maximum iterations [optional; default is 25]
the maximum allowable error [optional; default is 10E-10]

Call your function several times. Use this function signature:

zero = newton_zero(guess = -3, f_x=f_poly, df_dx=f_derivative_poly)

Check your completed code into version control and submit it.
Try using this method now with a different function $f(x)=2x^2-x^4$ and its derivative; see if you can lead Newton's method astray.

In [ ]:

➜ Challenge yourself 3: linear regression¶

In the prior module you wrote a short piece of code for linear regression. Linear regression (least squares) is a tool you might frequently use in your work.

$$ y = b_0 + b_1x + e$$

NumPy has built-in ways to do this, but they only cover the basics. We usually want more information:

we want a function that accepts the vector x
the second required input is a vector y
the function must check that x and y are the same length, to make sure you have the right data
it must return several outputs in this order, in a tuple:

the regression intercept, $b_0$
the regression slope, $b_1$
the vector of residuals $e = y - \hat{y}$, where $\hat{y} = b_0 + b_1 x$
the standard error = $\sqrt{ \dfrac{\sum{ e_i^2 }} {n - 2}}$, where $n$ is the length of $x$ or $y$.

Complete this template, turning it into a function, and completing all the requirements:

import numpy as np
x = np.array(...)
y = np.array(...)

# Check that lengths are the same. 
if not(...):
    # Return early if lengths do not match
    return (np.nan, np.nan, None, np.nan)

X = np.vstack([np.ones(len(x)), x]).T

# b is defined as (X^T X)^{-1} X^T y
b_vector = ...

# residuals = y - y_hat
residuals = ...

# Standard error:
se = ...

return (...)
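
The matrix step in the template, $b = (X^TX)^{-1}X^Ty$, can be tried on a tiny data set where the answer is known in advance. This sketch uses made-up data lying exactly on $y = 1 + 2x$, and it solves the equivalent linear system with np.linalg.solve instead of forming the inverse explicitly; that is a substitution made here, not something the template prescribes:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])      # exactly y = 1 + 2x

# Column of ones (for the intercept) next to the column of x-values
X = np.vstack([np.ones(len(x)), x]).T

# Solve (X^T X) b = X^T y, which is equivalent to b = (X^T X)^{-1} X^T y
b_demo = np.linalg.solve(X.T @ X, X.T @ y)
print(b_demo)                            # approximately [1. 2.]
```

The first entry is the intercept and the second is the slope, matching the column order in X.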

Test your code:

With these inputs:

x = [1, 2, 3, 4]
y = [1, 0, 2, 5, 7]

you should get an output of (nan, nan, None, nan)

With these inputs:

x = [0.019847603, 0.039695205, 0.059542808, 0.07939041, 0.099238013,  0.119085616, 0.138933218]
y = [0.2,         0.195089996, 0.284090012, 0.37808001, 0.46638,      0.561559975, 0.652559996]

you should get:

an intercept of 0.06641,
a slope of 4.08993,
a standard error of 0.03206,
the smallest residual being -0.0336679,
and the largest one being 0.052417.

This last step hints at what is called test-driven development (TDD). You actually first write tests to check your function. Then you start coding your function. You stop when all tests are successfully passed. In the above, the values are from another software package, which is known to be tested. In the Advanced classes we will return to TDD.
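
As a minimal sketch of that order of work, the test below is written first, and a function is then added to make it pass. The function `calculate_rectangle_area` is a hypothetical example chosen for its simplicity, and plain assert statements stand in for a proper testing tool:

```python
import math


def test_calculate_rectangle_area():
    """Checks written before the function exists."""
    assert calculate_rectangle_area(2, 3) == 6
    assert calculate_rectangle_area(0, 5) == 0
    # Floating-point results are compared with a tolerance, not with ==
    assert math.isclose(calculate_rectangle_area(0.1, 3), 0.3)


def calculate_rectangle_area(width, height):
    """Calculates the area of a rectangle."""
    return width * height


test_calculate_rectangle_area()
print('All tests passed.')
```

For the regression function, the expected values listed above play the role of these assert statements; compare them with np.isclose(...) or math.isclose(...), since they are rounded.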

In [ ]:

Wrap up this section by committing all your work. Have you used a good commit message? Push your work, to refer to later, but also as a backup.

*Feedback and comments about this worksheet?* Please provide any anonymous comments, feedback and tips.
