All content here is under a Creative Commons Attribution CC-BY 4.0 and all source code is released under a BSD-2 clause license. Parts of these materials were inspired by https://github.com/engineersCode/EngComp/ (CC-BY 4.0), L.A. Barba, N.C. Clementi.
Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
We cover the following topics here:
This session appears lengthy, but it is a recap of very familiar topics.
Quickly go over what you are comfortable with; we hope to get everyone to the same level of understanding.
"Arrays store objects of the same type."
There's a lot in that sentence:
A quick recap might be helpful (refer to session 1 for a refresher).
`type(object)` will tell you which type of object you have. For example, `type(45.2)` will give `float` as a reply. So now you should understand that an *array* is just a collection of these objects. Let's take a look with an example.
Here is a collection of floating-point objects:
[45.2, 91.2, 67.2, -23.78]
The type of each object is `float` (we could also have used `int`, or integer, objects). The 4 objects are collected in a list, and that list is also an object.
Remember you can always confirm the *type* of an *object* as follows. Try it:
type(45.2)
type(42)
type('some text')
type([45.2, 91.2, 67.2, -23.78])
Let's quickly get a few definitions out of the way. We start by collecting some objects together: first singly (scalar), then in a list (vector), then as a 'spreadsheet' (matrix), and then as an array (3-dimensional or higher-dimensional).
If our collection of (numeric) objects happens to contain only a single number, we call that a *scalar*.
scalar_1 = 45.2
scalar_2 = 0
scalar_3 = -12
A collection of scalars in a single row, or column, is very much like a `list` in regular Python. We call this collection a *vector*.
list_1 = [1, 2, 6, -2, 0]
list_2 = [0, 0, 0, 0, 0, 0, 0, 0]
list_3 = [254.2, 501, 368.4, 697, 476.5, 188.2, 525.6, 451, 514]
We say this collection has a single dimension: a single row of numbers, or a single column of numbers. If the collection happens to contain only 1 number, we simply call it a scalar. But in theory we can store as many numbers as we like in our vector.
Think, for example, of the impeller speed of a batch reactor, measured every minute over the duration of a batch. This 1-dimensional sequence is called a vector.
If we take several 1-dimensional vectors, each of the same length, and put them together side-by-side, we get a *matrix*.
matrix_1 = [ [1, 2, 6, -2], [4, 3, 2, 1] ] # has 2 rows and 4 columns
matrix_2 = [ [0, 0, 0], [0, 0, 0], [0, 0, 0] ] # has 3 rows and 3 columns
matrix_3 = [ [9, 8, 7, 6], [5, 4, 4, 3] ] # also has 2 rows and 4 columns
As shown above, you could crudely store a matrix as a list of lists, where the main (outside) list contains objects that are themselves lists. This is perfectly valid in Python: remember that a list can contain objects of any type, including other lists. But while this "list-of-lists" approach can store your data, it is not great for calculations.
Try this (neither line behaves the way you would expect for mathematical operations):
matrix_1 + matrix_3
matrix_3 + 7
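A small sketch of what goes on here: the two lists are redefined so the block runs on its own, the list behaviour is shown, and then the same values are converted with `np.array()` (covered later in this session) to show element-by-element arithmetic. The names `stacked`, `summed` and `shifted` are just illustrative.

```python
import numpy as np

# Same values as matrix_1 and matrix_3 above, redefined so this block runs alone
matrix_1 = [[1, 2, 6, -2], [4, 3, 2, 1]]
matrix_3 = [[9, 8, 7, 6], [5, 4, 4, 3]]

# With plain lists, "+" joins the outer lists end to end: 4 rows, not a sum
stacked = matrix_1 + matrix_3
print(len(stacked))

# Adding a number to a list is not defined at all
try:
    matrix_3 + 7
except TypeError as err:
    print("TypeError:", err)

# Converting to NumPy arrays gives element-by-element arithmetic
summed = np.array(matrix_1) + np.array(matrix_3)
shifted = np.array(matrix_3) + 7
print(summed)
print(shifted)
```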
Note also that a vector is simply a matrix where one of the dimensions equals 1: either 1 row, or 1 column.
Matrices are widely used in engineering and data analysis. Often each row is an object, or a sample, or an observation. And each column represents some sort of value measured on that object or sample. For example:
|  | Measurement 1 | Measurement 2 | Measurement 3 | Measurement 4 |
|---|---|---|---|---|
| Sample 1 | 5.5 | 0.55 | -23.4 | 561522.2 |
| Sample 2 | 6.7 | 0.44 | -22.2 | 526616.4 |
| Sample 3 | 4.9 | 0.61 | -38.1 | 612515.7 |
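As a sketch, the table above could be stored as a NumPy matrix with one row per sample and one column per measurement; the variable names `data` and `column_means` are illustrative. Averaging down each column then gives one value per measurement.

```python
import numpy as np

# Rows are samples, columns are measurements (values from the table above)
data = np.array([
    [5.5, 0.55, -23.4, 561522.2],
    [6.7, 0.44, -22.2, 526616.4],
    [4.9, 0.61, -38.1, 612515.7],
])
print(data.shape)  # 3 samples (rows), 4 measurements (columns)

# Average of each measurement over all samples: one value per column
column_means = data.mean(axis=0)
print(column_means)

# All measurements for Sample 2 (the second row)
print(data[1])
```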
If we take several 2-dimensional matrices, each with the same number of rows and columns, and stack them together, we get a *3-dimensional array*.
A matrix was a list-of-lists. We can go up to a third dimension and make a list-of-lists-of-lists.
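Why stop there? We can go to higher and higher dimensions. We use a general name for such a collection of (numeric) objects: an *array*.
An array is an n-dimensional structure of numbers. You can therefore say that a scalar is a 0-dimensional array, a vector is a 1-dimensional array, and a matrix is a 2-dimensional array.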
For example, consider a 3-dimensional array, $X$, holding data collected in a lab. We perform the experiment several times: $N$ times, giving $N$ layers that lie on top of each other (each layer is actually a matrix).
In each experiment we collect a matrix of data from several sensors; there are $K$ sensors. We set the sensors to collect data at a regular interval, for example once every 3 seconds, so that we end up with exactly the same number of samples per sensor: $J$ values per sensor.
Storing the data like this is useful, because now you can perform calculations on all experiments, over all time, for all sensors in array $X$.
For example, you can calculate the average in the direction of $J$ (the time direction) to reduce the array to a matrix. That matrix holds the average value of each sensor in each experiment, so it has $N$ rows and $K$ columns.
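The reduction described above can be sketched in NumPy. This sketch assumes the axes of `X` are ordered as (experiments $N$, time samples $J$, sensors $K$) and uses random numbers in place of real sensor data; the sizes 5, 100 and 3 are arbitrary.

```python
import numpy as np

# Hypothetical sizes: N experiments, J time samples per sensor, K sensors
N, J, K = 5, 100, 3

# Assumed axis order (N, J, K); random values stand in for real measurements
rng = np.random.default_rng(seed=0)
X = rng.random(size=(N, J, K))

# Averaging along the J (time) axis, which is axis=1 here, removes that axis
sensor_means = X.mean(axis=1)
print(sensor_means.shape)  # N rows, K columns
```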
Engineering applications benefit from using vectors, matrices or arrays: they are sequences of data, all of the same type. Arrays behave a lot like lists in Python, except for the constraint that all elements have the same type.
There is an important Python library in science and engineering, called NumPy, that provides support for n-dimensional array data structures (a.k.a. `ndarray`).
Later on we will learn about the library called `pandas` (Python Data Analysis Library), which is better suited than NumPy for many situations. But underneath each pandas dataframe (we will define that term later) there is a NumPy array. So understanding NumPy is key to understanding pandas. Learning NumPy is also an easy step for people coming from MATLAB.
Let us import NumPy and get started.
First, a word on importing libraries to expand your running Python session. Because libraries are large collections of code and are for special purposes, they are not loaded automatically when you launch Python (or IPython, or Jupyter). You have to import a library using the `import` command. For example, to import NumPy, you can enter:
import numpy
Once you execute that command, you can call any NumPy function using the dot notation, prepending the library name. For example, some commonly used functions are:
Part of the community effort of creating Python libraries is maintaining excellent documentation.
Click and read one of those links to explore the documentation - the pages each have the same layout, so once you know where to look, you can quickly search and refer to the documentation for other functions.
Also try `dir(numpy)`. Do you remember what the `dir(...)` function does?
The `dir(...)` function applies to any *object* in Python, and `numpy` here, once imported, is also an object.
What *type* is `numpy`?
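A quick check you can run yourself; this is a sketch, and the short list of names it filters for is just an illustrative choice:

```python
import numpy

# The imported library is itself an object, of type "module"
print(type(numpy))

# dir() lists the names the module provides; show how many, plus a few examples
names = dir(numpy)
print(len(names))
print([name for name in names if name in ('array', 'ones', 'zeros', 'linspace')])
```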
You will find a lot of source code that uses a different syntax for importing. Most often you will see:
import numpy as np
All this does is create an alias for `numpy` with the shorter string `np`, so you would then call a NumPy function like this: `np.linspace()` instead of the lengthier `numpy.linspace()`.
This is just an alternative way of doing it. It is arguably better to be explicit (using the full `numpy.` prefix), but practicality, code reuse, and screen real-estate often dictate that people simply write `np`. Both are fine.
import numpy
import numpy as np # both do the same
To create a NumPy array from an existing Python `list` of numbers, we use `numpy.array()`, like this:
my_list = [3, 4, 7, -2, 11]
np.array(my_list)
# or more compactly, without the intermediate variable:
np.array([3, 4, 7, -2, 11])
Try it yourself:
Create an array of 11 numbers below: some negative, some positive, some integers, some floating point.
# Create a vector of 11 numbers
import numpy as np
eleven = np.array([ ... ])
print(eleven)
print(len(eleven)) # verify the length
Python allows you to create lists of mixed types, for example, strings, floating point, integers, etc. What happens if you try creating a NumPy array from a mixed list of object types?
*What happens?*
In this list there are 3 objects, of 3 different types. Try running the code below to verify:
my_list = ['abc', 123, 456.7]
np.array(my_list)
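One way to see what happened is to inspect the `.dtype` attribute, which reports the single type NumPy chose for all elements. In this sketch the mixed list ends up as text, while a list of only integers and floats is converted to floats; the names `mixed` and `numbers` are illustrative.

```python
import numpy as np

# Mixed types: NumPy converts every element to one common type, here text
mixed = np.array(['abc', 123, 456.7])
print(mixed)
print(mixed.dtype)  # a Unicode string type

# Mixing only integers and floats: the integers are converted to floats
numbers = np.array([3, 4, 7.5])
print(numbers)
print(numbers.dtype)  # float64
```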
NumPy offers many ways to create arrays. Also read this overview.
- Scroll through the first link above to see just how many ways there are.
- One of the simplest vectors we can create is a vector of just ones (1's). Try the `numpy.ones()` command below. We must tell NumPy how many array elements we would like.
# To try: change the '5' to some other integer number
import numpy as np
np.ones(shape=5) # Using the explicit function call
np.ones(5) # often we use this shortcut instead
There is also a command to create a vector of zeros:
np.zeros(shape=3)
np.zeros(3)
Here you see that Python functions can be called by specifying the name of the input argument: in this example the single input `shape` is specified in `np.zeros(shape=...)`.
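Both functions return floating-point numbers unless told otherwise. The sketch below uses the optional `dtype` argument, also passed by name, to request integers instead:

```python
import numpy as np

# By default, zeros() and ones() create floating-point values
v = np.zeros(3)
print(v.dtype)  # float64

# The optional dtype argument requests another type, here integers
w = np.ones(4, dtype=int)
print(w)
```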
To create a matrix we use the same `.ones()` or `.zeros()` command, but specify the `shape` argument differently: instead of an integer, we provide a tuple.
twoD = np.ones(shape=(5,7))
print(twoD)
# Verify that the shape is what you expect:
print(twoD.shape)
print('------------')
naughts = np.zeros((5,7))
print(naughts)
print(type(naughts)) # you have now created an object with type `numpy.ndarray`
Every NumPy array can be queried using the `.shape` attribute. That means: add `.shape` to the array, and Python will return the attribute of that array called `shape`.
Why stop at two-dimensions? Create a 3-dimensional array with 2 rows, 3 columns and 4 layers: in other words a $2 \times 3 \times 4$ array.
Just adjust the `tuple` provided to the `shape` argument:
threeD = np.zeros(shape=(2,3,4))
print(threeD)
print(threeD.shape)
Is this what you expected to see? You might have to imagine the 3rd dimension going in and out of the screen.
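It may help to know how NumPy lays out the printout: it treats the first number in the shape as the outermost level, so a `(2, 3, 4)` array prints as 2 blocks, each with 3 rows and 4 columns. A small sketch to check this, using `reshape` (covered at the end of this session) to number every position:

```python
import numpy as np

threeD = np.zeros(shape=(2, 3, 4))

# Selecting along the first axis gives one printed block: 3 rows by 4 columns
first_block = threeD[0]
print(first_block.shape)

# Numbering every position 0..23 makes the printed layout easier to follow
numbered = np.arange(24).reshape((2, 3, 4))
print(numbered)
```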
Try to create a matrix with 4 rows and 5 columns, where every value in the matrix is the number 8. Do this by making a matrix of only ones with `np.ones()` and multiplying that matrix by the value of 8. Now do the same thing using the `np.full` command. If you need help, please see the NumPy documentation for the `np.full` command.
# Step 1:
eights = np.ones( ___ ) * ___
print(eights)
# Step 2:
eight_again = np.full(shape=___, fill_value=___)
print(eight_again)
You have created vectors, matrices and arrays. These have a specific `.shape` attribute that you can check.
There are several other attributes of interest, but one that you will find useful is `.ndim` (the number of dimensions). Try it on one of your prior arrays.
These objects are of the type `numpy.ndarray`: an n-dimensional array.
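A short sketch comparing these attributes for a scalar, a vector, a matrix and a 3-dimensional array. It also prints `.size` (the total number of elements), one more attribute you may find handy; the variable names are illustrative.

```python
import numpy as np

scalar = np.array(45.2)
vector = np.array([1, 2, 6, -2, 0])
matrix = np.ones((2, 4))
array3d = np.zeros((2, 3, 4))

for name, arr in [('scalar', scalar), ('vector', vector),
                  ('matrix', matrix), ('array3d', array3d)]:
    # ndim = number of dimensions, shape = length along each dimension,
    # size = total number of elements
    print(name, arr.ndim, arr.shape, arr.size, type(arr))
```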
In this section we will look at creating arrays, particularly matrices, in an efficient manner.
[0, 1, 2, 3, 4, ..., 9]
In the next section we will look at each one of these.
A square matrix with 1's on the diagonal and zeros everywhere else is known as an identity matrix. For example a $4\times 4$ identity matrix is: $$I_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
import numpy as np
# Read the help text for the `identity` function:
help(np.identity)
id5 = np.identity(n=5)
print(id5)
A similar function to `np.identity(...)` is `np.eye(...)`. It is a play on words, where `eye` refers to the uppercase letter $I$. The above $4\times 4$ matrix is often written as $I_4$ in mathematical notation.
Try the following, to see what they produce:
also_id5 = np.eye(5)
print(also_id5)
print('-----')
yet_again = np.eye(5, 5)
print(yet_again)
print('-----')
another_id5 = np.eye(5, 5, 0) # start the 1's in the 0th position (i.e. row 1 and column 1)
print(another_id5)
print('-----')
# What if we want diagonal ones, but not on the main diagonal,
# but starting in the first row and third column rather?
print(np.eye(5, 5, 2))
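Two more variations you could try before answering the question below: a rectangular result, and a negative offset `k` that shifts the 1's below the main diagonal. This is a sketch; `rect` and `below` are illustrative names.

```python
import numpy as np

# eye() also accepts a number of columns different from the number of rows
rect = np.eye(3, 4)
print(rect)
print(rect.shape)

# A negative offset k places the 1's below the main diagonal
below = np.eye(4, k=-1)
print(below)
```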
After the above, can you explain the difference between `np.identity()` and `np.eye()`?
For simulations it is often helpful to create and use arrays of random values. Each value might be a starting position or state. Or sometimes you just want to test a piece of code, not only with 1's and 0's, but any random values.
For this it is helpful to create arrays of any shape, filled with random values:
import numpy as np
# Random floats between 0 (included) and 1 (not included)
rnd_matrix = np.random.random(size=(4,3))
print(rnd_matrix)
# Or try a multi-dimensional array
rnd_array = np.random.random(size=(4, 2, 3))
print(rnd_array)
Sometimes, though, we want random integers between a lower (`low`) and an upper (`high`) bound. The random values may include the `low` value, but will only go up to just below the `high` value specified.
# Run this code a few times to verify that you can get -3, but never +7
rnd_int = np.random.randint(low=-3, high=7, size=(4, 5))
print(rnd_int)
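Newer NumPy versions (1.17 and later) also provide a random *generator* object, created with `np.random.default_rng()`, which NumPy's documentation recommends for new code. This is a different interface from the `np.random.random` and `np.random.randint` calls above. The sketch shows how a fixed `seed` (the value 42 is arbitrary) makes the random values reproducible:

```python
import numpy as np

# A generator created with a fixed seed gives the same "random" values each run
rng = np.random.default_rng(seed=42)
floats = rng.random(size=(4, 3))                  # floats in [0, 1)
ints = rng.integers(low=-3, high=7, size=(4, 5))  # integers from -3 up to and including 6

print(floats)
print(ints)

# A second generator with the same seed reproduces the first draw exactly
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random(size=(4, 3))))
```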
Vectors containing a sequence, such as `[0, 1, 2, ... 9]` or `[2, 4, 6, 8, ... 12]`, are often used as a starting point for calculations. To create these we use the `numpy.arange()` and `numpy.linspace()` commands.
Syntax: `numpy.arange(start, stop, step)`, where:
- `start` is zero by default,
- `stop` is not inclusive (in other words, NumPy will stop just before this value), and
- `step` has a default value of 1.

As mentioned above, Python functions can be called by specifying the input arguments by name (`start`, `stop` and `step` are the argument names).
Try it out below:
import numpy as np
np.arange(4)
# We could have also written, but you will
# agree that this is unnecessary, as the defaults
# are already good enough. But this is explicit:
np.arange(start=0, stop=4, step=1)
np.arange(start=2, stop=6, step=1)
# Leave `step` unspecified if it is just "1"
np.arange(start=2, stop=6)
# Most common usage: leave all arguments unspecified
np.arange(2, 6)
# Jump in steps of 2
np.arange(start=2, stop=9, step=2)
np.arange(2, 9, 2)
We saw the built-in Python `range` function in an earlier module. So what is the difference between the NumPy library's `np.arange` function and the built-in `range` function?
- Try replacing `np.arange(...)` with `range(...)` and see what differences you notice.
- Try using `np.arange(...)`, but step in increments of 0.5, or 0.33333 instead. Note that you cannot do this with the `range(...)` function.
- Create a sequence of values starting at $-4$ and ending just below $+4$, in steps of $1$.
- Create a sequence of values starting at $-2$ and ending just below $+2$, in steps of $0.5$. How many elements are in the sequence? Remember the `len` function? What about the `.shape` attribute?
- Start at $+2$ and step *down* in increments of $0.25$, until just before $-2$. How many elements are in the sequence?
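A sketch of some of the differences, using values that differ from the exercises so you can still try those yourself:

```python
import numpy as np

# range() produces integers only, and is a range object rather than an array
r = range(0, 6, 2)
print(type(r), list(r))

# np.arange() returns an ndarray and accepts a fractional step
a = np.arange(0, 3, 0.5)
print(type(a), a)  # 6 values, from 0.0 up to 2.5

# range() refuses a fractional step
try:
    range(0, 3, 0.5)
except TypeError as err:
    print("TypeError:", err)
```

For non-integer steps such as 0.1, the NumPy documentation suggests `np.linspace` instead, because floating-point rounding can make the exact number of elements returned by `np.arange` hard to predict.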
There is also the `np.linspace()` command, which is similar to `np.arange()`. The differences are:
- the `stop` value *is included* by default, but it can be excluded;
- it returns an array with evenly spaced numbers over the specified interval.
Syntax: `np.linspace(start, stop, num)`, where the default value is `num=50`. Type `help(np.linspace)` to see how you can either include or exclude the endpoint.
- Confirm that you indeed get a sequence of 50 values when you do not specify `num`. Also confirm that the `stop` value is the last value in the vector.
- Try to get a vector with fewer elements, say 6, instead of 50.
- Go backwards again: create a sequence where the numbers decrease in value.
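A sketch of the `endpoint` argument, and of `retstep`, an optional argument that also returns the spacing between values; the numbers used here are arbitrary:

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, stop value included: 0, 0.25, 0.5, 0.75, 1
with_end = np.linspace(0, 1, 5)
print(with_end)

# endpoint=False excludes the stop value, so the spacing becomes 1/5:
# 0, 0.2, 0.4, 0.6, 0.8 (up to floating-point rounding)
without_end = np.linspace(0, 1, 5, endpoint=False)
print(without_end)

# retstep=True returns the array together with the spacing NumPy used
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)
```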
Once you have a sequence of numbers in a long vector, you might want to fold them up into a matrix, or a multi-dimensional array. Use the `.reshape()` method of a NumPy array to do that.
vector = np.arange(12)
matrix = vector.reshape((3, 4))
Note the order! NumPy will first fill each row, so the first row will be `[0, 1, 2, 3]` and then the next row will be `[4, 5, 6, 7]`, and so on.
Try it:
vector = np.arange(12)
print('This is a vector with a shape of: ' + str(vector.shape))
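matrix = vector.reshape((4, 3))
print('This is a matrix with a shape of: ' + str(matrix.shape))
matrix = vector.reshape((2, 6))
print('This is a matrix with a shape of: ' + str(matrix.shape))
matrix = vector.reshape((4, 4)) # intentional error: 12 elements cannot fill a 4 x 4 (16-element) matrix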
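Two optional extras of `.reshape()`, sketched below: passing `-1` for one dimension lets NumPy compute that dimension from the number of elements, and `order='F'` fills the matrix column by column instead of row by row. The names `auto` and `by_columns` are illustrative.

```python
import numpy as np

vector = np.arange(12)

# -1 asks NumPy to work out that dimension from the total number of elements
auto = vector.reshape((3, -1))
print(auto.shape)  # 3 rows, so 12 / 3 = 4 columns

# order='F' fills column by column: the first row becomes 0, 3, 6, 9
by_columns = vector.reshape((3, 4), order='F')
print(by_columns)

# Reshaping never changes the number of elements
print(vector.size == auto.size == by_columns.size)
```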
Above we have created vectors, matrices and arrays in all sorts of formats: with ones, zeros, diagonals, random numbers, and sequences of numbers.
Next it is time to put these to use, and perform calculations on them. This is in the next module, module 5.