# Using Python

You're computer scientists, so you know how to code; and Python is so intuitive that you can just about pick it all up by looking at example code. This notebook is a quick review of standard Python syntax. The only distinctive bits are section 3.5 on comprehensions and section 4.1 on functions. For the rest, please just skim through, and then try the (unassessed) warmup exercises in [ex0](ex0.html).

#### Contents

* [1. A first session](#1.-A-first-session)
* [2. Basic Python expressions](#2.-Basic-Python-expressions)
  * [2.1 MATHS AND LOGIC](#2.1-MATHS-AND-LOGIC)
  * [2.2 STRINGS AND FORMATTING](#2.2-STRINGS-AND-FORMATTING)
* [3 Collections and control flow](#3-Collections-and-control-flow)
  * [3.1 LISTS AND TUPLES](#3.1-LISTS-AND-TUPLES)
  * [3.2 SLICING](#3.2-SLICING)
  * [3.3 DICTIONARIES](#3.3-DICTIONARIES)
  * [3.4 CONTROL FLOW](#3.4-CONTROL-FLOW)
  * [3.5 COMPREHENSIONS](#3.5-COMPREHENSIONS)
* [4 Python as a programming language](#4-Python-as-a-programming-language)
  * [4.1 FUNCTIONS AND FUNCTIONAL PROGRAMMING](#4.1-FUNCTIONS-AND-FUNCTIONAL-PROGRAMMING)
  * [4.2 GENERATORS](#4.2-GENERATORS)
  * [4.3 NONE AND MAYBE, AND ENUMERATION TYPES](#4.3-NONE-AND-MAYBE,-AND-ENUMERATION-TYPES)
  * [4.4 DYNAMIC TYPING](#4.4-DYNAMIC-TYPING)
  * [4.5 OBJECT-ORIENTED PROGRAMMING](#4.5-OBJECT-ORIENTED-PROGRAMMING)

## 1. A first session

We can use Python interactively like a calculator. Here are some simple expressions and their values.
Try entering these yourself, in your own notebook, then press shift+enter or choose Cell | Run Cells
from the menu.

In [None]:
3+8

In [None]:
1.618 * 1e5

In [None]:
x = 3
y = 2.2
z = 1
x * y + z

If we want to type in a very long line, we can split it using a backslash.

In [None]:
"Perhaps the immobility of the things that surround us is forced " \
+ "upon them by our conviction that they are themselves, and not " \
+ "anything else, and by the immobility of our conceptions of them. "

Jupyter will only show the output from the last expression in a cell. If we want to see multiple values, print them explicitly.
Alternatively, let the last expression be a tuple.

In [None]:
print(x * y + z)
print(x * (y + z))

"A tuple of results:", x*y+z, x*(y+z)

Python does its best to print out helpful error messages. When something goes wrong, look first at
the last line of the error message to see what type of error it was, then look back up to see where it
happened. If your code isn't working and you ask for help in the Q&A forum on Moodle, please include the error message!

In [None]:
x = 'hello'
y = x + 5
y

<pre style="color:red">
      1 x = 'hello'
----> 2 y = x + 5
      3 y

TypeError: can only concatenate str (not "int") to str
</pre>

## 2. Basic Python expressions

### 2.1 MATHS AND LOGIC

All the usual mathematical operators work, though watch out for division which uses different syntax to Java.

In [None]:
7 / 3                                 # floating point division
7 // 3                                # integer division (rounds down)
min(3,4), max(3,4), abs(-10)
round(7.4), round(-7.4), round(3.4567, 2)
3**2                                  # power
5 << 1, 5 >> 2                        # bitwise shifting
7 & 1, 6 | 1                          # bitwise operations
(3+4j).real, (3+4j).imag, abs(3+4j)   # complex numbers
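Integer division rounds down, i.e. towards minus infinity, rather than towards zero as Java does; this only shows up with negative numbers. The `%` operator is defined to be consistent with `//`, and `divmod` returns both at once. `round` also has a quirk worth knowing: exact halves go to the nearest even integer. A small sketch:

```python
# // rounds towards minus infinity, so -7 // 2 is -4 (in Java, -7 / 2 on ints gives -3)
q = -7 // 2          # -4
r = -7 % 2           # 1, chosen so that q*2 + r == -7
qr = divmod(-7, 2)   # (-4, 1): quotient and remainder together

# round() sends exact halves to the nearest even integer ("banker's rounding")
halves = [round(0.5), round(1.5), round(2.5)]   # [0, 2, 2]
print(q, r, qr, halves)
```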

The usual logical operators work too, though the syntax is wordier than other languages. Python's truth values are `True` and `False`.

In [None]:
3**2 + 4**2 == 5**2                   # use == to test if values are equal
(x,y,z) = (5, 12, False)
x < y or y < 10                       # precedence: (x < y) or (y < 10)
x < y and not y < 15                  # precedence: (x < y) and (not (y < 15))
(x == y) == z
'lower' if x < y else 'higher'        # same as Java's (x < y) ? 'lower' : 'higher' 
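Two more features of Python's logic are handy. Comparisons can be chained as in mathematical notation, and `and`/`or` short-circuit: they stop as soon as the answer is known, and return the operand that decided the result rather than a bare `True`/`False`. A sketch:

```python
x, y = 5, 12
# Chained comparisons read like maths: this means (0 < x) and (x < y) and (y <= 12)
in_range = 0 < x < y <= 12              # True

# `or` returns the first truthy operand (or the last operand if none is truthy)
first_truthy = 0 or '' or 'fallback'    # 'fallback'

# `and` stops at the first falsy operand, so the division never runs when y == 0
y = 0
safe = (y != 0) and (x / y > 0.4)       # False, and no ZeroDivisionError
```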

Some useful maths functions are found in the `math` module. To use them you need to run
`import math`. (It’s common to put your import statements at the top of the notebook, as they only need to be
run once per session, but they can actually appear anywhere.)

In [None]:
import math
math.floor(-3.4), math.ceil(-3.4)
math.pow(9, 0.5), math.sqrt(9)
math.exp(2), math.log(math.e), math.log(101, 10)
math.sin(math.pi*1.3), math.atan2(3,4)
import cmath                         # for functions on complex numbers
cmath.sqrt(-9)
cmath.exp(math.pi * 1j) + 1
import random                        # for generating random numbers
random.random(), random.random()

### 2.2 STRINGS AND FORMATTING

Python strings can be enclosed by either single quotes or double quotes. Strings (like everything else
in Python) are objects, and they have methods for various string-processing tasks. See the
[String Methods documentation](https://docs.python.org/3/library/stdtypes.html#string-methods) for a full list.

In [None]:
"shout".upper()                      # "SHOUT"
"hitchhiker".replace('hi', 'ma')     # "matchmaker"
'i' in 'team'                        # False
x = '''
Also, a multi-line string can be
entered with triple-quotes.
'''

A handy way to splice values into strings is with f-strings, i.e. strings with `f` before the opening quote. Each chunk of the string enclosed in {⋅} is evaluated, and the result is spliced back into the string. The chunk can also specify the output format. The [documentation](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) describes more format specifiers.

In [None]:
name,age = 'Zaphod', 27
f"My name is {name} and I will be {age+1} next year"

f"The value of π to 3 significant figures is {math.pi:.3}"
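A few more format specifiers from the documentation linked above, as a quick sketch. The `{n=}` form needs Python 3.8 or later.

```python
import math

x, n, word = math.pi, 1234567, 'cat'
fixed = f"{x:.2f}"       # '3.14'       fixed-point, 2 decimal places
sci = f"{x:.3e}"         # '3.142e+00'  scientific notation
grouped = f"{n:,}"       # '1,234,567'  thousands separators
aligned = f"{word:>6}|"  # '   cat|'    right-aligned in a field of width 6
debug = f"{n=}"          # 'n=1234567'  shows the expression and its value
print(fixed, sci, grouped, aligned, debug)
```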

If you do any serious data processing in Python, you will likely find yourself needing [regular expressions](https://docs.python.org/3/library/re.html).
The supplementary notebooks show how to use regular expressions for data cleanup.

In [None]:
import re
s = 'In 2024 there will be an election'
re.search(r'(\d+)', s)[0]                # '2024'
re.sub(r'a(n?) (\w+)ion', 'calamity', s) # 'In 2024 there will be calamity'
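`re.search` only finds the first match. As a sketch of two other common operations, `re.findall` returns every non-overlapping match as a list of strings, and the match object from `re.search` gives access to each parenthesised group. The example sentence is made up.

```python
import re

s = 'Elections in 2019, 2024 and perhaps 2029'
years = re.findall(r'\d+', s)            # ['2019', '2024', '2029']: every match, as strings
total = sum(int(y) for y in years)       # 6072: convert to int before doing arithmetic
m = re.search(r'(\d+) and (\w+)', s)     # first match only
whole, year, word = m.group(0), m.group(1), m.group(2)   # '2024 and perhaps', '2024', 'perhaps'
```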

## 3 Collections and control flow

Python has four common types for storing collections of values: tuples, lists, dictionaries, and sets.
In IA courses on OCaml and Java we learnt about lists versus arrays. In those courses, and in
IA Algorithms, we study the efficiency of various implementation choices. In Python, you shouldn’t
think about these things, at least not in the first instance. The Pythonic style is to just go ahead and
code, and only worry about efficiency after we have working code. As the famous computer scientist
Donald Knuth said,

> Programmers waste enormous amounts of time thinking about, or worrying about, the
> speed of noncritical parts of their programs, and these attempts at efficiency actually have
> a strong negative impact when debugging and maintenance are considered. We should
> forget about small efficiencies, say about 97% of the time: premature optimization is the
> root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Only when we have special requirements should we switch to a dedicated collection type, such as a
[deque](https://docs.python.org/3/library/collections.html#collections.deque) or a [heap](https://docs.python.org/3/library/heapq.html) or the specialized numerical types we’ll learn about later in the course.

### 3.1 LISTS AND TUPLES

Python lists and Python tuples are both used to store sequences of elements. They both support iterating
over the elements, concatenation, random access, and so on. They’re a bit like lists, and a bit like
arrays.

In [None]:
a = [1, 2, 'buckle my shoe']    # a list
b = (3, 4, 'knock at the door') # a tuple
len(a), len(b)
a[0], a[1], b[2]                # indexes start at 0
a[-1], a[-2]                    # negative indexes count from the end
3 in a, 3 in b                  # is this item contained in the collection?
a + list(b)                     # ℓ1+ℓ2 concatenates two lists
tuple(a) + b                    # t1+t2 concatenates two tuples
list(zip(a,b))                  # zip(ℓ1,ℓ2) gives [(ℓ1[0],ℓ2[0]), (ℓ1[1],ℓ2[1]), ...]
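Elements can also be pulled out of a list or tuple by *unpacking*, i.e. assigning to several names at once; a starred name collects whatever is left over into a list. A short sketch:

```python
b = (3, 4, 'knock at the door')
x, y, phrase = b             # unpack a tuple: x == 3, y == 4, phrase == 'knock at the door'
first, *rest = [1, 2, 3, 4]  # a starred name collects the leftovers: first == 1, rest == [2, 3, 4]
x, y = y, x                  # swap two values without a temporary: now x == 4, y == 3
```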

As you see, both lists and tuples can hold mixed types, including other lists or tuples. You can convert
a list to a tuple and vice versa, and extract elements. The difference is that lists are mutable, whereas tuples are immutable.

In [None]:
a[0] = 5
a.append('then')
a.extend(b)
a                              # [5, 2, 'buckle my shoe', 'then', 3, 4, 'knock at the door']

b[0] = 5

<pre style="color:red">
----> 1 b[0] = 5

TypeError: 'tuple' object does not support item assignment
</pre>

To sort a list, we have a choice between sorting in-place or returning a new sorted list without changing
the original.

In [None]:
names = ['bethe', 'alpher', 'gamov']
sorted(names)                 # ['alpher', 'bethe', 'gamov'], returns a new list
names                         # ['bethe', 'alpher', 'gamov'], unchanged from before
names.sort()
names                         # ['alpher', 'bethe', 'gamov'], sorted in-place
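Both `sorted` and `.sort()` accept optional `key` and `reverse` arguments. `key` is a function applied to each element to get the value that is compared. Python's sort is stable, so elements with equal keys keep their original relative order. A sketch:

```python
names = ['bethe', 'alpher', 'gamov']
backwards = sorted(names, reverse=True)   # ['gamov', 'bethe', 'alpher']
by_length = sorted(names, key=len)        # ['bethe', 'gamov', 'alpher']: 5, 5, 6 letters,
                                          # and the tie keeps its original order
names.sort(key=lambda s: s[-1])           # in-place, ordered by last letter: e, r, v
names                                     # ['bethe', 'alpher', 'gamov']
```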

Another common operation is to concatenate a list of strings. Python’s syntax for this is unusual:

In [None]:
', '.join(names) + ' wrote a famous paper on nuclear physics'
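The inverse operation is `str.split`, which breaks a string into a list at each occurrence of a separator. With no argument it splits on runs of whitespace. A sketch:

```python
sentence = 'bethe, alpher, gamov'
parts = sentence.split(', ')             # ['bethe', 'alpher', 'gamov']
words = '  lots   of  space '.split()    # ['lots', 'of', 'space']
', '.join(parts) == sentence             # True: here split and join undo each other
```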

### 3.2 SLICING <a name="slice"></a>

We can pick out subsequences using the slice notation, `x[start:end:step]`.

In [None]:
x = list(range(10)) # [0,1,2,3,4,5,6,7,8,9]
x[1:3]     # start is inclusive and end is exclusive, so x[1:3] == [x[1],x[2]]
x[:2]      # first two elements
x[2:]      # everything after the first two
x[-3:]     # last three elements
x[:-3]     # everything prior to the last three
x[::4]     # every fourth element

We can assign into slices.

In [None]:
x[::4] = [None, None, None]
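Slice assignment has a subtlety, sketched below. When the slice has a step (an *extended* slice, like `x[::4]`), the replacement must have exactly as many items as the slice selects. A plain `start:end` slice can be replaced by a sequence of any length, which grows or shrinks the list.

```python
x = list(range(10))
x[::4] = [None, None, None]    # extended slice (has a step): needs exactly 3 items here
# x is now [None, 1, 2, 3, None, 5, 6, 7, None, 9]

y = list(range(5))
y[1:3] = ['a', 'b', 'c', 'd']  # plain slice: the replacement can be any length
# y is now [0, 'a', 'b', 'c', 'd', 3, 4]
del y[:2]                      # slices can also be deleted
# y is now ['b', 'c', 'd', 3, 4]

try:
    x[::4] = [None]            # wrong number of items for an extended slice
except ValueError as err:
    print('ValueError:', err)  # x is left unchanged
```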

### 3.3 DICTIONARIES

The other very useful data type is the dictionary, what Java calls a Map or HashMap.

In [None]:
room_alloc = {'Adrian': None, 'Laura': 32, 'John': 31}
room_alloc['Guarav'] = 19     # add or update an item
del room_alloc['John']        # remove an item
room_alloc['Laura']           # get an item
room_alloc.get('Alexis', 1)   # get item if it exists, else default to 1
'Alexis' in room_alloc        # does this dictionary contain the key 'Alexis'?
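Sets, the fourth collection type listed at the start of this section, behave like dictionaries with keys but no values: unordered collections of distinct items, with fast membership tests and the usual set operations. A brief sketch:

```python
primes = {2, 3, 5, 7, 7}          # set literal; the duplicate 7 is discarded
odds = set(range(1, 10, 2))       # {1, 3, 5, 7, 9}
both = primes & odds              # intersection: {3, 5, 7}
either = primes | odds            # union: {1, 2, 3, 5, 7, 9}
only_primes = primes - odds       # difference: {2}
primes.add(11)                    # sets are mutable
empty = set()                     # note: {} is an empty dict, not an empty set
9 in odds                         # fast membership test: True
```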

To iterate over items in a dictionary, see the next example …

### 3.4 CONTROL FLOW

Python supports the usual control flow statements: `for`, `while`, `continue`, `break`, `if`, `else`.

To iterate over items in a list,
```python
for item in my_list:
    ...  # do something with item
```

To iterate over items and their positions in the list together,

In [None]:
for i, name in enumerate(['bethe', 'alpher', 'gamov']):
    print(f"Person {name} is in position {i}")

To just do something a number of times, if we don't care about the index, it's conventional to call the loop variable `_`.

In [None]:
x = 2
for _ in range(5):
    x *= 2
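The other statements listed above behave as in Java: `while` repeats until its condition is false, `break` leaves the innermost loop, and `continue` skips to the loop's next iteration. A sketch of each:

```python
# Find the first power of 2 greater than 1000, using while and break
n = 1
while True:
    n *= 2
    if n > 1000:
        break
# n is now 1024

# Collect the odd numbers below 10, using continue to skip the even ones
odds = []
for i in range(10):
    if i % 2 == 0:
        continue
    odds.append(i)
# odds is [1, 3, 5, 7, 9]
```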

To iterate over two lists simultaneously, `zip` them.

In [None]:
for x,y in zip(['apple','orange','grape'], ['cheddar','wensleydale','brie']):
    print(f"{x} goes with {y}")

We can also iterate over (key,value) pairs in a dictionary. Suppose we're given a dictionary of room allocations and we want to find the occupants of each room.

In [None]:
room_alloc = {'adrian': 10, 'chloe': 5, 'guarav': 10, 'shay': 11,
              'alexis': 11, 'rebecca': 10, 'zubin': 5}

occupants = {}
for name, room in room_alloc.items(): # iterate over keys and values
    if room not in occupants:
        occupants[room] = []
    occupants[room].append(name)

for room, occupants_here in occupants.items():
    ns = ', '.join(occupants_here)
    print(f'Room {room} has {ns}')
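The "create an empty list if the key is missing" step is common enough that the standard library's `collections.defaultdict` can do it for you. Here is a sketch of the same grouping:

```python
from collections import defaultdict

room_alloc = {'adrian': 10, 'chloe': 5, 'guarav': 10, 'shay': 11,
              'alexis': 11, 'rebecca': 10, 'zubin': 5}

# defaultdict(list) calls list() to create an empty list the first time a missing
# key is looked up, so the "if room not in occupants" check is no longer needed
occupants = defaultdict(list)
for name, room in room_alloc.items():
    occupants[room].append(name)

dict(occupants)   # {10: ['adrian', 'guarav', 'rebecca'], 5: ['chloe', 'zubin'], 11: ['shay', 'alexis']}
```

With an ordinary dictionary, `occupants.setdefault(room, []).append(name)` achieves the same in one line.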

### 3.5 COMPREHENSIONS <a name="listcomprehension"></a>

Python has a distinctive piece of syntax called a comprehension for creating lists. It’s a very common
pattern to write code that transforms lists, e.g.
```python
ℓ = ... # start with some list [ℓ0, ℓ1, ...]
f = ... # some function we want to apply, to get [f(ℓ0), f(ℓ1), ...]
res = []
for i in range(len(ℓ)):
    x = ℓ[i]
    y = f(x)
    res.append(y)
```
This is so common that Python has special syntax for it,
```python
res = [f(x) for x in ℓ]
```
There’s also a version which only keeps elements that meet a criterion,
```python
res = [f(x) for x in ℓ if p(x)]   # p(x) is the test each x must pass
```
Here's a concrete example:

In [None]:
xs = range(10)
[x**2 for x in xs if x % 2 == 0]
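The same syntax works with braces to build dictionaries and sets. A comprehension can also have several `for` clauses, which nest in the order they are written; this form appears in the `flatten` function in section 4.4. A sketch:

```python
words = ['apple', 'banana', 'cherry']
lengths = {w: len(w) for w in words}   # dict comprehension: {'apple': 5, 'banana': 6, 'cherry': 6}
sizes = {len(w) for w in words}        # set comprehension: {5, 6}

# Multiple for clauses nest left to right, like
#   for row in grid:
#       for v in row: ...
grid = [[1, 2], [3, 4], [5]]
flat = [v for row in grid for v in row]   # [1, 2, 3, 4, 5]
```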

## 4 Python as a programming language

This section of the notes compares and contrasts Python with the languages you have learnt in the
courses so far, OCaml and Java. It is here for your general interest, and
it’s not needed for the Scientific Computing course, apart from section 4.1 on defining functions.

The development of the Python language is documented in [_Python Enhancement Proposals_
(PEPs)](https://www.python.org/dev/peps/). Significant changes in the language, or in the standard libraries, are discussed in mailing lists
and written up for review as a PEP. They typically suggest several ways a feature might be implemented,
and give the reason for choosing one of them. If consensus is not reached to accept the PEP, then the
reasons for its rejection are also documented. They are fascinating reading if you are interested in
programming language design.

### 4.1 FUNCTIONS AND FUNCTIONAL PROGRAMMING

The code snippet below shows how we define a function in Python. There are several things to note:

* The function is defined with a default argument, `c=0`. You can invoke it by either `roots(2,3,1)`
or `roots(2,3)`.

* Functions can be called with named arguments, `roots(b=3, a=2)`, in which case they can be
provided in any order.

In scientific computing, we’ll come across many functions that accept 10 or more arguments, all of
them with sensible defaults, and typically we’ll only specify a few of the arguments. This is why
defaulting and named arguments are so useful.

In [None]:
import math

def roots(a, b, c=0):
    """Return a list with the real roots of c*(x**2) + b*x + a == 0"""
    if b == 0 and c == 0:
        raise Exception("This polynomial is constant")
    if c == 0:
        return [-a/b]
    elif a == 0:
        return [0] + roots(b=c, a=b)
    else:
        discr = b**2 - 4*c*a
        if discr < 0:
            return []
        else:
            return [(-b+s*math.sqrt(discr))/2/c for s in [-1,1]]
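One caveat about default arguments such as `c=0`: the default expression is evaluated once, when the `def` statement runs, not on every call. That is harmless for numbers, but a mutable default such as a list is shared between calls. A sketch, with hypothetical function names:

```python
def append_to(item, xs=[]):       # hypothetical example of a mutable default
    xs.append(item)
    return xs

append_to(1)   # [1]
append_to(2)   # [1, 2]: the same default list is reused across calls

def append_to_fixed(item, xs=None):
    if xs is None:                # the usual idiom: create a fresh list on each call
        xs = []
    xs.append(item)
    return xs
```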

Some more notes:
    
* This function either returns a value, or it throws an exception i.e. generates an error message
and finishes. If your function finishes without an explicit return statement, it will return None.
Unlike Java, it’s possible for different branches of your function to return values of different
types — at risk to your sanity.

* This function returns a single value, namely a list. If you want to return several values,
return them in a tuple, and unpack the tuple using multiple assignment as shown in section 2.1.

* It’s conventional to document your function by providing a documentation string (docstring) as the first statement in its body.
You can see help for a function with `?`. If we run `?roots` we’re shown
```
Signature: roots(a, b, c=0)
Docstring: Return a list with the real roots of c*(x**2) + b*x + a == 0
File: /path_to_notebook/<ipython-input-53-6cf3a0af9585>
Type: function
```
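As a sketch of returning several values, here is a small hypothetical function `min_max` that returns a tuple, which the caller unpacks with multiple assignment:

```python
def min_max(xs):
    """Return the smallest and largest items of xs, as a tuple."""
    return min(xs), max(xs)      # the comma builds a tuple; parentheses are optional

lo, hi = min_max([3, 1, 4, 1, 5, 9])   # lo == 1, hi == 9
```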

In Python as in OCaml, functions can be returned as results, assigned, put into lists, passed as arguments to other functions, and so on.

In [None]:
import random

def noisifier(σ):
    def add_noise(x):
        return x + random.uniform(-σ, σ)
    return add_noise

fs = [noisifier(σ) for σ in [0.1, 1, 5]]
[f(1.5) for f in fs]

In this example above, `noisifier` is a function that returns another function. The inner function ‘remembers’
the value of σ under which it was defined; this is known as a closure.

We can use `lambda` to define anonymous functions, i.e. functions without names. This is often used to
fill in some of the arguments of another function.

In [None]:
def illustrate_func(f, xs):
    for x in xs:
        print(f"f({x}) = {f(x)}")

illustrate_func(lambda b: roots(1,b,2), xs = range(5))
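For the common case of fixing some arguments of an existing function, the standard library's `functools.partial` is an alternative to writing a `lambda`. In this sketch `power` is a hypothetical example function:

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)   # behaves like lambda b: power(b, exponent=2)
cube = lambda b: power(b, 3)          # the lambda version, for comparison
squares = [square(x) for x in range(5)]   # [0, 1, 4, 9, 16]
cubes = [cube(x) for x in range(5)]       # [0, 1, 8, 27, 64]
```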

### 4.2 GENERATORS

A generator (or lazy list, or sequence) is a list where the elements are only computed on demand. This
lets us implement infinite sequences. In Python, we can create them by defining a function that uses
the yield statement:

In [None]:
def fib():
    x,y = 1,1
    while True:
        yield x
        x,y = (y, x+y)

fibs = fib()
[next(fibs) for _ in range(10)]

When we call `next(fibs)`, the fib code runs through until it reaches the next `yield` statement, then it
emits a value and pauses. Think of `fibs` as an execution pointer and a call stack: it remembers where
it is inside the `fib` function, and calling next tells it to resume executing until the next time it hits `yield`.
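`fib` never finishes, but a generator can also be finite. Once its function body returns, the next call to `next` raises `StopIteration`; `for` loops and functions like `list` catch this for you. A sketch with a hypothetical `countdown` generator:

```python
def countdown(n):
    """A finite generator: yields n, n-1, ..., 1 and then stops."""
    while n > 0:
        yield n
        n -= 1

g = countdown(2)
next(g)            # 2
next(g)            # 1
try:
    next(g)        # the body has finished, so next raises StopIteration
except StopIteration:
    print('generator exhausted')

list(countdown(3)) # [3, 2, 1]: list() stops cleanly at StopIteration
```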

We can also transform generators using syntax a bit like list comprehension:

In [None]:
even_fibs = (x for x in fib() if x % 2 == 0)
[next(even_fibs) for _ in range(10)]
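Rather than calling `next` in a comprehension, the standard library function `itertools.islice` can take the first few items of a (possibly infinite) generator. It is itself lazy, so wrap it in `list` to force evaluation. A sketch, repeating `fib` so the block stands alone:

```python
from itertools import islice

def fib():
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y

even_fibs = (x for x in fib() if x % 2 == 0)
first_five = list(islice(even_fibs, 5))    # [2, 8, 34, 144, 610]
```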

### 4.3 NONE AND MAYBE, AND ENUMERATION TYPES

It’s often handy for functions to be able to return either a value, or a marker that there is no value.
For example, `head(list)` should return a value unless the list is empty, in which case there’s nothing to
return. A common pattern in a language like OCaml is to have a datatype that explicitly supports this;
for example, we’d define `head` to return a value of type `'a option`,
which is either `None` or `Some x`. This forces everyone who uses `head` to check whether or not the answer is `None`.

In Python, the return type of a function isn’t constrained. It’s a common convention to return
`None` if you have nothing to return, and a value otherwise, and to trust that the person who called you
will do the appropriate checks.
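A sketch of this convention for `head` (a hypothetical function, not a built-in). The `Optional` type hint documents the intent, but Python does not enforce it when the code runs.

```python
from typing import List, Optional, TypeVar

T = TypeVar('T')

def head(xs: List[T]) -> Optional[T]:
    """Return the first element of xs, or None if xs is empty."""
    return xs[0] if xs else None

h = head([])
if h is None:            # the caller is trusted to make this check
    print('empty list')
head([None, 1])          # also None: indistinguishable from the empty-list case
```

Unlike OCaml's `Some None`, a plain `None` cannot distinguish "empty list" from "list whose first element is `None`"; in practice this ambiguity rarely bites, but it is the price of the convention.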

Enumeration types are also used for type restriction, e.g. to limit what can be placed in a list.
When we actually do want to achieve this, Python isn’t much help. It does have a standard-library [module for
enumeration types](https://docs.python.org/3/library/enum.html), but it’s a lot of work for little benefit.

One situation where enumeration types might seem very useful is when working with categorical values
in data. But there, the levels of the enumeration are decided at runtime (by the contents
of the data we load in), so pre-declared types are no use anyway.

### 4.4 DYNAMIC TYPING

Python uses dynamic typing, which means that values are tagged with their types during execution
and checked only then. To illustrate, consider the functions
```python
def double_items(xs):
    return [x*2 for x in xs]
def goodfunc():
    return double_items([1,2,[3,4]]) + double_items("hello world")
def badfunc():
    return double_items(10)
```
We won’t be told of any errors until `badfunc()` is invoked, even though it’s clear when we define it that
badfunc will fail.
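Python does allow optional *type hints*, but the interpreter ignores them at runtime; they are checked only by separate tools such as mypy. A small sketch, with `double` as a hypothetical function:

```python
def double(x: int) -> int:
    """Annotated int -> int, but Python does not check the annotations at runtime."""
    return x * 2

double(21)                # 42
double('ab')              # 'abab': no error, since str also supports * 2
double.__annotations__    # the hints are just stored as data on the function
```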

Python programmers are encouraged to use _duck typing_, which means that you should test values
for what they can do rather than what they’re tagged as. “If it walks like a duck, and it quacks like a
duck, then it’s a duck”. In this example, `double_items(xs)` iterates through `xs` and applies `*2` to every
element, so it should apply to any `xs` that supports iteration and whose elements all support `*2`. These
operations mean different things to different types: iterating over a list returns its elements, while
iterating over a string returns its characters; doubling a number is an arithmetical operation, doubling
a string or list repeats it. Python does allow you to test the type of a value with e.g. 
`if isinstance(x, list): ...`, but programmers are encouraged not to do this.

Python’s philosophy is that library designers are providing a service, and programmers are
adults. If a library function uses comparison and addition, and if the end-user programmer invents
a new class that supports comparison and addition, then why on earth shouldn’t the programmer be
allowed to use the library function? (I’ve found this useful for simulators: I replaced ‘numerical
timestamp’ with ‘rich timestamp class that supports auditing, listing which events depended on which
other events’, and I didn’t have to change a single line of the simulator body.) Some statically typed
languages like Haskell and Scala support this via type classes, but their syntax is rather heavy.

To make duck typing useful, Python has a long list of [special method names](https://docs.python.org/3/reference/datamodel.html#special-method-names) so that you can
create custom classes supporting the same operations as numbers, or as lists, or as dictionaries.
For example, if you define a new class with the method [`__iter__`](https://docs.python.org/3/reference/datamodel.html#object.__iter__) then your new class can be iterated
over just like a list. (The special methods are sometimes called _dunder methods_, for "double underscore".)
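As a sketch of this, here is a hypothetical class `Squares` that defines `__iter__` (and `__len__`). Because `double_items` only relies on iteration and `*2`, it accepts a `Squares` object without any changes; `double_items` is repeated so the block stands alone.

```python
class Squares:
    """Hypothetical class: iterating over Squares(n) yields 0, 1, 4, ..., (n-1)**2."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # __iter__ must return an iterator; writing it as a generator is an easy way
        for i in range(self.n):
            yield i ** 2

    def __len__(self):
        return self.n    # lets len(Squares(n)) work too

def double_items(xs):
    return [x*2 for x in xs]

list(Squares(4))            # [0, 1, 4, 9]
double_items(Squares(4))    # [0, 2, 8, 18]
len(Squares(4))             # 4
```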

**Example: trees.** Suppose we want to define a tree whose leaves are integers and whose branches can
have an arbitrary number of children. Actually, in Python, there’s nothing to define: we can just start
using it, using a list to denote a branch node.

In [None]:
x = [1,[[2,4,3],9],[5,[6,7],8]]

To flatten a list like this we can use duck typing: given a node `n`, try to iterate over its children, and if
this fails then the node must be a leaf so just return `[n]`.

In [None]:
def flatten(n):
    try:
        # iterating over n raises TypeError if n isn't iterable, i.e. if n is a leaf
        return [y for child in n for y in flatten(child)]
    except TypeError:
        return [n]

flatten(x)

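This would work perfectly well for trees containing arbitrary types, unless the end-user programmer
puts in leaves which are themselves iterable, in which case the duck typing test doesn’t work. A string leaf, for instance,
is iterable, and iterating over a one-character string yields that same string, so `flatten` would recurse until it hits
Python's recursion limit and raises `RecursionError`. Then again, perhaps that was the user’s intent all along, to be able to attach new custom sub-branches …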

A solution is to define a custom class for branch nodes, and use `isinstance` to test each element
to see if it’s a branch node. This is not very different to the OCaml solution, which is to declare nodes
to be of type ‘either leaf or branch’ — except that Python would still allow leaves of arbitrary mixed
type.

### 4.5 OBJECT-ORIENTED PROGRAMMING

Python is an object-oriented programming language. Every value is an object. You can see the class
of an object by calling `type(x)`. For example,

In [None]:
x = 10
type(x)   # reports int
dir(x)    # gives a list of x’s methods and attributes

It supports inheritance and multiple inheritance, and static methods, and class variables, and so on. It
doesn’t have Java-style interfaces, because they make little sense in a duck typing language.

Here’s a quick look at a Python object, and at how it might be used for the flatten function earlier.

In [57]:
class Branch(object):
    def __init__(self, children):
        self.children = children

def flatten(n):
    if isinstance(n, Branch):
        return [y for child in n.children for y in flatten(child)]
    else:
        return [n]

x = Branch([10,Branch([3,2]),"hello"])
flatten(x)

[10, 3, 2, 'hello']

Every ordinary method takes as its first argument a variable referring to the current object, conventionally named `self`;
it plays the role of `this` in Java. Python
doesn’t support private and protected access modifiers, except by convention: the convention is that
attributes and methods whose names begin with an underscore are considered private, and may be
changed in future versions of the library.
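A sketch of both points, using a hypothetical class `Tally`: methods reach the object's attributes through `self`, and a leading underscore marks an attribute as private only by convention.

```python
class Tally:
    """Hypothetical example class."""
    def __init__(self):
        self._count = 0           # leading underscore: private by convention only

    def increment(self):
        self._count += 1          # methods reach the object's attributes through self
        return self._count

t = Tally()
t.increment()                     # 1: Python passes t as the self argument
Tally.increment(t)                # 2: the same call, with self passed explicitly
t._count                          # 2: nothing stops outside code reading it
```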

The next lines of code are surprising. You can ‘monkey patch’ an object, after it has been created,
to change its attributes or give it new attributes. Like so many language features in Python, this is
sometimes tremendously handy, and sometimes the source of infuriating bugs.

In [58]:
y = Branch([])
y.my_label = "added an attribute"
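To see the effect, here is a sketch using a hypothetical stand-in class `Node`, defined like `Branch` (a separate name avoids redefining `Branch` in the notebook). An attribute added to an object exists on that object only, whereas assigning to an attribute of the class changes every instance, including ones already created. `describe` is likewise a hypothetical method name.

```python
class Node:
    """Hypothetical stand-in, defined like Branch above."""
    def __init__(self, children):
        self.children = children

y = Node([])
y.my_label = "added an attribute"     # exists on this one object only
z = Node([1, 2])
hasattr(y, 'my_label'), hasattr(z, 'my_label')   # (True, False)

# Patching the class instead affects every instance, including existing ones
Node.describe = lambda self: f"node with {len(self.children)} children"
z.describe()                          # 'node with 2 children'
```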