{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 3: Flow control" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook will introduce ways of controlling the flow of execution in a program, including:\n", "- if / elif / else blocks\n", "- for loops\n", "- while loops\n", "- vectorized computations\n", "- functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-16T18:27:24.825259Z", "start_time": "2023-06-16T18:27:24.499714Z" } }, "outputs": [], "source": [ "# Import the usual stuff first\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `if` statements" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-16T18:27:25.273182Z", "start_time": "2023-06-16T18:27:25.265101Z" } }, "outputs": [], "source": [ "# If blocks allow blocks of code to be executed only under specific conditions.\n", "x = 5\n", "y = 4\n", "\n", "if x==y:\n", " print('In block 1')\n", " print('They are equal!')\n", " \n", "elif x>y:\n", " print('In block 2')\n", " print('x is more than y')\n", "\n", "else:\n", " print('In block 3')\n", " print('y is more than x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the indentation within each code block. Code within the same block must have the same indentation level, since this is how Python code blocks. Although the amount of indentation doesn't actually matter, you should adhere to the [PEP 8](https://www.python.org/dev/peps/pep-0008/) standard that code blocks be indented with **4 spaces**, not with tabs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `for` loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some fundamental sense, \"loops\" are what make a program a program. The most common type of loop in data analysis is the `for` loop." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:35.476477Z", "start_time": "2022-08-29T12:45:35.471954Z" } }, "outputs": [], "source": [ "# A for loop executes a code block once for each value in a collection of values that you specify\n", "\n", "# Define list of numbers\n", "num_list = list(range(10))\n", "print('num_list = ', num_list)\n", "\n", "# Print these numbers one by one\n", "for n in num_list:\n", " print('->', n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:36.471071Z", "start_time": "2022-08-29T12:45:36.467704Z" } }, "outputs": [], "source": [ "# For loops can loop over any \"iterable\" object, such as a string.\n", "name = 'McClintock'\n", "for c in name:\n", " print('->', c)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:37.190546Z", "start_time": "2022-08-29T12:45:37.186513Z" } }, "outputs": [], "source": [ "# If we want to fill a list with content, we can write a for loop:\n", "tedious_list = []\n", "for c in name:\n", " tedious_list.append(c)\n", "print('tedious_list =', tedious_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:37.706483Z", "start_time": "2022-08-29T12:45:37.702673Z" } }, "outputs": [], "source": [ "# Alternatively, we can use a \"list comprehension\" to do this in one line\n", "simple_list = [c for c in name]\n", "print('simple_list = ', simple_list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:38.277933Z", "start_time": "2022-08-29T12:45:38.273049Z" } }, "outputs": [], "source": [ "# If we want to use the index of each element in a list, we can create a counter\n", "# that is incremented in each run through the code block\n", "i=0\n", "for c in name:\n", " print(f'name[{i}] = {c}')\n", " i += 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:38.771328Z", "start_time": "2022-08-29T12:45:38.766994Z" } }, "outputs": [], "source": [ "# Alternatively, we can use enumerate(), which takes any iterable as input\n", "# and outputs a an iterable of index-value pairs. This will assign values \n", "# to TWO variables in each pass of the loop.\n", "# This strategy is said to be more \"Pythonic\"\n", "for i, c in enumerate(name):\n", " print(f'name[{i}] = {c}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:39.319929Z", "start_time": "2022-08-29T12:45:39.315910Z" } }, "outputs": [], "source": [ "# Range is the Pythonic way of iterating over consecutive integers\n", "for x in range(10):\n", " print(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:39.876884Z", "start_time": "2022-08-29T12:45:39.867959Z" } }, "outputs": [], "source": [ "# In Python, many functions that would seem to return a list actually return \n", "# an \"iterator\", where each subsequent element is computed on the fly. This is\n", "# fine for loops, which execute sequentially. But if you want a list instead instead\n", "# of an iterator, you must \"cast\" that iterator as a list. \n", "\n", "v_iter = range(10)\n", "print('iterator:', range(10))\n", "\n", "v_list = list(range(10))\n", "print('list: ', v_list, '\\n')\n", "\n", "e_iter = enumerate(name)\n", "print('iterator:', e_iter)\n", "\n", "e_list = list(enumerate(name))\n", "print('list: ', e_list, '\\n')\n", "\n", "d = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}\n", "print('iterator:', d.keys())\n", "print('list: ', list(d.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `while` loops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``while`` loop keeps going as long as the argument it is passed evaluates to \"True\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:41.494943Z", "start_time": "2022-08-29T12:45:41.490825Z" } }, "outputs": [], "source": [ "# Assume you have a vial of P32\n", "half_life = 14.3 # days\n", "\n", "# Initially, the vial is at 100% activity\n", "current_activity = 100\n", "\n", "# As long as it has ~10% activity, it's still good to use for radioactive gels\n", "min_activity = 10\n", "\n", "# Compute how many days the vial is good for before it needs to be thrown out\n", "num_days = 0\n", "while current_activity > min_activity:\n", " current_activity /= 2**(1/half_life)\n", " num_days += 1\n", " \n", "print(f'P32 activity will be reduced to {current_activity:.1f}% by day {num_days}.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using while loops, **make very sure that your loop will actually stop at some point**. If you nevertheless end up with a loop that doesn't stop, go to \"Kernel -> Interrupt\" in the menu above. If your computer still acts strange, select \"Kernel -> Restart\". You will then have to evaluate your Jupyter notebook from the beginning. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises, part 1 of 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before there were calculators, people had to compute numbers like $\\pi$ by hand. This was done by deriving an infinite series expression, then hand-computing the first $N$ terms using standard arithmetic. One way of computing $\\pi$ is the **Leibnitz series**:\n", "\n", "$$\\pi = 4 \\left(1 - \\frac{1}{3} + \\frac{1}{5} - \\frac{1}{7} + \\cdots \\right) = \\sum_{n=0}^\\infty 4 \\frac{(-1)^n}{2n+1}$$\n", "\n", "A different way of computing $\\pi$ is the **Madhava series**:\n", "\n", "$$ \\pi = \\sqrt{12} \\left( 1 - \\frac{1}{3 \\cdot 3} + \\frac{1}{5 \\cdot 3^2} - \\frac{1}{7 \\cdot 3^3} + \\cdots \\right) = \\sum_{n=0}^\\infty \\sqrt{12} \\frac{(-1)^n}{(2n+1)\\cdot 3^n} $$\n", "\n", "We will compare the accuracies of these series in two different ways." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**E3.1**: Using a `for` loop, estimate $\\pi$ using the first $100$ terms in both the Liebnitz and Madhava series. Which estimate is more accurate?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:44.251662Z", "start_time": "2022-08-29T12:45:44.248951Z" } }, "outputs": [], "source": [ "# Answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vectorized computations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should avoid writing too many loops in Python. Whenever possible, perform computations using vector operations instead. This is possible using `numpy`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:45.699067Z", "start_time": "2022-08-29T12:45:45.691017Z" } }, "outputs": [], "source": [ "# A numpy \"array\" is like a list, except all the elements are guarenteed to be of the same type.\n", "N = 100\n", "ns = np.arange(N)\n", "ns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:46.312164Z", "start_time": "2022-08-29T12:45:46.304941Z" } }, "outputs": [], "source": [ "# Mathematical operations can be performed directly on numpy arrays\n", "# (this cannot be done on lists).\n", "\n", "# Compute terms in the Liebnitz series all at once\n", "terms_lieb = 4*((-1)**ns / (2*ns+1))\n", "terms_lieb" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:46.806125Z", "start_time": "2022-08-29T12:45:46.801571Z" } }, "outputs": [], "source": [ "# The sum of entries in a numpy array can be computed using the .sum() method\n", "pi_lieb = terms_lieb.sum()\n", "pi_lieb" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:47.322221Z", "start_time": "2022-08-29T12:45:47.319121Z" } }, "outputs": [], "source": [ "# Note: you can print non-ASCII symbols using unicode characters:\n", "print(f'π ≈ {pi_lieb} (Liebnitz series, {N} terms)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to compute $\\pi$ is the **dartboard method**: compute the fraction of random points in the unit square that are within distance 1/2 of the point (0.5,0.5). " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:48.283252Z", "start_time": "2022-08-29T12:45:48.276493Z" } }, "outputs": [], "source": [ "# Draw N random x- and y-coordinates within the unit square\n", "N = 3000\n", "xs = np.random.rand(N) # Generate N random numbers between 0 and 1\n", "ys = np.random.rand(N) # Ditto\n", "print('xs =', xs)\n", "print('ys =', ys)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:49.976768Z", "start_time": "2022-08-29T12:45:49.872410Z" } }, "outputs": [], "source": [ "# Check to see what these xs and ys look like\n", "plt.figure(figsize=[5,5])\n", "plt.scatter(xs,ys,s=3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:50.479341Z", "start_time": "2022-08-29T12:45:50.473799Z" } }, "outputs": [], "source": [ "# Compute distances from the point (.5, .5)\n", "dists = np.sqrt((xs-.5)**2 + (ys-.5)**2)\n", "print('dists =', repr(dists))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:50.956274Z", "start_time": "2022-08-29T12:45:50.952390Z" } }, "outputs": [], "source": [ "# Compute whether points are in the circle\n", "hits = (dists < .5)\n", "print('hits =', repr(hits))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:51.542573Z", "start_time": "2022-08-29T12:45:51.445307Z" } }, "outputs": [], "source": [ "# Plot the hits vs. the non-hits\n", "plt.figure(figsize=[5,5])\n", "plt.scatter(xs[hits], ys[hits], s=3) # plot only points i for which hits[i] is True\n", "plt.scatter(xs[~hits], ys[~hits], s=3) # plot only points i for which hits[i] is False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:51.907174Z", "start_time": "2022-08-29T12:45:51.903729Z" } }, "outputs": [], "source": [ "# Estimate pi from the number of hits\n", "pi_dart = 4*hits.sum()/N\n", "print(f'π ≈ {pi_dart} (Dartboard method, {N} terms)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we illustrate how to define functions, including ones that are documented and check the validity of their input. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:53.521557Z", "start_time": "2022-08-29T12:45:53.516426Z" } }, "outputs": [], "source": [ "def factorial(n): # n is called the \"argument\"\n", " \"\"\"\n", " Returns n factorial. \n", " n must be an integer satisfying 0 <= n <= 1000.\n", " \"\"\" # This is a \"doc string\"\n", " \n", " # Thow an error if n does not have the right form\n", " assert isinstance(n,int),'Input is not an integer' \n", " assert n >= 0, 'Input is not nonnegative' \n", " assert n <= 1000, 'Intput is too large!'\n", " \n", " # Initialize return variable\n", " val = 1\n", " \n", " # Loop over i=1,2,...,n\n", " for i in range(1,n+1): \n", " val *= i\n", " \n", " return val # val is returned to the user" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We test this function by computing n! for n=0,1,2,...,9" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:55.061691Z", "start_time": "2022-08-29T12:45:55.057911Z" } }, "outputs": [], "source": [ "for n in range(10):\n", " print(str(n) + '! is ' + str(factorial(n)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as important as making sure your function works on valid input is to make sure that it EXPLICITLY FAILS on invalid input. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:56.124937Z", "start_time": "2022-08-29T12:45:56.036459Z" } }, "outputs": [], "source": [ "# This should fail\n", "print(factorial(1.1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:56.559221Z", "start_time": "2022-08-29T12:45:56.541857Z" } }, "outputs": [], "source": [ "# This should fail\n", "print(factorial(-10))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:57.946205Z", "start_time": "2022-08-29T12:45:57.933043Z" } }, "outputs": [], "source": [ "# This should fail\n", "print(factorial(\"I'm not even a number!\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:45:58.777226Z", "start_time": "2022-08-29T12:45:58.772881Z" } }, "outputs": [], "source": [ "# You should also test boundary cases\n", "print('0! ==', factorial(0))\n", "print('1000! ==', factorial(1000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The docstring is accessible from within Python, and is often very useful. Execute the following command and a window will pop up that describes what this function does." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:46:00.194221Z", "start_time": "2022-08-29T12:46:00.190920Z" } }, "outputs": [], "source": [ "# The built-in function help() will display another function's docstring' \n", "help(factorial)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:46:00.722441Z", "start_time": "2022-08-29T12:46:00.663079Z" } }, "outputs": [], "source": [ "# In Jupyter notebooks, you can also type a '?' after a function to see its documentation in a pop-up window\n", "factorial?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises, part 2 of 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**E3.2**: Write a function that computes the Leibnitz series for $\\pi$ with a specified number of terms. Include a docstring and checks to make sure the input is sane." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:46:03.144064Z", "start_time": "2022-08-29T12:46:03.141400Z" } }, "outputs": [], "source": [ "# Write function here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:46:03.578136Z", "start_time": "2022-08-29T12:46:03.575924Z" } }, "outputs": [], "source": [ "# Check docstring here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-08-29T12:46:05.962177Z", "start_time": "2022-08-29T12:46:05.959848Z" } }, "outputs": [], "source": [ "# Test input checking here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }