{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook 1: Python Programming and Jupyter Notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Welcome to Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this course we will focus on the Python programming language. There are a number of reasons for this:\n", "\n", "**Python is one of the easiest languages to learn, to write, and to read.** More than almost any other programming language, Python code looks like pseudocode. Many aspects of the language help to enforce clarity. There is also a set of conventions, called the [PEP 8 Style Guide](https://www.python.org/dev/peps/pep-0008/), that most programmers have adopted to make their code as clear as possible.\n", "\n", "**Python allows you to create new code fast**. It is designed to allow you to write programs as quickly and as painlessly as possible. One feature of Python that enables rapid prototyping is that it is an \"interpreted\" language: each line is executed by the Python interpreter one after the next. This is different from C, C++, or Java, which require that programs first be \"compiled\", i.e. translated in their entirety from source code to machine code or bytecode before they are run. \n", "\n", "**Python is very flexible.** With Python you can do image analysis, sequence analysis, exploratory data analysis, and create publication-quality figures. You can write small analysis scripts or write major software applications. You can use Python at the command line, within a Jupyter notebook (such as this one), or within an interactive programming environment. \n", "\n", "**Python is well-supported.** There is an enormous user base for Python. This means that a simple Google search can usually answer any Python programming question you have. 
There are also a large number of mature third-party packages for a wide variety of tasks including image analysis, numerical computation, machine learning, statistics, and graphics. \n", "\n", "**Python skills are highly valued.** There is great demand for people who know Python, and that demand is rapidly increasing. This is true in both academic research and in industry. In particular, Python has become the de facto language of Deep Learning / Artificial Intelligence. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The official Python website is https://www.python.org/. Lots of good documentation and tutorials can be found there. Another great resource for programmers is Stack Overflow at http://stackoverflow.com/. Until recently the best resource was Google: just google a question you have and the answer will likely be found in the first or second hit. But now the best resources are often ChatGPT (https://chatgpt.com/), Claude (https://claude.ai/), and other large language models (LLMs). With LLMs, just ask a question and the LLM will not just answer but write example code for you that does the task. This doesn't always work, but it often does!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: The R language is also very popular in biology and statistics, and arguably has a larger set of tools written for biological data analysis than does Python. We will not cover this language here, but once you learn Python it should be relatively easy to learn R. There is also a substantial amount of online material to help anyone interested in learning R. (This is also true of MATLAB, popular with engineers and neuroscientists.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jupyter notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a **Jupyter notebook**. You can learn more about these on the Jupyter project website, http://jupyter.org/. 
Jupyter notebooks provide a convenient interface to Python. They allow you to include fully functional Python code inside of a document that also contains markdown (i.e. text like this) and figures that show analysis results. This type of hybrid document provides a very powerful method of interactive data analysis. I encourage taking advantage of this. In fact I use Jupyter notebooks for most of my data analysis tasks. Only when I start writing production code do I switch over to more standard Python scripts. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your computer runs a UNIX-like command line, and you've installed the proper software (e.g. the Anaconda Python distribution, https://www.anaconda.com/products/distribution), you can start a Jupyter notebook by typing this at the command line and pressing Enter:\n", "\n", "```\n", "jupyter notebook\n", "```\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this course, however, we will be using Google Colab to avoid the potential headache of installing local versions of Python on each of your computers. For our purposes, Colab notebooks work nearly identically to Jupyter notebooks, except that they run remotely on Google's servers instead of on your own machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Markdown cells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create a markdown cell, click the '+' button in the menu above. Then choose 'Markdown' from the drop-down menu." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Markdown is a simple yet powerful way of creating stylized text. For example, you can embed LaTeX code to create an equation:\n", "\n", "$$\\sin x = \\sum_{n=0}^\\infty \\frac{(-1)^n x^{2n+1}}{(2n+1)!}$$\n", "\n", "You can add HTML, e.g. to embed images:\n", "\n", "\"Drawing\"\n", "\n", "Here is a [Markdown Cheat Sheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet) for reference."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code cells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Code cells contain snippets of Python code that you can execute by pressing Shift+Enter. Here is a code cell illustrating this with a simple Hello World program:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.142594Z", "start_time": "2022-06-08T19:25:22.140920Z" } }, "outputs": [], "source": [ "print(\"Hello World!\") # The Hello World program!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.145110Z", "start_time": "2022-06-08T19:25:22.143713Z" } }, "outputs": [], "source": [ "name = \"Justin\"\n", "print(\"Hello \" + name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `name` is a variable we defined! What is a variable? A variable is a symbol (a letter or a group of letters) that serves as a stand-in for some value. We can set variables to any value, and we can change those values later as well."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.148959Z", "start_time": "2022-06-08T19:25:22.146070Z" } }, "outputs": [], "source": [ "fruit1 = \"apple\"\n", "print(\"fruit1 is \" + fruit1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.152603Z", "start_time": "2022-06-08T19:25:22.150614Z" } }, "outputs": [], "source": [ "fruit2 = \"banana\"\n", "print(\"fruit2 is \" + fruit2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.155519Z", "start_time": "2022-06-08T19:25:22.153475Z" }, "scrolled": true }, "outputs": [], "source": [ "fruit3 = fruit2\n", "print(\"fruit3 is \" + fruit3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will review the most common types of variables in the next notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All but the most trivial uses of Python require packages that must be imported. Some of the most common packages are `numpy`, `pandas`, and `matplotlib`. To import these, and to assign these packages useful aliases, execute the code below. Although it is not required, [PEP 8 recommends](https://www.python.org/dev/peps/pep-0008/#imports) that you do all package imports at the top of each Python file."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:22.468697Z", "start_time": "2022-06-08T19:25:22.156243Z" } }, "outputs": [], "source": [ "# First code cell\n", "import numpy as np # numpy is the standard numerical computing package\n", "import pandas as pd # pandas is the standard package for working with dataframes\n", "import matplotlib.pyplot as plt # matplotlib is the standard plotting package" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also run UNIX command-line commands from a code cell by prefixing them with an exclamation point (`!`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:53:55.961450Z", "start_time": "2023-06-14T16:53:55.835197Z" } }, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Short example: Drawing sequence logos\n", "\n", "To give you just a taste of how Jupyter notebooks are used, we will install a piece of software called ``logomaker`` (https://logomaker.readthedocs.io/) and use it to make sequence logos.\n", "\n", "First we need to install ``logomaker``, then load it. We can actually do both in the same cell! 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:53:59.688116Z", "start_time": "2023-06-14T16:53:58.680033Z" } }, "outputs": [], "source": [ "# Use the pip command-line tool to install Python packages from PyPI\n", "!pip install logomaker\n", "\n", "# Import the newly installed logomaker package\n", "import logomaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:54:01.442323Z", "start_time": "2023-06-14T16:54:01.416396Z" } }, "outputs": [], "source": [ "# Load CRP energy matrix in the form of a pandas dataframe\n", "crp_df = logomaker.get_example_matrix('crp_energy_matrix',\n", " print_description=False)\n", "crp_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:54:04.041841Z", "start_time": "2023-06-14T16:54:03.538473Z" } }, "outputs": [], "source": [ "# Plot CRP energy matrix as a sequence logo\n", "crp_logo = logomaker.Logo(crp_df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:54:05.871911Z", "start_time": "2023-06-14T16:54:04.636058Z" } }, "outputs": [], "source": [ "# Plot logo, but also add some style\n", "crp_logo = logomaker.Logo(-crp_df,\n", " shade_below=.5,\n", " fade_below=.5,\n", " font_name='Arial Rounded MT Bold')\n", "\n", "# style using Logo methods\n", "crp_logo.style_spines(visible=False)\n", "crp_logo.style_spines(spines=['left', 'bottom'], visible=True)\n", "crp_logo.style_xticks(rotation=90, fmt='%d', anchor=0)\n", "\n", "# style using Axes methods (raw string keeps the LaTeX backslashes intact)\n", "crp_logo.ax.set_ylabel(r\"$-\\Delta \\Delta G$ (kcal/mol)\", labelpad=-1)\n", "crp_logo.ax.xaxis.set_ticks_position('none')\n", "crp_logo.ax.xaxis.set_tick_params(pad=-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This course will include a 
variety of exercises to increase your Python skills. Note that the knowledge needed to complete each exercise will NOT necessarily have been presented or discussed. If you find yourself at sea, **try Googling your question, or asking an LLM.** This is how most programming is done." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**E1.1**: Create a markdown cell containing a bulleted list, a numbered list, and a table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2022-06-08T19:25:25.281379Z", "start_time": "2022-06-08T19:25:25.280073Z" } }, "outputs": [], "source": [ "# Write answer here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**E1.2**: Print a variety of greetings to a friend. Store each greeting in a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-14T16:56:46.124236Z", "start_time": "2023-06-14T16:56:46.121768Z" } }, "outputs": [], "source": [ "# Write answer here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { 
"delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }