{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Answers to exercises: 2_datatypes.ipynb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-08-30T20:15:56.937640Z",
     "start_time": "2022-08-30T20:15:55.850889Z"
    }
   },
   "outputs": [],
   "source": [
    "# Import the usual stuff first\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "sns.set_style('white')\n",
    "%matplotlib inline\n",
    "\n",
    "# We'll need this too\n",
    "import logomaker\n",
    "import os.path \n",
    "from scipy.signal import convolve"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This course will include a variety of exercises to increase your Python skills. Note that the knowledge needed to complete each exercise will NOT necessarily have been presented or discussed. If you find yourself at sea, **the first thing you should do is Google your question.** "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the DNA sequence of the multiple cloning site (MCS) on the plasmid [pcDNA5](https://www.addgene.org/vector-database/2132/), a popular vector for mammalian gene expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:01.723903Z",
     "start_time": "2021-09-01T12:41:01.719516Z"
    }
   },
   "outputs": [],
   "source": [
    "# Note how to define a long string over multiple lines\n",
    "mcs_seq = 'GAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATCCACTA' \\\n",
    "          'GTCCAGTGTGGTGGAATTCTGCAGATATCCAGCACAGTGGCGGCCGCTCGAGTCTAG' \\\n",
    "          'AGGGCCCGTTTAAACCCGCTGATCAGCCT'\n",
    "print(mcs_seq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.1**: Does this MCS contain a restriction site for NheI (GCTAGC)? How about for MscI (TGGCCA)? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:02.165751Z",
     "start_time": "2021-09-01T12:41:02.160983Z"
    }
   },
   "outputs": [],
   "source": [
    "# Answer\n",
    "\n",
    "site_NheI = 'GCTAGC'\n",
    "site_MscI = 'TGGCCA'\n",
    "\n",
    "print('NheI: ', site_NheI in mcs_seq)\n",
    "print('MscI: ', site_MscI in mcs_seq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.2**: Using the string method `.find()`, find the location(s) of the above restriction sites within the MCS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:02.556459Z",
     "start_time": "2021-09-01T12:41:02.551700Z"
    }
   },
   "outputs": [],
   "source": [
    "# find site\n",
    "site_start = mcs_seq.find(site_NheI)\n",
    "print(f'site starts at position {site_start}')\n",
    "\n",
    "# check\n",
    "site_stop = site_start + len(site_NheI)\n",
    "print('found site: ', mcs_seq[site_start:site_stop])\n",
    "print('NheI site : ', site_NheI)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.3**: Using the string method `.replace()`, compute the RNA sequence transcribed from the GFP gene sequence (given below). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:03.196244Z",
     "start_time": "2021-09-01T12:41:03.190753Z"
    }
   },
   "outputs": [],
   "source": [
    "gfp_seq = 'ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATG' \\\n",
    "          'TTAATGGGCACAAATTTTCTGTCAGTGGAGAGGGTGAAGGTGATGCAACATACGGAAAACTTAC' \\\n",
    "          'CCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACTTTC' \\\n",
    "          'GCGTATGGTCTTCAATGCTTTGCGAGATACCCAGATCATATGAAACAGCATGACTTTTTCAAGA' \\\n",
    "          'GTGCCATGCCCGAAGGTTATGTACAGGAAAGAACTATATTTTTCAAAGATGACGGGAACTACAA' \\\n",
    "          'GACACGTGCTGAAGTCAAGTTTGAAGGTGATACCCTTGTTAATAGAATCGAGTTAAAAGGTATT' \\\n",
    "          'GATTTTAAAGAAGATGGAAACATTCTTGGACACAAATTGGAATACAACTATAACTCACACAATG' \\\n",
    "          'TATACATCATGGCAGACAAACAAAAGAATGGAATCAAAGTTAACTTCAAAATTAGACACAACAT' \\\n",
    "          'TGAAGATGGAAGCGTTCAACTAGCAGACCATTATCAACAAAATACTCCAATTGGCGATGGCCCT' \\\n",
    "          'GTCCTTTTACCAGACAACCATTACCTGTCCACACAATCTGCCCTTTCGAAAGATCCCAACGAAA' \\\n",
    "          'AGAGAGACCACATGGTCCTTCTTGAGTTTGTAACAGCTGCTGGGATTACACATGGCATGGATGA' \\\n",
    "          'ACTATACAAATAA'\n",
    "\n",
    "# Answer here\n",
    "gfp_rna = gfp_seq.replace('T','U')\n",
    "gfp_rna"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.4**: Create a dictionary called `rc_dict` that maps DNA bases to their complementary bases. I.e., A -> T, C -> G, etc. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:03.986602Z",
     "start_time": "2021-09-01T12:41:03.979731Z"
    }
   },
   "outputs": [],
   "source": [
    "# Answer\n",
    "rc_dict = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}\n",
    "\n",
    "# For example:\n",
    "rc_dict['A']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:04.344965Z",
     "start_time": "2021-09-01T12:41:04.340501Z"
    }
   },
   "outputs": [],
   "source": [
    "# To compute the reverse complement, we need to create a 'translation table',\n",
    "# which is also a dictionary, but takes numerical ascii values as keys\n",
    "# instead of strings\n",
    "rc_table = str.maketrans(rc_dict)\n",
    "rc_table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.5**: By passing `rc_table` to the string method `.translate()`, then using indexing with a step of -1, compute the reverse complement of the MCS sequence given above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:05.387313Z",
     "start_time": "2021-09-01T12:41:05.383330Z"
    }
   },
   "outputs": [],
   "source": [
    "# Compute reverse complement\n",
    "mcs_seq_rc = mcs_seq.translate(rc_table)[::-1]\n",
    "\n",
    "# Print forward and RC sequences\n",
    "print('FW:', mcs_seq)\n",
    "print('RC:', mcs_seq_rc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**E2.6**: We have not yet discussed sets. Using Google, figure out what `set` objects are and explain what they represent. In particular, explain why Python evaluates {2,3,3} < {1,2,3} as True."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-01T12:41:06.219050Z",
     "start_time": "2021-09-01T12:41:06.215394Z"
    }
   },
   "outputs": [],
   "source": [
    "# Sets are like lists, but the elements therein don't have a specific order.\n",
    "# Moreover, each element can occur at most once. \n",
    "# So {2,3,3} and {2,3} are the same set. To see this:\n",
    "print('{2,3,3} == {2,3} is ', {2,3,3} == {2,3})\n",
    "\n",
    "# The '<' sign is interpreted as Python as 'is subset'. Because\n",
    "# {2,3} is a subset of {1,2,3}, this evaluates to true\n",
    "print('{2,3} < {1,2,3} is ', {2,3} < {1,2,3})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}