Email from Elaine on 4-25-2022 yixinhu@cshl.edu The Aims for data analyzation would be: 1) the mapping coverage statistics for 2x8 samples in trial CE8 ss and dp (for your reference, I attached the one you did for the previous trial CE4). 2) the moving average of sample .2 over .1; .4 over .3; .6 over .5; .8 over .7. and generate the bed file for loading onto UCSC genome browser. 3) checking the p-value by the random replication origin simulation. For 2) and 3), I attached the command lines you gave me previously for your reference. I also attached the 95 well-specific barcode for your reference in case you need it. Elaine had data on a hard drive. She copied the data to Zihua's hard drive which he copied to /mnt/wigstore3/user/zwang/sequencingAnalysis/2021-12-06_SC-seq-data_Elaine_NextSeq programs in /mnt/wigstore3/user/kendall/stillman05 output files in /mnt/wigtop2/data/safe/kendall/stillman05 - make guide table nal3.tai1.guide01.txt - trim barcodes - barcode.trim.gzip.pe01.py - process.barcode.trim01.py - map to sc3 using hisat2 - hisat2_sc3.template01.txt - process.hisat2_sc3.01.py - There is a problem. Samples 4 and 8 in both dp and ss have 2M reads while other samples have 8M reads. It appears a large number of molecules still have the primer starting with ATCAA at the start of the read for either or both read 1 and read 2. - add varietal tag info to bam file - add.varietal.tag.template01.hisat2_sc3.bypair.txt - add.varietal.tag01.hisat2_sc3.bypair.py - process.add.varietal.tag01.hisat2_sc3.bypair.py - extract mappings - process.extract.mappings01.py - /mnt/wigstore3/user/kendall/stillman05/process.extract.mappings02.py # changed this to output in chrom, pos, tag order so I can remove duplicate tags later DOH this wasn't needed. The next program mappings to bed removes dups. DOUBLE DOH. I'm using these mappings files to make bin boundaries and bin counting. I need these in this better order to skip duplicates in those programs. - get coverage stats # get total genome length cd /mnt/wigtop2/data/safe/kendall/stillman05 samtools idxstats ./dp/CHY30001.sc3.hisat2_sorted.vt.bypair.bam | awk 'BEGIN {a=0;} {a+=$2;} END {print a;}' EXAMPLE HISAT2 REPORT 8588906 reads; of these: 8588906 (100.00%) were paired; of these: 886778 (10.32%) aligned concordantly 0 times 6651900 (77.45%) aligned concordantly exactly 1 time 1050228 (12.23%) aligned concordantly >1 times ---- 886778 pairs aligned concordantly 0 times; of these: 208758 (23.54%) aligned discordantly 1 time ---- 678020 pairs aligned 0 times concordantly or discordantly; of these: 1356040 mates make up the pairs; of these: 698664 (51.52%) aligned 0 times 513017 (37.83%) aligned exactly 1 time 144359 (10.65%) aligned >1 times 95.93% overall alignment rate # cut and paste these results into spreadsheet cd /mnt/wigtop2/data/safe/kendall/stillman05/dp for i in {30001..30008} ; do ( cat CHY$i.sc3.hisat2.stderr | grep reads | tr -s ' ' | cut -d ' ' -f 1; ) ; done for i in {30001..30008} ; do ( cat CHY$i.sc3.hisat2.stderr | grep overall | tr -s ' ' | cut -d ' ' -f 1; ) ; done for i in {30001..30008} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'concordantly exactly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done for i in {30001..30008} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'discordantly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done for i in {30001..30008} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'aligned exactly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done # coverage for i in {30001..30008} ; do ( samtools stats CHY$i.sc3.hisat2_sorted.vt.bypair.bam | grep ^COV | awk 'BEGIN {a=0;} {a+=$4;} END {print a;}'; ) ; done # total reads in fastq files for file in /mnt/wigstore3/user/zwang/sequencingAnalysis/2021-12-06_SC-seq-data_Elaine_NextSeq/DoublePurification/*/*r1.fastq.gz ; do ( gunzip -c "$file" | wc -l ; ) ; done cd /mnt/wigtop2/data/safe/kendall/stillman05/ss for i in {30009..30016} ; do ( cat CHY$i.sc3.hisat2.stderr | grep reads | tr -s ' ' | cut -d ' ' -f 1; ) ; done for i in {30009..30016} ; do ( cat CHY$i.sc3.hisat2.stderr | grep overall | tr -s ' ' | cut -d ' ' -f 1; ) ; done for i in {30009..30016} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'concordantly exactly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done for i in {30009..30016} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'discordantly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done for i in {30009..30016} ; do ( cat CHY$i.sc3.hisat2.stderr | grep 'aligned exactly 1 time' | tr -s ' ' | cut -d ' ' -f 2; ) ; done # coverage for i in {30009..30016} ; do ( samtools stats CHY$i.sc3.hisat2_sorted.vt.bypair.bam | grep ^COV | awk 'BEGIN {a=0;} {a+=$4;} END {print a;}'; ) ; done # total reads in fastq files for file in /mnt/wigstore3/user/zwang/sequencingAnalysis/2021-12-06_SC-seq-data_Elaine_NextSeq/SizeSelection/*/*r1.fastq.gz ; do ( gunzip -c "$file" | wc -l ; ) ; done cd /mnt/wigtop2/data/safe/kendall/stillman05 for i in {30001..30016} ; do ( cat CHY$i.stillman05.mappings.txt | wc -l ; ) ; done for i in {30001..30016} ; do ( cat CHY$i.stillman05.mappings.txt | grep ATCAA | wc -l ; ) ; done # MAKE BED FILES FROM MAPPINGS, this removes duplicates by position and tag python /mnt/wigstore3/user/kendall/stillman05/process.mappings.to.bed01.py tar -czvf stillman05.bed.tar.gz *.bed # reads in origins compared to randomized reads python /mnt/wigstore3/user/kendall/stillman05/process.reads.in.origins01.py ###!!!!! REDO THIS PRETENDING chrXII:430000-500000 doesn't exist for reads and for origins 5-18-22 python /mnt/wigstore3/user/kendall/stillman05/process.reads.in.origins02.py # cumsum python /mnt/wigstore3/user/kendall/stillman05/process.sc3.cumsum.py # ratios cd /mnt/wigtop2/data/safe/kendall/stillman05 python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 2-HU90-50pg-dp 1-G1-50pg-dp 10 > 2_over_1.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 4-HU90-1ng-dp 3-G1-1ng-dp 10 > 4_over_3.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 6-HU90-50pg-dp 5-G1-50pg-dp 10 > 6_over_5.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 8-HU90-1ng-dp 7-G1-1ng-dp 10 > 8_over_7.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 2-HU90-50pg-ss 1-G1-50pg-ss 10 > 2_over_1.ss.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 4-HU90-1ng-ss 3-G1-1ng-ss 10 > 4_over_3.ss.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 6-HU90-50pg-ss 5-G1-50pg-ss 10 > 6_over_5.ss.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 8-HU90-1ng-ss 7-G1-1ng-ss 10 > 8_over_7.ss.1k.moving.avg.txt cat 2_over_1.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "dp 2_over_1 ge 1.8" > 2_over_1.dp.1k.moving.avg.1_8.bed cat 4_over_3.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "dp 4_over_3 ge 1.8" > 4_over_3.dp.1k.moving.avg.1_8.bed cat 6_over_5.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "dp 6_over_5 ge 1.8" > 6_over_5.dp.1k.moving.avg.1_8.bed cat 8_over_7.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "dp 8_over_7 ge 1.8" > 8_over_7.dp.1k.moving.avg.1_8.bed cat 2_over_1.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "ss 2_over_1 ge 1.8" > 2_over_1.ss.1k.moving.avg.1_8.bed cat 4_over_3.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "ss 4_over_3 ge 1.8" > 4_over_3.ss.1k.moving.avg.1_8.bed cat 6_over_5.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "ss 6_over_5 ge 1.8" > 6_over_5.ss.1k.moving.avg.1_8.bed cat 8_over_7.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.8 "ss 8_over_7 ge 1.8" > 8_over_7.ss.1k.moving.avg.1_8.bed cat 2_over_1.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "dp 2_over_1 ge 1.5" > 2_over_1.dp.1k.moving.avg.1_5.bed cat 4_over_3.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "dp 4_over_3 ge 1.5" > 4_over_3.dp.1k.moving.avg.1_5.bed cat 6_over_5.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "dp 6_over_5 ge 1.5" > 6_over_5.dp.1k.moving.avg.1_5.bed cat 8_over_7.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "dp 8_over_7 ge 1.5" > 8_over_7.dp.1k.moving.avg.1_5.bed cat 2_over_1.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "ss 2_over_1 ge 1.5" > 2_over_1.ss.1k.moving.avg.1_5.bed cat 4_over_3.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "ss 4_over_3 ge 1.5" > 4_over_3.ss.1k.moving.avg.1_5.bed cat 6_over_5.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "ss 6_over_5 ge 1.5" > 6_over_5.ss.1k.moving.avg.1_5.bed cat 8_over_7.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 1.5 "ss 8_over_7 ge 1.5" > 8_over_7.ss.1k.moving.avg.1_5.bed cat 2_over_1.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "dp 2_over_1 ge 2.0" > 2_over_1.dp.1k.moving.avg.2_0.bed cat 4_over_3.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "dp 4_over_3 ge 2.0" > 4_over_3.dp.1k.moving.avg.2_0.bed cat 6_over_5.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "dp 6_over_5 ge 2.0" > 6_over_5.dp.1k.moving.avg.2_0.bed cat 8_over_7.dp.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "dp 8_over_7 ge 2.0" > 8_over_7.dp.1k.moving.avg.2_0.bed cat 2_over_1.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "ss 2_over_1 ge 2.0" > 2_over_1.ss.1k.moving.avg.2_0.bed cat 4_over_3.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "ss 4_over_3 ge 2.0" > 4_over_3.ss.1k.moving.avg.2_0.bed cat 6_over_5.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "ss 6_over_5 ge 2.0" > 6_over_5.ss.1k.moving.avg.2_0.bed cat 8_over_7.ss.1k.moving.avg.txt | python /mnt/wigstore3/user/kendall/pythonpgms/hu.moving.avg.to.bed.py 2.0 "ss 8_over_7 ge 2.0" > 8_over_7.ss.1k.moving.avg.2_0.bed tar -czvf stillman05.moving.avg.ratios.tar.gz *moving*txt *moving*bed # redo bed files and send those tar -czvf stillman05.moving.avg.ratios.bed.tar.gz *moving*bed # make a moving average of bin counts python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 2-HU90-50pg-dp 1-G1-50pg-dp 10 > 2_over_1.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 4-HU90-1ng-dp 3-G1-1ng-dp 10 > 4_over_3.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 6-HU90-50pg-dp 5-G1-50pg-dp 10 > 6_over_5.dp.1k.moving.avg.txt python /mnt/wigstore3/user/kendall/stillman05/cumsum.ratio01.py 8-HU90-1ng-dp 7-G1-1ng-dp 10 > 8_over_7.dp.1k.moving.avg.txt cd /mnt/wigtop2/data/safe/kendall/stillman05 # 1k window moving average python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 1-G1-50pg-dp 10 > 1-G1-50pg-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 2-HU90-50pg-dp 10 > 2-HU90-50pg-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 3-G1-1ng-dp 10 > 3-G1-1ng-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 4-HU90-1ng-dp 10 > 4-HU90-1ng-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 5-G1-50pg-dp 10 > 5-G1-50pg-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 6-HU90-50pg-dp 10 > 6-HU90-50pg-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 7-G1-1ng-dp 10 > 7-G1-1ng-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 8-HU90-1ng-dp 10 > 8-HU90-1ng-dp.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 1-G1-50pg-ss 10 > 1-G1-50pg-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 2-HU90-50pg-ss 10 > 2-HU90-50pg-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 3-G1-1ng-ss 10 > 3-G1-1ng-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 4-HU90-1ng-ss 10 > 4-HU90-1ng-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 5-G1-50pg-ss 10 > 5-G1-50pg-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 6-HU90-50pg-ss 10 > 6-HU90-50pg-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 7-G1-1ng-ss 10 > 7-G1-1ng-ss.1k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 8-HU90-1ng-ss 10 > 8-HU90-1ng-ss.1k.moving.avg.bincount.normalized.bed # 10k window moving average python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 1-G1-50pg-dp 100 > 1-G1-50pg-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 2-HU90-50pg-dp 100 > 2-HU90-50pg-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 3-G1-1ng-dp 100 > 3-G1-1ng-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 4-HU90-1ng-dp 100 > 4-HU90-1ng-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 5-G1-50pg-dp 100 > 5-G1-50pg-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 6-HU90-50pg-dp 100 > 6-HU90-50pg-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 7-G1-1ng-dp 100 > 7-G1-1ng-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 8-HU90-1ng-dp 100 > 8-HU90-1ng-dp.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 1-G1-50pg-ss 100 > 1-G1-50pg-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 2-HU90-50pg-ss 100 > 2-HU90-50pg-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 3-G1-1ng-ss 100 > 3-G1-1ng-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 4-HU90-1ng-ss 100 > 4-HU90-1ng-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 5-G1-50pg-ss 100 > 5-G1-50pg-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 6-HU90-50pg-ss 100 > 6-HU90-50pg-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 7-G1-1ng-ss 100 > 7-G1-1ng-ss.10k.moving.avg.bincount.normalized.bed python /mnt/wigstore3/user/kendall/stillman05/bincount.moving.average.normalized01.py 8-HU90-1ng-ss 100 > 8-HU90-1ng-ss.10k.moving.avg.bincount.normalized.bed tar -czvf stillman05.moving.avg.bincount.normalized.bed.tar.gz *10k.moving*bincount*bed #### make some plots as requested in email from elaine 5-7-22 1-G1-50pg-dp.stillman05.cumsum.100.txt 4-HU90-1ng-dp.stillman05.cumsum.100.txt 7-G1-1ng-dp.stillman05.cumsum.100.txt 1-G1-50pg-ss.stillman05.cumsum.100.txt 4-HU90-1ng-ss.stillman05.cumsum.100.txt 7-G1-1ng-ss.stillman05.cumsum.100.txt 2-HU90-50pg-dp.stillman05.cumsum.100.txt 5-G1-50pg-dp.stillman05.cumsum.100.txt 8-HU90-1ng-dp.stillman05.cumsum.100.txt 2-HU90-50pg-ss.stillman05.cumsum.100.txt 5-G1-50pg-ss.stillman05.cumsum.100.txt 8-HU90-1ng-ss.stillman05.cumsum.100.txt 3-G1-1ng-dp.stillman05.cumsum.100.txt 6-HU90-50pg-dp.stillman05.cumsum.100.txt 3-G1-1ng-ss.stillman05.cumsum.100.txt 6-HU90-50pg-ss.stillman05.cumsum.100.txt g1 <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/1-G1-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) hu90 <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/2-HU90-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) names(g1) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") names(hu90) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") bin.lag <- 50 ratios <- (hu90[(1 + bin.lag):nrow(hu90), "bin.cumsum"] - hu90[1:(nrow(hu90) - bin.lag), "bin.cumsum"] + 1) / (g1[(1 + bin.lag):nrow(g1), "bin.cumsum"] - g1[1:(nrow(g1) - bin.lag), "bin.cumsum"] + 1) g1x <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/1-G1-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) g2x <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/5-G1-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) names(g1x) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") names(g2x) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") bin.lag <- 50 ratios2 <- (g2x[(1 + bin.lag):nrow(g2x), "bin.cumsum"] - g2x[1:(nrow(g2x) - bin.lag), "bin.cumsum"] + 1) / (g1x[(1 + bin.lag):nrow(g1x), "bin.cumsum"] - g1x[1:(nrow(g1x) - bin.lag), "bin.cumsum"] + 1) png("/mnt/wigstore3/user/kendall/stillman05/2hu90_1g1_dp.5g1_1g1_dp.whole_genome.png", height=700, width=1200) plot(ratios/mean(ratios), log="y", ylim=c(0.25, 4), main="black = 2hu90/1g1 dp; red = 5g1/1g1 dp") points(ratios2/mean(ratios2), col="red") dev.off() plot.genome.ratios <- function (hu90, g1, g1x, bin.lag, hu90.name, g1.name, g1x.name) { ratios <- (hu90[(1 + bin.lag):nrow(hu90), "bin.cumsum"] - hu90[1:(nrow(hu90) - bin.lag), "bin.cumsum"] + 1) / (g1[(1 + bin.lag):nrow(g1), "bin.cumsum"] - g1[1:(nrow(g1) - bin.lag), "bin.cumsum"] + 1) ratios2 <- (g1x[(1 + bin.lag):nrow(g1x), "bin.cumsum"] - g1x[1:(nrow(g1x) - bin.lag), "bin.cumsum"] + 1) / (g1[(1 + bin.lag):nrow(g1), "bin.cumsum"] - g1[1:(nrow(g1) - bin.lag), "bin.cumsum"] + 1) png(paste("/mnt/wigstore3/user/kendall/stillman05/", hu90.name, ".x.", g1.name, ".vs.", g1x.name, ".x.", g1.name, ".l.", bin.lag, ".whole_genome.png", sep=""), height=700, width=1200) plot(ratios/mean(ratios), log="y", ylim=c(0.25, 4), main=paste("black = ", hu90.name, "/", g1.name, "; red = ", g1x.name, "/", g1.name, "; lag = ", bin.lag, sep="")) points(ratios2/mean(ratios2), col="red") dev.off() } g1 <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/1-G1-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) hu90 <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/2-HU90-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) g1x <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/5-G1-50pg-dp.stillman05.cumsum.100.txt", header=F, as.is=T, stringsAsFactors=F) names(g1x) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") names(g1) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") names(hu90) <- c("chrom", "bin.start", "bin.cumsum", "bin.order") hu90.name <- "2-HU90-dp" g1.name <- "1-G1-dp" g1x.name <- "5-G1-dp" plot.genome.ratios(hu90, g1, g1x, 10, hu90.name, g1.name, g1x.name) plot.genome.ratios(hu90, g1, g1x, 50, hu90.name, g1.name, g1x.name) plot.genome.ratios(hu90, g1, g1x, 100, hu90.name, g1.name, g1x.name) ##### make bin boundaries cat stillman05.mappings.sorted.txt | grep G1 > stillman05.mappings.sorted.G1.txt # NOTE: I changed this program after I ran this to take the input filename as a parameter, rerun commands below python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 12000 > sc3.bin.boundaries.12k.txt kendall@wigtop2:/data/safe/kendall/stillman05$ cat stillman05.mappings.sorted.G1.txt | awk '{if ($2 == "chrI" && $3 == 162282) print;}' | wc -l 46111 cat stillman05.mappings.sorted.G1.txt | awk '{if ($2 == "chrI" && $3 == 162282) print;}' | wc -l 46111 1 I 160698 162282 160698 162282 1584 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162282 162282 162282 0 1 I 162282 162776 162282 162776 494 cd /mnt/wigtop2/data/safe/kendall/stillman05 python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 12000 /mnt/wigtop2/data/safe/kendall/stillman05/stillman05.mappings.sorted.G1.txt > sc3.bin.boundaries.12k.all_samples.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 12000 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30003.stillman05.mappings.txt > sc3.bin.boundaries.12k.3-G1-dp.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 12000 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30011.stillman05.mappings.txt > sc3.bin.boundaries.12k.3-G1-ss.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 1200 /mnt/wigtop2/data/safe/kendall/stillman05/stillman05.mappings.sorted.G1.txt > sc3.bin.boundaries.1_2k.all_samples.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 1200 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30003.stillman05.mappings.txt > sc3.bin.boundaries.1_2k.3-G1-dp.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 1200 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30011.stillman05.mappings.txt > sc3.bin.boundaries.1_2k.3-G1-ss.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 300 /mnt/wigtop2/data/safe/kendall/stillman05/stillman05.mappings.sorted.G1.txt > sc3.bin.boundaries.0_3k.all_samples.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 300 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30003.stillman05.mappings.txt > sc3.bin.boundaries.0_3k.3-G1-dp.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 300 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30011.stillman05.mappings.txt > sc3.bin.boundaries.0_3k.3-G1-ss.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 600 /mnt/wigtop2/data/safe/kendall/stillman05/stillman05.mappings.sorted.G1.txt > sc3.bin.boundaries.0_6k.all_samples.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 600 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30003.stillman05.mappings.txt > sc3.bin.boundaries.0_6k.3-G1-dp.txt python3 /mnt/wigstore3/user/kendall/pythonpgms/sc3.bin.boundaries01.py 600 /mnt/wigtop2/data/safe/kendall/stillman05/CHY30011.stillman05.mappings.txt > sc3.bin.boundaries.0_6k.3-G1-ss.txt ##### MAKE VARBIN FILES python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.varbin.1_2k.py python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.varbin.0_6k.py python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.varbin.0_3k.py ##### MAKE VARBIN PLOTS # determine bad bins by running sort -k 7,7n sc3.bin.boundaries.1_2k.3-G1-dp.txt | less sort -k 7,7n sc3.bin.boundaries.0_6k.3-G1-dp.txt | less sort -k 7,7n sc3.bin.boundaries.0_3k.3-G1-dp.txt | less # and look to see a good cutoff to remove most of the chrXII bad bins cutoffs 1600, 3000, 20000 # make uber python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.make.varbin.uber.1_2k.py > stillman05.varbin.uber.1_2k.txt python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.make.varbin.uber.0_6k.py > stillman05.varbin.uber.0_6k.txt python /mnt/wigstore3/user/kendall/stillman05/stillman05.process.make.varbin.uber.0_3k.py > stillman05.varbin.uber.0_3k.txt # make plots in R \\wigstore3\main\user\kendall\stillman05\plot.uber.commands01.txt # then make a slideshow in powerpoint ##### plot histograms of origins simulations x <- read.table("/mnt/wigtop2/data/safe/kendall/stillman05/8-HU90-1ng-dp.stillman05.origin.counts02.txt", header=F, as.is=T, stringsAsFactors=F, nrows=1001) png("/mnt/wigstore3/user/kendall/stillman05/8-HU90-1ng-dp.origins.histogram01.png", height=700, width=700) plot(hist(x[2:1001, 1]), main="8-HU90-1ng-dp reads in origins randomization", xlim=c(min(x[, 1]), max(x[, 1]))) abline(v=x[1, 1], col="red") dev.off() filelist <- list.files(path="/mnt/wigtop2/data/safe/kendall/stillman05", pattern="*.stillman05.origin.counts02.txt") for (i in 1:length(filelist)) { thisSample <- strsplit(filelist[i], "\\.")[[1]][1] x <- read.table(paste("/mnt/wigtop2/data/safe/kendall/stillman05/", filelist[i], sep=""), header=F, as.is=T, stringsAsFactors=F, nrows=1001) png(paste("/mnt/wigstore3/user/kendall/stillman05/", "zz", i, ".", thisSample, ".origins.histogram01.png", sep=""), height=700, width=700) plot(hist(x[2:1001, 1], breaks=16), main=paste(thisSample, " reads in origins randomization", sep=""), xlim=c(min(x[, 1]) - 1000, max(x[, 1]) + 1000), xlab="reads in origins") abline(v=x[1, 1], col="red") dev.off() }