comp565_fall2023_A1.pdf
McGill University, Computer Science, COMP 565

Assignment 1
COMP 565 ML in Genomics and Healthcare

This assignment is worth 8% of your total grade and is due at midnight on September 25, 2023.

Question 1 [2%] Implementing LD score regression

For a phenotype of interest, we have collected the marginal statistics $\tilde{\beta}$ for $M = 4268$ SNPs and the $M \times M$ LD matrix $R$ (i.e., pairwise SNP-SNP Pearson correlation). The marginal statistics are based on $N = 1000$ individuals. Download the marginal statistics and LD matrix from here: https://drive.google.com/drive/folders/1tq4bTdbsv1iwO4wHxq1smzoN9D5luapp?usp=sharing

For this question, you may also assume there is no population stratification in this dataset. Both phenotype and genotype were standardized.

Implement the very basic LD score regression algorithm in a programming language of your choice (preferably Python or R) to estimate the heritability of the phenotype. What is your estimate of the heritability?

Submit your answer to this question as an iPython notebook named COMP565 A1 ldsr.ipynb or an R Markdown file named COMP565 A1 ldsr.Rmd on MyCourses, so that the TA can run your code to validate its output. Do not submit the data provided to you; just make sure your code points to a clear path for the data it reads.

Question 2 [6%] Bayesian fine-mapping

For a phenotype of interest, we have identified a GWAS locus based on $N = 498$ individuals, which harbours 100 SNPs. As shown in Figure 1, because of the extensive LD, identifying the causal SNPs based on the p-values of the z-scores alone is error prone. Because this is an assignment, I have highlighted the causal SNPs, namely rs10104559, rs1365732, and rs12676370, but of course in real-world applications we will not know them.

Figure 1: Manhattan plot for the GWAS locus to fine-map. The causal SNPs are coloured in red, although in practice we will not know which SNPs are causal.

Download the marginal z-scores and LD matrix from here: https://drive.google.com/drive/folders/1tr7BCceyIcKxiO_i6iCNjvk44HHpImgG?usp=sharing

Your task is to implement a simplified version of the FINEMAP algorithm discussed in Lecture 5. To make the task easier, you may assume there are at most 3 causal SNPs in the locus. You can divide the task into four small tasks:
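For Question 1, a minimal Python sketch of basic LD score regression might look like the following. It is not the official solution and rests on several assumptions not stated in the assignment: the chi-square statistic of SNP j is taken as N times the squared marginal effect (reasonable when phenotype and genotype are standardized), the LD score of SNP j is the sum of squared correlations in row j of R, and the expected chi-square is 1 + N h² l_j / M. The function names `ld_scores` and `ldsc_h2` are hypothetical, and loading the downloaded files is left out because their names and formats are not given here.

```python
import numpy as np


def ld_scores(R):
    """LD score of each SNP: sum of squared correlations in its row of R."""
    R = np.asarray(R, dtype=float)
    return (R ** 2).sum(axis=1)


def ldsc_h2(beta, R, n, fix_intercept=True):
    """Estimate heritability by regressing chi-square statistics on LD scores.

    Assumed model (not taken from the assignment text):
        E[chi2_j] = intercept + (n * h2 / m) * ell_j
    With no population stratification the intercept can be fixed at 1.
    Returns (h2_estimate, intercept).
    """
    beta = np.asarray(beta, dtype=float)
    m = beta.shape[0]
    chi2 = n * beta ** 2          # assumed: z_j = sqrt(n) * beta_j
    ell = ld_scores(R)

    if fix_intercept:
        # Least squares through the origin on (chi2 - 1) versus ell.
        intercept = 1.0
        slope = ell @ (chi2 - intercept) / (ell @ ell)
    else:
        # Ordinary least squares with a free intercept.
        X = np.column_stack([np.ones(m), ell])
        intercept, slope = np.linalg.lstsq(X, chi2, rcond=None)[0]

    return slope * m / n, intercept
```

With the assignment data loaded into arrays `beta` (length M) and `R` (M by M), a call such as `ldsc_h2(beta, R, n=1000)` would return the heritability estimate under these assumptions. Fixing the intercept at 1 reflects the stated absence of population stratification; setting `fix_intercept=False` lets you check how far the fitted intercept departs from 1.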
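For Question 2, the overall shape of a simplified FINEMAP-style computation can be sketched in Python as below. The preview does not show the four tasks or the formulas from Lecture 5, so everything specific here is an assumption: the Bayes factor compares two zero-mean Gaussians for the z-scores of the candidate causal SNPs (covariance R_CC + N s² R_CC R_CC against R_CC), the prior effect variance `s2 = 0.005` is a placeholder default, each SNP is causal a priori with probability 1/m, and the empty configuration is excluded. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def log_mvn_zero_mean(x, cov):
    """Log density of a zero-mean multivariate normal at x."""
    k = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)


def log_bayes_factor(z, R, causal, n, s2=0.005):
    """Assumed log Bayes factor of a causal configuration versus the null."""
    idx = list(causal)
    z_c = z[idx]
    R_cc = R[np.ix_(idx, idx)]
    cov_alt = R_cc + (n * s2) * (R_cc @ R_cc)
    return log_mvn_zero_mean(z_c, cov_alt) - log_mvn_zero_mean(z_c, R_cc)


def log_prior(k, m):
    """Assumed prior: each of m SNPs is causal independently with prob 1/m."""
    return k * np.log(1.0 / m) + (m - k) * np.log((m - 1.0) / m)


def finemap_pips(z, R, n, max_causal=3, s2=0.005):
    """Enumerate configurations with 1..max_causal SNPs and compute PIPs.

    Returns (pip, configs, posterior), where posterior[i] is the normalized
    posterior probability of configs[i].
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    m = z.shape[0]

    configs, log_post = [], []
    for k in range(1, max_causal + 1):
        for c in combinations(range(m), k):
            configs.append(c)
            log_post.append(log_bayes_factor(z, R, c, n, s2) + log_prior(k, m))

    # Normalize in log space to avoid overflow.
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # PIP of a SNP: total posterior of configurations that contain it.
    pip = np.zeros(m)
    for c, p in zip(configs, post):
        pip[list(c)] += p
    return pip, configs, post
```

For 100 SNPs and at most 3 causal SNPs this enumerates 100 + 4950 + 161700 configurations, which is feasible by brute force. If two SNPs in a configuration are in perfect LD, R_CC becomes singular and the solve will fail, so a real implementation would need to handle that case.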