This site includes the Project MinE Variant Browser, based on >6,400 whole genomes from individuals of different European ancestries, and GWAS summary statistics as reported in Nature Genetics. To download the summary statistics, please see the 'Publications' tab. We have a data-sharing policy to provide a simple and transparent framework for the broad use of Project MinE data. If you would like to request individual-level array data, please see the 'Request Data' tab. The website also includes burden test results for >15,000 genes.

Thank you for your interest in our mailing list.

Thank you for submitting your email address. A confirmation has been sent to the address you supplied.

Transcript Details

Coverage

Burden Testing

#### Introduction

Genetic association testing has typically relied on association tests at single variants (i.e., genome-wide association studies (GWAS) of SNPs). GWAS typically use linear or logistic regression models to test SNPs for association with disease. When testing rare variation for association with disease, however, statistical power to detect association decreases rapidly (because the data contain few copies of each rare allele). Therefore, when testing rare single variants, the sample sizes needed to achieve sufficient power are on the order of tens or hundreds of thousands. An alternative approach is to combine information across multiple variant sites (typically referred to as genic ‘burden’ testing). Aggregating information across multiple sites can improve power to detect association signals (by summing the burden of variants contained in a particular region such as a gene) and reduce the multiple testing correction. Below is a brief explanation of various aggregation tests with references for more in-depth information.
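As a rough illustration of both points, the Python sketch below computes the expected number of minor-allele copies at different allele frequencies and compares Bonferroni thresholds for per-variant and per-gene testing. The cohort size of 5,000 and the figure of 1,000,000 variants are assumptions chosen purely for illustration; the 15,000 genes figure follows the approximate number of genes with burden results on this site.

```python
def expected_minor_alleles(n_individuals: int, maf: float) -> float:
    """Expected number of minor-allele copies among diploid individuals."""
    return 2 * n_individuals * maf


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical cohort size, chosen for illustration only.
n = 5000
for maf in (0.05, 0.01, 0.001):
    copies = expected_minor_alleles(n, maf)
    print(f"MAF {maf}: about {copies:.0f} minor-allele copies expected")

# One test per gene versus one test per variant
# (1,000,000 variants is an assumed, illustrative number).
print("per-gene threshold:   ", bonferroni_threshold(0.05, 15_000))
print("per-variant threshold:", bonferroni_threshold(0.05, 1_000_000))
```

The rarer the variant, the fewer observations carry any information about it, while testing per gene instead of per variant makes the significance threshold less stringent.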

#### T1

T1 is an application of the Combined Multivariate and Collapsing (CMC) method. A CMC method combines (a) collapsing variants and (b) performing a multivariate test. Variants are divided into subgroups based on specific criteria, such as minor allele frequency (MAF), and the variants within each subgroup are collapsed. A multivariate test is then performed across the subgroups. The T1 name comes from using a MAF threshold of 1% (i.e., all variants included in the test have MAF <1%). For further reading please see Li and Leal, Methods for Detecting Associations with Rare Variants for Common Diseases: Application to Analysis of Sequence Data, AJHG 2008.
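The collapsing step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results on this site: it assumes a single rare-variant subgroup, uses a toy genotype matrix with hypothetical allele frequencies, and substitutes a two-sided Fisher's exact test on carrier status for the multivariate test of the full CMC method. All function and variable names are invented for the example.

```python
from math import comb


def collapse_rare(genotypes, mafs, threshold=0.01):
    """Return 1 for each individual carrying a minor allele at any variant with MAF < threshold."""
    rare = [j for j, maf in enumerate(mafs) if maf < threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Toy data: rows are individuals, columns are variants, entries are minor-allele counts.
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 0, 2], [0, 0, 1],  # cases
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1],  # controls
]
mafs = [0.004, 0.008, 0.20]         # hypothetical population allele frequencies
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control

carrier = collapse_rare(genotypes, mafs, threshold=0.01)
a = sum(c == 1 and y == 1 for c, y in zip(carrier, phenotype))  # case carriers
b = sum(c == 0 and y == 1 for c, y in zip(carrier, phenotype))  # case non-carriers
c_ = sum(c == 1 and y == 0 for c, y in zip(carrier, phenotype))  # control carriers
d = sum(c == 0 and y == 0 for c, y in zip(carrier, phenotype))  # control non-carriers
p_value = fisher_exact_two_sided(a, b, c_, d)
print(carrier, (a, b, c_, d), p_value)
```

Because the MAF cutoff is a parameter, the same function covers other thresholds: calling `collapse_rare(genotypes, mafs, threshold=0.05)` collapses all variants with MAF below 5%.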

#### T5

Identical to T1, except that the upper MAF threshold is set at 5%.

#### VT

The Variable Threshold (VT) approach assumes that there exists some threshold T such that variants with a MAF below T are substantially more likely to be functional than variants with a MAF above T. A test (and the corresponding z-score) is performed for multiple choices of T. The maximum z-score over all values of T is then evaluated by permuting the phenotype (to control type I error and correct for testing multiple thresholds). The approach thus tries to maximally differentiate between cases and controls by choosing the optimal MAF threshold. For further reading please see Price et al., Pooled Association Tests for Rare Variants in Exon-Resequencing Studies, AJHG 2010.
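The procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the z-score is a basic standardized burden score rather than the exact statistic of Price et al., the test is one-sided (excess burden in cases), the toy data and allele frequencies are hypothetical, and all names are invented for the example. Note that the maximum over thresholds is recomputed in every permutation, which is what accounts for having tried several thresholds.

```python
import random
from math import sqrt


def burden(genotypes, mafs, threshold):
    """Per-individual count of minor alleles at variants with MAF <= threshold."""
    keep = [j for j, maf in enumerate(mafs) if maf <= threshold]
    return [sum(row[j] for j in keep) for row in genotypes]


def z_score(burdens, phenotype):
    """Simplified score: sum of burden times centred phenotype, scaled by burden size."""
    mean_y = sum(phenotype) / len(phenotype)
    denom = sqrt(sum(b * b for b in burdens))
    if denom == 0:
        return 0.0
    return sum(b * (y - mean_y) for b, y in zip(burdens, phenotype)) / denom


def max_z(genotypes, mafs, phenotype):
    """Return (largest z-score, threshold that produced it) over all candidate thresholds."""
    return max(
        (z_score(burden(genotypes, mafs, t), phenotype), t)
        for t in sorted(set(mafs))
    )


def vt_test(genotypes, mafs, phenotype, n_perm=1000, seed=0):
    """Permutation p-value for the maximum z-score over thresholds."""
    rng = random.Random(seed)
    observed, best_t = max_z(genotypes, mafs, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Re-optimise the threshold for every permuted phenotype.
        if max_z(genotypes, mafs, shuffled)[0] >= observed:
            hits += 1
    return observed, best_t, (hits + 1) / (n_perm + 1)


# Toy data: rows are individuals, columns are variants, entries are minor-allele counts.
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 0, 2], [0, 0, 1],  # cases
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1],  # controls
]
mafs = [0.004, 0.008, 0.20]         # hypothetical population allele frequencies
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control

observed, best_t, p_value = vt_test(genotypes, mafs, phenotype, n_perm=200)
print(observed, best_t, p_value)
```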

#### MB

The Madsen-Browning approach is a weighted-sum test in which each variant is weighted according to its frequency in unaffected individuals. Rarer variants receive larger weights than common variants. The weight for each variant j is calculated as

$$W_j= 1 / {\sqrt{{MAF}_j * (1 - {MAF}_j)}}$$

It then uses a Wilcoxon rank-sum test and calculates p-values by permutation. For further reading please see Madsen and Browning, A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic, PLoS Genetics 2009.
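A minimal Python sketch of these steps is given below. It is an illustration under assumptions rather than a reference implementation: the allele frequency is estimated in controls with an add-one pseudocount (so that the weight stays finite when a variant is absent from controls), the p-value is a simple one-sided empirical permutation p-value, the toy data are hypothetical, and all names are invented for the example.

```python
import random
from math import sqrt


def mb_weights(genotypes, phenotype):
    """Weight 1/sqrt(q(1-q)) per variant, with q estimated in unaffected individuals."""
    controls = [row for row, y in zip(genotypes, phenotype) if y == 0]
    weights = []
    for j in range(len(genotypes[0])):
        minor = sum(row[j] for row in controls)
        # Pseudocount (an assumption of this sketch) keeps q strictly between 0 and 1.
        q = (minor + 1) / (2 * len(controls) + 2)
        weights.append(1 / sqrt(q * (1 - q)))
    return weights


def rank_sum(scores, phenotype):
    """Sum of the tie-averaged ranks of the affected individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # average 1-based rank of the tie group
        start = end + 1
    return sum(r for r, y in zip(ranks, phenotype) if y == 1)


def mb_statistic(genotypes, phenotype):
    """Weighted-sum score per individual, summarised as the rank sum of the cases."""
    w = mb_weights(genotypes, phenotype)
    scores = [sum(wj * g for wj, g in zip(w, row)) for row in genotypes]
    return rank_sum(scores, phenotype)


def mb_test(genotypes, phenotype, n_perm=1000, seed=0):
    """One-sided permutation p-value; weights are recomputed for each permuted phenotype."""
    rng = random.Random(seed)
    observed = mb_statistic(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mb_statistic(genotypes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: rows are individuals, columns are variants, entries are minor-allele counts.
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 0, 2], [0, 0, 1],  # cases
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1],  # controls
]
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control

observed, p_value = mb_test(genotypes, phenotype, n_perm=200)
print(observed, p_value)
```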

A limitation of T1, T5, VT and MB is that they implicitly assume that all rare variants influence the phenotype in the same direction. However, most variants in a region might have little or no effect on the phenotype or may even be protective. Collapsing across all variants thus introduces noise and is likely to result in loss of statistical power. To solve this issue, tests like SKAT and SKAT-O were developed. These tests of course have drawbacks of their own. To choose the optimal test one needs to make assumptions about the directions of effect of the underlying variants (i.e., the underlying genetic architecture of disease at the gene/locus).

#### SKAT

SKAT uses a multiple regression model to directly regress the phenotype on genetic variants in a region and on covariates, and so allows different variants to have different directions and magnitudes of effect, including no effect. SKAT also allows for epistatic effects (SNP-SNP interactions). Strictly speaking, SKAT and SKAT-O are “non-burden” tests: instead of aggregating variants, SKAT aggregates the score test statistics of the individual variants, with weights, when SNP effects are modeled linearly. SKAT is preferred when it is hypothesized that a region has both protective and deleterious variants or many non-causal variants. When it is hypothesized that a region contains predominantly true causal variants with effects in the same direction, burden tests have more power. For further reading please see Wu et al., Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test, AJHG 2011 and Ionita-Laza et al., Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants, AJHG 2013.
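The difference between aggregating variants and aggregating per-variant score statistics can be illustrated with a small Python sketch. It makes several simplifying assumptions: there are no covariates (the phenotype is simply centred on its mean), the weights are an unnormalised Beta(1, 25) density of the MAF, the data and allele frequencies are hypothetical, and no p-value is computed (SKAT derives p-values from a mixture of chi-square distributions, which is omitted here). All names are invented for the example.

```python
def beta_weight(maf, a=1, b=25):
    """Unnormalised Beta(a, b) density at the MAF; gives rarer variants larger weights."""
    return maf ** (a - 1) * (1 - maf) ** (b - 1)


def variant_scores(genotypes, phenotype):
    """Score S_j = sum_i G_ij * (y_i - mean(y)) for each variant j (no covariates)."""
    mean_y = sum(phenotype) / len(phenotype)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * (y - mean_y) for row, y in zip(genotypes, phenotype))
        for j in range(n_variants)
    ]


def skat_q(genotypes, mafs, phenotype):
    """SKAT-style statistic: sum of squared weighted per-variant scores."""
    scores = variant_scores(genotypes, phenotype)
    return sum((beta_weight(m) * s) ** 2 for m, s in zip(mafs, scores))


def burden_q(genotypes, mafs, phenotype):
    """Burden-style statistic: square of the summed weighted per-variant scores."""
    scores = variant_scores(genotypes, phenotype)
    return sum(beta_weight(m) * s for m, s in zip(mafs, scores)) ** 2


# Toy data: variant 0 occurs only in cases, variant 1 only in controls.
genotypes = [
    [1, 0], [1, 0], [0, 0], [0, 0],  # cases
    [0, 1], [0, 1], [0, 0], [0, 0],  # controls
]
mafs = [0.01, 0.01]                  # hypothetical population allele frequencies
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = control

print("burden:", burden_q(genotypes, mafs, phenotype))
print("SKAT:  ", skat_q(genotypes, mafs, phenotype))
```

In this toy example the risk-increasing and protective variants cancel in the burden-style statistic, whereas squaring each score before summing preserves the signal in the SKAT-style statistic.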

#### SKAT-O

###### (Will be included in a future release)

SKAT-O adapts to whether the locus contains variants that all have the same direction of effect (the setting suited to, e.g., the CMC test) or variants that can be risk-increasing, protective, or benign (the setting suited to, e.g., the SKAT test). It tests for rare-variant effects by using the data to test mixtures of the burden and SKAT tests, finding the combination of the two that maximizes power. Furthermore, SKAT has the tendency to produce conservative type I errors for small sample sizes. SKAT-O more precisely estimates the variance and kurtosis in small samples, which allows for a more precise calculation of the reference distribution and thereby controls type I error. For further reading please see Lee et al., Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies, AJHG 2012.
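The mixture of the two tests can be written as a family of statistics Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, with rho between 0 (pure SKAT) and 1 (pure burden). The Python sketch below only evaluates this family on hypothetical weighted per-variant scores; the step that makes SKAT-O a valid test, namely computing a p-value for each rho and correcting the smallest one for the search over rho, is deliberately left out. All names and numbers are invented for the example.

```python
def q_rho(weighted_scores, rho):
    """Mixture of the SKAT-style and burden-style statistics for a given rho in [0, 1]."""
    q_skat = sum(s * s for s in weighted_scores)  # sum of squared scores
    q_burden = sum(weighted_scores) ** 2          # square of summed scores
    return (1 - rho) * q_skat + rho * q_burden


rho_grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical weighted per-variant scores for two regions.
same_direction = [1.0, 1.0, 1.0]    # all variants increase risk
mixed_direction = [1.0, -1.0, 0.0]  # one risk, one protective, one neutral variant

for name, scores in (("same", same_direction), ("mixed", mixed_direction)):
    print(name, [q_rho(scores, rho) for rho in rho_grid])
```

When all scores point in the same direction the statistic grows as rho approaches 1, and when directions are mixed it is largest at rho = 0, which is the intuition behind letting the data choose the mixture.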

Gene Expression

Variant Table

# Publications

Below you will find the Project MinE publications, together with the summary statistics related to the papers. We have a data-sharing policy to provide a simple and transparent framework for the broad use of Project MinE data. If you would like to request individual-level array data, please see the 'Request Data' tab. The website also includes burden test results for >15,000 genes.

Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis

W. van Rheenen et al., Nature Genetics, 2016

To elucidate the genetic architecture of amyotrophic lateral sclerosis (ALS) and find associated loci, we assembled a custom imputation reference panel from whole-genome-sequenced ALS patients and matched controls (N = 1,861). Through imputation and mixed-model association analysis in 12,577 cases and 23,475 controls, combined with 2,579 cases and 2,767 controls in an independent replication cohort, we fine mapped a novel locus on chromosome 21 and identified C21orf2 as an ALS risk gene. In addition, we identified MOBP and SCFD1 as novel associated risk loci. We established evidence for ALS being a complex genetic trait with a polygenic architecture. Furthermore, we estimated the SNP-based heritability at 8.5%, with a distinct and important role for low frequency (1 to 10%) variants. This study motivates the interrogation of larger sample sizes with full genome coverage to identify rare causal variants that underpin ALS risk.

NEK1 variants confer susceptibility to amyotrophic lateral sclerosis

K. Kenna et al., Nature Genetics, 2016

To identify genetic factors contributing to amyotrophic lateral sclerosis (ALS), we conducted whole-exome analyses of 1,022 index familial ALS (FALS) cases and 7,315 controls. In a new screening strategy, we performed gene-burden analyses trained with established ALS genes and identified a significant association between loss-of-function (LOF) NEK1 variants and FALS risk. Independently, autozygosity mapping for an isolated community in the Netherlands identified a NEK1 p.Arg261His variant as a candidate risk factor. Replication analyses of sporadic ALS (SALS) cases and independent control cohorts confirmed significant disease association for both p.Arg261His (10,589 samples analyzed) and NEK1 LOF variants (3,362 samples analyzed). In total, we observed NEK1 risk variants in nearly 3% of ALS cases. NEK1 has been linked to several cellular functions, including cilia formation, DNA-damage response, microtubule stability, neuronal morphology and axonal polarity. Our results provide new and important insights into ALS etiopathogenesis and genetic etiology.

# Overviews of Results

The results on this page will be updated as the dataset grows or when new types of analyses have been performed (prior to publication in a scientific journal).

1) Overview of the working groups and their results.
2) Overviews of burden analysis

The working groups within Project MinE were initiated on March 27, 2017. As soon as the first results are in, we will publish them on this webpage.

### Working Groups

Working Group 1 - Phenotyping
Chair: Matthieu Moisse, Belgium

Working Group 2 - Gene burden Case-Controls
Chair: Sara Pulit, the Netherlands

Working Group 3 - Epigenetic data
Chair: Jonathan Mill, UK

Working Group 4 - Data infrastructure
Chair: Alfredo Iacoangeli, UK

Working Group 5 - Repeat expansions
Chair: Joke van Vugt, the Netherlands

# Project MinE Data Request Form

If you are looking for summary level data, please see the 'Publications' tab.
Individual level array data can be requested as follows:

1. Fill in the form below; for questions, email info@projectmine.com
2. Your research request will be evaluated by the Project MinE staff
3. If positive, you and your fellow investigators will receive a data access agreement to be signed
4. You will be granted access to the data using a suitable method
5. At publication, add a suitable acknowledgement

We also have a pdf-version of the form below which might be more convenient to send around to your fellow investigators.
The pdf is equivalent to the form you see below.

A. Contact Details

B. Specify dataset requested

C. Research Question, Goal, or Specific Aims

Provide a brief description (e.g., 1 paragraph) describing the aims of the proposal and the research questions to be addressed.

D. Analytic Plan

Provide a brief description of the analyses to be performed to address the research questions described above. Include relevant details, e.g., phenotype definition, QC, analysis, plans to address population stratification and other confounders, and power.

E. Data Usage Policy

Please indicate whether you have read, understood, and agreed to the Project MinE Data Usage Policy by checking the box.

Thank you for your interest.
The submission of your data request form has been successful.
Your request will be evaluated and you will be contacted by the Project MinE staff via e-mail.

Amyotrophic Lateral Sclerosis (ALS), also known as Lou Gehrig's disease, Charcot disease, or motor neuron disease (MND), is a very serious and debilitating neurodegenerative disease. In patients with ALS, the motor nerve cells (motor neurons) in the spinal cord, brainstem, and brain progressively deteriorate and die. Motor neurons stimulate the muscles in the body to action.

Because fewer signals are sent to the muscles by the dying nerve cells, the disease leads to progressive muscle weakness. Most often, the first symptoms of ALS are reduced strength in the arms and/or legs, or difficulty speaking, swallowing, or breathing. As the nerve cells send fewer and fewer signals, the muscles begin to atrophy, meaning that they get smaller and thinner. After the nerve cells completely die, the patient is effectively paralyzed. Because the muscles used for breathing are also affected, respiratory failure is the most common cause of death for people with ALS. The speed at which ALS progresses varies per person, but the average life expectancy after diagnosis is three years.

ALS usually causes no pain and has no effect on mental functioning. The senses (touch, taste, sight, smell, and hearing) also usually remain intact, as well as the functioning of the bowel and bladder. ALS can strike at any adult age, but most often onset of the disease occurs between the ages of 40 and 60. In all cases, ALS is fatal.

Only a small portion of the cases of ALS can be attributed to a familial form of the disease. This hereditary form is rare and is usually an already known possibility within the families it affects. Because the precise cause of ALS is not known, there is no cure and still no effective treatment for the disease.

We plan to map the full DNA profiles of at least 15,000 people with ALS and 7,500 control subjects, and to perform comparative analyses on the resulting data.

##### Groundbreaking research on an unprecedented scale

Such large-scale genetic research into the origins of ALS is unprecedented! As such, we are fully committed to making a revolutionary breakthrough in the search for the cause of ALS. But, to reach our goals, we need your help.

##### No treatment yet

More than 200,000 people worldwide are living with Amyotrophic Lateral Sclerosis (ALS), otherwise known as Motor Neuron Disease (MND) or Lou Gehrig's disease. Relatively little is known about the cause of this progressively degenerative neurological disease. There is still no treatment. The average life expectancy of ALS patients is three years.

##### Find a cure

It is almost certain that the disease has a genetic basis. Project MinE is a large-scale research initiative devoted to discovering the genetic cause of ALS. The ultimate goal is to identify genes that are associated with ALS. The function of these genes may lead to disease pathways for which treatment can be developed.

##### Help us

In order to reach this ambitious objective, we plan to map the full DNA profiles of at least 15,000 people with ALS and compare them to DNA profiles of 7,500 control subjects to uncover associations between specific variations in genes and ALS. This type of comparative research requires enormous numbers of DNA profiles and is very costly. That is why we need your help. Make a donation or start a campaign today! Project MinE, make it yours!

##### In what genome build are the variants?

All data is in build hg19/GRCh37.

##### Why do the allele frequencies of the variants differ from ExAC?

The allele frequencies are based on different datasets (also check 'Which datasets are used?').

##### Why are some variants that are shown in the burden test not present in the variant table?

This is because they are derived from different datasets. The burden test is based on 1776 WGS samples from the Netherlands only, comprising 1169 cases and 608 controls, whereas the variant table is based on Project MinE controls-only WGS data from Belgium, Ireland, Spain, the Netherlands, the UK, the USA, and Turkey, totalling 1,968 genomes. It is therefore possible that some variants found in the burden testing plot do not occur in the variant table, and vice versa.

##### Can I get access to individual-level genotype data from the GWAS?

Yes, you can request array-based genotypes using the request form on the 'Request Data' tab.

##### Which datasets are used to generate the data?

The datasets used per component are described in the table below:

*samples from Belgium, Ireland, Spain, The Netherlands, UK, USA, and Turkey

**EU=European, AA=African American, AS=Asian, NR=not reported

***http://www.gtexportal.org/home/tissueSummaryPage#sampleInfo