


Welcome to the Project MinE Data Browser.



This open-access resource has been developed by Project MinE, an international collaboration of investigators aiming to unravel the genetic basis of amyotrophic lateral sclerosis (ALS). The Project MinE ALS Sequencing Consortium has set out to collect whole-genome sequencing (WGS) data of 15,000 ALS patients and 7,500 controls at 30X coverage. Project MinE is a largely crowd-funded initiative. As such, we are committed to sharing data and results with the scientific and healthcare communities, as well as the public more broadly. Therefore, we have created the Project MinE databrowser. We aim to share raw sequence data, provide results from our analyses, and facilitate interpretation through integration with existing datasets to serve researchers and the public across disciplines. The current dataset contains WGS data from 4,366 ALS cases and 1,832 controls. All of these 6,198 profiles have passed quality control. The data are freely available, but please see and adhere to the data access policy.

The pilot paper describing Project MinE has been published in the EJHG and is available here

The preprint describing the databrowser is available on bioRxiv


Info





Coverage



Burden Testing

Genetic association testing (i.e., genome-wide association studies or GWAS) of common variants (minor allele frequency (MAF) > 1%) typically interrogates single variants through application of linear or logistic regression models. When testing rare variation for association to disease, however, statistical power to detect association decreases rapidly (due to the low counts of alleles observed in the data). An alternative approach to single-SNP testing is to combine information across multiple variant sites contained within a specific region (typically defined by a gene and therefore referred to as genic burden testing). Aggregating information across multiple sites can improve power to detect association signals and reduce the multiple testing burden. We have performed rare-variant burden testing using Firth logistic regression.
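As an illustration of the idea, the sketch below fits a Firth-penalised logistic regression of case/control status on a single burden covariate (here: whether a sample carries at least one qualifying variant in the gene). It is a minimal, standard-library-only sketch under stated assumptions and not the pipeline used for the databrowser: the function names, the use of a carrier indicator as burden score, and the example counts are all hypothetical, and no further covariates or p-values are computed.

```python
import math


def _fit_terms(b0, b1, x, y):
    """Penalised log-likelihood, fitted probabilities, weights and the
    inverse Fisher information for the model logit(p) = b0 + b1 * x."""
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = i00 * i11 - i01 * i01
    if det <= 0.0:
        raise ValueError("the burden score must vary between samples")
    loglik = sum(math.log(pi) if yi else math.log(1.0 - pi)
                 for pi, yi in zip(p, y))
    # Firth's penalty adds half the log-determinant of the information.
    pll = loglik + 0.5 * math.log(det)
    return pll, p, w, (i11 / det, -i01 / det, i00 / det)


def firth_logistic(x, y, max_iter=200, tol=1e-9):
    """Return (intercept, beta, se_beta) for a Firth logistic regression
    of binary y on an intercept and one covariate x (hypothetical helper)."""
    b0 = b1 = 0.0
    pll, p, w, (v00, v01, v11) = _fit_terms(b0, b1, x, y)
    for _ in range(max_iter):
        # Leverages h_i = w_i * [1, x_i] V [1, x_i]^T (hat-matrix diagonal).
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi)
             for wi, xi in zip(w, x)]
        # Firth-modified score residuals.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        # Step-halving: shrink the step until the penalised likelihood
        # does not decrease.
        step = 1.0
        while True:
            new = _fit_terms(b0 + step * d0, b1 + step * d1, x, y)
            if new[0] >= pll - 1e-12 or step < 1e-6:
                break
            step /= 2.0
        b0 += step * d0
        b1 += step * d1
        pll, p, w, (v00, v01, v11) = new
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return b0, b1, math.sqrt(v11)


# Made-up example: 5 of 100 cases and 0 of 100 controls are carriers.
carrier = [1] * 5 + [0] * 195
status = [1] * 100 + [0] * 100
intercept, beta, se = firth_logistic(carrier, status)
```

With no carriers among the controls, ordinary logistic regression has no finite estimate for the burden effect; the Firth penalty keeps it finite. For a single binary covariate the Firth estimate coincides with the log odds ratio obtained after adding 0.5 to each cell of the 2x2 table, which gives a convenient check on the sketch.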

The figure below shows the exons (orange blocks) in this gene together with the variants (triangles) that were observed in the current dataset. Please hover over the variants for more information. If you are interested, you can zoom in by clicking and dragging your mouse across the x-axis.

Various subsets of variants can be made (e.g. MAF<1%, MAF<0.5% or only those variants that were not observed in ExAC).

There is a tradeoff between including more variants, which might yield higher statistical power to detect association, and including too much noise, which reduces power. We have performed Firth logistic regression on only disruptive variants (fewer variants, but hopefully a good signal-to-noise ratio), on disruptive + damaging variants, and on disruptive + damaging + missense-non-damaging variants (many variants, more noise). You can select which subset you are interested in.
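The subsets described above can be thought of as a filter on allele frequency, predicted impact and presence in ExAC. The sketch below shows one way to express such a filter. The `Variant` fields, the impact labels and the function name are hypothetical and chosen for illustration only; they do not reflect the databrowser's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    maf: float       # minor allele frequency in the dataset
    impact: str      # "disruptive", "damaging" or "missense-non-damaging"
    in_exac: bool    # True if the variant was also observed in ExAC


# Nested impact categories, from fewest variants to most variants.
IMPACT_SETS = {
    "disruptive": {"disruptive"},
    "disruptive+damaging": {"disruptive", "damaging"},
    "disruptive+damaging+missense": {
        "disruptive", "damaging", "missense-non-damaging"},
}


def select_variants(variants, max_maf=0.01, impact_set="disruptive",
                    novel_only=False):
    """Keep variants below the MAF threshold and within the chosen impact
    set; with novel_only=True, drop variants that were observed in ExAC."""
    allowed = IMPACT_SETS[impact_set]
    return [v for v in variants
            if v.maf < max_maf
            and v.impact in allowed
            and not (novel_only and v.in_exac)]


# Made-up variants for illustration.
variants = [
    Variant("v1", 0.0004, "disruptive", False),
    Variant("v2", 0.0070, "damaging", True),
    Variant("v3", 0.0020, "missense-non-damaging", True),
    Variant("v4", 0.0300, "disruptive", True),
]
strict = select_variants(variants, max_maf=0.005, impact_set="disruptive")
broad = select_variants(variants, max_maf=0.01,
                        impact_set="disruptive+damaging+missense")
```

The strict selection keeps fewer variants with a presumably cleaner signal, while the broad selection keeps more variants at the cost of more noise, mirroring the tradeoff described above.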

Once you have selected the subset of variants that you are interested in, the genic burden plot will be updated. The mini-Manhattan plots for the Family-wise, pathways and druggable categories, which you can see below the genic burden plot, will be updated as well. For more information on those plots, please see the background information on each of the tabs.
Based on the selection made above (MAF and variant impact) the plots below will be updated.

The genesets have been made based on the SuperFamilies to which a gene belongs. This information has been extracted from Ensembl's BioMart (GRCh37). If no data is shown, that either means this gene is not included in a SuperFamily, or we have insufficient data for the geneset. If there are multiple genesets to which this gene belongs, then please select the one you are interested in by using the dropdown on the top left.

Firth logistic regression has been used to test the variants in this geneset for association. What you see here is a summary with p-value, beta and SE, the number of unique variants that were observed in either cases or controls, the number of cases with at least one of those unique variants and the number of controls with at least one of those unique variants.
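The three counts in this summary can be derived from per-sample carrier information, as in the sketch below. The input layout (a mapping from sample ID to the set of variant IDs that sample carries, plus a case/control flag per sample) and all names are assumptions made for illustration.

```python
def summarise_geneset(carried, is_case, geneset_variants):
    """Count unique observed variants and carriers among cases and controls.

    carried:          dict sample_id -> set of variant IDs carried
    is_case:          dict sample_id -> True for cases, False for controls
    geneset_variants: set of variant IDs that belong to the geneset
    """
    observed = set()
    n_cases = n_controls = 0
    for sample_id, sample_variants in carried.items():
        hits = sample_variants & geneset_variants
        if not hits:
            continue  # this sample carries no variant from the geneset
        observed |= hits
        if is_case[sample_id]:
            n_cases += 1
        else:
            n_controls += 1
    return {
        "unique_variants": len(observed),
        "cases_with_variant": n_cases,
        "controls_with_variant": n_controls,
    }


# Made-up example data.
carried = {
    "s1": {"v1", "v2"},
    "s2": {"v2"},
    "s3": set(),
    "s4": {"v9"},
}
is_case = {"s1": True, "s2": False, "s3": True, "s4": False}
summary = summarise_geneset(carried, is_case, {"v1", "v2", "v3"})
```

Note that a sample carrying several variants from the geneset is counted once, which matches the "at least one" wording of the summary.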

The manhattan plot indicates the genic burden information for all genes in this geneset. If there is a signal for the aggregated geneset, then this plot indicates which genes drive that signal. Please hover over the individual genes to view additional information about that gene. If the plot is crowded, please feel free to zoom in by clicking and dragging along the x-axis.
Based on the selection made above (MAF and variant impact) the plots below will be updated.

The genesets have been made based on the KEGG, BioCarta and Reactome pathways to which a gene belongs. This information has been extracted from GSEA. If no data is shown, that either means this gene is not included in a pathway, or we have insufficient data for the geneset. If there are multiple genesets to which this gene belongs, then please select the one you are interested in by using the dropdown on the left.

Firth logistic regression has been used to test the variants in this geneset for association. What you see here is a summary with p-value, beta and SE, the number of unique variants that were observed in either cases or controls, the number of cases with at least one of those unique variants and the number of controls with at least one of those unique variants.

The manhattan plot indicates the genic burden information for all genes in this geneset. If there is a signal for the aggregated geneset, then this plot indicates which genes drive that signal. Please hover over the individual genes to view additional information about that gene. If the plot is crowded, please feel free to zoom in by clicking and dragging along the x-axis.
Based on the selection made above (MAF and variant impact) the plots below will be updated.

The genesets have been made based on the druggable category to which a gene belongs. This information has been extracted from the Drug Gene Interaction Database. If no data is shown, that either means this gene is not included in a druggable category, or we have insufficient data for the geneset. If there are multiple genesets to which this gene belongs, then please select the one you are interested in by using the dropdown on the left.

Firth logistic regression has been used to test the variants in this geneset for association. What you see here is a summary with p-value, beta and SE, the number of unique variants that were observed in either cases or controls, the number of cases with at least one of those unique variants and the number of controls with at least one of those unique variants.

The manhattan plot indicates the genic burden information for all genes in this geneset. If there is a signal for the aggregated geneset, then this plot indicates which genes drive that signal. Please hover over the individual genes to view additional information about that gene. If the plot is crowded, please feel free to zoom in by clicking and dragging along the x-axis.


Gene Expression



Variant Table





Literature





Project MinE - WGS Data Freeze 1


Single variant association

Single-variant results are shown to indicate that the test statistics are well behaved.
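One common way to judge whether association test statistics are well behaved is the genomic inflation factor (lambda), which compares the median observed test statistic with its expectation under the null hypothesis; values close to 1 suggest little inflation. Whether this particular metric is reported here is an assumption; the sketch below only illustrates the calculation for two-sided p-values from 1-degree-of-freedom tests, using a hypothetical function name.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values of 1-df tests."""
    norm = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # A two-sided p-value corresponds to |z| = -inv_cdf(p / 2);
        # z squared follows a chi-square distribution with 1 df.
        z = norm.inv_cdf(p / 2.0)
        chi2.append(z * z)
    # Median of the chi-square distribution with 1 df (about 0.455).
    expected_median = norm.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


# Made-up p-values for illustration.
lam = genomic_inflation([0.91, 0.12, 0.47, 0.66, 0.30, 0.05, 0.78])
```

In practice the factor would be computed over all tested variants rather than a handful of values.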





Burden testing

Exome Wide burden



Tissue Specific











Genic











Family-wise











Pathways











Drugable











Genic Burden Testing - Chinese ancestry








Frequently Asked Questions


How many cases and controls are in the current dataset?

The depth of coverage, burden testing results and variant frequencies are based on 4,366 amyotrophic lateral sclerosis cases and 1,832 age- and gender-matched controls.


In what genome build are the variants?

All data is in build hg19/GRCh37.


Why do the allele frequencies of the variants differ from ExAC/gnomAD?

The allele frequencies are based on different datasets.
As a population reference we have added the gnomAD annotation of both genomes and exomes.
Please see the variant table on the gene specific pages.


Can I get a copy of the 2016 GWAS summary statistics?

Yes, you can. Either click here or go to https://www.projectmine.com/research/publications/


Can I get a copy of the most recent burden testing results based on the Project MinE WGS dataset?

Yes, you can. Please click here


I would like to check for duplicated samples between my own dataset and the Project MinE WGS dataset, how do I do that?

Simply email us at info@projectmine.com and indicate that you would like to check for duplicates. We will then send you an archive with checksum hashes for all the samples.
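Once hashes for your own samples have been generated with the same procedure as the one used for the archive, finding duplicates comes down to intersecting the two sets of hashes. The sketch below assumes a hypothetical text format of one `sample_id<TAB>hash` pair per line; the actual archive layout and hashing procedure are not described here and may differ.

```python
def read_hashes(lines):
    """Parse 'sample_id<TAB>hash' lines into a dict hash -> sample_id.
    The file format is an assumption made for this illustration."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sample_id, digest = line.split("\t")
        table[digest.lower()] = sample_id
    return table


def find_duplicates(own_lines, reference_lines):
    """Return sorted (own_sample, reference_sample) pairs sharing a hash."""
    own = read_hashes(own_lines)
    ref = read_hashes(reference_lines)
    return sorted((own[d], ref[d]) for d in own.keys() & ref.keys())


# Made-up hashes for illustration.
own_lines = ["mine_001\tab12cd", "mine_002\tFFEE00"]
reference_lines = ["# sample\thash", "ref_A\tffee00", "ref_B\t123456"]
duplicates = find_duplicates(own_lines, reference_lines)
```

Because only hashes are compared, neither party needs to exchange individual-level genotype data for this check.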


Can I get access to individual-level genotype data?

Yes, if you wish to request data, please visit https://www.projectmine.com/research/data-sharing/


Are there any restrictions on data usage?

Please read the Project MinE Data Access Policy


What are the contact details for Project MinE?

If you have additional questions, please send an email to info@projectmine.com or visit our website at www.projectmine.com







Download


Terms of Use

Project MinE has made the full results from all published Project MinE studies available for download. If you download these data, you and your immediate collaborators (“investigators”) acknowledge and agree to all of the following conditions:

1) These data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose;
2) Investigators will use these results for scientific research and educational use only;
3) Downloaded Project MinE results can be shared among collaborators but the reposting or public distribution of Project MinE results files is prohibited;
4) Investigators certify that they are in compliance with all applicable local, state, and federal laws or regulations and institutional policies regarding human subjects and genetics research;
5) Investigators will cite the appropriate Project MinE publication(s) in any communications or publications arising directly or indirectly from these data;
6) For utilization of data available prior to publication, investigators will respect the requested responsibilities of resource users under the 2003 Fort Lauderdale principles, which are detailed in the README file associated with the data set; and
7) Investigators will never attempt to identify any participant.

If investigators use these data, any and all consequences are entirely their responsibility.

To download data:

Please indicate your name

Please indicate a valid email address

Please indicate your institute

Agree to the terms and you will be able to download the data corresponding to each paper.






Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis

W. van Rheenen et al., Nature Genetics, 2016


To elucidate the genetic architecture of amyotrophic lateral sclerosis (ALS) and find associated loci, we assembled a custom imputation reference panel from whole-genome-sequenced ALS patients and matched controls (N = 1,861). Through imputation and mixed-model association analysis in 12,577 cases and 23,475 controls, combined with 2,579 cases and 2,767 controls in an independent replication cohort, we fine mapped a novel locus on chromosome 21 and identified C21orf2 as an ALS risk gene. In addition, we identified MOBP and SCFD1 as novel associated risk loci. We established evidence for ALS being a complex genetic trait with a polygenic architecture. Furthermore, we estimated the SNP-based heritability at 8.5%, with a distinct and important role for low frequency (1 to 10%) variants. This study motivates the interrogation of larger sample sizes with full genome coverage to identify rare causal variants that underpin ALS risk.

Download GWAS sumstats


The Project MinE databrowser: bringing large-scale whole-genome sequencing in ALS to researchers and the public.

The Project MinE ALS Sequencing Consortium, Manuscript in preparation


Amyotrophic lateral sclerosis (ALS) is a rapidly progressive fatal neurodegenerative disease affecting 1 in 350 people. Project MinE is an international collaboration with the aim of whole-genome sequencing at least 15,000 ALS patients and 7,500 controls. Here, we present the Project MinE data browser (databrowser.projectmine.com), a unique and intuitive one-stop open-access server that presents detailed information on the genetic variation in a growing set of 4,389 ALS cases and 1,846 matched controls. It allows a user to simply enter a gene and immediately access a number of results that would otherwise require analytic expertise, computational time and several visits to external resources. Through the browser, we present a wide range of rare-variant burden tests at a single variant resolution, suitable for both a scientific and non-scientific audience. To further facilitate interpretation, we integrated exome and genome variant information from gnomAD, tissue expression from GTEx, variant annotation based on dbSNP, dbNSFP, NextProt, Motif and ClinVar, and a real-time overview of gene-relevant literature. The data browser is based on the statistical programming language R, combined with the interactive web application framework Shiny. This unique combination of detailed (meta-) data, genetic association statistics and sheer size of our dataset distinguishes it from other browsers. Through its visual components and interactive design, the browser specifically aims to be a resource for those with an interest in ALS, including researchers, clinicians and the public.

Download WGS sumstats
Download all WGS Burden results