Awesome Public Datasets
=======================

.. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
   :alt: Awesome
   :target: https://github.com/sindresorhus/awesome

This is a list of high-quality `topic-centric public data sources <https://github.com/awesomedata/awesome-public-datasets>`_.
They are collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free; however, some are not.
This project was incubated at `OMNILab <https://github.com/OMNILab>`_, Shanghai Jiao Tong University, during Xiaming Chen's Ph.D. studies.
OMNILab is now part of the `BaiYuLan Open AI community <https://github.com/Bai-Yu-Lan>`_.
Other amazingly awesome lists can be found in `sindresorhus's awesome <https://github.com/sindresorhus/awesome>`_ list.

NOTICE: This repo is automatically generated by `apd-core <https://github.com/awesomedata/apd-core/tree/master/core>`_.
Please DO NOT modify this file directly. We have provided `a new way to contribute to this repo <https://github.com/awesomedata/apd-core/blob/master/CONTRIBUTING.md>`_.

`Join <https://join.slack.com/t/awesomedataworld/shared_invite/zt-dllew5xy-PJYi~mWUdY3hupohbmVZsA>`_
`the Slack community <https://awesomedataworld.slack.com>`_ for instant updates on high-quality data.

.. |OK_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/ok-24.png
.. |FIXME_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/fixme-24.png

- |OK_ICON| I am well.
- |FIXME_ICON| Please fix me.

.. contents:: Table of Contents

Agriculture
-----------

- |OK_ICON| `The global dataset of historical yields for major crops 1981–2016 - The Global Dataset of [...] <https://doi.pangaea.de/10.1594/PANGAEA.909132>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/Global-dataset-of-historical-yields-for-major-crops.yml>`_]
- |OK_ICON| `Hyperspectral benchmark dataset on soil moisture - This dataset was measured in a five-day [...] <https://doi.org/10.5281/zenodo.1227837>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/Hyperspectral-Benchmark-Dataset-On-Soil-Moisture.yml>`_]
- |OK_ICON| `Lemons quality control dataset - Lemon dataset has been prepared to investigate the [...] <https://github.com/softwaremill/lemon-dataset>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/Lemon-Dataset.yml>`_]
- |OK_ICON| `Optimized Soil Adjusted Vegetation Index - The IDB is a tool for working with remote sensing [...] <https://www.indexdatabase.de/db/i-single.php?id=63>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/Optimized Soil Adjusted Vegetation Index>`_]
- |FIXME_ICON| `U.S. Department of Agriculture's Nutrient Database <https://www.ars.usda.gov/northeast-area/beltsville-md/beltsville-human-nutrition-research-center/nutrient-data-laboratory/docs/sr28-download-files/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/U.S.-Department-of-Agricultures-Nutrient-Database.yml>`_]
- |OK_ICON| `U.S. Department of Agriculture's PLANTS Database - The Complete PLANTS Checklist is nearly 7 [...] <https://plants.usda.gov/home/downloads>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Agriculture/U.S.-Department-of-Agricultures-PLANTS-Database.yml>`_]

Architecture
------------

- |OK_ICON| `Swiss Apartment Models - This dataset contains detailed data on 42,207 apartments (242,257 [...] <https://zenodo.org/record/7070952#.Y0mACy0RqO0>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Architecture/appartment-models.yml>`_]

Biology
-------

- |OK_ICON| `1000 Genomes - The 1000 Genomes Project ran between 2008 and 2015, creating the largest [...] <https://www.internationalgenome.org/data>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/1000-Genomes.yml>`_]
- |OK_ICON| `ANHIR - Automatic Non-rigid Histological Image Registration (ANHIR) consists of 2D [...] <https://anhir.grand-challenge.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/ANHIR.yml>`_]
- |OK_ICON| `American Gut (Microbiome Project) - The American Gut project is the largest crowdsourced [...] <https://github.com/biocore/American-Gut>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/American-Gut-Microbiome-Project.yml>`_]
- |OK_ICON| `BCNB - There are WSIs of 1058 patients, part of tumor regions are annotated in WSIs. Except [...] <https://bupt-ai-cz.github.io/BCNB/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/BCNB.yml>`_]
- |OK_ICON| `Broad Bioimage Benchmark Collection (BBBC) - The Broad Bioimage Benchmark Collection (BBBC) [...] <https://www.broadinstitute.org/bbbc>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Broad-Bioimage-Benchmark-Collection-BBBC.yml>`_]
- |OK_ICON| `Broad Cancer Cell Line Encyclopedia (CCLE) <http://www.broadinstitute.org/ccle/home>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Broad-Cancer-Cell-Line-Encyclopedia-CCLE.yml>`_]
- |OK_ICON| `CIMA - CIMA dataset includes images of 2D histological microscopy tissue slices. <https://cmp.felk.cvut.cz/~borovji3/?page=dataset>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/CIMA.yml>`_]
- |OK_ICON| `Cell Image Library - This library is a public and easily accessible resource database of [...] <http://www.cellimagelibrary.org/home>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Cell-Image-Library.yml>`_]
- |FIXME_ICON| `Complete Genomics Public Data - A diverse data set of whole human genomes are freely [...] <https://completegenomics.mgiamericas.com/demodata>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Complete-Genomics-Public-Data.yml>`_]
- |OK_ICON| `CytoImageNet - A large-scale dataset of microscopy images. Contains 890,737 total grayscale [...] <https://www.kaggle.com/stanleyhua/cytoimagenet>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/CytoImageNet.yml>`_]
- |OK_ICON| `EBI ArrayExpress - ArrayExpress Archive of Functional Genomics Data stores data from high- [...] <http://www.ebi.ac.uk/arrayexpress/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/EBI-ArrayExpress.yml>`_]
- |OK_ICON| `EBI Protein Data Bank in Europe - The Electron Microscopy Data Bank (EMDB) is a public [...] <https://www.ebi.ac.uk/emdb/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/EBI-Protein-Data-Bank-in-Europe.yml>`_]
- |OK_ICON| `ENCODE project - The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing [...] <https://www.encodeproject.org>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/ENCODE-project.yml>`_]
- |OK_ICON| `Electron Microscopy Pilot Image Archive (EMPIAR) - EMPIAR, the Electron Microscopy Public [...] <http://www.ebi.ac.uk/pdbe/emdb/empiar/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Electron-Microscopy-Pilot-Image-Archive-EMPIAR.yml>`_]
- |OK_ICON| `Ensembl Genomes <https://ensemblgenomes.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Ensembl-Genomes.yml>`_]
- |OK_ICON| `Gene Expression Omnibus (GEO) - GEO is a public functional genomics data repository [...] <http://www.ncbi.nlm.nih.gov/geo/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Gene-Expression-Omnibus-GEO.yml>`_]
- |OK_ICON| `Gene Ontology (GO) - GO annotation files <http://geneontology.org/docs/download-go-annotations/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Gene-Ontology-GO.yml>`_]
- |OK_ICON| `Global Biotic Interactions (GloBI) <https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Global-Biotic-Interactions-GloBI.yml>`_]
- |OK_ICON| `Harvard Medical School (HMS) LINCS Project - The Harvard Medical School (HMS) LINCS Center is [...] <http://lincs.hms.harvard.edu>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Harvard-Medical-School-LINCS-Project.yml>`_]
- |FIXME_ICON| `Human Genome Diversity Project - A group of scientists at Stanford University have [...] <http://www.hagsc.org/hgdp/files.html>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Human-Genome-Diversity-Project.yml>`_]
- |OK_ICON| `Human Microbiome Project (HMP) - The HMP sequenced over 2000 reference genomes isolated from [...] <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Human-Microbiome-Project-HMP.yml>`_]
- |OK_ICON| `ICOS PSP Benchmark - The ICOS PSP benchmarks repository contains an adjustable real-world [...] <http://ico2s.org/datasets/psp_benchmark.html>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/ICOS-PSP-Benchmark.yml>`_]
- |OK_ICON| `International HapMap Project <http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/International-HapMap-Project.yml>`_]
- |FIXME_ICON| `Journal of Cell Biology DataViewer <https://rupress.org/jcb/pages/jcb-dataviewer>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Journal-of-Cell-Biology-DataViewer.yml>`_]
- |OK_ICON| `KEGG - KEGG is a database resource for understanding high-level functions and utilities of [...] <http://www.genome.jp/kegg/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/KEGG.yml>`_]
- |OK_ICON| `NCBI Proteins <http://www.ncbi.nlm.nih.gov/guide/proteins/#databases>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/NCBI-Proteins.yml>`_]
- |OK_ICON| `NCBI Taxonomy - The NCBI Taxonomy database is a curated set of names and classifications for [...] <http://www.ncbi.nlm.nih.gov/taxonomy>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/NCBI-Taxonomy.yml>`_]
- |OK_ICON| `NCI Genomic Data Commons - The GDC Data Portal is a robust data-driven platform that allows [...] <https://gdc.cancer.gov/access-data/gdc-data-portal>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/NCI-Genomic-Data-Commons.yml>`_]
- |OK_ICON| `NIH Microarray data <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/NIH-Microarray-data.yml>`_]
- |OK_ICON| `OpenSNP genotypes data - openSNP allows customers of direct-to-customer genetic tests to [...] <https://opensnp.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/OpenSNP-genotypes-data.yml>`_]
- |OK_ICON| `Palmer Penguins - The goal of palmerpenguins is to provide a great dataset for data [...] <https://allisonhorst.github.io/palmerpenguins/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Palmer-Penguins.yml>`_]
- |OK_ICON| `Pathguide - Protein-Protein Interactions Catalog <http://www.pathguide.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Pathguid.yml>`_]
- |OK_ICON| `Protein Data Bank - This resource is powered by the Protein Data Bank archive-information [...] <http://www.rcsb.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Protein-Data-Bank.yml>`_]
- |OK_ICON| `Psychiatric Genomics Consortium - The purpose of the Psychiatric Genomics Consortium (PGC) is [...] <https://www.med.unc.edu/pgc/downloads>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Psychiatric-Genomics-Consortium.yml>`_]
- |OK_ICON| `PubChem Project - PubChem is the world's largest collection of freely accessible chemical [...] <https://pubchem.ncbi.nlm.nih.gov/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/PubChem-Project.yml>`_]
- |OK_ICON| `PubGene (now Coremine Medical) - COREMINE™ is a family of tools developed by the Norwegian [...] <https://www.coremine.com/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/PubGene-now-Coremine-Medical.yml>`_]
- |OK_ICON| `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) - COSMIC, the Catalogue Of Somatic [...] <http://cancer.sanger.ac.uk/cosmic>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Sanger-Catalogue-of-Somatic-Mutations-in-Cancer-COSMIC.yml>`_]
- |OK_ICON| `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) <http://www.cancerrxgene.org/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Sanger-Genomics-of-Drug-Sensitivity-in-Cancer-Project-GDSC.yml>`_]
- |OK_ICON| `Sequence Read Archive (SRA) - The Sequence Read Archive (SRA) stores raw sequence data from [...] <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Sequence-Read-ArchiveSRA.yml>`_]
- |OK_ICON| `Serratus - Analysis of 7.1 million RNA/DNA sequencing datasets to discover the total [...] <https://github.com/ababaian/serratus/wiki/Access-Data-Release>`_ [`Meta <https://github.com/awesomedata/apd-core/tree/master/core//Biology/Serratus-Open-Virome.yml>`_]
-
|OK_ICON|
Stanford Microarray Data (now retired) <http://smd.princeton.edu/>
_ [`Meta