HuBMAP Data Release Frequently Asked Questions

General Questions

What is HuBMAP & what are its goals?

Better insights into the principles governing the relationship between tissue organization and function will potentially lead to a better understanding of the significance of normal inter-individual variability and changes across the lifespan, and inform about the emergence of disease at the biomolecular level before the appearance of clinical symptoms. Despite vastly improved imaging and omics technologies and many important foundational discoveries, our understanding of how tissues are organized is restricted by two main remaining challenges: 1) integrating high content, high resolution spatial and omics information to comprehensively profile biomolecular distribution and morphology of tissues in a high throughput manner and 2) placing this information into 3D tissue maps amenable to modelling. The vision for the Human BioMolecular Atlas Program (HuBMAP) is to catalyze the development of a framework for mapping of the human body at single cell resolution to transform our understanding of normal tissue organization and function.

This will be achieved by:

- Accelerating the development of the next generation of tools and techniques for constructing high resolution spatial tissue maps that quantify multiple types of biomolecules either sequentially or simultaneously;
- Generating foundational 3D human tissue maps using validated high-content, high-throughput imaging and omics assays;
- Establishing an open data platform that will develop novel approaches to integrating, visualizing and modelling imaging and omics data to build multi-dimensional tissue maps, and making data rapidly findable, accessible, interoperable, and reusable by the global research community;
- Coordinating and collaborating with other funding agencies, programs, and the biomedical research community to build the framework and tools for mapping the human body at single cell resolution;
- Supporting pilot projects that demonstrate the value of the resources developed by the program to study normal individual variations and tissue changes across the lifespan and the health-disease continuum.

[HuBMAP](https://commonfund.nih.gov/hubmap), which made its first external awards in fall 2018, is funded through the NIH Common Fund as a short-term (8 years), goal-driven strategic investment, with deliverables intended to catalyze research across multiple biomedical research disciplines. The [NIH Common Fund](https://commonfund.nih.gov/about) supports cross-cutting programs that are expected to have exceptionally high impact. [All Common Fund](https://commonfund.nih.gov/programs) initiatives invite investigators to develop bold, innovative, and often risky approaches to address problems that may seem intractable in isolation or to seize new opportunities that offer the potential for rapid progress. See also the video [HuBMAP Overview](https://www.youtube.com/watch?v=yCh4XnD7rEE).

For a more in-depth understanding, read the [HuBMAP marker paper](https://www.nature.com/articles/s41586-019-1629-x), or take the [Visible Human MOOC](https://expand.iu.edu/browse/sice/cns/courses/hubmap-visible-human-mooc), a course on HuBMAP data acquisition, analysis, and visualization. Stay in touch by subscribing [to our mailing list](https://hubmapconsortium.org/hubmap-mailing-list) and our [YouTube channel](https://www.youtube.com/channel/UCbSvPJ9dXASL14KoDeutMFg).

Who are the intended users of HuBMAP resources?

Generating foundational 3D human tissue maps is one of the core goals of HuBMAP. HuBMAP projects will generate high resolution, high content, high-throughput biomolecular 3D tissue maps of non-diseased human organs and organ systems. For HuBMAP, a high-resolution assay is one that can reliably and reproducibly assign detected biomolecules to individual cells or extracellular compartments of a tissue. A high content approach is one that maximizes identification of tissue features through a combination of biomolecular depth, spatial resolution and multiplexing of complementary, multi-parameter assays. A high throughput pipeline is one that maximizes the bandwidth of data production to result in any or all of the following: 1) accelerated speed of analysis, so that hundreds or thousands of samples can be analyzed simultaneously, 2) greater depth of analysis, so that hundreds or thousands of molecules can be analyzed in a single sample, or 3) enhanced capacity for volume, so that a given set of molecules can be analyzed in all the cells within a larger tissue sample. Using a multi-dimensional approach, including imaging, sequencing, and mass spectrometry assays, HuBMAP provides robust molecular characterization of human cells in their natural tissue context. HuBMAP also generates and shares a number of other resources to support the use of these maps, including details of experimental protocols used, validation of affinity probes, biospecimen metadata, conventions used for annotation, as well as computational tools. 

The rich HuBMAP datasets and associated resources are intended for broad use by the research community, including:

- Computational researchers exploring organizing principles of human tissues, new structural-functional relationships, and biomolecular networks
- Biologists exploring hypotheses using publicly available HuBMAP datasets prior to or in parallel with work in their own labs
- Experimentalists interested in using the same protocols or computational tools in their labs
- Educators developing new teaching materials
- Technology developers interested in developing new assays with enhanced performance

How does HuBMAP fit into the ecosystem of other single cell efforts within and outside the NIH?

HuBMAP is part of a rich ecosystem of established and emerging atlasing programs supported by NIH and globally by other funding organizations, many of which are focused on specific organs or diseases. HuBMAP has connected with these programs to ensure data interoperability, avoid duplication of work, and leverage and synergize gained knowledge. The consortium has organized a number of events to bring together these communities to discuss topics of shared interest (e.g. [CCF meeting, NIH-HCA meeting](https://hubmapconsortium.org/nihhca2020/)) and is committed to improving coordination and collaboration among different programs. In addition, many of the HuBMAP PIs have been or are still actively participating in these efforts, helping with cross-pollination and advancing our global understanding. HuBMAP, as its name implies, was specifically initiated to resolve the challenge of building integrated, comprehensive, high-resolution spatial maps of human tissues and organs. As a result, HuBMAP provides leadership in the ecosystem around techniques for integrating disparate, multi-dimensional and multi-scale datasets, the development of a Common Coordinate Framework (CCF) for integrating data across many individuals, and the development and validation of these assays. To further increase interoperability, HuBMAP has adopted a number of standards and processes developed by other domain expert consortia and is actively involved in knowledge exchange. The consortium sees itself as an integral part of the ecosystem, sharing its strengths and actively contributing to the community.

Introduction to HuBMAP resources

We welcome all to explore the HuBMAP Portal and provide us with feedback through the Contact form.

The Visible Human MOOC provides an overview of HuBMAP and first introduction to data acquisition, analysis, and visualization.

Below are some FAQ to get you started.

Can I use HuBMAP data for my own research?

Yes, please follow the guidelines outlined in the [HuBMAP External Data Sharing Policy.](https://hubmapconsortium.org/policies/external-data-sharing-policy/) Access to NIH HuBMAP data is guided by the [NIH Genomic Data Sharing policy.](https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/) If you use NIH HuBMAP data in publications or presentations we request that you include an acknowledgement of the HuBMAP Program. This acknowledgement helps justify and sustain the funding needed to continue providing open access to a growing amount of data and tools. Suggested language for such an acknowledgment is: “The results \<published or shown\> here are in whole or part based upon data generated by the HuBMAP Program: https://hubmapconsortium.org."

Can I use your data for another project that I am working on?

Yes! We provide raw and processed (at multiple levels) data for the community to access through our [portal.](/) The products of HuBMAP will be made broadly available to the research community to establish the foundations for a human body map that other programs and the international community can build upon, including methods, tools, reagents, biospecimens, datasets, and software. To acknowledge HuBMAP in your findings, see “Can I use HuBMAP data for my own research?”.

Can I submit my own data to HuBMAP? Why would I want to submit data to HuBMAP?

Yes, HuBMAP aims to allow investigators to submit their own data via the HuBMAP Portal. Why share? Having your own data on HuBMAP will allow other researchers access to your results and provide additional resources for creating cellular and molecular level anatomical maps of the healthy human. In this way others may extend and interact with your scientific work. We also encourage the community to provide feedback about HuBMAP dataset metadata in order to increase the quality and usability of community data submissions. One of the first tools we will enable in the near future will let the users annotate cell-types in their own data based on HuBMAP approaches.

Can I add my own tools to HuBMAP portal? Why would I want to submit my tools to HuBMAP? How do I point users to the new tool I built and added to HuBMAP?

HuBMAP seeks to host relevant tools, and we welcome community input to help with feature prioritization and development for the HuBMAP Portal. Adding your tools to the HuBMAP Portal will help you get others to use them and provide feedback to improve the scientific impact of your work. For instance, very efficient software/statistical environments that would enable the community to deploy their own visualization tools, especially for secondary analysis, may exist already and could be added. We invite you to comment on whether it would be desirable for HuBMAP to provide facile access to data through an API and/or downloads in enabling formats.

What kind of information is included in HuBMAP’s first data release?

The first release contains donor, tissue sample, and assay data & metadata for the following organs: heart, kidney, large intestine, lymph node, small intestine, spleen, and thymus. For additional information, please see [donor, tissue sample, and assay metadata](/metadata) as well as [assay details.](/assays)

What data modalities are covered in this first release?

Microscopy, Mass Spectrometry, and Sequencing data are available in the initial HuBMAP data release. Several assay types are available for each modality. More information can be found on the [list of available HuBMAP Assays.](/assays)

What metadata is included in HuBMAP? Will HuBMAP include additional metadata in future releases?

HuBMAP contains [donor metadata, sample metadata, and assay metadata.](/metadata) In the future, metadata will be linked to various ontologies to make integration more efficient.

What controlled vocabularies or ontologies are being used by HuBMAP?

Each donor metadata item uses Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) and related SNOMEDCT_US codes with [complete list here.](/donor#more) This list will be expanded as clinical data transactions, not just metadata, are added for donors for which data is available. Similarly the other metadata will be encoded with applicable ontologies. The HuBMAP Knowledge Graph underpins all ontologies used in HuBMAP but is not yet deployed. The current CCF ontology uses Uberon, Kidney Tissue Atlas Ontology (KTAO) and Cell Ontology (CL), see details in https://arxiv.org/abs/2007.14474

How do I get started with using the portal?

The Portal is available at [portal.hubmapconsortium.org](/) and also includes documentation and a FAQ.

How can I learn more about HuBMAP data acquisition, analysis and visualization tools?

The [Visible Human MOOC](https://expand.iu.edu/browse/sice/cns/courses/hubmap-visible-human-mooc) provides an overview of HuBMAP and first introduction to data acquisition, analysis, and visualization.

What do I need to do to gain access to data? How can I download data? How do I get approval to access raw genetic data?

Access to the data portal is open to all interested viewers, without additional barriers (account creation, login, etc.) at portal.hubmapconsortium.org. Those interested in downloading available data will need to create an account within the data portal. Note that downloads of specific datasets will be anonymous. Downloads of these data sets require NIH approval and are therefore not anonymous.

How does HuBMAP curate and validate metadata?

HuBMAP Investigators are provided with a Github link to an assay-specific metadata template they can download, complete, and save as a TSV file. The completed metadata form is then sent to a HuBMAP Curator who runs a validation script to confirm all required fields are populated with the appropriate information in the syntactically correct format. Corrections are made if necessary and the HuBMAP team uploads the metadata.tsv and aligns it with the corresponding data. In the future, semantic validation through the use of ontological annotations will be supported where possible.
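
The required-field check described above can be illustrated with a minimal Python sketch. This is not the HuBMAP validation script: the function name `find_missing_fields` and the column names (`donor_id`, `tissue_id`, `assay_type`) are hypothetical, and the sketch only tests that required columns exist and are non-blank, not that values follow the expected syntax.

```python
import csv
import io


def find_missing_fields(tsv_text, required_fields):
    """Return (row_number, field) pairs for required fields that are absent or blank.

    Row number 0 means the column is missing from the header entirely;
    data rows are numbered from 1.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    # A required column that is not in the header is reported once, as row 0.
    problems = [(0, field) for field in required_fields if field not in header]
    for row_number, row in enumerate(reader, start=1):
        for field in required_fields:
            # Short rows yield None for trailing columns, so treat None as blank.
            if field in header and not (row[field] or "").strip():
                problems.append((row_number, field))
    return problems


# Hypothetical required columns and a tiny example file with one blank cell.
REQUIRED = ["donor_id", "tissue_id", "assay_type"]
sample = "donor_id\ttissue_id\tassay_type\nD1\tT1\tCODEX\nD2\t\tscRNA-seq\n"

for row_number, field in find_missing_fields(sample, REQUIRED):
    print(f"row {row_number}: required field '{field}' is missing or empty")
```

Returning a list of problems rather than stopping at the first one lets a submitter correct the whole file in a single pass.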

Can I sign up to be notified of data releases?

You can sign up for our mailing list at https://hubmapconsortium.org/hubmap-mailing-list/. Once you do, we’ll keep you informed on everything that is happening in HuBMAP, including future data releases.

What will HuBMAP include in future releases? Will HuBMAP include additional tissues in future releases?

Future data releases will include items such as:

- New Assay Data
- Additional Existing Assay Data
- Updated Metadata specifications
- Updated CCF Ontology
- Additional 3D Reference Organs
- Updated Anatomical Structures, Cell Types, and Biomarkers (ASCT+B) tables
- Standards & recommendations
- QA/QC & curation recommendations
- Search & navigation features
- Visualization features
- Cell annotations based on ASCT+B tables and Uberon Cell Ontology (link to preview)
- Integrated analyses of multiple datasets from the same organ
- Ability to map user-generated data onto HuBMAP references
- Quality of Life enhancements

For the next release, we are currently planning integrative analyses, additional spatial information for select organs (CCF), and submission of investigator data.

Is there a “MAP” in HuBMAP data release?

The HuBMAP map is three-dimensional (3D) to capture the 3D context of single cells and anatomical structures. The first portal release features a 3D Visible Human Common Coordinate Framework (CCF) with two organs: kidney and spleen. A total of 116 samples from 27 donors provided by two Tissue Mapping Centers (TMCs) have been registered (or mapped) into this spatially and semantically explicit reference system. Use the [CCF Exploration User Interface](/ccf-eui) to explore data spatially and semantically. Watch a short video introduction [here.](https://www.youtube.com/watch?v=DDmP_7vDy-o)

Can I make HuBMAP data available through my resource?

You can use HuBMAP data for any purposes permitted by the Data Sharing Policy: https://hubmapconsortium.org/policies/external-data-sharing-policy/. The CCF 3D Reference Object Library provides anatomically correct reference organs. The organs are developed by a specialist in 3D medical illustration and approved by organ experts, see details [here.](https://hubmapconsortium.github.io/ccf/pages/ccf-3d-reference-library.html) Included in the 1st release are 10 organ objects that can be freely used in teaching, research, or commercial applications.

Where should I report any errors / make suggestions for new portal features or functionality / ask more questions about using the portal?

We welcome your comments and your help to identify errors and define priorities for future portal releases. You can provide error reports, make suggestions, or ask questions through the form at https://hubmapconsortium.org/contact-form/.

How would I cite and reference HuBMAP data and code?

To acknowledge HuBMAP data in publications or presentations, we suggest: “The results \<published or shown\> here are in whole or part based upon data generated by the HuBMAP Program: https://hubmapconsortium.org."

The HuBMAP marker paper should be cited as:

- Snyder, M.P., Lin, S., Posgai, A. et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019). https://doi.org/10.1038/s41586-019-1629-x.

The Visible Human reference organs are freely available via the CCF 3D Reference Object Library. Please cite as:

- Browne K, Cross LE, Herr, II BW, Record EG, Quardokus EM, Bueckle A, Börner K. 2020. [HuBMAP CCF 3D Reference Object Library.](https://hubmapconsortium.github.io/ccf/pages/ccf-3d-reference-library.html)

Can I use figures from HuBMAP in my publications?

Yes, as long as you cite the source of the figure.

Do you track my activities on your portal?

Yes, interactions with the site are recorded in server logs and on Google Analytics and are mapped to your IP address. In that regard the HuBMAP portal is no different from the rest of the internet.

What web-browsers are supported in the portal?

All modern, mainstream browsers are supported (e.g. Chrome, Edge, Firefox, Safari).

How can I run queries? What type queries are currently supported? Do you support saving my prior queries? Can you show me examples of queries run by others?

At this time, data is accessed solely through the HuBMAP portal and its visualization tools. The consortium is working on indexing genomics data to support queries in a future release.

Can I suggest types of queries I am interested in?

We are always happy to hear suggestions for additions and improvements. Please make any suggestions you have via the form at https://hubmapconsortium.org/contact-form.

Do you support an API for programmatic access?

The HuBMAP portal is built using an extensible API structure that supports all component interactions. APIs are being registered in [SmartAPI](https://smart-api.info/registry?q=hubmap). For external access to APIs, please submit a request to help@hubmapconsortium.org.

What are the current features of the API?

The HuBMAP APIs underpin all provenance, data access, processing, translation, search and access controls. The APIs also report the versions and uptime statuses of all the Docker containers that constitute HuBMAP’s microservices orchestration architecture.

What are the expectations about future API features?

APIs are extensible and are expected to be expanded progressively. The next major set of APIs will deliver the underpinning transactions needed for semantic search.

What are the licensing requirements to use this data?

The CCF 3D Reference Object Library was released under [Attribution 4.0 International (CC BY 4.0).](https://creativecommons.org/licenses/by/4.0/)

What are the licensing requirements to use any of the software? Where is the software?

Most of the software implemented for HuBMAP is licensed under the [MIT License](https://en.wikipedia.org/wiki/MIT_License) or [GPL v3 License.](https://www.gnu.org/licenses/gpl-3.0.en.html) All source code is on GitHub at https://github.com/hubmapconsortium/. A few source code repositories have different open source licensing, which you can verify by viewing the LICENSE file in the respective repository.

What are the licensing requirements to use any of the APIs?

The HuBMAP APIs are released under [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) as is SmartAPI.

How do you plan to handle error fixes and updates? Will I be notified of these?

You can submit a bug or request a new feature in the Data Portal through the form at https://hubmapconsortium.org/contact-form/. To be sure you are up-to-date on all HuBMAP news, including updates to the Data Portal, sign up for the mailing list at https://hubmapconsortium.org/hubmap-mailing-list.

Experimental Design & Data Analysis Questions

Where can I find information about experimental design associated with HuBMAP data?

An overview of the experimental design and choice of modalities can be found in this reference:

- Snyder, M.P., Lin, S., Posgai, A. et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019). https://doi.org/10.1038/s41586-019-1629-x ([PMC6800388](http://www.ncbi.nlm.nih.gov/pmc/articles/pmc6800388/))

Additional information on experimental design for each modality featured in the portal can be obtained on protocols.io as listed below. Further questions can be directed to the dataset contacts detailed within the portal.

Overview protocols:

#### University of Florida:

- CODEX: https://www.protocols.io/view/hubmap-tmc-florida-zurich-codex-modality-overview-be9pjh5n
- 10x: https://www.protocols.io/view/hubmap-uf-tmc-10x-genomics-scrnaseq-modality-overv-be79jhr6
- Imaging Mass Cytometry: https://www.protocols.io/view/imaging-mass-cytometry-modality-overview-bgatjsen

#### Vanderbilt University:

- Overview: https://www.protocols.io/view/vu-biomolecular-multimodal-imaging-center-biomic-k-bfskjncw

#### UCSD:

- https://www.protocols.io/view/human-kidney-urinary-tract-and-lung-cell-type-mapp-bj9wkr7e

Where can I find information about HuBMAP experimental protocols?

All published protocols that are used in HuBMAP are available on protocols.io here: https://www.protocols.io/groups/human-biomolecular-atlas-program-hubmap-method-development.

How do you define raw and processed data?

We define raw data as the data that comes directly off the instrument (e.g. mass spectrometer, microscope), while processed data has been transformed in some manner (e.g. normalized, background subtracted, aligned). The level of processing is defined by the data state as detailed below. Data states are dependent upon the modality. In general, data state 0 (raw data) and state 1 (processed data) are available on the portal for downloading.

Microscopy:

| Data State | Description | Example file type |
| --- | --- | --- |
| 0 | Raw image data: This is the data that comes directly off the instrument without preprocessing (may not always be included). | CZI, TIFF |
| 1 | Processed data: Can include stitching, thresholding, background subtraction, z-stack alignment, deconvolution | CZI, TIFF, OME-TIFF |
| 2 | Segmentation: Computationally predicted cell (nucleus, cytoplasm) and/or structural boundaries (tubules, ventricles, etc.) | CSV, TIFF |
| 3 | Annotation (Cells and Structures): Interpretation of microscopy image and/or segmentation in terms of biology (e.g. unhealthy vs healthy, cell-type, function, functional region). | TIFF, PNG |

Mass Spectrometry:

| Data State | Description | Example file type |
| --- | --- | --- |
| 0 | Raw image data: This is the data that comes directly off the mass spectrometer without preprocessing; sometimes referred to as raw spectral data. | imzML |
| 1 | Processed imaging MS data: Can include peak alignment, intensity normalization, m/z recalibration | CSV, OME-TIFF |

Sequencing:

| Data State | Description | Example file type |
| --- | --- | --- |
| 0 | Raw data: This is the raw (unprocessed) sequence data generated directly by the sequencing instrument, in files with Phred quality scores (FASTQ). | FASTQ |
| 1 | Aligned data: SAM files contain sequence data that has been aligned to a reference genome and includes chromosome coordinates. BAM files are compressed binary versions of SAM files. The reference genome used is hg38. | SAM, BAM |
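
For readers who want to work with the data states above in code, the following is a minimal Python sketch of the three tables as a lookup. The dictionary keys (`microscopy`, `mass_spectrometry`, `sequencing`) and the function names are illustrative assumptions, not identifiers used by the HuBMAP portal or its APIs.

```python
# Data state labels per modality, transcribed from the tables above.
# The modality keys are hypothetical names chosen for this sketch.
DATA_STATES = {
    "microscopy": {
        0: "Raw image data",
        1: "Processed data",
        2: "Segmentation",
        3: "Annotation (Cells and Structures)",
    },
    "mass_spectrometry": {
        0: "Raw image data",
        1: "Processed imaging MS data",
    },
    "sequencing": {
        0: "Raw data",
        1: "Aligned data",
    },
}


def describe_state(modality, state):
    """Return the label for a data state, or raise ValueError if it is not defined."""
    try:
        return DATA_STATES[modality][state]
    except KeyError:
        raise ValueError(
            f"No data state {state!r} defined for modality {modality!r}"
        ) from None


def is_generally_downloadable(state):
    """States 0 and 1 are, in general, the ones available for download."""
    return state in (0, 1)


print(describe_state("sequencing", 1))
print(is_generally_downloadable(2))
```

Keeping the states in one table per modality reflects the point that a given state number means something different for each modality, and that states 2 and 3 are only defined for microscopy here.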

In what formats is your data available?

Imaging-based raw or processed data is available in TIFF or OME-TIFF format. Segmented imaging data is generated in CSV and TIFF formats. Annotated imaging data is available as TIFF, PNG, and PDF. Raw sequence data is provided as FASTQ, with metadata as TSV. Imaging mass spectrometry raw data is provided as a .d file, and processed data as imzML or as a CSV and a series of OME-TIFFs.

Can I get the code that you used to process data?

All available code can be found on the HuBMAP github page (https://github.com/hubmapconsortium).

Are there recommended tools for handling large amounts of data from HuBMAP?

Answer coming soon.

Can I find an explanation for the choice of HuBMAP pipelines and algorithms?

Brief descriptions of the HuBMAP data analysis pipelines are available through the portal at https://portal.hubmapconsortium.org/docs/pipelines. All code made available to users can be found on the HuBMAP github page (https://github.com/hubmapconsortium).

Which samples should I consider to be technical replicates?

Technical replicates are repeated measurements of the same existing sample. As even serial tissue sections represent distinct samples, we do not consider any images of tissues to be technical replicates. Technical replicates for sequencing assays would be any sequencing libraries generated from the same sample or aliquot of cells or nuclei.

Which samples should I consider to be biological replicates?

Biological replicates are datasets from samples that originate from the same organ and organ donor. As such, each dataset within the HuBMAP database that is provided for a given donor organ for a comparable anatomical region/structure would be a biological replicate.
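
The two replicate definitions above can be expressed as a grouping rule. The Python sketch below is illustrative only: the record fields (`id`, `donor`, `organ`, `region`, `sample`, `modality`) are hypothetical and do not correspond to actual HuBMAP metadata column names.

```python
from collections import defaultdict


def group_replicates(datasets):
    """Group dataset IDs into biological and technical replicate sets.

    Biological replicates share donor, organ, and anatomical region.
    Technical replicates are sequencing datasets from the same sample;
    imaging datasets are never treated as technical replicates.
    """
    biological = defaultdict(list)
    technical = defaultdict(list)
    for d in datasets:
        biological[(d["donor"], d["organ"], d["region"])].append(d["id"])
        if d["modality"] == "sequencing":
            technical[d["sample"]].append(d["id"])
    # A group of one has nothing to be a replicate of, so drop singletons.
    bio = {key: ids for key, ids in biological.items() if len(ids) > 1}
    tech = {key: ids for key, ids in technical.items() if len(ids) > 1}
    return bio, tech


# Hypothetical records: two sequencing libraries from one sample, plus two images.
datasets = [
    {"id": "DS1", "donor": "D1", "organ": "kidney", "region": "cortex",
     "sample": "S1", "modality": "sequencing"},
    {"id": "DS2", "donor": "D1", "organ": "kidney", "region": "cortex",
     "sample": "S1", "modality": "sequencing"},
    {"id": "DS3", "donor": "D1", "organ": "kidney", "region": "cortex",
     "sample": "S2", "modality": "microscopy"},
    {"id": "DS4", "donor": "D2", "organ": "spleen", "region": "white pulp",
     "sample": "S3", "modality": "microscopy"},
]

bio, tech = group_replicates(datasets)
print(bio)
print(tech)
```

In this example the three kidney cortex datasets from donor D1 form one biological replicate group, while only the two sequencing libraries from sample S1 count as technical replicates.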

How is each organ/tissue handled to maintain high reproducibility and validity for the assay?

Detailed processing protocols with QA/QC are available on protocols.io: [Human BioMolecular Atlas Program (HuBMAP) Method Development Community timeline](https://www.protocols.io/workspaces/human-biomolecular-atlas-program-hubmap-method-development)

How have antibodies been verified?

For the first data release, all antibodies were validated by individual groups. With the upcoming second data release in 2021, complete antibody information will be available, including antibody clone, vendor, RRID, conjugation information, etc. Additional antibody validation standards will be implemented in subsequent data releases. For the development of our antibody validation levels, we followed the antibody verification guidelines established in the following manuscripts:

- [A proposal for validation of antibodies.](https://pubmed.ncbi.nlm.nih.gov/27595404/) Uhlen M, Bandrowski A, Carr S, Edwards A, Ellenberg J, Lundberg E, Rimm DL, Rodriguez H, Hiltke T, Snyder M, Yamamoto T. Nat Methods. 2016 Oct;13(10):823-7. doi: 10.1038/nmeth.3995. Epub 2016 Sep 5. PMID: 27595404
- [The Antibody Society’s antibody validation webinar series](https://pubmed.ncbi.nlm.nih.gov/32748696/) Voskuil, J., Bandrowski, A., Begley, C. G., Bradbury, A., Chalmers, A. D., Gomes, A. V., Hardcastle, T., Lund-Johansen, F., Plückthun, A., Roncador, G., Solache, A., Taussig, M. J., Trimmer, J. S., Williams, C., & Goodman, S. L. MAbs. 2020;12(1):1794421. doi:10.1080/19420862.2020.1794421. PMID: 32748696