Analysis, Data-Management and Data-Publication

scRNA-Seq Analysis, Data-Management and Data-Publication

Rationale

This project will generate large quantities of transcriptomics, genomics and imaging data. To ensure long-term sustainability, reuse and immediate contribution to the Human Cell Atlas project, we will submit all raw data and annotations to the Data Coordination Platform (HCA-DCP). Analysis results will be distributed through public added-value databases such as EMBL-EBI’s Expression Atlas, and we will actively seek their inclusion in infrastructures serving further communities, such as medical research and ELIXIR.

Strategy

To analyse, visualise and distribute all data and methods, ensuring long-term sustainability by integrating and further developing existing resources.

Methods - Data Analysis

We will apply the EMBL-EBI pipelines to perform normalisation and clustering of data, as well as differential expression analysis to detect marker genes. For clustering we will use Single Cell Consensus Clustering and graph-based clustering with Seurat. These cloud-based pipelines have recently been released as an Interactive Analysis Portal, which also underlies the analysis performed for all data sets in EMBL-EBI’s Single Cell Expression Atlas. Trajectory analysis and differential expression tools are part of the same project and will be available for use on healthy and diseased data sets. We will identify novel cell-types together with their markers and regulators and will submit their annotations to the Cell Type Ontology. We will explore differential expression between cell-types in healthy versus Crohn’s Disease (CD) states and identify any differences in their abundance.
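The steps named above (normalisation, marker detection per cluster, and comparison of cell-type abundance) can be illustrated with a minimal sketch. This is not the EMBL-EBI pipeline, Single Cell Consensus Clustering or Seurat: the function names, the toy count matrix and the cluster labels are all hypothetical, and the marker score is a plain difference of mean log-expression rather than a statistical test.

```python
import math
from collections import Counter


def normalise(counts, scale=10_000):
    """Library-size normalise one cell's raw counts, then log1p-transform."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]


def mean_expression(cells):
    """Per-gene mean over a non-empty list of cells (rows = cells)."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def marker_scores(cells, labels, cluster):
    """Per-gene difference in mean log-expression: cluster vs. all other cells."""
    inside = [c for c, lab in zip(cells, labels) if lab == cluster]
    outside = [c for c, lab in zip(cells, labels) if lab != cluster]
    mean_in = mean_expression(inside)
    mean_out = mean_expression(outside)
    return [a - b for a, b in zip(mean_in, mean_out)]


def abundance(labels):
    """Fraction of cells assigned to each cluster or cell-type."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


# Toy data invented for illustration: 4 cells x 3 genes of raw counts.
raw = [
    [90, 5, 5],
    [80, 10, 10],
    [5, 5, 90],
    [10, 10, 80],
]
labels = ["A", "A", "B", "B"]  # hypothetical cluster assignments

norm = [normalise(cell) for cell in raw]
scores_a = marker_scores(norm, labels, "A")
# Index of the gene with the highest score for cluster "A".
top_gene_a = max(range(len(scores_a)), key=scores_a.__getitem__)
fractions = abundance(labels)
```

Running `abundance` separately on healthy and diseased samples and comparing the resulting fractions gives a first, purely descriptive view of differences in cell-type composition; a real analysis would add replicate-aware statistics.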

All protocols and software will be submitted/published as appropriate and will be fully open-access or open-source. Analysis methods developed for the purposes of this project will be made available through the Hinxton Interactive Analysis Portal and shared rapidly with the community, enabling analysis of any data sets available through the HCA-DCP. The Hinxton Interactive Analysis Portal is a CZI-funded platform that accesses gene quantification data sets from the Data Coordination Platform and performs downstream analysis.

Spatial data analysis will be performed with the Comparative Workbench, a CZI-funded platform. The Comparative Workbench is an integrated tool for annotating, visualising and querying spatial data in the context of the atlas.

SC-Data Visualisation

All analysed scRNA-Seq data sets will be made available through the single cell component of Expression Atlas for search and visualisation. Current visualisations include t-SNE plots of cell clusters, cell types and gene expression gradients, as well as a heatmap-based visualisation of marker genes within each cluster of scRNA-Seq experiments. We will develop a 2D gut micro-anatomogram based on the coordinate framework described in the model development and CCF section [LINK]. This will be embedded in the interface to visualise the selected data and to link to downstream analyses provided by the Single Cell Expression Atlas. Further, we will develop tools for easily visualising the differences between healthy and disease expression data sets across the different stages of Crohn’s Disease. It is our common practice to develop such tools as widgets that can be embedded into other interfaces and web-services developed by the community. Within the context of this project we will share these visualisation widgets between the Comparative Workbench [LINK] and Expression Atlas to enable integrated visualisation of gene expression and image data sets. The Comparative Workbench for Atlas Data will be extended to include querying and visualising the single cell data.

Data Distribution

All data and annotations will be submitted to the Data Coordination Platform (HCA-DCP) for analysis, integration, sharing and publication. We already work closely with the data ingestion team of the DCP at EMBL-EBI on submissions of scRNA-Seq data to the platform and will further liaise with the DCP team to ensure submission of the imaging data. Image submission to the DCP is under development. We will therefore collaborate with the DCP team on the metadata standards for gut data and related spatial annotations. Beyond the Human Cell Atlas community, we will leverage the existing links of Expression Atlas to the Open Targets Platform, so that single cell RNA-Seq results from healthy and Crohn’s Disease data sets are available to the drug development community to assist drug target identification and validation. Furthermore, through an ELIXIR Strategic Implementation Study, data sets in Expression Atlas will be made available as part of ELIXIR’s workflow orchestration service, ensuring that experimental results from the Gut Cell Atlas can be distributed further to the ELIXIR community.

Expected Outcomes

  1. Annotated cell-types of the healthy and Crohn’s Disease gut cell atlas, including differences in cell-type composition and gene expression between healthy and diseased tissues.
  2. Differences in expression and cell-type composition across inflammatory and dysplastic stages of Crohn’s disease.
  3. Submission of scRNA-Seq and imaging data into the DCP.
  4. Alignment of imaging metadata and gut coordinate framework with the DCP.
  5. Extended Comparative Workbench interface.
  6. Gut atlas interfaces for spatial annotation and query. These include a portal providing web-services responding to transformation requests between each atlas representation.
  7. 2D gut anatomogram widget.
  8. Tools and embeddable widgets for exploration of differential expression in scRNA-Seq.
  9. Availability of scRNA-Seq analysis results through Single Cell Expression Atlas.