Result: Association Plots visualize cluster-specific genes from high-dimensional transcriptomics data

Title:
Association Plots visualize cluster-specific genes from high-dimensional transcriptomics data
Authors:
Publication Year:
2022
Collection:
Max Planck Society: MPG.PuRe
Document Type:
Dissertation doctoral or postdoctoral thesis
File Description:
application/pdf
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/urn/https://refubium.fu-berlin.de/handle/fub188/35663
Rights:
info:eu-repo/semantics/openAccess ; https://creativecommons.org/licenses/by-nc-nd/4.0/
Accession Number:
edsbas.8D04E8C4
Database:
BASE

Further Information

A recurring problem in transcriptomics data analysis is the search for associations between clusters of conditions and the highly expressed genes these conditions share. Approaches to this problem take many forms, for instance biclustering or the search for marker genes. While identifying marker genes is fairly easy for small data sets, for complex data sets such as single-cell RNA-seq it poses a significant challenge to currently available analysis and visualization methods. In particular, low-dimensional data representation methods such as principal component analysis (PCA) lead to information loss, as they do not show the information contained in higher dimensions. In this thesis, we address this problem by presenting Association Plots (APs), a novel method for determining and visualizing cluster-specific genes in high-dimensional data. APs are derived from correspondence analysis (CA), a projection method similar to PCA which, however, enables the joint embedding of genes and conditions. In such an embedding, genes associated with a cluster of conditions lie in a particular direction in high-dimensional space. Measuring distances between genes and conditions leads to APs, which are independent of the data dimensionality and can aid in delineating marker genes. We present the application of APs to bulk and single-cell RNA-seq data through several examples. First, we show the identification of marker genes using APs on Genotype-Tissue Expression (GTEx) and 3k Peripheral Blood Mononuclear Cell (PBMC) data. Next, we present how APs aid in cell cluster annotation using a predefined list of marker genes on the human cell atlas of fetal gene expression data. We also demonstrate how to apply APs for studying similarities between clusters in the data, and we compare results from APs to results from existing differential expression testing tools. Finally, we demonstrate APL, the Bioconductor R package and Shiny app developed in this work.
APL implements the concept of APs and is integrated with the Gene ...
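
The idea summarized in the abstract (a joint CA embedding of genes and conditions, followed by distance measurements relative to a cluster of conditions) can be illustrated with a minimal NumPy sketch. This is not the APL package's implementation. The function names `correspondence_analysis` and `association_plot_coords`, the toy count matrix, and the scaling choice (genes in principal coordinates, conditions in standard coordinates) are assumptions made for illustration only. The sketch also assumes a non-negative matrix with no all-zero rows or columns.

```python
import numpy as np

def correspondence_analysis(counts):
    """Basic CA of a genes x conditions matrix (illustrative sketch only)."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # gene (row) masses
    c = P.sum(axis=0)                      # condition (column) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    keep = d > 1e-12                       # drop numerically empty dimensions
    U, d, V = U[:, keep], d[keep], Vt[keep].T
    genes = (U * d) / np.sqrt(r)[:, None]  # assumed scaling: principal coordinates
    conditions = V / np.sqrt(c)[:, None]   # assumed scaling: standard coordinates
    return genes, conditions

def association_plot_coords(genes, conditions, cluster):
    """Two coordinates per gene relative to the centroid of a condition cluster."""
    centroid = conditions[cluster].mean(axis=0)
    direction = centroid / np.linalg.norm(centroid)
    x = genes @ direction                  # length along the cluster direction
    residual = genes - np.outer(x, direction)
    y = np.linalg.norm(residual, axis=1)   # distance from the cluster direction
    return x, y

# Hypothetical toy data: 4 genes (rows) x 4 conditions (columns).
counts = np.array([
    [90, 80,  5,  5],   # high in conditions 0 and 1
    [ 5,  5, 85, 95],   # high in conditions 2 and 3
    [30, 30, 30, 30],   # uniformly expressed
    [20, 25, 22, 21],   # roughly uniform
])
genes, conditions = correspondence_analysis(counts)
x, y = association_plot_coords(genes, conditions, cluster=[0, 1])
for i, (xi, yi) in enumerate(zip(x, y)):
    print(f"gene {i}: x={xi:.3f}, y={yi:.3f}")
```

Whatever the number of CA dimensions retained, each gene is reduced to two numbers, which is the sense in which such a plot does not depend on the dimensionality of the embedding. In this sketch, genes with a large x and a small y are the candidates for being specific to the chosen cluster.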