Single-cell multi-omics integration
Single-cell multi-omics integration describes a suite of computational methods used to harmonize information from multiple "omes" in order to jointly analyze biological phenomena[1][2][3][4]. By drawing associations across several molecular layers simultaneously, this approach allows researchers to discover relationships between different chemical-physical modalities. Multi-omics integration approaches can be grouped into four broad categories: early integration, intermediate integration, late integration, and mixed integration methods[5]. Multi-omics integration can enhance experimental robustness by providing independent sources of evidence to address hypotheses, by leveraging the strengths of one modality to compensate for the weaknesses of another through imputation, and by offering cell-type clustering and visualizations that more closely reflect the underlying biology[1][2].
Background
The emergence of single-cell sequencing technologies has transformed our understanding of cellular heterogeneity, uncovering a nuanced landscape of cell types and their associations with biological processes. Single-cell omics technologies have extended beyond the transcriptome to profile diverse physical-chemical properties at single-cell resolution, including whole genomes/exomes, DNA methylation, chromatin accessibility, histone modifications, epitranscriptome (e.g., mRNAs, microRNAs, tRNAs, lncRNAs), proteome, phosphoproteome, metabolome, and more[3][6][7]. There is also an expanding repository of publicly available datasets, exemplified by growing databases such as the Human Cell Atlas Project (HCA), The Cancer Genome Atlas (TCGA), and the ENCODE project[8][9][10][11][12]. With the increasing diversity in both available datasets and data types, multi-omics data integration and multimodal data analysis represent pivotal directions for the future of systems biology.
Single-cell multi-omics integration can reveal underappreciated relationships between chemical-physical modalities, broaden the definition of cell states beyond single-modality feature profiles, and provide independent evidence during analysis to support the testing of biological hypotheses. However, the high dimensionality (features > observations), the high degree of stochastic technical and biological variability, and the sparsity of single-cell data (low molecule recovery efficiency) make computational integration a challenging problem[13][14][15][16]. Furthermore, the appropriate solution depends on factors such as whether the data are matched (simultaneous measurements derived from the same cell) or unmatched (measurements derived from different cells), whether cell-type annotations are available, and whether features can be converted between modalities[1]. As a result, there are multiple approaches to single-cell data integration, each tailored to a distinct use case and each with its own advantages and disadvantages[1][5][17].
Methodology
Early Integration
Early integration involves concatenating two or more omic datasets (e.g., scRNA-seq data and scATAC-seq data) into a single merged data matrix. This approach is simple and can account for dependencies between features across modalities. However, the concatenated features differ in dimension and scale, and the merged matrix has an even higher dimensionality than either input. To mitigate these issues, feature selection and dimensionality reduction (e.g., PCA, CCA, NMF) are usually applied, and are often necessary. Because of these challenges, early integration has most commonly been used to combine different datasets of the same data type (e.g., integrating two different scRNA-seq datasets).
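The concatenate-then-reduce workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular tool: the function name `early_integration`, the per-feature z-scoring, the division by the square root of the feature count (so that the larger modality does not dominate), and the simulated count matrices are all hypothetical choices made for the example. It also assumes matched data, so that row i of both matrices refers to the same cell.

```python
import numpy as np

def early_integration(rna, atac, n_components=2):
    """Concatenate two matched cells-by-features matrices and reduce with PCA.

    Hypothetical helper for illustration; assumes row i of both inputs
    is the same cell.
    """
    blocks = []
    for x in (rna, atac):
        x = np.asarray(x, dtype=float)
        sd = x.std(axis=0)
        sd[sd == 0] = 1.0                      # guard against constant features
        z = (x - x.mean(axis=0)) / sd          # put features on a common scale
        blocks.append(z / np.sqrt(z.shape[1])) # balance modalities of unequal width
    merged = np.hstack(blocks)                 # the single merged data matrix
    # PCA via SVD of the (already column-centered) merged matrix
    u, s, _ = np.linalg.svd(merged, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    return merged, embedding

# Simulated stand-ins for scRNA-seq counts and binarized scATAC-seq peaks
rng = np.random.default_rng(0)
rna = rng.poisson(2.0, size=(100, 50))
atac = rng.binomial(1, 0.1, size=(100, 200))

merged, embedding = early_integration(rna, atac, n_components=5)
print(merged.shape, embedding.shape)
```

The merged matrix has one column per feature from either modality, which is why the dimensionality grows with every dataset added; the low-dimensional embedding is what would typically be passed on to clustering or visualization.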
Intermediate Integration
dfadsf
Mixed Integration
hdfghdfh
Late Integration
dgfgfsgsdg
Dimensionality Reduction
ghdfh
Considerations of Data Integration
Noise
dfasdf
Dataset Compatibility
safasdfdsfds
Dimensionality
dfsdfadfds
Oversimplification of Modality Mapping
dsafsfdfa
Interpretability and Validation
asfdfsfd
Matched and Unmatched Data
dfdasfasfdasf
Applications and Uses
While single-modality datasets have proven to be a mainstay in systems biology, combining biological information across multiple modalities has the potential to address biological questions that cannot be answered by a single data type alone. For example, the integration of transcriptome and DNA accessibility data has enabled the development of bioinformatic tools to infer cell-type-specific gene regulatory networks[18][19][20]. This is achieved by combining transcription factor and target gene expression with cis-regulatory information to identify relevant transcription factors and their regulatory partners. Another application of multi-omics integration is expanding definitions of cell states to incorporate features observed across multiple modalities. For instance, integrating protein marker detection with transcriptome profiling using a multi-omics sequencing technology such as CITE-seq can resolve cell state signatures based on joint gene regulatory and surface marker expression[21]. This enables more robust inferences regarding cellular phenotypes, which are akin to and directly comparable with results from classical flow cytometry. Moreover, defining cell states by clustering within an integrated latent space may offer more stable estimates of cellular phenotypes than analysis within a single-modality latent space[1]. Finally, multi-omics integration can overcome modality-specific limitations. For example, most spatial transcriptomic sequencing technologies suffer from limited spatial resolution (pixels comprising a mixture of local cells) and low feature complexity[22]. Integrating spatial transcriptomics with scRNA-seq can help overcome these limitations by supporting the spatial deconvolution of low-resolution readouts and estimating the frequency of each cell type[23][24].
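The idea behind spatial deconvolution can be illustrated with a simple linear mixing model, in which the expression measured at one spatial pixel is treated as a non-negative combination of cell-type signatures derived from an scRNA-seq reference. The sketch below is an assumption-laden toy, not the probabilistic models used by the tools cited above: the function name `deconvolve_spot`, the projected-gradient solver for non-negative least squares, and the simulated signatures are all hypothetical choices made for the example.

```python
import numpy as np

def deconvolve_spot(signatures, spot, n_iter=2000):
    """Estimate cell-type proportions for one spatial pixel.

    signatures: genes x cell_types matrix of reference mean expression
    spot:       length-genes expression vector for the mixed pixel
    Solves min ||signatures @ w - spot||^2 subject to w >= 0 by projected
    gradient descent, then normalizes w to sum to 1.
    """
    S = np.asarray(signatures, dtype=float)
    y = np.asarray(spot, dtype=float)
    step = 1.0 / np.linalg.norm(S, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - y)             # gradient of 0.5 * squared residual
        w = np.maximum(w - step * grad, 0.0) # project back onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w

# Simulated reference: 60 genes, 3 cell types (hypothetical values)
rng = np.random.default_rng(1)
signatures = rng.gamma(2.0, 1.0, size=(60, 3))
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props               # noise-free mixed pixel

estimated = deconvolve_spot(signatures, spot)
print(np.round(estimated, 3))
```

On noise-free simulated data the estimated proportions should closely match the mixture used to build the pixel; real spatial data are sparse and noisy, which is one reason published methods rely on count-based likelihoods and spatial priors rather than plain least squares.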
References
- ^ a b c d e Miao, Zhen; Humphreys, Benjamin D; McMahon, Andrew P; Kim, Junhyong (2021). "Multi-omics integration in the age of million single-cell data". Nat Rev Nephrol. 17 (11): 710–724. doi:10.1038/s41581-021-00463-x. PMC 9191639. PMID 34417589.
- ^ a b Subramanian, Indhupriya (2020). "Multi-omics Data Integration, Interpretation, and Its Application". Bioinform Biol Insights. 14. doi:10.1177/1177932219899051. PMID 32076369.
- ^ a b Stuart, Tim; Satija, Rahul (2019). "Integrative single-cell analysis". Nat Rev Genet. 20 (5): 257–272. doi:10.1038/s41576-019-0093-7. PMID 30696980. S2CID 59409752.
- ^ Li, Yunjin; Ma, Lu; Wu, Duojiao; Chen, Geng (2021). "Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine". Brief Bioinform. 22 (5). doi:10.1093/bib/bbab024. PMID 33778867.
- ^ a b Adossa, Nigatu; Khan, Sofia; Rytkönen, Kalle T; Elo, Laura L (2021). "Computational strategies for single-cell multi-omics integration". Comput Struct Biotechnol J. 19: 2588–2596. doi:10.1016/j.csbj.2021.04.060.
- ^ Baysoy, Alev; Bai, Zhiliang; Satija, Rahul; Fan, Rong (2023). "The technological landscape and applications of single-cell multi-omics". Nat Rev Mol Cell Biol. 24 (10): 695–713. doi:10.1038/s41580-023-00615-w. PMC 10242609. PMID 37280296.
- ^ Macaulay, Iain C; Ponting, Chris P; Voet, Thierry (2017). "Single-Cell Multiomics: Multiple Measurements from Single Cells". Trends Genet. 33 (2): 155–168. doi:10.1016/j.tig.2016.12.003.
- ^ Regev, Aviv; Teichmann, Sarah A; Lander, Eric S; Amit, Ido; Benoist, Christophe; Birney, Ewan; Bodenmiller, Bernd; Campbell, Peter; Carninci, Piero; Clatworthy, Menna; Clevers, Hans; Deplancke, Bart; Dunham, Ian; Eberwine, James; Eils, Roland; Enard, Wolfgang; Farmer, Andrew; Fugger, Lars; Göttgens, Berthold; Hacohen, Nir; Haniffa, Muzlifah; Hemberg, Martin; Kim, Seung; Klenerman, Paul; Kriegstein, Arnold; Lein, Ed; Linnarsson, Sten; Lundberg, Emma; Lundeberg, Joakim; Majumder, Partha; Marioni, John C; Merad, Miriam; Mhlanga, Musa; Nawijn, Martijn; Netea, Mihai; Nolan, Garry; Pe'er, Dana; Phillipakis, Anthony; Ponting, Chris P; Quake, Stephen; Reik, Wolf; Rozenblatt-Rosen, Orit; Sanes, Joshua; Satija, Rahul; Schumacher, Ton N; Shalek, Alex; Shapiro, Ehud; Sharma, Padmanee; Shin, Jay W; Stegle, Oliver; Stratton, Michael; Stubbington, Michael J T; Theis, Fabian J; Uhlen, Matthias; Van Oudenaarden, Alexander; Wagner, Allon; Watt, Fiona; Weissman, Jonathan; Wold, Barbara; Xavier, Ramnik; Yosef, Nir (2017). "The Human Cell Atlas". eLife. 6. doi:10.7554/eLife.27041. PMC 5762154. PMID 29206104.
- ^ Lindeboom, Rik G.H; Regev, Aviv; Teichmann, Sarah A (2021). "Towards a Human Cell Atlas: Taking Notes from the Past". Trends Genet. 37 (7): 625–630. doi:10.1016/j.tig.2021.03.007. PMID 33879355.
- ^ Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M (2013). "The Cancer Genome Atlas Pan-Cancer analysis project". Nat Genet. 45 (10): 1113–1120. doi:10.1038/ng.2764. PMID 24071849.
- ^ The ENCODE Project Consortium (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
- ^ The ENCODE Project Consortium (2020). "Expanded encyclopaedias of DNA elements in the human and mouse genomes". Nature. 583 (7818): 699–710. doi:10.1038/s41586-020-2493-4. PMID 32728249.
- ^ Lähnemann, David; Köster, Johannes; Szczurek, Ewa; McCarthy, Davis J; Hicks, Stephanie C; Robinson, Mark D; Vallejos, Catalina A; Campbell, Kieran R; Beerenwinkel, Niko; Mahfouz, Ahmed; Pinello, Luca; Skums, Pavel; Stamatakis, Alexandros; Attolini, Camille Stephan-Otto; Aparicio, Samuel; Baaijens, Jasmijn; Balvert, Marleen; Barbanson, Buys De; Cappuccio, Antonio; Corleone, Giacomo; Dutilh, Bas E; Florescu, Maria; Guryev, Victor; Holmer, Rens; Jahn, Katharina; Lobo, Thamar Jessurun; Keizer, Emma M; Khatri, Indu; Kielbasa, Szymon M; Korbel, Jan O; Kozlov, Alexey M; Kuo, Tzu-Hao; Lelieveldt, Boudewijn P.F; Mandoiu, Ion I; Marioni, John C; Marschall, Tobias; Mölder, Felix; Niknejad, Amir; Rączkowska, Alicja; Reinders, Marcel; Ridder, Jeroen De; Saliba, Antoine-Emmanuel; Somarakis, Antonios; Stegle, Oliver; Theis, Fabian J; Yang, Huan; Zelikovsky, Alex; McHardy, Alice C; Raphael, Benjamin J; Shah, Sohrab P; Schönhuth, Alexander (2020). "Eleven grand challenges in single-cell data science". Genome Biol. 21 (1): 31. doi:10.1186/s13059-020-1926-6. PMC 7007675. PMID 32033589.
- ^ Santiago-Rodriguez, Tasha M; Hollister, Emily B (2021). "Multi 'omic data integration: A review of concepts, considerations, and approaches". Semin Perinatol. 45 (6). doi:10.1016/j.semperi.2021.151456. PMID 34256961. S2CID 235822759.
- ^ Yuan, Guo-Cheng; Cai, Long; Elowitz, Michael; Enver, Tariq; Fan, Guoping; Guo, Guoji; Irizarry, Rafael; Kharchenko, Peter; Kim, Junhyong; Orkin, Stuart; Quackenbush, John; Saadatpour, Assieh; Schroeder, Timm; Shivdasani, Ramesh; Tirosh, Itay (2017). "Challenges and emerging directions in single-cell analysis". Genome Biol. 18 (1): 84. doi:10.1186/s13059-017-1218-y. PMC 5421338. PMID 28482897.
- ^ Argelaguet, Ricard; Cuomo, Anna S. E; Stegle, Oliver; Marioni, John C (2021). "Computational principles and challenges in single-cell data integration". Nat Biotechnol. 39 (10): 1202–1215. doi:10.1038/s41587-021-00895-7. PMID 33941931. S2CID 233722751.
- ^ Wu, Yan; Zhang, Kun (2020). "Tools for the analysis of high-dimensional single-cell RNA sequencing data". Nat Rev Nephrol. 16 (7): 408–421. doi:10.1038/s41581-020-0262-0. S2CID 214672522.
- ^ Kim, Daniel; Tran, Andy; Kim, Hani Jieun; Lin, Yingxin; Yang, Jean Yee Hwa; Yang, Pengyi (2023). "Gene regulatory network reconstruction: harnessing the power of single-cell multi-omic data". npj Syst Biol Appl. 9 (1): 51. doi:10.1038/s41540-023-00312-6. PMC 10587078. PMID 37857632.
- ^ Bravo González-Blas, Carmen; De Winter, Seppe; Hulselmans, Gert; Hecker, Nikolai; Matetovici, Irina; Christiaens, Valerie; Poovathingal, Suresh; Wouters, Jasper; Aibar, Sara; Aerts, Stein (2023). "SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks". Nat Methods. 20 (9): 1355–1367. doi:10.1038/s41592-023-01938-4. PMC 10482700. PMID 37443338.
- ^ Fleck, Jonas Simon; Jansen, Sophie Martina Johanna; Wollny, Damian; Zenk, Fides; Seimiya, Makiko; Jain, Akanksha; Okamoto, Ryoko; Santel, Malgorzata; He, Zhisong; Camp, J. Gray; Treutlein, Barbara (2023). "Inferring and perturbing cell fate regulomes in human brain organoids". Nature. 621 (7978): 365–372. doi:10.1038/s41586-022-05279-8. PMID 36198796.
- ^ Stoeckius, Marlon; Hafemeister, Christoph; Stephenson, William; Houck-Loomis, Brian; Chattopadhyay, Pratip K; Swerdlow, Harold; Satija, Rahul; Smibert, Peter (2017). "Simultaneous epitope and transcriptome measurement in single cells". Nat Methods. 14 (9): 865–868. doi:10.1038/nmeth.4380. PMC 5669064. PMID 28759029.
- ^ Atta, Lyla; Fan, Jean (2021). "Computational challenges and opportunities in spatially resolved transcriptomic data analysis". Nat Commun. 12 (1): 5283. doi:10.1038/s41467-021-25557-9. PMC 8421472. PMID 34489425.
- ^ Andersson, Alma; Bergenstråhle, Joseph; Asp, Michaela; Jurek, Aleksandra; Fernández Navarro, José; Lundeberg, Joakim (2020). "Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography". Commun Biol. 3 (1): 565. doi:10.1038/s42003-020-01247-y. PMID 33037292.
- ^ Ma, Ying; Zhou, Xiang (2022). "Spatially informed cell-type deconvolution for spatial transcriptomics". Nat Biotechnol. 40 (9): 1349–1359. doi:10.1038/s41587-022-01273-7. PMID 35501392.