
Human genetic clustering

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Millager (talk | contribs) at 21:02, 18 April 2021 (nearly done with genetic clustering & race section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Human genetic clustering refers to a wide range of scientific and statistical methods often used to characterize patterns and subgroups within studies of human genetic variation.

Clustering studies are thought to be valuable for characterizing the general structure of genetic variation among human populations and for contributing to the study of ancestral origins, evolutionary history, and precision medicine. Since the mapping of the human genome, and with the availability of increasingly powerful analytic tools, cluster analyses have revealed a range of ancestral and migratory trends among human populations and individuals.[1] Humans tend to cluster by geographic ancestry, with divisions between clusters aligning largely with geographic barriers such as oceans or mountain ranges. However, the practice of defining clusters among modern human populations is largely arbitrary and variable: although individual genetic markers can be used to produce smaller groups, no model produces completely distinct subgroups when larger numbers of genetic markers are used.[2][3]

Studies of human genetic clustering have been implicated in discussions of race, ethnicity, and scientific racism, as some have controversially suggested that genetically derived clusters may be understood as proof of genetically determined races.[4][5] Although cluster analyses invariably organize humans (or groups of humans) into subgroups, debate is ongoing about how to interpret these genetic clusters with respect to race and its social and phenotypic features. Moreover, because overall genetic variation among human genotypes is small, the results of genetic clustering depend heavily on the sampled data, the genetic markers chosen, and the statistical methods used to construct the clusters.

Genetic clustering algorithms and methods

A wide range of methods has been developed to assess the structure of human populations using genetic data. Early studies of within-group and between-group genetic variation used physical phenotypes and blood groups, whereas modern genetic studies use genetic markers such as restriction site polymorphisms, short tandem repeat polymorphisms, and single nucleotide polymorphisms (SNPs), among others.[6] Models for genetic clustering also vary in the algorithms and programs used to process the data. Most methods for determining clusters can be categorized as either model-based clustering methods or multidimensional summaries.[7][8] Although they process large numbers of SNPs (or other genetic marker data) in different ways, both approaches tend to converge on similar patterns, identifying similarities among SNPs and/or haplotype tracts to reveal ancestral genetic similarities.[8]
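Multidimensional methods typically start from some measure of how similar two individuals' genotypes are. The sketch below assumes an encoding not specified in the text above (each SNP coded as 0, 1 or 2 copies of a reference allele) and uses a simple allele-sharing distance as one possible similarity measure. The function names `allele_sharing_distance` and `distance_matrix` are hypothetical, and published studies may use other encodings and metrics.

```python
from typing import List, Sequence


def allele_sharing_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean allele-sharing distance between two individuals (hypothetical helper).

    Genotypes are assumed to be coded as 0, 1 or 2 copies of a reference
    allele per SNP. Per SNP, |a - b| / 2 is 0 for identical genotypes,
    0.5 when one allele copy differs, and 1 for opposite homozygotes.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(abs(x - y) / 2 for x, y in zip(a, b)) / len(a)


def distance_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Pairwise distances between all individuals (one row per individual)."""
    return [[allele_sharing_distance(g, h) for h in genotypes] for g in genotypes]


if __name__ == "__main__":
    samples = [
        [0, 1, 2, 2],
        [0, 1, 2, 1],
        [2, 1, 0, 0],
    ]
    for row in distance_matrix(samples):
        print([round(d, 3) for d in row])
```

A matrix like this can then be summarized by a multidimensional method or grouped by a clustering algorithm; the first two toy individuals are much closer to each other than either is to the third.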

Model-based clustering

Common model-based clustering algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms find the best fit for genetic data among a number of clusters that is either chosen arbitrarily or derived mathematically, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "admixture inference," as individual genomes (or individuals within populations) can be characterized by the proportions of alleles linked to each cluster.[1] Notably, algorithms like STRUCTURE require that the populations to be sampled are chosen before the cluster analysis is run.
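The idea of admixture inference can be illustrated with a minimal sketch. It assumes the allele frequencies of each cluster are already known and estimates one individual's ancestry proportions by expectation-maximization under a simple binomial model, in which each allele copy is drawn from cluster k with probability q_k. This is a simplified stand-in written for illustration, not the actual procedure of STRUCTURE (which uses Bayesian MCMC sampling), ADMIXTURE, or HAPMIX; `estimate_admixture` and its parameters are hypothetical.

```python
from typing import List, Sequence


def estimate_admixture(
    genotype: Sequence[int],
    freqs: List[List[float]],
    iterations: int = 200,
    eps: float = 1e-6,
) -> List[float]:
    """EM estimate of one individual's ancestry proportions across K clusters.

    genotype[j]: copies (0, 1 or 2) of a reference allele at SNP j.
    freqs[k][j]: frequency of that allele in cluster k, assumed already known
    (a "supervised" simplification; real tools can also estimate frequencies).
    """
    k_count = len(freqs)
    n_snps = len(genotype)
    q = [1.0 / k_count] * k_count  # start from equal proportions
    for _ in range(iterations):
        expected = [0.0] * k_count  # expected allele copies drawn from each cluster
        for j, g in enumerate(genotype):
            # Clamp frequencies away from 0 and 1 to avoid division by zero.
            f = [min(max(freqs[k][j], eps), 1.0 - eps) for k in range(k_count)]
            p_ref = sum(q[k] * f[k] for k in range(k_count))
            p_alt = sum(q[k] * (1.0 - f[k]) for k in range(k_count))
            for k in range(k_count):
                # E-step: posterior probability that a copy came from cluster k,
                # weighted by the number of reference and alternative copies seen.
                expected[k] += g * q[k] * f[k] / p_ref
                expected[k] += (2 - g) * q[k] * (1.0 - f[k]) / p_alt
        # M-step: an individual carries 2 allele copies per SNP.
        q = [e / (2 * n_snps) for e in expected]
    return q


if __name__ == "__main__":
    cluster_freqs = [
        [0.9, 0.9, 0.1, 0.1],  # cluster 0: reference allele common at SNPs 0-1
        [0.1, 0.1, 0.9, 0.9],  # cluster 1: reference allele common at SNPs 2-3
    ]
    print(estimate_admixture([2, 2, 0, 0], cluster_freqs))  # nearly all cluster 0
    print(estimate_admixture([1, 1, 1, 1], cluster_freqs))  # about half and half
```

Because the first toy individual carries the alleles that are common in cluster 0, its estimated proportion for cluster 0 approaches 1; an individual heterozygous at every SNP lands at roughly equal proportions, the kind of mixed profile the term "admixture" refers to.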

Multidimensional summary statistics

Whereas model-based clustering characterizes populations using proportions of discrete clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is principal component analysis (PCA), which plots individuals along two or more axes (their "principal components"), each a combination of genetic markers that accounts for as much of the remaining variance in the data as possible. Clusters can then be identified by assessing the distribution of the data; with larger samples of human genotypes, individuals tend to fall both into discrete groups and into admixed positions between groups.[1][8]
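A toy version of PCA on genotype data is sketched below in pure Python. It assumes genotypes are coded as 0/1/2 allele counts, centers each SNP on its mean, and finds the first principal component by power iteration on the individual-by-individual covariance matrix. The function name is hypothetical; real analyses typically rely on optimized linear algebra libraries, compute several components, and often also standardize each SNP.

```python
import math
import random
from typing import List


def first_principal_component(
    genotypes: List[List[float]], iterations: int = 500, seed: int = 0
) -> List[float]:
    """Each individual's coordinate on PC1 (unit-normalized, sign arbitrary)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP (column) on its mean across individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n-by-n covariance between individuals; its top eigenvector is
    # proportional to the PC1 scores.
    cov = [
        [sum(x[a][j] * x[b][j] for j in range(m)) / m for b in range(n)]
        for a in range(n)
    ]
    # Random start vector (an all-ones start would be orthogonal to centered data).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the matrix and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(val * val for val in w))
        if norm == 0.0:
            raise ValueError("no variance along the start vector")
        v = [val / norm for val in w]
    return v


if __name__ == "__main__":
    group_a = [[2, 2, 0, 0]] * 3
    group_b = [[0, 0, 2, 2]] * 3
    admixed = [[1, 1, 1, 1]]
    for score in first_principal_component(group_a + group_b + admixed):
        print(round(score, 3))
```

In this example the two homogeneous groups land on opposite sides of zero on PC1, while the individual with intermediate genotypes falls between them, mirroring the admixed positions described above.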

Caveats and limitations

There are caveats and limitations to genetic clustering methods of any type, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are biased by the sampling process used to gather data and by the quality and quantity of that data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may create the illusion of discrete clusters when, in reality, populations blend into one another once intermediate groups are included.[1] STRUCTURE in particular may be misleading because it requires the data to be sorted into a predetermined number of clusters, which may not reflect the actual population's distribution.[9] Sample size also plays an important moderating role in cluster findings: different sample sizes can change cluster assignments, and more subtle relationships between genotypes may emerge only with larger samples.[1][9]
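The concern about a predetermined number of clusters can be illustrated with a stand-in method. The sketch below uses one-dimensional k-means, not STRUCTURE, on hypothetical, evenly spaced values along a cline (for example, positions on a single principal component). Like any method told to find K groups, it returns exactly K groups even though the data contain no natural gaps; `kmeans_1d` is a hypothetical helper written only for this illustration.

```python
from typing import List, Tuple


def kmeans_1d(
    values: List[float], k: int, iterations: int = 100
) -> Tuple[List[int], List[float]]:
    """Partition 1-D values into exactly k groups with Lloyd's k-means."""
    ordered = sorted(values)
    n = len(ordered)
    # Initialize centers at the midpoints of k equal-size blocks of the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Update step: move each center to its members' mean
            # (keep the old center if a cluster happens to empty out).
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    cline = [i / 19 for i in range(20)]  # evenly spaced: no natural gaps
    for k in (2, 3, 4):
        labels, _ = kmeans_1d(cline, k)
        print(k, labels)
```

Every choice of k yields a tidy partition of the same continuous data, so the presence of k groups in the output does not, by itself, show that k discrete populations exist.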

Applications to human genetic data

Genetic clustering methods were first applied to a large human dataset in studies associated with the Human Genome Diversity Project (HGDP).[1] These early HGDP studies, such as those by Rosenberg and colleagues,[10][11] contributed to theories of the serial founder effect and of early human migration out of Africa.


Genetic clustering and race

Many human genetic clustering studies have produced clusters of individuals with similar geographic origins or ancestry, and some have interpreted these findings as biological support for the concept of race. For example, clustering results have often shown a clear distinction between individuals with African and non-African ancestry, and other levels of clustering have come close to placing individuals within their corresponding continental populations (e.g., Europeans clustered together, East Asians clustered together).[12] Rosenberg et al. (2002) described a division of human populations into five clusters that resemble major geographic divisions, and concluded that self-identified ancestry (taken by many to mean race) may be an adequate proxy for genetic ancestry. The association between genetic clusters and race may be further confounded by false assumptions that racialized traits, such as skin color or temperament, have clear genetic roots.[13] In these ways, aspects of genetic clusters may be seen to resemble the traditional notion of race, at least as understood in the United States.

Many other scholars have challenged the idea that race can be inferred from genetic clusters, drawing distinctions between arbitrarily assigned genetic clusters, ancestry, and race. One recurring caution against thinking of human populations in terms of clusters is that genotypic variation and traits are distributed gradually across populations, along clines rather than along discrete population boundaries. Although genetic similarities are usually organized geographically, populations have never been completely separated. Due to migration, gene flow, and baseline homogeneity, groups overlap extensively and are intermixed.[3][4]


Humans have been shown to vary along clines, with traits distributed gradually between groups rather than along discrete population boundaries.

Everyone is a descendant of sub-Saharan Africans.

Race is understood to be a social phenomenon.


See also


  1. ^ a b c d e f Novembre, John; Ramachandran, Sohini (2011-09-22). "Perspectives on Human Population Structure at the Cusp of the Sequencing Era". Annual Review of Genomics and Human Genetics. 12 (1): 245–274. doi:10.1146/annurev-genom-090810-183123. ISSN 1527-8204.
  2. ^ Bamshad, Michael J.; Olson, Steve E. (December 2003). "Does Race Exist?". Scientific American. 289 (6): 78–85. doi:10.1038/scientificamerican1203-78. ISSN 0036-8733.
  3. ^ a b Maglo, Koffi N.; Mersha, Tesfaye B.; Martin, Lisa J. (2016-02-17). "Population Genomics and the Statistical Values of Race: An Interdisciplinary Perspective on the Biological Classification of Human Populations and Implications for Clinical Genetic Epidemiological Research". Frontiers in Genetics. 7. doi:10.3389/fgene.2016.00022. ISSN 1664-8021.
  4. ^ a b Jorde, Lynn B; Wooding, Stephen P (2004-10-26). "Genetic variation, classification and 'race'". Nature Genetics. 36 (S11): S28–S33. doi:10.1038/ng1435. ISSN 1061-4036.
  5. ^ Marks, Jonathan. Is Science Racist?. ISBN 978-0-7456-8925-8. OCLC 1037867598.
  6. ^ Bamshad, Michael; Wooding, Stephen; Salisbury, Benjamin A.; Stephens, J. Claiborne (August 2004). "Deconstructing the relationship between genetics and race". Nature Reviews Genetics. 5 (8): 598–609. doi:10.1038/nrg1401. ISSN 1471-0056.
  7. ^ Novembre, John; Ramachandran, Sohini (2011-09-22). "Perspectives on Human Population Structure at the Cusp of the Sequencing Era". Annual Review of Genomics and Human Genetics. 12 (1): 245–274. doi:10.1146/annurev-genom-090810-183123. ISSN 1527-8204.
  8. ^ a b c Lawson, Daniel John; Falush, Daniel (2012-09-22). "Population Identification Using Genetic Data". Annual Review of Genomics and Human Genetics. 13 (1): 337–361. doi:10.1146/annurev-genom-082410-101510. ISSN 1527-8204.
  9. ^ a b Kalinowski, S T (2010-08-04). "The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure". Heredity. 106 (4): 625–632. doi:10.1038/hdy.2010.95. ISSN 0018-067X.
  10. ^ Rosenberg, N. A. (2002-12-20). "Genetic Structure of Human Populations". Science. 298 (5602): 2381–2385. doi:10.1126/science.1078311. ISSN 0036-8075.
  11. ^ Rosenberg, Noah A; Mahajan, Saurabh; Ramachandran, Sohini; Zhao, Chengfeng; Pritchard, Jonathan K; Feldman, Marcus W (2005-12-09). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLoS Genetics. 1 (6): e70. doi:10.1371/journal.pgen.0010070. ISSN 1553-7404.
  12. ^ Jorde, Lynn B; Wooding, Stephen P (2004-10-26). "Genetic variation, classification and 'race'". Nature Genetics. 36 (S11): S28–S33. doi:10.1038/ng1435. ISSN 1061-4036.
  13. ^ Koenig, Barbara A.; Lee, Sandra Soo-Jin; Richardson, Sarah S. (2008). Revisiting Race in a Genomic Age. Rutgers University Press. ISBN 978-0-8135-4323-9. OCLC 468194495.