R (programming language)

From Wikipedia, the free encyclopedia

R
Terminal window for R
Paradigms: Multi-paradigm (procedural, object-oriented, functional, reflective, imperative, array)[1]
Designed by: Ross Ihaka and Robert Gentleman
Developer: R Core Team
First appeared: August 1993
Stable release: 4.5.2[2] / 31 October 2025
Typing discipline: Dynamic
Platform: arm64 and x86-64
License: GPL-2.0-or-later[3]
Filename extensions:
  • .R[4]
  • .r
  • .rdata
  • .rhistory
  • .rds
  • .rda[5]
Website: r-project.org
Influenced by: S,[6] Scheme[1]
Influenced: Julia,[7] pandas[8]
  • R Programming at Wikibooks

R is a programming language for statistical computing and data visualization. It has been widely adopted in the fields of data mining, bioinformatics, data analysis, and data science.[9]

The core R language is extended by a large number of software packages, which contain reusable code, documentation, and sample data. Some of the most popular R packages are in the tidyverse collection, which adds functionality for visualizing, transforming, and modelling data and, according to its authors and users, makes programming easier.[10]

R is free and open-source software distributed under the GNU General Public License.[3][11] The language is implemented primarily in C, Fortran, and R itself. Precompiled executables are available for the major operating systems (including Linux, macOS, and Microsoft Windows).

Its core is an interpreted language with a native command line interface. In addition, multiple third-party applications are available as graphical user interfaces; such applications include RStudio (an integrated development environment), Jupyter (a notebook interface), as well as Termux and Google Colab for mobile devices.[12]

History

Co-originators of the R language

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[13] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]

The name of the language, R, reflects both its status as a successor to S and the shared first letter of the authors' first names, Ross and Robert.[14] In August 1993, Ihaka and Gentleman posted a binary file of R on StatLib, a data archive website.[15] At the same time, they announced the posting on the s-news mailing list.[16] On 5 December 1997, R became a GNU project when version 0.60 was released.[17] On 29 February 2000, version 1.0 was released.[18]

Packages

A violin plot created with the R package ggplot2 for data visualization

R packages are collections of functions, documentation, and data that expand R.[19] For example, packages can add reporting features (using packages such as R Markdown, Quarto,[20] knitr, and Sweave) and support for various statistical techniques (such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering). Ease of package installation and use have contributed to the language's adoption in data science.[21]

Immediately available when starting R after installation, base packages provide the fundamental and necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality.[22]
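
As a minimal sketch of the point above, the packages that ship with R at "base" priority, and the packages attached to the current session, can be listed from the console. The exact set may differ between R versions, and the variable name base_pkgs is chosen here only for illustration.

```r
# Packages that ship with R itself at "base" priority
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Environments and packages currently attached to the session's search path
print(search())
```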

One widely used example of contributed packages is the tidyverse collection, which bundles several subsidiary packages to provide a common API. The collection specializes in tasks related to accessing and processing "tidy data",[23] which are data contained in a two-dimensional table with a single row for each observation and a single column for each variable.[24]
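
The tidy layout described above can be illustrated without the tidyverse itself. The following sketch uses only base R to turn a "wide" table into one row per observation; the subject labels, treatment labels, and scores are made up for illustration.

```r
# A "wide" table: one row per subject, one column per treatment.
# matrix() fills column by column, so the first three values are the
# "control" column and the last three are the "drug" column.
scores <- matrix(
  c(5.1, 4.8, 6.0,
    5.9, 5.2, 6.4),
  nrow = 3,
  dimnames = list(subject = c("s1", "s2", "s3"),
                  treatment = c("control", "drug"))
)

# Tidy layout: one row per observation (subject-treatment pair),
# one column per variable (subject, treatment, score)
tidy_scores <- as.data.frame(as.table(scores), responseName = "score")
print(tidy_scores)
```

Each of the six measurements now occupies its own row, which is the shape that tidyverse functions generally expect.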

A package needs to be installed only once. For example, to install the tidyverse collection:[24]

> install.packages("tidyverse")

To load the functions, data, and documentation of a package, one calls the library() function. To load the tidyverse collection, one can execute the following code:[a]

> # The package name can be enclosed in quotes
> library("tidyverse")

> # But the package name can also be used without quotes
> library(tidyverse)
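
Besides attaching a package with library(), a single function can be called through the :: operator, and requireNamespace() can check whether a package is installed. The sketch below uses the stats package because it ships with R; whether tidyverse is installed depends on the machine.

```r
# Call a function without attaching its package, using the :: operator
m <- stats::median(c(3, 1, 2))
print(m)

# Check whether a package is installed before relying on it
if (requireNamespace("tidyverse", quietly = TRUE)) {
  message("tidyverse is available")
} else {
  message("tidyverse is not installed")
}
```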

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Friedrich Leisch to host R's source code, executable files, documentation, and user-created packages.[25] CRAN's name and scope mimic the Comprehensive TeX Archive Network (CTAN) and the Comprehensive Perl Archive Network (CPAN).[25] CRAN originally had only three mirror sites and twelve contributed packages.[26] As of 30 June 2025, it has 90 mirrors[27] and 22,390 contributed packages.[28] Packages are also available in repositories such as R-Forge, Omegahat, and GitHub.[29][30][31]

To provide guidance on the CRAN web site, its Task Views area lists packages that are relevant for specific topics; sample topics include causal inference, finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.

The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.

Community

The R Consortium is one of the three main groups that support R.

Three main groups help support R software development: the R Core Team, the R Foundation, and the R Consortium.

The R Journal is an open access, academic journal that features short to medium-length articles on the use and development of R. The journal includes articles on packages, programming tips, CRAN news, and foundation news.

The UseR! conference is one place where the R community can gather.

The R community hosts many conferences and in-person meetups.[b] These groups include:

  • UseR!: an annual international R user conference
  • Directions in Statistical Computing (DSC)
  • R-Ladies: an organization to promote gender diversity in the R community
  • SatRdays: R-focused conferences held on Saturdays
  • Data Science & AI Conferences
  • posit::conf (formerly known as rstudio::conf)

On social media sites such as Twitter, the hashtag #rstats can be used to follow new developments in the R community.[32]

Examples

Hello, World!

The following is a "Hello, World!" program:

> print("Hello, World!")
[1] "Hello, World!"

Here is an alternative version, which uses the cat() function:

> cat("Hello, World!")
Hello, World!

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface.[c]

In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[33]

> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Similarly, create a vector based on the values in x.
> y        # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Print the contents of z.
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow = 3) # Create a new matrix that transforms the
                                  # vector z into a 3x2 matrix object
> z_matrix 
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2 * t(z_matrix) - 2 # Transpose the matrix; multiply every element by 2;
                      # subtract 2 from each element in the matrix; and
                      # then return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

# Create a new dataframe object that contains the data from a transposed
# z_matrix, with row names 'A' and 'B'
> new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) 
> names(new_df) <- c("X", "Y", "Z") # Set the column names of the new_df dataframe as X, Y, and Z.
> new_df                            # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

> all(new_df$Z == new_df['Z']) && all(new_df[3] == new_df$Z) # The dataframe column Z can be accessed using the syntax $Z, ['Z'], or [3], and the values are the same. all() collapses each comparison to a single value, which && requires since R 4.3.0.
[1] TRUE

> attributes(new_df) # Print information about attributes of the new_df dataframe
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; this can also be done using the rownames() function
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
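
The difference between <- and = noted at the start of this section is easiest to see inside a function call. The following is a small sketch; the function square() and the name value are hypothetical and chosen only for illustration.

```r
# A hypothetical helper with one parameter called `value`
square <- function(value) value^2

# Inside a call, `=` only matches the argument by name;
# no variable called `value` is created in the calling environment.
a <- square(value = 3)
value_exists_after_eq <- exists("value", inherits = FALSE)

# Inside a call, `<-` assigns in the calling environment and then
# passes the assigned value on as the argument.
b <- square(value <- 4)
value_exists_after_arrow <- exists("value", inherits = FALSE)

print(c(a = a, b = b))
print(c(after_eq = value_exists_after_eq, after_arrow = value_exists_after_arrow))
```

This side effect is one reason why <- is usually reserved for assignment and = for naming arguments.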

Structure of a function

R can create functions that add new functionality and enable code reuse.[34] Objects created within the body of the function (which are enclosed by curly brackets) remain accessible only from within the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures.[35]
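
The closure behaviour mentioned above can be sketched with a counter: the inner function keeps access to the variable count of the call that created it, while count stays invisible from outside. The name make_counter() is hypothetical.

```r
# make_counter() returns a function that remembers its own `count`
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modify `count` in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

r1 <- counter_a()  # first call on counter_a
r2 <- counter_a()  # second call on counter_a
r3 <- counter_b()  # counter_b has its own, independent `count`
print(c(r1, r2, r3))
```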

The following is an example of creating a function to perform an arithmetic calculation:

# The function's input parameters are x and y.
# The function, named f, returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # An explicit return() statement is optional--it could be replaced with simply `z` in this case.
  return(z)
}

# As an alternative, the last statement executed in a function is returned implicitly.
f <- function(x, y) 3 * x + 4 * y

The following is some output from using the function defined above:

> f(1, 2) #  3 * 1 + 4 * 2 = 3 + 8
[1] 11

> f(c(1, 2, 3), c(5, 3, 4)) # Element-wise calculation
[1] 23 18 25

> f(1:3, 4) # Equivalent to f(c(1, 2, 3), c(4, 4, 4))
[1] 19 22 25

It is possible to define functions to be used as infix operators by using the special syntax `%name%`, where "name" is the function variable name:

> `%sumx2y2%` <- function(e1, e2) {e1 ^ 2 + e2 ^ 2}
> 1:3 %sumx2y2% -(1:3)
[1]  2  8 18

Since R version 4.1.0, functions can be written in a short notation (inspired by the lambda calculus), which is useful for passing anonymous functions to higher-order functions:[36]

> sapply(1:5, \(i) i^2)    # here \(i) is the same as function(i) 
[1]  1  4  9 16 25
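
The short notation also combines with other higher-order functions in base R. The sketch below, which requires R 4.1.0 or later, uses Filter(), Map(), and Reduce(); the variable names are chosen only for illustration.

```r
# Keep the even numbers from 1 to 10
evens <- Filter(\(n) n %% 2 == 0, 1:10)

# Multiply two vectors element by element; Map() returns a list
products <- Map(\(a, b) a * b, 1:3, 4:6)

# Fold a vector into a single value: 1 + 2 + 3 + 4 + 5
total <- Reduce(\(acc, n) acc + n, 1:5)

print(evens)
print(unlist(products))
print(total)
```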

Native pipe operator

In R version 4.1.0, a native pipe operator, |>, was introduced.[37] This operator allows users to chain functions together, rather than using nested function calls.

> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11

> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11

An alternative to nested functions is the use of intermediate objects, rather than the pipe operator:

> mtcars_subset_rows <- subset(mtcars, cyl == 4)
> num_mtcars_subset <- nrow(mtcars_subset_rows)
> print(num_mtcars_subset)
[1] 11

While the pipe operator can produce code that is easier to read, influential R programmers such as Hadley Wickham suggest chaining at most 10 to 15 lines of code with it and saving intermediate results in objects with meaningful names, to avoid obfuscating the code.[38]
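
One way to follow that advice is to split a pipeline into short stages and give each result a descriptive name. This is only a sketch of the style, using the mtcars data set that ships with R; the object names are illustrative.

```r
# Stage 1: keep the four-cylinder cars
four_cyl <- mtcars |> subset(cyl == 4)

# Stage 2: summarise fuel economy for that subset
mean_mpg <- four_cyl$mpg |> mean() |> round(1)

print(nrow(four_cyl))
print(mean_mpg)
```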

Object-oriented programming


The R language has native support for object-oriented programming. There are two native frameworks, known as the S3 and S4 systems. S3 is the more informal of the two: it supports single dispatch on the first argument, and an object is assigned to a class simply by setting its "class" attribute. S4 resembles the Common Lisp Object System (CLOS), with formal classes (also derived from S) and generic methods, and it supports multiple dispatch and multiple inheritance.[39]

In the example below, summary() is a generic function that dispatches to different methods depending on whether its argument is a character vector or a factor:

> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode 
        5 character character 
> summary(as.factor(data))
   a    b    c NA's 
   2    1    1    1
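
User-defined classes follow the same two systems. The following is a minimal sketch, not a complete treatment: the S3 class "person", the generic describe(), the S4 class "Point", and the generic vec_length() are all hypothetical names invented for this illustration.

```r
library(methods)  # attached by default in most sessions; loaded here to be explicit

# --- S3: the class is only an attribute; dispatch is on the first argument ---
describe <- function(x, ...) UseMethod("describe")
describe.person  <- function(x, ...) paste("Person:", x$name)
describe.default <- function(x, ...) "Unknown object"

p <- structure(list(name = "Ada"), class = "person")
s3_result   <- describe(p)   # dispatches to describe.person
s3_fallback <- describe(42)  # no method for numeric, so describe.default runs

# --- S4: formal class definition with typed slots ---
setClass("Point", slots = c(x = "numeric", y = "numeric"))
setGeneric("vec_length", function(obj) standardGeneric("vec_length"))
setMethod("vec_length", "Point", function(obj) sqrt(obj@x^2 + obj@y^2))

pt <- new("Point", x = 3, y = 4)
s4_result <- vec_length(pt)

print(s3_result)
print(s3_fallback)
print(s4_result)
```

In S3 nothing checks that an object really has a name element, whereas new("Point", ...) validates the slot types, which reflects the informal and formal characters of the two systems.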

Modeling and plotting

Diagnostic plots for the model from the example code in the "Modeling and plotting" section (q.v. the plot.lm() function). Mathematical notation is allowed in labels, as shown in the lower left plot.

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model: y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2-by-2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)

The output from the summary() function in the preceding code block is as follows:

Residuals:
      1       2       3       4       5       6 
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -9.3333     2.8441  -3.282 0.030453 * 
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
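
The quantities printed in the summary can also be extracted programmatically. The sketch below refits the same model so that it runs on its own and uses the standard accessor functions coef(), residuals(), and predict(); the variable names are illustrative.

```r
# Refit the model from the example so this block is self-contained
x <- 1:6
y <- x^2
model <- lm(y ~ x)

coefs <- coef(model)                 # named vector: (Intercept) and x
res   <- residuals(model)            # one residual per observation
r2    <- summary(model)$r.squared    # multiple R-squared

# Predict y for a new x value using the fitted line
fitted_at_7 <- predict(model, newdata = data.frame(x = 7))

print(coefs)
print(round(res, 4))
print(r2)
print(fitted_at_7)
```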

Mandelbrot set

A Mandelbrot set as visualized in R. (Note: The colours in this image differ from the output of the sample code in the "Mandelbrot set" section.)

This example of a Mandelbrot set highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.

To run this sample code, it is necessary to first install the package that provides the write.gif() function:

install.packages("caTools")

The sample code is as follows:

library(caTools)

jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C  <-
    complex(
            real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
            imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
            )

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # the central difference equation
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(
    X,
    "Mandelbrot.gif",
    col = jet.colors,
    delay = 100)
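
The same iteration can be tried on individual constants without any extra package. The sketch below is not the method of the sample code above, which stores exp(-abs(Z)) for every pixel; instead it applies a simple escape test, assuming the usual criterion that a point has escaped once the modulus of z exceeds 2. The helper name escapes() is hypothetical.

```r
# escapes() returns TRUE if z = z^2 + c leaves the disc of radius 2
# within n_iter iterations, starting from z = 0
escapes <- function(c_val, n_iter = 20) {
  z <- 0 + 0i
  for (k in seq_len(n_iter)) {
    z <- z^2 + c_val
    if (Mod(z) > 2) {
      return(TRUE)
    }
  }
  FALSE
}

# c = -1 cycles between -1 and 0, so it stays bounded
escapes_minus1 <- escapes(complex(real = -1, imaginary = 0))

# c = 1 gives 1, 2, 5, ... and grows without bound
escapes_plus1 <- escapes(complex(real = 1, imaginary = 0))

print(c(minus1 = escapes_minus1, plus1 = escapes_plus1))
```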

Version names

A CD of R Version 1.0.0, autographed by the core team of R, photographed in Quebec City in 2019

All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films.[40][41][42]

In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997.[43] Some notable early releases before the named releases include the following:

  • Version 1.0.0, released on 29 February 2000, a leap day
  • Version 2.0.0, released on 4 October 2004, "which at least had a nice ring to it"[43]

The idea of naming R version releases was inspired by the naming system for Debian and Ubuntu versions. Dalgaard noted an additional reason for the use of Peanuts references in R codenames—the humorous observation that "everyone in statistics is a P-nut."[43]
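
The codename of the running release can be read from within R. This short sketch uses the built-in R.version list and the R.version.string variable; the output depends on the installed version.

```r
# Full version string of the running R installation
version_string <- R.version.string

# Release codename stored in the R.version list
nickname <- R.version$nickname

cat(version_string, "\n")
cat("Codename:", nickname, "\n")
```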

Interfaces

R is installed with a command line console by default, but there are multiple ways to interface with the language:

Statistical frameworks that use R in the background include Jamovi and JASP.[citation needed]

Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include the following:

Microsoft R Open (MRO) was an R implementation. As of 30 June 2021, Microsoft began to phase out MRO in favor of the CRAN distribution.[49]

Commercial support

Although R is an open-source project, some companies provide commercial support:

  • Oracle provides commercial support for its Big Data Appliance, which integrates R into its other products.
  • IBM provides commercial support for execution of R within Hadoop.

Notes

  a. This code displays to standard error a listing of all the packages that the tidyverse collection depends upon. The code may also display warnings showing namespace conflicts, which may typically be ignored.
  b. Information about conferences and meetings is available in a community-maintained list on GitHub, jumpingrivers.github.io/meetingsR/
  c. An expanded list of standard language features can be found in the manual "An Introduction to R", cran.r-project.org/doc/manuals/R-intro.pdf

References

  1. Morandat, Frances; Hill, Brandon; Osvald, Leo; Vitek, Jan (11 June 2012). "Evaluating the design of the R language: objects and functions for data analysis". European Conference on Object-Oriented Programming. 2012: 104–131. doi:10.1007/978-3-642-31057-7_6. Retrieved 17 May 2016 – via SpringerLink.
  2. Peter Dalgaard (31 October 2025). "[Rd] R 4.5.2 is released". Retrieved 31 October 2025.
  3. "R - Free Software Directory". directory.fsf.org. Retrieved 26 January 2024.
  4. "R scripts". mercury.webster.edu. Retrieved 17 July 2021.
  5. "R Data Format Family (.rdata, .rda)". Loc.gov. 9 June 2017. Retrieved 17 July 2021.
  6. Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
  7. "Introduction". The Julia Manual. Archived from the original on 20 June 2018. Retrieved 5 August 2018.
  8. "Comparison with R". pandas Getting started. Retrieved 15 July 2024.
  9. Giorgi, Federico M.; Ceraolo, Carmine; Mercatelli, Daniele (27 April 2022). "The R Language: An Engine for Bioinformatics and Data Science". Life. 12 (5): 648. Bibcode:2022Life...12..648G. doi:10.3390/life12050648. PMC 9148156. PMID 35629316.
  10. "Home - RDocumentation". www.rdocumentation.org. Retrieved 13 June 2025.
  11. "R: What is R?". www.r-project.org. Retrieved 10 May 2025.
  12.
  13. Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 12. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022. We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
  14. Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 2.13 What is the R Foundation?. Archived from the original on 28 December 2022. Retrieved 28 December 2022.
  15. "Index of /datasets". lib.stat.cmu.edu. Retrieved 5 September 2024.
  16. Ihaka, Ross. "R: Past and Future History" (PDF). p. 4. Archived (PDF) from the original on 28 December 2022. Retrieved 28 December 2022.
  17. Ihaka, Ross (5 December 1997). "New R Version for Unix". stat.ethz.ch. Archived from the original on 12 February 2023. Retrieved 12 February 2023.
  18. Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 18. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
  19. Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 978-1-492-09740-2.
  20. "Quarto". Quarto. Retrieved 5 September 2024.
  21. Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
  22. Davies, Tilman M. (2016). "Installing R and Contributed Packages". The Book of R: A First Course in Programming and Statistics. San Francisco, California: No Starch Press. p. 739. ISBN 9781593276515.
  23. Wickham, Hadley (2014). "Tidy Data" (PDF). Journal of Statistical Software. 59 (10). doi:10.18637/jss.v059.i10.
  24. Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. ISBN 978-1-492-09740-2.
  25. Hornik, Kurt (2012). "The Comprehensive R Archive Network". WIREs Computational Statistics. 4 (4): 394–398. doi:10.1002/wics.1212. ISSN 1939-5108. S2CID 62231320.
  26. Kurt Hornik (23 April 1997). "Announce: CRAN". r-help. Wikidata Q101068595.
  27. "The Status of CRAN Mirrors". cran.r-project.org. Retrieved 16 October 2024.
  28. "CRAN - Contributed Packages". cran.r-project.org. Retrieved 16 October 2024.
  29. "R-Forge: Welcome". r-forge.r-project.org. Retrieved 5 September 2024.
  30. "The Omega Project for Statistical Computing". www.omegahat.net. Retrieved 5 September 2024.
  31. "Build software better, together". GitHub. Retrieved 5 September 2024.
  32. Wickham, Hadley; Grolemund, Garrett (January 2017). 1 Introduction | R for Data Science (1st ed.). O'Reilly Media. ISBN 978-1491910399.
  33. R Development Core Team. "Assignments with the = Operator". Retrieved 11 September 2018.
  34. Kabacoff, Robert (2012). "Quick-R: User-Defined Functions". statmethods.net. Retrieved 28 September 2018.
  35. Wickham, Hadley. "Advanced R - Functional programming - Closures". adv-r.had.co.nz.
  36. "NEWS". r-project.org.
  37. "R: R News". cran.r-project.org. Retrieved 14 March 2024.
  38. Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). "4 Workflow: code style". R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing; Sebastopol, CA: O'Reilly. ISBN 978-1-4920-9740-2. OCLC 1390607935.
  39. "Class Methods". Retrieved 25 April 2024.
  40. Monkman, Martin. Chapter 5 R Release Names | Data Science with R: A Resource Compendium.
  41. McGowan, Lucy D’Agostino (28 September 2017). "R release names". livefreeordichotomize.com. Retrieved 7 April 2024.
  42. r-hub/rversions, The R-hub project of the R Consortium, 29 February 2024, retrieved 7 April 2024.
  43. Dalgaard, Peter (15 July 2018). "What's in a name? 20 years of R release management" (video). YouTube. Retrieved 9 April 2024.
  44. "R for macOS". cran.r-project.org. Retrieved 5 September 2024.
  45. "IDE from Posit PBC, the creators of RStudio | Positron - Home". Positron. Retrieved 24 September 2025.
  46. "IDE/Editor para Linguagem R | Tinn-R - Home". Tinn-R (in Brazilian Portuguese). Retrieved 5 September 2024.
  47. Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
  48. Jackson, Joab (16 May 2013). TIBCO offers free R to the enterprise. PC World. Retrieved 20 July 2015.
  49. "Looking to the future for R in Azure SQL and SQL Server". 30 June 2021. Retrieved 7 November 2021.
