
R Programming Language: An Overview

R is an open-source programming language used mainly for statistical computing and data visualization. Developed by Ross Ihaka and Robert Gentleman in 1993, it provides a large collection of tools for statistical inference and graphics.

Primary Uses

In general, the R programming language is used for three main purposes:

1. Statistical Inference
Statistical inference is the process of using the results of data analysis to draw conclusions about the underlying properties of the data, typically about the population from which a sample was taken.

2. Data Analysis
Data analysis is the process of collecting, inspecting, cleaning, transforming, and modeling data about a given population so that useful information can be discovered.

3. Machine Learning

In this context, machine learning is a method of training an AI system with data so that it can build statistical models on its own, identifying patterns and making decisions with minimal human intervention.
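The three uses above can be sketched on one of R's built-in data sets. This is only an illustration: the choice of the mtcars data set, the variable names, and the use of a simple linear regression as a stand-in for a learned model are all assumptions made for this example.

```r
# Data analysis: summarize a built-in data set
data("mtcars")
mean_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mean_by_cyl)

# Statistical inference: Welch two-sample t-test comparing fuel
# economy (mpg) between automatic (am = 0) and manual (am = 1) cars
t_result <- t.test(mpg ~ am, data = mtcars)
print(t_result$p.value)

# Machine learning (very simple case): fit a linear model on the data,
# then predict mpg for a hypothetical car with wt = 3 (3000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)
predicted <- predict(fit, newdata = data.frame(wt = 3))
print(predicted)
```

All three steps use functions from base R (aggregate, t.test, lm, predict), so no extra packages are needed.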

Machine Learning using R

K-Means

K-means is a method for dividing the available data into k groups, or clusters. It starts by picking k initial points (chosen at random, preferably as far apart from one another as possible), calculates the distance from every data point to each of these points, and assigns each data point to the initial point nearest to it. A centroid is then calculated for each newly formed cluster, and these centroids become the reference points for forming new clusters based on the distances to them. This process is repeated either for a set number of rounds (supplied by the user) or until the clusters no longer change.
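The loop described above can be sketched by hand in a few lines of R. This is a minimal illustration, not the algorithm R's built-in kmeans() uses by default: the function name simple_kmeans and its arguments are hypothetical, the starting points are plain random rows rather than far-apart points, and squared Euclidean distance is assumed.

```r
# Hypothetical helper: a bare-bones k-means loop, for illustration only
simple_kmeans <- function(x, k, max_iter = 10) {
  x <- as.matrix(x)
  # Start from k randomly chosen rows as the initial centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (i in seq_len(max_iter)) {
    # Squared Euclidean distance from every point (rows) to every center (columns)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each point to its nearest center
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop as soon as no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Recompute each center as the centroid (column means) of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers, iter = i)
}

# Usage on the standardized USArrests data
set.seed(1)
res <- simple_kmeans(scale(USArrests), k = 4)
print(table(res$cluster))
```

Because the result depends on the random starting rows, a single run like this can settle on a poorer grouping than one that tries many starts and keeps the best.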

Example using R

For this example, we will use the USArrests data set, one of the data sets built into R.

> data("USArrests")
> df <- scale(USArrests)    # standardize each column
> set.seed(1)               # make the random starts reproducible
> kmeans_result <- kmeans(df, 4, nstart = 25)    # 4 clusters, 25 random starts
> print(kmeans_result)

The code above produces the following output:

K-means clustering with 4 clusters of sizes 13, 13, 16, 8

Cluster means:
      Murder    Assault   UrbanPop        Rape
1 -0.9615407 -1.1066010 -0.9301069 -0.96676331
2  0.6950701  1.0394414  0.7226370  1.27693964
3 -0.4894375 -0.3826001  0.5758298 -0.26165379
4  1.4118898  0.8743346 -0.8145211  0.01927104

Clustering vector:
       Alabama         Alaska        Arizona       Arkansas     California
             4              2              2              4              2
      Colorado    Connecticut       Delaware        Florida        Georgia
             2              3              3              2              4
        Hawaii          Idaho       Illinois        Indiana           Iowa
             3              1              2              3              1
        Kansas       Kentucky      Louisiana          Maine       Maryland
             3              1              4              1              2
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri
             3              2              1              4              2
       Montana       Nebraska         Nevada  New Hampshire     New Jersey
             1              1              2              1              3
    New Mexico       New York North Carolina   North Dakota           Ohio
             2              2              4              1              3
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
             3              3              3              3              4
  South Dakota      Tennessee          Texas           Utah        Vermont
             1              4              2              3              1
      Virginia     Washington  West Virginia      Wisconsin        Wyoming
             3              3              1              1              3

Within cluster sum of squares by cluster:
[1] 11.952463 19.922437 16.212213  8.316061
 (between_SS / total_SS =  71.2 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
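The components listed at the end of the output can be read directly from the result with the $ operator. The following is a small sketch of how one might use them. The variable names clustered, cluster_means_raw and same_as_alabama are made up for this example, and the cluster numbers themselves may differ between R versions even with the same seed.

```r
# Re-create the clustering so that this snippet runs on its own
df <- scale(USArrests)
set.seed(1)
kmeans_result <- kmeans(df, 4, nstart = 25)

# Number of states in each cluster
print(kmeans_result$size)

# Share of the total variance explained by the clustering
# (the between_SS / total_SS figure in the printed output)
print(kmeans_result$betweenss / kmeans_result$totss)

# Attach the cluster labels to the original (unscaled) data and
# average each variable per cluster, in the original units
clustered <- cbind(USArrests, cluster = kmeans_result$cluster)
cluster_means_raw <- aggregate(. ~ cluster, data = clustered, FUN = mean)
print(cluster_means_raw)

# States that ended up in the same cluster as Alabama
same_as_alabama <- names(which(kmeans_result$cluster ==
                               kmeans_result$cluster["Alabama"]))
print(same_as_alabama)
```

Averaging the unscaled data per cluster is often easier to interpret than the "Cluster means" table, which is expressed in standardized units because the input was passed through scale().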
