What is R?
R is an open-source environment for statistical computing and visualization. It is based on the S language developed at Bell Laboratories in the 1980’s, and is the product of an active movement among statisticians for a powerful, programmable, portable, and open computing environment, applicable to the most complex and sophisticated problems, as well as “routine” analysis, without any restrictions on access or use.
Download and Installation R
The R can download from the R project of a repository CRAN or you can install Microsoft R Open.
** Install R in Windows**
Installation instruction of R in Windows and MAC could be found here
Detail Installation steps of Microsoft R Open in different operating systems can be found here.
Install R-base (3.5.2) in Ubuntu 16.04
Update repositories list: deb https://cloud.r-project.org/bin/Linux/Ubuntu xenial-cran35/
Use following commands in terminal:
Download and Installation R-Studio
First, you have to download RStudio according to your operating system from here. For windows user, and just run the installation file and it normally detects your latest installed R version automatically. If you want to do some extra configuration, you need follow some steps which can be found here
For Linux user, use following commands in terminal:
After installation, double click on desktop icon or open program from START to run R. R will be open as a Console window (Fig. 1a). You can work in console and use R with the command line. However, the command line can be quite daunting to a beginner, It it is better to work in R Editor (Fig. 1b). First you have to create a New script from File menu. Any code you run in R-script, output will be displayed in console window. We can save all of your R codes as a R script file and output in console as a R-Data file.
As I mentioned before, R-Studio includes console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and work space management. Moreover, you can share your codes with output as HTLM, MS-word and PDF with others.
R Packages
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. We can install any R ‘package’ or multiple package directly from the console, using r-script and GUI (Tools > Install Packages) through internet.
Use install.packages() function in your console or in a script:
# One package
# install.packages("raster", dependencies = TRUE)
# Multiple packages
# install.packages(c("raster","gstat") dependencies = TRUE)
R command prompt
Once you have R environment setup, then it’s easy to start your R command prompt by just typing the following command at your command prompt
print("Hello, World!")
## [1] "Hello, World!"
At the prompt, we enter the expression that we want evaluated and when we hit enter, it will compute the result, for example,
Or, we can execute above functions in a script
2+2 # Addition
## [1] 4
4-2 # Subtraction
## [1] 2
3*3 # Multiplication
## [1] 9
4/2 # Division
## [1] 2
2^2 # Power
## [1] 4
Two or more expressions can be placed on a single line so long as they are separated by semi-colons:
2+2; 4/2
## [1] 4
## [1] 2
Built-in Functions
Some built in function are shown below:
log(x): log to base e of x
exp(x): antilog of x
log(x,n): log to base n of x
log10(x): log to base 10 of x
sqrt(x): square root of x
round(x, digits=0): round the value of x to an integer
abs(x): the absolute value of x, ignoring the minus sign if there is one
log(10)
## [1] 2.302585
exp(1)
## [1] 2.718282
pi
## [1] 3.141593
sin(pi/2)
## [1] 1
Most useful R functions have been complied and found here
Variable Assignment
The variables can be assigned values using assignment statement - leftward, rightward and equal to operator (= or < -). The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the sub expressions in a braced list of expressions.The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned.
a <-2
# Or
a = 2
When you want to know what is in a variable simply ask by typing the variable name.
a
## [1] 2
We can store a computation of two variable names and do some calculation and the result is assigned to a new variable.
a=2
b=3
c=a+b
c
## [1] 5
Important note
Variable names in R are case-sensitive so a is not the same as A.
Variable names should not begin with numbers (e.g. 1x) or symbols (e.g. %x).
Variable names should not contain blank spaces: use grain.yield
String
String is value written within a pair of single quote or double quotes in R. Internally R stores every string within double quotes, even when you create them with single quote.
a <- 'single quote'
print(a)
## [1] "single quote"
b <- "double quotes"
print(b)
## [1] "double quotes"
you can combined Many strings in R using the paste() function. It can take any number of arguments to be combined together.
a <- "Hello"
b <- 'How'
c <- "are"
d <-" you? "
print(paste(a,b,c,d))
## [1] "Hello How are you? "
print(paste(a,b,c,d, sep = "-"))
## [1] "Hello-How-are- you? "
print(paste(a,b,c,d, sep = "", collapse = ""))
## [1] "HelloHoware you? "
Data Types
R supports a wide variety of data types including scalars, vectors (numerical, character, logical), matrices, data frames, and lists.
In contrast to other programming languages like C and java in R, the variables are not declared as some data type. The variables are assigned with R-Objects and the data type of the R-object becomes the data type of the variable. There are many types of R-objects. The frequently used ones are ???
Vectors
Matrices
Lists
Data Frames
Arrays
Factors
Vectors
A list of numbers together to form a vector.
Numeric vector
a <- c(1, 2, 5.3, 6, -2, 4, 2, 5, 10)
a
## [1] 1.0 2.0 5.3 6.0 -2.0 4.0 2.0 5.0 10.0
Vector index in R starts from 1, unlike most programming languages where index start from 0. We can use a vector of integers as index to access specific elements in a vector.
a[2] # access 2nd element
## [1] 2
a[c(2, 4)] # access 2nd and 4th element
## [1] 2 6
a[-1] # access all but 1st element
## [1] 2.0 5.3 6.0 -2.0 4.0 2.0 5.0 10.0
a[a < 0] # filtering vectors based on conditions
## [1] -2
a[2] <- 0 # modify 2nd element
a
## [1] 1.0 0.0 5.3 6.0 -2.0 4.0 2.0 5.0 10.0
You can apply following functions to get useful summaries of a vector:
For examples:
sum(a) # sums the values in the vector
## [1] 31.3
length(a) # number of the values in the vector
## [1] 9
mean (a) # the average of the values in the vector
## [1] 3.477778
var (a) # the sample variance of the values
## [1] 13.15444
sd(a) # the standard of deviations of the values
## [1] 3.626906
max(a) # the largest value in the vector
## [1] 10
min(a) # the smallest number in the vector
## [1] -2
median(a) # the sample median
## [1] 4
summary(a) # summary statistics
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.000 1.000 4.000 3.478 5.300 10.000
quantile(a) # quantile
## 0% 25% 50% 75% 100%
## -2.0 1.0 4.0 5.3 10.0
Matrices
In R, matrix is a two dimensional data structure, which is similar to vector but additionally contains the dimension attribute. All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
Matrix can be created using the matrix() function.Dimension of the matrix can be defined by passing appropriate value for arguments nrow and ncol
# 4 x 4 matrix
matrix(1:16, nrow = 4, ncol = 4)
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
You can create matrix with row and column names:
# create a vector
cells=c(1,26,24,68,35,68,73,18,2,56,4,5,34,21,24,20)
# names of column rows
cnames = c("C1","C2","C3","C4")
# names of two rows
rnames = c("R1","R2","R3","R4")
# Create a 4 x 4 matrix name as Z
z= matrix(cells,
nrow=4,
ncol=4,
byrow=TRUE,
dimnames=list(rnames,cnames))
z
## C1 C2 C3 C4
## R1 1 26 24 68
## R2 35 68 73 18
## R3 2 56 4 5
## R4 34 21 24 20
You can extract rows, columns or elements of matrix using following commands:
z[,4] # 4th column of matrix
## R1 R2 R3 R4
## 68 18 5 20
z[3,] # 3rd row of matrix
## C1 C2 C3 C4
## 2 56 4 5
z[2:4,1:3] # rows 2,3,4 of columns 1,2,3
## C1 C2 C3
## R2 35 68 73
## R3 2 56 4
## R4 34 21 24
Summary statistics of a column or any row can be calculate
summary(z[,3]) # summary statistics of the 3 column of matrix
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.00 19.00 24.00 31.25 36.25 73.00
summary(z[2,]) # summary statistics of the 2 rows
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.00 30.75 51.50 48.50 69.25 73.00
summary(z) # summary statistics of each column
## C1 C2 C3 C4
## Min. : 1.00 Min. :21.00 Min. : 4.00 Min. : 5.00
## 1st Qu.: 1.75 1st Qu.:24.75 1st Qu.:19.00 1st Qu.:14.75
## Median :18.00 Median :41.00 Median :24.00 Median :19.00
## Mean :18.00 Mean :42.75 Mean :31.25 Mean :27.75
## 3rd Qu.:34.25 3rd Qu.:59.00 3rd Qu.:36.25 3rd Qu.:32.00
## Max. :35.00 Max. :68.00 Max. :73.00 Max. :68.00
Data-frame
A data frame is little different than a matrix. In data frame different, columns can have different modes (numeric, character, factor, etc.).
ID = c(1,2,3,4) # create a vector of ID coloumn
Landuse = c("Grassland","Forest", "Arable", "Urban") # create a text vector
settlement = c (FALSE, FALSE, FALSE, TRUE) # creates a logical vector
pH = c(6.6,4.5, 6.8, 7.5) # create a numerical vector
my.data=data.frame(ID,Landuse,settlement,pH) # create a data frame
my.data
## ID Landuse settlement pH
## 1 1 Grassland FALSE 6.6
## 2 2 Forest FALSE 4.5
## 3 3 Arable FALSE 6.8
## 4 4 Urban TRUE 7.5
List
Lists are the collection of R objects which contain elements of different types like ??? numbers, strings, vectors and another list inside it. A list can also contain a matrix or a function as its elements.
In R, a list is created by placing all the items (elements) inside a bracket ( ), separated by commas.
# Create a list containing strings, numbers, vectors and a logical values.
my.list <- list("Blue", "Green", "Red", c(21,32,11), TRUE, 51.23, 119.1)
print(my.list)
## [[1]]
## [1] "Blue"
##
## [[2]]
## [1] "Green"
##
## [[3]]
## [1] "Red"
##
## [[4]]
## [1] 21 32 11
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] 51.23
##
## [[7]]
## [1] 119.1
Elements of the list can be accessed by the index of the element in the list. In case of named lists it can also be accessed using the names.
# Access the first element of the list.
print(my.list[1])
## [[1]]
## [1] "Blue"
# Access the 4th element. As it is also a list, all its elements will be printed.
print(my.list[4])
## [[1]]
## [1] 21 32 11
Arrays
Arrays are the R data objects which can store data in more than two dimensions.
# Create two vectors of different lengths.
vector1 <- c(2,4,3)
vector2 <- c(9,11,10,12,14,11)
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
## , , 1
##
## [,1] [,2] [,3]
## [1,] 2 9 12
## [2,] 4 11 14
## [3,] 3 10 11
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 2 9 12
## [2,] 4 11 14
## [3,] 3 10 11
Factor
Factors are the data objects which are used to categorize the data and store it as levels
# Create a vector as input.
Landuse = c("Grassland","Forest", "Arable", "Urban") # create a text vector
print(Landuse)
## [1] "Grassland" "Forest" "Arable" "Urban"
print(is.factor(Landuse))
## [1] FALSE
# Apply the factor function
factor.data <- factor(Landuse)
print(factor.data)
## [1] Grassland Forest Arable Urban
## Levels: Arable Forest Grassland Urban
print(is.factor(factor.data))
## [1] TRUE
Sorting
soil =c("S1","S2", "S3", "S4", "S5", "S6","S7", "S8","S9","S10","S11","S12") # create a text vector
pH = c(5.2,6.0,6.6,5.6,4.7,5.2,5.7, 5.9,5.3,6.8,6.2,5.8) # create numerical vector
SOC = c(1.2,3.0,1.6,2.6,2.7,1.2,1.7, 2.9,2.3,1.8,2.2,1.8) # create numerical vector
pH.data= data.frame(soil, pH,SOC) # create a data frame
head(pH.data)
## soil pH SOC
## 1 S1 5.2 1.2
## 2 S2 6.0 3.0
## 3 S3 6.6 1.6
## 4 S4 5.6 2.6
## 5 S5 4.7 2.7
## 6 S6 5.2 1.2
# attach pH.data
attach(pH.data)
## The following objects are masked _by_ .GlobalEnv:
##
## pH, SOC, soil
# sort by pH
newdata.1 <-pH.data[order(pH),]
# sort by pH and SOC
newdata.2 <- pH.data[order(pH, SOC),]
#sort by pH (ascending) and SOC (descending)
newdata.3 <- pH.data[order(pH, -SOC),]
head(newdata.1)
## soil pH SOC
## 5 S5 4.7 2.7
## 1 S1 5.2 1.2
## 6 S6 5.2 1.2
## 9 S9 5.3 2.3
## 4 S4 5.6 2.6
## 7 S7 5.7 1.7
head(newdata.2)
## soil pH SOC
## 5 S5 4.7 2.7
## 1 S1 5.2 1.2
## 6 S6 5.2 1.2
## 9 S9 5.3 2.3
## 4 S4 5.6 2.6
## 7 S7 5.7 1.7
head(newdata.3)
## soil pH SOC
## 5 S5 4.7 2.7
## 1 S1 5.2 1.2
## 6 S6 5.2 1.2
## 9 S9 5.3 2.3
## 4 S4 5.6 2.6
## 7 S7 5.7 1.7
# dettach pH data
detach(pH.data)
Rounding
In R, various types of rounding (rounding up, rounding down, rounding to the nearest integer) can be done easily.
floor(5.7) # greatest integer less than' function
## [1] 5
ceiling(5.7) # the 'next integer' function is ceilling
## [1] 6
round(5.7) # rounded to 6
## [1] 6
round(5.4) # rounded to 5
## [1] 5
Set working directory
# Define data folder
dataFolder<-"D:\\Dropbox\\WebSite_Data\\R_Data\\Data_01\\"
If we want to read files from a specific location or write files to a specific location, first, we will need to set working directory in R. You can set a new working directory using setwd() function.
# Define working directory
# setwd("~\\Data\\DATA_01")
# or
# setwd("F~//DATA_01")
The files under in a directory can check using following command using dir() function
# dir()
R support a variety file types to read or import into R or write or export from R.
The data could be found [here]https://www.dropbox.com/s/s5lwsb8jt1cbocs/DATA_01.7z?dl=0).
*Read/Import file into R ** Read a Text File
The easiest form of data to import into R is a simple text file, and this will often be acceptable for problems of small or medium scale. The primary function to import from a text file is read.table.
# data.txt = read.table("~//Data//test_data.txt", header=T) # read txt files//
Comma Delimited Text File
The sample data can be in comma separated values (CSV) format. Each cell inside such data file is separated by a special character, which usually is a comma, although other characters can be used as well.
# data.csv = read.csv("~//DATA_01//test_data.csv", header=T) # read csv files
data.csv<-read.csv(paste0(dataFolder,"test_data.csv"), header= TRUE)
** Excel**
One of the best ways to read an Excel file is to export it to a comma delimited file and import it using the method above. Alternatively, we can use the xlsx package to access Excel files. The first row should contain variable/column names.
# install.packages("xlsx") # Install "xlsx" package
# library(xlsx) # Load xlsx package
# data.xls <- read.xlsx("~//Data_01//test_data.xlsx", 1) # read xlsx file
Export file from R
# Write CSV file
#write.csv(data.csv , "~/Data_01//test_data.csv", row.names = FALSE)
Getting Information on a Dataset
List the variables in the data
#names(data.txt)
names(data.csv)
## [1] "ID" "treat" "var" "rep" "PH" "TN" "PN" "GW"
## [9] "ster" "DTM" "SW" "GAs" "STAs"
#names(data.xls)
Structure of data
str(data.csv)
## 'data.frame': 42 obs. of 13 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ treat: Factor w/ 2 levels "High As ","Low As": 2 2 2 2 2 2 2 2 2 2 ...
## $ var : Factor w/ 7 levels "BR01","BR06",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ rep : int 1 2 3 1 2 3 1 2 3 1 ...
## $ PH : num 84 112 102 118 115 ...
## $ TN : num 28.3 34 27.7 23.3 16.7 19 21.7 25.3 23 19.7 ...
## $ PN : num 27.7 30 24 19.7 12.3 15.3 19.3 21 19 14.7 ...
## $ GW : num 35.7 58.1 44.6 46.4 19.9 35.9 56.2 49.2 48.6 36.6 ...
## $ ster : num 20.5 14.8 5.8 20.3 32.3 14.9 6.1 9.2 4.2 12.1 ...
## $ DTM : num 126 119 120 119 120 ...
## $ SW : num 28.4 36.7 32.9 40 28.2 42.3 35.4 60.6 69.8 57.3 ...
## $ GAs : num 0.762 0.722 0.858 1.053 1.13 ...
## $ STAs : num 14.6 10.8 12.7 18.2 13.7 ...
Levels of a factor
levels(data.csv$var)
## [1] "BR01" "BR06" "BR28" "BR35" "BR36" "Jefferson"
## [7] "Kaybonnet"
Print first 10 rows of data
head(data.csv, n=10)
## ID treat var rep PH TN PN GW ster DTM SW GAs STAs
## 1 1 Low As BR01 1 84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
## 2 2 Low As BR01 2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
## 3 3 Low As BR01 3 102.3 27.7 24.0 44.6 5.8 119.7 32.9 0.858 12.69
## 4 4 Low As BR06 1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
## 5 5 Low As BR06 2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
## 6 6 Low As BR06 3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97
## 7 7 Low As BR28 1 114.3 21.7 19.3 56.2 6.1 123.7 35.4 0.965 14.49
## 8 8 Low As BR28 2 124.0 25.3 21.0 49.2 9.2 114.3 60.6 0.969 16.02
## 9 9 Low As BR28 3 120.3 23.0 19.0 48.6 4.2 113.3 69.8 0.893 15.25
## 10 10 Low As BR35 1 130.0 19.7 14.7 36.6 12.1 126.0 57.3 1.358 21.23
Print last 5 rows of mydata
tail(data.csv, n=5)
## ID treat var rep PH TN PN GW ster DTM SW GAs
## 38 38 High As Jefferson 2 72.3 9.7 8.3 14.1 18.2 128.7 11.6 1.872
## 39 39 High As Jefferson 3 80.0 13.3 11.0 23.0 12.6 127.0 16.3 2.007
## 40 40 High As Kaybonnet 1 96.7 14.3 7.7 5.4 57.2 131.7 18.2 1.888
## 41 41 High As Kaybonnet 2 101.0 15.7 8.0 5.2 82.8 130.7 28.7 1.889
## 42 42 High As Kaybonnet 3 105.3 13.7 10.0 15.0 54.7 128.7 28.5 1.767
## STAs
## 38 18.60
## 39 21.02
## 40 20.27
## 41 22.51
## 42 21.39
Some Important R-packages Statistical and Spatial Data Analysis
Utility and data manipulation
Plotting and Mapping
Advanced statistical analysis and Mechine Learning Packages
Spatial data
Installation of R-packages
If the R program has already been installed, the installation of any ‘package’ can be done directly from the console of R through internet or from local drive. It is better to install through the internet. Detail installation steps can be found here
rm(list = ls())