Factors are data objects that categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male" and "Female" or TRUE and FALSE, and they are widely used in data analysis for statistical modeling.
Factors are created using the factor() function, which takes a vector as input.
Let us consider the following example:
data <- c("East","West","East","North","North","East","West",
          "West","West","East","North")
factor_data = factor(data)
factor_data
[1] East West East North North East West West West
[10] East North
Levels: East North West
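To see how a factor is stored internally, the following minimal sketch uses the base R functions levels(), nlevels(), as.integer() and table(). The names regions and region_factor are hypothetical and chosen only for this illustration.

```r
# Illustrative data (variable names are assumptions for this sketch)
regions <- c("East", "West", "East", "North")
region_factor <- factor(regions)

# The distinct categories, sorted alphabetically by default
levels(region_factor)      # "East" "North" "West"

# Number of distinct levels
nlevels(region_factor)     # 3

# Underlying integer codes: each value's position in levels()
as.integer(region_factor)  # 1 3 1 2

# Frequency of each level
table(region_factor)
```

Each element is stored as an integer code pointing into the levels vector, which is why factors are compact for columns with few unique values.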
By default, the factor() function assigns the levels in alphabetical order. We can change this with the levels argument.
data <- c("East","West","East","North","North","East","West",
          "West","West","East","North")
factor_data = factor(data, levels = c("North","East","West"))
factor_data
[1] East West East North North East West West West
[10] East North
Levels: North East West
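One consequence of supplying levels explicitly is worth a quick check: any value that is not listed in levels becomes NA, and table() reports the levels in the order given, including levels with a zero count. The sketch below assumes a made-up vector named directions that contains one unlisted value.

```r
# "South" is deliberately absent from the levels (hypothetical data)
directions <- c("East", "West", "South")
f <- factor(directions, levels = c("North", "East", "West"))

f            # East West <NA>
is.na(f)     # FALSE FALSE TRUE

# Counts follow the level order; unused levels show 0, NA is excluded
table(f)     # North 0, East 1, West 1
```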
We can also convert a numeric vector into a factor:
num_data = c(0,0,1,0,1,1,1,0,1,0,1,1,0,1,0,1,1)
factor_data = factor(num_data, labels = c("Male","Female"))
factor_data
[1] Male Male Female Male Female Female Female Male
[9] Female Male Female Female Male Female Male Female
[17] Female
Levels: Male Female
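When labels is given without levels, the labels are matched to the sorted unique values (here 0, then 1). A slightly more explicit sketch, with hypothetical names codes and sex, states the mapping with levels as well. It also shows a common pitfall when going the other way: as.numeric() on a factor returns the internal codes, not the original numbers.

```r
# Explicit mapping: 0 -> "Male", 1 -> "Female" (names are illustrative)
codes <- c(0, 1, 1, 0)
sex <- factor(codes, levels = c(0, 1), labels = c("Male", "Female"))
as.character(sex)            # "Male" "Female" "Female" "Male"

# Converting a factor of numeric-looking values back to numbers
f <- factor(c(10, 20, 10))
as.numeric(f)                # 1 2 1   (internal codes)
as.numeric(as.character(f))  # 10 20 10 (original values)
```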
We can generate factor levels with the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.
gl(n, k, labels)
Following is the description of the parameters used:
n is an integer giving the number of levels.
k is an integer giving the number of replications.
labels is a vector of labels for the resulting factor levels.
factor_data = gl(2, 8, labels = c("Male","Female"))
factor_data
[1] Male Male Male Male Male Male Male Male
[9] Female Female Female Female Female Female Female Female
Levels: Male Female
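gl() also accepts a length argument (not covered in the parameter list above) that sets the total length of the result, so the pattern repeats until that length is reached. The sketch below, with the hypothetical names alternating and blocks, shows this along with the default integer labels used when labels is omitted.

```r
# k = 1 with length = 6 cycles through the two levels
alternating <- gl(2, 1, length = 6, labels = c("Male", "Female"))
as.character(alternating)  # "Male" "Female" "Male" "Female" "Male" "Female"

# Without labels, the levels are named "1", "2", ..., "n"
blocks <- gl(3, 2)
as.integer(blocks)         # 1 1 2 2 3 3
levels(blocks)             # "1" "2" "3"
```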
Another example of a factor:
# Generating 100 random numbers between 1 and 100
# (sampled without replacement, so this is a shuffle of 1 to 100)
data = sample(1:100,100)
# Generating a zero-filled vector of size 100.
result = mat.or.vec(100,1)
for (i in 1:100) {
  if(data[i]<30){
    result[i]=0
  }else if(data[i]>=30 && data[i]<45){
    result[i]=1
  }else if(data[i]>=45 && data[i]<60){
    result[i]=2
  }else {
    result[i]=3
  }
}
f_result = factor(result, labels = c("Fail", "Third Div", "Second Div", "First Div"))
str(f_result)
Factor w/ 4 levels "Fail","Third Div",..: 2 3 4 4 4 2 2 1 1 3 ...
summary(f_result)
Fail Third Div Second Div First Div
29 15 15 41
plot(f_result)
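The same binning can be written without a loop using base R's cut(), which returns a factor directly. This is an alternative sketch, not the approach used above; the names marks and grade and the seed value are assumptions made for illustration. Setting right = FALSE makes each interval closed on the left, which matches the conditions in the loop (for example, 30 <= x < 45).

```r
set.seed(42)  # arbitrary seed so the shuffle is reproducible
marks <- sample(1:100, 100)

# Intervals: [-Inf, 30), [30, 45), [45, 60), [60, Inf)
grade <- cut(marks,
             breaks = c(-Inf, 30, 45, 60, Inf),
             right = FALSE,
             labels = c("Fail", "Third Div", "Second Div", "First Div"))

summary(grade)
```

Because sample(1:100, 100) is a shuffle of the integers 1 to 100, the counts per level are the same for any seed; only the order of the values changes.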