A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
The following example builds a data frame of student records and shows its main characteristics.
# Create a data frame with one row per student
data <- data.frame(
  enrollment.no = 1:10,
  first.name = c("Biman", "Prapti", "Dipjyoti", "Rashmi", "Debojit",
                 "Rocky", "Rajesh", "Semantika", "Dhriti", "Priyanka"),
  last.name = c("Deka", "Kakati", "Chahariya", "Bhuyan", "Barua",
                "Baglary", "Bodo", "Chakraborty", "Chakraborty", "Agarwal"),
  cgpa = c(6.5, 7.2, 6.1, 8.5, 6.0, 5.2, 5.5, 9.0, 8.75, 8.0),
  gender = c("male", "female", "male", "female", "male", "male",
             "male", "female", "female", "female"),
  stringsAsFactors = FALSE  # keep character columns as character
)
print(data)
   enrollment.no first.name   last.name cgpa gender
1              1      Biman        Deka 6.50   male
2              2     Prapti      Kakati 7.20 female
3              3   Dipjyoti   Chahariya 6.10   male
4              4     Rashmi      Bhuyan 8.50 female
5              5    Debojit       Barua 6.00   male
6              6      Rocky     Baglary 5.20   male
7              7     Rajesh        Bodo 5.50   male
8              8  Semantika Chakraborty 9.00 female
9              9     Dhriti Chakraborty 8.75 female
10            10   Priyanka     Agarwal 8.00 female
The structure of the data frame can be inspected with the str() function.
str(data)
'data.frame': 10 obs. of 5 variables:
$ enrollment.no: int 1 2 3 4 5 6 7 8 9 10
$ first.name : chr "Biman" "Prapti" "Dipjyoti" "Rashmi" ...
$ last.name : chr "Deka" "Kakati" "Chahariya" "Bhuyan" ...
$ cgpa : num 6.5 7.2 6.1 8.5 6 5.2 5.5 9 8.75 8
$ gender : chr "male" "female" "male" "female" ...
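Note: In versions of R before 4.0.0, data.frame() converted character columns to factors by default. Passing stringsAsFactors = FALSE overrides this. Since R 4.0.0 the default is already FALSE, so the argument is optional there but harmless to keep.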
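When a column really is categorical, it can be converted to a factor explicitly instead of relying on the default. The snippet below is a minimal sketch on a small toy data frame; the name students and its values are illustrative assumptions, not part of the example above.

```r
# Hypothetical toy data; 'students' is an illustrative name.
students <- data.frame(
  first.name = c("Biman", "Prapti", "Dipjyoti"),
  gender = c("male", "female", "male"),
  stringsAsFactors = FALSE  # keep character columns as character
)

# Both columns are stored as character vectors at this point.
sapply(students, class)

# Convert only the column that is genuinely categorical.
students$gender <- factor(students$gender)

class(students$gender)   # "factor"
levels(students$gender)  # "female" "male" (levels are sorted)
```

Converting explicitly behaves the same way regardless of which R version runs the code.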
The statistical summary and nature of the data can be obtained by applying the summary() function.
summary(data)
 enrollment.no    first.name         last.name              cgpa
 Min.   : 1.00   Length:10          Length:10          Min.   :5.200
 1st Qu.: 3.25   Class :character   Class :character   1st Qu.:6.025
 Median : 5.50   Mode  :character   Mode  :character   Median :6.850
 Mean   : 5.50                                         Mean   :7.075
 3rd Qu.: 7.75                                         3rd Qu.:8.375
 Max.   :10.00                                         Max.   :9.000
    gender
 Length:10
 Class :character
 Mode  :character
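summary() can also be applied to a single column, and a per-group figure can be computed with tapply(). The following is a small sketch under the assumption of a toy data frame called students (a hypothetical name with made-up values).

```r
# Hypothetical toy data; 'students' is an illustrative name.
students <- data.frame(
  cgpa = c(6.5, 7.2, 6.1, 8.5),
  gender = c("male", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# summary() on one numeric column returns the minimum, quartiles,
# median, mean, and maximum as a named vector.
cgpa.summary <- summary(students$cgpa)
cgpa.summary[["Mean"]]

# tapply() applies a function to one column within each group of another.
by.gender <- tapply(students$cgpa, students$gender, mean)
by.gender
```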
Extract specific columns from a data frame using their column names.
names <- data.frame(data$first.name, data$last.name)
names
   data.first.name data.last.name
1            Biman           Deka
2           Prapti         Kakati
3         Dipjyoti      Chahariya
4           Rashmi         Bhuyan
5          Debojit          Barua
6            Rocky        Baglary
7           Rajesh           Bodo
8        Semantika    Chakraborty
9           Dhriti    Chakraborty
10        Priyanka        Agarwal
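Note that wrapping the columns in data.frame() changes their names (data.first.name instead of first.name). As an alternative sketch, indexing with a character vector of column names keeps the original names. The data frame students below is a hypothetical stand-in for the one above.

```r
# Hypothetical toy data for illustration.
students <- data.frame(
  first.name = c("Biman", "Prapti"),
  last.name = c("Deka", "Kakati"),
  cgpa = c(6.5, 7.2),
  stringsAsFactors = FALSE
)

# Index by a character vector of column names; the result is a data
# frame whose columns keep their original names.
full.names <- students[, c("first.name", "last.name")]
names(full.names)

# [[ ]] (like $) returns a single column as a plain vector.
students[["cgpa"]]
```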
Extract rows from a data frame
temp <- data[8:10, ]
temp
   enrollment.no first.name   last.name cgpa gender
8              8  Semantika Chakraborty 9.00 female
9              9     Dhriti Chakraborty 8.75 female
10            10   Priyanka     Agarwal 8.00 female
We can extract rows and columns simultaneously.
result <- data[8:10, 2:3]
result
   first.name   last.name
8   Semantika Chakraborty
9      Dhriti Chakraborty
10   Priyanka     Agarwal
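Rows do not have to be selected by position. As a hedged sketch, the row index can be a logical condition and the column index a vector of names; students is again a hypothetical toy data frame, and the threshold of 8 is chosen only for illustration.

```r
# Hypothetical toy data for illustration.
students <- data.frame(
  first.name = c("Rashmi", "Rocky", "Semantika"),
  cgpa = c(8.5, 5.2, 9.0),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Rows where cgpa is at least 8, restricted to two named columns.
high <- students[students$cgpa >= 8, c("first.name", "cgpa")]
high

# subset() expresses the same selection.
high2 <- subset(students, cgpa >= 8, select = c(first.name, cgpa))
high2
```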
A data frame can be expanded by adding columns and rows.
Just add the column vector using a new column name.
data$specialization <- c("OR", "Demography", "OR", "OR", "Demography",
                         "Demography", "OR", "Demography", "OR", "Demography")
data
   enrollment.no first.name   last.name cgpa gender specialization
1              1      Biman        Deka 6.50   male             OR
2              2     Prapti      Kakati 7.20 female     Demography
3              3   Dipjyoti   Chahariya 6.10   male             OR
4              4     Rashmi      Bhuyan 8.50 female             OR
5              5    Debojit       Barua 6.00   male     Demography
6              6      Rocky     Baglary 5.20   male     Demography
7              7     Rajesh        Bodo 5.50   male             OR
8              8  Semantika Chakraborty 9.00 female     Demography
9              9     Dhriti Chakraborty 8.75 female             OR
10            10   Priyanka     Agarwal 8.00 female     Demography
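A new column can also be computed from existing ones, and the vector being added has to fit the number of rows. The sketch below illustrates both points on a hypothetical three-row data frame; the column name above.seven is an assumption made for the example.

```r
# Hypothetical toy data for illustration.
students <- data.frame(
  first.name = c("Biman", "Prapti", "Dipjyoti"),
  cgpa = c(6.5, 7.2, 6.1),
  stringsAsFactors = FALSE
)

# A derived column computed from an existing one.
students$above.seven <- students$cgpa > 7

# A vector whose length does not fit the number of rows is rejected.
msg <- tryCatch(
  {
    students$specialization <- c("OR", "Demography")  # 2 values, 3 rows
    "no error"
  },
  error = function(e) "length mismatch"
)
msg
```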
To add more rows permanently to an existing data frame, the new rows must have the same structure as the existing data frame; they are then appended with the rbind() function.
In the example below we create a data frame with new rows and merge it with the existing data frame to create the final data frame.
# New rows, with the same columns as the existing data frame
data1 <- data.frame(
  enrollment.no = 11:15,
  first.name = c("Kankan", "Reyhan", "Simi", "Arpita", "Chandan"),
  last.name = c("Sarma", "Ali", "Mahanta", "Ganguly", "Deka"),
  cgpa = c(7.8, 6.5, 5.0, 5.5, 5.8),
  gender = c("male", "male", "female", "female", "male"),
  specialization = c("OR", "Demography", "OR", "OR", "Demography"),
  stringsAsFactors = FALSE
)
# Append the new rows to the existing data frame
data <- rbind(data, data1)
print(data)
   enrollment.no first.name   last.name cgpa gender specialization
1              1      Biman        Deka 6.50   male             OR
2              2     Prapti      Kakati 7.20 female     Demography
3              3   Dipjyoti   Chahariya 6.10   male             OR
4              4     Rashmi      Bhuyan 8.50 female             OR
5              5    Debojit       Barua 6.00   male     Demography
6              6      Rocky     Baglary 5.20   male     Demography
7              7     Rajesh        Bodo 5.50   male             OR
8              8  Semantika Chakraborty 9.00 female     Demography
9              9     Dhriti Chakraborty 8.75 female             OR
10            10   Priyanka     Agarwal 8.00 female     Demography
11            11     Kankan       Sarma 7.80   male             OR
12            12     Reyhan         Ali 6.50   male     Demography
13            13       Simi     Mahanta 5.00 female             OR
14            14     Arpita     Ganguly 5.50 female             OR
15            15    Chandan        Deka 5.80   male     Demography
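What "the same structure" means in practice is that the column names must agree. As a rough sketch on hypothetical data (old, new, and bad are illustrative names), rbind() on data frames matches columns by name, so a different column order is fine, while a different column name is an error.

```r
# Hypothetical toy data for illustration.
old <- data.frame(first.name = c("Biman", "Prapti"), cgpa = c(6.5, 7.2),
                  stringsAsFactors = FALSE)

# Same columns in a different order: rbind() matches them by name.
new <- data.frame(cgpa = 7.8, first.name = "Kankan",
                  stringsAsFactors = FALSE)
combined <- rbind(old, new)
combined

# A column name that does not exist in 'old' makes rbind() fail.
bad <- data.frame(name = "Simi", cgpa = 5.0, stringsAsFactors = FALSE)
failed <- tryCatch(
  {
    rbind(old, bad)
    FALSE
  },
  error = function(e) TRUE
)
failed
```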
In R, we use rbind() and cbind() to combine two or more vectors row-wise or column-wise, respectively.
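A minimal sketch of the difference, using two short vectors whose names (x, y) are chosen only for this illustration:

```r
x <- 1:3
y <- 4:6

by.row <- rbind(x, y)  # one row per vector
by.col <- cbind(x, y)  # one column per vector

dim(by.row)    # 2 rows, 3 columns
dim(by.col)    # 3 rows, 2 columns
class(by.col)  # the result is a matrix, not a data frame
```

Because the result of combining plain vectors is a matrix, it has to be converted with as.data.frame() before it can be used as a data frame.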
Another simple way of creating a data frame is:
a <- 1:10
b <- 21:30
data2 <- cbind(a, b)            # a matrix with columns a and b
data2 <- as.data.frame(data2)   # convert the matrix to a data frame
data2
    a  b
1   1 21
2   2 22
3   3 23
4   4 24
5   5 25
6   6 26
7   7 27
8   8 28
9   9 29
10 10 30
data2$total <- a + b
data2$average <- (a + b) / 2
data2
    a  b total average
1   1 21    22      11
2   2 22    24      12
3   3 23    26      13
4   4 24    28      14
5   5 25    30      15
6   6 26    32      16
7   7 27    34      17
8   8 28    36      18
9   9 29    38      19
10 10 30    40      20
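# Values for a, b, total, and average, in column order.
# The vector is named new.row rather than c to avoid masking the c() function.
new.row <- c(11, 31, 11 + 31, 21)
data2 <- rbind(data2, new.row)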
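Appending a bare vector relies on the values being in the right column order. As an alternative sketch, the new row can be built as a one-row data frame so that each value is tied to a column name; scores and new.row are hypothetical names used only here.

```r
# Hypothetical toy data, built the same way as data2 but shorter.
scores <- data.frame(a = 1:3, b = 21:23)
scores$total <- scores$a + scores$b
scores$average <- scores$total / 2

# New row as a one-row data frame: values are matched by column name.
new.row <- data.frame(a = 11, b = 31, total = 11 + 31,
                      average = (11 + 31) / 2)
scores <- rbind(scores, new.row)
scores
```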