Data Frame

A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.

The following are the characteristics of a data frame:

The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of numeric, factor, or character type.
Each column should contain the same number of data items.
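
The length and row-name rules above can be checked directly. The following is a minimal sketch using hypothetical toy columns (id, score) that are not part of the data set used in this document; tryCatch() is used only to capture the error messages instead of stopping the script.

```r
# Hypothetical toy columns, used only to illustrate the rules.
ok <- data.frame(id = 1:3, score = c(6.5, 7.2, 6.1))
dim(ok)   # 3 rows, 2 columns

# Columns of length 3 and 2 cannot be aligned, so data.frame() raises an error.
msg <- tryCatch(
  {
    data.frame(id = 1:3, score = c(6.5, 7.2))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Row names must be unique; assigning duplicates is rejected as well.
dup <- tryCatch(
  {
    rownames(ok) <- c("r1", "r1", "r2")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(dup)
```

Note that a shorter column is recycled rather than rejected when its length divides the longer one evenly (for example, lengths 4 and 2), so a length mistake does not always produce an error.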

Creating a data frame

data=data.frame(
  enrollment.no=1:10,
  first.name=c("Biman","Prapti","Dipjyoti","Rashmi","Debojit",
               "Rocky","Rajesh","Semantika","Dhriti","Priyanka"),
  last.name=c("Deka","Kakati","Chahariya","Bhuyan","Barua",
              "Baglary","Bodo","Chakraborty","Chakraborty","Agarwal"),
  cgpa=c(6.5,7.2,6.1,8.5,6.0,5.2,5.5,9.0,8.75,8.0),
  gender=c("male","female","male","female","male","male",
           "male","female","female","female"),
  stringsAsFactors = FALSE
)
print(data)

   enrollment.no first.name   last.name cgpa gender
1              1      Biman        Deka 6.50   male
2              2     Prapti      Kakati 7.20 female
3              3   Dipjyoti   Chahariya 6.10   male
4              4     Rashmi      Bhuyan 8.50 female
5              5    Debojit       Barua 6.00   male
6              6      Rocky     Baglary 5.20   male
7              7     Rajesh        Bodo 5.50   male
8              8  Semantika Chakraborty 9.00 female
9              9     Dhriti Chakraborty 8.75 female
10            10   Priyanka     Agarwal 8.00 female

The structure of the data frame can be inspected with the str() function.

str(data)

'data.frame':   10 obs. of  5 variables:
 $ enrollment.no: int  1 2 3 4 5 6 7 8 9 10
 $ first.name   : chr  "Biman" "Prapti" "Dipjyoti" "Rashmi" ...
 $ last.name    : chr  "Deka" "Kakati" "Chahariya" "Bhuyan" ...
 $ cgpa         : num  6.5 7.2 6.1 8.5 6 5.2 5.5 9 8.75 8
 $ gender       : chr  "male" "female" "male" "female" ...

Note: In R versions before 4.0.0, data.frame() converts character columns to factors by default; passing stringsAsFactors = FALSE overrides this. From R 4.0.0 onward the default is already FALSE, so the argument is optional but harmless.
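
The effect of stringsAsFactors can be seen by building the same column both ways. This is a small sketch with a hypothetical three-value gender column; the argument is passed explicitly in both calls so the result does not depend on the default of the installed R version.

```r
# Same character data, stored once as character and once as factor.
chr_df <- data.frame(gender = c("male", "female", "male"),
                     stringsAsFactors = FALSE)
fac_df <- data.frame(gender = c("male", "female", "male"),
                     stringsAsFactors = TRUE)

class(chr_df$gender)    # "character"
class(fac_df$gender)    # "factor"
levels(fac_df$gender)   # "female" "male" (levels are sorted alphabetically)
```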

A statistical summary of each column can be obtained with the summary() function.

summary(data)

 enrollment.no    first.name         last.name              cgpa      
 Min.   : 1.00   Length:10          Length:10          Min.   :5.200  
 1st Qu.: 3.25   Class :character   Class :character   1st Qu.:6.025  
 Median : 5.50   Mode  :character   Mode  :character   Median :6.850  
 Mean   : 5.50                                         Mean   :7.075  
 3rd Qu.: 7.75                                         3rd Qu.:8.375  
 Max.   :10.00                                         Max.   :9.000  
    gender         
 Length:10         
 Class :character  
 Mode  :character
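
For character columns, summary() reports only the length and class, as the output above shows. One way to get counts instead is to convert the column to a factor first. The sketch below assumes a hypothetical four-row data frame named students; the per-group mean via tapply() is an illustrative addition, not something the summary above computes.

```r
# Hypothetical four-row example.
students <- data.frame(
  cgpa = c(6.5, 7.2, 6.1, 8.5),
  gender = c("male", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Summarising a factor gives a count per level.
gender_counts <- summary(factor(students$gender))
print(gender_counts)

# Mean cgpa within each gender group.
avg_by_gender <- tapply(students$cgpa, students$gender, mean)
print(avg_by_gender)
```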

Extract Data from Data Frame

Extract specific columns from a data frame using the column names.

names = data.frame(data$first.name,data$last.name)
names

   data.first.name data.last.name
1            Biman           Deka
2           Prapti         Kakati
3         Dipjyoti      Chahariya
4           Rashmi         Bhuyan
5          Debojit          Barua
6            Rocky        Baglary
7           Rajesh           Bodo
8        Semantika    Chakraborty
9           Dhriti    Chakraborty
10        Priyanka        Agarwal
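
In the output above, the columns are renamed to data.first.name and data.last.name because data.frame() builds the names from the expressions passed to it. Indexing by a vector of column names keeps the original names. The sketch below uses a two-row excerpt of the data and the hypothetical object names students, full_names and cgpa_vec.

```r
# Two-row excerpt of the data, enough to show the indexing forms.
students <- data.frame(
  first.name = c("Biman", "Prapti"),
  last.name = c("Deka", "Kakati"),
  cgpa = c(6.5, 7.2),
  stringsAsFactors = FALSE
)

# Select several columns by name; the result is still a data frame.
full_names <- students[, c("first.name", "last.name")]
names(full_names)   # "first.name" "last.name"

# Double brackets (or $) return the column itself as a vector.
cgpa_vec <- students[["cgpa"]]
```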

Extract rows from a data frame

temp=data[8:10,]
temp

   enrollment.no first.name   last.name cgpa gender
8              8  Semantika Chakraborty 9.00 female
9              9     Dhriti Chakraborty 8.75 female
10            10   Priyanka     Agarwal 8.00 female
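
Rows can also be selected by a condition instead of by position. The following sketch uses a three-row excerpt and an arbitrary cut-off of 8 chosen for illustration; subset() is a base R alternative that lets the column names be used directly.

```r
# Three-row excerpt of the data.
students <- data.frame(
  first.name = c("Rashmi", "Rocky", "Semantika"),
  cgpa = c(8.5, 5.2, 9.0),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# A logical vector in the row position keeps the rows where it is TRUE.
high <- students[students$cgpa > 8, ]

# The same idea with subset(), combining two conditions.
high_female <- subset(students, cgpa > 8 & gender == "female")
```

The original row names are kept, so the rows in high are labelled 1 and 3 rather than renumbered.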

We can extract rows and columns simultaneously by giving both a row index and a column index.

result=data[8:10,2:3]
result

   first.name   last.name
8   Semantika Chakraborty
9      Dhriti Chakraborty
10   Priyanka     Agarwal
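
One detail worth knowing when indexing rows and columns together: if only a single column is selected, R returns a plain vector rather than a data frame unless drop = FALSE is given. A small sketch with a hypothetical three-row excerpt:

```r
# Three-row excerpt of the data.
students <- data.frame(
  first.name = c("Semantika", "Dhriti", "Priyanka"),
  last.name = c("Chakraborty", "Chakraborty", "Agarwal"),
  stringsAsFactors = FALSE
)

# Selecting one column collapses the result to a character vector.
one_col <- students[2:3, "first.name"]

# drop = FALSE keeps the data frame structure.
kept_df <- students[2:3, "first.name", drop = FALSE]
```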

Expand Data Frame

A data frame can be expanded by adding columns and rows.

Add Column

Just add the column vector using a new column name.

data$specialization=c("OR","Demography","OR","OR","Demography",
                      "Demography","OR","Demography","OR","Demography")

data

   enrollment.no first.name   last.name cgpa gender specialization
1              1      Biman        Deka 6.50   male             OR
2              2     Prapti      Kakati 7.20 female     Demography
3              3   Dipjyoti   Chahariya 6.10   male             OR
4              4     Rashmi      Bhuyan 8.50 female             OR
5              5    Debojit       Barua 6.00   male     Demography
6              6      Rocky     Baglary 5.20   male     Demography
7              7     Rajesh        Bodo 5.50   male             OR
8              8  Semantika Chakraborty 9.00 female     Demography
9              9     Dhriti Chakraborty 8.75 female             OR
10            10   Priyanka     Agarwal 8.00 female     Demography
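
A new column can also be computed from existing columns, and its length has to fit the number of rows. The sketch below uses a three-row excerpt; the column name above.six and the threshold of 6 are hypothetical choices made for illustration.

```r
# Three-row excerpt of the data.
students <- data.frame(
  first.name = c("Biman", "Prapti", "Rocky"),
  cgpa = c(6.5, 7.2, 5.2),
  stringsAsFactors = FALSE
)

# Derived logical column computed from cgpa.
students$above.six <- students$cgpa > 6

# A vector of length 2 does not fit 3 rows, so the assignment fails
# and students is left unchanged.
msg <- tryCatch(
  {
    students$specialization <- c("OR", "Demography")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```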

Add Row

To add rows permanently to an existing data frame, the new rows must have the same structure (the same column names and compatible types) as the existing data frame. They are then appended with the rbind() function.

In the example below we create a second data frame containing the new rows and combine it with the existing data frame using rbind() to produce the final data frame.

data1=data.frame(
  enrollment.no=11:15,
  first.name=c("Kankan","Reyhan","Simi","Arpita","Chandan"),
  last.name=c("Sarma","Ali","Mahanta","Ganguly","Deka"),
  cgpa=c(7.8,6.5,5.0,5.5,5.8),
  gender=c("male","male","female","female","male"),
  specialization=c("OR","Demography","OR","OR","Demography"),
  stringsAsFactors = FALSE
)
data=rbind(data,data1)
print(data)

   enrollment.no first.name   last.name cgpa gender specialization
1              1      Biman        Deka 6.50   male             OR
2              2     Prapti      Kakati 7.20 female     Demography
3              3   Dipjyoti   Chahariya 6.10   male             OR
4              4     Rashmi      Bhuyan 8.50 female             OR
5              5    Debojit       Barua 6.00   male     Demography
6              6      Rocky     Baglary 5.20   male     Demography
7              7     Rajesh        Bodo 5.50   male             OR
8              8  Semantika Chakraborty 9.00 female     Demography
9              9     Dhriti Chakraborty 8.75 female             OR
10            10   Priyanka     Agarwal 8.00 female     Demography
11            11     Kankan       Sarma 7.80   male             OR
12            12     Reyhan         Ali 6.50   male     Demography
13            13       Simi     Mahanta 5.00 female             OR
14            14     Arpita     Ganguly 5.50 female             OR
15            15    Chandan        Deka 5.80   male     Demography
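
For data frames, rbind() aligns columns by name rather than by position, and it stops with an error if the names differ. The sketch below illustrates both cases with hypothetical two-column frames (old_rows, extra) that are much smaller than the data above.

```r
old_rows <- data.frame(id = 1:2, name = c("Biman", "Prapti"),
                       stringsAsFactors = FALSE)

# Same column names in a different order: rbind() matches them by name.
extra <- data.frame(name = "Kankan", id = 11L, stringsAsFactors = FALSE)
combined <- rbind(old_rows, extra)
print(combined)

# A differently named column ("surname" instead of "name") is rejected.
msg <- tryCatch(
  {
    rbind(old_rows, data.frame(id = 12L, surname = "Ali"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```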

In R, rbind() and cbind() combine two or more vectors, matrices, or data frames row-wise or column-wise, respectively.
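
When applied to plain vectors, both functions return a matrix, not a data frame. Because a matrix holds a single type, mixing numbers and strings in cbind() turns everything into character. The following sketch shows this with hypothetical short vectors; building the data frame directly with data.frame() keeps each column's type.

```r
a <- 1:3
b <- 21:23

by_col <- cbind(a, b)   # 3 x 2 matrix, one column per vector
by_row <- rbind(a, b)   # 2 x 3 matrix, one row per vector
is.matrix(by_col)       # TRUE

# Mixed types are coerced to character inside a matrix.
mixed <- cbind(id = 1:2, name = c("x", "y"))
typeof(mixed)           # "character"

# data.frame() keeps id as integer and name as character.
typed <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)
```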

Another simple way of creating a data frame is to bind vectors column-wise with cbind() and convert the resulting matrix with as.data.frame():

a=1:10
b=21:30
data2=cbind(a,b)
data2=as.data.frame(data2)
data2

# Derived columns: element-wise total and average of a and b
data2$total=a+b
data2$average=(a+b)/2
data2

    a  b total average
1   1 21    22      11
2   2 22    24      12
3   3 23    26      13
4   4 24    28      14
5   5 25    30      15
6   6 26    32      16
7   7 27    34      17
8   8 28    36      18
9   9 29    38      19
10 10 30    40      20

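# Append one row (a, b, total, average); new.row avoids reusing the name of the c() function
new.row=c(11,31,11+31,(11+31)/2)
data2=rbind(data2,new.row)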
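
The effect of appending a vector as a row can be verified on a smaller frame. This sketch rebuilds a three-row version of the same a, b, total, average layout under the hypothetical name d and then appends the same four values.

```r
# Three-row version of the a / b / total / average layout.
d <- data.frame(a = 1:3, b = 21:23)
d$total <- d$a + d$b
d$average <- d$total / 2

# The vector supplies one value per column, in column order.
d <- rbind(d, c(11, 31, 11 + 31, (11 + 31) / 2))
print(d)
```

This works cleanly here because every column is numeric. A vector such as c(11, "x") would be coerced to character before being appended, so for data frames with mixed column types it is safer to append a one-row data frame instead.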