Tips and Tricks for Data Science

Sending R output to disk files

Sometimes you want to get the results of an analysis from your R console into a word processor. Copy and paste does not work well unless you are prepared to use a fixed-width font in the target document. The trick is to send the output to a disk file first, using the sink() command.

The sink() command

The sink() command allows you to send anything that would have gone to the console (your screen) to a disk file instead.

sink(file = NULL, append = FALSE, split = FALSE)

You need to supply the filename; setting file = NULL closes the connection and stops sink()ing. To add to an existing file, use append = TRUE. If you set split = TRUE, the output goes to both the console and the file you specified.

When you issue the command, a connection to the file is opened (the file is created if necessary) and all subsequent output goes to it. If you set append = FALSE and the file already exists, it will be overwritten.

# Send output to screen and file
> sink(file = "Out1.txt", split = TRUE, append = FALSE)

> summary(lm(Fertility ~ . , data = swiss))

Call:
lm(formula = Fertility ~ ., data = swiss)
Residuals:
     Min       1Q   Median       3Q      Max
-15.2743  -5.2617   0.5032   4.1198  15.3213

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
Agriculture      -0.17211    0.07030  -2.448  0.01873 *
Examination      -0.25801    0.25388  -1.016  0.31546
Education        -0.87094    0.18303  -4.758 2.43e-05 ***
Catholic          0.10412    0.03526   2.953  0.00519 **
Infant.Mortality  1.07705    0.38172   2.822  0.00734 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared:  0.7067,     Adjusted R-squared:  0.671
F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

# Stop sending output to file
> sink(file = NULL)

Note that append = FALSE only matters when sink() first opens the file: while the sink is active, every new piece of output is added to the file. Once you issue the command sink(file = NULL), output returns to the console and you can view your file in any text editor.
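A minimal script-style sketch of the same workflow follows. It writes to a temporary file (tempfile() and the name out_file are illustrative choices, so nothing real is overwritten) and wraps the model summary in print(), because values are printed automatically only at the top level of the console, not inside functions or in scripts run with source().

```r
# Illustrative target: a temporary file, so no existing file is overwritten
out_file <- tempfile(fileext = ".txt")

sink(file = out_file, split = FALSE)   # start sending output to the file
# print() makes the output appear even inside functions or source()d scripts
print(summary(lm(Fertility ~ ., data = swiss)))
sink(file = NULL)                      # stop; output returns to the console

sink.number()                          # 0 means no diversions are still open

# Read the captured text back (or open the file in any text editor)
captured <- readLines(out_file)
head(captured)
```

If an error could occur between the two sink() calls, doing the work inside a function that calls on.exit(sink(file = NULL)) is one way to make sure the diversion is always closed.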

If you only want to send a single “result” to a disk file you can use the capture.output() command instead.

capture.output(..., file = NULL, append = FALSE)

You provide the commands that will produce the output and the filename. If you set append = TRUE and the target file exists, the output will be added to the end of the file. If you set append = FALSE (the default), the file will be “blanked” and the output will overwrite the original contents. If you leave file = NULL, nothing is written to disk; the output is returned as a character vector instead.

By default all output goes to the file and is not “mirrored” to the console (recent versions of R add type and split arguments, which are passed on to sink()). You can supply several commands, separated by commas.

> capture.output(ls(), search(), file = "Out1.txt")

This example sends the output of ls() followed by that of search() to the disk file.
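The sketch below contrasts the two uses of capture.output(). With the default file = NULL the printed output comes back as a character vector (one element per line); with a filename, append decides whether earlier contents survive. The temporary file and the names txt and out_file are illustrative only.

```r
# With file = NULL (the default) the output is returned, not written
txt <- capture.output(summary(swiss$Fertility))
print(txt)

# Illustrative temporary file: the first call overwrites, the second appends
out_file <- tempfile(fileext = ".txt")
capture.output(ls(), search(), file = out_file)
capture.output(summary(swiss$Fertility), file = out_file, append = TRUE)

all_lines <- readLines(out_file)
# The last lines of the file are the appended summary
tail(all_lines, length(txt))
```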

Once you have your output in a text file you can transfer it to your word processor with a little pre-processing via Excel.

Processing sink() output text files

Your sink()ed file will be space separated but not exactly fixed width. In any event you’ll need to open the file in Excel and do a little processing so that you can get the results into Word in table form.

Most times it is the regression or ANOVA table that you want. So, open Excel then File > Open to bring up the Text Import Wizard.

In the wizard, choose the Fixed width option; you can then add column boundaries with a click and move them by dragging with the mouse. Once you are done, the results appear in separate cells of the spreadsheet.

Now you can copy the cells to the clipboard and switch to Word. There, the Home > Paste button offers various paste options.

You can also use Paste Special and select the RTF format option, which takes Excel cells and transfers them as table cells in Word.

If you have other elements to transfer, you can deal with them separately. This is not an ideal method, but the amount of user intervention is fairly minimal.

You might also try opening the file in Word to start with and replacing spaces with Tab characters. You want to keep single spaces as spaces, so start by repeatedly replacing 3 spaces with 2 spaces until no runs of three or more spaces are left. Then replace 2 spaces with a Tab (^t in the Word Replace box). You’ll need to do some manual editing, as this will not produce a perfect result, but it will do most of the work automatically. Then save the file and open it in Excel again, setting the delimiter to Tab.
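If you prefer to stay in R, the same space-to-Tab idea can be sketched with gsub(): runs of two or more spaces become a single Tab, while single spaces (as in "Std. Error") are left alone. As with the Word method, some manual tidying is still needed (the header row, for instance, will not split into columns). The file names are illustrative temporary files. Writing coef(summary(fit)) with write.csv() is shown as an alternative for when only the coefficient table is wanted.

```r
fit <- lm(Fertility ~ ., data = swiss)
in_file  <- tempfile(fileext = ".txt")   # illustrative: stands in for your sink() output
out_file <- tempfile(fileext = ".tsv")   # illustrative: Tab-delimited file for Excel
capture.output(summary(fit), file = in_file)

lines  <- readLines(in_file)
tabbed <- gsub(" {2,}", "\t", lines)     # 2+ spaces -> one Tab; single spaces kept
writeLines(tabbed, out_file)

# Alternative: write only the coefficient table, already in columns
csv_file <- tempfile(fileext = ".csv")
write.csv(coef(summary(fit)), csv_file)
```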

11th June 2019 aJfsfjlser3f Tips and Tricks Comments Off

Rotating or transposing R objects

Two dimensional R objects include data.frame, matrix and table objects. You can transpose the rows and columns using the t() command. Here is a simple data.frame:

> dat <- as.data.frame(matrix(1:12, ncol = 4))
> colnames(dat) <- LETTERS[1:4]
> dat
  A B C  D
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12

You can rotate the data.frame so that the rows become the columns and the columns become the rows (that is, transpose it) with the t() command:

> t(dat)
  [,1] [,2] [,3]
A    1    2    3
B    4    5    6
C    7    8    9
D   10   11   12

When you transpose a data.frame, the result is a matrix object (in R 4.0.0 and later, class() reports "matrix" "array" for it):

> dat.t <- t(dat)
> class(dat.t)
[1] "matrix"

You can also rotate a matrix object or a table, as long as the table only has 2 dimensions. These items will have rownames() and colnames() elements (even if empty). You can use the dim() command to look at the dimensions of an object.

> dim(HairEyeColor)
[1] 4 4 2

The HairEyeColor table has more than 2 dimensions and so will not rotate:

> t(HairEyeColor)
Error in t.default(HairEyeColor) : argument is not a matrix

However, you can get part of the table (as 2-dimensions):

> HEC <- as.table(HairEyeColor[,,1])
> HEC
       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

> class(HEC)
[1] "table"

Now it can be rotated:

> t(HEC)
       Hair
Eye     Black Brown Red Blond
  Brown    32    53  10     3
  Blue     11    50  10    30
  Hazel    10    25   7     5
  Green     3    15   7     8

The row and column names are preserved (transposed, of course), as are the names of the dimensions (Eye and Hair here); any names() attribute is dropped:

> HEC.t <- t(HEC)
> dimnames(HEC.t)
$Eye
[1] "Brown" "Blue"  "Hazel" "Green"

$Hair
[1] "Black" "Brown" "Red"   "Blond"

Summary

Use t() to rotate (transpose) data.frame, matrix or table objects with 2 dimensions.
Check the dimensions using dim().
Transposing a data.frame gives a matrix.
The colnames() and rownames() attributes are preserved (but transposed).
Any names() attribute is dropped.
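Because a matrix holds a single type of data, transposing a data.frame with mixed column types gives a character matrix. A sketch, using a small made-up data.frame (dat2 is hypothetical, for illustration only):

```r
# Hypothetical example data: one numeric and one text column
dat2 <- data.frame(id = 1:3, grp = c("a", "b", "c"))
dat2.t <- t(dat2)

is.matrix(dat2.t)   # TRUE: transposing a data.frame gives a matrix
typeof(dat2.t)      # "character": the numbers were converted to text
dim(dat2.t)         # 2 rows (id, grp) by 3 columns

# If you need numbers again, convert the relevant row explicitly
as.numeric(dat2.t["id", ])
```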

R Object elements: brackets [], double brackets [[]] and $

Many R objects are composed of multiple elements. There are various ways to extract one (or more) elements from an object, depending on the object itself.

Square brackets []

A simple vector for example is a 1-D object; you can get elements from a vector using the square brackets:

# Make a numeric vector
> data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
> data1
[1] 3 5 7 5 3 2 6 8 5 6 9

> data1[1] # The first item
[1] 3

> data1[3] # The third item
[1] 7

> data1[1:4] # The first 4 items
[1] 3 5 7 5

> data1[-1] # All except the first
[1] 5 7 5 3 2 6 8 5 6 9

> data1[c(1, 3, 4, 8)] # The 1st, 3rd, 4th, 8th
[1] 3 7 5 8

> data1[data1 > 3] # All items > 3
[1] 5 7 5 6 8 5 6 9

> data1[data1 < 5 | data1 > 7] # Items < 5 OR > 7
[1] 3 3 2 8 9
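A few related uses of square brackets, sketched with the same data1 vector: which() returns the positions that meet a condition rather than the values, & combines conditions with AND, and brackets on the left of <- replace the selected items. A copy (the illustrative name data2) is modified so that data1 stays unchanged.

```r
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)

pos <- which(data1 > 7)                  # positions, not values: 8 11
mid <- data1[data1 >= 5 & data1 <= 7]    # items from 5 to 7 (AND)

data2 <- data1                           # illustrative copy
data2[data2 == 5] <- 50                  # replace every 5 with 50
data2
```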

Multi-dimensional objects and brackets

If your object has 2 dimensions, such as a data.frame or a matrix, you can use the same idea but now specify [rows, columns]. Extra dimensions can be supplied if needed (e.g. for a table).

> mymat <- matrix(1:30, ncol = 5, dimnames = list(letters[1:6], LETTERS[1:5]))
> mymat
  A  B  C  D  E
a 1  7 13 19 25
b 2  8 14 20 26
c 3  9 15 21 27
d 4 10 16 22 28
e 5 11 17 23 29
f 6 12 18 24 30

> mymat[2, 3] # Item from 2nd row and 3rd column
[1] 14

> mymat[, 2]  # All rows but only 2nd column
a  b  c  d  e  f
7  8  9 10 11 12

> mymat[3, ] # All columns but only 3rd row
A  B  C  D  E
3  9 15 21 27

> mymat[-1, ]  # All columns and all rows except the first
  A  B  C  D  E
b 2  8 14 20 26
c 3  9 15 21 27
d 4 10 16 22 28
e 5 11 17 23 29
f 6 12 18 24 30

> mymat[, "B"] # All rows and the column named "B"
a  b  c  d  e  f
7  8  9 10 11 12

You can also use logical conditions, just as for a vector.
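For example, a condition on one column of mymat can pick out rows, and a condition on the whole matrix returns a plain vector of matching values. Adding drop = FALSE (an argument of [ for matrices) keeps a single column as a 1-column matrix instead of simplifying it to a vector. A sketch (rows_A, colC, big and col2 are illustrative names):

```r
mymat <- matrix(1:30, ncol = 5, dimnames = list(letters[1:6], LETTERS[1:5]))

rows_A <- mymat[mymat[, "A"] > 3, ]      # rows where column A exceeds 3 (d, e, f)
colC   <- mymat[mymat[, "A"] > 3, "C"]   # same rows, column C only
big    <- mymat[mymat > 25]              # condition on every cell gives a vector

col2 <- mymat[, 2, drop = FALSE]         # stays a 6 x 1 matrix
dim(col2)
```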

The dollar symbol $

With some objects you can use the $, particularly data.frame and list objects:

> mydf <- data.frame(num = 1:12, mnths = month.abb[1:12], fac = gl(3, 4, labels = c("high", "mid", "low")), let = LETTERS[12:1])

> mydf
   num mnths  fac let
1    1   Jan high   L
2    2   Feb high   K
3    3   Mar high   J
4    4   Apr high   I
5    5   May  mid   H
6    6   Jun  mid   G
7    7   Jul  mid   F
8    8   Aug  mid   E
9    9   Sep  low   D
10  10   Oct  low   C
11  11   Nov  low   B
12  12   Dec  low   A

> mylist <- list(num = 1:6, let = letters[9:1], mnth = month.abb[1:7])
> mylist
$num
[1] 1 2 3 4 5 6

$let
[1] "i" "h" "g" "f" "e" "d" "c" "b" "a"

$mnth
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"

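The $ is used with the element name like so (this output is from R versions before 4.0.0, where data.frame() turned text columns into factors by default; from R 4.0.0 onwards mydf$let is a character vector unless you set stringsAsFactors = TRUE):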

> mydf$let
[1] L K J I H G F E D C B A
Levels: A B C D E F G H I J K L

> mydf$num
[1]  1  2  3  4  5  6  7  8  9 10 11 12

> mylist$mnth
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"

You can also use square brackets. For the data.frame the [row, column] syntax works as it did for the matrix. Giving a single value selects a column.

> mylist[1]
$num
[1] 1 2 3 4 5 6

> mydf[2]
   mnths
1    Jan
2    Feb
3    Mar
4    Apr
5    May
6    Jun
7    Jul
8    Aug
9    Sep
10   Oct
11   Nov
12   Dec

> mydf[2,]
  num mnths  fac let
2   2   Feb high   K

Note that the $ does not work with a matrix object.
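A short sketch of what happens if you try, and two ways round it: index the column by name with [, "B"], or convert the matrix with as.data.frame() first. tryCatch() is used only so the error message can be captured and shown (msg is an illustrative name).

```r
mymat <- matrix(1:30, ncol = 5, dimnames = list(letters[1:6], LETTERS[1:5]))

msg <- tryCatch(mymat$B, error = function(e) conditionMessage(e))
msg                          # "$ operator is invalid for atomic vectors"

mymat[, "B"]                 # column by name works for a matrix
as.data.frame(mymat)$B       # $ works once it is a data.frame
```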

Double brackets [[]]

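You can use double brackets to select elements in more or less the same way as single brackets. The difference is that single brackets return a smaller object of the same kind (a list or data.frame containing the element, so its name is displayed), whereas double brackets extract the element itself, so no name is displayed: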

> mydf[[2]]
[1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Levels: Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep

> mydf[2]
   mnths
1    Jan
2    Feb
3    Mar
4    Apr
5    May
6    Jun
7    Jul
8    Aug
9    Sep
10   Oct
11   Nov
12   Dec

> mylist[3]
$mnth
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"

> mylist[[3]]
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"

You don’t have to use an index value; the element name gives a similar result:

> mylist[["let"]]
[1] "i" "h" "g" "f" "e" "d" "c" "b" "a"

> mylist["let"]
$let
[1] "i" "h" "g" "f" "e" "d" "c" "b" "a"
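The difference shows up clearly with class(): single brackets return a smaller list, double brackets return the element itself. One more difference worth knowing: $ allows partial matching of names on a list, whereas [[ ]] matches exactly by default. A sketch using mylist from above:

```r
mylist <- list(num = 1:6, let = letters[9:1], mnth = month.abb[1:7])

class(mylist[3])      # "list": still a list, holding one element
class(mylist[[3]])    # "character": the element itself
mylist[["mnth"]][2]   # index into the element: "Feb"

mylist$mn             # partial match finds mnth
mylist[["mn"]]        # exact match only: NULL
```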

Summary

So, the $ can be used with list and data.frame objects, and [] with more or less any object. Use [[]] to extract a single element itself rather than a smaller object containing it (so the element name is not shown).

Row & column names using dimnames() in R

The dimnames() command can set or query the row and column names of a matrix. Unlike rownames() or colnames(), the dimnames() command operates on both rows and columns at once. If you use it to set the names, you need to specify the names for the rows and columns (in that order) in a list.

> m1 <- matrix(1:12, nrow = 3)
> dimnames(m1) <- list(month.abb[1:3], month.abb[4:7])
> m1
    Apr May Jun Jul
Jan   1   4   7  10
Feb   2   5   8  11
Mar   3   6   9  12

The dimnames() command retrieves the names like so:

> dimnames(m1)
[[1]]
[1] "Jan" "Feb" "Mar"

[[2]]
[1] "Apr" "May" "Jun" "Jul"

Notice the double square brackets acting as labels: the list of names has no element names of its own, so R shows their positions, [[1]] and [[2]]. You can get one element by using square bracket notation:

> dimnames(m1)[1]
[[1]]
[1] "Jan" "Feb" "Mar"

> dimnames(m1)[[1]]
[1] "Jan" "Feb" "Mar"

When you use single brackets, you get a list containing the element (hence the [[1]] label). If you use double brackets, you get the character vector itself. The dimnames() command will work on matrix, array or data.frame objects.

You can use the command to set a single element, but you need to use double brackets to get it to work:

> dimnames(m1)[[1]] <- letters[1:3]
> m1
  Apr May Jun Jul
a   1   4   7  10
b   2   5   8  11
c   3   6   9  12

Note that you cannot set a single element to NULL although you can set all elements to NULL:

> dimnames(m1)[[1]] <- NULL
Error in dimnames(m1)[[1]] <- NULL :
length of 'dimnames' [1] not equal to array extent

> dimnames(m1) <- list(NULL, NULL)
> m1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
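If you do want to remove just one set of names, here is a sketch of two approaches that work: assign list(NULL) through single brackets, which keeps the dimnames list at its full length of two, or assign NULL with rownames() or colnames(). The name cols_kept is illustrative.

```r
m1 <- matrix(1:12, nrow = 3, dimnames = list(month.abb[1:3], month.abb[4:7]))

dimnames(m1)[1] <- list(NULL)   # row names removed, column names kept
m1
cols_kept <- colnames(m1)       # illustrative: record the surviving names

colnames(m1) <- NULL            # now remove the column names as well
m1
```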

Naming rows and columns of a matrix in R

You make a matrix using the matrix(), rbind() or cbind() commands. The names of the rows and columns can be set after the matrix is produced in various ways:

rownames() – sets the row names
colnames() – sets the column names
dimnames() – sets both row and column names in one command

The rownames() and colnames() commands set the row and column names respectively:

> m1 <- matrix(1:12, nrow = 3)
> m1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

> rownames(m1)  <- letters[26:24]
> m1
  [,1] [,2] [,3] [,4]
z    1    4    7   10
y    2    5    8   11
x    3    6    9   12

> colnames(m1)  <- LETTERS[26:23]
> m1
  Z Y X  W
z 1 4 7 10
y 2 5 8 11
x 3 6 9 12

The commands can also query the names:

> rownames(m1)
[1] "z" "y" "x"

> colnames(m1)
[1] "Z" "Y" "X" "W"

Note that the basic names() command does not return the row or column names of a matrix:

> names(m1)
NULL

If you use the rbind() command then the rows will be named according to the data names that you use unless you specify otherwise:

> d1 <- 1:4 ; d2 = 5:8 ; d3 = 9:12

> rbind(d1, d2, d3)
   [,1] [,2] [,3] [,4]
d1    1    2    3    4
d2    5    6    7    8
d3    9   10   11   12

> rbind(Row1 = d1, Row2 = d2, Row3 = d3)
     [,1] [,2] [,3] [,4]
Row1    1    2    3    4
Row2    5    6    7    8
Row3    9   10   11   12

Similarly with the cbind() command the columns take the names of the objects unless you specify them explicitly.
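For instance, naming the arguments inside cbind() sets the column names directly (Col1, Col2 and Col3 are just illustrative labels):

```r
d1 <- 1:4; d2 <- 5:8; d3 <- 9:12

cbind(d1, d2, d3)                              # columns named d1, d2, d3
cm <- cbind(Col1 = d1, Col2 = d2, Col3 = d3)   # illustrative column names
cm
colnames(cm)
```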

You can set the row and column names in one go using the dimnames() command. This requires a list() of two items (the row names and the column names):

> m1
  Z Y X  W
z 1 4 7 10
y 2 5 8 11
x 3 6 9 12

> dimnames(m1) <- list(letters[1:3], LETTERS[1:4])
> m1
  A B C  D
a 1 4 7 10
b 2 5 8 11
c 3 6 9 12

If you use the matrix() command you can incorporate the dimnames() command within it to set the names:

> m3 <- matrix(1:12, nrow = 3, dimnames = list(month.abb[1:3], month.abb[4:7]))
> m3
    Apr May Jun Jul
Jan   1   4   7  10
Feb   2   5   8  11
Mar   3   6   9  12

Of course you can also use the dimnames() command to view the current names – more on this another time.

Make a matrix in R

A matrix is a 2D object with rows and columns that contains data all of the same kind, e.g. all numbers or all text. You can make a matrix in various ways:

The matrix() command – this splits a vector into rows and columns.
The rbind() command – this takes several items and joins them together as rows. You usually join several vectors but you can also use this to join an existing matrix to another matrix or vectors.
The cbind() command – this is similar to rbind() but operates over columns.

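The matrix() command assumes that the vector of data will split “nicely” into rows and columns. If it doesn’t, R recycles values from the start of the vector to fill the matrix (usually with a warning), so you may prefer to add NA items at the end to make the length an exact multiple.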
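A sketch of the NA padding, assuming you want 3 columns from 10 values. The pad length is worked out with the modulo operator %%, so the same lines work for other lengths; n_col, n_pad and padded are illustrative names.

```r
dat <- 1:10
n_col <- 3                      # illustrative: number of columns wanted

# 10 values do not fill 3 columns: matrix() recycles values and warns
warned <- tryCatch({ matrix(dat, ncol = n_col); FALSE },
                   warning = function(w) TRUE)

# Pad with NA up to the next multiple of n_col (here, 2 extra values)
n_pad  <- (n_col - length(dat) %% n_col) %% n_col
padded <- c(dat, rep(NA, n_pad))
m <- matrix(padded, ncol = n_col)
m
```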

> dat <- 1:12
> dat
[1]  1  2  3  4  5  6  7  8  9 10 11 12

> matrix(dat, ncol = 3)
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

> matrix(dat, ncol = 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

The vector is split column-wise unless you specify that you want to fill row-wise using the byrow = TRUE parameter.

The rbind() and cbind() commands join items as rows or columns:

> d1 <- 1:4 ; d2 = 5:8 ; d3 = 9:12
> d1 ; d2 ; d3
[1] 1 2 3 4
[1] 5 6 7 8
[1]  9 10 11 12

> rbind(d1, d2, d3)
   [,1] [,2] [,3] [,4]
d1    1    2    3    4
d2    5    6    7    8
d3    9   10   11   12

> cbind(d1, d2, d3)
     d1 d2 d3
[1,]  1  5  9
[2,]  2  6 10
[3,]  3  7 11
[4,]  4  8 12

The rows or columns have names taken from the original vector names. You can also use the commands to add rows or columns to an existing matrix, as long as the dimensions are appropriate:

> mat <- cbind(d1, d2, d3)
> mat
     d1 d2 d3
[1,]  1  5  9
[2,]  2  6 10
[3,]  3  7 11
[4,]  4  8 12

> cbind(mat, d1)
     d1 d2 d3 d1
[1,]  1  5  9  1
[2,]  2  6 10  2
[3,]  3  7 11  3
[4,]  4  8 12  4

You can alter the names of the rows and columns afterwards.

Multi-dimensional objects in R

A vector is a one-dimensional object in R. Usually the vector looks like a row, but it really acts like a column. When you make more complicated objects you add more dimensions; commonly used multi-dimensional objects are:

matrix
data.frame
table
array
list

A matrix is a 2-dimensional object with rows and columns. A data.frame is also 2-dimensional with rows and columns. A table can have more than 2 dimensions – a three-dimensional table would appear as several two-dimensional tables. An array is similar to a table (the difference is largely how you build it to start with). A list is the most “primitive” of the objects and is a loose collection of other objects, bundled together.

A matrix contains rows and columns but all the data in the matrix must be of the same type, that is, all numbers or all text. You can think of a matrix as a single vector that happens to be split up into rows and columns. In fact, one way to make a matrix is to do just that:

> matrix(1:12, ncol = 4)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

A data.frame also contains rows and columns but the columns can be of different types, so you can have one column that is numeric, one that is character and one that is a factor.

> a <- 1:4
> b <- c("one", "two", "three", "four")
> c <- gl(2, 2, labels = c("high", "low"))

> data.frame(b, a, c)
      b a    c
1   one 1 high
2   two 2 high
3 three 3  low
4  four 4  low

You can use the class() command to tell you what kind of object you are dealing with.
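A quick sketch of class() on the two kinds of object built above (recreated under the illustrative names m and df so the block runs on its own). inherits() is used for the checks because, from R 4.0.0, class() of a matrix returns two values, "matrix" and "array".

```r
m  <- matrix(1:12, ncol = 4)
df <- data.frame(b = c("one", "two", "three", "four"), a = 1:4)

class(m)               # "matrix" ("matrix" "array" in R 4.0.0 and later)
class(df)              # "data.frame"
inherits(m, "matrix")  # TRUE in any version
sapply(df, class)      # each data.frame column has its own class
```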

Vector Objects in R

A vector is a one-dimensional object in R. There is no class specific to vectors, but a vector object can be one of several classes, most usually:

numeric
character
logical
complex

Your numeric vector can be given as double or integer. Note that a factor is not considered to be a vector.

You can make a blank vector using the vector() command:

vector(mode = "logical", length = 0)

For example:

> vl <- vector(mode = "logical", length = 3)
> vl
[1] FALSE FALSE FALSE

> vc <- vector(mode = "character", length = 5)
> vc
[1] "" "" "" "" ""

> vn = vector(mode = "numeric", length = 4)
> vn
[1] 0 0 0 0

> vi = vector(mode = "complex", length = 3)
> vi
[1] 0+0i 0+0i 0+0i
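Base R also has a shortcut function for each mode (logical(), character(), numeric(), complex() and integer()), each taking just the length. A quick sketch confirming that they match vector() (checks is an illustrative name):

```r
# Each shortcut takes only a length and returns the same blank vector
checks <- c(
  identical(logical(3),   vector(mode = "logical",   length = 3)),
  identical(character(5), vector(mode = "character", length = 5)),
  identical(numeric(4),   vector(mode = "numeric",   length = 4)),
  identical(complex(3),   vector(mode = "complex",   length = 3)),
  identical(integer(3),   vector(mode = "integer",   length = 3))
)
print(checks)
```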

The blank vector you create can be “filled in” later:

> vi[1] <- 3+2i
> vi
[1] 3+2i 0+0i 0+0i

Notice though that the value you enter can either be coerced to the appropriate form or alter the class of the vector:

> vc[1] <- 23
> vc
[1] "23" "" "" "" ""

The value entered was a number (23) but the vector it is added to remains character in class. In the next example you create an integer vector:

> vi = vector(mode = "integer", length = 3)
> vi
[1] 0 0 0

> class(vi)
[1] "integer"

Adding a value can alter the class:

> vi[1] <- 2
> vi
[1] 2 0 0

> class(vi)
[1] "numeric"

In this case the value (2) is taken as a regular double precision numeric value. To force the value to remain an integer you need to use the as.integer() command:

> vi <- vector(mode = "integer", length = 3)
> vi[1] <- as.integer(2)
> vi
[1] 2 0 0

> class(vi)
[1] "integer"

Now the entered value keeps its integer class.
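An alternative to as.integer(2) is R's L suffix, which writes an integer literal directly. A sketch:

```r
vi <- vector(mode = "integer", length = 3)
vi[1] <- 2L       # the L suffix makes 2 an integer literal
class(vi)         # stays "integer"

typeof(2)         # "double"
typeof(2L)        # "integer"
```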

Types of R object – 3. complex numbers

All R objects have a class attribute, which can be important as R “decides” how to deal with objects based upon their class.

You can think of the simple classes as being in different categories:

Basic: numeric, character or factor
Logical: TRUE or FALSE
Complex: A number with real and imaginary parts

Object type: complex

You can make a complex number simply by appending an “imaginary” part to a real number:

> newvec <- c(1+1i, 2+3i)
> newvec
[1] 1+1i 2+3i

So R recognises 2+3i, for example, as a complex number with real part 2 and imaginary part 3.

The class() command shows that the result is complex:

> class(newvec)
[1] "complex"

There are several commands associated with complex numbers:

# The real part of the number
> Re(newvec)
[1] 1 2

# The imaginary part
> Im(newvec)
[1] 1 3

# The modulus
> Mod(newvec)
[1] 1.414214 3.605551

# The argument
> Arg(newvec)
[1] 0.7853982 0.9827937

# The complex conjugate
> Conj(newvec)
[1] 1-1i 2-3i

The commands help in dealing with complex numbers. In addition, the elementary trigonometric, logarithmic, exponential, square root and hyperbolic functions are implemented for complex values.
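A sketch of two related points: complex() builds the same value from separate real and imaginary parts, and functions such as sqrt() only return a complex result when their input is already complex. The product of a number and its conjugate is also shown, since it equals the squared modulus (z is an illustrative name).

```r
z <- complex(real = 2, imaginary = 3)   # the same value as 2+3i
z == 2+3i

suppressWarnings(sqrt(-1))   # NaN: -1 is a plain numeric (warning suppressed here)
sqrt(as.complex(-1))         # 0+1i once the value is complex

z * Conj(z)                  # 13+0i: a number times its conjugate
Mod(z)^2                     # 13 (up to rounding)
```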

Types of R object – 2. logical

All R objects have a class attribute, which can be important as R “decides” how to deal with objects based upon their class.

Object type: logical

Simple 1-dimensional objects are called vectors but vector is not a class in itself. A vector can be numeric or character in nature (these are class attributes). You can also have a logical class, which is either TRUE or FALSE.

> lv <- c(FALSE, TRUE, FALSE, FALSE, FALSE)
> lv
[1] FALSE TRUE FALSE FALSE FALSE

> class(lv)
[1] "logical"

You can create a logical vector by using TRUE or FALSE in a command like c() for example:

> lv2 <- c(TRUE, FALSE, FALSE, TRUE)
> lv2
[1] TRUE FALSE FALSE TRUE

R recognises TRUE and FALSE as logical. You must use upper case but you can abbreviate using T or F:

> lv3 <- c(T, T, F, F)
> lv3
[1] TRUE TRUE FALSE FALSE

If you are typing commands for yourself then using abbreviations is fine but when your commands are destined to be seen by others, then it is a good idea to use the “full” version.
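Logical vectors also arise from comparisons, and because TRUE counts as 1 and FALSE as 0, sum() and mean() give a count and a proportion. The sketch also shows one reason to spell out TRUE: T is an ordinary variable that can be reassigned, whereas TRUE is a reserved word. The names big and t_is_true are illustrative.

```r
data1 <- c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
big <- data1 > 5     # a comparison returns a logical vector
class(big)           # "logical"
sum(big)             # TRUE counts as 1: 5 values exceed 5
mean(big)            # proportion of TRUE values

T <- 0                   # T can be overwritten...
t_is_true <- isTRUE(T)   # ...so it is FALSE while the reassignment is in place
rm(T)                    # T refers to base R's TRUE again
```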
