The 3 Rs: Reading

You can probably think of “reading matter” as being one of three sorts:

  • data
  • R objects
  • scripts

Data is a general term and by this I mean numbers and characters that you might reasonably suppose could be handled by a spreadsheet. Your data are what you use in your analyses.

R objects may be data or other things, such as custom R commands. They are usually stored (on disk) in a format that can only be read by R but sometimes they may be in text form.

Scripts are collections of R commands that are designed to be run “automatically”. They are generally saved (on disk) in text format.

Reading – using the keyboard

The simplest way to “read” data into R is to type it yourself using the keyboard. The c() command is the simplest method. Think of c for “combine” or “concatenate”. To make some data you type them in the command and separate the elements with commas:

> dat1 <- c(2, 3,4,5, 7, 9)
> dat1
[1] 2 3 4 5 7 9

Note that R ignores any spaces.

If you want text values you will have to enclose each element in quotes; you can use single or double quotes as long as they “match”.

> dat2 <- c("First", 'Second', "Third")
> dat2
[1] "First"  "Second" "Third"

When you “display” your text (character) data R includes the quotes (R always shows double quotes). You can use the c() command to join many items, not just numbers or character data. However, here you’ll see it used simply for “reading” regular data. The result of the c() command is generally a vector, a 1-dimensional data object. You can add to existing data but if you “mix” data types you may not get what you expect:

> dat3 <- c(1, 2, 3, "First", "Second", "Third")
> dat3
[1] "1"      "2"      "3"      "First"  "Second" "Third"

R has converted the numbers to text (character) values.

Use scan()

Typing the commas and/or quotes can be a pain, especially when you have more than a few items to enter. This is where the scan() command comes in useful. The scan() command is quite flexible and can read the keyboard, clipboard and files on disk. The simplest way to use scan() is for entering numbers. You invoke the command and then type numerical values, separated by spaces. You can press the ENTER key at any time but the data entry will not stop until you press ENTER on a blank line (i.e. you press ENTER twice).

> dat4 <- scan()
1: 3 6 7 12 13 9 8 4
9: 7 7 8 12 5
14: 4
15:
Read 14 items
> dat4
[1]  3  6  7 12 13  9  8  4  7  7  8 12  5  4

In the preceding example eight values were typed and then ENTER was pressed. You can see that the R prompt changes from the usual >; the number shown helps you to keep track of how many items you’ve typed. After the first eight items we typed five more (the line beginning with 9:) and then pressed ENTER again. R displayed 14: to show us that the next item would be the 14th. One value was typed and then ENTER was pressed twice to complete the process.

If you want to type character data you must add the what = "character" parameter to the command; R then “knows” what to expect and you do not need to type quotes:

> dat5 <- scan(what = "character")
1: Mon Tue Wed Thu Fri
6: Sat Sun
8:
Read 7 items
> dat5
[1] "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"

You can specify the separator character using the sep = parameter, although the default space seems adequate! If you are so inclined, you can also specify an alternative character for the decimal point via the dec = parameter.

> dat6 <- scan(sep = "", dec = ",")
1: 34 4,6
3: 1,4 7,34
5:
Read 4 items 

> dat6
[1] 34.00  4.60  1.40  7.34

Notice that the default is actually two quotes with no space between! The sep and dec parameters are more useful for reading data from clipboard or disk files, as you will see next.

Reading – using the clipboard

The scan() command can read the clipboard. This can be useful to shift larger quantities of data into R. You’ll probably have data in a spreadsheet but any program that can display text will allow you to copy to the clipboard.

Data in a spreadsheet

If you open the data in a spreadsheet then you can copy a column of data and use scan() as if you were going to type it from the keyboard. You do not need to specify the separator character (the default "" will suffice) when using a spreadsheet but you can specify the decimal point character if necessary.

Opening a text file in Excel allows you to see the data separator.

If you need to copy items by row, then you cannot do it using a spreadsheet unless you have a Mac. To get a column of data into R you simply copy the data to the clipboard:

Then switch to R and use the scan() command:

> dat7 <- scan()
1: 34
2: 32
3: 30
4: 36
5: 36
6: 36
7: 41
8: 35
9: 30
10: 34
11:
Read 10 items

To finish you press ENTER on a blank line; this means you could copy more data and append it first if you wish.

Multiple paste

Once you’ve pasted some data into R using scan() you finish the operation by pressing ENTER on a blank line. Usually this means that you have to press ENTER twice. If you want to add more data, you can simply press ENTER once and copy/paste more stuff.

Rows of data

If the data are in a spreadsheet or are separated by TAB characters you cannot read rows of data (unless you have a Mac). If you open the data in a text editor (Notepad or Wordpad for example) then you’ll be able to select and copy multiple rows of data as long as the items are not TAB separated. In the following example Notepad is used to open a comma separated file. Several rows are copied to the clipboard:

The scan() command can now be used, using the sep = "," parameter as the data are comma separated:

> dat8 <- scan(sep = ",")
1: 34,35,33,26,35,35,35,32,38,29
11: 32,34,33,29,37,37,31,37,31,28
21: 30,38,34,33,33,32,29,35,29,30
31:
Read 30 items

In total 30 data elements were read (note that the data are read row by row). ENTER was pressed after the first paste operation but you could add more data, completing data entry by pressing ENTER on a blank line.

End of Line characters

Text files can be split into lines using several methods. In general Linux and Mac use LF whereas Windows uses CR/LF (older Mac used CR). This can mean that if you use Windows and Notepad to open a file the data are not displayed “neatly” and everything appears on one line.

Wordpad is built into Windows and will handle alternative line endings quite adequately.


The Notepad++ program will also handle alternative end of line characters and allows you to alter the EOL setting for a file. It also supports syntax highlighting.

On a Mac the default TextEdit app will deal with alternative line endings but BBEdit is a useful editor, which also supports syntax highlighting.

If you use Linux, then it is likely that your default text editor will deal with any EOL setting and also allow you to alter the encoding.

Converting separators

If you have data in TAB separated format and want to convert it to simple space separated or comma separated format, then the easiest way is to open the data in your spreadsheet. Then use File > Save As… to save the file in a new format.

The preceding example shows Excel 2010 but other versions of Excel have similar options. OpenOffice and LibreOffice also allow saving of files in various formats.

The TAB separated format is useful for displaying items for people to read but not so good for copy/paste. Comma separated files are generally most useful and can be produced by many programs. Space separated files are somewhere in-between.

Of course there is no need to convert TAB separated files; they are only a problem if you are going to use copy/paste. R can handle TAB separators quite easily if the file is to be read directly from disk.
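
If you prefer to do the conversion within R itself, one approach is to read the TAB separated file and then write it out again as CSV. Here is a minimal sketch; the filenames are placeholders and read.delim() is one of the convenience reading commands covered below:

> mydata <- read.delim("data_tab.txt")   # read the TAB separated file (hypothetical name)
> write.csv(mydata, file = "data_comma.csv", row.names = FALSE)   # save a CSV copy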

Reading – from disk files

If you have a data file on disk, you can read it into R without opening it first. There are two main options:

  • the scan() command
  • the read.table() command

Generally speaking you’d use scan() to read a data file and make a vector object in R. That is, you use it to get/make a single sample (a 1-D object). You use the read.table() command to access a file with multiple columns, each one representing a separate sample or variable. The result is a data.frame object in R (a 2-D object with rows and columns).

The read.table() command has many options and there are several convenience functions, such as read.csv(), allowing you to read some popular formats with appropriate defaults.

Use scan() to read a disk file

To get scan() to read a file from disk you simply specify the filename (in quotes) in the command. You can set the separator character and decimal point character as appropriate using sep = and dec = parameters. If you are using Windows or Mac you can use file.choose() instead of the filename:

> dat8 <- scan(file.choose())
Read 100 items

The file.choose() part opens a browser-like window and permits you to select the file you require.

The file selected in the preceding example was called L1.txt and you can view this in your browser. It is simply separated with spaces. The result is a vector of 100 data items even though the original file seems composed of ten columns.

If you have a file separated with commas, like L2.txt, then you simply specify sep = "," like so:

> dat9 <- scan(file.choose(), sep = ",")
Read 100 items

The result is the same, a vector of 100 items. When the separator character is a TAB (e.g. L3.txt) you can use the sep = "\t" parameter.

> dat10 <- scan(file.choose(), sep = "\t")
Read 100 items

Sometimes your data may include comments; these are helpful to remind you what the data were, but you need to be able to deal with them.

Comments in data files

It is helpful to include comments in plain data files. This helps you (and others) to determine what the data represent without having to look elsewhere. The scan() command can “read and ignore” rows that are “flagged” as comments.

You can set rows that begin with a special character to be ignored using the comment.char = parameter. Of course you could use any character as a comment marker but in R the # is used so that would seem sensible.

> dat11 <- scan(file.choose(), comment.char = "#")
Read 100 items

Have a look at the L1c.txt file. This has three comment lines at the start. The file itself is space separated.

Working directory

If you do not use file.choose(), or you are using Linux, then you’ll have to specify the filename exactly (in quotes). R uses a working directory, where it stores files and where it looks for items to read. You can see the current working directory using getwd():

> getwd() # On a Mac
[1] "/Users/markgardener"

> getwd() # On Windows
[1] "C:/Users/Mark/Documents"

You can set the working directory to a new folder by specifying the new filepath (in quotes) in the setwd() command:

> getwd() # Get the current wd
[1] "/Users/markgardener"

> setwd("Desktop") # Set to new value

> getwd() # Check that wd is set as expected
[1] "/Users/markgardener/Desktop"

> setwd("/Users/markgardener") # set back to original

> getwd()
[1] "/Users/markgardener"

You can “reset” your working directory using the tilde:

> setwd("~")
> getwd()
[1] "/Users/markgardener"


If you use Windows you can use choose.dir() in the setwd() command to interactively set a working directory like so:

> setwd(choose.dir())

Reading from URL

You can use scan() to read a file directly from the Internet if you have the URL; use that instead of the filename. The URL needs to be complete and in quotes.

Note that you cannot use spaces in the URL. If you copy/paste the URL from the website any spaces will be shown as %20:

> dat12 <- scan(file = "http://www.mysite/data%20files/test.txt")
Read 100 items

You can use all the other parameters that you’ve seen before. Have a go yourself using these files:

  • L1.txt – a space separated file
  • L1c.txt – a space separated file with comments
  • L2.txt – a comma separated file
  • L3.txt – a TAB separated file

If you right-click a hyperlink you can copy the URL to the clipboard.

Reading multi-column files

The scan() command reads files and stores the result as a vector. However, many data files will contain multiple columns that represent different variables. In this case you don’t want a single vector item but a data.frame, which reflects the rows and columns of the original data.

The read.table() command is a general tool for reading files from disk (or URL) and making data.frame results. You can set decimal point characters and separators in a similar way to scan(). However, you can also set names for columns and/or rows if required.

There are several convenience commands to help in reading “popular” formats:

  • read.csv() – this reads comma separated files
  • read.csv2() – uses the semi-colon as the separator and the comma as the decimal point
  • read.delim() – uses TAB as the separator
  • read.delim2() – uses TAB as the separator and the comma as the decimal point

The read.csv() command is probably the most generally useful command as CSV format is the most widely used and can be saved by most programs.

The files L1.txt, L2.txt, L3.txt and L1c.txt are all ten columns (and ten rows) in size. Previously you saw how the scan() command can read these files and create a single sample, as a vector object, from the data. By contrast, the read.table() command treats each column as a separate variable:

> dat12 <- read.table(file = "L1.txt")
> dat12
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  34 35 33 26 35 35 35 32 38  29
2  32 34 33 29 37 37 31 37 31  28
3  30 38 34 33 33 32 29 35 29  30
4  36 32 31 33 35 35 31 34 29  29
5  36 34 37 28 27 32 28 32 36  32
6  36 31 33 38 33 32 36 33 34  31
7  41 34 30 27 34 34 29 27 37  31
8  35 34 27 35 28 31 31 36 32  32
9  30 33 36 34 29 33 28 34 37  34
10 34 33 31 30 36 26 29 35 34  35

In the preceding example the L1.txt file was used; this is space separated so the default sep = "" need not be specified. Notice how R has “invented” names for the columns (the original file does not have column names). If the data are comma separated you can use the sep = "," parameter:

> dat13 <- read.table(file = "L2.txt", sep = ",")

Any comment lines are dealt with by the comment.char parameter:

> dat14 <- read.table(file = "L1c.txt", sep = "", comment.char = "#")

The read.table() command assumes that there are no column names in the original data. You can assign names as part of the read.table() command using the col.names parameter. You need to specify exactly the correct number of names.

> cn = paste("X", 1:10, sep = "") # Make some names
> cn
[1] "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10"

> dat15 <- read.table(file = "L1.txt", col.names = cn)

> head(dat15, n = 3)
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  34 35 33 26 35 35 35 32 38  29
2  32 34 33 29 37 37 31 37 31  28
3  30 38 34 33 33 32 29 35 29  30

You can always alter the names afterwards if you prefer.
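
For example, the names() command lets you alter the column names of the dat15 object made above:

> names(dat15)[1] <- "Sample1"   # rename just the first column
> names(dat15)[1:3]              # check the first few names
[1] "Sample1" "X2"      "X3"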

Column headings

In most cases your data will contain column headings. You can tell R to use these as the column names by adding header = TRUE to the read.table() command.

> dat16 <- read.table(file = "L6.txt", header = TRUE)
> dat16
   Smpl1 Smpl2 Smpl3 Smpl4 Smpl5 Smpl6 Smpl7 Smpl8 Smpl9 Smpl10
1     34    35    33    26    35    35    35    32    38     29
2     32    34    33    29    37    37    31    37    31     28
3     30    38    34    33    33    32    29    35    29     30
4     36    32    31    33    35    35    31    34    29     29
5     36    34    37    28    27    32    28    32    36     32
6     36    31    33    38    33    32    36    33    34     31
7     41    34    30    27    34    34    29    27    37     31
8     35    34    27    35    28    31    31    36    32     32
9     30    33    36    34    29    33    28    34    37     34
10    34    33    31    30    36    26    29    35    34     35

The file L6.txt is space separated. If your data are TAB or comma separated you simply alter the sep parameter as appropriate. Try the following files for yourself:

  • a comma separated file
  • L5.txt – a TAB separated file
  • L6.txt – a space separated file

Use “\t” to represent a TAB character:

> dat17 <- read.table(file = "L5.txt", header = TRUE, sep = "\t")

Note that R will check to see that the column names are “valid”, and will alter the names as necessary. You can over-ride this process using the check.names = FALSE parameter. This can be useful to allow numbers to be used as headings (such as years, e.g. 2001, 2002). However, it is a good idea to check the original datafile first.
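
As a sketch, suppose a hypothetical file years.csv has the years 2001, 2002 and so on as its column headings:

> dat <- read.csv("years.csv")                        # headings altered to X2001, X2002, ...
> dat <- read.csv("years.csv", check.names = FALSE)   # headings kept as 2001, 2002, ...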

Sometimes your data will have row headings; if so you will need an additional parameter, row.names.

Row headings

If your data contains row headings you can use the row.names parameter to instruct the read.table() command which column contains the headings (usually the first).

> dat18 <- read.table(file = "L7.txt", sep = "", header = TRUE, row.names = 1)
> dat18
      Smpl1 Smpl2 Smpl3 Smpl4 Smpl5 Smpl6 Smpl7 Smpl8 Smpl9 Smpl10
Obs1     34    35    33    26    35    35    35    32    38     29
Obs2     32    34    33    29    37    37    31    37    31     28
Obs3     30    38    34    33    33    32    29    35    29     30
Obs4     36    32    31    33    35    35    31    34    29     29
Obs5     36    34    37    28    27    32    28    32    36     32
Obs6     36    31    33    38    33    32    36    33    34     31
Obs7     41    34    30    27    34    34    29    27    37     31
Obs8     35    34    27    35    28    31    31    36    32     32
Obs9     30    33    36    34    29    33    28    34    37     34
Obs10    34    33    31    30    36    26    29    35    34     35

Notice that the default is sep = ""; this is used for space separated files but you do not need to put a space between the quotes! The row.names parameter needs a single numerical value to identify the column that contains the headings. If the separator character is different you simply change the sep parameter. Try the following files for yourself:

  • L7.txt – a space separated file
  • L8.txt – a comma separated file
  • a TAB separated file

All these files contain both row and column names.

Usually you give a column number in the row.names parameter. However, there are several ways to specify the row headings:

  • A number corresponding to the column containing the names
  • The name of the column header (in quotes) containing the names
  • A vector of names to use as row headings/names

If the data are simple space delimited and the header row contains one fewer entry than the data rows (that is, every column except the first has a heading), then the first column is automatically treated as row headings and you do not need to specify row.names. In practice it is best to use a definite value for row.names. You can “force” simple numbered rows by setting row.names = NULL.
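
The third option, a vector of names, might look like this; a sketch using the L6.txt file (which has column headings but no column of row names):

## Supply the row names yourself as a character vector
> rn <- paste("Obs", 1:10, sep = "")
> dat19 <- read.table(file = "L6.txt", header = TRUE, row.names = rn)
> rownames(dat19)[1:3]
[1] "Obs1" "Obs2" "Obs3"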

The read.xxx() convenience commands

There are four convenience functions, designed to make things a bit easier when dealing with some common formats.

  • read.csv() – this reads comma separated files
  • read.csv2() – uses the semi-colon as the separator and the comma as the decimal point
  • read.delim() – uses TAB as the separator
  • read.delim2() – uses TAB as the separator and the comma as the decimal point

All these commands have header = TRUE and comment.char = "" set as default. This makes things simpler; the following commands all achieve the same result:

> read.csv("L8.txt", row.names = 1)
> read.table("L8.txt", sep = ",", header = TRUE, row.names = 1)
> read.delim2("L8.txt", sep = ",", header = TRUE, row.names = 1, dec = ".")

You can use any of the read.xxxx() commands as long as you over-ride the defaults as necessary. See the help entry for read.table() by typing help(read.table) in R.

Column formats

When you read a multi-column file into R the various columns are “inspected” and converted to an appropriate type. Generally, any character columns are converted to factor variables whilst numbers are kept as numbers. This is usually an acceptable process, since in most analytical situations you want character values to be treated as variable factors.

However, there may be times when you want to alter the default behaviour and read columns in a particular format. Here is a data file with several columns:

     count treat day int
obs1    23    hi mon   1
obs2    17    hi tue   1
obs3    16    hi wed   1
obs4    14    lo mon   2
obs5    11    lo tue   2
obs6    19    lo wed   2

The first column can act as row headers/names. The “count” column is fairly obviously a numeric value. The “treat” column is an experimental factor whilst “day” is simply a label (text). The “int” column is intended to be a treatment label. It is common in some branches of science to use plain numerical values as treatment labels.

When you read this file into R the columns are “interpreted” like so:

> df1 <- read.csv("L11.txt", row.names = 1)
> df1
     count treat day int
obs1    23    hi mon   1
obs2    17    hi tue   1
obs3    16    hi wed   1
obs4    14    lo mon   2
obs5    11    lo tue   2
obs6    19    lo wed   2
> str(df1) # Examine object structure
‘data.frame’:     6 obs. of  4 variables:
$ count: int  23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day  : Factor w/ 3 levels "mon","tue","wed": 1 2 3 1 2 3
$ int  : int  1 1 1 2 2 2

When L11.txt is read in this way the “day” column is converted to a factor variable (although it was intended as a simple label) and the “int” column is a number rather than a factor (although it was intended as a treatment).

You can control the interpretation in several ways using parameters in the read.xxxx() commands:

  • stringsAsFactors – set this to FALSE to keep character columns as character variables
  • as.is – give the column numbers to keep “as is” and not convert
  • colClasses – give a vector of character strings corresponding to the classes required

The stringsAsFactors parameter is the simplest and is over-ridden if one of the others is used. It provides “blanket coverage” and operates on the entire data file being read:

> df2 <- read.csv("L11.txt", row.names = 1, stringsAsFactors = FALSE)
> str(df2)
‘data.frame’:     6 obs. of  4 variables:
$ count: int  23 17 16 14 11 19
$ treat: chr  "hi" "hi" "hi" "lo" ...
$ day  : chr  "mon" "tue" "wed" "mon" ...
$ int  : int  1 1 1 2 2 2

The as.is parameter operates only on the columns you specify – don’t forget that the column containing row names is column 1:

> df3 <- read.csv("L11.txt", row.names = 1, as.is = 4)

> str(df3)
‘data.frame’:     6 obs. of  4 variables:
$ count: int  23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day  : chr  "mon" "tue" "wed" "mon" ...
$ int  : int  1 1 1 2 2 2

All other parameters work as they did before so if you have a space or TAB separated file for example you can use the sep parameter as necessary. Try these files if you want some practice:

  • a TAB separated file with row names
  • a comma separated file with row names
  • a space separated file with row names

Using the colClasses parameter gives you the finest control over the columns. You must specify (as a character vector) the required class for each column. In the example here you would want “count” to be an integer (or simply numeric), “treat” to be a factor, “day” to be character and “int” to be a factor (remember we said this represented an experimental treatment). Don’t forget the column of row names – use character for these:

## Define the column classes
> cc <- c("character", "integer", "factor", "character", "factor")

## Read the file and set column classes
> df4 <- read.csv("L11.txt", row.names = 1, colClasses = cc)

> str(df4)
‘data.frame’:     6 obs. of  4 variables:
$ count: int  23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day  : chr  "mon" "tue" "wed" "mon" ...
$ int  : Factor w/ 2 levels "1","2": 1 1 1 2 2 2

You can of course alter the classes of the columns of the resulting data.frame afterwards but it makes sense to get them right at the start.
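
If you do want to alter a column after the event, the factor() command will do the job; for example, using the df2 object made earlier (where “int” was left as a number):

> df2$int <- factor(df2$int)   # convert the "int" column to a factor
> str(df2$int)
 Factor w/ 2 levels "1","2": 1 1 1 2 2 2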

Reading R objects from disk

R objects can be stored on disk in one of two formats:

  • Binary – usually .RData files that are encoded and only readable by R
  • Text – plain text files that can be read by text editors (and R)

Generally speaking an R object is something you see when you use the ls() command; the objects() command does the same thing and the two are synonymous. Any R object can be saved to disk.

Usually text files will be “raw data” files and scripts (collections of R commands that are executed when the script is run). However, it is possible to have text versions of other R objects.

Binary files are generally used to keep custom commands, data and results objects. Often you’ll have several objects bundled together.

There are several ways to get these objects into R.

Use the OS to open R objects

In Windows and Mac you can usually open .RData and .R files directly from the operating system. Usually .RData files are associated with R so if you double click a file it will open R and load the objects it contains. There are two scenarios:

  • R is already running
  • R is not running

You get a different result in each case.

If R is not running

If R is not running and you double click a .RData file, R will open and read the objects contained in the file. In addition, the working directory will be set to the folder containing the .RData file.

So, if you double click the file or drag its icon to the R program icon (shortcut on desktop, taskbar or dock) then R will open. You can check the working directory, which should point to the folder containing the file you opened:

> getwd()
[1] "Y:/Dropbox/Source data files/R Source files"

In Windows, if you drag the icon to the R program icon you can “pin” it to the taskbar; this allows you to open a data file later with a right click of the taskbar icon:

Files that are stored in text format, usually with the .R extension, are handled differently. Since they are plain text the default program may well be a text editor, depending on what is installed on your computer. You can right click a file or (in Windows) click the Open button on the menu bar to see what programs are available.

In Windows you will probably have to open the file with a text editor of some kind. If you have the RStudio IDE you can choose to open the file in that; RStudio will run and your file will appear in the script window. Note that if R has saved a text representation of objects the file will have Unix-style EOL characters (LF), and Notepad will not display the data properly. Wordpad or Notepad++ are fine though.

In Mac you may have the file associated with a text editor or with R. If you open the file using R then the text appears in the script editor window.

If you use Linux you cannot open .RData files from the OS. R text files can be opened in a text editor. If you use RStudio IDE then .RData or .R files can be opened (in RStudio) from the OS.

So, .RData objects can be read into R directly via the OS but text representations of R objects (.R files) cannot.

If R is already running

If R is already running when you try to open an .RData file by double clicking, what happens depends on the OS.

  • In Windows R opens in a new window
  • In Mac R appends the data to the current workspace

In Windows you get another R window, the data in the .RData file is loaded and the working directory set to the folder containing the file, just as if R had not been open.

In Mac the data are appended to the current workspace and the working directory remains unchanged.

If you are using Windows and want to append the data to your current workspace then you must drag the file icon to the running R program window and drop it in. If you drag to the taskbar and wait a second, the program window will open and you can drop the file into the console.

Something similar happens on a Mac: you can drag a file to the dock icon or into the running R console window.

Get binary files with load() command

You can load .RData files directly from within R using the load() command. You need to supply the name of the file (in quotes) although in Windows or Mac you can use file.choose() instead. Any filenames must include the full file path if the file is not in your working directory.

> ls()
character(0)

> load(file = "Obj1.RData")
> ls()
[1] "data_item_1" "data_item_2"

In Windows and Mac you can also use a menu:

  • Mac: Workspace > Load Workspace File…
  • Win: File > Load Workspace…

The data are appended to anything in the current workspace and any items that are named the same as incoming items are overwritten without warning. The working directory is unaffected.
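
One way to see exactly what a load() operation has brought in is to keep its result; load() returns (invisibly) the names of the objects it created:

> incoming <- load(file = "Obj1.RData")
> incoming
[1] "data_item_1" "data_item_2"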

If you want a practice file, try the Obj1.RData file. You’ll have to download it, although you could try loading the file using the URL.
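
Loading directly from a URL needs a connection, made with the url() command inside load(). A sketch with a placeholder address:

> load(url("http://www.example.com/Obj1.RData"))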

Get text files with dget() or source() command

R can make text representations of objects. Generally, there are two forms:

  • Un-named objects
  • Named objects

You’ll need to use the dget() or source() commands to read these text files into R.

Un-named R objects as text

Un-named objects are created using the dput() command and are text representations of single objects. The text can look slightly odd; see the D2.R example:

structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor")
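
For reference, a file like D2.R could have been made with the dput() command; a minimal sketch:

> fac <- factor(c("high", "high", "high", "low", "low", "low"))
> dput(fac, file = "D2.R")   # write a text representation to disk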

To read one of these objects into R you need the dget() command. Since the object is un-named in the text file you’ll need to assign a name in the workspace:

> ls() # Check what is in workspace
character(0)

> fac1 = dget(file = "D2.R") # Get object

> fac1
[1] high high high low  low  low
Levels: high low

Note that there is no GUI menu command that will read this kind of text object; you have to use dget(). Other text representations look more like regular R commands (i.e. the objects are named), and you need a different approach for these.

Named R objects as text

Some R text objects look more like regular R commands. These can be saved to disk using the dump() command, which can save more than one object at a time (unlike the dput() command). An example is the D1.R file:

dat1 <- c(2, 3, 4, 5, 7, 9)
dat2 <- c("First", "Second", "Third")

The D1.R file was created using the dump() command to save two objects (dat1 and dat2).
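
For reference, dump() takes a character vector of object names, so D1.R could have been produced like this (a sketch):

> dump(c("dat1", "dat2"), file = "D1.R")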

To get these into R you need the source() command:

> ls() # Check what is in workspace
character(0)

> source(file = "D1.R") # Get the objects

> ls()
[1] "dat1" "dat2"

> dat1 ; dat2
[1] 2 3 4 5 7 9
[1] "First"  "Second" "Third"

In the preceding example the file was named explicitly but in Windows or Mac you can use file.choose() instead (also in Linux if you are running the RStudio IDE). Notice that you do not need to name the objects; the data file already has the names assigned.

It is possible to use menu commands in Windows or Mac to get this kind of file into R:

  • Win: File > Source R code…
  • Mac: File > Source File…

This kind of text file is more like a series of R commands, which would generally be called an R script. Such script files are very useful, as you can use them for many purposes, such as making a custom calculation that can be run whenever you want.

R script files

R script files are simply a series of R commands saved in a plain text file. When you use the source() command, the commands are executed as if you had typed them into the R console directly.
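
A script need be nothing more than a few lines of R saved in a text file. Here is a made-up example, saved as, say, area.R:

# area.R -- calculate the area of a circle
radius <- 5
area <- pi * radius^2
print(area)

Running it with source() executes the commands and shows the printed result:

> source(file = "area.R")
[1] 78.53982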

Generally speaking you’ll want to “look over” the script before you run it. This usually means opening the file in a text editor. Some text editors “know” the R syntax and can highlight the text accordingly, making the script easier to follow. Both Windows and Mac R GUI have a script editor but only the Mac version has syntax highlighting.

To open a script file in the R GUI use the File menu:

  • Win: File > Open script…
  • Mac: File > Open Document…

You can open a script in any text editor of course. In Windows the Notepad++ program is a simple editor with syntax highlighting. On the Mac the BBEdit program is highly recommended. On Linux there are many options; the default text editor will often support syntax highlighting, and Geany is one IDE that not only has syntax highlighting but also integrates with the terminal.

The RStudio IDE is very capable and makes a good platform for using R for any OS. The script editor has syntax highlighting.

If you start to get serious about R coding, then the Sublime Text editor is worth a look. This has versions for all OS and syntax highlighting for R and many other languages.

Reading other types of file

R cannot read other file types “out of the box” but there are command packages that will read other files. Two commonly used ones are foreign and xlsx.

Package foreign

The foreign package can read files created by several programs including (but not limited to):

  • SPSS – files made by save or export can be read
  • dBase – .dbf files, which are also used by other programs in the XBase family
  • MiniTab – .mtp MiniTab portable worksheets
  • Stata – .dta files up to version 12
  • Systat – .sys or .syd files
  • SAS – XPORT format files (there is also a command to convert SAS files to XPORT format)

There are various limitations, and the package has not been updated for a while. In general, it is best to save data in CSV format, which most programs can do. If you get sent a file in an odd format, then you can ask the sender to convert it or try the package and see what you can do. The index file of the package help should give you a fair indication of what is available.
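
As a sketch, reading a Stata file with the foreign package might look like this (the filename is a placeholder):

> library(foreign)
> mydta <- read.dta("datafile.dta")   # the result is a data.frame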

Package xlsx

The xlsx package reads Excel files. The main command is read.xlsx(), which has various options for Excel formatted files. There is also a read.xlsx2() command; this is more or less the same but uses Java more heavily and is probably better for large spreadsheets.

The essentials of the command are:

read.xlsx(file, sheetIndex, sheetName = NULL)

You provide the filename (file) and either a number corresponding to a worksheet (sheetIndex) or the name of the worksheet (sheetName = "worksheet_name"). You can use file.choose() to select the file interactively (except on Linux).

There is also a parameter colClasses, which you can use to force various columns into the class you require. You give a character vector containing the classes you want – they are recycled as necessary. In general, you should try a regular read and see what classes result. If there are some “oddities” you can either alter them in the imported object or re-import and set the classes explicitly.
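
A minimal sketch of its use (the filename is a placeholder; here the first worksheet is read by its index):

> library(xlsx)
> mydata <- read.xlsx("results.xlsx", sheetIndex = 1)
> str(mydata)   # check how the columns have been interpreted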
