Dr. Mark Gardener |
Using R for statistical analyses - Introduction

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform, I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

On this page you can learn how to create data files, read them into R and generally get ready to perform analyses. You can also find out about getting further help and documentation.

I run courses in using R; if you are interested then see our Courses page or contact us for details. See my books about R and Data Science on my Publications page. You might also like my random essays on selected R topics in MonogRaphs.
What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T's Bell Labs, starting in the mid-1970s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors, but beware of "smart quotes", which R does not recognise as ordinary quotation marks). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
R maintains a list of previous commands. Use the up and down arrows to scroll through them. You can then use the left and right arrows to edit and modify the command. |
Introduction

Once you have installed R and run the program you will see an opening window and a message along these lines:

    R : Copyright 2006, The R Foundation for Statistical Computing
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    Natural language support but running in an English locale
    R is a collaborative project with many contributors.
    Type 'demo()' for some demos, 'help()' for on-line help, or
    [Previously saved workspace restored]
    >

The > is the "prompt": this is the point where you type in commands (or paste them in from somewhere else). The window you see is part of the GUI and some operations are possible from the menus (including quit). When you quit, you will generally be asked if you wish to save the workspace. R stores a list of commands and any data sets that are loaded, so it can be pretty useful to say "yes" and save the workspace.

The command history is available by using the up and down arrows. You can easily scroll back through previous commands and edit them if needed. You can copy items from previous commands, or in fact from any window on the screen, and paste them into the current command line. You can also use the left and right arrow keys to move through the current command.
Data files

You are going to need some data to perform your analyses on. You can type your data into R directly but it is usually much better to use a separate program to hold the information. A spreadsheet is an invaluable tool for this as you can manipulate the data quite easily. R can read plain text files in various formats (e.g. tab delimited, space delimited, comma delimited) and most spreadsheets can save data in these ways. The most useful is comma delimited (.CSV), which R can handle quite easily.

You can create a CSV file in a spreadsheet or a word processor. A spreadsheet is the most useful tool as you can easily manipulate the data later on.

The layout of the data file will depend upon the analysis you are going to run:
In the basic layout you have multiple variables arranged in columns. The rows are the replicates. This sort of arrangement is useful for analysis of variance and multiple regression. However, it can also be used for comparing just two factors (you don't need to use all the information), as in a t-test.

Alternatively, you may have headings on both columns and rows. You have the same information as above and a bit extra. The data may be used for the same kinds of analysis as before but could also be used for tests of association (e.g. Chi-squared) or for ordination.
Another layout has two columns (samples) where the number of replicates is different. R reads the file as a rectangular frame and blank cells are recorded as NA. This may have to be taken into account in some analyses but for now we can assume it is not a problem.

You may also have data merely as numbers without any labels at all. This is not really to be recommended, although R will assign row numbers and default column names to the data.
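As an illustration of the layout with unequal replicates, here is a minimal sketch using made-up numbers. The sample name Lower and all the values are hypothetical, and the text= argument is used only to keep the example self-contained (reading from a real file is covered in the next section). It shows the blank cells being stored as NA.

```r
# Hypothetical CSV content: two samples with unequal numbers of replicates.
# The last two rows have no value for Lower, so those cells are left blank.
csv_text <- "Upper,Lower
10,7
12,9
11,
14,"

# read.csv() normally takes a file name; text= reads from a string instead.
field <- read.csv(text = csv_text)

print(field)              # the blank cells are shown as NA
sum(is.na(field$Lower))   # number of missing values in Lower

# Many functions need to be told explicitly to skip missing values.
mean(field$Lower)                 # NA, because of the missing values
mean(field$Lower, na.rm = TRUE)   # mean of the values that are present
```

The na.rm = TRUE argument is one common way of taking the NA items into account; whether it is appropriate depends on the analysis.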
R stores everything as variables. Your variable names can contain letters and numbers, but the only punctuation marks allowed are the full stop and the underscore.
Inputting data

The next step is to get your data into R. If you have saved your data in a .CSV file then you can use the read.csv(filename) command to get the information. You need to tell R where to store the data, and to do this you assign it a name. Names should start with a letter (otherwise it might be a number, of course!). You can use a period (e.g. test.1) or an underscore but no other punctuation marks. R is case sensitive, so the variable test is different from Test or teSt.

What you need to do is copy the appropriate command to the clipboard, then paste it into R at the > prompt. You can then edit the command as you like and, when ready, press the enter key.

To get a file into R with basic columns of data and their labels, use read.csv() and assign the result to a name. To get a file into R with column headings and row headings, also use the row.names argument of read.csv() to say which column holds the row headings.

N.B. There are occasions when R won't like your data file. Check the file carefully. In some cases the addition of an extra linefeed at the end will sort out the issue. To do this, open the file in a word processor and make sure that non-printing characters are displayed. Add the extra carriage return and save the file.
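The two cases above might look like the following sketch. The file contents and the labels Lower, Wood, Species, Pipistrelle and Noctule are invented for illustration, and the files are written to a temporary folder only so that the example runs on its own; in practice you would give read.csv() the path to your own file (or use file.choose() to pick it from a dialog).

```r
# Create two small example files in a temporary folder (made-up data).
cols_file <- tempfile(fileext = ".csv")
writeLines(c("Upper,Lower",
             "10,7",
             "12,9",
             "11,8"), cols_file)

rows_file <- tempfile(fileext = ".csv")
writeLines(c("Species,Hedge,Wood",
             "Pipistrelle,5,2",
             "Noctule,1,4"), rows_file)

# Column headings only: the first line of the file supplies the names.
field <- read.csv(cols_file)

# Column and row headings: row.names = 1 says that the first column
# holds the row labels rather than data.
bats <- read.csv(rows_file, row.names = 1)

print(field)
print(bats)
```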
Seeing your data in R

Once you have persuaded R to read your data you will naturally want to check it is there! To view data stored in R you merely type the name of the variable that you stored it as.

If you had both row and column headers, then when you type in the variable name you see the data framed by those headings.

If you only had column headings, R adds a simple number to each row when the data are displayed. If you had neither row nor column headings then the columns would also be labelled automatically (V1, V2 and so on when the file is read with header = FALSE).
If you wish to view only a single variable (i.e. column) from your data set then you can. Simply add the variable name to the end of the data name along with a dollar sign, so bats$Hedge or field$Upper might be examples from the above two data sets.

It is not terribly convenient to have to append the $variable every time you want to do something on a data set. R provides a way to read these variables directly. Here is an example:

> attach(field)

Now you can look at the overall data set e.g.

> field

You can look at a single factor e.g.

> Upper

So, it is a good habit to read in your data set and then use the attach(data) function immediately. Use meaningful factor names and avoid single letters (e.g. x, y). If you already have a variable with the same name in your workspace, it takes precedence and hides (masks) the attached one; R prints a message when this happens. You can avoid confusion by only working on one set of data at once.
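The following sketch puts the $ notation and attach() side by side. The data values and the column Lower are made up, and with() is not mentioned on this page; it is shown only as an alternative that some people prefer because nothing stays attached afterwards.

```r
# Made-up data set for illustration.
field <- data.frame(Upper = c(10, 12, 11), Lower = c(7, 9, 8))

field$Upper          # one column, using the $ notation

attach(field)        # make the columns visible by name
mean(Upper)          # same as mean(field$Upper)
detach(field)        # undo the attach when finished

# with() gives the same convenience for a single expression
# without changing what is visible afterwards.
with(field, mean(Upper))
```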
What data are loaded?

To see what data, variables etc. are loaded in R you can type a simple command:

> ls()

This lists the variables in memory. In Windows you can list all the "objects" in memory from the Misc menu on the GUI toolbar.

On both Windows and the Mac you can save the current workspace to a file (you can also read in a previously saved workspace). This will save any data and variables currently in memory (on Windows use the File menu and on the Mac use Workspace).

You can also get a list of the variables for each dataset by typing:

> names(dataset)
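As a rough sketch, the same steps can be carried out from the command line. The data values and the file name example.RData are hypothetical; save.image() and load() are the standard functions for writing and restoring a workspace.

```r
# Made-up data set for illustration.
field <- data.frame(Upper = c(10, 12, 11), Lower = c(7, 9, 8))

ls()            # names of the objects in the workspace
names(field)    # names of the variables (columns) in field

# Save the whole workspace to a file, then restore it later.
ws_file <- file.path(tempdir(), "example.RData")   # hypothetical file name
save.image(ws_file)
rm(field)       # field is now gone from memory
load(ws_file)   # reading the saved workspace brings it back
```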
Removing data sets

To remove a variable you can type a simple command:

> rm(variable)

This will remove the variable (in this case called variable) from memory. Variables that have been made available with attach(data) don't show up in the ls() list. The opposite of attach(data) is detach(data), which removes them from view; do this when you remove the data with rm(data).

In Windows you can remove all the "objects" in memory from the Misc menu on the GUI toolbar.
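A minimal sketch of tidying up an attached data set, again with made-up values. It assumes the data set was attached earlier with attach(), and uses search() only to show where the attached copy lives.

```r
# Made-up data set for illustration.
field <- data.frame(Upper = c(10, 12, 11), Lower = c(7, 9, 8))
attach(field)

ls()           # lists "field" but not the attached Upper and Lower
search()       # the attached data set appears in the search path

detach(field)  # stop exposing the columns by name
rm(field)      # remove the data set itself

exists("field", inherits = FALSE)   # check that it has gone from the workspace
```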
Help and Documentation

My Publications

See my books about R at my Publications Page:

Statistics for Ecologists using R and Excel. Published December 2011
Beginning R: The Statistical Programming Language. Available wherever great books are sold in June 2012
The Essential R Reference. Published November 2012

Documents

There are plenty of sources of help and information regarding R. Most are to be found on the R-Project website. Look under the 'Documentation' section. In the manuals section the "Introduction to R" document is a good start (available as HTML or a PDF). Also very good is “Using R for Data Analysis and Graphics - Introduction, Examples and Commentary” by John Maindonald (PDF), which is available via the 'Contributed Documentation' section.

Courses

I run courses in using R as well as basic statistics and data management; check the Courses page for more details.

Help within R

The help system within R is comprehensive. There are several ways to access help:

Click on the 'Help' menu. There are a number of options available (depending upon your OS) but the main documentation is in the form of HTML.

If you want help on a specific command you can enter a search directly from the keyboard:

> help(keyword)

A shortcut is to type:

> ?keyword

This is fine if you know the command you want. If you are not sure of the command you can try the following:

> apropos("part.word")

You type in a part.word and R will list all commands that contain that string of letters. For example:

> apropos("rank")

This lists the commands containing "rank" (6 when this page was written); you can now type help() for any of those to get more detail.

If you run the HTML help you will see a heading entitled "Packages". This will list the packages that you have installed with R. The basic package is 'base' and comes with another called 'stats'. These two form the core of R. If you navigate to one of those you can browse through all the commands available.

R comes with a number of data sets. Many of the help topics come with examples, and usually these involve data sets that are already included. This allows you to copy and paste the commands into the console and see what happens.
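The help commands above can be tried on any command you already know. The sketch below uses mean purely as an example keyword; example() and data() are standard functions that were not named above, shown here as one way to run the examples from a help page and to list the data sets that come with R.

```r
help("mean")        # open the help page for mean()
?mean               # shortcut for the same thing

hits <- apropos("rank")   # commands whose names contain "rank"
print(hits)

example("mean")     # run the examples from the help page for mean()
data()              # list the data sets that come with R
```

The number of matches that apropos() returns depends on your version of R and on which packages are loaded.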