Manipulating an R formula

Manipulating an R formula

The formula() is an integral part of the R language. You use a formula to specify relationships between variables for tasks such as graphics and analysis (e.g. linear modelling). Manipulating an R formula is a useful skill.

When you’ve created some kind of analysis model in R you will have specified the variables in some kind of formula. R “recognises” formula objects, which have their own class "formula".  If, for example you used the lm() command to create a regression result you will be able to extract the formula from the result.

mod <- lm(Fertility ~ ., data = swiss)
formula(mod)
Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality

It can be useful to be able to extract the components of the model formula. For example you may want to examine how the R2 value alters as you add variables to the model.

Extract the predictor variables

To access the parts of a formula you need the terms() command:

terms(formula(mod))

The result contains various components; you want the term.labels.

attr(terms(formula(mod)), which = "term.labels")
[1] "Agriculture" "Examination" "Education" "Catholic"
[5] "Infant.Mortality"

You now have the variables, that is the predictor variables, from the formula. The next step is to get the response variable.

Extract the response variable

The response variable can be seen using the terms() command and the variables component, like so:

attr(terms(formula(mod)), which = "variables")
list(Fertility, Agriculture, Examination, Education, Catholic, Infant.Mortality)

The result looks slightly odd but essentially it is a list and the 2nd component is the response.

vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name
rr
[1] "Fertility"

Now you have the response variable, and the predictors from earlier, which you can use to “build” a formula.

Building a formula

In its most basic sense a formula is simply a character string that “conforms” to the formula syntax: y ~ x + z for example. You can build a formula with the paste() command by joining the response, a ~ character and the predictors you want (these themselves separated by + characters).

The following example uses the swiss dataset, which is built into base R.

mod <- lm(Fertility ~ ., data = swiss)

# Get the (predictor) variables
vars <- attr(terms(formula(mod)), which = "term.labels")

# Get the response
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name

# Now the predictors
pp <- paste(vars, collapse = " + ")       # All
pp <- paste(vars[1], collapse = " + ")    # 1st
pp <- paste(vars[1:3], collapse = " + ")  # 1,2,3

# Build a formula
fml <- paste(rr, " ~ ", pp)
fml
[1] "Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality"

Once you have your formula as a character object you can use it in place of a regular formula in commands.

Using a “built” formula

The character string representing a formula can be used exactly as you would a “regular” formula:

lm(fml, data = swiss)
Call:
lm(formula = fml, data = swiss)

Coefficients:
     (Intercept)       Agriculture       Examination         Education  
         66.9152           -0.1721           -0.2580           -0.8709  
        Catholic  Infant.Mortality  
          0.1041            1.0770

One use for building a formula is in model testing. For example you create your regression model containing five predictors but maybe only the first three are really necessary. You can re-build the formula term by term and extract the R2 value for example. This would show you how the explained variance alters as you add more variables.

See more articles in our Tips and Tricks pages.

Add more to a histogram in R

Add more to a histogram in R

A histogram is a standard way to present the distribution of a sample of numbers. A basic histogram is useful but it is easy to add more to a histogram in R to produce an even more informative plot. In this article you’ll see how to add a rug and a strip-chart to a basic historgram using simple R commands.

A basic histogram

It is easy to make a histogram using R with the hist() command. For example:

set.seed(123)
x <- norm(n = 50, mean = 10, sd = 1)
hist(x, col = "skyblue")

Produces a histogram resembling this:

Add a rug

A rug plot can be added to more or less any graphic. The rug() command can add the rug to any side of the plot:

  • side = 1 is the bottom axis
  • side = 2 is the left axis

You can alter the colour and width of the rug lines using regular graphical parameters:

rug(x, side = 1, col = "blue")

Adds the rug like so:

Add a strip chart

A strip chart can also be added to any chart via the stripchart() command. However, you also need to specify add = TRUE to the command. Giving a bit of jitter helps to separate out points that are coincident:

stripchart(x,
           method = "jitter",
           pch = 23,
           bg = "pink",
           add = TRUE)

The final plot looks like so:

There are many additional options for the stripchart() command; use help(stripchart) in R to find out more. Look out for other articles in our Tips & Tricks pages.

Progress indicator for R console

A progress indicator for the R console

Sometimes you know you are in for a bit of a wait when you’re running some R code. However, it would be nice to get some idea of how long you’ll be waiting. A progress indicator for the R console is what you need.

The txtProgressBar() command can help you here. The command allows you to set-up a progress indicator that displays in the R console and shows your progress towards the “end point”.

txtProgressBar(min, max, style)

You set the starting and ending points via the min and max parameters, usually these will match up with the “counter” in a loop. The style parameter is a simple value, style = 3 shows a series of = characters marching towards the % completion (displayed at the end of a line in your console).

The setTxtProgressBar() command actually updates the progress indicator and displays in the console. When you are done you close()the progress bar to tidy up.

pb <- txtProgressBar(min = 0, max = 100, style = 3)
  for(i in 1:100) {
    Sys.sleep(0.1)
    setTxtProgressBar(pb, i)
    }
close(pb)

The first line sets up the progress indicator. The for() command sets a loop to run from 0 to 100. The loop in this example is trivial with the Sys.sleep() command simply making R “wait” 0.1 seconds; in your own code you would replace this line with your own routines. The 4th line updates the progress indicator. After the loop has ended the close() command “resets” the progress bar.

Data Analytics Articles

These articles cover a range of topics in data science, including Using R, Using Excel, quantitative data analysis, predictive data analysis and a lot more besides.

You can browse the articles or use the search facility to look for specific topics and keywords.

Learning to work with R

Learning to work with R

A question I often see in online help forums is along these lines, “I want to become an expert in R, please help”. Such messages are usually followed by replies containing plenty of links to online materials or helpful books. I have posted such replies myself. Whilst this kind of reply is undoubtedly useful it doesn’t really address the issue of how someone sets about learning to work with R.

This made me think about how I set about learning to work with R. I wondered if I could set out the key elements of the learning experience – this essay is my attempt.

  • Key learning elements
    • Look around
    • Be active
    • Solve your own problems
    • Keep a diary
    • Be persistent

This article first appeared on the Amazon tech.book store in early April 2013.

Key learning elements

The best learning experiences are practical and interactive. You learn fastest when you do something for yourself. You learn slowest when someone tells you what to do. Somewhere in-between is when someone shows you what to do.

You can split the processes involved into several inter-related parts:

  • Look around – try different resources
  • Be active – get your fingers on the keyboard
  • Solve your own problems – using your own data helps your understanding
  • Keep a diary – write down what you learn
  • Be persistent – practice makes perfect

Look around

There are many resources available to help you learn R; I have written several myself (see my website). A good start is the R website at http://www.r-project.org/ where you can access many excellent documents. Try downloading some of the contributed documents, you will find that some are more helpful to you than others. Different authors have different approaches so look around and see what suits you.

There are many online forums and articles regarding R; there are Google groups, LinkedIn groups and many others besides. Have a search and see what is going on in these. Two general sites are R-bloggers and stackoverflow.

There are also many books, including my own modest contributions (see my Amazon profile). You can easily see what is available with a quick search in Amazon. Many of the books are available in electronic format, which might suit you better than a paper book. Many Amazon books have the “search inside” feature, which can help you see something of the contents.

Be active

You learn best when you are doing something active for yourself. This can be as simple as copying the examples in an R textbook. As you read you should pause and copy the examples by typing for yourself. This helps you to consolidate your learning and makes it “stick”. If there are example exercises, then try them out; similarly, if there are summary questions then have a go and test yourself.

Solve your own problems

This comes vaguely into the previous category in that it is an active process. The point is to use your own data or at least data from your own field of study/work. Solving problems from any field is useful but if you can apply what you’ve learnt to your own area of expertise you will learn quicker.

Keep a diary

A really important element is being able to move forwards. If you keep forgetting details you will have to continually refer back to the book or resource that you’ve been using. Keeping your own “learning diary” can help enormously. I suggest a simple text document rather than a full-blown word processor. This is because R does not recognize “smart” quotes, so if you need to copy/paste you may run into problems.

As you learn new things you should try to write them into your learning diary right away. My own learning diary is still in use! You will soon find the best way to organize your learning diary – perhaps you might think of it as your own help file. I tend to have simple headings that categorize the things I want to recall, such as graphics, entering data and manipulating data. I might add the name of a command that I have just learnt and then include a few brief notes. Having an example that you can copy and paste into R is helpful.

At the beginning you’ll undoubtedly refer to your learning diary often. As you learn more you will find that you refer to it less often. I find that I add new stuff, to which I refer, whilst the older material is read less often.

Your learning diary could be in the form of an R script. This is just a plain text document saved with the .R extension to the filename. If you use Windows you can use the Filemenu to open a new or existing script (in Mac use the Filemenu to open or create a new document).

A script window is useful to keep snippets of code, examples and your learning diary. You can transfer command lines into R from the script window using Ctrl+R (on Mac use Cmd+Enter) and transfer material into the script from the R console using copy/paste.

You can easily arrange it so that you can see the R console and the script window side by side, allowing you to easily keep track of what you are doing.

Be persistent

There are two sayings that spring to mind: “Rome wasn’t built in a day” and “Use it or lose it”. You cannot expect to become perfect overnight and you cannot expect to remember how to carry out any particular analysis a year down the line without having practiced in the intervening period.

The point is that you need to keep using R to be really proficient with it. I used to use R for various statistical analyses and not for much else. These days I use it for trivial calculations (it is after all, a glorified calculator) and even to choose my lottery numbers!

Author Bio

Mark is an ecologist, lecturer and writer working in the UK and with a passion for learning new things. He is currently self-employed and runs courses in ecology, data analysis and R for a variety of organizations.

Plot two (overlapping) histograms on one chart in R

I was preparing some teaching material recently and wanted to show how two samples distributions overlapped. This meant I needed to work out how to plot two histograms on one axis and also to make the colors transparent, so that they could both be discerned.

For my teaching example I wanted to make some normally distributed data and show how the overlap changes as the means and variance of the samples alters. However, being able to plot two sample distributions on a single chart is a generally useful thing so I wrote some code to take two samples and do just that.

Transparent colors

The first step is to make transparent colors; then any overlapping bars will remain visible. The key command is rgb() but you need to get R G and B values first.

Pick a color

Select a color that you want to make transparent. Then use the col2rgb() command to get the red, green and blue values you need for the rgb() command e.g.:

> col2rgb("lightblue")
      [,1]
red    173
green  216
blue   230

This gives you a matrix with three rows (red, blue, green). This means you can get values for several colors at once:

> col2rgb(c("lightblue", "lightgreen", "pink"))
      [,1] [,2] [,3]
red    173  144  255
green  216  238  192
blue   230  144  203

Make your chosen color transparent

The rgb() command defines a color: you define a new color using numerical values (0–255) for red, green and blue. In addition, you set an alpha value (also 0–255), which sets the transparency (0 being fully transparent and 255 being “solid”).

You also need to set the maximum color value, so that the command can relate your alpha value to a level of transparency. In practice setting max = 255 works well (since RGB colors are usually defined in the range 0–255).

The following example takes the standard blue and makes it transparent (~50%):

> mycol <- rgb(0, 0, 255, max = 255, alpha = 125, names = "blue50")

Note that the names parameter sets a name attribute for your color. You cannot use the name directly but it can be useful to see a name. You can call your colors anything of course, here they are simply named c1 and c2:

> c1 <- rgb(173,216,230,max = 255, alpha = 80, names = "lt.blue")
> c2 <- rgb(255,192,203, max = 255, alpha = 80, names = "lt.pink")

Make your histograms

The hist() command makes a histogram. Here is an example using some defaults.

> A # a numeric vector
[1] 17 26 28 27 29 28 25 26 34 32 23 29 24 21 26 31 31 22 26 19 36 23 21 16 30

> hist(A, col = "lightblue")

The defaults set the breakpoints and define the limits of the x-axis too. The histogram is plotted by default but you can alter this and save the histogram to a named object, which is going to be useful.

Histogram breakpoints

The breakpoints are set using the breaks parameter. There are 3 main options:

  • A character string giving one of the in-built algorithms: “Sturges”, “Scott” or “FD” (“Freedman-Diaconis”).
  • A number giving the desired number of breaks (you can also give a formula that produces a single number).
  • A numerical vector giving the explicit breakpoints (or a formula that results in a numeric vector).

For example:

> hist(A, col = "lightblue", breaks = 10, main = "Breaks = 10")

The previous example used a set number of breakpoints. You can set explicit values too (which also means you can have unequal bar widths!):

> range(A)
[1] 16 36

> pretty(15:36, n = 12)
[1] 14 16 18 20 22 24 26 28 30 32 34 36

> hist(A, breaks = pretty(15:36, n = 12), col = "lightblue", main = "Breaks = pretty(15:36, n = 12)")

Note that the second breakpoint is the right edge of the first histogram bar. In the previous example the pretty() command was used to set the breaks. This command splits up a range of values into a tidy set of values, and is generally used internally by graphics commands to set axes.

Histogram x-axis width

The limits of the x-axis are set by the breakpoints but you can over-ride them as you need. There are two ways you can control the width, either way will permit you to make the space for two histograms on the one axis:

  • Use the xlim parameter: you can set the axis width to cover the range of the combined samples.
  • Use the breaks parameter: you can set the breaks to cover the range of the combined sample.

Using xlim to set axis limits

The xlim parameter allows you to specify the limits of the x-axis by giving a vector of two values, the start and end.

> A
[1] 17 26 28 27 29 28 25 26 34 32 23 29 24 21 26 31 31 22 26 19 36 23 21 16 30

> B
[1] 27 36 38 37 39 38 35 36 44 42 33 39 34 31 36 41 41 32 36 29 46 33 31 26 40

> range(c(A,B))
[1] 16 46

Unfortunately, simply using the range of the combined samples is not always sufficient!

> hist(B, xlim = range(c(A,B)), col = "lightblue")

In the previous example you can see that the x-axis is not quite large enough to accommodate the entire range of the histogram. If you save the histogram to a named object you can see the data:

> hgB <- hist(B, plot = FALSE)

> hgB
$breaks
[1] 25 30 35 40 45 50

$counts
[1] 3 7 10 4 1

$density
[1] 0.024 0.056 0.080 0.032 0.008

$mids
[1] 27.5 32.5 37.5 42.5 47.5

$xname
[1] "B"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

So, if you want to use xlim to set the axis limits you should use the histogram $breaks data, rather than the original sample data.

Using pretty() to set axis limits

The pretty() command is useful to set your x-axis limits because it moves the breakpoints about and makes tidy intervals. For example:

> range(c(A,B))
[1] 16 46

> pretty(16:46)
[1] 15 20 25 30 35 40 45 50

If you used this method your x-axis would encompass the entire histogram range.

You can set the “desired” number of breaks in the pretty() command:

> pretty(16:46)
[1] 15 20 25 30 35 40 45 50

> pretty(16:46, n = 10)
[1] 15 20 25 30 35 40 45 50

> pretty(16:46, n = 12)
[1] 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

You set n = your desired optimal number and the command does its best to create approximately that number of intervals.

If you subtract a tiny value from the minimum value you’ll be certain to encompass the entire dataset:

> b <- min(c(A,B)) – 0.001
> e <- max(c(A,B))

> b;e
[1] 15.999
[1] 46

> pretty(b:e, n = 12)
[1] 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

Don’t try to set the xlim parameter with the pretty() values, use them as explicit breakpoints:

> hist(B, col = "lightblue", breaks = pretty(b:e, n = 12))

Using the pretty() command has an additional benefit: the interval will be the same for both histograms so that when plotted the bars will be the same width.

Histogram chart data

In order to plot two histograms on one plot you need a way to add the second sample to an existing plot. You cannot do this directly via the hist() command.

You need to save your histogram as a named object without plotting it. To do this you specify plot = FALSE as a parameter. If you save the histogram to a named object you can plot it later. Actually you can save the histogram data and plot it at the same time but you cannot add to an existing plot in this way.

> A
[1] 17 26 28 27 29 28 25 26 34 32 23 29 24 21 26 31 31 22 26 19 36 23 21 16 30

> hgA <- hist(A)
> hgA
$breaks
[1] 15 20 25 30 35 40

$counts
[1] 3 7 10 4 1

$density
[1] 0.024 0.056 0.080 0.032 0.008

$mids
[1] 17.5 22.5 27.5 32.5 37.5

$xname
[1] "A"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

You can see that the data are stored in $ components and that you can access the frequency or density data. The breakpoints are set at this time and you cannot alter them unless you re-run the command and specify different values.

In order to plot a histogram object you simply use plot(). You can specify add = TRUE to plot a second histogram in the same plot window.

Plot two histograms

If you have a histogram object, all the data you need is contained in that object. Using plot() will simply plot the histogram as if you’d typed hist() from the start. However, you can now use add = TRUE as a parameter, which allows a second histogram to be plotted on the same chart/axis.

To make sure that both histograms fit on the same x-axis you’ll need to specify the appropriate xlim() command to set the x-axis limits. Alternatively, (and probably better) is to set the breakpoints for both histograms to cover the combined range of the samples.

Inevitably some bars will overlap, which is where the transparent colors come in useful. The following steps illustrate the process using the data examples you’ve already seen.

> A;B # The data
[1] 17 26 28 27 29 28 25 26 34 32 23 29 24 21 26 31 31 22 26 19 36 23 21 16 30
[1] 27 36 38 37 39 38 35 36 44 42 33 39 34 31 36 41 41 32 36 29 46 33 31 26 40

> b <- min(c(A,B)) – 0.001 # Set the minimum for the breakpoints
> e <- max(c(A,B)) # Set the maximum for the breakpoints
> ax <- pretty(b:e, n = 12) # Make a neat vector for the breakpoints
> ax
[1] 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

> hgA <- hist(A, breaks = ax, plot = FALSE) # Save first histogram data
> hgB <- hist(B, breaks = ax, plot = FALSE) # Save 2nd histogram data

> plot(hgA, col = c1) # Plot 1st histogram using a transparent color
> plot(hgB, col = c2, add = TRUE) # Add 2nd histogram using different color

The result looks something like the following:

In this example the y-axis is sufficient to cover both samples but if your data contain quite different frequencies you can use the ylim parameter to set the appropriate size for the y-axis.

If you want to plot the densities instead of the frequencies you can use freq = FALSE as you would when using the hist() command. This means you could also add the density lines to your plots as well as the histograms. Note that you cannot set the breaks in this manner.

Histograms with different breakpoints

If your histograms have different breakpoints, you’ll need to juggle the xlim parameter to get the right size for the x-axis. The ylim parameter may also need tweaking if frequencies are different.

> hgA <- hist(A, breaks = 12)
> hgB <- hist(B)

> range(c(hgA$breaks, hgB$breaks)) # Get range for x-axis
[1] 16 50

> max(c(hgA$count, hgB$count)) # Get range for y-axis
[1] 10

> plot(hgA, col = c1, xlim = c(16, 50), ylim = c(0,10))
> plot(hgB, add = TRUE, col = c2)

In the previous example both xlim and ylim parameters needed to be altered. Note that although the xlim parameter set the minimum to 16, the axis ended up with a minimum of 15. This is because the plot() command has used pretty() internally to “neaten” the axis intervals.

You only need to alter the xlim and ylim parameters for the first plot because the plot dimensions are already set by the time you add the second histogram.

Combine multiple columns as row names in R

Sometimes your data contains more than one column that you want to use as labels. You may want “combination” labels for graphs or perhaps to use as row names.

This is where the paste() command comes in useful, as it allows you to combine items into a new item.

Use paste() to combine elements

The paste() command allows you to combine several items and reform them into a single item. There are many uses for this command such as making labels for plots or row names for a data object. Look at these example data for example (you can get/view the datafile here):

Colour    Coat Obs Ratio
 Green  Smooth 116     9
 Green Wrinkle  40     3
Yellow  Smooth  31     3
Yellow Wrinkle  13     1

The data shows some phenotype data. There are four varieties of pea plants with two main characteristics, the colour and the smoothness. The Obs column shows the observed frequency of plants in an experiment. The Ratio column shows the expected ratio of the four phenotypes under standard genetic theory.

It would be helpful to have the Colour and Coat combined into one and used as the row names. You can use the paste() command to achieve this.

The paste() command in its simplest form requires the names of the objects to combine and the separator (a text character).

Start by getting the data from the .txt file:

> peas <- read.table(file.choose(), header = TRUE, sep = "\t")
> peas
  Colour    Coat Obs Ratio
1  Green  Smooth 116     9
2  Green Wrinkle  40     3
3 Yellow  Smooth  31     3
4 Yellow Wrinkle  13     1

Then you need to combine the Colour and Coat columns, the separator will be a colon “:”.

> rownames(peas) <- with(peas, paste(Colour, Coat, sep = ":"))

Now you’ve got the rownames sorted you can remove the original Coat and Colour columns:

> peas <- peas[, -1:-2]
> peas
               Obs Ratio
Green:Smooth   116     9
Green:Wrinkle   40     3
Yellow:Smooth   31     3
Yellow:Wrinkle  13     1

Now you have a combination of coat and colour as a single element.

Read column names as numbers when importing a data file to R

When you read in a data file from CSV or some other text format, R will examine the column headings and ensure that they are in correct “R-format”. If your headings are all numbers, for example years, then R will convert them to character strings and prepend “X” to each heading.

You can overcome this behaviour and force R to read the headings as they come. Your headings are still converted to character strings but these are easier to coerce to a numeric value.

Use check.names = FALSE

Look at these data (you can view/get the datafile here):

Spp     1996 1997 1998 1999 2000
M.bro     88   47   13   33   86
Or.tip    90   14   36   24   47
Paint.l   50    0    0    0    4
Pea       48  110   85   54   65
Red.ad     6    3    8   10   15
Ring     190   80   96  179  145

The first column contains species names and the rest are numbers. The column names are mostly years (the sampling year). If you read these data into R the column names will be prepended with “X”, which is not helpful.

If you add check.names = FALSE as a parameter to read.xxxx() you’ll force R to accept the names of the columns as they are.

> bfy <- read.table(file.choose(), header = TRUE, sep = "\t", check.names = FALSE, row.names = 1)

> bfy
        1996 1997 1998 1999 2000
M.bro     88   47   13   33   86
Or.tip    90   14   36   24   47
Paint.l   50    0    0    0    4
Pea       48  110   85   54   65
Red.ad     6    3    8   10   15
Ring     190   80   96  179  145

> as.numeric(colnames(bfy))
[1] 1996 1997 1998 1999 2000

The data object will contain column names that are character but it is easy to coerce them to numeric with the as.numeric() command.

Of course this is a bit of a cheat… the column names are not real numbers. You cannot make the column names “properly” numeric but in this (character) form you can easily coerce them to be numeric when you need with the as.numeric() command.

Make transparent colors in R

You can easily make transparent colors using R and the rgb() command. These colors can be useful for charts and graphics with overlapping elements.

The rgb() command is the key: you define a new color using numerical values (0–255) for red, green and blue. In addition, you set an alpha value (also 0–255), which sets the transparency (0 being fully transparent and 255 being “solid”).

You also need to set the maximum color value, so that the command can relate your alpha value to a level of transparency. In practice setting max = 255 works well (since RGB colors are usually defined in the range 0–255).

The following example takes the standard blue and makes it transparent (~50%):

mycol <- rgb(0, 0, 255, max = 255, alpha = 125, names = "blue50")

Note that the names parameter sets a name attribute for your color. You cannot use the name directly but it can be useful to see a name, e.g.

> mycol
     blue50
"#0000FF7D"

Get the RGB values

Use the col2rgb() command to get the red, green and blue values you need for the rgb() command e.g.:

> col2rgb("lightblue")
      [,1]
red    173
green  216
blue   230

This gives you a matrix with three rows (red, blue, green). This means you can get values for several colors at once:

> col2rgb(c("lightblue", "lightgreen", "pink"))
      [,1] [,2] [,3]
red    173  144  255
green  216  238  192
blue   230  144  203

Custom function for transparent colors

It is fairly easy to make a custom function that takes a named color and makes a transparent version. Here is some code that makes a single color. It could be extended to make a matrix of several colors quite easily:

## Transparent colors
## Mark Gardener 2015
## www.dataanalytics.org.uk

t_col <- function(color, percent = 50, name = NULL) {
  #      color = color name
  #    percent = % transparency
  #       name = an optional name for the color

## Get RGB values for named color
rgb.val <- col2rgb(color)

## Make new color using input color as base and alpha set by transparency
t.col <- rgb(rgb.val[1], rgb.val[2], rgb.val[3],
             max = 255,
             alpha = (100 - percent) * 255 / 100,
             names = name)

## Save the color
invisible(t.col)
}
## END

Example of using transparent color

Here is a quick example of the function in action:

> opar <- par(mfrow = c(1,2))
> set.seed(1)
> hist(rnorm(100), col = "pink")
> mycol <- t_col("pink", perc = 50, name = "lt.pink")
> hist(rnorm(100), col = mycol)
> par(opar)

Save all objects to disk as separate files

You can easily save one or more objects to disk using save() or save.image(). However, if you want to save many items, each as a separate file, you have to use a loop and a paste() command…

It is easy to save the objects you have stored in the R Console. You can save individual items with save().

save(..., list = character(), file = "filename")

You can either type the names of the items to save, separated by commas, or provide a list. The list can be any object or command that produces names of objects.

If you want to save all items, you have then you can use:

save(list = ls(), file = "filename")

A convenience function save.image() does the same thing. You just specify the filename to hold the data. If you omit the filename R saves to a default file: this is what is opened when R starts up and what happens when you say “yes” when asked if you want to save the workspace.

Both save() and save.image() will create a single file when you run them. In most cases this is helpful but there are times when you want to save objects as separate files. One such occasion might be when you are making an R package. You will need to save data items as separate .RData files and functions as .R files.

You can use a simple loop and the paste() function to save all objects to a folder as separate files.

obj <- ls()

for(i in 1:length(obj))
 {
save(list = (obj[i]),
file = paste(obj[I], ".RData", sep = ""))
 }

rm(i, obj)

You start by listing everything and saving the result to a named object (obj in this case).

The loop runs for as many times as there are items in the listing; length(obj). Each time the loop goes around it takes the name of an item; list = obj[i] and saves it to a file.

The filename is built using the paste() command. This joins things together, with the sep =parameter telling R what character to use (if any) between items. In the example I used the name of the item; obj[i] and added .RData to the name. Note that “.RData” is in quotes. The sep = “” part tells R to use no character in the joining.

The result is that every item obtained by ls() is saved to disk as a separate .RData file. The files will go to your working directory unless you alter it, either using the setwd() command or by prepending a pathname to the paste() command.

Separate listing for functions and variables

If you have both data and functions in your workspace, you’ll need to split them. I found these custom functions on the [R] Help forum via a web search:

ls.funs <- function(env=sys.frame(-1)) unlist(lapply(ls(env=env),
function(x) if(is.function(get(x)))x))

ls.vars <- function(env=sys.frame(-1)) unlist(lapply(ls(env=env),
function(x) if(!is.function(get(x)))x))

# To use type:
ls.funs()
ls.vars()

You can use these new functions instead of ls() and make a new object containing the names. Simply use “.RData” for varaibles (data) and “.R” for functions to get the correct file extensions.