# Manipulating an R formula

The `formula` is an integral part of the R language. You use a `formula` to specify relationships between variables for tasks such as graphics and analysis (e.g. linear modelling). Manipulating an R formula is a useful skill.

When you create an analysis model in R you specify the variables in a `formula`. R “recognises” `formula` objects, which have their own `class`, `"formula"`. If, for example, you used the `lm()` command to create a regression result, you can extract the formula from the result.

```
mod <- lm(Fertility ~ ., data = swiss)
formula(mod)
```

```
Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality
```
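As a quick check of the points above, the following sketch inspects the extracted formula. The object name `f` is only illustrative; everything else is base R and the built-in `swiss` dataset.

```r
mod <- lm(Fertility ~ ., data = swiss)
f <- formula(mod)

class(f)                # "formula"
inherits(f, "formula")  # TRUE

# A two-sided formula behaves like a call with three elements:
# the `~` operator, the left-hand side and the right-hand side
length(f)               # 3
f[[2]]                  # left-hand side (the response)
f[[3]]                  # right-hand side (the predictors)
```

Note that the `.` shorthand used when fitting the model has been expanded into the individual predictor names.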

It can be useful to extract the components of the model formula. For example, you may want to examine how the R² value alters as you add variables to the model.

## Extract the predictor variables

To access the parts of a formula you need the `terms()` command:

```
terms(formula(mod))
```

The result contains various components, stored as attributes; you want `term.labels`.

```
attr(terms(formula(mod)), which = "term.labels")
```

```
[1] "Agriculture" "Examination" "Education" "Catholic"
[5] "Infant.Mortality"
```
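A slightly shorter route, sketched here as an alternative rather than a replacement, is to call `terms()` on the fitted model itself and use `labels()` on the result. The names `tt`, `vars` and `vars_alt` are illustrative.

```r
mod <- lm(Fertility ~ ., data = swiss)

# terms() also works directly on the fitted model
tt <- terms(mod)

# Two equivalent ways to get the predictor labels
vars     <- attr(tt, which = "term.labels")
vars_alt <- labels(tt)

identical(vars, vars_alt)  # TRUE
```

If a model contains interactions, the labels include the interaction terms (for example `"a:b"`) as well as the main effects.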

You now have the predictor variables from the formula. The next step is to get the response variable.

## Extract the response variable

The response variable can be seen using the `terms()` command and the `variables` component, like so:

```
attr(terms(formula(mod)), which = "variables")
```

```
list(Fertility, Agriculture, Examination, Education, Catholic, Infant.Mortality)
```

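The result looks slightly odd because it is not an ordinary list but an unevaluated call to `list()`. You can still index it like a list: the 1st component is the function name (`list`) and the 2nd component is the response.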

```
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name
rr
```

```
[1] "Fertility"
```
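The sketch below looks a little closer at the `variables` component and shows one caveat: `as.character()` splits a transformed response such as `log(Fertility)` into pieces, whereas `deparse()` keeps it as a single string. The `log()` model is a hypothetical example, not part of the original analysis.

```r
mod <- lm(Fertility ~ ., data = swiss)
tt  <- terms(formula(mod))
vv  <- attr(tt, which = "variables")

class(vv)   # "call"
vv[[1]]     # the function name, list

# The "response" attribute is 1 when the formula has a response, 0 otherwise
resp <- attr(tt, which = "response")
rr   <- deparse(vv[[1 + resp]])
rr          # "Fertility"

# Hypothetical transformed response
tt_log <- terms(log(Fertility) ~ Education)
vv_log <- attr(tt_log, which = "variables")
as.character(vv_log[[2]])      # "log" "Fertility" (two elements)
rr_log <- deparse(vv_log[[2]]) # "log(Fertility)" (one string)
```

For a plain variable name the two approaches give the same result, so `as.character()` is fine for the `swiss` example.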

Now you have the response variable, and the predictors from earlier, which you can use to “build” a formula.

## Building a formula

In its most basic sense you can treat a formula as a character string that “conforms” to the formula syntax: `y ~ x + z` for example. Strictly speaking a `formula` is its own kind of object, but many commands convert a suitable character string for you. You can build such a string with the `paste()` command by joining the response, a `~` character and the predictors you want (these themselves separated by `+` characters).

The following example uses the `swiss` dataset, which is built into base R.

```
mod <- lm(Fertility ~ ., data = swiss)
# Get the (predictor) variables
vars <- attr(terms(formula(mod)), which = "term.labels")
# Get the response
vv <- attr(terms(formula(mod)), which = "variables")
rr <- as.character(vv[[2]]) # The response variable name
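# Now the predictors: choose ONE of these (all predictors are used here)
# pp <- paste(vars[1], collapse = " + ")   # 1st only
# pp <- paste(vars[1:3], collapse = " + ") # 1, 2, 3
pp <- paste(vars, collapse = " + ")        # All
# Build a formula (paste() already puts a space between its arguments)
fml <- paste(rr, "~", pp)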
fml
```

```
[1] "Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality"
```
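Base R also provides `reformulate()`, which builds a real `formula` object from the same pieces without going through `paste()`. This is a minimal sketch of that alternative; the name `f3` and the response being written out by hand are for illustration only.

```r
mod  <- lm(Fertility ~ ., data = swiss)
vars <- attr(terms(formula(mod)), which = "term.labels")
rr   <- "Fertility"  # response name, written out by hand for this sketch

# Build a formula object from the first three predictors
f3 <- reformulate(vars[1:3], response = rr)
f3
class(f3)  # "formula"
```

The result is already of class `"formula"`, so no later conversion is needed.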

Once you have your `formula` as a `character` object you can use it in place of a regular `formula` in many commands.

## Using a “built” formula

The character string representing a `formula` can be used exactly as you would a “regular” formula:

```
lm(fml, data = swiss)
```

```
Call:
lm(formula = fml, data = swiss)

Coefficients:
     (Intercept)       Agriculture       Examination         Education
         66.9152           -0.1721           -0.2580           -0.8709
        Catholic  Infant.Mortality
          0.1041            1.0770
```
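If a command does not accept a character string, you can convert it explicitly with `as.formula()`. The sketch below, with illustrative object names, checks that the string, the converted formula and the original model all give the same coefficients.

```r
mod <- lm(Fertility ~ ., data = swiss)
fml <- "Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality"

class(fml)            # "character"
f <- as.formula(fml)  # explicit conversion
class(f)              # "formula"

mod_chr <- lm(fml, data = swiss)  # character string
mod_fml <- lm(f, data = swiss)    # formula object

all.equal(coef(mod_chr), coef(mod_fml))  # TRUE
all.equal(coef(mod_fml), coef(mod))      # TRUE
```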

One use for building a formula is in model testing. For example, you might create a regression model containing five predictors when only the first three are really necessary. You can re-build the `formula` term by term and extract the R² value at each step. This shows you how the explained variance alters as you add more variables.
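One way to do this, sketched here under the assumption that you add the predictors in the order they appear in the original model, is to loop over the predictors and refit the model each time. The name `r2` is illustrative.

```r
mod  <- lm(Fertility ~ ., data = swiss)
vars <- attr(terms(formula(mod)), which = "term.labels")
vv   <- attr(terms(formula(mod)), which = "variables")
rr   <- as.character(vv[[2]])

# Fit a model with the first i predictors and keep its R-squared
r2 <- sapply(seq_along(vars), function(i) {
  pp  <- paste(vars[seq_len(i)], collapse = " + ")
  fml <- paste(rr, "~", pp)
  summary(lm(fml, data = swiss))$r.squared
})
names(r2) <- vars  # each value is labelled by the last predictor added

round(r2, 3)
```

The last value matches the R² of the full model. The order in which you add the predictors affects the intermediate values, so a different ordering gives a different picture.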
