Author : Vincent D. Warmerdam

Blog: http://koaning.github.io

This document contains a notebook created at an open RStudio session in Amsterdam. It is free for all to use and refer to. It is meant as a sprint through R functionality to help you get started with free software, so that you aren’t dependent on software you need to pay for (SPSS/Excel/SAS).

In this document I will assume that you have installed the `dplyr` and `ggplot2` packages and that you have activated them. You can activate them by running:

```
library(dplyr)
library(ggplot2)
```

We can declare variables, much like filling in cells in Excel.

```
a = 1
b = 3
a + b
```

`## [1] 4`

We can also define a variable in terms of other variables, just as an Excel cell can refer to other cells.

```
c = a + b
c
```

`## [1] 4`

We can make it as complex as we want, just like in Excel. We’ll shortly see that this might not be the best way of doing things though.

```
one = 1
two = 2
three = 3
four = 4
five = 5
total = one + two + three + four + five
total
```

`## [1] 15`

You will have used a function before in Excel even though you might not have been aware of this.

`max(one,5)`

`## [1] 5`

Notice that you can also use variables that you’ve defined just like cells. Remember that `four` and `five` are now variables?

`max(four, five)`

`## [1] 5`

There are many other functions that Excel has and that R has as well. Some examples:

`sqrt(2)`

`## [1] 1.414214`

`sum(1,2,3,4,5)`

`## [1] 15`

`log(2)`

`## [1] 0.6931472`

R even has some variables predefined for you that might be useful.

`pi`

`## [1] 3.141593`

And if you want to, you can of course use a function within a function.

`log(sqrt(2))`

`## [1] 0.3465736`

This is where Excel doesn’t excel (pun intended). In Excel you have a fixed list of useful functions; in R you can make your own.

Suppose that you are given a radius of a circle and you want to know the area of the circle.

```
# area of a circle with the given radius
circle_area = function(radius){
  resultaat = radius*radius*pi   # "resultaat" is Dutch for "result"
  return(resultaat)
}
# adds two numbers
add = function(n1, n2){
  return(n1 + n2)
}
add(1,circle_area(2))
```

`## [1] 13.56637`

`circle_area(1)`

`## [1] 3.141593`

Just like that, you can create **ANY** function! You may not be able to appreciate the power that this gives you, but you may soon.
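Functions that you define can call each other, and arguments can be given default values. A small sketch of this, where the name `cylinder_volume` and its `height` default are made up for illustration and are not part of the original session:

```r
# Area of a circle, as defined earlier in this document.
circle_area = function(radius){
  radius * radius * pi
}

# Hypothetical helper: volume of a cylinder, reusing circle_area.
# height defaults to 1 when the caller does not supply it.
cylinder_volume = function(radius, height = 1){
  circle_area(radius) * height
}

cylinder_volume(2)              # height falls back to the default of 1
cylinder_volume(2, height = 3)  # three times the area of the base
```

Building the second function on top of the first means a fix to `circle_area` is picked up everywhere it is used.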

Try to create a function that calculates the amount of money on a savings account. Write a function that takes into account a starting amount, an interest rate and a number of years. When calling the function `money(start=100, interest=0.03, years =1)` it should give back 103.
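One possible solution is sketched here. The exercise does not say how interest is applied, so this sketch assumes it is compounded once per year:

```r
# Assumption: interest is added once per year and then earns interest itself.
money = function(start, interest, years){
  start * (1 + interest)^years
}

money(start = 100, interest = 0.03, years = 1)   # one year of 3% on 100
money(start = 100, interest = 0.03, years = 10)  # ten years of compounding
```

Other answers are equally valid, for example a loop that adds the interest year by year.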

Let’s get a little bit philosophical about languages now. By typing code we are giving instructions to the computer in a way that the computer understands. But this is not by definition the way we understand language. Note that in a computer language, just like in a real language, we can usually explain things in two ways.

For example, take this bit of code:

`sqrt(2)`

`## [1] 1.414214`

We are reading it as: “take the square root of two”. Another way to describe this in human terms is “take the number two and calculate the square root of it”. R is a nice language, because it allows both human thoughts to be translated into a computer program.

`2 %>% sqrt `

`## [1] 1.414214`

Notice the similarity. This is just notation but it makes code just a bit easier to read. Just a simple example:

`sqrt(log(sqrt(log(2))))`

`## Warning in sqrt(log(sqrt(log(2)))): NaNs produced`

`## [1] NaN`

`2 %>% log %>% sqrt %>% log %>% sqrt`

`## Warning in sqrt(.): NaNs produced`

`## [1] NaN`

It feels more as if I am reading a chain of commands that I can give the computer, without having to write an ugly, deeply nested function call.
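The pipe places whatever is on its left into the first argument of the function on its right, so functions that take extra arguments still fit in a chain. A small sketch, assuming `dplyr` is loaded as described at the top of this document (the variable names are made up for illustration):

```r
library(dplyr)  # provides the %>% pipe

# x %>% f(y) is read as f(x, y)
piped_log = 16 %>% log(base = 2)   # same as log(16, base = 2)
piped_log

# a chain reads left to right
piped_sum = c(1, 4, 9) %>% sqrt %>% sum   # same as sum(sqrt(c(1, 4, 9)))
piped_sum
```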

So we have variables, which are basically names of objects. We can apply functions to these variables. But not every function will work on every variable. Just like in Excel, this would produce an error:

`3 + 'three'`

This is because the computer doesn’t know how to add `3` (a number) to `'three'` (a sequence of characters, also known as a ‘string’). Mixing types like this can cause a lot of errors, so it is something you will need to be aware of. Certain functions only work on certain types of variables.
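When it is unclear what type a variable has, R can report it. A minimal sketch of checking types and of catching the error instead of letting it stop the script (the name `message_text` is made up for illustration):

```r
class(3)         # "numeric"
class('three')   # "character"

# tryCatch captures the error and hands back its message as text
message_text = tryCatch(
  3 + 'three',
  error = function(e) conditionMessage(e)
)
message_text

# is.numeric lets you check a value before doing arithmetic with it
is.numeric(3)        # TRUE
is.numeric('three')  # FALSE
```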

Some examples of functions that work on characters:

`paste("hello", "world")`

`## [1] "hello world"`

```
a = 12
substr("mattie",a , 2)
```

`## [1] ""`
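The empty string above is not an error. `substr(x, start, stop)` returns the characters from position `start` up to and including position `stop`, and here `start` (12) is larger than `stop` (2) and lies past the end of the six-letter string. A short sketch with positions that do fall inside the string (the variable `word` is made up for illustration):

```r
word = "mattie"
nchar(word)          # number of characters: 6
substr(word, 1, 2)   # first two characters: "ma"
substr(word, 4, 6)   # characters four to six: "tie"
toupper(word)        # "MATTIE"
```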

You can also change the type of a variable (this is known as casting):

`as.numeric("104.5")`

`## [1] 104.5`

`as.character(1)`

`## [1] "1"`
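Casting is what makes arithmetic possible on numbers that arrive as text. A small sketch, which also shows what happens when the text is not a number (the variable names are made up for illustration):

```r
# once cast, the value behaves like any other number
price = as.numeric("104.5") + 1
price                            # 105.5

# text that is not a number becomes NA, and R prints a warning;
# suppressWarnings hides that warning here to keep the output short
not_a_number = suppressWarnings(as.numeric("three"))
is.na(not_a_number)              # TRUE
```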

An array is another type of object we can use in R (R itself calls the simplest kind a vector). It is basically a sequence of values, just like a row or a column in Excel. These can be created easily.

`c(1,2,3,4,5,6,7,8,9,0)`

`## [1] 1 2 3 4 5 6 7 8 9 0`

`1:100`

```
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
```

`seq(1, 10, 0.1)`

```
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3
## [15] 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
## [29] 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1
## [43] 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5
## [57] 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9
## [71] 8.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3
## [85] 9.4 9.5 9.6 9.7 9.8 9.9 10.0
```
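The third argument of `seq` is the step size. As a small sketch of related ways to build and inspect an array (the variable names are made up for illustration):

```r
steps = seq(1, 10, 0.1)
length(steps)         # 91 values, matching the output above

# ask for a number of values instead of a step size
evenly_spaced = seq(1, 10, length.out = 4)
evenly_spaced         # 1 4 7 10

steps[1]              # first element (R starts counting at 1)
rev(1:5)              # 5 4 3 2 1
```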

These arrays can also be stored in variables just like anything else. Functions can be applied to them as well.

```
a = seq(1, 10, 0.1)
sqrt(a)
```

```
## [1] 1.000000 1.048809 1.095445 1.140175 1.183216 1.224745 1.264911
## [8] 1.303840 1.341641 1.378405 1.414214 1.449138 1.483240 1.516575
## [15] 1.549193 1.581139 1.612452 1.643168 1.673320 1.702939 1.732051
## [22] 1.760682 1.788854 1.816590 1.843909 1.870829 1.897367 1.923538
## [29] 1.949359 1.974842 2.000000 2.024846 2.049390 2.073644 2.097618
## [36] 2.121320 2.144761 2.167948 2.190890 2.213594 2.236068 2.258318
## [43] 2.280351 2.302173 2.323790 2.345208 2.366432 2.387467 2.408319
## [50] 2.428992 2.449490 2.469818 2.489980 2.509980 2.529822 2.549510
## [57] 2.569047 2.588436 2.607681 2.626785 2.645751 2.664583 2.683282
## [64] 2.701851 2.720294 2.738613 2.756810 2.774887 2.792848 2.810694
## [71] 2.828427 2.846050 2.863564 2.880972 2.898275 2.915476 2.932576
## [78] 2.949576 2.966479 2.983287 3.000000 3.016621 3.033150 3.049590
## [85] 3.065942 3.082207 3.098387 3.114482 3.130495 3.146427 3.162278
```
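As the output shows, `sqrt` was applied to every element separately. Other functions, such as `sum` and `mean`, collapse the whole array into a single value. A small sketch of both kinds (the variable `values` is made up for illustration):

```r
values = c(1, 4, 9, 16)

# applied element by element: the result has the same length
sqrt(values)      # 1 2 3 4
values * 2        # 2 8 18 32

# collapsing functions: the result is a single number
sum(values)       # 30
mean(values)      # 7.5
sum(1:100)        # 5050
```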

Notice that our variable `a` has now been overwritten. You can confirm this by looking at the Environment tab of RStudio. This variable is no longer a single number, but a list of numbers. Simple arrays that only have numbers in them can also be plotted very easily using the `plot` function.

```
# plot(sqrt(a))
a %>% sqrt %>% plot(type='l')
```

We can also create an array with random numbers and use a histogram to draw them.

`1000 %>% rnorm %>% hist`

`hist(rnorm(1000))`

We can also perform some operations before we actually plot something. Note also that an array is an object that can itself be passed to functions such as `length`.

```
a = sqrt(seq(1, 500, 0.1))
b = rnorm(length(a))
plot(a + b)
```

Again, because of pipes, we could also do this:

```
a = seq(1, 500, 0.1) %>% sqrt
b = a %>% length %>% rnorm
(a + b) %>% plot
```
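The sum `a + b` works because both arrays have the same length, so the elements are added pair by pair. A small sketch with a fixed seed so the random noise is the same on every run (the seed value, the sizes and the variable names are arbitrary choices for illustration):

```r
set.seed(1)                      # arbitrary seed, makes rnorm repeatable
signal = sqrt(seq(1, 5, 1))      # five values
noise = rnorm(length(signal))    # five random draws, one per value
noisy = signal + noise           # added element by element

length(noisy)                    # 5
summary(noise)                   # quick numeric overview before plotting
```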