## Deep Learning in R

So far you know that artificial can do pretty cool stuff. Machine learning is a subfield in Computer Science (CS). Deep learning, then, is a subfield of machine learning that is a set of algorithms that is inspired by the structure and function of the brain and which is usually called Artificial Neural Networks (ANN). Deep learning is in fact making the the models which humans can’t even understand in their most complex forms. Today’s tutorial will give you a short introduction to deep learning in R.

- you’ll get started with
`keras`

package: you’ll learn how to first prepare your workspace and load built-in datasets, dummy data, and data from CSVs; - Next, you’ll see how you can pre-process the data that you loaded in from a CSV file: you’ll normalize and split the data into training and test sets.
- How to build your model
- how to fit the model to data and visualize it. you’ll predict target values based on test data;
- evaluate the model so as to tailor fit it to improve accuracy.
- Deploy your model

## deep learning in R

Keras provide an R interface to the Python deep learning package Keras for deep learning in R, of which you might have already heard. For those of you who don’t know what the Keras package has to offer to Python users, it’s “a high-level neural networks API, written in Python and capable of running on top of either TensorFlow, Microsoft Cognitive Toolkit (CNTK) or Theano”.

### Interfaces?

Keras is one of the easiest way to start with deep learning in Python.

In this case, it’s good for you to understand what it exactly means when a package, such as the R `keras`

, is “an interface” to another package for deep learning in R, the Python Keras. In simple terms, this means that the `keras`

R package with the interface allows you to enjoy the benefit of R programming while having access to the capabilities of the Python Keras package.

**Note** that this is not an uncommon practice: for example, also the `h2o`

package provides an interface, but in this case -and as the name kind of already suggests- to H2O, an open source math engine for big data that you can use to compute parallel distributed machine learning algorithms. Other packages that you might know that provide interfaces are `RWeka`

(R interface to Weka), `tensorflow`

(R interface to TensorFlow), `openml-r`

(R interface to OpenML), … You can keep on going on and on!

### What’s the difference between Keras python and keras R package?

Now that you know all of this, you might ask yourself the following question first: how would you compare the original Python package with the R packages for deep learning in R?

In essence, you won’t find too many differences between the R packages and the original Python package, mostly because the function names are almost all the same; The only differences that you notice are mainly in the programming languages themselves (variable assignment, library loading, …), but the most important thing to notice lies in the fact of how much of the original functionality has been incorporated in the R package for deep learning in R.

Now that you have gathered some background, it’s time to get started with Keras in R for real to begin with deep learning in R. As you mentioned above, you’ll first go over the setup of your workspace. Then, you’ll load in some data and after a short data exploration and preprocessing step, you will be able to start constructing your MLP!

Let’s get on with it!

## Installing The keras Package

As always, the first step to getting started with any package is to set up your workspace: install and load in the library into RStudio or whichever environment you’re working in.

First, make sure that you install the `keras`

: you can easily do this by running `devtools::install_github("rstudio/keras")`

in your console. Next, you can load in the package and install TensorFlow:

```
# Load in the keras package
library(keras)
# Install TensorFlow
install_tensorflow()
```

That’s fast, right?

## Loading The Data

Now that the installation is done and your workspace is ready, you can start loading in your data!

### Built-in Datasets

Keras built-in datasets can be accessed with functions such as `mnist.load_data()`

, `cifar10.load_data()`

, or `imdb.load_data()`

.

Here are some examples where you load in the MNIST, CIFAR10 and IMDB data with the `keras`

package:

# Read in MNIST data mnist <- dataset_mnist() # Read in CIFAR10 data cifar10 <- dataset_cifar10() # Read in IMDB data imdb <- dataset_imdb()

**Note** that all functions to load in built-in data sets with `keras`

follow the same pattern; For MNIST data, you’ll use the `dataset_mnist()`

function to load in your data.

### Dummy Data

Alternatively, you can also quickly make some dummy data to get started. You can easily use the `matrix()`

function to accomplish this:

#make your dummy data data <- matrix(rexp(1000*784),nrow = 1000, ncol = 784) #Make dummy target values for your dummy datalabels <- matrix(round(runif(1000*10, min = 0, max =9)), nrow = 1000, ncol =10)

**Note** that it’s a good idea to check out the data structure of your data; It’s crucial to be already aware of what data you’re working with because it will be necessary for the later steps that you’ll need to take. You’ll learn more about this later on in the tutorial!

### Reading Data From Files

Let’s use the `read.csv()`

function from the `read.table`

package to load in a data set from the UCI Machine Learning Repository:

# Read in `iris` data iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE) # Return the first part of `iris` head(iris) # Inspect the structure str(iris) # Obtain the dimensions dim(iris)

### Output

It’s always a good idea to check out whether your data import was successful. You usually use functions such as `head()`

, `str()`

and `dim()`

to quickly do this.

The results of these three functions do not immediately point out anything out of the ordinary; By looking at the output of the `str()`

function, you see that the strings of the `Species`

column are read in as factors. This is no problem, but it’s definitely good to know for the next steps, where you’re going to explore and preprocess the data.

## Data Exploration

For this tutorial, you’ll continue to work with the famous `iris`

dataset that you imported with the `read.csv()`

function.

For those of you who don’t have the biology knowledge that is needed to work with this data, here’s some background information: all flowers contain a sepal and a petal. The sepal encloses the petals and is typically green and leaf-like, while the petals generally are colored leaves. For the iris flowers, this is just a little bit different, as you can see in the following picture:

You might have already seen in the previous section that the `iris`

data frame didn’t have any column names after the import. Now, for the remainder of the tutorial, that’s not too important: even though the `read.csv()`

function returns the data in a `data.frame`

to you, the data that you’ll need to pass to the `fit()`

function needs to be a matrix or array.

Some things to keep in mind about these two data structures that were just mentioned: - Matrices and arrays don’t have column names; - Matrices are two-dimensional objects of a single data type; - Arrays are multi-dimensional objects of a single data type;

**Note** that the data frame, on the other hand, is a special kind of named list where all elements have the same length. It’s a multi-dimensional object that can contain multiple data types. You already saw that this is true when you checked out the structure of the `iris`

data frame in the previous section. Knowing this and taking into account that you’ll need to work towards a two- or multi-dimensional object of a *single* data type, you should already prepare to do some pre-processing before you start building your neural network!

For now, column names can be handy for exploring purposes, and they will most definitely facilitate your understanding of the data, so let’s add some column names with the help of the `names()`

function. Next, you can immediately use the `iris`

variable in your data exploration! Plot, for example, how the petal length and the petal width correlate with the `plot()`

function.

names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species") plot(iris$Petal.Length, iris$Petal.Width, pch=21, bg=c("red","green3","blue")[unclass(iris$Species)], xlab="Petal Length", ylab="Petal Width")

**Note** that you use the `unclass()`

function to convert the names of the species, that is, “setosa, versicolor”, and “virginica”, to the numeric 1, 2, and 3.

Now take a closer look at the result of the plotting function:

The graph indicates a positive correlation between the `Petal.Length`

and the `Petal.Width`

for the different species of the iris flowers. However, this is something that you probably want to test with the `cor()`

function, which will give you the overall correlation between all attributes that are included in the data set:

#Overall correlation between`Petal.Length`

and`Petal.Width`

cor(iris$Petal.Length, iris$Petal.Width)

0.9627571

Additionally, you can use the `corrplot`

package in combination with the `cor()`

function to plot the correlations between your data’s attributes; In this case, you calculate the overall correlation for all attributes of the `iris`

data frame. You store the result of this calculation in a variable `M`

and pass it to the `corrplot()`

function.

Also, don’t forget to specify a `method`

argument to indicate how you want the data to be plotted!

#load corrplot library(corrplot) #Store the overall correlation in`M`

M <- cor(iris[,1:4]) # Plot the correlation plot with`M`

corrplot(M, method="circle")

Make use of the R console to explore your data further.

If you want to make plots for this data with the `ggplot`

package, which is the interactive grammar of graphics, take a look at my post about ggplot and esquisse

## Data Preprocessing

Before you can build your model, you also need to make sure that your data is cleaned, normalized (if applicable) and divided into training and test sets. Since the dataset comes from the UCI Machine Learning Repository, you can expect it to already be somewhat clean, but let’s double check the quality of your data anyway.

At first sight, when you inspected the data with `head()`

, you didn’t really see anything out of the ordinary, right? Let’s make use of `summary()`

and `str()`

to briefly recap what you learned when you checked whether the import of your data was successful:

# Pull up a summary of `iris` summary(iris) # Inspect the structure of `iris` str(iris)

Now that you’re sure that the data is clean enough, you can start by checking if the normalization is necessary for any of the data with which you’re working for this tutorial.

### Normalizing Your Data With A User Defined Function (UDF)

From the result of the `summary()`

function , you see that the Iris data set doesn’t need to be normalized: the `Sepal.Length`

attribute has values that go from `4.3`

to `7.9`

and `Sepal.Width`

contains values from `2`

to `4.4`

, while `Petal.Length`

’s values range from `1`

to `6.9`

and `Petal.Width`

goes from `0.1`

to `2.5`

. In other words, all values of all the attributes of the Iris data set are contained within the range of `0.1`

and `7.9`

, which you can consider acceptable.

However, it can still be a good idea to study the effect of normalization on your data; You can even go as far as passing the normalized data to your model to see if there is any effect.

### Normalize Your Data With `keras`

To use the `normalize()`

function from the `keras`

package, you first need to make sure that you’re working with a matrix. As you probably remember from earlier, the characteristic of matrices is that the matrix data elements are of the same basic type; In this case, you have target values that are of type factor, while the rest is all numeric.

This needs to change first.

You can use the `as.numeric()`

function to convert the data to numbers:

iris[,5] <- as.numeric(iris[,5]) -1 # Turn `iris` into a matrix iris <- as.matrix(iris) # Set `iris` `dimnames` to `NULL` dimnames(iris) <- NULL

A numerical data frame is alright, but you’ll need to convert the data to an array or a matrix if you want to make use of the `keras`

package. You can easily do this with the `as.matrix()`

function; Don’t forget here to set the `dimnames`

to `NULL`

.

As you might have read in the section above, normalizing the Iris data is not necessary. Nevertheless, it’s still a good idea to study normalization and its effect, and to see how this can not only be done with a UDF but also with the `keras`

built-in `normalize()`

function.

With your data converted to a matrix, you can indeed also use the `keras`

package to study the effect of a possible normalization on your data:

# Normalize the `iris` data iris <- normalize(iris[,1:4]) # Return the summary of `iris` summary(iris)

**Note** that here, you use `dimnames()`

to set the dimnames of `iris`

to `NULL`

. This ensures that there are no column names in your data.

### Training And Test Sets

Now that you have checked the quality of your data and you know that it’s not necessary to normalize your data, you can continue to work with the original data and split it into training and test sets so that you’re finally ready to start building your model. By doing this, you ensure that you can make honest assessments of the performance of your predicted model afterwards.

Before you split your data into training and test sets, you best first set a seed. You can easily do this with `set.seed()`

: use this exact function and just pass a random integer to it. A seed is a number of R’s random number generator. The major advantage of setting a seed is that you can get the same sequence of random numbers whenever you supply the same seed in the random number generator.

This is great for the reproducibility of your code!

You use the `sample()`

function to take a sample with a size that is set as the number of rows of the Iris data set, or 150. You sample with replacement: you choose from a vector of 2 elements and assign either 1 or 2 to the 150 rows of the Iris data set. The assignment of the elements is subject to probability weights of `0.67`

and `0.33`

.

# Determine sample size ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.67, 0.33)) # Split the `iris` data iris.training <- iris[ind==1, 1:4] iris.test <- iris[ind==2, 1:4] # Split the class attribute iris.trainingtarget <- iris[ind==1, 5] iris.testtarget <- iris[ind==2, 5]

The `replace`

argument of the `sample()`

function is set to `TRUE`

, which means that you assign a `1`

or a `2`

to a certain row and then reset the vector of `2`

to its original state.

In other words, for the next rows in your data set, you can either assign a `1`

or a `2`

, each time again. The probability of choosing a `1`

or a `2`

should not be proportional to the weights amongst the remaining items, so you specify probability weights.

*Side note:* if you would have used a built-in dataset with the specific `dataset_imdb()`

function, for example, your data can easily be split by using the `$`

operator:

```
x_train <- imdb$train$x
y_train <- imdb$train$y
x_test <- imdb$test$x
y_test <- imdb$test$y
```

### One-Hot Encoding

You have successfully split your data, but there is still one step that you need to go through to start building your model. Can you guess which one?

When you want to model multi-class classification problems with neural networks, it is generally a good practice to make sure that you transform your target attribute from a vector that contains values for each class value to a matrix with a boolean for each class value and whether or not a given instance has that class value or not.

This is a loose explanation of One Hot Encoding (OHE). It sounds quite complex, doesn’t it?

Luckily, the `keras`

package has a `to_categorical()`

function that will do all of this for you; Pass in the `iris.trainingtarget`

and the `iris.testtarget`

to this function and store the result in `iris.trainLabels`

and `iris.testLabels`

:

# One hot encode training target iris.trainLabels <- to_categorical(iris.trainingtarget) # One hot encode test target values iris.testLabels <- to_categorical(iris.testtarget) # Print out the iris.testLabels to double check the result print(iris.testLabels)

Now you have officially reached the end of the exploration and preprocessing steps in this tutorial. You can now go on to building your neural network with `keras`

!

## Constructing the Model

To start constructing a model, you should first initialize a sequential model with the help of the `keras_model_sequential()`

function. Then, you’re ready to start modeling.

However, before you begin, it’s a good idea to revisit your original question about this data set: can you predict the species of a certain Iris flower? It’s easier to work with numerical data, and you have pre-processed the data and one hot encoded the values of the target variable: a flower is either of type versicolor, setosa or virginica and this is reflected with binary `1`

and `0`

values.

A type of network that performs well on such a problem is a multi-layer perceptron. This type of neural network is often fully connected. That means that you’re looking to build a relatively simple stack of fully-connected layers to solve this problem. As for the activation functions that you will use, it’s best to use one of the most common ones here for the purpose of getting familiar with Keras and neural networks, which is the relu activation function. This rectifier activation function is used in a hidden layer, which is generally speaking a good practice.

In addition, you also see that the softmax activation function is used in the output layer. You do this because you want to make sure that the output values are in the range of 0 and 1 and may be used as predicted probabilities:

# Initialize a sequential model model <- keras_model_sequential() # Add layers to the model model %>% layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 3, activation = 'softmax')

**Note** how the output layer creates 3 output values, one for each Iris class (versicolor, virginica or setosa). The first layer, which contains 8 hidden notes, on the other hand, has an `input_shape`

of 4. This is because your training data `iris.training`

has 4 columns.

You can further inspect your model with the following functions:

- You can use the
`summary()`

function to print a summary representation of your model; `get_config()`

will return a list that contains the configuration of the model;`get_layer()`

will return the layer configuration.`layers`

attribute can be used to retrieve a flattened list of the model’s layers;- To list the input tensors, you can use the
`inputs`

attribute; and - Lastly, to retrieve the output tensors, you can make use of the
`outputs`

attribute.

# Print a summary of a model summary(model) # Get model configuration get_config(model) # Get layer configuration get_layer(model, index = 1) # List the model's layers model$layers # List the input tensors model$inputs # List the output tensorsmodel$outputs

## Compile And Fit The Model

Now that you have set up the architecture of your model, it’s time to compile and fit the model to the data. To compile your model, you configure the model with the `adam`

optimizer and the `categorical_crossentropy`

loss function. Additionally, you also monitor the accuracy during the training by passing `'accuracy'`

to the metrics argument.

# Compile the model model %>% compile( loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy' )

The optimizer and the loss are two arguments that are required if you want to compile the model.

Some of the most popular optimization algorithms used are the Stochastic Gradient Descent (SGD), ADAM and RMSprop. Depending on whichever algorithm you choose, you’ll need to tune certain parameters, such as learning rate or momentum. The choice for a loss function depends on the task that you have at hand: for example, for a regression problem, you’ll usually use the Mean Squared Error (MSE).

As you see in this example, you used `categorical_crossentropy`

loss function for the multi-class classification problem of determining whether an iris is of type versicolor, virginica or setosa. However, note that if you would have had a binary-class classification problem, you should have made use of the `binary_crossentropy`

loss function.

Next, you can also fit the model to your data; In this case, you train the model for 200 epochs or iterations over all the samples in `iris.training`

and `iris.trainLabels`

, in batches of 5 samples.

# Fit the model model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 )

What you do with the code above is training the model for a specified number of epochs or exposures to the training dataset. An epoch is a single pass through the entire training set, followed by testing of the verification set. The batch size that you specify in the code above defines the number of samples that going to be propagated through the network. Also, by doing this, you optimize efficiency because you make sure that you don’t load too many input patterns into memory at the same time.

## Visualize The Model Training History

Also, it’s good to know that you can also visualize the fitting if you assign the lines of code in the DataCamp Light chunk above to a variable. You can then pass the variable to the `plot()`

function, as you see in this particular code chunk!

# Store the fitting history in `history` history <- model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Plot the history plot(history)

Make sure to study the plot in more detail.

At first sight, it’s no surprise that this all looks a tad messy. You might not entirely know what you’re looking at, right?

One good thing to know is that the `loss`

and `acc`

indicate the loss and accuracy of the model for the training data, while the `val_loss`

and `val_acc`

are the same metrics, loss and accuracy, for the test or validation data.

But, even as you know this, it’s not easy to interpret these two graphs. Let’s try to break this up into pieces that you might understand more easily! You’ll split up these two plots and make two separate ones instead: you’ll make one for the model loss and another one for the model accuracy. Luckily, you can easily make use of the `$`

operator to access the data and plot it step by step.

# Plot the model loss of the training dataplot(history$metrics$loss, main="Model Loss", xlab = "epoch", ylab="loss", col="blue", type="l") # Plot the model loss of the test datalines(history$metrics$val_loss, col="green") # Add legend legend("topright", c("train","test"), col=c("blue", "green"), lty=c(1,1))

In this first plot, you plotted the loss of the model on the training and test data. Now it’s time to also do the same, but then for the accuracy of the model:

# Plot the accuracy of the training data plot(history$metrics$acc, main="Model Accuracy", xlab = "epoch", ylab="accuracy", col="blue", type="l") # Plot the accuracy of the validation datalines(history$metrics$val_acc, col="green") # Add Legendlegend("bottomright", c("train","test"), col=c("blue", "green"), lty=c(1,1))

Some things to keep in mind here are the following:

- If your training data accuracy keeps improving while your validation data accuracy gets worse, you are probably overfitting: your model starts to just memorize the data instead of learning from it.
- If the trend for accuracy on both datasets is still rising for the last few epochs, you can clearly see that the model has not yet over-learned the training dataset.

## Predict Labels of New Data

Now that your model is created, compiled and has been fitted to the data, it’s time to actually use your model to predict the labels for your test set `iris.test`

. As you might have expected, you can use the `predict()`

function to do this. After, you can print out the confusion matrix to check out the predictions and the real labels of the `iris.test`

data with the help of the `table()`

function.

# Predict the classes for the test data classes <- model %>% predict_classes(iris.test, batch_size = 128) # Confusion matrix table(iris.testtarget, classes)

What do you think of the results? At first sight, does this model that you have created make the right predictions?

## Evaluating Your Model

Even though you already have a slight idea of how your model performed by looking at the predicted labels for `iris.test`

, it’s still important that you take the time to evaluate your model. Use the `evaluate()`

function to do this: pass in the test data `iris.test`

, the test labels `iris.testLabels`

and define the batch size. Store the result in a variable `score`

, like in the code example below:

# Evaluate on test data and labels score <- model %>% evaluate(iris.test, iris.testLabels, batch_size = 128) # Print the score print(score)

By printing `score`

, you’ll get back the loss value and the metric value (in this case `'accuracy'`

) back.

## Fine-tuning Your Model

Fine-tuning your model is probably something that you’ll be doing a lot, especially in the beginning, because not all classification and regression problems are as straightforward as the one that you saw in the first part of this tutorial. As you read above, there are already two key decisions that you’ll probably want to adjust: how many layers you’re going to use and how many “hidden units” you will choose for each layer.

In the beginning, this will really be quite a journey.

Besides playing around with the number of epochs or the batch size, there are other ways in which you can tweak your model in the hopes that it will perform better: by adding layers, by increasing the number of hidden units and by passing your own optimization parameters to the `compile()`

function. This section will go over these three options.

### Adding Layers

What would happen if you add another layer to your model? What if it would look like this?

# Initialize the sequential model model <- keras_model_sequential() # Add layers to model model %>% layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 5, activation = 'relu') %>% layer_dense(units = 3, activation = 'softmax') # Compile the model model %>% compile( loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy' ) # Fit the model to the data model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Evaluate the model score <- model %>% evaluate(iris.test, iris.testLabels, batch_size = 128) # Print the score print(score)

# Initialize a sequential model model <- keras_model_sequential() # Add layers to the model model %>% layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 5, activation = 'relu') %>% layer_dense(units = 3, activation = 'softmax') # Compile the model model %>% compile( loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy' ) # Save the training history in history history <- model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Plot the model loss plot(history$metrics$loss, main="Model Loss", xlab = "epoch", ylab="loss", col="blue", type="l")lines(history$metrics$val_loss, col="green")legend("topright", c("train","test"), col=c("blue", "green"), lty=c(1,1)) # Plot the model accuracy plot(history$metrics$acc, main="Model Accuracy", xlab = "epoch", ylab="accuracy", col="blue", type="l")lines(history$metrics$val_acc, col="green")legend("bottomright", c("train","test"), col=c("blue", "green"), lty=c(1,1))

### Hidden Units

Also try out the effect of adding more hidden units to your model’s architecture and study the impact on the evaluation, just like this:

# Initialize a sequential model model <- keras_model_sequential() # Add layers to the model model %>% layer_dense(units = 28, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 3, activation = 'softmax') # Compile the model model %>% compile( loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy' ) # Fit the model to the data model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Evaluate the modelscore <- model %>% evaluate(iris.test, iris.testLabels, batch_size = 128) # Print the score print(score)

Note that, in general, this is not always the best optimization because, if you don’t have a ton of data, the overfitting can and will be worse. That’s why you should try to use a small network with small datasets like this one.

Why don’t you try visualizing the effect of the addition of the hidden nodes in your model? Try it out below:

# Initialize the sequential model model <- keras_model_sequential() # Add layers to the model model %>% layer_dense(units = 28, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 3, activation = 'softmax') # Compile the model model %>% compile( loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy' ) # Save the training history in the history variablehistory <- model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Plot the model loss plot(history$metrics$loss, main="Model Loss", xlab = "epoch", ylab="loss", col="blue", type="l")lines(history$metrics$val_loss, col="green")legend("topright", c("train","test"), col=c("blue", "green"), lty=c(1,1)) # Plot the model accuracy plot(history$metrics$acc, main="Model Accuracy", xlab = "epoch", ylab="accuracy", col="blue", type="l")lines(history$metrics$val_acc, col="green")legend("bottomright", c("train","test"), col=c("blue", "green"), lty=c(1,1))

> Run

### Optimization Parameters

Besides adding layers and playing around with the hidden units, you can also try to adjust (some of) the parameters of the optimization algorithm that you give to the `compile()`

function. Up until now, you have always passed a vector with a string, `adam`

, to the `optimizer`

argument.

But that doesn’t always need to be like this!

Also, try out experimenting with other optimization algorithms, like the Stochastic Gradient Descent (SGD). Try, for example, using the `optimizer_sgd()`

function to adjust the learning rate `lr`

. Do you notice an effect?

# Initialize a sequential model model <- keras_model_sequential() # Build up your model by adding layers to it model %>% layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>% layer_dense(units = 3, activation = 'softmax') # Define an optimizer sgd <- optimizer_sgd(lr = 0.01) # Use the optimizer to compile the model model %>% compile(optimizer=sgd, loss='categorical_crossentropy', metrics='accuracy') # Fit the model to the training data model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Evaluate the model score <- model %>% evaluate(iris.test, iris.testLabels, batch_size = 128) # Print the loss and accuracy metrics print(score)

Besides using another optimizer, you can also try using a smaller learning rate to train your network. This is one of the most common fine-tuning techniques; A common practice is to make the initial learning rate 10 times smaller than the one that you used to train the model before.

Let’s visualize the training history one more time to see the effect of this small adjustment:

# Define an optimizer sgd <- optimizer_sgd(lr = 0.01) # Compile the model model %>% compile(optimizer=sgd, loss='categorical_crossentropy', metrics='accuracy') # Fit the model to the training data history <- model %>% fit( iris.training, iris.trainLabels, epochs = 200, batch_size = 5, validation_split = 0.2 ) # Plot the model lossplot(history$metrics$loss, main="Model Loss", xlab = "epoch", ylab="loss", col="blue", type="l")lines(history$metrics$val_loss, col="green")legend("topright", c("train","test"), col=c("blue", "green"), lty=c(1,1)) # Plot the model accuracyplot(history$metrics$acc, main="Model Accuracy", xlab = "epoch", ylab="accuracy", col="blue", type="l")lines(history$metrics$val_acc, col="green")legend("bottomright", c("train","test"), col=c("blue", "green"), lty=c(1,1))

## Saving, Loading or Exporting Your Model

There is one last thing that remains in your journey with the `keras`

package and that is saving or exporting your model so that you can load it back in at another moment.

- Firstly, you can easily make use of the
`save_model_hdf5()`

and`load_model_hdf5()`

functions to save and load your model into your workspace:

```
save_model_hdf5(model, "my_model.h5")
model <- load_model_hdf5("my_model.h5")
```

- Additionally, you can also save and load the model weights with the
`save_model_weights_hdf5()`

and`load_model_weights_hdf5()`

functions:

```
save_model_weights_hdf5("my_model_weights.h5")
model %>% load_model_weights_hdf5("my_model_weights.h5")
```

- Lastly, it’s good to know that you can also export your model configuration to JSON or YAML. Here, the functions
`model_to_json()`

and`model_to_yaml()`

will help you out. To load the configurations back into your workspace, you can just use the`model_from_json()`

and`model_from yaml()`

functions:

```
json_string <- model_to_json(model)
model <- model_from_json(json_string)
yaml_string <- model_to_yaml(model)
model <- model_from_yaml(yaml_string)
```

## One thought on “Deep Learning in R”