R Lesson 4 – Data Preprocessing

Today we will be looking at how to go about data pre-processing in R. This can be done by reading in some arbitrary data. You can find data sets using this link : https://www.openml.org/

We want to begin by reading the data in from the file, which can be done by using this function:

Once the data is read in, it’s possible to do a lot with it. We’ll start with PCA, which essentially reduces dimensionality by combining attributes to form principle components. These principle components then represent a percentage of the data.

To do this in R, we use the following code:

Now we’ll look at scaling the data, which is also known as normalizing the data. This is a process of converting attribute values into new values within given bounds, usually [0,1] or [-1,1]. This can be done in R as follows:

Regularization is intended to solve overfitting problems in machine learning through a loss function with two main types; L1 and L2.

Next time

That’ll cover it for this one, next time we’ll look at visualising data in R.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Design a site like this with WordPress.com
Get started