2. Prepare the Data

Transcript

In this video, you will build two sets of data based on the information you gathered from your classmates’ responses to your survey.

Machine learning typically works with two data sets: a training data set and a test data set.

The “training data set” is used to develop a model for your recommendation system.

The “test data set” is used to compare the actual results with the predictions of your model.

Machine learning enables the same type of recommendation process.

The computers are first trained on one set of data, which they use to develop a model.

Then, they test the model on another data set to check its accuracy and possibly make changes to improve the model.
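
For readers who want to see the same idea outside a spreadsheet, here is a minimal Python sketch of splitting survey responses into a training half and a test half. The function name `split_half` and the sample ratings are hypothetical, made up only for illustration. The sketch assumes the first half of the rows becomes the training data, as in the labeling steps of this video.

```python
def split_half(rows):
    """Split rows into a first half (training) and the remainder (test)."""
    midpoint = len(rows) // 2  # with an odd count, the extra row goes to the test set
    return rows[:midpoint], rows[midpoint:]


# Hypothetical survey responses: one list of ratings per classmate.
responses = [
    [5, 3, 4],
    [2, 4, 1],
    [4, 4, 5],
    [1, 2, 3],
]

training_rows, test_rows = split_half(responses)
print(len(training_rows), "training rows,", len(test_rows), "test rows")
```

The model would be developed from `training_rows` only, and `test_rows` would be held back to check its predictions.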

To begin, create a label for the two data sets you will be working with.

You do not need the date and time information included in the first column of your spreadsheet.

Instead, use this column to label the data as either a training set or a test set.

Label the column “data set.”

Then, select the first cell of the collected responses.

Instead of the date and time information, type “Training Data.”

Drag the cell handle to copy the text to label half of your data as “Training Data.”

Next, do the same for the remaining half, labeling it “Test Data.”
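
The labeling steps above can also be sketched in Python. This is only an illustration: the function name `label_data_sets`, the timestamps, and the ratings are all invented, and the sketch assumes the first column of each row holds the date and time information that is being replaced.

```python
def label_data_sets(rows):
    """Overwrite the first column of each row with a data set label."""
    midpoint = len(rows) // 2
    labeled = []
    for index, row in enumerate(rows):
        # First half of the rows is training data, the rest is test data.
        label = "Training Data" if index < midpoint else "Test Data"
        labeled.append([label] + row[1:])
    return labeled


# Hypothetical rows: a timestamp followed by two ratings.
rows = [
    ["2024-01-01 09:00", 5, 3],
    ["2024-01-01 09:05", 2, 4],
    ["2024-01-01 09:10", 4, 4],
    ["2024-01-01 09:15", 1, 2],
]

labeled = label_data_sets(rows)
for row in labeled:
    print(row)
```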

Now, insert another column to the right of Column A.

This column will list the classmates who took your survey as “users.”

This will make it easier to compare responses.

Label the column “users.”

Then, type “user one” in the first data row and “user two” in the second row.

Drag the cell handle to copy the pattern to the rest of the data.
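
As a rough Python equivalent of inserting the “users” column, the sketch below adds a numbered user label as the second value in every row. The function name `add_user_column` and the sample rows are hypothetical, and the labels use numerals (“user 1”, “user 2”) as an assumption about how the pattern is typed in the spreadsheet.

```python
def add_user_column(rows):
    """Insert a numbered user label as the second column of each row."""
    return [
        [row[0], f"user {number}"] + row[1:]
        for number, row in enumerate(rows, start=1)
    ]


# Hypothetical rows that already carry a data set label in the first column.
rows = [
    ["Training Data", 5, 3],
    ["Training Data", 2, 4],
    ["Test Data", 4, 4],
    ["Test Data", 1, 2],
]

with_users = add_user_column(rows)
for row in with_users:
    print(row)
```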

Next, select the last row of your training data and add a border to the bottom of the row to visually separate the training and test data.

If you’d like, change the style, weight, or color of the border to make it easier to see.

Great job!

Move on to the next video to begin developing your own model that will help you predict how much someone will like an item.

Now, it’s your turn: Label the training and test data sets over the existing date and time information in your spreadsheet.

Insert another column and number the users.

And add a border to visually separate the data sets.

Instructions
  1. Label the training and test data sets over the existing date and time information in your spreadsheet.
  2. Insert another column and number the users.
  3. Add a border to visually separate the data sets.