
2. Prepare the Data

Transcript

In this video, you will build two sets of data based on the information you gathered from your classmates’ responses to your survey.

Machine learning typically works with two data sets: a training data set and a test data set.

The “training data set” is used to develop a model for your recommendation system.

The “test data set” is used to compare the predictions of your model with the actual results.

Machine learning enables the same type of recommendation process.

The computers are first trained on one set of data, which they use to develop a model.

Then, they test the model on another data set to check its accuracy and possibly make changes to improve the model.
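
The train-then-test idea can be sketched in a few lines of Python. This is only an illustration, not the model built in this lesson: the ratings are made-up numbers, and the "model" is assumed to be nothing more than the average of the training ratings. The names `train` and `mean_absolute_error` are hypothetical helpers.

```python
# Made-up survey ratings (1 to 5) for a single item; not real lesson data.
training_ratings = [4, 5, 3, 4]
test_ratings = [5, 3]


def train(ratings):
    """Build a very simple 'model': predict the average training rating."""
    return sum(ratings) / len(ratings)


def mean_absolute_error(prediction, actual_ratings):
    """Average distance between the prediction and each actual rating."""
    total = sum(abs(prediction - rating) for rating in actual_ratings)
    return total / len(actual_ratings)


# Step 1: develop the model using only the training data.
prediction = train(training_ratings)

# Step 2: check the model against the test data it has not seen.
error = mean_absolute_error(prediction, test_ratings)

print("Predicted rating:", prediction)
print("Average error on test data:", error)
```

A smaller error means the predictions are closer to what the test users actually answered. A large error would suggest the model needs to be changed.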

To begin, create a label for the two data sets you will be working with.

You do not need the date and time information included in the first column of your spreadsheet.

Instead, use this column to label the data as either a training set or a test set.

Label the column “data set.”

Then, select the first cell of the collected responses.

Instead of the date and time information, type “Training Data.”

Drag the cell handle down to copy the text until the first half of your data is labeled “Training Data.”

Next, do the same for the remaining half, labeling it “Test Data.”
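
For comparison, the same half-and-half labeling could be sketched in Python. The function name `label_data_sets` is hypothetical, and the sketch assumes that when the number of responses is odd, the extra row goes to the test data; the video does not say how to handle that case.

```python
def label_data_sets(row_count):
    """Return one label per response row: first half training, rest test."""
    half = row_count // 2  # whole-number division; odd counts round down
    training_labels = ["Training Data"] * half
    test_labels = ["Test Data"] * (row_count - half)
    return training_labels + test_labels


# Example: six survey responses give three rows of each label.
for label in label_data_sets(6):
    print(label)
```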

Now, insert another column to the right of Column A.

This column will list the classmates who took your survey as “users.”

This will make it easier to compare responses.

Label the column “users.”

Then, type “user one” in the first data row and “user two” in the second row.

Drag the cell handle to copy the pattern to the rest of the data.
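
Numbering the users is the kind of repeating pattern that is also easy to generate in code. A minimal sketch, assuming digits are used in the labels ("user 1") rather than spelled-out words; `user_labels` is a hypothetical name.

```python
def user_labels(row_count):
    """Return 'user 1', 'user 2', ... with one label per response row."""
    return [f"user {number}" for number in range(1, row_count + 1)]


# Example: labels for four survey responses.
print(user_labels(4))
```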

Next, select the last row of your training data and add a border to the bottom of the row to visually separate the training and test data.

If you’d like, change the style, weight, or color of the border to make it easier to see.

Great job!

Move on to the next video to begin developing your own model that will help you predict how much someone will like an item.

Now, it’s your turn: Label the training and test data sets over the existing date and time information in your spreadsheet.

Insert another column and number the users.

And add a border to visually separate the data sets.

Instructions
  1. Label the training and test data sets over the existing date and time information in your spreadsheet.
  2. Insert another column and number the users.
  3. Add a border to visually separate the data sets.