3. Develop a Model
In the last video, you separated your data into training and test sets.
Now, you will use that data to begin developing a model that will help you predict how much someone will like an item.
Begin by calculating the change in ratings for the first two users and the first two items in your spreadsheet.
You can do this in your head or on a piece of paper.
In this example, user one rated the first item a “four” and the second item a “one.”
The rating decreased by three, so the change is negative three.
User two rated the first item a “one” and the second item a “five.”
The rating went up four points.
The change is positive four.
Now, determine the average change in ratings between the first and second items for user one and user two.
Negative three plus positive four equals one.
Then, divide the sum by the number of users -- two.
One divided by two is one-half.
Record it as a decimal.
That simple calculation is a miniature model.
According to these two data points, to predict a user’s rating of the second item, you would add one-half of a point to the user’s rating of the first item.
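The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the two-user calculation, not part of the spreadsheet. The variable names and the `predict_rating` helper are invented for the sketch, and it assumes the prediction starts from a rating the user has already given.

```python
# Ratings from the example: (item one, item two) for each user.
ratings = [
    (4, 1),  # user one
    (1, 5),  # user two
]

# Change in rating from item one to item two for each user.
changes = [item_two - item_one for item_one, item_two in ratings]

# Average change: the sum of the changes divided by the number of users.
average_change = sum(changes) / len(changes)


def predict_rating(known_rating):
    """Hypothetical helper: add the average change to a rating the user already gave."""
    return known_rating + average_change


print(changes)            # [-3, 4]
print(average_change)     # 0.5
print(predict_rating(3))  # 3.5
```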
Models get better -- and machines get smarter -- with more training data.
However, it’s important to have the right data in the right format.
That can be a challenge for even the most experienced data scientists.
For example, pictures of pandas aren’t very useful if you are training a machine to recognize human faces.
The best data contains information that relates to the outcome you want to predict.
To use the data in your training set to calculate the average difference, you could work out the difference for every user and response in your spreadsheet by hand.
But that would take a long time and you might make mistakes.
Instead, use an array formula.
An “array formula” performs calculations on one or more items in a range of cells.
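As a rough analogy outside the spreadsheet, an array formula behaves like one expression applied to whole columns at once instead of one cell at a time. The sketch below uses plain Python lists to stand in for two ranges of cells. The list names and values are invented for illustration.

```python
# Two "ranges" of cells, represented as lists (values are made up).
column_a = [4, 1, 3]
column_b = [1, 5, 4]

# One expression covers every row, much like a single array formula
# covers every cell in the selected ranges.
differences = [b - a for a, b in zip(column_a, column_b)]

print(differences)  # [-3, 4, 1]
```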
To begin, add space for the array formula.
Insert a row at the top of the spreadsheet, above the column labels.
Label the row “Average Difference from Item one.”
Next, find the difference between the user ratings for the first and second items in your form.
In Google Sheets, formulas always begin with an equals sign.
Then, type “Array.”
Choose “array formula” from the menu.
Find the average of the differences.
Inside the array formula, after the open parenthesis, type “average” and another open parenthesis.
Select the first range of cells to include -- in this case, all of the training data for item two.
Then, type a minus sign and select all of the training data for item one.
Then, close both sets of parentheses to finish the formula.
The array formula takes the user rating for item one and subtracts it from the rating for item two for each user in the training data set.
Then, it calculates the average of those differences.
Your model now incorporates all of the data in the training set, making it more accurate than an estimate based on only two users.
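In Sheets, the finished formula would look something like =ARRAYFORMULA(AVERAGE(C3:C12-B3:B12)). The cell ranges there are assumptions, since they depend on where the training data sits in your own spreadsheet. The same subtract-then-average logic can be sketched in Python, using a small invented training set in place of the real form responses.

```python
# Hypothetical training set: each row is one user's ratings
# for item one and item two (values invented for this sketch).
training_data = [
    {"item_one": 4, "item_two": 1},
    {"item_one": 1, "item_two": 5},
    {"item_one": 3, "item_two": 4},
    {"item_one": 2, "item_two": 4},
]

# Subtract the item one rating from the item two rating for each user,
# then average the differences, mirroring AVERAGE(item two - item one).
differences = [row["item_two"] - row["item_one"] for row in training_data]
average_difference = sum(differences) / len(differences)

print(differences)         # [-3, 4, 1, 2]
print(average_difference)  # 1.0
```

With more rows in the training set, the average reflects more users, which is the point the video makes about training data.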
In the next video, you will continue improving your model.
Now, it’s your turn: Insert a row at the top of your spreadsheet and label it “Average Difference from Item one.”
Then, use an array formula to find the average difference between the user ratings for the first and second items.
- Insert a row at the top of your spreadsheet and label it “Average Difference from Item 1.”
- Use an array formula to find the average difference between the user ratings for the first and second items.