Starter SQL question for movie ratings database
I have a database with one table, for example:
UserID (int), MovieID (int), Rating (real)
UserIDs and movieIDs are big numbers, but my database only has a sample of many possible values (4000 unique users and 3000 unique movies).
I'm going to do an SVD (singular value decomposition) matrix on it, so I want to return this database as an ordered array. Basically, I want each user to be ok, and for each user, return each movie in order and then return a rating for that user, a couple of movies, or null if that user hasn't rated that movie. Example:
USERID | MOVIEID | RATING
-------------------------
99835 8847874 4
99835 8994385 3
99835 9001934 null
99835 3235524 2
.
.
.
109834 8847874 null
109834 8994385 1
109834 9001934 null
etc
This way I can just read these results into a 2D array suitable for my SVD algorithm. (Any other suggestions for getting a database of information into a simple 2D array of floats would be appreciated)
It is important that this is returned so that when I return my 2D array, I can re-display the values for the respective users and movies in order to perform my analysis.
source to share
SELECT m.UserID, m.MovieID, r.Rating
FROM (SELECT a.userid, b.movieid
FROM (SELECT DISTINCT UserID FROM Ratings) AS a,
(SELECT DISTINCT MovieID FROM Ratings) AS b
) AS m LEFT OUTER JOIN Ratings AS r
ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
ORDER BY m.UserID, m.MovieID;
Tested now and seems to work!
The concept is to create a Cartesian product from a list of UserID values in the Ratings table with a list of MovieID values in the Ratings table (ouch!), And then outer join that full matrix using the Ratings table (again) to collect the rating values.
It is NOT .
This can be effective.
You may be able to just run a simple simple data selection and arrange to populate the arrays as data comes in. If you have many thousands of users and movies, you will return many millions of rows, but most of them will be null. You should treat the incoming data as a description of a sparse matrix and first set the matrix in the program to all zeros (or some other default), and then read the stream from the database and set only those rows that were actually present.
This request is mostly trivial:
SELECT UserID, MovieID, Rating
FROM Ratings
ORDER BY UserID, MovieID;
source to share
Sometimes it's best to reorganize the table / normalize your data (if that's an option).
Normalize data structure:
User table: (all individual users)
UserId, FirstName, LastName
Movie table: (all different movies)
MovieId, Name
UserMovieRatings: (ratings users have given to movies)
UserId, MovieId, Rating
You can do a Cartesian join if you want every combination of users and movies, then use the UserMovieRatings table as needed.
It is probably best to refactor now, before the system gets more complex. Take this time in advance and I'm sure any requests you need to make will come naturally ... hope this helps ...
Example request:
select UserId, FirstName, LastName, MoveId, Name, cast(null as int) as Rating
into #FinalResults
from Users
cross join Movies
update #FinalResults
set Rating = UMR.Rating
from #FinalResults FR
inner join UserMovieRatings UMR
on FR.UserId = UMR.UserId and FR.MovieId = UMR.MovieId
source to share