Starter SQL question for movie ratings database

I have a database with one table, for example:

UserID (int), MovieID (int), Rating (real)

      

UserIDs and movieIDs are big numbers, but my database only has a sample of many possible values ​​(4000 unique users and 3000 unique movies).

I'm going to do an SVD (singular value decomposition) matrix on it, so I want to return this database as an ordered array. Basically, I want each user to be ok, and for each user, return each movie in order and then return a rating for that user, a couple of movies, or null if that user hasn't rated that movie. Example:

USERID | MOVIEID | RATING
-------------------------
99835   8847874    4
99835   8994385    3
99835   9001934    null
99835   3235524    2
           .
           .
           .
109834  8847874    null
109834  8994385    1
109834  9001934    null

etc

      

This way I can just read these results into a 2D array suitable for my SVD algorithm. (Any other suggestions for getting a database of information into a simple 2D array of floats would be appreciated)

It is important that this is returned so that when I return my 2D array, I can re-display the values ​​for the respective users and movies in order to perform my analysis.

+1


source to share


3 answers


SELECT m.UserID, m.MovieID, r.Rating
    FROM (SELECT a.userid, b.movieid
              FROM (SELECT DISTINCT UserID FROM Ratings) AS a,
                   (SELECT DISTINCT MovieID FROM Ratings) AS b
         ) AS m LEFT OUTER JOIN Ratings AS r
         ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
    ORDER BY m.UserID, m.MovieID;

      

Tested now and seems to work!

The concept is to create a Cartesian product from a list of UserID values ​​in the Ratings table with a list of MovieID values ​​in the Ratings table (ouch!), And then outer join that full matrix using the Ratings table (again) to collect the rating values.

It is NOT .



This can be effective.

You may be able to just run a simple simple data selection and arrange to populate the arrays as data comes in. If you have many thousands of users and movies, you will return many millions of rows, but most of them will be null. You should treat the incoming data as a description of a sparse matrix and first set the matrix in the program to all zeros (or some other default), and then read the stream from the database and set only those rows that were actually present.

This request is mostly trivial:

SELECT UserID, MovieID, Rating
    FROM Ratings
    ORDER BY UserID, MovieID;

      

+6


source


Sometimes it's best to reorganize the table / normalize your data (if that's an option).

Normalize data structure:

User table: (all individual users)
UserId, FirstName, LastName

Movie table: (all different movies)
MovieId, Name

UserMovieRatings: (ratings users have given to movies)
UserId, MovieId, Rating



You can do a Cartesian join if you want every combination of users and movies, then use the UserMovieRatings table as needed.

It is probably best to refactor now, before the system gets more complex. Take this time in advance and I'm sure any requests you need to make will come naturally ... hope this helps ...

Example request:


select UserId, FirstName, LastName, MoveId, Name, cast(null as int) as Rating
into #FinalResults
from Users
cross join Movies

      


update #FinalResults
set Rating = UMR.Rating
from #FinalResults FR
inner join UserMovieRatings UMR
on FR.UserId = UMR.UserId and FR.MovieId = UMR.MovieId

      

+1


source


If I understand your question correctly, you have all the data in your table and you just want to extract it in the correct order. It's right? If so, then it should be simple:

select userid, movieid, rating
from ratings
order by userid, movieid

      

0


source







All Articles