I came across this blog posting about optimizing order by rand() and decided to make my queries better. Here is my real life example on how to optimize this query:
Normally, when you click the random link on jemdiary (jemdiary.com/read/random), you are retrieving a user’s username to view their diary. This user has to have their account flagged to allow random reads, and not be globally password protected. Finally, they need to be an active user. The settings are in the Tsettings table, the users are in the Tuser table. See my current sql statement:
1 2 3 4 5 6 7 | select u.userID, u.userName from Tuser u inner join TuserSettings s on u.userID=s.userID where s.allowRandom=1 and s.globalPasswordProtect=0 and u.status='A' order by rand() limit 1 |
This, unfortunately is going to be slow because of the order by portion of the query. Using what I learned in that blog post, I was able to modify the sql to take advantage of their speedy query. Its important to know that I have only deleted a few rows from the user-database. About 3% are missing. The rest have just been inactivated (eh… design changes as time goes on ya know
). So, I didn’t implement the balancing algorithm. Additionally, I added a few extra items in for testing, so I could make sure that my users were actually validating correctly. See this sql:
1 2 3 4 5 6 7 8 9 10 11 12 | select u.userID, u.userName, s.globalPasswordProtect, s.allowRandom from Tuser u JOIN (select (rand() * (select max(userID) from Tuser)) as id) AS r2 inner join TuserSettings s on u.userID=s.userID where u.userID >= r2.id and s.allowRandom=1 and s.globalPasswordProtect=0 and u.status='A' order by u.userID ASC limit 1 |
This was a nice fast way to do it. (The production version of this only selects u.userID and u.userName).

Nice solution but it will only work for selecting one random result from the table, or am i wrong.
Shore it will give a different sub table each time but it’s only the first ‘id’ that will be random since you do “where u.userID >= r2.id” this will give a sub set of rows starting from userID greather then the random number.
In your case you Limit the result to 1 “one” row, but what if you need more like five or ten randomly selected rows?
Just wanted to give my feedback and if you or anyone else has a tip or solution in store for selecting a greater random generated sub set I’d love to here about it.
@Tord: You are correct – it will only choose one random row. I’ve never really chosen more than one random row at a time. I think what I would do in that case is to write a stored procedure and execute that. True, it has to be crafted (its easy to just say “write one”) – but thats the direction I would start to look at.