Need help with MySQL grouping

But then you would count some sets of points more than once. You can't let them overlap.

There is no overlap. There is only experience, knowledge, and understanding. And reading comprehension, that's the most important thing.
 
Thanks guys for giving input on this. I had some time to sleep on it last night and think I've finally managed to get what I need.

The example in the OP wasn't an exact illustration of what my data looked like and while it is what I wanted to achieve essentially, I was able to work around it by knowing the source data.

Basically, every users will only accumulate these points maybe once every few weeks, and they would be in batches of between 2 - 10 entries per user.

So, what I would have is something like this:

Code:
USERNAME POINTS DATESTAMP  
USER1    50     2014-01-16 23:59:03
USER1    22     2014-01-17 00:01:04
USER1    16     2014-01-17 00:19:17
USER2    70     2014-01-10 23:59:03
USER2    82     2014-01-11 00:01:04
USER2    56     2014-01-11 00:19:17
USER3    79     2014-01-01 23:59:03
USER3    89     2014-01-02 00:01:04
USER3    59     2014-01-02 00:19:17

Unlike in the example I wouldn't ever have one user span over more than one 24 hour period (more like 1 hour though - which is why I initially wanted the 1 hour period).

Essentially what I've done is create a counter and create a unique user_grouping field for each batch of user entries.

Code:
set @a = 'N/A';
set @b = 0;

SELECT t1.username, SUM(t1.points) as total_points
		COUNT(t1.username) AS user_count
	FROM 
    (
		SELECT IF(@a != username, @b := @b + 1, @b := @b) AS user_counter,
			username, points, datestamp
            @a := username username_2,
			CONCAT(@a, CAST(@b AS CHAR(20))) AS user_grouping
		FROM points_table
		ORDER BY username, datestamp
	) t1
	GROUP BY t1.user_grouping

Now each batch of users have their own unique grouping field which doesn't rely on me using a date.

It's not exactly what I set out to do, but the result is what I was after and it seems I've got it.
 
Thanks guys for giving input on this. I had some time to sleep on it last night and think I've finally managed to get what I need...

Can you please tell me what solution your algorithm gives to the following?

Code:
USERNAME POINTS DATESTAMP  
USER1    50     2014-01-16 23:59:03
USER1    22     2014-01-17 00:01:04
USER1    16     2014-01-17 00:19:17
USER1    55     2014-01-17 01:00:00
 
Can you please tell me what solution your algorithm gives to the following?

Code:
USERNAME POINTS DATESTAMP  
USER1    50     2014-01-16 23:59:03
USER1    22     2014-01-17 00:01:04
USER1    16     2014-01-17 00:19:17
USER1    55     2014-01-17 01:00:00

It would output the sum of all of those values, and that is what I would want.
 
Can you please tell me what solution your algorithm gives to the following?

The OP found a solution by changing the requirements. Wasting time trying to solve the original problem is pointless unless you're doing it out of academic curiosity.
 
The OP found a solution by changing the requirements. Wasting time trying to solve the original problem is pointless unless you're doing it out of academic curiosity.

Yep, that's about it.

An answer to the original question would still work for my dataset but after realising it was practically impossible to do I had to make another plan.
 
Yep, that's about it.

An answer to the original question would still work for my dataset but after realising it was practically impossible to do I had to make another plan.

Meh, alright then.
 
But 23:59 and 01:00 are > 1 hour apart?

It's difficult to explain when it comes to the source data, it essentially wouldn't matter if they were more than an hour apart. As I mentioned, I know exactly how the source data is being written so I worked with that.
 
:D

Thanks for all the help though. If it wasn't for you saying it was very difficult I might still have been trying to get it right doing it like I originally wanted :D

The book Computers and Intractability: A Guide to the Theory of NP-Completeness has a very nice introduction. It's essentially about a person who has been given a problem to solve, but he just cannot seem to come up with a feasible solution:

gj1.gif

gj2.gif

gj3.gif


So it's useful to know when you can't actually find an efficient solution (although I still don't think (educated guess) your problem is NP-complete). ;)
 
While working with the current version of my algorithm I started thinking I could actually make a few changes which would answer my original question.

As long as the sorting is correct, it looks like the following query would give me my "period" grouping. It would use the first date record available for each user as the starting time.

Here is an example of grouping by 30 second intervals, just changing the 30 to 3600 would give us one hour intervals.

Code:
set @a = ''; # Previous username variable
set @c = ''; # Previous timestamp variable
set @d = ''; # Grouping timestamp variable
set @e = 0; # Timestamp counter variable

SELECT 
    IF(@a != username, @e := 0, @e := @e + IFNULL((ABS(TIMESTAMPDIFF(SECOND, timestamp, @c))), 0)) AS time_counter,
    IF((ABS(TIMESTAMPDIFF(SECOND, timestamp, @c)) > 30 OR @c = '') OR 
		(@a != username) OR
        (@e > 30), @d := timestamp, @d := @d) as timestamp_groups,
    IF(@a != username OR @e > 30, @e := 0, @e := @e) AS counter_reset,
    @c as timestamp_previous,
	username, timestamp, points,
	@a := username username_calc, @c := timestamp timestamp_calc,
	CONCAT(@a, CAST(@d AS CHAR(20))) AS user_grouping
FROM points_table
ORDER BY username, timestamp

Each 30 second range will now have a unique grouping identifier, so you can use this as a subquery and then group by the user_grouping column.

Haven't tested it extensively but seems to provide the correct results.

EDIT: Also seems to run quickly enough:

Code:
59182 row(s) returned	0.371 sec / 2.480 sec
 
Last edited:
Top
Sign up to the MyBroadband newsletter
X