SQL Group By with max - need all fields

darsha
NewLounger
Posts: 19
Joined: 15 Mar 2010, 11:56

SQL Group By with max - need all fields

Post by darsha »

I have a table of data that I need to summarise. Typically, the data contains 1 row for each ID, in some case there are a few rows.

I need all the data fields (50 in total) in my output query. The fields I need to query are listed below:

ID
TYPE (1,2 or 3)
SCORE
DATE
(obviously I don't actually have fields called 'type' and 'date')

I need a single row for each ID, returning ALL FIELDS for the row with the 'maximum score'. Unfortunately it's a bit more complicated than that.
If the records are of the same TYPE I need the one with the most recent DATE.
If the records are different TYPEs I need the MAX SCORE.

For a single ID there could be a records with the same type but different dates, or different types.
Due to data quality issues beyond my control there also appear to be exact duplicate rows!

The dataset has approx 130,000 rows with approx 120,000 unique IDs. Most of the 'duplicate' IDs seem to have 2 records so its a small percentage of the overal dataset.

User avatar
HansV
Administrator
Posts: 78671
Joined: 16 Jan 2010, 00:14
Status: Microsoft MVP
Location: Wageningen, The Netherlands

Re: SQL Group By with max - need all fields

Post by HansV »

There may be a more efficient way, but I'd do it in several steps:
1) A query qryUniqueTypes that returns the unique (non-repeated) TYPE values for each ID.
2) A query qryUniqueTypeCount that counts the number of unique TYPE values for each ID.
3) A query qryMaxDate based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is 1. Group on ID and return the Max of DATE.
4) A query qryAllMaxDate based on the table and on qryMaxDate joined on ID vs ID and DATE vs MaxDATE. Return all fields from the table.
5) A query qryMaxScore based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is greater than 1. Group on ID and return the Max of SCORE.
6) A query qryAllMaxScore based on the table and on qryMaxScore joined on ID vs ID and SCORE vs MaxSCORE. Return all fields from the table.
7) A union query qryAllMax that combines qryAllMaxDate and qryAllMaxScore.

This will be slow with a large table. It might help to isolate the IDs that have duplicates first.

See the attached sample database.
GroupBy.zip
You do not have the required permissions to view the files attached to this post.
Best wishes,
Hans