I have a table of data that I need to summarise. Typically, the data contains 1 row for each ID, in some case there are a few rows.
I need all the data fields (50 in total) in my output query. The fields I need to query are listed below:
ID
TYPE (1,2 or 3)
SCORE
DATE
(obviously I don't actually have fields called 'type' and 'date')
I need a single row for each ID, returning ALL FIELDS for the row with the 'maximum score'. Unfortunately it's a bit more complicated than that.
If the records are of the same TYPE I need the one with the most recent DATE.
If the records are different TYPEs I need the MAX SCORE.
For a single ID there could be a records with the same type but different dates, or different types.
Due to data quality issues beyond my control there also appear to be exact duplicate rows!
The dataset has approx 130,000 rows with approx 120,000 unique IDs. Most of the 'duplicate' IDs seem to have 2 records so its a small percentage of the overal dataset.
SQL Group By with max - need all fields
-
- Administrator
- Posts: 78671
- Joined: 16 Jan 2010, 00:14
- Status: Microsoft MVP
- Location: Wageningen, The Netherlands
Re: SQL Group By with max - need all fields
There may be a more efficient way, but I'd do it in several steps:
1) A query qryUniqueTypes that returns the unique (non-repeated) TYPE values for each ID.
2) A query qryUniqueTypeCount that counts the number of unique TYPE values for each ID.
3) A query qryMaxDate based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is 1. Group on ID and return the Max of DATE.
4) A query qryAllMaxDate based on the table and on qryMaxDate joined on ID vs ID and DATE vs MaxDATE. Return all fields from the table.
5) A query qryMaxScore based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is greater than 1. Group on ID and return the Max of SCORE.
6) A query qryAllMaxScore based on the table and on qryMaxScore joined on ID vs ID and SCORE vs MaxSCORE. Return all fields from the table.
7) A union query qryAllMax that combines qryAllMaxDate and qryAllMaxScore.
This will be slow with a large table. It might help to isolate the IDs that have duplicates first.
See the attached sample database.
1) A query qryUniqueTypes that returns the unique (non-repeated) TYPE values for each ID.
2) A query qryUniqueTypeCount that counts the number of unique TYPE values for each ID.
3) A query qryMaxDate based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is 1. Group on ID and return the Max of DATE.
4) A query qryAllMaxDate based on the table and on qryMaxDate joined on ID vs ID and DATE vs MaxDATE. Return all fields from the table.
5) A query qryMaxScore based on the table and qryUniqueTypeCount joined on ID that returns the records where the type count is greater than 1. Group on ID and return the Max of SCORE.
6) A query qryAllMaxScore based on the table and on qryMaxScore joined on ID vs ID and SCORE vs MaxSCORE. Return all fields from the table.
7) A union query qryAllMax that combines qryAllMaxDate and qryAllMaxScore.
This will be slow with a large table. It might help to isolate the IDs that have duplicates first.
See the attached sample database.
You do not have the required permissions to view the files attached to this post.
Best wishes,
Hans
Hans