MemSQL MIN/MAX Row Correlation
MemSQL 5 removes a non-standard additional SQL semantics guarantee provided by MemSQL 4.1 and earlier for selecting non-aggregated, non-grouped fields alongside MIN
or MAX
.
When performing a grouping, MemSQL allows you to reference fields not in the group-by list (or functionally dependent on the group-by list), for example in the select list, even though there may be multiple values for such fields within each group. Note that MemSQL accepting these queries is an extension to standard SQL, also supported by MySQL (without the ONLY_FULL_GROUP_BY mode, which we will assume for the rest of this section). MemSQL may return an arbitrary value from any row within the group, the same behavior as MySQL.
MemSQL versions 4.1 and earlier provided an additional guarantee: when selecting a MIN
or MAX
aggregate along with another field not in the group-by list (or functionally dependent on the group-by list), the value returned for the field was guaranteed to correlate with the minimum/maximum value - MemSQL returned the first field value encountered in a row that also contained the minimum / maximum value. In MemSQL 5 and later, this guarantee is removed, and MemSQL may return an arbitrary value from any row within the group.
In MemSQL 5 and later, you may write standard SQL-based queries to get the same results using the ROW_NUMBER() OVER(...)
window function.
Example
We will use this example table:
create table people (name varchar(20), age int, category char(1));
insert into people
(name , age, category) values
('Lauren Ipsum' , 10, 'a'),
('Hugh Rustic' , 34, 'a'),
('N. Veterate Tinker', 29, 'b'),
('Modus P. Ponens' , 14, 'b');
Suppose we want to select the name and age of the oldest person in each category.
This query selects name
, even though there are multiple values. In MySQL and MemSQL 5 and later, this may return any arbitrary value from within the group, such as:
memsql> select max(age) as age, name, category from people group by category;
+------+-----------------+----------+
| age | name | category |
+------+-----------------+----------+
| 34 | Hugh Rustic | a |
| 29 | Modus P. Ponens | b |
+------+-----------------+----------+
In MemSQL 4.1 and earlier, the value of name
was guaranteed to correlate to the max value, so the query returns:
memsql> select max(age) as age, name, category from people group by category;
+------+--------------------+----------+
| age | name | category |
+------+--------------------+----------+
| 34 | Hugh Rustic | a |
| 29 | N. Veterate Tinker | b |
+------+--------------------+----------+
The following standard SQL-based query returns the same desired results:
select age, name, category from (
select *,
row_number() over (partition by category order by age desc) as rownumber
from people) sub
where rownumber = 1;