Conditional Filter For Aggregates In Oracle Database 26ai

BACKGROUND

Consider this problem from LeetCode: leetcode_db_1158, about figuring out for each user, the number of orders they made in 2019.

The code below can create the sample data set needed for the question.

Create table If Not Exists Users (user_id NUMBER, join_date date, favorite_brand varchar2(10)) ;
Create table If Not Exists Orders (order_id NUMBER, order_date date, item_id NUMBER, buyer_id NUMBER, seller_id NUMBER);
Create table If Not Exists Items (item_id NUMBER, item_brand varchar2(10));
Truncate table Users;
insert into Users (user_id, join_date, favorite_brand) values (1, date'2018-01-01', 'Lenovo');
insert into Users (user_id, join_date, favorite_brand) values (2, date'2018-02-09', 'Samsung');
insert into Users (user_id, join_date, favorite_brand) values (3, date'2018-01-19', 'LG');
insert into Users (user_id, join_date, favorite_brand) values (4, date'2018-05-21', 'HP');
Truncate table Orders;
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (1, date'2019-08-01', 4, 1, 2);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (2, date'2018-08-02', 2, 1, 3);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (3, date'2019-08-03', 3, 2, 3);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (4, date'2018-08-04', 1, 4, 2);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (5, date'2018-08-04', 1, 3, 4);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (6, date'2019-08-05', 2, 2, 4);
Truncate table Items;
insert into Items (item_id, item_brand) values (1, 'Samsung');
insert into Items (item_id, item_brand) values (2, 'Lenovo');
insert into Items (item_id, item_brand) values (3, 'LG');
insert into Items (item_id, item_brand) values (4, 'HP');

SQL> SELECT * FROM ITEMS;

ITEM_ID ITEM_BRAND 
_______ __________ 
      1 Samsung    
      2 Lenovo     
      3 LG         
      4 HP         

4 rows selected. 

SQL> 
SQL> 
SQL> 
SQL> SELECT * FROM ORDERS;

ORDER_ID ORDER_DATE ITEM_ID BUYER_ID SELLER_ID 
________ __________ _______ ________ _________ 
       1 01-AUG-19        4        1         2 
       2 02-AUG-18        2        1         3 
       3 03-AUG-19        3        2         3 
       4 04-AUG-18        1        4         2 
       5 04-AUG-18        1        3         4 
       6 05-AUG-19        2        2         4 

6 rows selected. 

SQL> 
SQL> 
SQL> SELECT * FROM USERS;

USER_ID JOIN_DATE FAVORITE_BRAND 
_______ _________ ______________ 
      1 01-JAN-18 Lenovo         
      2 09-FEB-18 Samsung        
      3 19-JAN-18 LG             
      4 21-MAY-18 HP             

4 rows selected. 

Here’s an Oracle SQL solution to this problem

SELECT 
         u.user_id                          buyer_id
        ,to_char(u.join_date, 'yyyy-mm-dd') join_date
        ,count(o.order_id)                  orders_in_2019
FROM
       users u
LEFT JOIN (SELECT 
                 order_id
                ,buyer_id 
                ,order_date 
           FROM orders 
           WHERE to_char(order_date, 'yyyy') = '2019'  
          ) o  ON u.user_id = o.buyer_id
GROUP BY 
         u.user_id
        ,u.join_date
;
BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
       1 2018-01-01              1 
       2 2018-02-09              2 
       4 2018-05-21              0 
       3 2018-01-19              0 

4 rows selected. 

Notice the left join to a subquery highlighted above!

This is done to join the users table solely to a subset of the orders table for orders completed in the year 2019. This is the solution I arrived at, at my first shot, without thinking about it way too much.

The code below can also be used.

SELECT 
         u.user_id                          buyer_id
        ,to_char(u.join_date, 'yyyy-mm-dd') join_date
        ,count( case when 
                  to_char(o.order_date, 'yyyy') = '2019' then o.order_id else null end
             )       orders_in_2019
FROM
       users u
LEFT JOIN orders o  ON u.user_id = o.buyer_id
GROUP BY 
         u.user_id
        ,u.join_date
;
BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
       1 2018-01-01              1 
       2 2018-02-09              2 
       4 2018-05-21              0 
       3 2018-01-19              0 

4 rows selected. 

The case statement highlighted works to the same effect! Explained above (link to text)

NB: a query like ;

SELECT 
         u.user_id                          buyer_id
        ,to_char(u.join_date, 'yyyy-mm-dd') join_date
        ,count(o.order_id)orders_in_2019
FROM
       users u
LEFT JOIN orders o  ON u.user_id = o.buyer_id
WHERE to_char(o.order_date, 'yyyy') = '2019'
GROUP BY 
         u.user_id
        ,u.join_date
;


BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
       1 2018-01-01              1 
       2 2018-02-09              2 

2 rows selected. 

Fails (on LeetCode at least) as the where clause does not include any row for users 3 and 4, as they have no orders in 2019. The aim is to include all users. So we want to join to a table with only 2019 orders.

In Oracle 26ai (23.26.1.0.0), there’s an easier way I would have written up this solution using the filter clause. This post explains this feature and shows an example of using it.

NB: This feature is available from Oracle Database 23.26.1.0.0. If you try it on 23.26.0.0.0, it will fail.

INTRODUCTION

Aggregation filters allow developers to restrict the rows considered in aggregate functions such as SUM, COUNT, and AVG. This provides finer control over the data being aggregated without requiring additional subqueries.

Syntax:

aggregate_function ( aggregate_function_arguments ) [ FILTER ( WHERE condition )]
  • aggregate_function: Any supported aggregate function (For example, SUM, COUNT, AVG, MAX, MIN).
  • aggregate_function_arguments: The column or expression to be aggregated.
  • condition: A Boolean predicate that determines which rows are included in the aggregation

USAGE

Example

Consider the example above, this query can be rewritten in Oracle 26ai (23.26.1.0.0)as

SELECT 
         u.user_id                          buyer_id
        ,to_char(u.join_date, 'yyyy-mm-dd') join_date
        ,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') = '2019') orders_in_2019
FROM
       users u
LEFT JOIN orders o  ON u.user_id = o.buyer_id
GROUP BY 
         u.user_id
        ,u.join_date
;

BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
       1 2018-01-01              1 
       2 2018-02-09              2 
       4 2018-05-21              0 
       3 2018-01-19              0 

4 rows selected.

This gives a simple way to compute conditional totals as it does not require writing a case statement or using a subquery, as in the earlier code examples used to arrive at the same solution.

Performance Comparison

I wanted to compare the performance of these queries and see which one produced a better plan/ executed at a lower cost to the database. In order to test this on my machine, I decided to add a lot more rows to the tables.

TRUNCATE TABLE Orders;
TRUNCATE TABLE Users;
TRUNCATE TABLE Items;

-- Populate 10,000 Users
INSERT INTO Users (user_id, join_date, favorite_brand)
SELECT 
    LEVEL, 
    DATE '2017-01-01' + DBMS_RANDOM.VALUE(0, 1000), 
    CASE MOD(LEVEL, 4) 
        WHEN 0 THEN 'Lenovo' WHEN 1 THEN 'Samsung' 
        WHEN 2 THEN 'LG' ELSE 'HP' END
FROM DUAL 
CONNECT BY LEVEL <= 10000;

-- Populate Items 
INSERT INTO Items (item_id, item_brand) VALUES (1, 'Samsung');
INSERT INTO Items (item_id, item_brand) VALUES (2, 'Lenovo');
INSERT INTO Items (item_id, item_brand) VALUES (3, 'LG');
INSERT INTO Items (item_id, item_brand) VALUES (4, 'HP');

-- Populate 100,000 Orders
INSERT INTO Orders (order_id, order_date, item_id, buyer_id, seller_id)
SELECT 
    LEVEL,
    DATE '2018-01-01' + DBMS_RANDOM.VALUE(0, 730), -- Dates spanning 2018-2019
    TRUNC(DBMS_RANDOM.VALUE(1, 5)),                -- Random item_id 1-4
    TRUNC(DBMS_RANDOM.VALUE(1, 10001)),            -- Random buyer_id 1-10000
    TRUNC(DBMS_RANDOM.VALUE(1, 10001))             -- Random seller_id 1-10000
FROM DUAL 
CONNECT BY LEVEL <= 100000;

COMMIT;

Note that the performance may vary based on your Operating system. I ran this on a virtual box running on my laptop, and I have not added any indexes to the tables!

Below are the results of running the queries ;

SQL> get code_1
  1  SET FEEDBACK ON SQL_ID
  2  SET TERMOUT OFF
  3  SELECT 
  4           u.user_id                          buyer_id
  5          ,to_char(u.join_date, 'yyyy-mm-dd') join_date
  6          ,count(o.order_id)                  orders_in_2019
  7  FROM
  8         users u
  9  LEFT JOIN (SELECT 
 10                   order_id
 11                  ,buyer_id 
 12                  ,order_date 
 13             FROM orders 
 14             WHERE to_char(order_date, 'yyyy') = '2019'  
 15            ) o  ON u.user_id = o.buyer_id
 16  GROUP BY 
 17           u.user_id
 18*         ,u.join_date

BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
    7024 2019-01-07              4 
    1018 2018-09-10             11 
    3552 2017-06-20              8 
    7674 2017-04-30              5 
    6370 2017-06-11              5 
    7979 2018-06-14              8 
    6248 2018-04-09              6 
    8808 2017-05-24              6 
    8576 2019-03-14              6 
    9741 2018-04-21              8 
    1195 2019-07-08              6 
    5292 2019-07-04              5 
    1250 2019-04-01             10 
    9241 2017-09-27              5 
...
    1675 2017-01-01              0 
    5462 2018-09-04              0 
    6335 2018-10-05              0 
    7747 2018-03-30              0 
    8929 2018-10-31              0 
    3840 2017-09-01              0 
    9051 2019-05-17              0 
    7138 2019-01-27              0 
    5628 2017-12-16              0 

10,000 rows selected. 

SQL_ID: 7j1ynyy53w1xz


SQL> get code_2
  1  SET FEEDBACK ON SQL_ID
  2  SET TERMOUT OFF
  3  
  4  SELECT 
  5           u.user_id                          buyer_id
  6          ,to_char(u.join_date, 'yyyy-mm-dd') join_date
  7          ,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') = '2019') orders_in_2019
  8  FROM
  9         users u
 10  LEFT JOIN orders o  ON u.user_id = o.buyer_id
 11  GROUP BY 
 12           u.user_id
 13*         ,u.join_date
SQL> set pause on
SQL> /


BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
    8254 2019-01-10              2 
    7024 2019-01-07              4 
    1018 2018-09-10             11 
    3552 2017-06-20              8 
    7674 2017-04-30              5 
    3027 2017-12-06              6 
    6670 2018-05-06              6 
    6370 2017-06-11              5 
    4119 2019-01-12              5 
    4024 2019-05-02              8 
...
    5540 2018-11-12              5 
    3887 2019-02-07              2 
    9051 2019-05-17              0 
    7407 2018-01-20              3 
    4180 2019-05-22              3 
    6156 2017-04-13              2 

10,000 rows selected. 

SQL_ID: duqdzm2rdzydx

SQL> 
SQL> get code_3
  1  SET FEEDBACK ON SQL_ID
  2  SET TERMOUT OFF
  3  SELECT 
  4           u.user_id                          buyer_id
  5          ,to_char(u.join_date, 'yyyy-mm-dd') join_date
  6          ,count( case when 
  7                    to_char(o.order_date, 'yyyy') = '2019' then o.order_id else null end
  8               )       orders_in_2019
  9  FROM
 10         users u
 11  LEFT JOIN orders o  ON u.user_id = o.buyer_id
 12  GROUP BY 
 13           u.user_id
 14*         ,u.join_date
BUYER_ID JOIN_DATE  ORDERS_IN_2019 
________ __________ ______________ 
    2419 2017-10-04              5 
    5501 2017-12-22              3 
    8777 2018-03-25              2 
    7163 2017-10-17              4 
    7508 2018-11-29              1 
    3852 2019-03-23              4 
    5540 2018-11-12              5 
    3887 2019-02-07              2 
    9051 2019-05-17              0 
    7407 2018-01-20              3 
    4180 2019-05-22              3 
    6156 2017-04-13              2 
...
10,000 rows selected. 

SQL_ID: ahjt2c1na9bs3
SQL> 
SQL> 

I then compared the plans used:

plan 1 (Subquery)

SQL> SELECT * FROM dbms_xplan.display_cursor('7j1ynyy53w1xz');


PLAN_TABLE_OUTPUT                                                              
______________________________________________________________________________ 
SQL_ID  7j1ynyy53w1xz, child number 0                                          
-------------------------------------                                          
SELECT           u.user_id                          buyer_id                   
,to_char(u.join_date, 'yyyy-mm-dd') join_date                                  
,count(o.order_id)                  orders_in_2019 FROM        users u         
LEFT JOIN (SELECT                   order_id                 ,buyer_id         
                ,order_date             FROM orders             WHERE          
to_char(order_date, 'yyyy') = '2019'             ) o  ON u.user_id =           
o.buyer_id GROUP BY           u.user_id         ,u.join_date                   
                                                                               
Plan hash value: 691365108                                                     
                                                                               
------------------------------------------------------------------------------ 
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     | 
------------------------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |        |       |       |   150 (100)|          | 
|   1 |  HASH GROUP BY      |        |    23 |  1311 |   150   (2)| 00:00:01 | 
|*  2 |   HASH JOIN OUTER   |        |    23 |  1311 |   149   (2)| 00:00:01 | 
|   3 |    TABLE ACCESS FULL| USERS  |     4 |    88 |    11   (0)| 00:00:01 | 
|*  4 |    TABLE ACCESS FULL| ORDERS | 55292 |  1889K|   138   (2)| 00:00:01 | 
------------------------------------------------------------------------------ 
                                                                               
Predicate Information (identified by operation id):                            
---------------------------------------------------                            
                                                                               
   2 - access("U"."USER_ID"="BUYER_ID")                                        
   4 - filter(TO_CHAR(INTERNAL_FUNCTION("ORDER_DATE"),'yyyy')='2019')          
                                                                               
Note                                                                           
-----                                                                          
   - dynamic statistics used: dynamic sampling (level=2)                       
                                                                               

32 rows selected. 

SQL_ID: 4p9dw7njt8wam

PLAN 2 (FILTER)

SQL> SELECT * FROM dbms_xplan.display_cursor('duqdzm2rdzydx');


PLAN_TABLE_OUTPUT                                                              
______________________________________________________________________________ 
SQL_ID  duqdzm2rdzydx, child number 0                                          
-------------------------------------                                          
SELECT           u.user_id                          buyer_id                   
,to_char(u.join_date, 'yyyy-mm-dd') join_date                                  
,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') =               
'2019') orders_in_2019 FROM        users u LEFT JOIN orders o  ON              
u.user_id = o.buyer_id GROUP BY           u.user_id         ,u.join_date       
                                                                               
Plan hash value: 691365108                                                     
                                                                               
------------------------------------------------------------------------------ 
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     | 
------------------------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |        |       |       |   151 (100)|          | 
|   1 |  HASH GROUP BY      |        |   111K|  6183K|   151   (3)| 00:00:01 | 
|*  2 |   HASH JOIN OUTER   |        |   111K|  6183K|   148   (1)| 00:00:01 | 
|   3 |    TABLE ACCESS FULL| USERS  | 10000 |   214K|    11   (0)| 00:00:01 | 
|   4 |    TABLE ACCESS FULL| ORDERS |   110K|  3782K|   137   (1)| 00:00:01 | 
------------------------------------------------------------------------------ 
                                                                               
Predicate Information (identified by operation id):                            
---------------------------------------------------                            
                                                                               
   2 - access("U"."USER_ID"="O"."BUYER_ID")                                    
                                                                               
Note                                                                           
-----                                                                          
   - dynamic statistics used: dynamic sampling (level=2)                       
                                                                               

29 rows selected. 

PLAN 3 (CASE)

SQL> SELECT * FROM dbms_xplan.display_cursor('ahjt2c1na9bs3');

PLAN_TABLE_OUTPUT                                                              
______________________________________________________________________________ 
SQL_ID  ahjt2c1na9bs3, child number 0                                          
-------------------------------------                                          
SELECT           u.user_id                          buyer_id                   
,to_char(u.join_date, 'yyyy-mm-dd') join_date         ,count( case when        
                   to_char(o.order_date, 'yyyy') = '2019' then                 
o.order_id else null end              )       orders_in_2019 FROM              
 users u LEFT JOIN orders o  ON u.user_id = o.buyer_id GROUP BY                
  u.user_id         ,u.join_date                                               
                                                                               
Plan hash value: 691365108                                                     
                                                                               
------------------------------------------------------------------------------ 
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     | 
------------------------------------------------------------------------------ 
|   0 | SELECT STATEMENT    |        |       |       |   151 (100)|          | 
|   1 |  HASH GROUP BY      |        |   111K|  6183K|   151   (3)| 00:00:01 | 
|*  2 |   HASH JOIN OUTER   |        |   111K|  6183K|   148   (1)| 00:00:01 | 
|   3 |    TABLE ACCESS FULL| USERS  | 10000 |   214K|    11   (0)| 00:00:01 | 
|   4 |    TABLE ACCESS FULL| ORDERS |   110K|  3782K|   137   (1)| 00:00:01 | 
------------------------------------------------------------------------------ 
                                                                               
Predicate Information (identified by operation id):                            
---------------------------------------------------                            
                                                                               
   2 - access("U"."USER_ID"="O"."BUYER_ID")                                    
                                                                               
Note                                                                           
-----                                                                          
   - dynamic statistics used: dynamic sampling (level=2)                       
                                                                               

30 rows selected. 

SQL_ID: 10ay7p3rq6m7g

Results

For this specific scenario, the Subquery actually performs best as the subquery filters rows before joining, thus processing about 55292 from the orders table rows as compared to 110K rows as the other 2 plans. I’m sure there are lots of other scenarios where the Filter clause would result in a better plan.

Can be used in window functions too!

Remember that in Oracle Database 26ai, the qualify clause can be used to filter the results of window functions. The filter clause can be used to get conditional running totals or moving averages right inside the window prior to filtering on the results.

For example

Display all users, and the number of “lgs” ordered where the number is more than 2500

SQL> SELECT 
            u.user_id
            ,count(o.order_id) FILTER (WHERE i.item_id = 3) OVER ( order by o.order_date ) orders
    FROM  
          items i 
    INNER JOIN orders o ON i.item_id = o.item_id
    INNER JOIN users u ON  o.order_id = u.user_id
    QUALIFY  orders > 2500
  ;

USER_ID ORDERS 
_______ ______ 
   7350   2501 
   9993   2501 
   4549   2502 
   4283   2502 
   4332   2503 
   6873   2503 
    506   2503 
   7722   2503 
   3232   2503 
   9795   2504 
   6417   2504 
   3423   2504 
   2930   2505 
    808   2505 
   8277   2506 
   1050   2506 
   7457   2506 
92 rows selected. 

Not the greatest logical example, but focus on the syntax, will ya!

CONCLUSION

The introduction of the FILTER clause marks a significant step forward in making Oracle SQL more readable, standard-compliant, and expressive. By moving conditional logic directly into your aggregate functions, you can ditch the cluttered CASE statements of the past in favor of a cleaner, more intuitive syntax. Whether you are building complex financial reports or performing high-speed analytics in an AI-driven environment, this feature ensures your queries remain as streamlined as your data.

References:

oracle docs
Oracle Blogs

Leave a Comment

Your email address will not be published. Required fields are marked *