BACKGROUND
Consider this problem from LeetCode: leetcode_db_1158, about figuring out for each user, the number of orders they made in 2019.
The code below can create the sample data set needed for the question.
Create table If Not Exists Users (user_id NUMBER, join_date date, favorite_brand varchar2(10)) ;
Create table If Not Exists Orders (order_id NUMBER, order_date date, item_id NUMBER, buyer_id NUMBER, seller_id NUMBER);
Create table If Not Exists Items (item_id NUMBER, item_brand varchar2(10));
Truncate table Users;
insert into Users (user_id, join_date, favorite_brand) values (1, date'2018-01-01', 'Lenovo');
insert into Users (user_id, join_date, favorite_brand) values (2, date'2018-02-09', 'Samsung');
insert into Users (user_id, join_date, favorite_brand) values (3, date'2018-01-19', 'LG');
insert into Users (user_id, join_date, favorite_brand) values (4, date'2018-05-21', 'HP');
Truncate table Orders;
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (1, date'2019-08-01', 4, 1, 2);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (2, date'2018-08-02', 2, 1, 3);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (3, date'2019-08-03', 3, 2, 3);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (4, date'2018-08-04', 1, 4, 2);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (5, date'2018-08-04', 1, 3, 4);
insert into Orders (order_id, order_date, item_id, buyer_id, seller_id) values (6, date'2019-08-05', 2, 2, 4);
Truncate table Items;
insert into Items (item_id, item_brand) values (1, 'Samsung');
insert into Items (item_id, item_brand) values (2, 'Lenovo');
insert into Items (item_id, item_brand) values (3, 'LG');
insert into Items (item_id, item_brand) values (4, 'HP');
SQL> SELECT * FROM ITEMS;
ITEM_ID ITEM_BRAND
_______ __________
1 Samsung
2 Lenovo
3 LG
4 HP
4 rows selected.
SQL>
SQL>
SQL>
SQL> SELECT * FROM ORDERS;
ORDER_ID ORDER_DATE ITEM_ID BUYER_ID SELLER_ID
________ __________ _______ ________ _________
1 01-AUG-19 4 1 2
2 02-AUG-18 2 1 3
3 03-AUG-19 3 2 3
4 04-AUG-18 1 4 2
5 04-AUG-18 1 3 4
6 05-AUG-19 2 2 4
6 rows selected.
SQL>
SQL>
SQL> SELECT * FROM USERS;
USER_ID JOIN_DATE FAVORITE_BRAND
_______ _________ ______________
1 01-JAN-18 Lenovo
2 09-FEB-18 Samsung
3 19-JAN-18 LG
4 21-MAY-18 HP
4 rows selected. Here’s an Oracle SQL solution to this problem
SELECT
u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count(o.order_id) orders_in_2019
FROM
users u
LEFT JOIN (SELECT
order_id
,buyer_id
,order_date
FROM orders
WHERE to_char(order_date, 'yyyy') = '2019'
) o ON u.user_id = o.buyer_id
GROUP BY
u.user_id
,u.join_date
;
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
1 2018-01-01 1
2 2018-02-09 2
4 2018-05-21 0
3 2018-01-19 0
4 rows selected. Notice the left join to a subquery highlighted above!
This is done to join the users table solely to a subset of the orders table for orders completed in the year 2019. This is the solution I arrived at, at my first shot, without thinking about it way too much.
The code below can also be used.
SELECT
u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count( case when
to_char(o.order_date, 'yyyy') = '2019' then o.order_id else null end
) orders_in_2019
FROM
users u
LEFT JOIN orders o ON u.user_id = o.buyer_id
GROUP BY
u.user_id
,u.join_date
;
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
1 2018-01-01 1
2 2018-02-09 2
4 2018-05-21 0
3 2018-01-19 0
4 rows selected. The case statement highlighted works to the same effect! Explained above (link to text)
NB: a query like ;
SELECT
u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count(o.order_id)orders_in_2019
FROM
users u
LEFT JOIN orders o ON u.user_id = o.buyer_id
WHERE to_char(o.order_date, 'yyyy') = '2019'
GROUP BY
u.user_id
,u.join_date
;
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
1 2018-01-01 1
2 2018-02-09 2
2 rows selected. Fails (on LeetCode at least) as the where clause does not include any row for users 3 and 4, as they have no orders in 2019. The aim is to include all users. So we want to join to a table with only 2019 orders.
In Oracle 26ai (23.26.1.0.0), there’s an easier way I would have written up this solution using the filter clause. This post explains this feature and shows an example of using it.
NB: This feature is available from Oracle Database 23.26.1.0.0. If you try it on 23.26.0.0.0, it will fail.
INTRODUCTION
Aggregation filters allow developers to restrict the rows considered in aggregate functions such as SUM, COUNT, and AVG. This provides finer control over the data being aggregated without requiring additional subqueries.
Syntax:
aggregate_function ( aggregate_function_arguments ) [ FILTER ( WHERE condition )]- aggregate_function: Any supported aggregate function (For example, SUM, COUNT, AVG, MAX, MIN).
- aggregate_function_arguments: The column or expression to be aggregated.
- condition: A Boolean predicate that determines which rows are included in the aggregation
USAGE
Example
Consider the example above, this query can be rewritten in Oracle 26ai (23.26.1.0.0)as
SELECT
u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') = '2019') orders_in_2019
FROM
users u
LEFT JOIN orders o ON u.user_id = o.buyer_id
GROUP BY
u.user_id
,u.join_date
;
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
1 2018-01-01 1
2 2018-02-09 2
4 2018-05-21 0
3 2018-01-19 0
4 rows selected.This gives a simple way to compute conditional totals as it does not require writing a case statement or using a subquery, as in the earlier code examples used to arrive at the same solution.
Performance Comparison
I wanted to compare the performance of these queries and see which one produced a better plan/ executed at a lower cost to the database. In order to test this on my machine, I decided to add a lot more rows to the tables.
TRUNCATE TABLE Orders;
TRUNCATE TABLE Users;
TRUNCATE TABLE Items;
-- Populate 10,000 Users
INSERT INTO Users (user_id, join_date, favorite_brand)
SELECT
LEVEL,
DATE '2017-01-01' + DBMS_RANDOM.VALUE(0, 1000),
CASE MOD(LEVEL, 4)
WHEN 0 THEN 'Lenovo' WHEN 1 THEN 'Samsung'
WHEN 2 THEN 'LG' ELSE 'HP' END
FROM DUAL
CONNECT BY LEVEL <= 10000;
-- Populate Items
INSERT INTO Items (item_id, item_brand) VALUES (1, 'Samsung');
INSERT INTO Items (item_id, item_brand) VALUES (2, 'Lenovo');
INSERT INTO Items (item_id, item_brand) VALUES (3, 'LG');
INSERT INTO Items (item_id, item_brand) VALUES (4, 'HP');
-- Populate 100,000 Orders
INSERT INTO Orders (order_id, order_date, item_id, buyer_id, seller_id)
SELECT
LEVEL,
DATE '2018-01-01' + DBMS_RANDOM.VALUE(0, 730), -- Dates spanning 2018-2019
TRUNC(DBMS_RANDOM.VALUE(1, 5)), -- Random item_id 1-4
TRUNC(DBMS_RANDOM.VALUE(1, 10001)), -- Random buyer_id 1-10000
TRUNC(DBMS_RANDOM.VALUE(1, 10001)) -- Random seller_id 1-10000
FROM DUAL
CONNECT BY LEVEL <= 100000;
COMMIT;Note that the performance may vary based on your Operating system. I ran this on a virtual box running on my laptop, and I have not added any indexes to the tables!
Below are the results of running the queries ;
SQL> get code_1
1 SET FEEDBACK ON SQL_ID
2 SET TERMOUT OFF
3 SELECT
4 u.user_id buyer_id
5 ,to_char(u.join_date, 'yyyy-mm-dd') join_date
6 ,count(o.order_id) orders_in_2019
7 FROM
8 users u
9 LEFT JOIN (SELECT
10 order_id
11 ,buyer_id
12 ,order_date
13 FROM orders
14 WHERE to_char(order_date, 'yyyy') = '2019'
15 ) o ON u.user_id = o.buyer_id
16 GROUP BY
17 u.user_id
18* ,u.join_date
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
7024 2019-01-07 4
1018 2018-09-10 11
3552 2017-06-20 8
7674 2017-04-30 5
6370 2017-06-11 5
7979 2018-06-14 8
6248 2018-04-09 6
8808 2017-05-24 6
8576 2019-03-14 6
9741 2018-04-21 8
1195 2019-07-08 6
5292 2019-07-04 5
1250 2019-04-01 10
9241 2017-09-27 5
...
1675 2017-01-01 0
5462 2018-09-04 0
6335 2018-10-05 0
7747 2018-03-30 0
8929 2018-10-31 0
3840 2017-09-01 0
9051 2019-05-17 0
7138 2019-01-27 0
5628 2017-12-16 0
10,000 rows selected.
SQL_ID: 7j1ynyy53w1xz
SQL> get code_2
1 SET FEEDBACK ON SQL_ID
2 SET TERMOUT OFF
3
4 SELECT
5 u.user_id buyer_id
6 ,to_char(u.join_date, 'yyyy-mm-dd') join_date
7 ,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') = '2019') orders_in_2019
8 FROM
9 users u
10 LEFT JOIN orders o ON u.user_id = o.buyer_id
11 GROUP BY
12 u.user_id
13* ,u.join_date
SQL> set pause on
SQL> /
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
8254 2019-01-10 2
7024 2019-01-07 4
1018 2018-09-10 11
3552 2017-06-20 8
7674 2017-04-30 5
3027 2017-12-06 6
6670 2018-05-06 6
6370 2017-06-11 5
4119 2019-01-12 5
4024 2019-05-02 8
...
5540 2018-11-12 5
3887 2019-02-07 2
9051 2019-05-17 0
7407 2018-01-20 3
4180 2019-05-22 3
6156 2017-04-13 2
10,000 rows selected.
SQL_ID: duqdzm2rdzydx
SQL>
SQL> get code_3
1 SET FEEDBACK ON SQL_ID
2 SET TERMOUT OFF
3 SELECT
4 u.user_id buyer_id
5 ,to_char(u.join_date, 'yyyy-mm-dd') join_date
6 ,count( case when
7 to_char(o.order_date, 'yyyy') = '2019' then o.order_id else null end
8 ) orders_in_2019
9 FROM
10 users u
11 LEFT JOIN orders o ON u.user_id = o.buyer_id
12 GROUP BY
13 u.user_id
14* ,u.join_date
BUYER_ID JOIN_DATE ORDERS_IN_2019
________ __________ ______________
2419 2017-10-04 5
5501 2017-12-22 3
8777 2018-03-25 2
7163 2017-10-17 4
7508 2018-11-29 1
3852 2019-03-23 4
5540 2018-11-12 5
3887 2019-02-07 2
9051 2019-05-17 0
7407 2018-01-20 3
4180 2019-05-22 3
6156 2017-04-13 2
...
10,000 rows selected.
SQL_ID: ahjt2c1na9bs3
SQL>
SQL>
I then compared the plans used:
plan 1 (Subquery)
SQL> SELECT * FROM dbms_xplan.display_cursor('7j1ynyy53w1xz');
PLAN_TABLE_OUTPUT
______________________________________________________________________________
SQL_ID 7j1ynyy53w1xz, child number 0
-------------------------------------
SELECT u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count(o.order_id) orders_in_2019 FROM users u
LEFT JOIN (SELECT order_id ,buyer_id
,order_date FROM orders WHERE
to_char(order_date, 'yyyy') = '2019' ) o ON u.user_id =
o.buyer_id GROUP BY u.user_id ,u.join_date
Plan hash value: 691365108
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 150 (100)| |
| 1 | HASH GROUP BY | | 23 | 1311 | 150 (2)| 00:00:01 |
|* 2 | HASH JOIN OUTER | | 23 | 1311 | 149 (2)| 00:00:01 |
| 3 | TABLE ACCESS FULL| USERS | 4 | 88 | 11 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| ORDERS | 55292 | 1889K| 138 (2)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("U"."USER_ID"="BUYER_ID")
4 - filter(TO_CHAR(INTERNAL_FUNCTION("ORDER_DATE"),'yyyy')='2019')
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
32 rows selected.
SQL_ID: 4p9dw7njt8wam
PLAN 2 (FILTER)
SQL> SELECT * FROM dbms_xplan.display_cursor('duqdzm2rdzydx');
PLAN_TABLE_OUTPUT
______________________________________________________________________________
SQL_ID duqdzm2rdzydx, child number 0
-------------------------------------
SELECT u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date
,count(o.order_id) FILTER (WHERE to_char(o.order_date, 'yyyy') =
'2019') orders_in_2019 FROM users u LEFT JOIN orders o ON
u.user_id = o.buyer_id GROUP BY u.user_id ,u.join_date
Plan hash value: 691365108
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 151 (100)| |
| 1 | HASH GROUP BY | | 111K| 6183K| 151 (3)| 00:00:01 |
|* 2 | HASH JOIN OUTER | | 111K| 6183K| 148 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| USERS | 10000 | 214K| 11 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| ORDERS | 110K| 3782K| 137 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("U"."USER_ID"="O"."BUYER_ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
29 rows selected.
PLAN 3 (CASE)
SQL> SELECT * FROM dbms_xplan.display_cursor('ahjt2c1na9bs3');
PLAN_TABLE_OUTPUT
______________________________________________________________________________
SQL_ID ahjt2c1na9bs3, child number 0
-------------------------------------
SELECT u.user_id buyer_id
,to_char(u.join_date, 'yyyy-mm-dd') join_date ,count( case when
to_char(o.order_date, 'yyyy') = '2019' then
o.order_id else null end ) orders_in_2019 FROM
users u LEFT JOIN orders o ON u.user_id = o.buyer_id GROUP BY
u.user_id ,u.join_date
Plan hash value: 691365108
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 151 (100)| |
| 1 | HASH GROUP BY | | 111K| 6183K| 151 (3)| 00:00:01 |
|* 2 | HASH JOIN OUTER | | 111K| 6183K| 148 (1)| 00:00:01 |
| 3 | TABLE ACCESS FULL| USERS | 10000 | 214K| 11 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| ORDERS | 110K| 3782K| 137 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("U"."USER_ID"="O"."BUYER_ID")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
30 rows selected.
SQL_ID: 10ay7p3rq6m7g
Results
For this specific scenario, the Subquery actually performs best as the subquery filters rows before joining, thus processing about 55292 from the orders table rows as compared to 110K rows as the other 2 plans. I’m sure there are lots of other scenarios where the Filter clause would result in a better plan.
Can be used in window functions too!
Remember that in Oracle Database 26ai, the qualify clause can be used to filter the results of window functions. The filter clause can be used to get conditional running totals or moving averages right inside the window prior to filtering on the results.
For example
Display all users, and the number of “lgs” ordered where the number is more than 2500
SQL> SELECT
u.user_id
,count(o.order_id) FILTER (WHERE i.item_id = 3) OVER ( order by o.order_date ) orders
FROM
items i
INNER JOIN orders o ON i.item_id = o.item_id
INNER JOIN users u ON o.order_id = u.user_id
QUALIFY orders > 2500
;
USER_ID ORDERS
_______ ______
7350 2501
9993 2501
4549 2502
4283 2502
4332 2503
6873 2503
506 2503
7722 2503
3232 2503
9795 2504
6417 2504
3423 2504
2930 2505
808 2505
8277 2506
1050 2506
7457 2506
92 rows selected.
Not the greatest logical example, but focus on the syntax, will ya!
CONCLUSION
The introduction of the FILTER clause marks a significant step forward in making Oracle SQL more readable, standard-compliant, and expressive. By moving conditional logic directly into your aggregate functions, you can ditch the cluttered CASE statements of the past in favor of a cleaner, more intuitive syntax. Whether you are building complex financial reports or performing high-speed analytics in an AI-driven environment, this feature ensures your queries remain as streamlined as your data.
