SQL Interview Prep - Day 2

Find new and Repeat Customers

The Problem Statement

Imagine a scenario where you manage customer data for an e-commerce platform. Your goal is to determine:

  1. How many customers are purchasing for the first time (new customers)?

  2. How many customers are returning to make additional purchases (repeat customers)?

Concepts that will be covered while solving this problem:

  • Approach to complex query step by step.

  • Usage of CASE WHEN with SUM

  • Usage of common table expression (CTE)

Building Table

CREATE TABLE Query:

create table customer_orders (
order_id integer,
customer_id integer,
order_date date,
order_amount integer
);

Insert data into the table cutomer_orders:

insert into customer_orders values
(1,100,cast('2022-01-01' as date),2000),
(2,200,cast('2022-01-01' as date),2500),
(3,300,cast('2022-01-01' as date),2100),
(4,100,cast('2022-01-02' as date),2000),
(5,400,cast('2022-01-02' as date),2200),
(6,500,cast('2022-01-02' as date),2700),
(7,100,cast('2022-01-03' as date),3000),
(8,400,cast('2022-01-03' as date),1000),
(9,600,cast('2022-01-03' as date),3000);

Final INPUT Table:

Expected Output:

Solution:

Approach

  • If we know when a customer placed their first order, we can identify the earliest order date. We can comparing this first order date with subsequent order dates, we can ascertain whether the customer has placed previous orders or not. This can be achieved effectively using the aggregation function MIN(order_date) along with GROUP BY.

  • Now we need to check, if the order_date = first_visit_date then the customer arrived is for the first time. If the order_date ≠ first_visit_date then the cutomer arrived is an old customer.

  • In order to do that we will join the query used in the first step as a CTE and join back with the original table customer_orders ON customer_id

  • Use CASE statements in order to compare the order_date with the first_visit_date obtained from step 1

  • Aggregate the values by counting the new and repeat customers

Query using Two Comman Table Expression(CTE)

WITH 
first_visit  as 
(select customer_id, MIN(order_date) as first_visit_date 
from customer_orders
group by customer_id),

visit_flag as(
select c.*,f.first_visit_date, 
CASE WHEN c.order_date=f.first_visit_date then 1 else 0 end as first_visit_flag,
CASE WHEN c.order_date!=f.first_visit_date then 1 else 0 end as repeat_visit_flag
from customer_orders c JOIN first_visit f
ON c.customer_id = f.customer_id)

select order_date, 
sum(first_visit_flag) as no_of_new_customers, 
sum(repeat_visit_flag) as no_of_repeat_customers
from visit_flag
group by order_date;

Query using One Comman Table Expression(CTE)

with 
first_visit as
(select customer_id, min(order_date) as first_visit_date from customer_orders
group by customer_id)

select co.order_date, 
SUM(CASE WHEN co.order_date=fv.first_visit_date then 1 else 0 end) as no_of_new_customers,
SUM(CASE WHEN co.order_date!=fv.first_visit_date then 1 else 0 end) as no_of_repeat_customers 
from customer_orders co JOIN first_visit fv
ON co.customer_id = fv.customer_id
group by co.order_date;

Adding further usecase to the existing problem statement

Also find the amount generated by the new customers and from the repeat customers

Query for the above usecase:

with first_visit as
(select customer_id, min(order_date) as first_visit_date from customer_orders
group by customer_id)

select co.order_date, 
SUM(CASE WHEN co.order_date=fv.first_visit_date then 1 else 0 end) as no_of_new_customers,
SUM(CASE WHEN co.order_date!=fv.first_visit_date then 1 else 0 end) as no_of_repeat_customers, 
SUM(CASE WHEN co.order_date=fv.first_visit_date then co.order_amount else 0 end) as new_customers_amount,
SUM(CASE WHEN co.order_date!=fv.first_visit_date then co.order_amount else 0 end) as repeat_customers_amount
from customer_orders co JOIN first_visit fv
ON co.customer_id = fv.customer_id
group by co.order_date;

Output:

The key to SQL mastery lies in consistent practice and curiosity. As you dive deeper into SQL, challenge yourself with complex queries and optimize for efficiency. Stay tuned for Day 3 of SQL Interview Prep, where we’ll tackle even more exciting problems.

Happy querying! 🚀

Content reference: https://www.youtube.com/watch?v=MpAMjtvarrc&list=PLBTZqjSKn0IeKBQDjLmzisazhqQy4iGkb&index=2&ab_channel=AnkitBansal