LeetCode - SQL 50 - Problem 22

Game Play Analysis IV

Calculating the Fraction of Players Who Logged In Consecutively Using SQL

As user engagement becomes an increasingly important metric for online platforms, understanding how users interact with your application over time can provide valuable insights. In this blog, we’ll walk you through how to calculate the fraction of players who logged in on consecutive days, starting from their first login, using SQL.

Problem Overview

Let’s imagine you are managing a gaming platform that tracks player activities. You have a table, Activity, that logs when a player logs into your platform. The structure of the table is as follows:

Column NameType
player_idint
device_idint
event_datedate
games_playedint
  • The combination of player_id and event_date forms a unique record for each login.

  • The objective is to calculate the fraction of players who logged in the day after their first login.

The result should be the number of such players divided by the total number of players who logged in, and the final output should be rounded to two decimal places.


SQL Schema:

Create table If Not Exists Activity (player_id int, device_id int, event_date date, games_played int)
Truncate table Activity
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-01', '5')
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-02', '6')
insert into Activity (player_id, device_id, event_date, games_played) values ('2', '3', '2017-06-25', '1')
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '1', '2016-03-02', '0')
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '4', '2018-07-03', '5')

Solution Outline in SQL:

We can approach this with a SQL query that performs the following:

  1. Use MIN to find the first login date for each player.

  2. Use a JOIN or LEFT JOIN to check if the player logged in on the day after their first login date.

  3. Use COUNT to determine how many players meet the criteria (logged in again the day after).

  4. Calculate the fraction and round it to 2 decimal places.

Here is the SQL query that implements this:

WITH FirstLogin AS (
    -- Get the first login date for each player
    SELECT player_id, MIN(event_date) AS first_login_date
    FROM Activity
    GROUP BY player_id
),
ConsecutiveLogin AS (
    -- Get the players who logged in the day after their first login date
    SELECT f.player_id
    FROM FirstLogin f
    JOIN Activity a ON f.player_id = a.player_id
    WHERE a.event_date = DATE_ADD(f.first_login_date, INTERVAL 1 DAY)
)
-- Calculate the fraction
SELECT ROUND(COUNT(DISTINCT c.player_id) / COUNT(DISTINCT a.player_id), 2) AS fraction
FROM Activity a
LEFT JOIN ConsecutiveLogin c ON a.player_id = c.player_id;

SQL Solution Breakdown

Let’s dive into the solution step-by-step.

1. Identify the First Login Date for Each Player

The first step is to find out when each player logged in for the first time. We can do this using SQL's MIN() function, which will return the earliest login date for each player.

Here’s how we can write this query:

WITH FirstLogin AS (
    SELECT player_id, MIN(event_date) AS first_login_date
    FROM Activity
    GROUP BY player_id
)

Explanation:

  • MIN(event_date) retrieves the earliest date on which each player logged in.

  • GROUP BY player_id groups the data by player so that we get the first login for each one.

For example, if Player 1 first logged in on 2016-03-01, Player 2 on 2017-06-25, and Player 3 on 2016-03-02, the result of the FirstLogin CTE would look like this:

player_idfirst_login_date
12016-03-01
22017-06-25
32016-03-02

2. Find Players Who Logged In on the Day After Their First Login

Now, for each player, we need to check if they logged in the day after their first login. To do this, we can join the FirstLogin CTE with the Activity table and check if the player's event_date is one day after their first login date.

Here’s how you can do that:

ConsecutiveLogin AS (
    SELECT f.player_id
    FROM FirstLogin f
    JOIN Activity a ON f.player_id = a.player_id
    WHERE a.event_date = DATE_ADD(f.first_login_date, INTERVAL 1 DAY)
)

Explanation:

  • DATE_ADD(f.first_login_date, INTERVAL 1 DAY) adds one day to the player's first login date.

  • We then check if the player logged in on that day by matching a.event_date with the date one day after their first login.

The ConsecutiveLogin CTE will contain only those players who logged in on the day after their first login.

3. Calculate the Fraction of Players Who Logged In Consecutively

Now that we know which players logged in consecutively, we can calculate the fraction of players who logged in consecutively. To do this:

  • Count the number of distinct players in ConsecutiveLogin (the players who logged in on consecutive days).

  • Count the total number of distinct players in the Activity table (the total number of players who logged in at least once).

Here’s how you can calculate this fraction:

SELECT ROUND(COUNT(DISTINCT c.player_id) / COUNT(DISTINCT a.player_id), 2) AS fraction
FROM Activity a
LEFT JOIN ConsecutiveLogin c ON a.player_id = c.player_id;

Explanation:

  • COUNT(DISTINCT c.player_id) counts the number of distinct players who logged in consecutively (i.e., players in ConsecutiveLogin).

  • COUNT(DISTINCT a.player_id) counts all distinct players who logged in at least once.

  • The LEFT JOIN ensures we count all players, even those who didn't log in consecutively, which is essential for the denominator.

The ROUND() function rounds the result to two decimal places as required.


Example Walkthrough

Let’s consider an example to understand the query better.

Given the following Activity table:

player_iddevice_idevent_dategames_played
122016-03-015
122016-03-026
232017-06-251
312016-03-020
342018-07-035
  1. FirstLogin CTE:

    • Player 1's first login is 2016-03-01.

    • Player 2's first login is 2017-06-25.

    • Player 3's first login is 2016-03-02.

  2. ConsecutiveLogin CTE:

    • Player 1 logged in again on 2016-03-02, which is the day after their first login on 2016-03-01. So Player 1 is included in ConsecutiveLogin.

    • Player 2 didn’t log in the next day, so they aren’t included.

    • Player 3 didn’t log in the next day, so they aren’t included either.

  3. Final Calculation:

    • Only Player 1 logged in consecutively.

    • There are 3 distinct players in total.

The fraction is:

1 / 3 = 0.3331​

So, the result is 0.33, meaning 33% of the players logged in the day after their first login.


Conclusion

In this article, we’ve walked through how to calculate the fraction of players who logged in the day after their first login. By using SQL's WITH clauses (CTEs), JOIN operations, and date functions like DATE_ADD, we can efficiently solve this problem and extract valuable insights about user behavior.

This method is highly applicable not just for gaming platforms, but for any application that tracks user activity and aims to measure engagement over time.

Happy querying, and feel free to reach out with any questions or feedback!