Analysing Data for a Hotel Group

Analysing Data for a Hotel Group

Background:

The Hotel Group seeks to use intricate details of bookings, guest demographics, distribution channels, and financial metrics to extract meaningful insights that will inform operational improvements, increase revenue, and contribute to the group's overall success in delivering unparalleled hospitality.

The Business Questions to be answered are as follows;

Booking Patterns:

  • What is the trend in booking patterns over time, and are there specific seasons or months with increased booking activity?

  • How does lead time vary across different booking channels, and is there a correlation between lead time and customer type?

Customer Behavior Analysis:

  • Which distribution channels contribute the most to bookings, and how does the average daily rate (ADR) differ across these channels?

  • Can we identify any patterns in the distribution of guests based on their country of origin, and how does this impact revenue?

Cancellation Analysis:

  • How does the revenue loss from cancellations compare across different customer segments and distribution channels?

Revenue Optimization:

  • What is the overall revenue trend, and are specific customer segments or countries contributing significantly?

  • Can we identify optimal pricing strategies based on the Average Daily Rate (ADR) for different customer types and distribution channels?

Geographical Analysis:

  • How does the distribution of guests vary across different countries, and are there specific countries that should be targeted for marketing efforts?

  • Is there a correlation between the country of origin and the likelihood of cancellations?

Operational Efficiency:

  • What is the average length of stay for guests, and how does it differ based on booking channels or customer types?

Impact of Deposit Types:

  • How does the presence or absence of a deposit impact the likelihood of cancellations and revenue generation?

  • Can we identify any patterns in deposit types across different customer segments?

Analysis of Corporate Bookings:

  • What is the proportion of corporate bookings, and how does their Average Daily Rate (ADR) compare to other customer types?

Time-to-Event Analysis:

  • How does the time between booking and arrival date (lead time) affect revenue and the likelihood of cancellations?

  • Are specific lead time ranges associated with higher customer satisfaction or revenue?

Comparison of Online and Offline Travel Agents:

  • What is the revenue contribution of online travel agents compared to offline travel agents?

  • How do cancellation rates and revenue vary between bookings made through online and offline travel agents?

About the Dataset

The data used in this project is a .csv file containing 17 columns and 119, 390 rows of data, ranging from booking dates to financial metrics, and can all be found here.

Data Cleaning

Before diving into the data, I had to check for errors and inconsistencies that may skew my analysis.

To wit, I found some empty rows in the 'Country' column:

SELECT DISTINCT
    country
FROM
    kachi.booking_data

There were 488 Null rows in the Country column, which were removed.

Data Exploration

To answer the business questions, we thoroughly analyzed the dataset. Below are the key steps and SQL queries used to explore the data and derive insights.

PS: The queries were executed using MySQL RDBMS.

Booking Patterns:

  • What is the trend in booking patterns over time, and are there specific seasons or months with increased booking activity?
-- number of bookings by months
SELECT 
    COUNT(booking_id) AS no_of_bookings,
    CASE
        WHEN EXTRACT(MONTH FROM booking_date) = 1 THEN 'January'
        WHEN EXTRACT(MONTH FROM booking_date) = 2 THEN 'February'
        WHEN EXTRACT(MONTH FROM booking_date) = 3 THEN 'March'
        WHEN EXTRACT(MONTH FROM booking_date) = 4 THEN 'April'
        WHEN EXTRACT(MONTH FROM booking_date) = 5 THEN 'May'
        WHEN EXTRACT(MONTH FROM booking_date) = 6 THEN 'June'
        WHEN EXTRACT(MONTH FROM booking_date) = 7 THEN 'July'
        WHEN EXTRACT(MONTH FROM booking_date) = 8 THEN 'August'
        WHEN EXTRACT(MONTH FROM booking_date) = 9 THEN 'September'
        WHEN EXTRACT(MONTH FROM booking_date) = 10 THEN 'October'
        WHEN EXTRACT(MONTH FROM booking_date) = 11 THEN 'November'
        WHEN EXTRACT(MONTH FROM booking_date) = 12 THEN 'December'
    END AS Month
FROM
    kachi.booking_data
GROUP BY Month
ORDER BY no_of_bookings DESC;

The query result shows that January and February have the highest bookings, with 16,611 and 13,402 bookings, respectively.

  • How does lead time vary across different booking channels, and is there a correlation between lead time and customer type?
-- average lead time by booking channels
SELECT 
    ROUND(AVG(lead_time)), 
    distribution_channel
FROM
    kachi.booking_data
GROUP BY distribution_channel
ORDER BY AVG(lead_time) DESC;

-- correlation between lead time and customer type
SELECT 
 ROUND(AVG(lead_time)), customer_type
FROM 
 kachi.booking_data
GROUP BY customer_type;

In this multipart query, the first execution shows us that the Offline Distribution Agent has, on average, the highest lead time compared to other distribution channels.

The second query shows that bookings from that Contract customer type are more likely to be cancelled than any other.

Customer Behaviour Analysis:

  • Which distribution channels contribute the most to bookings, and how does the average daily rate (ADR) differ across these channels?

  •   -- number of bookings by distribution channels 
      SELECT 
       COUNT(booking_id), distribution_channel
      FROM 
       kachi.booking_data
      GROUP BY distribution_channel
      ORDER BY COUNT(Booking_id) DESC;
    
      -- average daily rate accross different channels
      SELECT 
       distribution_channel,
       CONCAT('$', ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL(10, 2))))) AS avg_daily_rate
      FROM 
       kachi.booking_data
      GROUP BY distribution_channel;
    

The first query's result pinpointed that the Online Travel Agent has three (3) times the number of bookings compared to other distribution channels.

The second query shows that the average daily rate from Online Travel Agents was the highest, with the Direct distribution channel coming in at a close second.

  • Can we identify any patterns in the distribution of guests based on their country of origin, and how does this impact revenue?
-- distribution of guests by country and revenue impact

SELECT
 country,
 COUNT(guests) AS no_of_guests,
    CONCAT('$', ROUND(SUM(CAST(REPLACE(Revenue, '$', '') AS DECIMAL (10, 2))), 2)) AS revenue
FROM kachi.booking_data
GROUP BY country
ORDER BY COUNT(guests) DESC;

This shows us that Portugal has the highest number of guests, four times the number of guests in the United Kingdom, which comes in second. They also have two times more revenue than The United Kingdom.

  • How does the revenue loss from cancellations compare across different customer segments and distribution channels?
-- revenue loss by customer segment and distribution channels 

SELECT 
    customer_type,
    distribution_channel,
    CONCAT('$',
            ROUND(SUM(CAST(REPLACE(revenue_loss, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS avg_revenue_loss
FROM
    kachi.booking_data
GROUP BY customer_type , Distribution_Channel WITH CUBE

From this, we find that the highest revenue loss from the Customer Segment is the Transient Customer Type with $8,049,772 and Online Agent from the Distribution Channel with $7856694.

Revenue Optimization:

  • What is the overall revenue trend, and are there specific customer segments or countries contributing significantly to revenue?
-- Revenue by country 
SELECT 
    country,
    CONCAT('$',
            ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS revenue_sum
FROM
    kachi.booking_data
GROUP BY country;


--Revenue by customer segment
SELECT 
    Customer_Type,
    CONCAT('$',
            ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS revenue_sum
FROM
    kachi.booking_data
GROUP BY customer_type;

The results show that Portugal makes the highest revenue contribution, with $7,461,768, followed by the UK, which makes about half that, $3,336,540.

Among the customer segments, the Transient type contributes the most to revenue, with $18,478,219.

  • Can we identify optimal pricing strategies based on the Average Daily Rate (ADR) for different customer types and distribution channels?
SELECT 
 customer_type,
    CONCAT('$',
            ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS adr
FROM
    kachi.booking_data
GROUP BY customer_type;



SELECT 
 distribution_channel,
    CONCAT('$',
            ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS adr
FROM
    kachi.booking_data
GROUP BY distribution_channel;

These queries show the optimal pricing strategies for customer segments and distribution channels by finding each group's average ADR (Average Daily Rate).

Geographical Analysis:

  • How does the distribution of guests vary across different countries, and are there specific countries that should be targeted for marketing efforts?
SELECT 
    country, COUNT(guests) AS no_of_guests
FROM
    kachi.booking_data
GROUP BY country
ORDER BY no_of_guests DESC;

The results show that Portugal has the highest number of guests, with 48,590, with the UK coming in second at 12,129.

  • Is there a correlation between the country of origin and the likelihood of cancellations?
SELECT DISTINCT
    (country),
    SUM(CASE WHEN Cancelled = 1 THEN 1 ELSE 0 END) AS cancelled_count,
    COUNT(*) AS total_bookings,
    CONCAT(ROUND((SUM(CASE WHEN Cancelled = 1 THEN 1 ELSE 0 END) / COUNT(*)) * 100, 2), '%') AS percentage_cancelled
FROM
    `kachi`.`booking_data`
GROUP BY country

This query shows us that bookings from Portugal are most likely to be canceled, 56%, relative to their number of bookings (48,590).

Operational Efficiency:

  • What is the average length of stay for guests, and how does it differ based on booking channels or customer types?
SELECT 
 customer_type,
 ROUND(AVG(DATEDIFF(status_update, arrival_date))) AS length_of_stay
FROM kachi.booking_data
WHERE Status_Update > arrival_date AND cancelled = 0
GROUP BY customer_type;


SELECT 
 distribution_channel,
 ROUND(AVG(DATEDIFF(status_update, arrival_date))) AS length_of_stay
FROM kachi.booking_data
WHERE Status_Update > arrival_date AND cancelled = 0
GROUP BY distribution_channel;

The query extracts the length of stay from the 'status_update' and the 'arrival_date' columns using the DATEDIFF function and groups them by customer type and then by distribution channel.

Impact of Deposit Types:

  • How does the presence or absence of a deposit impact the likelihood of cancellations and revenue generation?
SELECT 
    deposit_type, 
    CONCAT('$',
            ROUND(SUM(CAST(REPLACE(revenue_loss, '$', '') AS DECIMAL (10 , 2 ))),
                    2)) AS revenue_loss,
    (SUM(cancelled) / (SELECT SUM(cancelled) FROM kachi.booking_data)) * 100 AS percentage_cancellations
FROM
    kachi.booking_data
GROUP BY deposit_type;

This query finds the revenue loss and the percentage of cancellations based on deposit type. It shows that bookings with no deposit have a 67% cancellation rate and the highest revenue loss of $9,460,331. Therefore, bookings with no deposit are the most likely to impact cancellation and revenue generation.

  • Can we identify any patterns in the use of deposit types across different customer segments?
SELECT 
    customer_type,
    deposit_type,
    COUNT(*) AS num_bookings,
    CONCAT(ROUND((COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY customer_type)) * 100, 2),'%') AS percentage_of_customer_type
FROM 
    kachi.booking_data
GROUP BY 
    customer_type, deposit_type;

The query displays the percentage of each deposit type used by each customer segment.

Analysis of Corporate Bookings:

  • What is the proportion of corporate bookings, and how does their Average Daily Rate (ADR) compare to other customer types?
WITH booking_stats AS (
    SELECT 
        distribution_channel,
        COUNT(*) AS total_bookings,
        AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL)) AS avg_daily_rate
    FROM 
        kachi.booking_data
    GROUP BY 
        distribution_channel
)
SELECT 
    distribution_channel,
    total_bookings,
    avg_daily_rate,
   CONCAT( (total_bookings / SUM(total_bookings) OVER ()) * 100, '%') AS booking_proportion
FROM 
    booking_stats;

This query calculates the proportion of bookings for each distribution channel and shows that Corporate Bookings take just 5% of total bookings. Their Average Daily Rate is also the second lowest at $70.

Time-to-Event Analysis:

  • How does the time between booking and arrival date (lead time) affect revenue and the likelihood of cancellations?

    SELECT
        CASE
            WHEN lead_time <= 7 THEN 'Short_Lead_Time'
            WHEN lead_time > 7 AND lead_time <= 30 THEN 'Medium_Lead_Time'
            ELSE 'Long_Lead_Time'
        END AS lead_time_range,
        CONCAT('$', ROUND(AVG(CAST(REPLACE(revenue, '$', '') AS DECIMAL)), 2)) 
                AS avg_revenue,
  COUNT(*) AS no_of_bookings,
        SUM(cancelled) AS no_cancelled
    FROM
        kachi.booking_data
 GROUP BY lead_time_range
    ORDER BY avg_revenue;

This query categorizes the different lead times into short, medium, and long lead times and shows how they affect revenue. The Long Lead Time range, on average, has the highest revenue contribution, with $219, but also the highest revenue loss, on average, with $95.

  • Are there specific lead time ranges that are associated with higher customer satisfaction or revenue?

Per the previous query, using revenue as a proxy for customer satisfaction, we can assume that the Long Lead Time range is associated with the highest customer satisfaction.

Comparison of Online and Offline Travel Agents:

  • What is the revenue contribution of online travel agents compared to offline travel agents?
SELECT 
 distribution_channel,
    CONCAT('$', ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)), 2)) AS total_revenue,
 CONCAT(ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)) * 100 / 
           (SELECT SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)) FROM kachi.booking_data), 2), '%') AS percentage_revenue
FROM 
 kachi.booking_data
GROUP BY 
 distribution_channel
order by 
 percentage_revenue DESC;

This query returns the percentage revenue contribution of each distribution channel and shows that the Online Travel Agent contributes almost 60% to revenue, while the Offline Travel Agent contributes 24%.

  • How do cancellation rates and revenue vary between bookings made through online and offline travel agents?
SELECT distribution_channel,
  SUM(cancelled) AS total_cancelled,
        CONCAT(SUM(cancelled) * 100 / 
           (SELECT SUM(cancelled) FROM kachi.booking_data), '%') AS cancelled_percentage
FROM
    kachi.booking_data
GROUP BY distribution_channel
ORDER BY total_revenue DESC

The query shows us that the Online Travel Agent contributes $14,756,257 with a cancellation rate of 72%, while the Offline Travel Agent contributes $5,943,018 with a cancellation rate of 18%.

You can interact with a dashboard for this project here.

INSIGHTS

  1. Booking patterns show that holidays are the peak booking period, as the months with the most bookings are January and February.

  2. Portugal has the highest number of bookings and also the highest revenue.

  3. Lead time varies across distribution channels, with Travel Agents (Online and Offline) leading the pack, and Contract Customers exhibiting the highest average lead time.

  4. The Online Travel Agent channel provides 62% of bookings as well as the highest Average Daily Rate.

  5. While contributing significantly to bookings, Transient Customers also recorded the highest revenue loss.

RECOMMENDATION

  1. Offer special promotions, discounts, or packages specifically targeted at non-festive periods to incentivize bookings. This could include discounted room rates, complimentary upgrades, etc.

  2. Explore targeted marketing initiatives to further boost bookings from lucrative regions eg. The United Kingdom.

  3. Tailor services to the preferences of high-lead time segments, such as
    Travel Agents, to maximize revenue and minimize cancellations.

  4. Implement flexible booking policies that provide guests with peace of mind and confidence when making reservations during uncertain times. Offer options for free cancellations or changes to accommodate changing travel plans.

  5. Forge partnerships with corporate clients or companies in related industries to secure group bookings or corporate retreats during off-peak times. Offer discounted rates or customized packages for corporate clients to encourage repeat business.

CONCLUSION

This analysis aided in uncovering insights about the booking pattern and would help inform operational improvements, increase revenue, and contribute to the group’s overall success in delivering unparalleled hospitality.

The success of these marketing strategies should be tracked through data analytics and adapted based on ongoing insights and feedback from stakeholders.

RELEVANT LINKS

Dashboard
LinkedIn