Background:
The Hotel Group seeks to use intricate details of bookings, guest demographics, distribution channels, and financial metrics to extract meaningful insights that will inform operational improvements, increase revenue, and contribute to the group's overall success in delivering unparalleled hospitality.
The Business Questions to be answered are as follows;
Booking Patterns:
What is the trend in booking patterns over time, and are there specific seasons or months with increased booking activity?
How does lead time vary across different booking channels, and is there a correlation between lead time and customer type?
Customer Behavior Analysis:
Which distribution channels contribute the most to bookings, and how does the average daily rate (ADR) differ across these channels?
Can we identify any patterns in the distribution of guests based on their country of origin, and how does this impact revenue?
Cancellation Analysis:
- How does the revenue loss from cancellations compare across different customer segments and distribution channels?
Revenue Optimization:
What is the overall revenue trend, and are specific customer segments or countries contributing significantly?
Can we identify optimal pricing strategies based on the Average Daily Rate (ADR) for different customer types and distribution channels?
Geographical Analysis:
How does the distribution of guests vary across different countries, and are there specific countries that should be targeted for marketing efforts?
Is there a correlation between the country of origin and the likelihood of cancellations?
Operational Efficiency:
- What is the average length of stay for guests, and how does it differ based on booking channels or customer types?
Impact of Deposit Types:
How does the presence or absence of a deposit impact the likelihood of cancellations and revenue generation?
Can we identify any patterns in deposit types across different customer segments?
Analysis of Corporate Bookings:
- What is the proportion of corporate bookings, and how does their Average Daily Rate (ADR) compare to other customer types?
Time-to-Event Analysis:
How does the time between booking and arrival date (lead time) affect revenue and the likelihood of cancellations?
Are specific lead time ranges associated with higher customer satisfaction or revenue?
Comparison of Online and Offline Travel Agents:
What is the revenue contribution of online travel agents compared to offline travel agents?
How do cancellation rates and revenue vary between bookings made through online and offline travel agents?
About the Dataset
The data used in this project is a .csv file containing 17 columns and 119, 390 rows of data, ranging from booking dates to financial metrics, and can all be found here.
Data Cleaning
Before diving into the data, I had to check for errors and inconsistencies that may skew my analysis.
To wit, I found some empty rows in the 'Country' column:
SELECT DISTINCT
country
FROM
kachi.booking_data
There were 488 Null rows in the Country column, which were removed.
Data Exploration
To answer the business questions, we thoroughly analyzed the dataset. Below are the key steps and SQL queries used to explore the data and derive insights.
PS: The queries were executed using MySQL RDBMS.
Booking Patterns:
- What is the trend in booking patterns over time, and are there specific seasons or months with increased booking activity?
-- number of bookings by months
SELECT
COUNT(booking_id) AS no_of_bookings,
CASE
WHEN EXTRACT(MONTH FROM booking_date) = 1 THEN 'January'
WHEN EXTRACT(MONTH FROM booking_date) = 2 THEN 'February'
WHEN EXTRACT(MONTH FROM booking_date) = 3 THEN 'March'
WHEN EXTRACT(MONTH FROM booking_date) = 4 THEN 'April'
WHEN EXTRACT(MONTH FROM booking_date) = 5 THEN 'May'
WHEN EXTRACT(MONTH FROM booking_date) = 6 THEN 'June'
WHEN EXTRACT(MONTH FROM booking_date) = 7 THEN 'July'
WHEN EXTRACT(MONTH FROM booking_date) = 8 THEN 'August'
WHEN EXTRACT(MONTH FROM booking_date) = 9 THEN 'September'
WHEN EXTRACT(MONTH FROM booking_date) = 10 THEN 'October'
WHEN EXTRACT(MONTH FROM booking_date) = 11 THEN 'November'
WHEN EXTRACT(MONTH FROM booking_date) = 12 THEN 'December'
END AS Month
FROM
kachi.booking_data
GROUP BY Month
ORDER BY no_of_bookings DESC;
The query result shows that January and February have the highest bookings, with 16,611 and 13,402 bookings, respectively.
- How does lead time vary across different booking channels, and is there a correlation between lead time and customer type?
-- average lead time by booking channels
SELECT
ROUND(AVG(lead_time)),
distribution_channel
FROM
kachi.booking_data
GROUP BY distribution_channel
ORDER BY AVG(lead_time) DESC;
-- correlation between lead time and customer type
SELECT
ROUND(AVG(lead_time)), customer_type
FROM
kachi.booking_data
GROUP BY customer_type;
In this multipart query, the first execution shows us that the Offline Distribution Agent has, on average, the highest lead time compared to other distribution channels.
The second query shows that bookings from that Contract customer type are more likely to be cancelled than any other.
Customer Behaviour Analysis:
Which distribution channels contribute the most to bookings, and how does the average daily rate (ADR) differ across these channels?
-- number of bookings by distribution channels SELECT COUNT(booking_id), distribution_channel FROM kachi.booking_data GROUP BY distribution_channel ORDER BY COUNT(Booking_id) DESC; -- average daily rate accross different channels SELECT distribution_channel, CONCAT('$', ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL(10, 2))))) AS avg_daily_rate FROM kachi.booking_data GROUP BY distribution_channel;
The first query's result pinpointed that the Online Travel Agent has three (3) times the number of bookings compared to other distribution channels.
The second query shows that the average daily rate from Online Travel Agents was the highest, with the Direct distribution channel coming in at a close second.
- Can we identify any patterns in the distribution of guests based on their country of origin, and how does this impact revenue?
-- distribution of guests by country and revenue impact
SELECT
country,
COUNT(guests) AS no_of_guests,
CONCAT('$', ROUND(SUM(CAST(REPLACE(Revenue, '$', '') AS DECIMAL (10, 2))), 2)) AS revenue
FROM kachi.booking_data
GROUP BY country
ORDER BY COUNT(guests) DESC;
This shows us that Portugal has the highest number of guests, four times the number of guests in the United Kingdom, which comes in second. They also have two times more revenue than The United Kingdom.
- How does the revenue loss from cancellations compare across different customer segments and distribution channels?
-- revenue loss by customer segment and distribution channels
SELECT
customer_type,
distribution_channel,
CONCAT('$',
ROUND(SUM(CAST(REPLACE(revenue_loss, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS avg_revenue_loss
FROM
kachi.booking_data
GROUP BY customer_type , Distribution_Channel WITH CUBE
From this, we find that the highest revenue loss from the Customer Segment is the Transient Customer Type with $8,049,772 and Online Agent from the Distribution Channel with $7856694.
Revenue Optimization:
- What is the overall revenue trend, and are there specific customer segments or countries contributing significantly to revenue?
-- Revenue by country
SELECT
country,
CONCAT('$',
ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS revenue_sum
FROM
kachi.booking_data
GROUP BY country;
--Revenue by customer segment
SELECT
Customer_Type,
CONCAT('$',
ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS revenue_sum
FROM
kachi.booking_data
GROUP BY customer_type;
The results show that Portugal makes the highest revenue contribution, with $7,461,768, followed by the UK, which makes about half that, $3,336,540.
Among the customer segments, the Transient type contributes the most to revenue, with $18,478,219.
- Can we identify optimal pricing strategies based on the Average Daily Rate (ADR) for different customer types and distribution channels?
SELECT
customer_type,
CONCAT('$',
ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS adr
FROM
kachi.booking_data
GROUP BY customer_type;
SELECT
distribution_channel,
CONCAT('$',
ROUND(AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS adr
FROM
kachi.booking_data
GROUP BY distribution_channel;
These queries show the optimal pricing strategies for customer segments and distribution channels by finding each group's average ADR (Average Daily Rate).
Geographical Analysis:
- How does the distribution of guests vary across different countries, and are there specific countries that should be targeted for marketing efforts?
SELECT
country, COUNT(guests) AS no_of_guests
FROM
kachi.booking_data
GROUP BY country
ORDER BY no_of_guests DESC;
The results show that Portugal has the highest number of guests, with 48,590, with the UK coming in second at 12,129.
- Is there a correlation between the country of origin and the likelihood of cancellations?
SELECT DISTINCT
(country),
SUM(CASE WHEN Cancelled = 1 THEN 1 ELSE 0 END) AS cancelled_count,
COUNT(*) AS total_bookings,
CONCAT(ROUND((SUM(CASE WHEN Cancelled = 1 THEN 1 ELSE 0 END) / COUNT(*)) * 100, 2), '%') AS percentage_cancelled
FROM
`kachi`.`booking_data`
GROUP BY country
This query shows us that bookings from Portugal are most likely to be canceled, 56%, relative to their number of bookings (48,590).
Operational Efficiency:
- What is the average length of stay for guests, and how does it differ based on booking channels or customer types?
SELECT
customer_type,
ROUND(AVG(DATEDIFF(status_update, arrival_date))) AS length_of_stay
FROM kachi.booking_data
WHERE Status_Update > arrival_date AND cancelled = 0
GROUP BY customer_type;
SELECT
distribution_channel,
ROUND(AVG(DATEDIFF(status_update, arrival_date))) AS length_of_stay
FROM kachi.booking_data
WHERE Status_Update > arrival_date AND cancelled = 0
GROUP BY distribution_channel;
The query extracts the length of stay from the 'status_update' and the 'arrival_date' columns using the DATEDIFF function and groups them by customer type and then by distribution channel.
Impact of Deposit Types:
- How does the presence or absence of a deposit impact the likelihood of cancellations and revenue generation?
SELECT
deposit_type,
CONCAT('$',
ROUND(SUM(CAST(REPLACE(revenue_loss, '$', '') AS DECIMAL (10 , 2 ))),
2)) AS revenue_loss,
(SUM(cancelled) / (SELECT SUM(cancelled) FROM kachi.booking_data)) * 100 AS percentage_cancellations
FROM
kachi.booking_data
GROUP BY deposit_type;
This query finds the revenue loss and the percentage of cancellations based on deposit type. It shows that bookings with no deposit have a 67% cancellation rate and the highest revenue loss of $9,460,331. Therefore, bookings with no deposit are the most likely to impact cancellation and revenue generation.
- Can we identify any patterns in the use of deposit types across different customer segments?
SELECT
customer_type,
deposit_type,
COUNT(*) AS num_bookings,
CONCAT(ROUND((COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY customer_type)) * 100, 2),'%') AS percentage_of_customer_type
FROM
kachi.booking_data
GROUP BY
customer_type, deposit_type;
The query displays the percentage of each deposit type used by each customer segment.
Analysis of Corporate Bookings:
- What is the proportion of corporate bookings, and how does their Average Daily Rate (ADR) compare to other customer types?
WITH booking_stats AS (
SELECT
distribution_channel,
COUNT(*) AS total_bookings,
AVG(CAST(REPLACE(avg_daily_rate, '$', '') AS DECIMAL)) AS avg_daily_rate
FROM
kachi.booking_data
GROUP BY
distribution_channel
)
SELECT
distribution_channel,
total_bookings,
avg_daily_rate,
CONCAT( (total_bookings / SUM(total_bookings) OVER ()) * 100, '%') AS booking_proportion
FROM
booking_stats;
This query calculates the proportion of bookings for each distribution channel and shows that Corporate Bookings take just 5% of total bookings. Their Average Daily Rate is also the second lowest at $70.
Time-to-Event Analysis:
- How does the time between booking and arrival date (lead time) affect revenue and the likelihood of cancellations?
SELECT
CASE
WHEN lead_time <= 7 THEN 'Short_Lead_Time'
WHEN lead_time > 7 AND lead_time <= 30 THEN 'Medium_Lead_Time'
ELSE 'Long_Lead_Time'
END AS lead_time_range,
CONCAT('$', ROUND(AVG(CAST(REPLACE(revenue, '$', '') AS DECIMAL)), 2))
AS avg_revenue,
COUNT(*) AS no_of_bookings,
SUM(cancelled) AS no_cancelled
FROM
kachi.booking_data
GROUP BY lead_time_range
ORDER BY avg_revenue;
This query categorizes the different lead times into short, medium, and long lead times and shows how they affect revenue. The Long Lead Time range, on average, has the highest revenue contribution, with $219, but also the highest revenue loss, on average, with $95.
- Are there specific lead time ranges that are associated with higher customer satisfaction or revenue?
Per the previous query, using revenue as a proxy for customer satisfaction, we can assume that the Long Lead Time range is associated with the highest customer satisfaction.
Comparison of Online and Offline Travel Agents:
- What is the revenue contribution of online travel agents compared to offline travel agents?
SELECT
distribution_channel,
CONCAT('$', ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)), 2)) AS total_revenue,
CONCAT(ROUND(SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)) * 100 /
(SELECT SUM(CAST(REPLACE(revenue, '$', '') AS DECIMAL)) FROM kachi.booking_data), 2), '%') AS percentage_revenue
FROM
kachi.booking_data
GROUP BY
distribution_channel
order by
percentage_revenue DESC;
This query returns the percentage revenue contribution of each distribution channel and shows that the Online Travel Agent contributes almost 60% to revenue, while the Offline Travel Agent contributes 24%.
- How do cancellation rates and revenue vary between bookings made through online and offline travel agents?
SELECT distribution_channel,
SUM(cancelled) AS total_cancelled,
CONCAT(SUM(cancelled) * 100 /
(SELECT SUM(cancelled) FROM kachi.booking_data), '%') AS cancelled_percentage
FROM
kachi.booking_data
GROUP BY distribution_channel
ORDER BY total_revenue DESC
The query shows us that the Online Travel Agent contributes $14,756,257 with a cancellation rate of 72%, while the Offline Travel Agent contributes $5,943,018 with a cancellation rate of 18%.
You can interact with a dashboard for this project here.
INSIGHTS
Booking patterns show that holidays are the peak booking period, as the months with the most bookings are January and February.
Portugal has the highest number of bookings and also the highest revenue.
Lead time varies across distribution channels, with Travel Agents (Online and Offline) leading the pack, and Contract Customers exhibiting the highest average lead time.
The Online Travel Agent channel provides 62% of bookings as well as the highest Average Daily Rate.
While contributing significantly to bookings, Transient Customers also recorded the highest revenue loss.
RECOMMENDATION
Offer special promotions, discounts, or packages specifically targeted at non-festive periods to incentivize bookings. This could include discounted room rates, complimentary upgrades, etc.
Explore targeted marketing initiatives to further boost bookings from lucrative regions eg. The United Kingdom.
Tailor services to the preferences of high-lead time segments, such as
Travel Agents, to maximize revenue and minimize cancellations.Implement flexible booking policies that provide guests with peace of mind and confidence when making reservations during uncertain times. Offer options for free cancellations or changes to accommodate changing travel plans.
Forge partnerships with corporate clients or companies in related industries to secure group bookings or corporate retreats during off-peak times. Offer discounted rates or customized packages for corporate clients to encourage repeat business.
CONCLUSION
This analysis aided in uncovering insights about the booking pattern and would help inform operational improvements, increase revenue, and contribute to the group’s overall success in delivering unparalleled hospitality.
The success of these marketing strategies should be tracked through data analytics and adapted based on ongoing insights and feedback from stakeholders.