Analysis of the Factors that Affect Bus Delays in Toronto

Report

Problem Statement

This project analyzes the factors that affect bus delays in Toronto, using a dataset from the Toronto Transit Commission (TTC) that provides detailed records of bus delays across the city. It focuses on four key factors: bus route, day of the week, time of day, and incident type. The findings can inform policymakers and decision-makers working to improve the reliability and efficiency of the public transit system.

TTC Bus Delay 2022 (Python)

Conclusion

Bus delays occurred most often in the afternoon, around 2-5 pm, at the end stations of subway Line 2. The most common cause categories were “Operations/Operators” and “Mechanical.”

Based on these findings, further work can concentrate on these two causes and examine each one in detail to reach a more definitive conclusion and, ideally, identify ways to reduce bus delays.

Further Analysis: Using Spark SQL to Analyze Data

The TTC bus delay data could be analyzed further with Spark SQL, which lets us express the analysis as SQL queries rather than writing Spark code directly.

For example, with a SQL query we can view the distribution of delay lengths across the hours of the day and the days of the week using box plots. Combined with the first half of the analysis, we can then isolate the delays that are both frequent and long, and prioritize improvements to the routes or stations associated with them.
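
The prioritization step described above could be sketched as a single aggregate query. The example below is a minimal illustration, not the analysis from the notebook: the table name `delays`, the column names (`route`, `day`, `hour`, `incident`, `min_delay`), the sample rows, and the thresholds are all assumptions made for the sketch. To keep it self-contained, the query is run against an in-memory SQLite table from the Python standard library rather than a Spark session.

```python
import sqlite3

# Hypothetical sample rows: (route, day, hour, incident, min_delay).
# Route labels, column names, and values are illustrative, not TTC data.
ROWS = [
    ("A", "Monday", 15, "Mechanical", 20),
    ("A", "Monday", 16, "Operations", 30),
    ("A", "Tuesday", 15, "Mechanical", 25),
    ("B", "Monday", 9, "Mechanical", 5),
    ("B", "Friday", 17, "Operations", 8),
    ("B", "Friday", 18, "Operations", 6),
    ("C", "Sunday", 14, "Mechanical", 60),
]

# Keep only routes whose delays are both frequent and long on average,
# ordered so the most frequently delayed routes come first.
PRIORITY_QUERY = """
    SELECT route,
           COUNT(*)       AS n_delays,
           AVG(min_delay) AS avg_delay
    FROM delays
    GROUP BY route
    HAVING COUNT(*) >= {min_count} AND AVG(min_delay) >= {min_avg}
    ORDER BY n_delays DESC, avg_delay DESC
"""


def priority_routes(rows, min_count=3, min_avg=15):
    """Return (route, n_delays, avg_delay) for routes meeting both thresholds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE delays ("
        "route TEXT, day TEXT, hour INTEGER, incident TEXT, min_delay INTEGER)"
    )
    conn.executemany("INSERT INTO delays VALUES (?, ?, ?, ?, ?)", rows)
    # Thresholds are cast to numbers before being placed in the query text.
    query = PRIORITY_QUERY.format(min_count=int(min_count), min_avg=float(min_avg))
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for route, n_delays, avg_delay in priority_routes(ROWS):
        print(route, n_delays, avg_delay)
```

With the sample rows, only route A passes both thresholds: route B is delayed just as often but only briefly, and route C has one long delay. The query uses only standard GROUP BY and HAVING clauses, so in a Spark session the same text could presumably be passed to `spark.sql(...)` after registering the delay DataFrame with `createOrReplaceTempView("delays")`.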