One of the most infamous disasters in history occurred over a century ago, on April 15, 1912: the sinking of the Titanic. People of all classes, ages and genders were on board that fateful night, but there were too few lifeboats to rescue everyone.
The objective is to perform exploratory data analysis on the available dataset and to understand the effect of each field on passenger survival by analyzing every field against the “Survived” field. Predictions are then made for new data by applying a machine learning algorithm, and the accuracy of the applied algorithm is checked.
Let’s get a quick overview of the dataset. The data contains 12 columns and 891 rows. Let’s see the description of each column.
- PassengerId → This is just an identifier for the passenger. Each passenger has a unique Id, which is used instead of referring to them by their full names.
- Survived → This is either 0 or 1. 0 means “did not survive” and 1 means “survived”.
- Pclass → Ticket class of each passenger: 1st, 2nd or 3rd.
- Name → Name of each passenger on board.
- Sex → Sex of each passenger
- Age → Age of each passenger
- SibSp → Number of siblings/spouses aboard the Titanic
- Parch → Number of parents/children aboard the Titanic
- Ticket → Ticket number
- Fare → Passenger fare
- Cabin → Cabin number
- Embarked → Port where the passenger embarked. “C” for Cherbourg, “Q” for Queenstown, “S” for Southampton.
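A first look at the data can be sketched as follows. This is a minimal example that uses only the Python standard library and a tiny made-up sample in place of the real file; the rows are illustrative, not real passengers, and the helper names `load_rows` and `missing_counts` are hypothetical. The header is assumed to follow the column list above, with the port column spelled `Embarked` as in the commonly distributed version of this dataset.

```python
import csv
import io

# Tiny inline sample standing in for the real CSV file.
# The values are invented for illustration only.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,22,1,0,T1,7.25,,S
2,1,1,Passenger B,female,38,1,0,T2,71.28,C85,C
3,1,3,Passenger C,female,,0,0,T3,7.92,,S
"""


def load_rows(fileobj):
    """Read the CSV into a list of dicts, one dict per passenger."""
    return list(csv.DictReader(fileobj))


def missing_counts(rows):
    """Count the empty cells in each column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value == "":
                counts[column] += 1
    return counts


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows,", len(rows[0]), "columns")
print(missing_counts(rows))
```

With the real file, `load_rows` would be given an opened file handle instead of the `io.StringIO` wrapper. Checking for empty cells early is useful because columns with many gaps need special handling before any modelling.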
Surviving this incident was a miracle and a lot of factors came into play. We want to know how these factors affected the survival chances of each passenger.
→ How were the survival chances of people with family members affected?
We combined the Parch and SibSp columns to get the family size of each passenger. We found that the survival rate decreases as family size increases, and becomes very low once family size is greater than 3.
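The family-size grouping can be sketched like this. It assumes family size means SibSp + Parch (the number of relatives aboard, not counting the passenger), which is one reading of "combined"; the function name and the toy rows are hypothetical and only show the mechanics, not the real result.

```python
from collections import defaultdict


def survival_rate_by_family_size(rows):
    """Return {family_size: share of passengers who survived}.

    Family size is assumed to be SibSp + Parch.
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        family_size = int(row["SibSp"]) + int(row["Parch"])
        totals[family_size] += 1
        survivors[family_size] += int(row["Survived"])
    return {size: survivors[size] / totals[size] for size in sorted(totals)}


# Made-up rows, for illustration only.
toy_rows = [
    {"SibSp": "0", "Parch": "0", "Survived": "1"},
    {"SibSp": "0", "Parch": "0", "Survived": "0"},
    {"SibSp": "1", "Parch": "1", "Survived": "1"},
    {"SibSp": "3", "Parch": "2", "Survived": "0"},
]

family_rates = survival_rate_by_family_size(toy_rows)
print(family_rates)
```

Because Survived is coded as 0 or 1, summing it per group and dividing by the group size gives the survival rate directly.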
→ How did their ticket class affect their survival chances?
We found that passengers travelling in first class were more likely to survive than those in second and third class. The survival rate dropped as the class got lower, likely because passengers who paid more were given preference over others.
→ What is the survival rate of men and women?
The sex of a passenger also played a huge role in their chance of survival, as preference was also given to women.
Over 74% of women survived, compared with just 18.8% of men.
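The ticket-class and sex comparisons both come down to the same computation: group the passengers by one column and take the share of survivors in each group. A minimal sketch, with a hypothetical helper `survival_rate_by` and made-up rows (so the printed rates are not the figures quoted above):

```python
from collections import Counter


def survival_rate_by(rows, column):
    """Return {value of column: share of passengers who survived}."""
    totals = Counter()
    survivors = Counter()
    for row in rows:
        totals[row[column]] += 1
        survivors[row[column]] += int(row["Survived"])
    return {value: survivors[value] / totals[value] for value in totals}


# Made-up rows, for illustration only.
toy_rows = [
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "1", "Survived": "1"},
    {"Sex": "male", "Pclass": "2", "Survived": "0"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
]

rates_by_sex = survival_rate_by(toy_rows, "Sex")
rates_by_class = survival_rate_by(toy_rows, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

The same function works for any categorical column, so the port of embarkation could be checked the same way.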
This project involves the implementation of data analytics and machine learning. We derived from the project that the major factors that determined the survival of each passenger are sex, number of family members and ticket class.
Other factors also came into play, but they were not as influential as the three listed above. I also created a machine learning model with an accuracy of 78% to predict survival outcomes on the Titanic based on all the relevant features/fields. This can be seen in my GitHub repository.
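The post does not say which algorithm produced the 78% figure, so the following is not that model. It only sketches how an accuracy figure is computed, using a hypothetical rule-based baseline (predict survival for women and non-survival for men) on made-up rows.

```python
def predict_survived(row):
    """Hypothetical baseline rule, not the author's model."""
    return 1 if row["Sex"] == "female" else 0


def accuracy(rows, predict):
    """Share of rows where the prediction matches the Survived label."""
    correct = sum(1 for row in rows if predict(row) == int(row["Survived"]))
    return correct / len(rows)


# Made-up labelled rows, for illustration only.
toy_rows = [
    {"Sex": "female", "Survived": "1"},
    {"Sex": "female", "Survived": "0"},
    {"Sex": "male", "Survived": "0"},
    {"Sex": "male", "Survived": "0"},
    {"Sex": "male", "Survived": "1"},
]

baseline_accuracy = accuracy(toy_rows, predict_survived)
print(baseline_accuracy)
```

For a trained model, accuracy should be measured on rows that were held out from training; otherwise the figure overstates how well the model predicts new passengers.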