aditya2005-code/Cab_Fare_Prediction



🚖 Cab Fare Prediction

Cab Fare Prediction is a Machine Learning project that predicts taxi fares based on ride details such as pickup and dropoff locations, date, time, and passenger count. This project demonstrates data cleaning, feature engineering, exploratory data analysis (EDA), and regression modeling using Python.

It is designed to provide insights into fare estimation for taxi services and can be extended for real-time cab fare prediction systems.


📝 Project Overview

The goal of this project is to predict the fare amount of a taxi ride in New York City based on historical ride data. Accurate fare prediction is crucial for:

  • Helping customers estimate ride costs
  • Assisting drivers in route planning
  • Reducing disputes between drivers and passengers

The project covers the entire machine learning pipeline:

  1. Data Collection – Uses the publicly available NYC Taxi Fare dataset from Kaggle.

  2. Data Cleaning – Handling missing values, removing invalid or outlier entries (e.g., negative fares or impossible coordinates).

  3. Feature Engineering – Extracting features from the pickup datetime (Year, Month, Day, Hour, Weekday), calculating the pickup-to-dropoff distance with the Haversine formula, and including passenger count as a feature.

  4. Exploratory Data Analysis (EDA) – Visualizing distributions, relationships, and outliers to understand the dataset.

  5. Model Building – Implementing multiple regression algorithms:

    • Linear Regression
    • Decision Tree Regression
    • Random Forest Regression
    • Gradient Boosting Regression
  6. Model Evaluation – Comparing models using metrics like RMSE (Root Mean Squared Error) and R² Score to select the best-performing model.
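
The Haversine distance mentioned in step 3 can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `haversine_km`, the argument order, and the Earth radius constant (6371 km, a commonly used mean value) are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine term; clamp to 1.0 to guard against floating-point overshoot.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude along a meridian is about 111.19 km with this radius.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
```

For a full dataset, the same formula would typically be applied column-wise with NumPy instead of row by row.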


📊 Dataset Description

The dataset contains the following columns:

| Column Name       | Description                                 |
|-------------------|---------------------------------------------|
| pickup_datetime   | Date and time when the ride started         |
| pickup_latitude   | Latitude coordinate of the pickup location  |
| pickup_longitude  | Longitude coordinate of the pickup location |
| dropoff_latitude  | Latitude coordinate of the dropoff location |
| dropoff_longitude | Longitude coordinate of the dropoff location|
| passenger_count   | Number of passengers on the ride            |
| fare_amount       | Target variable: fare of the ride (in USD)  |

The dataset contains over 5 million records, making it suitable for building robust machine learning models.
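
As a rough sketch of the cleaning rules applied to these columns (missing values, negative fares, impossible coordinates), a row-level validity check might look like the code below. The function name `is_valid_ride` and the exact thresholds are assumptions for illustration; the notebook may use different limits, for example a bounding box around New York City.

```python
REQUIRED_COLUMNS = (
    "pickup_datetime", "pickup_latitude", "pickup_longitude",
    "dropoff_latitude", "dropoff_longitude", "passenger_count", "fare_amount",
)


def is_valid_ride(row):
    """Return True if a ride record (a dict keyed by column name) passes basic checks."""
    # Reject rows with any missing value.
    if any(row.get(col) is None for col in REQUIRED_COLUMNS):
        return False
    # Fares must be positive and at least one passenger must be present (assumed rules).
    if row["fare_amount"] <= 0 or row["passenger_count"] < 1:
        return False
    # Latitudes and longitudes must lie within their physically possible ranges.
    for col in ("pickup_latitude", "dropoff_latitude"):
        if not -90.0 <= row[col] <= 90.0:
            return False
    for col in ("pickup_longitude", "dropoff_longitude"):
        if not -180.0 <= row[col] <= 180.0:
            return False
    return True


# Made-up example record, for illustration only.
sample = {
    "pickup_datetime": "2009-06-15 17:26:21 UTC",
    "pickup_latitude": 40.72, "pickup_longitude": -73.84,
    "dropoff_latitude": 40.71, "dropoff_longitude": -73.84,
    "passenger_count": 1, "fare_amount": 4.5,
}
keep = is_valid_ride(sample)
```

With pandas, the same conditions could be written as a boolean mask over the DataFrame columns.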


⚙️ Installation & Setup

Follow these steps to run the project locally:

  1. Clone the repository:
       git clone https://github.com/aditya2005-code/Cab_Fare_Prediction.git
       cd Cab_Fare_Prediction
  2. Install dependencies:
       pip install -r requirements.txt
  3. Start Jupyter Notebook:
       jupyter notebook
  4. Open the project notebook to train models, visualize data, and make predictions.

🛠 Features Implemented

  • Extract Year, Month, Day, Hour, and Weekday from datetime
  • Calculate Haversine distance between pickup and dropoff points
  • Handle missing values and remove outliers
  • Compare multiple regression models
  • Evaluate models using RMSE and R² Score
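
The datetime features listed above can be illustrated with the standard library. This sketch assumes that `pickup_datetime` is a string such as `2009-06-15 17:26:21 UTC`; the helper name `datetime_features` and the Monday = 0 weekday convention are choices made for this example, not necessarily those of the notebook.

```python
from datetime import datetime

# Assumed timestamp layout, with a literal " UTC" suffix.
PICKUP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def datetime_features(pickup_datetime):
    """Split a pickup timestamp string into the calendar features used by the model."""
    ts = datetime.strptime(pickup_datetime, PICKUP_FORMAT)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0, Sunday = 6
    }


features = datetime_features("2009-06-15 17:26:21 UTC")
# features == {"year": 2009, "month": 6, "day": 15, "hour": 17, "weekday": 0}
```

In pandas, the equivalent would usually go through `pd.to_datetime` and the `.dt` accessor.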

📈 Model Performance

| Model                       | RMSE | R² Score |
|-----------------------------|------|----------|
| Linear Regression           | 5.12 | 0.71     |
| Decision Tree Regressor     | 4.75 | 0.76     |
| Random Forest Regressor     | 3.92 | 0.82     |
| Gradient Boosting Regressor | 3.85 | 0.83     |

Observation: Ensemble models like Random Forest and Gradient Boosting outperform simple linear regression by capturing complex patterns in the data.
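
For reference, the two metrics in the table can be computed as sketched below. This is a plain-Python illustration of the definitions; the notebook most likely relies on Scikit-learn's `mean_squared_error` and `r2_score` instead. The function names and the sample values are made up for this example and are not project results.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """R² Score: 1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up fares (USD) and predictions, for illustration only.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
example_rmse = rmse(actual, predicted)      # about 1.63
example_r2 = r_squared(actual, predicted)   # 0.96, up to floating-point rounding
```

A lower RMSE means smaller typical errors in dollars, while an R² Score closer to 1 means the model explains more of the variance in fares.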


🔗 Dataset Source

This project uses the NYC Taxi Fare Prediction dataset, available on Kaggle.


💻 Technologies & Libraries Used

  • Python – Programming language
  • Pandas & NumPy – Data manipulation and numerical operations
  • Matplotlib & Seaborn – Data visualization
  • Scikit-learn – Machine learning models and evaluation metrics

🔮 Future Enhancements

  • Deploy the model as a web application using Streamlit or Flask
  • Include real-time fare prediction using live GPS coordinates
  • Experiment with deep learning models for improved accuracy
  • Incorporate traffic, weather, and surge pricing data for more realistic predictions

👨‍💻 Author

Aditya
B.Tech in Computer Science (Specialization: Data Science)


About

This repository contains the code for a machine learning model that predicts cab fares.
