
The California House Price Prediction project uses machine learning to predict house prices from factors such as location, median income, and average rooms per household. By building a complete data pipeline, from cleaning through model evaluation, the project aims to predict prices accurately and to provide actionable insights into what drives them.
Key features:
- Data Preprocessing & Cleaning: Handle missing values, detect outliers, and normalize data for optimal model performance.
- Exploratory Data Analysis (EDA): Visualize correlations and trends to extract meaningful insights.
- Machine Learning Pipeline: Automated feature engineering, model selection, and hyperparameter tuning.
- Support for Multiple Regression Models: Compare different models to find the one with the best prediction accuracy.
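The preprocessing and model-comparison features listed above can be sketched with standard scikit-learn components. This is a minimal illustration, not the project's code: the synthetic data, the column meanings, the helper name `build_pipeline`, and the two candidate models are assumptions made for the example, and outlier handling is left out.

```python
# Minimal sketch (assumptions): synthetic data stands in for the California
# housing features; the helper name and candidate models are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_samples = 500
# Hypothetical feature columns: median income, average rooms, latitude, longitude
X = np.column_stack([
    rng.uniform(0.5, 15.0, n_samples),
    rng.uniform(1.0, 10.0, n_samples),
    rng.uniform(32.5, 42.0, n_samples),
    rng.uniform(-124.3, -114.3, n_samples),
])
y = 0.4 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.3, n_samples)

# Knock out ~5% of the values so the imputer has something to fill in
X[rng.random(X.shape) < 0.05] = np.nan


def build_pipeline(model):
    """Impute missing values with the column median, standardize, then fit the model."""
    return make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)


candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

rmse_by_model = {}
for name, model in candidates.items():
    # The scorer returns negated RMSE (higher is better), so flip the sign
    scores = cross_val_score(build_pipeline(model), X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse_by_model[name] = -scores.mean()

# Print models from best (lowest RMSE) to worst
for name, rmse in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: mean CV RMSE = {rmse:.3f}")
```

Keeping the imputer and scaler inside the same pipeline as the model means they are refit on each training fold, so statistics from the validation fold do not leak into training.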
california-house-price-prediction/
│
├── data/                  # Dataset files (raw and processed)
├── Notebooks/             # Jupyter notebooks for analysis and experiments
│   └── California_house_model (1).ipynb   # Jupyter Notebook code
├── src/                   # Source code for data processing and model training
│   └── california_house_model.py          # Python code
├── requirements.txt       # Python dependencies
├── .gitignore             # Ignored files and directories
└── README.md              # Project documentation
- Clone the Repository:
git clone https://github.com/Fujelhrx/California-House-Model.git
cd California-House-Model
- Install Dependencies:
pip install -r requirements.txt
- Run the Jupyter Notebook:
jupyter notebook
- Open California_house_model.ipynb and run all cells to preprocess the data, train the model, and evaluate predictions.
- For command-line usage, run the training script:
python src/california_house_model.py
The script demonstrates how to build custom transformers by subclassing scikit-learn's BaseEstimator and TransformerMixin, for example:
- StandardScalerClone: a custom standard scaler.
- ClusterSimilarity: computes RBF kernel similarity to cluster centers.
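The two transformers named above follow scikit-learn's estimator conventions: hyperparameters are set in `__init__`, learned state gets a trailing underscore, and `fit` returns `self`, so `TransformerMixin` supplies `fit_transform` for free. The sketch below is one plausible implementation written for illustration; the parameter names (`with_mean`, `n_clusters`, `gamma`, `random_state`) and the use of k-means to find the cluster centers are assumptions, and the project's actual code may differ.

```python
# Hypothetical sketch of the two custom transformers; not the project's code.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.utils.validation import check_array, check_is_fitted


class StandardScalerClone(BaseEstimator, TransformerMixin):
    """Optionally subtract the mean, then divide by the standard deviation."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean  # hyperparameters are stored unchanged

    def fit(self, X, y=None):  # y is accepted only to match the scikit-learn API
        X = check_array(X)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)  # assumes no constant (zero-variance) columns
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self)  # raises NotFittedError if fit() was not called
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError("X has a different number of features than in fit()")
        if self.with_mean:
            X = X - self.mean_
        return X / self.scale_


class ClusterSimilarity(BaseEstimator, TransformerMixin):
    """Map each row to its RBF similarity with k-means cluster centers."""

    def __init__(self, n_clusters=10, gamma=1.0, random_state=None):
        self.n_clusters = n_clusters
        self.gamma = gamma
        self.random_state = random_state

    def fit(self, X, y=None, sample_weight=None):
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                              random_state=self.random_state)
        self.kmeans_.fit(X, sample_weight=sample_weight)
        return self

    def transform(self, X):
        # One output column per cluster: exp(-gamma * ||x - center||^2)
        return rbf_kernel(X, self.kmeans_.cluster_centers_, gamma=self.gamma)

    def get_feature_names_out(self, input_features=None):
        return [f"Cluster {i} similarity" for i in range(self.n_clusters)]


# Usage on synthetic latitude/longitude pairs (illustrative data only)
rng = np.random.default_rng(0)
coords = np.column_stack([rng.uniform(32.5, 42.0, 200),
                          rng.uniform(-124.3, -114.3, 200)])
scaled = StandardScalerClone().fit_transform(coords)
similarities = ClusterSimilarity(n_clusters=5, gamma=1.0,
                                 random_state=42).fit_transform(coords)
print(similarities.shape)  # (200, 5)
```

Because both classes inherit from `BaseEstimator`, `get_params` and `set_params` work automatically, which lets them sit inside a `Pipeline` and be tuned with a grid search.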
Planned improvements:
- Add machine learning model training and evaluation.
- Optimize the feature engineering process.
- Implement hyperparameter tuning for better prediction accuracy.
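The hyperparameter-tuning item above could start from scikit-learn's `GridSearchCV`, which cross-validates every combination in a parameter grid and refits the best one. This is one possible approach rather than the project's plan; the synthetic data, the random forest model, and the grid values are illustrative assumptions.

```python
# Hypothetical sketch: grid search over a random forest inside a pipeline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(300, 3))  # stand-in for the real features
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 0.2, 300)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_features": [1, 2, 3],
}

search = GridSearchCV(pipeline, param_grid, cv=3,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)  # 6 combinations x 3 folds, then a refit on all data

print("Best parameters:", search.best_params_)
print(f"Best CV RMSE: {-search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying them all, which keeps the cost predictable.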
We welcome contributions from the community! Here's how you can help:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a detailed explanation.
This project is licensed under the MIT License. See the LICENSE file for details.
Thanks to:
- scikit-learn: for providing the essential machine learning tools.
- California Housing Dataset: the data this project is built on.
- Open-source Contributors: For their continuous support and contributions.
We’d love to hear from you! If you have any questions, feedback or suggestions, feel free to reach out:
- Email: machhaliyafujel@gmail.com
- LinkedIn: Fujel Machhaliya
- GitHub: Fujelhrx
Q: What dataset is used?
A: The dataset is a publicly available California housing dataset that includes features such as median income, location (latitude and longitude), and the number of rooms.
Q: Can I adapt this project for another region?
A: Absolutely! Modify the preprocessing and data handling steps to fit your custom dataset.
