🧑‍💻: Deep_Learning/Spam Vs Ham Mail Classification [With Streamlit GUI]/Model/model1 Enhancement problem #52

Sakeebhasan123456 · 2024-10-03T10:51:32Z

Hi @UTSAVS26, I analyzed the deep learning model and found the problems listed below. I would like to work on them; please assign this issue to me.

SMS Spam Classification Project

Hidden Problems & Solutions

Identifying and addressing these problems is crucial for enhancing the performance, reliability, and usability of the SMS Spam Classification model.

| # | Problem | Description | Solution |
|---|---------|-------------|----------|
| 1 | Class Imbalance Not Addressed | The dataset contains approximately 87.4% ham and 12.6% spam messages, and no balancing technique is applied. | Apply SMOTE (Synthetic Minority Over-sampling Technique) to generate synthetic spam samples. Use class weighting in algorithms to give more importance to the spam class. |
| 2 | Limited Evaluation Metrics | Evaluation relies primarily on accuracy and precision. | Add recall, F1-score, and ROC AUC for a more comprehensive view of model performance. Use confusion matrix visualizations. |
| 3 | Single Train-Test Split Without Cross-Validation | Only a single train-test split is used for model evaluation. | Implement Stratified K-Fold Cross-Validation for more reliable performance estimates. Use `cross_val_score` to compute the metrics across folds. |
| 4 | Suboptimal Model Selection and Lack of Hyperparameter Tuning | Multiple models are used without thorough hyperparameter optimization. | Use `GridSearchCV` or `RandomizedSearchCV` for hyperparameter tuning. Explore additional models such as Logistic Regression, XGBoost, or LightGBM. |
| 5 | Use of GaussianNB with High-Dimensional Sparse Data | Gaussian Naive Bayes is applied to TF-IDF vectors, which are high-dimensional and sparse. | Prefer `MultinomialNB` or `BernoulliNB` for high-dimensional sparse data. Alternatively, use models that handle sparsity better, such as SVM or Random Forest. |
| 6 | Lack of Pipeline Integration | Preprocessing, feature extraction, and modeling steps are handled separately. | Use `sklearn.pipeline` to chain preprocessing and modeling steps, which prevents data leakage and makes the workflow easier to maintain. |
| 7 | Insufficient Text Preprocessing | Basic preprocessing steps are applied, but advanced techniques such as contraction handling are missing. | Implement contraction expansion to maintain semantic integrity. Apply spelling correction and Named Entity Recognition (NER) to enhance text quality. |
| 8 | Potential Overfitting with MultinomialNB and BernoulliNB | These models can overfit if the feature space is too large or the smoothing is too weak. | Tune the `alpha` smoothing parameter. Reduce feature dimensionality using `TruncatedSVD` or `SelectKBest`. |
| 9 | Streamlit App Dependencies Not Managed Properly | The Streamlit app comments out loading the vectorizer and model, indicating potential deployment issues. | Save and load the TF-IDF vectorizer and trained model properly using `pickle` or `joblib`. Verify file paths and manage dependencies correctly. |
| 10 | No Handling of Rare Words or Enhanced Feature Selection | TF-IDF is used without additional feature selection or handling of rare/high-frequency terms. | Tune the TF-IDF parameters (`max_df`, `min_df`, `ngram_range`) to better capture important features. Implement feature selection methods such as Chi-Squared or Mutual Information. |
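
Several of the solutions in the table (items 1, 2, 3, 6, and 10) can be combined in one place. The following is only a minimal sketch, not the code from the repository or the later pull request: the toy messages, the choice of `LogisticRegression`, and all parameter values are assumptions made for illustration. It chains TF-IDF and a class-weighted classifier in a scikit-learn `Pipeline`, then scores it with stratified cross-validation on several metrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the real SMS dataset (imbalanced on purpose).
ham = [f"see you at {hour} for lunch tomorrow" for hour in range(1, 25)]
spam = [f"win a free prize now, call {num} to claim cash" for num in range(900, 906)]
texts = ham + spam
labels = [0] * len(ham) + [1] * len(spam)  # 1 = spam, the minority class

# Vectorizer and model live in one pipeline, so TF-IDF is re-fitted on the
# training part of every fold and never sees the held-out messages.
pipeline = Pipeline([
    # min_df / max_df / ngram_range values are illustrative assumptions.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2, max_df=0.9)),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Stratified folds keep the ham/spam ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
metrics = ["precision", "recall", "f1", "roc_auc"]
scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=metrics)

for name in metrics:
    print(f"{name}: {scores[f'test_{name}'].mean():.3f}")
```

`MultinomialNB` could be swapped in for the `clf` step; it has no `class_weight` option, so the imbalance would then need to be handled another way, for example by resampling.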

✅ To be mentioned while taking the issue:

Full name: Sakeeb Hasan
Open Source Program name: GSSoC Extended

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

github-actions · 2024-10-03T10:51:41Z

🙌 Thank you for bringing this issue to our attention! We appreciate your input and will investigate it as soon as possible.

Feel free to join our community on Discord to discuss more!

Sakeebhasan123456 · 2024-10-04T12:24:44Z

Hi @UTSAVS26, please check; I have sent the pull request.

github-actions · 2024-10-04T14:28:27Z

✅ This issue has been closed. Thank you for your contribution! If you have any further questions or issues, feel free to join our community on Discord to discuss more!

github-actions bot assigned UTSAVS26 Oct 3, 2024

UTSAVS26 assigned Sakeebhasan123456 and unassigned UTSAVS26 Oct 3, 2024

UTSAVS26 added the good first issue, Contributor, Status: Assigned💻, level2, gssoc-ext, and hacktoberfest labels Oct 3, 2024

Sakeebhasan123456 mentioned this issue Oct 4, 2024

Deep_Learning/Spam Vs Ham Mail Classification #90

Merged

2 tasks

UTSAVS26 added level1 Level Update and removed level2 labels Oct 4, 2024

UTSAVS26 closed this as completed in #90 Oct 4, 2024
