USING OVERSAMPLING TO SOLVE CLASS IMBALANCE PROBLEMS WITH LARGE DATASETS

Authors

  • Pertik Garg, Jarnail Singh

Abstract

Data is an important component of decision making in any organization. Various applications now produce multimedia data in millions of bytes, and analysing that data well requires better data mining techniques that extract the relevant items from a large repository. During analysis, however, data items can be misclassified. Learning a classifier from class-imbalanced data is an important challenge: one class can contain far more records than the other. In the present study, for example, late flights have substantially fewer records than on-time flights, and this imbalance leads to poor analysis. Oversampling is presented as the most suitable technique for balancing the minority class, so that both classes end up with a comparable number of records. Performance measures such as G-mean and AUC (area under the curve) give better results on the balanced data than on the imbalanced data.
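The idea in the abstract can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority records with replacement), which may differ from the exact variant the authors used. The toy delay values, the function names `random_oversample` and `g_mean`, and the label coding (1 = late, 0 = on time) are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: one feature (delay in minutes) per flight.
rows = [[5], [7], [60], [3], [4], [75], [2], [6]]
labels = [0, 0, 1, 0, 0, 1, 0, 0]  # 1 = late (minority), 0 = on time

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)

# A classifier that always predicts "on time" is 75% accurate here,
# yet its G-mean is zero because it never detects a late flight.
always_on_time = [0] * len(labels)
baseline_g_mean = g_mean(labels, always_on_time)
```

G-mean is a natural measure here because it collapses to zero whenever one class is ignored entirely, which plain accuracy does not. In practice, oversampling is normally applied to the training split only, so that duplicated records do not leak into the evaluation data.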

Published

2024-12-08

Section

Articles