Data Normalization

How and Why?

  • Normalization minimizes data redundancy by removing duplicate records, because duplicates can bias the model.
  • Normalization keeps data consistent across the database. Without it, if the same data is duplicated across multiple records and one copy is updated or deleted, it is easy to miss updating or deleting the corresponding data in other places.
  • Data should not be mislabeled, otherwise it can cause wrong predictions. For example, if you add the text "I am happy" to the sadness class, the model will learn to predict it as sadness, which defeats our purpose.
  • Data should match the features of the dataset. It is meaningless to keep a mobile number or website link in text such as "I received a missed call from +919887961384." or "the site of Google, ' https://www.google.com/ ', is not opening on my mobile."
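
The duplicate-removal and cleanup points above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function names clean_text and drop_duplicates, the <URL> and <PHONE> placeholder tokens, and the two regular expressions are all assumptions made for this example, and the phone pattern is deliberately loose (an optional "+" followed by at least nine digits, spaces, or hyphens).

```python
import re

# Assumed patterns for this sketch; real data may need stricter rules.
URL_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def clean_text(text):
    """Replace URLs and phone numbers with placeholder tokens."""
    # URLs go first so that digits inside a link are not treated as a phone number.
    text = URL_RE.sub("<URL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()


def drop_duplicates(rows):
    """Keep the first occurrence of each (text, label) pair, preserving order.

    Texts are compared case-insensitively and ignoring surrounding spaces.
    """
    seen = set()
    unique = []
    for text, label in rows:
        key = (text.strip().lower(), label)
        if key not in seen:
            seen.add(key)
            unique.append((text, label))
    return unique
```

For example, clean_text("I received a missed call from +919887961384.") returns "I received a missed call from <PHONE>.", and drop_duplicates keeps only one of ("I am happy", "joy") and ("i am happy ", "joy"). Replacing with placeholder tokens rather than deleting outright is a design choice: it keeps the sentence readable while removing the specific number or link. Mislabeled examples, such as "I am happy" under sadness, cannot be caught by rules like these and still need manual review.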