Sunday, July 11, 2021

Data Preprocessing Vs. Data Wrangling

  • Data Preprocessing is performed before Data Wrangling.
  • In Data Preprocessing, the data is prepared immediately after it is received from the data source.
  • These initial transformations include Data Cleaning and any aggregation of the data.
  • It is performed before any iterative model is applied and is executed only once in the project.
  • Data Wrangling is performed during the iterative analysis and model building.
  • It is applied at the time of feature engineering.
  • The conceptual view of the dataset changes as different models are applied to arrive at a good analytic model.
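
The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records in RAW_ROWS, the column names, and the helper functions preprocess, wrangle_for_bmi and wrangle_for_log_weight are all hypothetical and do not come from any particular project. The cleaning step runs once, while the wrangling helpers stand in for the feature views that would be rebuilt on each modelling iteration.

```python
import math

# Hypothetical raw records as they might arrive from a data source.
RAW_ROWS = [
    {"id": 1, "height_cm": 170.0, "weight_kg": 65.0},
    {"id": 2, "height_cm": None, "weight_kg": 80.0},   # missing value
    {"id": 3, "height_cm": 160.0, "weight_kg": 50.0},
    {"id": 3, "height_cm": 160.0, "weight_kg": 50.0},  # duplicate record
]


def preprocess(rows):
    """One-time cleaning: drop rows with missing values and repeated ids."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # skip incomplete records
        if row["id"] in seen_ids:
            continue  # skip duplicates, keeping the first occurrence
        seen_ids.add(row["id"])
        cleaned.append(dict(row))
    return cleaned


def wrangle_for_bmi(rows):
    """Feature view for one hypothetical model: add a derived BMI column."""
    return [
        {**row, "bmi": row["weight_kg"] / (row["height_cm"] / 100.0) ** 2}
        for row in rows
    ]


def wrangle_for_log_weight(rows):
    """Feature view for another hypothetical model: log-transformed weight."""
    return [
        {"id": row["id"], "log_weight": math.log(row["weight_kg"])}
        for row in rows
    ]


clean_rows = preprocess(RAW_ROWS)              # Data Preprocessing: executed once
bmi_view = wrangle_for_bmi(clean_rows)         # Data Wrangling: one iteration
log_view = wrangle_for_log_weight(clean_rows)  # Data Wrangling: another iteration
```

Both views are derived from the same cleaned rows, so the cleaning step never has to be repeated when the feature set changes.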

Preparing Your Own Collected Dataset as a Benchmark Dataset for Research

1. It must be publicly available (uploaded to an online portal and accessible without requiring any permission, i.e. OPEN ACCESS)

2. The dataset must address a specific problem / task (CLASSIFICATION / REGRESSION / CLUSTERING) or target a specific family of methods (ENSEMBLES / DECISION TREES)

3. The dataset should not be generic for all algorithms; it should not be an all-rounder

4. It should preferably be standardized (scaled using the standard deviation so that features have consistent variance; standard statistical formulations should be used)
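
One common reading of "standard deviation based" standardization is the z-score, z = (x - mean) / std, which gives each feature zero mean and unit variance. The sketch below assumes that interpretation; the standardize helper and the heights_cm sample are hypothetical, and the population standard deviation is an assumed choice (the sample standard deviation is an equally valid convention).

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population, not sample, std
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(value - mean) / std for value in values]


# Hypothetical feature column.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights_cm)
```

After this transformation the column has a mean of 0 and a standard deviation of 1, so features measured in different units become comparable.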