Knowing your Data

Priyanka Puppala

Jun 27, 2021

Fig. Knowing your Data

Key points:

-> Collecting and managing data effectively

-> Understanding, examining, and reporting data

-> An exploratory approach

-> Analyzing and evaluating data quality

A key element of building any machine learning model is managing and understanding the data.

Data selection is the process of understanding and defining the type of data needed and the suitable methods to retrieve it. It mainly covers the data types, sources, and techniques necessary to build a good understanding of the data, for example sales, financial, customer, or medical data.

Fig. Data Selection

Reliability and integrity issues arise during selection and decision making: whether the data is accurate, whether it was collected with an appropriate method, and how effectively it answers the questions being asked. Data selection and collection involve several concerns, such as:

Fig. Data Collection

Data description is the task of examining and reporting on the collected data, including its formats, metadata, and quantity, to check that it satisfies the business requirements.

  • By analyzing and examining the data, we can understand its complexity.
  • By describing the data, we can identify the relationships within it; data tables can be used for this.
  • Understanding the meaning and ranges of attributes.
  • Computing basic statistics (such as mean, median, mode, distribution, maximum, minimum, standard deviation, variance, and skewness).
  • While describing the data, keep in mind that qualitative and quantitative data each have different types of attributes:
Fig. Describing the data

Qualitative:

  • Ordinal
  • Nominal

Quantitative:

  • Interval
  • Ratio
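The basic statistics and attribute types above fit together: which summaries are meaningful depends on the attribute's level of measurement. Below is a minimal Python sketch, assuming the common convention that nominal data supports only counts and the mode, ordinal data adds the median and the range of categories, interval data adds the mean, variance, standard deviation, and skewness, and ratio data (which has a true zero) also supports ratios such as the coefficient of variation. The names `describe`, `skewness`, and `SCALES` are hypothetical, and skewness here is the population (Fisher-Pearson) coefficient. For tabular data, pandas' `DataFrame.describe` produces a similar numeric summary.

```python
import statistics
from collections import Counter

SCALES = ("nominal", "ordinal", "interval", "ratio")

def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def describe(values, scale, order=None):
    """Return only the summary statistics that are meaningful for `scale`.

    `order` lists ordinal categories from lowest to highest.
    """
    if scale not in SCALES:
        raise ValueError(f"scale must be one of {SCALES}")
    summary = {"count": len(values), "mode": statistics.mode(values)}
    if scale == "nominal":
        # Categories without order: only counts and the most frequent value.
        summary["frequencies"] = dict(Counter(values))
        return summary
    if scale == "ordinal":
        # Rank the categories so the median and range respect their order.
        ranks = sorted(order.index(v) for v in values)
        summary["median"] = order[statistics.median_low(ranks)]
        summary["min"], summary["max"] = order[ranks[0]], order[ranks[-1]]
        return summary
    # Interval and ratio data: arithmetic on the values is meaningful.
    summary.update(
        median=statistics.median(values),
        mean=statistics.mean(values),
        min=min(values),
        max=max(values),
        variance=statistics.variance(values),  # sample variance (n - 1)
        stdev=statistics.stdev(values),        # sample standard deviation
        skewness=skewness(values),
    )
    if scale == "ratio":
        # A true zero makes ratios such as the coefficient of variation meaningful.
        summary["cv"] = summary["stdev"] / summary["mean"]
    return summary
```

For example, `describe(["low", "high", "medium", "medium"], "ordinal", order=["low", "medium", "high"])` reports "medium" as the median category, while the same values treated as "nominal" would only yield counts and the mode.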

Describing data also involves several types of metadata:

  • Descriptive
  • Structural
  • Statistical
  • Reference
  • Administrative
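To make the five metadata types concrete, here is a small Python sketch that gathers them for a list of records. The `DatasetMetadata` class, the `build_metadata` helper, and all field names are hypothetical illustrations rather than a standard schema; structural metadata is inferred from the first record only, and statistical metadata is computed just for columns whose values are all numeric.

```python
import statistics
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    descriptive: dict = field(default_factory=dict)     # what the data is about
    structural: dict = field(default_factory=dict)      # columns and their types
    statistical: dict = field(default_factory=dict)     # summaries computed from values
    reference: dict = field(default_factory=dict)       # sources and definitions
    administrative: dict = field(default_factory=dict)  # ownership, dates, access

def build_metadata(rows, title, source, owner, created):
    """Collect the five metadata types for a list of dict records (hypothetical helper)."""
    columns = list(rows[0])
    meta = DatasetMetadata()
    meta.descriptive = {"title": title}
    # Column types are taken from the first record only.
    meta.structural = {c: type(rows[0][c]).__name__ for c in columns}
    meta.statistical = {"rows": len(rows), "columns": len(columns)}
    for c in columns:
        values = [r[c] for r in rows]
        # Summarize only columns whose values are all numbers (booleans excluded).
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            meta.statistical[c] = {
                "mean": statistics.mean(values),
                "min": min(values),
                "max": max(values),
            }
    meta.reference = {"source": source}
    meta.administrative = {"owner": owner, "created": created}
    return meta
```

Keeping the categories as separate fields makes it easy to see, for instance, that a column's type is structural metadata while its mean is statistical metadata.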
Fig. Visual Data Exploration

Data exploration is an approach similar to data analysis, in which the data is explored visually to understand it and its characteristics. Visualization carries out this exploration by presenting the data and the output of various statistical models as graphical representations such as heatmaps, box plots, and histograms. Data exploration is usually conducted using a combination of "automated" and "manual" mechanisms. The aim of all visual data exploration is to build a good understanding of the data and to analyze it.
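Visual exploration is normally done with a plotting library such as matplotlib or seaborn. As a dependency-free sketch, the Python below computes the numbers a box plot draws (quartiles, whiskers, and outliers, assuming Tukey's 1.5 × IQR rule, which is also matplotlib's default) and prints a rough text histogram. The function names are hypothetical, and `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

def box_plot_stats(values):
    """Numbers a box plot draws: quartiles, whiskers (Tukey 1.5 x IQR), outliers."""
    # "inclusive" interpolates linearly between data points, like numpy's default percentile.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

def text_histogram(values, bins=5):
    """Equal-width histogram printed as rows of '#' characters; returns bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp the maximum value into the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:8.2f} | {'#' * c}")
    return counts
```

On `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, for instance, the fences are -3.5 and 14.5, so 100 is flagged as an outlier and the upper whisker stops at 9; the histogram likewise shows nine values in the first bin and a single isolated value in the last.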

Assessing data quality is another important part of understanding data. Through this assessment, the data can be refined by removing unusable, missing, and poorly formatted records.
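A minimal Python sketch of this assessment step, assuming a hypothetical order schema (`customer_id`, `order_date` in ISO `YYYY-MM-DD` format, and `amount`) and an assumed business rule that negative amounts are unusable. Each record is either cleaned into typed values or rejected with a reason, so the rejection list doubles as a simple quality report.

```python
from datetime import datetime

REQUIRED = ("customer_id", "order_date", "amount")  # hypothetical schema

def assess_quality(records):
    """Split records into clean rows and (record, reason) rejections."""
    clean, rejected = [], []
    for rec in records:
        # Missing data: required fields that are absent or empty.
        missing = [f for f in REQUIRED if rec.get(f) in (None, "")]
        if missing:
            rejected.append((rec, f"missing: {', '.join(missing)}"))
            continue
        try:
            # Poorly formatted values fail to parse into the expected types.
            row = {
                "customer_id": str(rec["customer_id"]).strip(),
                "order_date": datetime.strptime(rec["order_date"], "%Y-%m-%d").date(),
                "amount": float(rec["amount"]),
            }
        except (ValueError, TypeError) as exc:
            rejected.append((rec, f"bad format: {exc}"))
            continue
        if row["amount"] < 0:
            # Parseable but unusable for the analysis (assumed business rule).
            rejected.append((rec, "unusable: negative amount"))
            continue
        clean.append(row)
    return clean, rejected
```

Keeping the rejected records with their reasons, rather than silently dropping them, makes the findings easier to discuss with stakeholders when deciding how to improve quality.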

Improving data quality: to enhance the quality of the data further, we can use a four-step approach:

o Defining the scope of the issues

o Exploring quality needs with key stakeholders

o Analyzing best practices for data quality using the findings from the data exploration step

o Improving quality by recommending a road map for the quality process, including technical architecture and statistical methods
