Unit 2: Intro to IT

Lesson 2.4 Data and Information

Last edited 581 days ago by Makiel [Muh-Keel].

Structured and Unstructured Data

Structured Data:

Format: Structured data is highly organized and formatted in a way that is easily searchable and understood by computer systems. It follows a specific schema or model.
Examples: This includes data in relational databases (queried with SQL), spreadsheets, and any other form where data is organized into tables, rows, and columns. Each column has a specific data type and stores a particular kind of information.
Storage and Management: Because of its organization, structured data is easier to process and analyze using traditional data tools and methods like SQL queries.
Use Cases: Useful in scenarios where consistency and integrity of data are crucial, such as financial records, inventory management, and customer relationship management systems.
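As a minimal sketch of why a fixed schema makes data easy to query, the example below builds a small in-memory table with Python's built-in sqlite3 module. The inventory table, its column names, and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical inventory table; the columns and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "sku TEXT PRIMARY KEY, name TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("A100", "Keyboard", 12), ("A200", "Mouse", 0), ("A300", "Monitor", 5)],
)

# Every row follows the same schema, so one query can filter and sort the data.
in_stock = conn.execute(
    "SELECT name, quantity FROM inventory WHERE quantity > 0 ORDER BY quantity DESC"
).fetchall()
conn.close()

print(in_stock)
```

Each column has a declared type, so the database can filter on quantity without any text parsing.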

Unstructured Data:

Format: Unstructured data does not follow a specific format or schema. It is often text-heavy, but may contain data such as dates, numbers, and facts as well.
Examples: This includes email messages, social media posts, videos, audio recordings, web pages, and other forms of media and text.
Storage and Management: Processing and analyzing unstructured data requires more advanced methods and technologies, such as natural language processing, machine learning, and other data mining techniques.
Use Cases: Predominant in areas requiring qualitative analysis, like sentiment analysis, trend prediction, and customer feedback interpretation.
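By contrast, unstructured text has no columns to query, so facts have to be pulled out of the raw content. The sketch below uses simple regular expressions on a hypothetical email body; real projects would more likely use the natural language processing or machine learning techniques mentioned above, and the patterns here are only an assumption about how the dates and counts happen to be written.

```python
import re

# Hypothetical customer email; the text is invented for illustration.
email_body = (
    "Hi team, I ordered 3 monitors on 2024-03-15 and only 2 arrived. "
    "Please send the missing one before 2024-04-01. Thanks!"
)

# Dates written as YYYY-MM-DD.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", email_body)

# Standalone whole numbers that are not part of a date.
counts = re.findall(r"(?<![\d-])\b\d+\b(?![\d-])", email_body)

print(dates)
print(counts)
```

The patterns only work because this message happens to write dates and quantities in a predictable way; a differently worded email could defeat them, which is why unstructured data generally needs more advanced tooling.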

Data Sources

Companies store data in multiple systems such as:

Customer Relationship Management (CRM)

Sales Records

Finance

Enterprise Resource Planning (ERP)

Customer Applications

Each of these separate systems has a slice of data about the customer. When these data are integrated, the organization has a better view of the overall customer life cycle.
There are three general steps for transforming institutional knowledge into implementable data solutions:
Capturing
Analyzing
Using
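The idea that each system holds a slice of customer data, and that integrating the slices gives a fuller view, can be sketched as follows. The three dictionaries stand in for captured extracts from a CRM, a sales system, and a finance system; the customer IDs, field names, and the integrate helper are all hypothetical.

```python
# Hypothetical extracts from three systems, keyed by a shared customer ID.
crm = {"C001": {"name": "Ada Lopez", "email": "ada@example.com"}}
sales = {"C001": {"orders": 4}, "C002": {"orders": 1}}
finance = {"C001": {"balance_due": 120.50}}


def integrate(*systems):
    """Merge per-customer records from each system into one combined view."""
    combined = {}
    for system in systems:
        for customer_id, fields in system.items():
            combined.setdefault(customer_id, {}).update(fields)
    return combined


customer_view = integrate(crm, sales, finance)
print(customer_view["C001"])
print(customer_view["C002"])
```

Customer C002 appears only in the sales extract, so the combined view also makes gaps in the other systems visible.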

Data Hygiene

Data Hygiene refers to the processes of ensuring the cleanliness of data (i.e., that the data is relatively error-free).

Dirty Data

Dirty Data can be caused by things such as duplicate records, incomplete or outdated data, and mistakes introduced as data is entered, stored, and managed.
Data quality is crucial to operational and transactional processes within an organization and to the reliability of analytical reporting.

Data Scrubbing

Data Scrubbing is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
Typically, the process involves updating, standardizing, and de-duplicating records to create a single view of the data, even if it is stored in multiple systems.
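A minimal sketch of the standardizing and de-duplicating steps, assuming records are matched on email address. The sample records and the scrub helper are hypothetical; real scrubbing tools typically use more robust matching rules.

```python
# Hypothetical customer records pulled from two systems; values are invented.
records = [
    {"email": " Ada@Example.com ", "phone": "555-0101"},
    {"email": "ada@example.com", "phone": ""},
    {"email": "bob@example.com", "phone": "555-0199"},
]


def scrub(rows):
    """Standardize emails, then merge rows that share the same email."""
    cleaned = {}
    for row in rows:
        # Standardize: trim whitespace and lowercase the email.
        email = row["email"].strip().lower()
        merged = cleaned.setdefault(email, {"email": email, "phone": ""})
        # Fill gaps: keep the first non-empty phone number seen.
        if row["phone"] and not merged["phone"]:
            merged["phone"] = row["phone"]
    return list(cleaned.values())


clean_records = scrub(records)
print(clean_records)
```

The two Ada records collapse into one because their emails match once formatting differences are removed, which is the single view of the data described above.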

Quality Data

Quality Data is needed for effective decision-making. Quality data is typically defined as data that is precise, valid, reliable, timely, and complete.
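Some of these attributes can be checked automatically. The sketch below tests three of them (completeness, validity, and timeliness) on a single record. The required fields, the one-year freshness threshold, and the very simple email check are assumptions chosen for illustration, not a standard definition.

```python
from datetime import date

REQUIRED_FIELDS = ("customer_id", "email", "updated")  # hypothetical schema


def quality_issues(record, today, max_age_days=365):
    """Return a list of quality problems found in one record."""
    issues = []

    # Completeness: every required field must be present and non-empty.
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        issues.append("incomplete: missing " + ", ".join(missing))

    # Validity: a deliberately simple format check.
    email = record.get("email")
    if email and "@" not in email:
        issues.append("invalid: email has no @")

    # Timeliness: the record must have been updated recently enough.
    updated = record.get("updated")
    if updated and (today - updated).days > max_age_days:
        issues.append("not timely: last updated over %d days ago" % max_age_days)

    return issues


today = date(2024, 6, 1)
good = {"customer_id": "C001", "email": "ada@example.com", "updated": date(2024, 5, 1)}
bad = {"customer_id": "C002", "email": "bob.example.com", "updated": date(2022, 1, 1)}

print(quality_issues(good, today))
print(quality_issues(bad, today))
```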

Precision

Precision is an important attribute of data. It describes how exact and detailed the data is relative to its intended use.
Data collected in healthcare, for example, often must be more precise than data collected in many other industries.
