Lesson 2.4 Data and Information

Makiel [Muh-Keel]



Structured and Unstructured Data

Structured Data:

Format: Structured data is highly organized and formatted in a way that is easily searchable and understood by computer systems. It follows a specific schema or model.

Examples: This includes data in relational databases (queried with SQL), spreadsheets, and any other form where data is organized into tables, rows, and columns. Each column has a specific data type and stores a particular kind of information.

Storage and Management: Because of its organization, structured data is easier to process and analyze than unstructured data, using traditional data tools and methods such as SQL queries.

Use Cases: Useful in scenarios where consistency and integrity of data are crucial, such as financial records, inventory management, and customer relationship management systems.
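To make the idea of a schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The inventory table, its column names, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical inventory table; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "sku TEXT PRIMARY KEY, name TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO inventory (sku, name, quantity) VALUES (?, ?, ?)",
    [("A-100", "Widget", 25), ("B-200", "Gadget", 3), ("C-300", "Gizmo", 0)],
)

# Because every row follows the same schema, a plain SQL query can filter it.
low_stock = conn.execute(
    "SELECT sku, quantity FROM inventory WHERE quantity < 5 ORDER BY sku"
).fetchall()
conn.close()

print(low_stock)
```

Each column has a declared type, so the query can compare quantity as a number without any parsing step.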

Unstructured Data:

Format: Unstructured data does not follow a specific format or schema. It is often text-heavy, but may contain data such as dates, numbers, and facts as well.

Examples: This includes email messages, social media posts, videos, audio recordings, web pages, and other forms of media and text.

Storage and Management: Processing and analyzing unstructured data requires more advanced methods and technologies, such as natural language processing, machine learning, and other data mining techniques.

Use Cases: Predominant in areas requiring qualitative analysis, like sentiment analysis, trend prediction, and customer feedback interpretation.
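As a small illustration of why unstructured data takes extra work, the sketch below pulls dates and numbers out of free text with regular expressions. The email text and the extract_facts helper are made up for this example, and real projects would usually rely on more robust techniques such as natural language processing.

```python
import re

# Hypothetical customer email; the text is invented for illustration.
email_body = (
    "Hi team, my order arrived on 2024-03-15 but 2 of the 10 items were damaged. "
    "Please send replacements before 2024-04-01."
)

DATE_PATTERN = r"\b\d{4}-\d{2}-\d{2}\b"

def extract_facts(text: str) -> dict:
    """Pull ISO-style dates and standalone integers out of free text."""
    dates = re.findall(DATE_PATTERN, text)
    # Remove the dates first so their digits are not counted as separate numbers.
    without_dates = re.sub(DATE_PATTERN, " ", text)
    numbers = [int(n) for n in re.findall(r"\b\d+\b", without_dates)]
    return {"dates": dates, "numbers": numbers}

facts = extract_facts(email_body)
print(facts)
```

Nothing in the text says which number is a quantity or which date is a deadline; that meaning has to be inferred, which is what makes unstructured data harder to analyze.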

Data Sources

Companies store data in multiple systems such as:

Customer Relationship Management (CRM)

Sales Records

Finance

Enterprise Resource Planning (ERP)

Customer Applications

Each of these separate systems has a slice of data about the customer. When these data are integrated, the organization has a better view of the overall customer life cycle.
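The integration described above can be sketched as follows, assuming each system can export rows that share a customer_id key. The system extracts, field names, and the build_customer_view helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from three systems; field names are assumptions.
crm = [{"customer_id": 1, "name": "Ada Lopez", "email": "ada@example.com"}]
sales = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 1, "order_total": 80.0},
]
finance = [{"customer_id": 1, "outstanding_balance": 40.0}]

def build_customer_view(crm_rows, sales_rows, finance_rows):
    """Combine per-system slices into one record per customer."""
    view = defaultdict(lambda: {"orders": 0, "lifetime_sales": 0.0})
    for row in crm_rows:
        view[row["customer_id"]].update(name=row["name"], email=row["email"])
    for row in sales_rows:
        record = view[row["customer_id"]]
        record["orders"] += 1
        record["lifetime_sales"] += row["order_total"]
    for row in finance_rows:
        view[row["customer_id"]]["outstanding_balance"] = row["outstanding_balance"]
    return dict(view)

customer_view = build_customer_view(crm, sales, finance)
print(customer_view[1])
```

No single source system holds the order count, lifetime sales, and outstanding balance together; only the combined record does.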

There are three general steps for transforming institutional knowledge into implementable data solutions:

Capturing

Analyzing

Using

Data Hygiene

Data Hygiene refers to the processes of ensuring the cleanliness of data (i.e., that the data is relatively error-free).

Dirty Data

Dirty Data is data that contains errors. Common causes include duplicate records, incomplete or outdated data, and mistakes introduced as data is entered, stored, and managed.

Data quality is crucial to operational and transactional processes within an organization and to the reliability of analytical reporting.
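A minimal sketch of how two of these problems, duplicate and incomplete records, might be flagged. The records, the choice of email as the duplicate key, and the find_dirty_records helper are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical customer records; values are invented for illustration.
records = [
    {"id": 1, "email": "ada@example.com", "phone": "555-0100"},
    {"id": 2, "email": "ada@example.com", "phone": "555-0100"},  # duplicate email
    {"id": 3, "email": "", "phone": "555-0199"},  # incomplete: no email
]

def find_dirty_records(rows, required=("email", "phone")):
    """Return ids of rows with a repeated email or an empty required field."""
    email_counts = Counter(r["email"] for r in rows if r["email"])
    duplicates = [r["id"] for r in rows if email_counts[r["email"]] > 1]
    incomplete = [r["id"] for r in rows if any(not r.get(f) for f in required)]
    return {"duplicates": duplicates, "incomplete": incomplete}

report = find_dirty_records(records)
print(report)
```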

Data Scrubbing

Data Scrubbing is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.

Typically, the process involves updating, standardizing, and de-duplicating records to create a single view of the data, even if it is stored in multiple systems.
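The updating, standardizing, and de-duplicating steps can be sketched like this. The scrub helper, the sample rows, and the rule of keeping the first record per email are illustrative assumptions, not a prescribed method.

```python
def scrub(rows):
    """Standardize formatting, drop rows without an email, and de-duplicate."""
    cleaned = {}
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # incomplete record: no usable key, so it is removed
        # Collapse extra whitespace and normalize capitalization in names.
        name = " ".join(row["name"].split()).title()
        # Keep the first record seen for each standardized email.
        cleaned.setdefault(email, {"name": name, "email": email})
    return list(cleaned.values())

# Hypothetical raw rows, as they might arrive from two different systems.
raw = [
    {"name": "ada  lopez", "email": " Ada@Example.com "},
    {"name": "Ada Lopez", "email": "ada@example.com"},
    {"name": "Sam Roe", "email": ""},
]
scrubbed = scrub(raw)
print(scrubbed)
```

Standardizing before de-duplicating matters here: the first two rows only match once their emails are trimmed and lowercased.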


Quality Data

Quality Data is needed for effective decision-making. Quality data is typically defined as data that is precise, valid, reliable, timely, and complete.
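Two of these dimensions, completeness and timeliness, are easy to measure. The sketch below is one possible way to do so; the quality_report helper, the field names, and the 365-day freshness threshold are assumptions chosen for the example.

```python
from datetime import date

def quality_report(rows, required, as_of, max_age_days):
    """Share of rows that are complete, and share updated recently enough."""
    total = len(rows)
    if total == 0:
        raise ValueError("rows must not be empty")
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in rows
    )
    timely = sum((as_of - r["updated"]).days <= max_age_days for r in rows)
    return {"completeness": complete / total, "timeliness": timely / total}

# Hypothetical records; the "updated" field holds the last modification date.
rows = [
    {"email": "ada@example.com", "updated": date(2024, 5, 1)},
    {"email": "", "updated": date(2024, 5, 20)},
    {"email": "sam@example.com", "updated": date(2022, 1, 1)},
    {"email": "kim@example.com", "updated": date(2024, 4, 15)},
]
report = quality_report(
    rows, required=("email",), as_of=date(2024, 6, 1), max_age_days=365
)
print(report)
```

Precision, validity, and reliability usually need domain-specific rules, so they are left out of this sketch.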