Structured and Unstructured Data
Structured Data:
Format: Structured data is highly organized and formatted in a way that is easily searchable and understood by computer systems. It follows a specific schema or model.
Examples: This includes data in relational databases (such as those queried with SQL), spreadsheets, and any other form where data is organized into tables, rows, and columns. Each column has a specific datatype and stores a particular kind of information.
Storage and Management: Because of its organization, structured data is easier to process and analyze using traditional data tools and methods like SQL queries.
Use Cases: Useful in scenarios where consistency and integrity of data are crucial, such as financial records, inventory management, and customer relationship management systems.
Unstructured Data:
Format: Unstructured data does not follow a specific format or schema. It is often text-heavy, but may contain data such as dates, numbers, and facts as well.
Examples: This includes email messages, social media posts, videos, audio recordings, web pages, and other forms of media and text.
Storage and Management: Processing and analyzing unstructured data requires more advanced methods and technologies, such as natural language processing, machine learning, and other data mining techniques.
Use Cases: Predominant in areas requiring qualitative analysis, like sentiment analysis, trend prediction, and customer feedback interpretation.
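To make the contrast concrete, here is a minimal Python sketch. The `orders` table, its columns, and the sample email text are all illustrative assumptions and not part of any particular system. The structured half uses the standard-library `sqlite3` module, and the unstructured half uses a simple regular expression as a stand-in for the heavier text-processing tools mentioned above.

```python
import re
import sqlite3

# Structured data: a table with a fixed schema, queried with SQL.
# The table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 120.0), ("Ben", 80.0), ("Ana", 40.0)],
)
# Because every row follows the same schema, one query answers the question.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)
conn.close()

# Unstructured data: free text with no schema. Pulling out even one fact
# (a dollar amount) takes pattern matching; judging the customer's mood
# would need NLP or machine-learning tooling, which is not shown here.
email_body = "Hi team, Ana was charged $160.00 in total and is not happy about it."
amounts = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", email_body)]

print(totals)
print(amounts)
```

The query works because the schema guarantees where each value lives. The regular expression only works for text that happens to match its pattern, which is why unstructured data usually calls for more advanced techniques.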
Data Sources
Companies store data in multiple systems such as:
Customer Relationship Management (CRM)
Sales Records
Finance
Enterprise resource planning (ERP)
Customer Applications
Each of these separate systems has a slice of data about the customer. When these data are integrated, the organization has a better view of the overall customer life cycle. There are three general steps for transforming institutional knowledge into implementable data solutions:
Data Hygiene
Data Hygiene refers to the processes of ensuring the cleanliness of data (i.e., that the data is relatively error-free).
Dirty Data
Dirty Data can be caused by things such as duplicate records, incomplete or outdated data, and mistakes introduced as data is entered, stored, and managed.
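As a rough illustration of how these causes might be detected, the Python sketch below flags duplicate, incomplete, and outdated records. The record fields (`id`, `email`, `updated`), the one-year staleness threshold, and the helper name `find_dirty` are assumptions chosen for the example, not a standard.

```python
from datetime import date

# Hypothetical customer records; the field names are assumptions.
records = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "ana@example.com", "updated": date(2024, 5, 3)},   # duplicate email
    {"id": 3, "email": None, "updated": date(2024, 4, 20)},               # incomplete
    {"id": 4, "email": "ben@example.com", "updated": date(2019, 1, 15)},  # outdated
]

def find_dirty(records, today, max_age_days=365):
    """Return record ids grouped by the kind of problem detected."""
    issues = {"duplicate": [], "incomplete": [], "outdated": []}
    seen_emails = set()
    for rec in records:
        email = rec["email"]
        if email is None:
            issues["incomplete"].append(rec["id"])
        elif email in seen_emails:
            # A later record reusing an email already seen counts as a duplicate.
            issues["duplicate"].append(rec["id"])
        else:
            seen_emails.add(email)
        # Records not updated within the threshold are treated as outdated.
        if (today - rec["updated"]).days > max_age_days:
            issues["outdated"].append(rec["id"])
    return issues

report = find_dirty(records, today=date(2024, 6, 1))
print(report)
```

Passing `today` explicitly keeps the check reproducible. A real pipeline would likely choose its own matching key and staleness rule.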
Data quality is crucial to operational and transactional processes within an organization and to the reliability of analytical reporting.
Data Scrubbing
Data Scrubbing is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
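A minimal Python sketch of scrubbing might look like the following. It assumes hypothetical records with `name` and `email` fields, treats the lowercased email as the matching key, and keeps the first record seen for each key. Real systems would pick their own standardization and matching rules.

```python
def scrub(records):
    """Standardize fields, drop unusable records, and de-duplicate by email."""
    cleaned = {}
    for rec in records:
        # Standardize: trim whitespace and lowercase the email.
        email = (rec.get("email") or "").strip().lower()
        if "@" not in email:
            # Remove records whose email is missing or improperly formatted.
            continue
        # Standardize: collapse repeated spaces and apply title case.
        name = " ".join((rec.get("name") or "").split()).title()
        # De-duplicate: keep only the first record seen for each email.
        cleaned.setdefault(email, {"name": name, "email": email})
    return list(cleaned.values())

# Hypothetical raw input gathered from more than one system.
raw = [
    {"name": "  ana  LOPEZ", "email": "Ana@Example.com "},
    {"name": "Ana Lopez", "email": "ana@example.com"},   # duplicate after cleanup
    {"name": "Ben Ode", "email": "ben.example.com"},     # improperly formatted
    {"name": "cy tran", "email": "cy@example.com"},
]
clean = scrub(raw)
print(clean)
```

Standardizing before comparing is what lets the first two records be recognized as the same customer. Without it, the differing case and whitespace would hide the duplicate.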
Typically, the process involves updating, standardizing, and de-duplicating records to create a single view of the data, even if it is stored in multiple systems.
Quality Data
Quality Data is needed for effective decision-making. Quality data is typically defined as data that is precise, valid, reliable, timely, and complete.
Precision
Precision is an important attribute of data. Precision describes how exact and detailed the data is relative to its intended use.
Data collected in healthcare, for example, typically must be more precise than data collected in many other industries.