1.5. Data in AI

AI Fundamentals Crash Course is a complete basic-to-foundation-level course in which you will learn how Artificial Intelligence (AI) works and build a clear understanding of the core concepts behind modern AI tools and systems, starting from zero.

Data is the backbone of Artificial Intelligence (AI).

1. Definition

Data in AI refers to the raw information (numbers, text, images, audio, video, sensor signals, etc.) that machines use to learn, reason, and make predictions.

Without data, AI cannot recognize patterns or improve performance.

2. Types of Data in AI

Structured Data
- Organized in rows and columns (like Excel sheets).
- Examples: bank transactions, stock prices, customer details.

Unstructured Data
- No predefined schema, so it is difficult to store in tables.
- Examples: emails, images, videos, social media posts.

Semi-Structured Data
- Contains some organization but not strictly tabular.
- Examples: JSON files, XML data, log files.
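To make the three categories concrete, here is a minimal Python sketch using only the standard library. The sample records, field names, and email text are invented for illustration and do not come from a real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "customer_id,amount\n101,250.00\n102,75.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, but records may differ in shape.
json_text = '[{"user": "a", "tags": ["new"]}, {"user": "b", "age": 30}]'
records = json.loads(json_text)
keys_per_record = [sorted(record.keys()) for record in records]

# Unstructured: free text with no predefined fields; any structure
# has to be extracted (here, a crude word count).
email_body = "Hi team, the shipment arrived late again. Please advise."
word_count = len(email_body.split())

print(total)            # sum of the amount column
print(keys_per_record)  # the two JSON records expose different keys
print(word_count)
```

Notice that the CSV rows can be summed by column name directly, the JSON records have to be inspected one by one because their keys differ, and the email offers nothing but raw text to work with.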

3. Data in AI Learning

AI models require different types of datasets:

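Training Data → Used to teach the AI model; the model learns its parameters from this set.
Validation Data → Used during development to tune settings (hyperparameters) and compare model versions.
Testing Data → Held back until the end and used to check final accuracy and generalization on unseen data.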
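The three-way split described above can be sketched in a few lines of Python. The function name `split_dataset`, the 70/15/15 ratio, and the fixed seed are assumptions chosen for illustration; real projects pick ratios to suit the dataset size and often use a library helper instead.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

samples = list(range(100))  # stand-in for 100 data points
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Shuffling before slicing matters: if the data is sorted (for example by date or by class), an unshuffled split would give the model a test set that looks nothing like its training set.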

4. Qualities of Good Data

For AI to work effectively, data must be:

Accurate (free of errors).
Relevant (fits the problem being solved).
Sufficient (enough volume for training).
Diverse (represents real-world variations).
Clean (well-preprocessed, no duplicates or noise).
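As a rough illustration of what "clean" means in practice, the sketch below drops incomplete rows, normalizes text, and removes exact duplicates. The records and the `clean` helper are hypothetical, and dropping rows with missing values is only one possible policy (filling them in is another).

```python
raw_records = [
    {"age": 34, "city": "Dhaka"},
    {"age": 34, "city": "Dhaka"},      # exact duplicate
    {"age": None, "city": "Khulna"},   # missing value
    {"age": 29, "city": " sylhet "},   # inconsistent formatting
]

def clean(records):
    """Drop incomplete rows, normalize text, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Skip any row that has a missing field.
        if any(value is None for value in record.values()):
            continue
        normalized = {
            "age": record["age"],
            "city": record["city"].strip().title(),
        }
        # A hashable fingerprint of the row, used to detect duplicates.
        key = tuple(sorted(normalized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned

cleaned_records = clean(raw_records)
print(cleaned_records)
```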

5. Sources of Data for AI

Sensors and IoT devices (self-driving cars, wearables).
Databases and CRMs (business/customer data).
Web and Social Media (Twitter, YouTube, reviews).
Government and Open Data (weather, census, healthcare).
Generated Data (synthetic datasets, simulations).

6. Role of Data in AI

Enables pattern recognition (e.g., face recognition).
Powers prediction and forecasting (e.g., stock trends, weather).
Improves decision-making (e.g., medical diagnosis).
Allows personalization (e.g., Netflix, Amazon recommendations).

7. Challenges with Data in AI

Data Quality Issues: noisy, incomplete, biased data.
Privacy Concerns: handling sensitive personal information.
Bias and Fairness: AI may inherit human or systemic biases.
Data Scarcity: not enough labeled data for niche problems.
Cost of Data: collecting and cleaning is resource-intensive.
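One of these problems, skewed data, can be spotted with a very simple check: count how often each label appears. The sketch below uses made-up loan-decision labels, and `label_shares` is a hypothetical helper. An uneven label count is only one symptom of bias, so this is a first look rather than a fairness audit.

```python
from collections import Counter

def label_shares(labels):
    """Return each label's share of the dataset as a fraction."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Invented example: 90 approvals and 10 rejections.
labels = ["approved"] * 90 + ["rejected"] * 10
shares = label_shares(labels)
minority_label = min(shares, key=shares.get)

print(shares)
print(minority_label)
```

A model trained on this data could reach 90% accuracy by always answering "approved", which is why a skewed label distribution is worth catching before training.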

Metadata

Metadata means “data about data.”

It describes the characteristics, structure, or context of a dataset, helping humans and machines understand, organize, and use the data effectively.

For example:

A photo → data.
Its file size, date taken, location, and camera model → metadata.
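The same split between data and metadata can be seen on any file. The sketch below creates a small temporary stand-in file (not a real photo) and reads the metadata that the file system keeps about it. Camera model and location are typically stored in EXIF tags inside the image itself, which this sketch does not read.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a small stand-in file; its bytes are the "data".
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as handle:
    handle.write(b"\x00" * 1024)
    path = handle.name

# The file system records metadata about the file alongside its content.
info = os.stat(path)
metadata = {
    "size_bytes": info.st_size,
    "modified_utc": datetime.fromtimestamp(
        info.st_mtime, tz=timezone.utc
    ).isoformat(),
    "extension": os.path.splitext(path)[1],
}
print(metadata)

os.remove(path)  # clean up the temporary file
```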

Labeled Data vs Unlabeled Data

Labeled Data

Reference: https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data

Definition: Data that has meaningful tags or annotations (labels) attached.

Each data point is paired with the correct output or category.
Used for supervised learning in Machine Learning.

Examples:

An image of a cat labeled as “Cat”.
A medical record labeled as “Diabetes: Yes/No”.
A review tagged as “Positive” or “Negative”.

Advantages:

Provides direct guidance to AI models.
Enables accurate prediction and classification.

Disadvantages:

Costly and time-consuming to create (requires human annotators, and often domain experts).
Not always scalable for very large datasets.
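To show how labels give a model "direct guidance", here is a minimal sketch of supervised learning using a 1-nearest-neighbor rule in plain Python. The measurements, the two features (height and weight), and the `predict` function are invented for illustration; this is one simple technique, not the only way to learn from labeled data.

```python
# Labeled data: each input (height_cm, weight_kg) is paired with a label.
labeled_examples = [
    ((25.0, 4.0), "cat"),
    ((23.0, 3.5), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

def predict(features, examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(
        examples, key=lambda pair: squared_distance(features, pair[0])
    )
    return nearest[1]

prediction = predict((24.0, 4.2), labeled_examples)
print(prediction)
```

Without the "cat" and "dog" tags, the function would have nothing to return: the labels are what turn raw measurements into an answer.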

Unlabeled Data

Definition: Raw data without predefined labels.

The system only has input data, no correct answer.
Used for unsupervised learning or semi-supervised learning in Machine Learning.

Examples:

Thousands of raw images without tags.
Social media posts with no sentiment label.
Sensor readings with no outcome attached.

Advantages:

Abundant and cheaper to collect.
Useful for discovering hidden patterns and clusters.

Disadvantages:

Harder to train and evaluate AI models, because there is no correct answer to compare predictions against.
Patterns found by the algorithm still need to be interpreted to give them meaning.
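Discovering clusters in unlabeled data can be sketched with a bare-bones k-means loop on one-dimensional values. The sensor readings and the `two_means` function are hypothetical, and the sketch assumes exactly two groups and at least two distinct values.

```python
def two_means(values, iterations=10):
    """Cluster 1-D values into two groups with a basic k-means loop."""
    low, high = min(values), max(values)  # initial cluster centers
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(iterations):
        # Assign each value to the nearer center.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)

# Unlabeled sensor readings: no outcome is attached to any value.
readings = [20.1, 19.8, 20.5, 35.2, 34.9, 20.0, 35.5]
cool, warm = two_means(readings)
print(cool)
print(warm)
```

The algorithm separates the readings into two groups, but it cannot say what the groups mean. Calling them "cool" and "warm" is an interpretation added by a person afterwards.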

Instructor

Pijush Saha

Pijush Saha is a digital marketing consultant, coach, and former Google employee. He has worked in the digital marketing sector for 12 years, predominantly in performance marketing, including SEO, media buying, and web analytics.