1. What is SQL?
Definition
SQL stands for Structured Query Language. It is used to store, read, filter, combine, summarize, and manage data inside relational databases.
Think of a database as a well-organized cupboard, tables as shelves, rows as individual items, and columns as labels describing each item. SQL is how we ask the cupboard: "Show me only the items I need."
Why Data Science Students Need SQL
- Most company data lives in databases.
- SQL helps you extract the exact data needed for analysis.
- It is a common interview skill for Data Analyst, Data Scientist, BI, and Data Engineer roles.
- It reduces dependency on manual Excel exports.
SQL Is Used For
- Finding top customers by revenue.
- Calculating monthly sales growth.
- Combining customer, order, and product tables.
- Preparing clean datasets for Python, Tableau, or Power BI.
2. Database Basics
Table
A table is data arranged in rows and columns, similar to an Excel sheet.
Row
A row is one complete record. Example: one customer or one order.
Column
A column is one property of every row. Example: name, city, amount.
Primary Key
A column that uniquely identifies each row. Example: customer_id.
Common SQL Data Types
| Data Type | Meaning | Example |
|---|---|---|
| INT | Whole number | 101, 5000 |
| DECIMAL | Number with decimals | 499.99, 75.50 |
| VARCHAR | Text | 'Hyderabad', 'Laptop' |
| DATE | Date value | '2026-05-05' |
| BOOLEAN | True or false value | TRUE, FALSE |
3. Sample Dataset Used in These Notes
We will use a simple e-commerce dataset. This makes SQL easy to understand because every table has a clear business meaning.
customers
customer_id - primary key
name
city
signup_date
orders
order_id - primary key
customer_id - links to customers
order_date
amount
status
products
product_id - primary key
product_name
category
price
Relationship Diagram
customers
customer_id
orders
customer_id
product_id
products
product_id
4. SELECT Queries
Definition
SELECT is used to choose which columns you want to see from a table.
-- Basic structure SELECT column1, column2 FROM table_name;
Example 1: Show All Customers
SELECT * FROM customers;
Example 2: Show Selected Columns
SELECT name, city FROM customers;
Important Beginner Tip
Avoid using SELECT * in real projects when you need only a few columns. Selecting unnecessary columns makes queries slower and harder to read.
5. Filtering Data with WHERE
WHERE is used to keep only rows that match a condition. In data analysis, filtering is one of the most common operations.
-- Customers from Hyderabad SELECT name, city FROM customers WHERE city = 'Hyderabad';
| Operator | Meaning | Example |
|---|---|---|
| = | Equal to | city = 'Hyderabad' |
| != or <> | Not equal to | status != 'Cancelled' |
| > | Greater than | amount > 1000 |
| BETWEEN | Within a range | amount BETWEEN 500 AND 2000 |
| IN | Matches any value in a list | city IN ('Hyderabad','Pune') |
| LIKE | Pattern matching | name LIKE 'A%' |
-- Orders above 2000 rupees SELECT order_id, amount FROM orders WHERE amount > 2000;
-- Orders placed in 2026
SELECT order_id, order_date, amount
FROM orders
WHERE order_date BETWEEN '2026-01-01'
AND '2026-12-31';
6. Aggregation: Turning Rows into Summary
Definition
Aggregation means combining many rows to create one useful summary value, such as total sales, average order amount, or number of customers.
| Function | Use | Example Question |
|---|---|---|
| COUNT() | Counts rows | How many orders? |
| SUM() | Adds values | Total revenue? |
| AVG() | Average value | Average order amount? |
| MIN() | Smallest value | Lowest price? |
| MAX() | Largest value | Highest sale? |
-- Overall sales summary
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS average_order_value,
MAX(amount) AS highest_order
FROM orders
WHERE status = 'Delivered';
GROUP BY
GROUP BY creates separate summaries for each group. For example, total sales by city or month.
-- Total sales by city
SELECT
c.city,
SUM(o.amount) AS total_sales
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.city
ORDER BY total_sales DESC;
WHERE vs HAVING
WHERE
Filters rows before grouping.
HAVING
Filters grouped summary results.
-- Cities with sales above 100000 SELECT c.city, SUM(o.amount) AS total_sales FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.status = 'Delivered' GROUP BY c.city HAVING SUM(o.amount) > 100000;
7. JOINs: Combining Tables
Definition
A JOIN combines rows from two or more tables using a related column, usually a primary key and foreign key.
matching rows
all left rows
all right rows
all rows
INNER JOIN Example
-- Show customer name with their order amount
SELECT
c.name,
c.city,
o.order_id,
o.amount
FROM customers c
INNER JOIN orders o
ON c.customer_id = o.customer_id;
LEFT JOIN Example
Use LEFT JOIN when you want all customers, even customers who have not placed any order.
SELECT
c.name,
o.order_id,
o.amount
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;
Interview Tip
If a question says "include customers with no orders", it is usually asking for LEFT JOIN.
8. CASE: Creating Business Labels
Definition
CASE works like if-else logic. It creates a new column based on conditions.
-- Label orders by size
SELECT
order_id,
amount,
CASE
WHEN amount >= 5000 THEN 'High Value'
WHEN amount >= 1000 THEN 'Medium Value'
ELSE 'Low Value'
END AS order_segment
FROM orders;
Where CASE Is Useful
- Customer segmentation
- Risk category creation
- Pass/fail labels
- Revenue bands
-- Count orders by segment
SELECT
CASE
WHEN amount >= 5000 THEN 'High'
WHEN amount >= 1000 THEN 'Medium'
ELSE 'Low'
END AS segment,
COUNT(*) AS orders
FROM orders
GROUP BY segment;
9. Subqueries and CTEs
Subquery
A subquery is a query written inside another query. It helps answer questions in steps.
-- Orders greater than the average order amount
SELECT order_id, amount
FROM orders
WHERE amount > (
SELECT AVG(amount)
FROM orders
);
CTE: Common Table Expression
A CTE creates a temporary named result that makes complex queries easier to read.
-- Find high value customers
WITH customer_sales AS (
SELECT
customer_id,
SUM(amount) AS total_spent
FROM orders
WHERE status = 'Delivered'
GROUP BY customer_id
)
SELECT
c.name,
c.city,
cs.total_spent
FROM customer_sales cs
JOIN customers c ON cs.customer_id = c.customer_id
WHERE cs.total_spent > 50000
ORDER BY cs.total_spent DESC;
10. Window Functions
Definition
Window functions calculate values across related rows without collapsing them into one row like GROUP BY does.
Simple Difference
GROUP BY: Returns one row per group.
Window function: Keeps every row and adds calculation beside it.
-- Rank each customer's orders by amount
SELECT
customer_id,
order_id,
amount,
RANK() OVER (
PARTITION BY customer_id
ORDER BY amount DESC
) AS order_rank
FROM orders;
-- Running total of sales by date
SELECT
order_date,
amount,
SUM(amount) OVER (
ORDER BY order_date
) AS running_sales
FROM orders
WHERE status = 'Delivered';
11. SQL for Data Cleaning
Before analysis, data usually needs cleaning. SQL can handle missing values, duplicates, inconsistent text, and date formatting.
Remove Extra Spaces
SELECT
TRIM(name) AS clean_name
FROM customers;
Standardize Text
SELECT
LOWER(city) AS city_lowercase
FROM customers;
Handle Missing Values
SELECT
name,
COALESCE(city, 'Unknown') AS city
FROM customers;
Find Duplicates
SELECT email, COUNT(*) AS total FROM customers GROUP BY email HAVING COUNT(*) > 1;
12. Practice Questions
Try these questions after reading the notes. They are arranged from beginner to interview level.
Beginner
- Show all customers from Hyderabad.
- Show all delivered orders.
- Find orders with amount greater than 3000.
- Show customer names sorted alphabetically.
Intermediate
- Find total revenue by city.
- Find average order amount by month.
- List customers who never placed an order.
- Create high, medium, low order segments.
Advanced
- Rank customers by total spending.
- Find month-over-month revenue growth.
- Find each customer's highest value order.
- Find duplicate customer emails.
Mini Project
Build a sales analysis report using SQL: total revenue, city-wise revenue, top customers, cancelled order percentage, monthly growth, and customer segments. Export the result to Power BI or Tableau for dashboarding.
Quick Revision Checklist
Before moving to visualization tools, students should be comfortable with these.