Back to Data Science
Arise Tech Academy
Arise Tech Academy
Learn. Build. Rise.
Data Science Notes | Module 02

SQL Notes for
Data Science

SQL is the language used to ask questions from data stored in tables. These notes are written for non-technical beginners, with definitions, diagrams, examples, and practice queries.

12
Core topics
35+
Query examples
8
Practice sets

How SQL Fits in Data Science

1. Data

Sales, users, payments, app logs

2. Database

Data stored in tables

3. SQL

Ask questions

4. Python/BI

Analyze and visualize

5. Decision

Business action

-- Example business question:
-- Which city generated the highest sales?

SELECT city, SUM(amount) AS total_sales
FROM orders
GROUP BY city
ORDER BY total_sales DESC;

1. What is SQL?

Definition

SQL stands for Structured Query Language. It is used to store, read, filter, combine, summarize, and manage data inside relational databases.

Think of a database as a well-organized cupboard, tables as shelves, rows as individual items, and columns as labels describing each item. SQL is how we ask the cupboard: "Show me only the items I need."

Why Data Science Students Need SQL

  • Most company data lives in databases.
  • SQL helps you extract the exact data needed for analysis.
  • It is a common interview skill for Data Analyst, Data Scientist, BI, and Data Engineer roles.
  • It reduces dependency on manual Excel exports.

SQL Is Used For

  • Finding top customers by revenue.
  • Calculating monthly sales growth.
  • Combining customer, order, and product tables.
  • Preparing clean datasets for Python, Tableau, or Power BI.

2. Database Basics

Table

A table is data arranged in rows and columns, similar to an Excel sheet.

Row

A row is one complete record. Example: one customer or one order.

Column

A column is one property of every row. Example: name, city, amount.

Primary Key

A column that uniquely identifies each row. Example: customer_id.

Common SQL Data Types

Data TypeMeaningExample
INTWhole number101, 5000
DECIMALNumber with decimals499.99, 75.50
VARCHARText'Hyderabad', 'Laptop'
DATEDate value'2026-05-05'
BOOLEANTrue or false valueTRUE, FALSE

3. Sample Dataset Used in These Notes

We will use a simple e-commerce dataset. This makes SQL easy to understand because every table has a clear business meaning.

customers

customer_id - primary key

name

city

signup_date

orders

order_id - primary key

customer_id - links to customers

order_date

amount

status

products

product_id - primary key

product_name

category

price

Relationship Diagram

customers

customer_id

1 : many

orders

customer_id

product_id

many : 1

products

product_id

4. SELECT Queries

Definition

SELECT is used to choose which columns you want to see from a table.

-- Basic structure
SELECT column1, column2
FROM table_name;

Example 1: Show All Customers

SELECT *
FROM customers;

Example 2: Show Selected Columns

SELECT name, city
FROM customers;

Important Beginner Tip

Avoid using SELECT * in real projects when you need only a few columns. Selecting unnecessary columns makes queries slower and harder to read.

5. Filtering Data with WHERE

WHERE is used to keep only rows that match a condition. In data analysis, filtering is one of the most common operations.

-- Customers from Hyderabad
SELECT name, city
FROM customers
WHERE city = 'Hyderabad';
OperatorMeaningExample
=Equal tocity = 'Hyderabad'
!= or <>Not equal tostatus != 'Cancelled'
>Greater thanamount > 1000
BETWEENWithin a rangeamount BETWEEN 500 AND 2000
INMatches any value in a listcity IN ('Hyderabad','Pune')
LIKEPattern matchingname LIKE 'A%'
-- Orders above 2000 rupees
SELECT order_id, amount
FROM orders
WHERE amount > 2000;
-- Orders placed in 2026
SELECT order_id, order_date, amount
FROM orders
WHERE order_date BETWEEN '2026-01-01'
                     AND '2026-12-31';

6. Aggregation: Turning Rows into Summary

Definition

Aggregation means combining many rows to create one useful summary value, such as total sales, average order amount, or number of customers.

FunctionUseExample Question
COUNT()Counts rowsHow many orders?
SUM()Adds valuesTotal revenue?
AVG()Average valueAverage order amount?
MIN()Smallest valueLowest price?
MAX()Largest valueHighest sale?
-- Overall sales summary
SELECT
    COUNT(*) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS average_order_value,
    MAX(amount) AS highest_order
FROM orders
WHERE status = 'Delivered';

GROUP BY

GROUP BY creates separate summaries for each group. For example, total sales by city or month.

-- Total sales by city
SELECT
    c.city,
    SUM(o.amount) AS total_sales
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.city
ORDER BY total_sales DESC;

WHERE vs HAVING

WHERE

Filters rows before grouping.

HAVING

Filters grouped summary results.

-- Cities with sales above 100000
SELECT c.city, SUM(o.amount) AS total_sales
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'Delivered'
GROUP BY c.city
HAVING SUM(o.amount) > 100000;

7. JOINs: Combining Tables

Definition

A JOIN combines rows from two or more tables using a related column, usually a primary key and foreign key.

INNER JOIN
matching rows
LEFT JOIN
all left rows
RIGHT JOIN
all right rows
FULL JOIN
all rows

INNER JOIN Example

-- Show customer name with their order amount
SELECT
    c.name,
    c.city,
    o.order_id,
    o.amount
FROM customers c
INNER JOIN orders o
    ON c.customer_id = o.customer_id;

LEFT JOIN Example

Use LEFT JOIN when you want all customers, even customers who have not placed any order.

SELECT
    c.name,
    o.order_id,
    o.amount
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id;

Interview Tip

If a question says "include customers with no orders", it is usually asking for LEFT JOIN.

8. CASE: Creating Business Labels

Definition

CASE works like if-else logic. It creates a new column based on conditions.

-- Label orders by size
SELECT
    order_id,
    amount,
    CASE
        WHEN amount >= 5000 THEN 'High Value'
        WHEN amount >= 1000 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS order_segment
FROM orders;

Where CASE Is Useful

  • Customer segmentation
  • Risk category creation
  • Pass/fail labels
  • Revenue bands
-- Count orders by segment
SELECT
    CASE
        WHEN amount >= 5000 THEN 'High'
        WHEN amount >= 1000 THEN 'Medium'
        ELSE 'Low'
    END AS segment,
    COUNT(*) AS orders
FROM orders
GROUP BY segment;

9. Subqueries and CTEs

Subquery

A subquery is a query written inside another query. It helps answer questions in steps.

-- Orders greater than the average order amount
SELECT order_id, amount
FROM orders
WHERE amount > (
    SELECT AVG(amount)
    FROM orders
);

CTE: Common Table Expression

A CTE creates a temporary named result that makes complex queries easier to read.

-- Find high value customers
WITH customer_sales AS (
    SELECT
        customer_id,
        SUM(amount) AS total_spent
    FROM orders
    WHERE status = 'Delivered'
    GROUP BY customer_id
)
SELECT
    c.name,
    c.city,
    cs.total_spent
FROM customer_sales cs
JOIN customers c ON cs.customer_id = c.customer_id
WHERE cs.total_spent > 50000
ORDER BY cs.total_spent DESC;

10. Window Functions

Definition

Window functions calculate values across related rows without collapsing them into one row like GROUP BY does.

Simple Difference

GROUP BY: Returns one row per group.

Window function: Keeps every row and adds calculation beside it.

-- Rank each customer's orders by amount
SELECT
    customer_id,
    order_id,
    amount,
    RANK() OVER (
        PARTITION BY customer_id
        ORDER BY amount DESC
    ) AS order_rank
FROM orders;
-- Running total of sales by date
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
    ) AS running_sales
FROM orders
WHERE status = 'Delivered';

11. SQL for Data Cleaning

Before analysis, data usually needs cleaning. SQL can handle missing values, duplicates, inconsistent text, and date formatting.

Remove Extra Spaces

SELECT
    TRIM(name) AS clean_name
FROM customers;

Standardize Text

SELECT
    LOWER(city) AS city_lowercase
FROM customers;

Handle Missing Values

SELECT
    name,
    COALESCE(city, 'Unknown') AS city
FROM customers;

Find Duplicates

SELECT email, COUNT(*) AS total
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

12. Practice Questions

Try these questions after reading the notes. They are arranged from beginner to interview level.

Beginner

  1. Show all customers from Hyderabad.
  2. Show all delivered orders.
  3. Find orders with amount greater than 3000.
  4. Show customer names sorted alphabetically.

Intermediate

  1. Find total revenue by city.
  2. Find average order amount by month.
  3. List customers who never placed an order.
  4. Create high, medium, low order segments.

Advanced

  1. Rank customers by total spending.
  2. Find month-over-month revenue growth.
  3. Find each customer's highest value order.
  4. Find duplicate customer emails.

Mini Project

Build a sales analysis report using SQL: total revenue, city-wise revenue, top customers, cancelled order percentage, monthly growth, and customer segments. Export the result to Power BI or Tableau for dashboarding.

Quick Revision Checklist

Before moving to visualization tools, students should be comfortable with these.

Arise Tech Academy
I can explain tables, rows, columns, keys, and relationships.
I can write SELECT, WHERE, ORDER BY, and LIMIT queries.
I can use COUNT, SUM, AVG, MIN, MAX with GROUP BY.
I know when to use WHERE and when to use HAVING.
I can combine tables using INNER JOIN and LEFT JOIN.
I can write CASE, CTE, and simple window function queries.