Data Analysis for E-Commerce Challenge

My fourth project after completing a course in DQLab about fundamental SQL. I have completed the modules of fundamental SQL use SELECT statement, FUNCTION, GROUP BY, INNER JOIN, UNION and Having. This project is provided by DQLab and the database used is MySQL.

DATA USED IN THE PROJECT

The dataset used is data from the DQLab Store which is an e-commerce where buyers and sellers meet each other. Users can buy goods from other users who sell. Each user can be both a buyer and a seller.

There are 4 tables used in this project:

1. users table: contains details of data of user. Consists of:
• user_id: user ID
• nama_user: username
• kodepos: postcode of the user’s main address
• email: user email
2. products table: contains details of products sold. Consists of:
• product_id: product ID
• desc_product: product name
• category: product category
• base_price: original price of the product
3. orders table: contains purchase transactions from buyers to sellers. Consists of:
• order_id: transaction ID
• seller_id: seller ID
• buyer_id: buyer ID
• kodepos: postcode of the transaction delivery address (can be different from the main address)
• subtotal: total price before discount
• discount: a discount from the transaction
• total: total price after discount (paid by the buyer)
• created_at: transaction date
• paid_at: paid date
• delivery_at: delivery date
4. order_details table: contains detail of items purchased. Consists of:
• order_detail_id: ID of this table
• order_id: transaction ID
• product_id: product ID
• price: price of each product
• quantity: the number of items purchased from each product

SYNTAX

The syntax used in this project:

• SELECT is used to select data from the database
• DISTINCT is used to select only distinct (different) values or no duplicate values
• WHERE is used to filter data or extract only data that fulfill a specified condition
• ORDER BY is used to sort the result set in ascending or descending order
• HAVING was added to SQL because the WHERE keyword could not be used with aggregate functions
• COUNT is used to count the number of rows
• LIMIT is used to specify the number of records to return
• SUM is used to sum of a numeric columns
• AND is used to filter data based on more than one condition and displays a data if all conditions are fulfilled
• BETWEEN is used to selects values within a given range
• AS is used to give temporary names to a table or a column in the table
• INNER JOIN is used to selects records that have matching values in both tables
• GROUP BY is used to group rows that have the same value into summary rows
• DESC is used to sort data in descending order
• DATEDIFF is used to returns the number of days between two date values
• EXTRACT is used to extracts a part from a given date
• YEAR is used to filter years in dates
• MONTH is used to filter months in dates
• SUBSTRING_INDEX is used to returns a substring of a string before a specified number of delimiter occurs

SHORT TASK

1. Which statement is true about products data
A. There are 4 columns in products data
B. There are 1,145 rows in products data
C. There are 13 types of product categories
D. There is 1 variable that has a NULL/empty value

answer: A and B

#to see the columns in the products table
DESCRIBE products;
Figure 1. Output result for products data

2. Which statement is true about orders data
A. There are 10 columns in orders data
B. There are 74,874 rows in orders data
C. There are 3 variables that have NULL/empty values
D. There are 3 variables that contain amount data(rupiah)
E. There are 3 variables that contain date data

answer: A, B, D, and E

#to see the columns in the orders table
DESCRIBE orders;
Figure 2. Output results for orders data

3. Which statement is true about monthly summary
A. There are 4,327 transactions in September 2019
B. There are 10,131 transactions in November 2019
C. There are 5,062 transactions in January 2020
D. There are 7,323 transactions in March 2020
E. There are 10,026 transactions in May 2020

answer: A, C, D and E

SELECT EXTRACT(YEAR_MONTH FROM created_at) AS date,
COUNT(order_id) AS total_transaction
FROM orders
GROUP BY date
ORDER BY date;
Figure 3. Output result for monthly transaction

4. Which statement is true about summary of transaction data
A. There are 5,046 unpaid transactions
B. There are 5,046 paid transactions but not send
C. There are 9,790 unsend transactions, whether paid or not
D. There are 4,744 transactions that send on the same day as the paid date

answer: A and C

#to see unpaid transactions
SELECT COUNT(order_id) AS total_transaction
FROM orders
WHERE paid_at IS NULL;
Figure 4. Output result for transaction data

5. Which statement is true about summary of user and transaction data
A. There are 17,877 total users
B. There are 17,877 users who transact as buyers
C. There are 69 users who transact as sellers
D. There are 69 users who transact as buyers and sellers
E. There are 69 users who never transact as buyers or sellers

answer: B, C and D

#to see the total users
SELECT COUNT(DISTINCT user_id) AS user
FROM users;
Figure 5. Output result for user data

6. From the list of user_id and username below, which are the 5 buyers with the highest total amount of purchases (based on the total price of items after discount)
A. 14411, Jaga Puspasari
B. 10977, Lukita Dabukke
C. 1251, Kartika Habibi
D. 15915, Sutan Agus Ardianto, S.Kom
E. 10355, Kartika Habibi

answer: A, D, and E

SELECT user_id, 
nama_user,
SUM(total) AS total
FROM users
INNER JOIN orders
ON user_id = buyer_id
GROUP BY user_id, nama_user
ORDER BY total DESC
LIMIT 5;
Figure 6. Output result for 5 users with the highest total amount of purchases

7. From the list of usernames below, which are the 5 buyers with the most transactions who have never used a discount when purchased the items.
A. 10977, Lukita Dabukke
B. 1251, Kartika Habibi
C. 12476, Yessi Wibisono
D. 696, Kayla Astuti
E. 5620, Cakrawangsa Habibi

answer: C and E

SELECT user_id, 
nama_user,
COUNT(order_id) AS total
FROM users
INNER JOIN orders
ON user_id = buyer_id
WHERE discount = 0
GROUP BY user_id, nama_user
ORDER BY total DESC
LIMIT 10;
Figure 7. Output result for users who have never used a discount

8. From the list of domains below, which are email domains from sellers at DQLab Store
A. cv.web.id
B. cv.com
C. cv.net.id
D. pt.net.id
E. ud.co.id

answer: A and D

SELECT DISTINCT(SUBSTRING_INDEX(email, ‘@’, -1)) AS email
FROM users
INNER JOIN orders
ON user_id = seller_id
WHERE user_id = seller_id
ORDER BY email;
Figure 8. Output result for email sellers

9. From the list of product names below, which are the top 5 products purchased in December 2019 based on the total quantity
A. QUEEN CEFA BRACELET LEATHER
B. ANNA FAITH LEGGING GLOSSY
C. Ajinomoto Bumbu Nasi Goreng Sajiku Ayam 20G
D. EMBA SHORT PANT INATH TWO
E. ANNA FAITH LEGGING GLOSSY

answer: A, B and E

SELECT a.desc_product AS name, 
SUM(b.quantity) AS quantity
FROM products AS a
INNER JOIN order_details AS b
ON a.product_id = b.product_id
INNER JOIN orders AS c
ON b.order_id = c.order_id
WHERE c.created_at BETWEEN ‘2019–12–01’ AND ‘2019–12–31’
GROUP BY name
ORDER BY quantity DESC
LIMIT 5;
Figure 9. Output result for the top 5 products in December 2019

CREATE SQL

  1. The 10 highest transactions of users 12476

Show table of seller_id, buyer_id, nilai_transaksi, and tanggal_transaksi.

SELECT seller_id, 
buyer_id,
total AS nilai_transaksi,
created_at AS tanggal_transaksi
FROM orders
WHERE buyer_id = 12476
ORDER BY 3 desc
LIMIT 10;
Figure 10. Output result for the 10 highest transactions of users 12476

2. Transactions per month in 2020

Show table of tahun_bulan, jumlah_transaksi, and total_nilai_transaksi.

SELECT EXTRACT(YEAR_MONTH FROM created_at) AS tahun_bulan, 
COUNT(1) AS jumlah_transaksi,
SUM(total) AS total_nilai_transaksi
FROM orders
WHERE created_at >= ‘2020–01–01’
GROUP BY 1
ORDER BY 1;
Figure 11. Output result for transactions per month in 2020

3. Buyer with the highest average transaction in January 2020

Show 10 buyers with the highest average transaction who transacted at least 2 times in January 2020. Show table of buyer_id, jumlah_transaksi, dan avg_nilai_transaksi.

SELECT buyer_id, 
COUNT(1) AS jumlah_transaksi,
AVG(total) AS avg_nilai_transaksi
FROM orders
WHERE created_at >= ‘2020–01–01’ AND created_at<’2020–02–01'
GROUP BY 1
HAVING COUNT(1) >= 2
ORDER BY 3 DESC
LIMIT 10;
Figure 12. Output result for user with the highest average transaction in January 2020

4. Big transaction in December 2019

Show all transactions with a minimum value of 20,000,000 in December 2019. Show table of nama_pembeli, nilai transaksi and tanggal_transaksi, sort alphabetically by buyer.

SELECT nama_user AS nama_pembeli, 
total AS nilai_transaksi,
created_at AS tanggal_transaksi
FROM orders
INNER JOIN users
ON buyer_id = user_id
WHERE created_at >= ‘2019–12–01’
AND created_at <’2020–01–01'
AND total >= 20000000
ORDER BY 1;
Figure 13. Output result for big transactions in December 2019

5. Best seller product category in 2020

Show 5 categories with the highest total quantity in 2020, only for transactions that have been sent to buyers. Show table of category, total_quantity, and total_price.

SELECT category, 
SUM(quantity) AS total_quantity,
SUM(price) AS total_price
FROM orders
INNER JOIN order_details USING (order_id)
INNER JOIN products USING (product_id)
WHERE created_at >= ‘2020–01–01’
AND delivery_at IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
Figure 14. Output results for the best seller product category in 2020

6. Buyer with high value

Show buyers who have transacted more than 5 times and each transaction more than 2,000,000. Show table of nama_pembeli, jumlah_transaksi, total_nilai_transaksi, and min_nilai_transaksi, sort by the highest total_nilai_transaksi.

SELECT nama_user AS nama_pembeli, 
COUNT(1) AS jumlah_transaksi,
SUM(total) AS total_nilai_transaksi,
MIN(total) AS min_nilai_transaksi
FROM orders
INNER JOIN users
ON buyer_id = user_id
GROUP BY user_id, nama_user
HAVING COUNT(1) > 5
AND MIN(total) > 2000000
ORDER BY 3 DESC;
Figure 15. Output result for high value

7. Dropshipper

Show buyers with 10 or more transactions whose shipping address is different for each transaction. Show table of nama_pembeli, jumlah_transaksi, distinct_kodepos, total_nilai_transaksi, and avg_nilai_transaksi, sort by the highest number of transactions.

SELECT nama_user AS nama_pembeli,
COUNT(1) AS jumlah_transaksi,
COUNT(DISTINCT orders.kodepos) AS distinct_kodepos,
SUM(total) AS total_nilai_transaksi,
AVG(total) AS avg_nilai_transaksi
FROM orders
INNER JOIN users
ON buyer_id = user_id
GROUP BY user_id, nama_user
HAVING COUNT(1) >= 10
AND COUNT(1) = COUNT(DISTINCT orders.kodepos)
ORDER BY 2 DESC;
Figure 16. Output result for dropshipper

8. Offline reseller

Show buyers with 8 or more transactions whose shipping address is the same as the main shipping address and the average total quantity per transaction is more than 10. Show table of nama_pembeli, jumlah_transaksi, total_nilai_transaksi, avg_nilai_transaksi, and avg_quantity_per_transaksi, sort by the highest total_nilai_transaksi.

SELECT nama_user AS nama_pembeli,
COUNT(1) AS jumlah_transaksi,
SUM(total) AS total_nilai_transaksi,
AVG(total) AS avg_nilai_transaksi,
AVG(total_quantity) AS avg_quantity_per_transaksi
FROM orders
INNER JOIN users
ON buyer_id = user_id
INNER JOIN (
SELECT order_id,
SUM(quantity) AS total_quantity
FROM order_details
GROUP BY 1
) AS summary_order
USING (order_id)
WHERE orders.kodepos = users.kodepos
GROUP BY user_id, nama_user
HAVING COUNT(1) >= 8
AND AVG(total_quantity) > 10
ORDER BY 3 DESC;
Figure 17. Output results for offline reseller

9. Buyer and seller at once

Show the seller who has transacted as a buyer at least 7 times. Show table of nama_pengguna, jumlah_transaksi_beli, and jumlah_transaksi_jual, sort alphabetically by nama_pengguna.

SELECT nama_user AS nama_pengguna,
jumlah_transaksi_beli,
jumlah_transaksi_jual
FROM users
INNER JOIN (
SELECT buyer_id,
COUNT(1) AS jumlah_transaksi_beli
FROM orders
GROUP BY 1
) AS buyer
ON buyer_id = user_id
INNER JOIN (
SELECT seller_id,
COUNT(1) AS jumlah_transaksi_jual
FROM orders
GROUP BY 1
) AS seller
ON seller_id = user_id
WHERE jumlah_transaksi_beli >= 7
ORDER BY 1;
Figure 18. Output result for buyer and seller

10. Transaction time

Show the average time from transaction made to payment. Show table of tahun_bulan, jumlah_transaksi, avg_lama_dibayar, min_lama_dibayar, and max_lama_dibayar, group by month and sort by tahun_bulan.

SELECT EXTRACT(YEAR_MONTH FROM created_at) AS tahun_bulan,
COUNT(1) AS jumlah_transaksi,
AVG(DATEDIFF(paid_at, created_at)) AS avg_lama_dibayar,
MIN(DATEDIFF(paid_at, created_at)) min_lama_dibayar,
MAX(DATEDIFF(paid_at, created_at)) max_lama_dibayar
FROM orders
WHERE paid_at is NOT NULL
GROUP BY 1
ORDER BY 1;
Figure 19. Output result for transaction time

Be happy

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store