Data Analysis for E-Commerce Challenge

Image for post
Image for post

My fourth project after completing a course in DQLab about fundamental SQL. I have completed the modules of fundamental SQL use SELECT statement, FUNCTION, GROUP BY, INNER JOIN, UNION and Having. This project is provided by DQLab and the database used is MySQL.

DATA USED IN THE PROJECT

The dataset used is data from the DQLab Store which is an e-commerce where buyers and sellers meet each other. Users can buy goods from other users who sell. Each user can be both a buyer and a seller.

There are 4 tables used in this project:

1. users table: contains details of data of user. Consists of:
• user_id: user ID
• nama_user: username
• kodepos: postcode of the user’s main address
• email: user email
2. products table: contains details of products sold. Consists of:
• product_id: product ID
• desc_product: product name
• category: product category
• base_price: original price of the product
3. orders table: contains purchase transactions from buyers to sellers. Consists of:
• order_id: transaction ID
• seller_id: seller ID
• buyer_id: buyer ID
• kodepos: postcode of the transaction delivery address (can be different from the main address)
• subtotal: total price before discount
• discount: a discount from the transaction
• total: total price after discount (paid by the buyer)
• created_at: transaction date
• paid_at: paid date
• delivery_at: delivery date
4. order_details table: contains detail of items purchased. Consists of:
• order_detail_id: ID of this table
• order_id: transaction ID
• product_id: product ID
• price: price of each product
• quantity: the number of items purchased from each product

SYNTAX

The syntax used in this project:

• SELECT is used to select data from the database
• DISTINCT is used to select only distinct (different) values or no duplicate values
• WHERE is used to filter data or extract only data that fulfill a specified condition
• ORDER BY is used to sort the result set in ascending or descending order
• HAVING was added to SQL because the WHERE keyword could not be used with aggregate functions
• COUNT is used to count the number of rows
• LIMIT is used to specify the number of records to return
• SUM is used to sum of a numeric columns
• AND is used to filter data based on more than one condition and displays a data if all conditions are fulfilled
• BETWEEN is used to selects values within a given range
• AS is used to give temporary names to a table or a column in the table
• INNER JOIN is used to selects records that have matching values in both tables
• GROUP BY is used to group rows that have the same value into summary rows
• DESC is used to sort data in descending order
• DATEDIFF is used to returns the number of days between two date values
• EXTRACT is used to extracts a part from a given date
• YEAR is used to filter years in dates
• MONTH is used to filter months in dates
• SUBSTRING_INDEX is used to returns a substring of a string before a specified number of delimiter occurs

SHORT TASK

1. Which statement is true about products data
A. There are 4 columns in products data
B. There are 1,145 rows in products data
C. There are 13 types of product categories
D. There is 1 variable that has a NULL/empty value

answer: A and B

2. Which statement is true about orders data
A. There are 10 columns in orders data
B. There are 74,874 rows in orders data
C. There are 3 variables that have NULL/empty values
D. There are 3 variables that contain amount data(rupiah)
E. There are 3 variables that contain date data

answer: A, B, D, and E

3. Which statement is true about monthly summary
A. There are 4,327 transactions in September 2019
B. There are 10,131 transactions in November 2019
C. There are 5,062 transactions in January 2020
D. There are 7,323 transactions in March 2020
E. There are 10,026 transactions in May 2020

answer: A, C, D and E

4. Which statement is true about summary of transaction data
A. There are 5,046 unpaid transactions
B. There are 5,046 paid transactions but not send
C. There are 9,790 unsend transactions, whether paid or not
D. There are 4,744 transactions that send on the same day as the paid date

answer: A and C

5. Which statement is true about summary of user and transaction data
A. There are 17,877 total users
B. There are 17,877 users who transact as buyers
C. There are 69 users who transact as sellers
D. There are 69 users who transact as buyers and sellers
E. There are 69 users who never transact as buyers or sellers

answer: B, C and D

6. From the list of user_id and username below, which are the 5 buyers with the highest total amount of purchases (based on the total price of items after discount)
A. 14411, Jaga Puspasari
B. 10977, Lukita Dabukke
C. 1251, Kartika Habibi
D. 15915, Sutan Agus Ardianto, S.Kom
E. 10355, Kartika Habibi

answer: A, D, and E

7. From the list of usernames below, which are the 5 buyers with the most transactions who have never used a discount when purchased the items.
A. 10977, Lukita Dabukke
B. 1251, Kartika Habibi
C. 12476, Yessi Wibisono
D. 696, Kayla Astuti
E. 5620, Cakrawangsa Habibi

answer: C and E

8. From the list of domains below, which are email domains from sellers at DQLab Store
A. cv.web.id
B. cv.com
C. cv.net.id
D. pt.net.id
E. ud.co.id

answer: A and D

9. From the list of product names below, which are the top 5 products purchased in December 2019 based on the total quantity
A. QUEEN CEFA BRACELET LEATHER
B. ANNA FAITH LEGGING GLOSSY
C. Ajinomoto Bumbu Nasi Goreng Sajiku Ayam 20G
D. EMBA SHORT PANT INATH TWO
E. ANNA FAITH LEGGING GLOSSY

answer: A, B and E

CREATE SQL

  1. The 10 highest transactions of users 12476

Show table of seller_id, buyer_id, nilai_transaksi, and tanggal_transaksi.

2. Transactions per month in 2020

Show table of tahun_bulan, jumlah_transaksi, and total_nilai_transaksi.

3. Buyer with the highest average transaction in January 2020

Show 10 buyers with the highest average transaction who transacted at least 2 times in January 2020. Show table of buyer_id, jumlah_transaksi, dan avg_nilai_transaksi.

4. Big transaction in December 2019

Show all transactions with a minimum value of 20,000,000 in December 2019. Show table of nama_pembeli, nilai transaksi and tanggal_transaksi, sort alphabetically by buyer.

5. Best seller product category in 2020

Show 5 categories with the highest total quantity in 2020, only for transactions that have been sent to buyers. Show table of category, total_quantity, and total_price.

6. Buyer with high value

Show buyers who have transacted more than 5 times and each transaction more than 2,000,000. Show table of nama_pembeli, jumlah_transaksi, total_nilai_transaksi, and min_nilai_transaksi, sort by the highest total_nilai_transaksi.

7. Dropshipper

Show buyers with 10 or more transactions whose shipping address is different for each transaction. Show table of nama_pembeli, jumlah_transaksi, distinct_kodepos, total_nilai_transaksi, and avg_nilai_transaksi, sort by the highest number of transactions.

8. Offline reseller

Show buyers with 8 or more transactions whose shipping address is the same as the main shipping address and the average total quantity per transaction is more than 10. Show table of nama_pembeli, jumlah_transaksi, total_nilai_transaksi, avg_nilai_transaksi, and avg_quantity_per_transaksi, sort by the highest total_nilai_transaksi.

9. Buyer and seller at once

Show the seller who has transacted as a buyer at least 7 times. Show table of nama_pengguna, jumlah_transaksi_beli, and jumlah_transaksi_jual, sort alphabetically by nama_pengguna.

10. Transaction time

Show the average time from transaction made to payment. Show table of tahun_bulan, jumlah_transaksi, avg_lama_dibayar, min_lama_dibayar, and max_lama_dibayar, group by month and sort by tahun_bulan.

Be happy

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store