Apache Spark & PySpark (4 Questions)
1. What is the difference between `cache()` and `persist()` in Spark? In which scenarios would you prefer one over the other?
2. Explain the differences between `map`, `flatMap`, and `mapPartitions` in PySpark with real-world use cases.
3. What are wide and narrow transformations in Spark? Give examples of each.
4. How does Spark handle data skew during shuffling, and how can you optimize skewed joins?
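For question 4, one common way to spread a skewed join key is salting: give each row on the large, skewed side a random salt in `[0, N)`, replicate each row of the smaller side N times (once per salt value), and join on the composite `(key, salt)` so a single hot key is split across up to N shuffle partitions. The sketch below models that logic in plain Python so it can be checked without a cluster; `SALT_BUCKETS`, `salted_join`, and the sample rows are hypothetical. In PySpark the same idea is usually built with `F.rand()` and `F.explode()` columns, and on Spark 3.x Adaptive Query Execution can also split skewed partitions automatically when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` are on.

```python
import random
from collections import defaultdict

SALT_BUCKETS = 4  # hypothetical salt factor; size it to how hot the skewed key is


def salted_join(large, small, buckets=SALT_BUCKETS, seed=0):
    """Inner-join two lists of (key, value) pairs using key salting.

    large: skewed side; each row gets one random salt in [0, buckets).
    small: other side; each row is replicated once per salt value.
    Returns the joined (key, large_value, small_value) rows and the salted large side.
    """
    rng = random.Random(seed)
    # Salt the large side: (key, value) -> ((key, salt), value)
    salted_large = [((k, rng.randrange(buckets)), v) for k, v in large]

    # Replicate the small side across every salt value
    exploded_small = defaultdict(list)
    for k, v in small:
        for salt in range(buckets):
            exploded_small[(k, salt)].append(v)

    # Join on the composite (key, salt); in Spark each composite key
    # can hash to a different shuffle partition
    joined = []
    for salted_key, large_value in salted_large:
        for small_value in exploded_small.get(salted_key, []):
            joined.append((salted_key[0], large_value, small_value))
    return joined, salted_large


# Hypothetical data: key "A" dominates the large side
large = [("A", i) for i in range(8)] + [("B", 100)]
small = [("A", "alpha"), ("B", "beta")]
joined, salted_large = salted_join(large, small)

# Count how the hot key's rows are spread across salt values
rows_per_salt = defaultdict(int)
for (key, salt), _ in salted_large:
    if key == "A":
        rows_per_salt[salt] += 1
print(dict(rows_per_salt))
```

The result matches an ordinary join; the cost is that the small side grows N-fold, so salting is usually applied only to keys known to be hot.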
HDFS (2 Questions)
5. How does the HDFS architecture ensure fault tolerance and data reliability?
6. Explain the role of the NameNode, DataNode, and Secondary NameNode in HDFS.
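For questions 5 and 6, the heart of HDFS fault tolerance is block replication managed by the NameNode: it records which DataNodes hold each block (default replication factor 3, set by `dfs.replication`), DataNodes report in with periodic heartbeats, and when a DataNode is declared dead the NameNode schedules new copies of its blocks from the surviving replicas. The toy model below illustrates only that bookkeeping in plain Python; `ToyNameNode` and its methods are hypothetical, and real replica placement is rack-aware rather than alphabetical.

```python
REPLICATION = 3  # HDFS default for dfs.replication


class ToyNameNode:
    """Hypothetical, simplified model of NameNode block bookkeeping."""

    def __init__(self, datanodes):
        self.live = set(datanodes)
        self.block_map = {}  # block id -> set of DataNodes holding a replica

    def add_block(self, block_id):
        # Place replicas on the first REPLICATION live nodes (real HDFS is rack-aware)
        self.block_map[block_id] = set(sorted(self.live)[:REPLICATION])

    def mark_dead(self, node):
        # Called when heartbeats from `node` stop arriving
        self.live.discard(node)
        for holders in self.block_map.values():
            holders.discard(node)
        self.re_replicate()

    def re_replicate(self):
        # Copy under-replicated blocks from a surviving replica to other live nodes
        for holders in self.block_map.values():
            if not holders:
                continue  # every replica lost: the block is missing and cannot be restored
            candidates = sorted(self.live - holders)
            while len(holders) < REPLICATION and candidates:
                holders.add(candidates.pop(0))


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4", "dn5"])
for block in ["blk_1", "blk_2"]:
    nn.add_block(block)
nn.mark_dead("dn1")
print(nn.block_map)
```

Not modeled here: the Secondary NameNode, which periodically merges the NameNode's edit log into a new fsimage checkpoint and is not a hot standby.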
AWS (3 Questions)
7. How would you build a data pipeline in AWS using S3, Glue, and Athena for batch processing?
8. Compare EMR vs Glue for running Spark jobs. When would you prefer one over the other?
9. Explain how to use AWS Lambda with S3 to trigger a data transformation pipeline.
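For question 9, an S3 event notification (for example on `ObjectCreated`) invokes the Lambda function with a payload whose `Records[].s3.bucket.name` and `Records[].s3.object.key` fields identify the new object; object keys arrive URL-encoded, so they are decoded with `urllib.parse.unquote_plus`. The sketch below covers only the event-parsing part and runs without AWS. `start_transformation` and the bucket name `raw-zone` are hypothetical: in a real pipeline the hook might call `boto3.client("glue").start_job_run(...)` or process the object inline, which is omitted here.

```python
import json
from urllib.parse import unquote_plus


def start_transformation(bucket, key):
    """Hypothetical hook: replace with a Glue job trigger or inline processing."""
    return f"s3://{bucket}/{key}"


def lambda_handler(event, context):
    """Entry point Lambda calls with a batch of S3 event records."""
    triggered = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event payload (e.g. spaces become '+')
        key = unquote_plus(record["s3"]["object"]["key"])
        triggered.append(start_transformation(bucket, key))
    return {"statusCode": 200, "body": json.dumps(triggered)}


# Minimal sample event containing only the fields the handler reads
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-zone"},
                "object": {"key": "sales/2024/day+1.csv"}}}
    ]
}
print(lambda_handler(sample_event, None))
```

Writing the transformed output to a different bucket or prefix than the one that triggers the function avoids an invocation loop.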
SQL (2 Questions)
10. Write a SQL query to find the second highest salary from an employee table.
11. What is the difference between `RANK()`, `DENSE_RANK()`, and `ROW_NUMBER()`? Give an example use case.
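For questions 10 and 11, the sketch below runs both queries against an in-memory SQLite database through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The `employee(name, salary)` table and its rows are hypothetical; the tie at 70000 shows the difference between the ranking functions: `ROW_NUMBER()` numbers every row uniquely, `RANK()` gives ties the same rank and leaves a gap after them, and `DENSE_RANK()` gives ties the same rank with no gap. Filtering on `DENSE_RANK() = 2` returns the second highest distinct salary and generalizes to the Nth highest; the `MAX(...) WHERE salary < (SELECT MAX(...))` form is a portable alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Asha", 90000), ("Ben", 70000), ("Chen", 70000), ("Dev", 50000)],
)

# Q11: the three ranking functions over the same ordering.
# The extra `name` tie-breaker only makes ROW_NUMBER deterministic for the tie.
ranking_sql = """
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
       RANK()       OVER (ORDER BY salary DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
FROM employee
ORDER BY salary DESC, name
"""
ranked = conn.execute(ranking_sql).fetchall()
for row in ranked:
    print(row)

# Q10, option 1: portable subquery
second_max = conn.execute(
    "SELECT MAX(salary) FROM employee "
    "WHERE salary < (SELECT MAX(salary) FROM employee)"
).fetchone()[0]

# Q10, option 2: DENSE_RANK treats ties as one value
second_dense = conn.execute(
    """
    SELECT DISTINCT salary FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
        FROM employee
    ) AS t
    WHERE dr = 2
    """
).fetchone()[0]
print(second_max, second_dense)
```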
Hive (2 Questions)
12. What are the differences between internal and external tables in Hive?
13. Explain static vs dynamic partitioning in Hive with an example.
Coding Questions (3 Questions with Sample Data)
14. PySpark Coding Question:
Sample Data:
+-------+---------+--------+
| empId | dept    | salary |
+-------+---------+--------+
| 101   | IT      | 60000  |
| 102   | HR      | 50000  |
| 103   | IT      | 70000  |
| 104   | HR      | 45000  |
| 105   | Finance | 65000  |
+-------+---------+--------+
Question: Write a PySpark program to find the highest-paid employee in each department.
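A common PySpark answer ranks rows within each department with a window, `Window.partitionBy("dept").orderBy(F.col("salary").desc())`, applies `F.row_number()` over it, and keeps rank 1. So that the logic can be checked without a Spark installation, the sketch below runs the equivalent window query on the sample data through Python's built-in `sqlite3`; the same SQL should also work with `spark.sql(...)` after `df.createOrReplaceTempView("employees")` (the view name `employees` is an assumption).

```python
import sqlite3

# Sample data from the question
rows = [
    (101, "IT", 60000),
    (102, "HR", 50000),
    (103, "IT", 70000),
    (104, "HR", 45000),
    (105, "Finance", 65000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empId INTEGER, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# Number employees inside each department by salary (highest first),
# then keep only the top row per department.
top_paid_sql = """
SELECT empId, dept, salary
FROM (
    SELECT empId, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
) AS t
WHERE rn = 1
ORDER BY dept
"""
top_paid = conn.execute(top_paid_sql).fetchall()
for row in top_paid:
    print(row)
```

In the DataFrame API the filter step is `df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")`, where `w` is the window above. Swapping `row_number` for `rank` keeps every employee tied for the top salary.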
15. SQL Coding Question:
Sample Data:
orders(order_id, customer_id, order_date, total_amount)
Question: Write a SQL query to get the total revenue generated per customer for the last 30 days.
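For question 15, the query filters to the date window first and then aggregates per customer. The sketch below runs it on hypothetical rows with Python's built-in `sqlite3`, using a fixed reference date so the result is reproducible; in practice the reference is the current date (`DATE('now')` in SQLite, `CURRENT_DATE` in most other engines, or `date_sub(current_date(), 30)` for the lower bound in Hive and Spark SQL). Whether "last 30 days" includes the reference day itself is a convention to confirm; this sketch keeps orders from 30 days before the reference date through the reference date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date TEXT, total_amount REAL)"
)
# Hypothetical rows; ISO-8601 date strings compare correctly as text
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-06-25", 120.0),
        (2, 10, "2024-06-10", 75.0),
        (3, 20, "2024-06-28", 200.0),
        (4, 20, "2024-04-01", 999.0),  # older than 30 days: excluded
        (5, 30, "2024-05-01", 50.0),   # older than 30 days: excluded
    ],
)

# Fixed reference date for reproducibility; use DATE('now') in practice
as_of = "2024-06-30"
revenue_sql = """
SELECT customer_id, SUM(total_amount) AS total_revenue
FROM orders
WHERE order_date >= DATE(?, '-30 days')
  AND order_date <= ?
GROUP BY customer_id
ORDER BY customer_id
"""
revenue = conn.execute(revenue_sql, (as_of, as_of)).fetchall()
print(revenue)
```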
16. Hive Coding Question:
Sample Table:
sales_data(product_id, region, sale_date, sale_amount)
Question: Write a Hive query to partition this table by region and bucket by product_id into 5 buckets.
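For question 16, an existing table cannot simply be re-declared as partitioned and bucketed, so the usual approach is to create a new table and load it from `sales_data` with a dynamic-partition insert. The HiveQL is kept as a Python string so the examples here stay in one language; the target table name `sales_data_part`, the column types, and ORC storage are assumptions. The small function after it only illustrates which partition directory and bucket a row would go to. It assumes Hive's legacy bucketing, where an INT key hashes to itself so the bucket is `product_id % 5` for non-negative IDs; newer Hive versions may use a Murmur3-based hash (`bucketing_version=2`), so treat the bucket numbers as illustrative.

```python
# HiveQL for the target layout (table name, types, and storage format are assumptions)
HIVE_DDL = """
CREATE TABLE sales_data_part (
    product_id  INT,
    sale_date   DATE,
    sale_amount DECIMAL(12, 2)
)
PARTITIONED BY (region STRING)
CLUSTERED BY (product_id) INTO 5 BUCKETS
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Needed on Hive 1.x; later versions enforce bucketing by default
SET hive.enforce.bucketing=true;

-- The dynamic partition column (region) must come last in the SELECT
INSERT OVERWRITE TABLE sales_data_part PARTITION (region)
SELECT product_id, sale_date, sale_amount, region
FROM sales_data;
"""

NUM_BUCKETS = 5


def target_location(row):
    """Illustrative only: partition directory and bucket index for one source row.

    Assumes legacy Hive bucketing for an INT key (hash(int) == int),
    i.e. bucket = (product_id & 0x7FFFFFFF) % NUM_BUCKETS.
    """
    product_id, region, _sale_date, _amount = row  # sales_data column order
    bucket = (product_id & 0x7FFFFFFF) % NUM_BUCKETS
    return f"region={region}", bucket


# Hypothetical source rows: (product_id, region, sale_date, sale_amount)
sample_rows = [
    (1001, "APAC", "2024-06-01", 250.00),
    (1002, "EMEA", "2024-06-01", 99.50),
    (1006, "APAC", "2024-06-02", 10.00),
]
for r in sample_rows:
    print(r[0], target_location(r))
```

Rows with the same `product_id` always map to the same bucket within a region partition, which is what makes bucket-based joins and sampling possible.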
Data Engineer Interview Questions: Persistent Systems (3-5 Years)
By Siva