1/ What is the difference between HDFS and traditional file systems?
2/ Explain the concept of MapReduce and how it works.
3/ What is YARN in Hadoop, and how does it improve resource management?
4/ How does HDFS provide fault tolerance in Hadoop?
5/ What are the differences between Hive and HBase? When would you use each?
6/ How would you handle small files in Hadoop?
7/ Explain the role of block size in HDFS and its impact on performance.
8/ How do you optimize a slow MapReduce job?
9/ What is the difference between DirectQuery and Import mode in Power BI? When would you use each?
10/ Explain DAX. What are some commonly used DAX functions?
11/ How do you optimize Power BI reports for large datasets?
12/ What is Row-Level Security (RLS), and how do you implement it in Power BI?
13/ How do you create relationships between tables in Power BI?
14/ What are the steps to refresh and schedule data updates in Power BI?
15/ How would you troubleshoot performance issues in a Power BI report?
16/ What is the Power Query Editor, and how do you use it to transform data?
17/ What are the key steps of the ETL process, and why is each step important?
18/ How do you optimize ETL pipelines for performance and scalability?
19/ Explain incremental data loading vs. full data loading. When would you choose each?
20/ How do you handle data quality issues during ETL?
21/ What tools have you used for ETL, and why did you choose them?
22/ How do you handle schema changes in source systems?
23/ What is data partitioning, and how does it improve ETL performance?
24/ Describe how you would debug and resolve a failed ETL job.
25/ How would you design a scalable data pipeline to process 1 TB of data daily using AWS tools?