Google Cloud Data Engineer Interview Questions
Google Cloud data engineer interview questions for interview preparation.
Tell me something about your past and current projects and your roles and responsibilities.
1. Explain the Google Cloud BigQuery architecture.
2. When did you face a challenge dealing with unstructured data, and how did you solve it?
3. Tell me about the data engineering activities you performed in your current and previous projects.
4. Explain the difference between structured and unstructured data.
5. What are the design schemas in data modeling?
6. Tell me some of the important features of Hadoop.
7. Which ETL tools have you worked with?
8. What is the difference between OLAP and OLTP?
9. Do you have experience writing SQL queries?
What is the difference between GROUP BY and ORDER BY?
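Since the document's later examples are in Python, the distinction can be sketched with the standard library's sqlite3 module and a hypothetical in-memory sales table (table name and data are made up for illustration):

```python
import sqlite3

# Hypothetical in-memory table to illustrate the difference.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("west", 300), ("east", 200), ("west", 50)],
)

# GROUP BY collapses rows into one row per group, usually with an aggregate.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('east', 300), ('west', 350)]

# ORDER BY only sorts the result; the row count is unchanged.
ordered = conn.execute(
    "SELECT region, amount FROM sales ORDER BY amount DESC"
).fetchall()
print(ordered)  # [('west', 300), ('east', 200), ('east', 100), ('west', 50)]
conn.close()
```

In short: GROUP BY changes how many rows come back (one per group); ORDER BY only changes their order.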
10. How much experience do you have with data warehouses? Which data warehouse have you worked on? Explain briefly what activities you performed.
11. What is a data lake? Why is it needed?
12. How good are you at Python programming? Can you write Python code that, for a given number, divides it by two if it is even, and multiplies it by 3 and adds 1 if it is odd?
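A minimal sketch of the even/odd rule above (this is the step function of the Collatz sequence; the function name is my own choice):

```python
def collatz_step(n: int) -> int:
    """Return n // 2 if n is even, otherwise 3 * n + 1."""
    if n % 2 == 0:
        return n // 2
    return 3 * n + 1

print(collatz_step(10))  # 5
print(collatz_step(7))   # 22
```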
13. SQL query scenario-based questions:
Customers and Orders are two tables, with Cust_id common to both. We would like to get all records from the Customers table only when:
- that particular Cust_id is also present in the Orders table, and
- the orders of that customer are greater than 5000.
What is a stored procedure in SQL? Please write the syntax for one.
Why do we create an INDEX? Write the syntax to create a composite index.
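One way to answer the scenario, sketched with SQLite and made-up table contents, reading "greater than 5000" as the customer's order total. (SQLite does not support stored procedures, so only the query and the composite index are shown.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE Orders (Order_id INTEGER, Cust_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(101, 1, 7000), (102, 2, 3000), (103, 2, 1500)])

# Customers that exist in Orders AND whose order total exceeds 5000.
rows = conn.execute("""
    SELECT c.Cust_id, c.name
    FROM Customers c
    WHERE c.Cust_id IN (
        SELECT o.Cust_id
        FROM Orders o
        GROUP BY o.Cust_id
        HAVING SUM(o.amount) > 5000
    )
""").fetchall()
print(rows)  # [(1, 'Asha')] -- only Asha's total (7000) exceeds 5000

# A composite index covering the subquery's grouping and filter columns.
conn.execute("CREATE INDEX idx_orders_cust_amount ON Orders (Cust_id, amount)")
conn.close()
```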
14. What is Pub/Sub? Explain topics and subscriptions.
15. What is Bigtable? Where should you use it?
16. What is the difference between BigQuery and Bigtable?
17. What is the difference between Dataflow and Dataproc?
18. How can you optimize performance in BigQuery?
19. What is Cloud Scheduler? How can you configure it?
20. What is data modeling?
21. What is Data Fusion? If you have used it, explain a use case in detail.
22. What is Cloud Composer?
23. What is Data Catalog?
24. What is Looker?
25. The Cloud Composer Airflow UI is returning a 502 error saying to try again after some time. What could be the problem, and how do you resolve the issue?
26. How would you transform unstructured data into structured data?
27. What is the difference between SQL and NoSQL databases?
28. How do you create streaming and batch pipelines, and what is the difference between them?
29. What is the difference between ELT and ETL?
30. In which cases should you use Data Fusion versus Cloud Composer?
31. What is a Google Cloud Storage bucket?
32. What is Object Versioning?
33. What is a BigQuery authorized view?
34. What are clustering and partitioning in BigQuery?
35. Why is the BigQuery job role required?
36. How do you connect BigQuery with Looker?
37. How do you export and import data in BigQuery?
38. If someone deletes a dataset or table, how can you recover it in BigQuery?
39. What are the key features of Dataflow and Dataproc?
40. How does Dataflow work?
41. If you need to process your data continuously in a streaming pipeline, which GCP service fits this requirement?
42. Which GCP service can be used as a managed Apache Spark and Apache Hadoop service?
43. What are the benefits of migrating Hadoop to GCP?
44. For example, a customer has Hadoop/Spark workloads on premises and wants to move to Google Cloud quickly. Which GCP product should they choose?
45. What are the types of BigQuery tables?
46. What is the difference between standard views and materialized views?
47. What is a materialized view?
48. Do you have any data warehouse migration experience? If yes, how did you create the migration plan and complete the migration successfully?
49. What are the data analytics services provided by GCP?
50. Can you explain what a pipeline and data flow are in the context of Apache Beam?
51. What are the benefits of using Spark vs. Dataflow?
52. What are the components of the Apache Airflow architecture?
53. What are the types of executors in Airflow?
54. How do you schedule a DAG in Airflow?
What is the difference between @staticmethod and @classmethod?
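The distinction can be sketched with a small (made-up) class: a static method gets no implicit first argument, while a class method receives the class itself as `cls`, which makes it suitable for alternative constructors:

```python
class Circle:
    def __init__(self, radius: float):
        self.radius = radius

    @staticmethod
    def area(radius: float) -> float:
        # No access to the class or instance; just a namespaced function.
        return 3.14159 * radius ** 2

    @classmethod
    def unit_circle(cls) -> "Circle":
        # Receives the class itself (cls), so it also works for subclasses.
        return cls(1.0)

print(Circle.area(2.0))             # 12.56636
print(Circle.unit_circle().radius)  # 1.0
```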
What is a companion object in Scala?
What do you understand by "Unit" and "()" in Scala?
What is the difference between cache() and persist() in Apache Spark?
How do you convert existing RDDs to Spark Datasets?
What is unique about the BigQuery cloud data warehouse?
What is BigQuery caching?
How do you set up Apache Spark and Jupyter notebooks on Cloud Dataproc?
How do you use the DLP API to automatically classify data uploaded to Cloud Storage?
How do you access files in Cloud Storage with the Spring Resource abstraction?
How does data compression work in BigQuery?
Which programming languages can be used to develop applications that run on Google Cloud Platform?
What is PEP 8?
How does Spark decide stages, i.e., how is an execution job split into stages?
What optimization techniques do you use for your Spark jobs?
- How do you create a bucket named test_bucket in GCP?
- What is the syntax to create a bucket using the gsutil command in GCP?
- What permission is required to create backups in GCP?
- I have containerized data-processing jobs that I want to run sequentially, one after another. How can I achieve this?
- What limitations of Cloud Dataflow are you aware of?
- Why wouldn't you stream data directly to BigQuery?
- How do we monitor, trace, or capture logs in GCP?
- What are the different authentication mechanisms for the GCE API?
- How does scaling work in Google Cloud Platform?
- Suppose a RuntimeException occurred that was not caught. What happens to the flow? Is there a way to know that an exception has occurred (without enclosing the entire body in a try block)? Is there a way to restore the thread after it happened?
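This question is Java-flavored (in Java the answer involves `Thread.setUncaughtExceptionHandler`), but since this document's code examples are in Python, the analogous mechanism there is `threading.excepthook` (Python 3.8+). An uncaught exception terminates only that thread, not the process, and the dead thread cannot be restarted; a new thread must be created. A sketch:

```python
import threading

caught = []

def record_exception(args):
    # args carries exc_type, exc_value, exc_traceback, and thread attributes.
    caught.append((args.thread.name, args.exc_type.__name__))

# Called for uncaught exceptions in threads, so the main program can observe
# the failure without wrapping the whole thread body in try/except.
threading.excepthook = record_exception

def worker():
    raise RuntimeError("boom")  # uncaught: this thread dies, excepthook fires

t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()  # the thread cannot be restored; start a new one if needed

print(caught)  # [('worker-1', 'RuntimeError')]
```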
Python programming questions
Write a program to print all prime numbers between 1 and 1000.
for num in range(2, 1001):
    is_prime = True
    # Trial division only needs to check divisors up to sqrt(num).
    for i in range(2, int(num ** 0.5) + 1):
        if num % i == 0:
            is_prime = False
            break
    if is_prime:
        print(num)
You are given a string input, let's say "Hello World!". Your output should show how many times each character appears in the string. So, in this example, it should print:
H: 1 e: 1 l: 3 o: 2
input_string = "Hello World!"
char_count = {}
# Loop through each character in the input string
for char in input_string:
    # If the character is already in the dictionary, increment its count
    if char in char_count:
        char_count[char] += 1
    # If the character is not yet in the dictionary, add it with a count of 1
    else:
        char_count[char] = 1
# Print out the results
for char, count in char_count.items():
    print(char + ":", count)
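The same counting can be done with `collections.Counter` from the standard library, which is the idiomatic answer interviewers often look for as a follow-up:

```python
from collections import Counter

# Counter builds the character -> count mapping in one step.
char_count = Counter("Hello World!")
for char, count in char_count.items():
    print(char + ":", count)
print(char_count["l"])  # 3
```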
What are decorators in Python?
In Python, a decorator is a function that takes another function as input, adds some functionality, and returns a new function with that added behavior. Decorators provide a way to modify or extend the behavior of functions or classes without modifying their source code directly.
Decorators are applied with the @decorator_name syntax, placed immediately before the function or class being decorated. Here's an example of a decorator that adds logging functionality to a function:
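A sketch of such a logging decorator (the names `log_calls` and `add` are illustrative):

```python
import functools

def log_calls(func):
    """Decorator that prints a message before and after each call."""
    @functools.wraps(func)  # preserves func's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with {args}, {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result!r}")
        return result
    return wrapper

@log_calls
def add(a, b):
    return a + b

print(add(2, 3))  # logs the call, then prints 5
```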
What are local variables and global variables in Python?
A local variable is a variable that is defined inside a function and can only be accessed within that function. Local variables are usually created when a function is called and destroyed when the function completes. Here's an example of a function with a local variable:
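A sketch contrasting a local variable with a global one (variable names are illustrative):

```python
counter = 0  # global variable: defined at module level, visible everywhere

def increment():
    global counter   # required to rebind the global name inside a function
    local_step = 1   # local variable: exists only while increment() runs
    counter += local_step

increment()
increment()
print(counter)  # 2
```

Reading `local_step` outside the function would raise a NameError, since local variables are destroyed when the function returns.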
What is an iterator? How do you implement an iterator, and what is a generator?
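A minimal sketch of both: an iterator is any object implementing `__iter__` and `__next__`, while a generator is a function that uses `yield` so Python builds the iterator machinery for you (the countdown example is my own):

```python
class CountDown:
    """An iterator: implements __iter__ and __next__ explicitly."""
    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value

def count_down(start: int):
    """A generator: yield produces values lazily, keeping state between calls."""
    while start > 0:
        yield start
        start -= 1

print(list(CountDown(3)))   # [3, 2, 1]
print(list(count_down(3)))  # [3, 2, 1]
```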
If you need any Google Cloud technical or business solution guidance and help, please let me know which area of GCP you want to hear about from me.
You can also follow and connect with me on Twitter @bgiri_gcloud and LinkedIn: https://www.linkedin.com/in/biswanathgirigcloudcertified/