Big Data – Explain structured, semi-structured and unstructured data

Posted on August 9, 2019 by gani37

Q.1
(a) Explain structured, semi-structured and unstructured data in terms of big data analytics.
(b) Discuss the Four V’s of Big Data.
(c) What are the advantages of Hadoop? Draw the Hadoop ecosystem and explain its components.

Q.2
(a) Differentiate between SQL and NoSQL.
(b) Explain the working of the reduce phase of MapReduce with an example.
(c) Define HDFS. Describe the NameNode, DataNode and block. Explain HDFS operations in detail. OR
(d) What is HBase? Explain the storage mechanism of HBase with an example.
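
For Q.2 (b), a word-count example is the usual way to illustrate the reduce phase. The following is only a minimal sketch in plain Python that imitates the map, shuffle and reduce steps in memory; it does not use Hadoop itself, and the function names (map_phase, shuffle, reduce_phase) and the sample input are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all values by key; the framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Receive each key with its list of values and sum them, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: two lines of text standing in for an input split.
lines = ["big data is big", "data is everywhere"]
word_counts = reduce_phase(shuffle(map_phase(lines)))
print(word_counts)
```

The point to note in an answer is that the reducer never sees raw text: it receives one key together with all the values emitted for that key, and produces a single aggregated output per key.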

Q.3
(a) How do you create a collection in MongoDB? Explain with its syntax.
(b) Write the use and syntax of the following HDFS commands:
i. put
ii. expunge
iii. chmod
iv. get
(c) What is RDD? Explain transformations and actions in RDD. Explain RDD operations in brief.

Q.4
(a) Write down the differences between Apache Pig and MapReduce.
(b) Explain the Five P’s of Big Data in brief.
(c) Justify: “Spark is faster than MapReduce”.

Q.5
(a) What is Apache Pig and why do we need it?
(b) Explain the components of Spark.
(c) Explain CRUD operations of MongoDB with an example.

Q.6
(a) Write down the goals of HDFS.
(b) Explain the MongoDB sharding process.
(c) Discuss the applications of big data analytics in weather forecasting.

Q.7
(a) Explain the benefits of ZooKeeper.
(b) Discuss Machine Learning with MLlib in Spark.
(c) What is NoSQL? List the features of NoSQL. Explain the types of NoSQL databases in brief.

Q.8
(a) Define Term Frequency and Inverse Document Frequency.
(b) Which terms are used for table, row, column and table-join in MongoDB?
(c) Explain the architecture and features of Hive.
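
For Q.8 (a), the two definitions can be checked with a small calculation. The sketch below assumes one common variant: term frequency as the count of a term divided by the number of words in the document, and inverse document frequency as the natural logarithm of (total documents / documents containing the term). Other textbooks use a different log base or add smoothing, so treat the function names and the sample documents as illustrative assumptions.

```python
import math


def term_frequency(term, document):
    """TF: occurrences of the term divided by the total words in the document."""
    words = document.lower().split()
    return words.count(term) / len(words)


def inverse_document_frequency(term, documents):
    """IDF: log of (number of documents / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc.lower().split())
    # Return 0.0 for a term that appears nowhere, to avoid dividing by zero.
    return math.log(len(documents) / containing) if containing else 0.0


# Hypothetical three-document corpus.
docs = ["big data needs big storage", "hadoop stores data", "spark is fast"]

tf = term_frequency("big", docs[0])            # 2 of 5 words
idf = inverse_document_frequency("big", docs)  # appears in 1 of 3 documents
tf_idf = tf * idf
print(tf, idf, tf_idf)
```

A term that appears in every document gets an IDF of zero under this variant, which is why very common words contribute little to a TF-IDF score.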