Hive is a Big Data processing tool that helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is similar to SQL, but with some key differences. This course is an end-to-end guide to using Hive and connecting the dots to SQL. It’s perfect for professional and aspiring data analysts and engineers alike. Don’t know SQL? No problem, there’s a primer included in this course!
- Access 86 lectures & 15 hours of content 24/7
- Write complex analytical queries on data in Hive & uncover insights
- Leverage ideas of partitioning & bucketing to optimize queries in Hive
- Customize Hive w/ user defined functions in Java & Python
- Understand what goes on under the hood of Hive w/ HDFS & MapReduce
Big Data sounds pretty daunting, doesn’t it? Well, this course aims to make it a lot simpler for you. Using Hadoop and MapReduce, you’ll learn how to process and manage enormous amounts of data efficiently. Any company that collects massive amounts of data, from startups to the Fortune 500, needs people fluent in Hadoop and MapReduce, making this course a must for anybody interested in data science.
- Access 71 lectures & 13 hours of content 24/7
- Set up your own Hadoop cluster using virtual machines (VMs) & the Cloud
- Understand HDFS, MapReduce & YARN & their interaction
- Use MapReduce to recommend friends in a social network, build search engines & generate bigrams
- Chain multiple MapReduce jobs together
- Write your own customized partitioner
- Learn to globally sort a large amount of data by sampling input files
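To give a feel for the bigram exercise mentioned above, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs on a single machine and does not use Hadoop; the function names `map_bigrams`, `reduce_counts`, and `run_job` are hypothetical and not taken from the course.

```python
from itertools import groupby


def map_bigrams(line):
    """Map step: emit ((word, next_word), 1) for each adjacent word pair."""
    words = line.lower().split()
    for first, second in zip(words, words[1:]):
        yield (first, second), 1


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one bigram."""
    return key, sum(values)


def run_job(lines):
    """Simulate one MapReduce job locally (illustrative only, no Hadoop)."""
    # Map: turn every input line into (bigram, 1) pairs.
    emitted = [pair for line in lines for pair in map_bigrams(line)]
    # Shuffle: sort by key so identical bigrams sit next to each other.
    emitted.sort(key=lambda kv: kv[0])
    # Reduce: collapse each group of identical keys into a single count.
    return dict(
        reduce_counts(key, (count for _, count in group))
        for key, group in groupby(emitted, key=lambda kv: kv[0])
    )


if __name__ == "__main__":
    print(run_job(["big data is big", "big data tools"]))
```

On a real cluster the framework handles the shuffle between machines; the sort here only stands in for that step.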
Analysts and data scientists typically have to work with several systems to effectively manage massive data sets. Spark, on the other hand, gives you a single engine to explore and work with large amounts of data, run machine learning algorithms, and perform many other functions in a single interactive environment. This course’s focus on new and innovative technologies in data science and machine learning makes it an excellent one for anyone who wants to work in the lucrative, growing field of Big Data.
- Access 52 lectures & 8 hours of content 24/7
- Use Spark for a variety of analytics & machine learning tasks
- Implement complex algorithms like PageRank & Music Recommendations
- Work w/ a variety of datasets from airline delays to Twitter, web graphs, & product ratings
- Employ all the different features & libraries of Spark, like RDDs, DataFrames, Spark SQL, MLlib, Spark Streaming & GraphX
The functional programming nature and the availability of a REPL environment make Scala particularly well suited for a distributed computing framework like Spark. Using these two technologies in tandem can allow you to effectively analyze and explore data in an interactive environment with extremely fast feedback. This course will teach you how to best combine Spark and Scala, making it perfect for aspiring data analysts and Big Data engineers.
- Access 51 lectures & 8.5 hours of content 24/7
- Use Spark for a variety of analytics & machine learning tasks
- Understand functional programming constructs in Scala
- Implement complex algorithms like PageRank & Music Recommendations
- Work w/ a variety of datasets from airline delays to Twitter, web graphs, & product ratings
- Use the different features & libraries of Spark, like RDDs, DataFrames, Spark SQL, MLlib, Spark Streaming, & GraphX
- Write code in Scala REPL environments & build Scala applications w/ an IDE
For Big Data engineers and data analysts, HBase is an extremely effective database for organizing and managing massive data sets. HBase offers increased flexibility, providing column-oriented storage, no fixed schema, and low latency to accommodate the dynamically changing needs of applications. With the 25 examples contained in this course, you’ll get a complete grasp of HBase that you can leverage in interviews for Big Data positions.
- Access 41 lectures & 4.5 hours of content 24/7
- Set up a database for your application using HBase
- Integrate HBase w/ MapReduce for data processing tasks
- Create tables, insert, read & delete data from HBase
- Get a complete understanding of HBase & its role in the Hadoop ecosystem
- Explore CRUD operations in the shell & with the Java API
Think about the last time you saw a completely unorganized spreadsheet. Now imagine that spreadsheet was 100,000 times larger. Mind-boggling, right? That’s why there’s Pig. Pig works with unstructured data to wrestle it into a more palatable form that can be stored in a data warehouse for reporting and analysis. With the massive sets of disorganized data many companies are working with today, people who can work with Pig are in major demand. By the end of this course, you could qualify as one of those people.
- Access 34 lectures & 5 hours of content 24/7
- Clean up server logs using Pig
- Work w/ unstructured data to extract information, transform it, & store it in a usable form
- Write intermediate-level Pig scripts to munge data
- Optimize Pig operations to work on large data sets
Data sets can outgrow traditional databases, much like children outgrow clothes. Unlike children’s growth patterns, however, massive amounts of data can be extremely unpredictable and unstructured. For Big Data, the Cassandra distributed database is the solution, using partitioning and replication to ensure that your data is structured and available even when nodes in a cluster go down. Children, you’re on your own.
- Access 44 lectures & 5.5 hours of content 24/7
- Set up & manage a cluster using the Cassandra Cluster Manager (CCM)
- Create keyspaces, column families, & perform CRUD operations using the Cassandra Query Language (CQL)
- Design primary keys & secondary indexes, & learn partitioning & clustering keys
- Understand restrictions on queries based on primary & secondary key design
- Discover tunable consistency using quorum & local quorum
- Learn architecture & storage components: Commit Log, MemTable, SSTables, Bloom Filters, Index File, Summary File & Data File
- Build a Miniature Catalog Management System using the Cassandra Java driver
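The tunable consistency bullet above comes down to a little arithmetic: a quorum is a majority of replicas, and reads are guaranteed to see the latest write when the read and write replica counts together exceed the replication factor. Here is a minimal sketch of that arithmetic. The helper names `quorum` and `overlaps` are hypothetical, and nothing here talks to a real Cassandra cluster.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(read_replicas, write_replicas, replication_factor):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, for illustration only
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, quorum reads + quorum writes overlap: {overlaps(q, q, rf)}")
    print(f"RF={rf}: reading and writing at ONE overlap: {overlaps(1, 1, rf)}")
```

For a local quorum, the same formula is applied to the replication factor of the local datacenter rather than to the whole cluster.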
Working with Big Data, obviously, can be a very complex task. That’s why it’s important to master Oozie. Oozie makes managing a multitude of jobs on different schedules, and managing entire data pipelines, significantly easier, as long as you know the right configuration parameters. This course will teach you how to determine those parameters so you can streamline your workflows.
- Access 23 lectures & 3 hours of content 24/7
- Install & set up Oozie
- Configure Workflows to run jobs on Hadoop
- Create time-triggered & data-triggered Workflows
- Build & optimize data pipelines using Bundles
Flume and Sqoop are important elements of the Hadoop ecosystem, transporting data from sources like local file systems to data stores. Moving data this way is essential to organizing and effectively managing Big Data, making Flume and Sqoop great skills to set you apart from other data analysts.
- Access 16 lectures & 2 hours of content 24/7
- Use Flume to ingest data to HDFS & HBase
- Optimize Sqoop to import data from MySQL to HDFS & Hive
- Ingest data from a variety of sources including HTTP, Twitter & MySQL
via Ashraf