Course Overview
This course builds on skills developed in the Data Science and Big Data Analytics course. The main focus areas cover Hadoop (including Pig, Hive, and HBase), Natural Language Processing, Social Network Analysis, Simulation, Random Forests, Multinomial Logistic Regression, and Data Visualization. Taking an "Open" or technology-neutral approach, this course utilizes several open-source tools to address big data challenges.
Course Objectives
- Develop and execute MapReduce functionality
- Gain familiarity with NoSQL databases and Hadoop Ecosystem tools for analyzing large-scale, unstructured data sets
- Develop a working knowledge of Natural Language Processing, Social Network Analysis, and Data Visualization concepts
- Use advanced quantitative methods and apply one of them in a Hadoop environment
- Apply advanced techniques to real-world datasets in a final lab
Who Should Attend?
This course is intended for aspiring Data Scientists, data analysts who have completed the associate-level Data Science and Big Data Analytics course, and computer scientists wanting to learn MapReduce and methods for analyzing unstructured data such as text.
- Top-rated instructors: Our crew of subject matter experts has an average instructor rating of 4.8 out of 5 across thousands of reviews.
- Authorized content: We maintain more than 35 Authorized Training Partnerships with the top players in tech, ensuring your course materials contain the most relevant and up-to-date information.
- Interactive classroom participation: Our virtual training includes live lectures, demonstrations and virtual labs that allow you to participate in discussions with your instructor and fellow classmates to get real-time feedback.
- Post Class Resources: Review your class content, catch up on any material you may have missed or perfect your new skills with access to resources after your course is complete.
- Private Group Training: Let our world-class instructors deliver exclusive training courses just for your employees. Our private group training is designed to promote your team’s shared growth and skill development.
- Tailored Training Solutions: Our subject matter experts can customize the class to specifically address the unique goals of your team.
Agenda
1 - MapReduce and Hadoop
- The MapReduce Framework
- Apache Hadoop
- Hadoop Distributed File System
- YARN
2 - Hadoop Ecosystem and NoSQL
- Hadoop Ecosystem
- Pig
- Hive
- NoSQL (Not Only SQL)
- HBase
- Spark
3 - Natural Language Processing
- Introduction to NLP
- Text Preprocessing
- TF-IDF
- Beyond Bag of Words
- Language Modeling
- POS Tagging and HMM
- Sentiment Analysis and Topic Modeling
4 - Social Network Analysis
- Introduction to SNA and Graph Theory
- Most Important Nodes
- Communities and Small World
- Network Problems and SNA Tools
5 - Data Science Theory and Methods
- Simulation
- Random Forests
- Multinomial Logistic Regression
6 - Data Visualization
- Perception and Visualization
- Visualization of Multivariate Data