67795 
Numerical Methods 
14 hours 
This course is for data scientists and statisticians who have some familiarity with numerical methods and know at least one programming language among R, Python, and Octave (some C++ options are also available). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization.
The purpose of this course is to give a practical introduction to numerical methods to participants interested in applying the methods at work.
Sector-specific examples are used to make the training relevant to the audience.
Topics Covered:
curve fitting
regression, robust regression
linear algebra: matrix operations
eigenvalues/eigenvectors, matrix decompositions
ordinary & partial differential equations
Fourier analysis
interpolation & splines
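As a rough illustration of the curve fitting topic listed above, the following minimal sketch fits a straight line by ordinary least squares. It is written in plain Python for readability; the function name `fit_line` and the sample data are assumptions made for this example and are not taken from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In practice a library routine would normally be used instead; the hand-written version only makes the underlying arithmetic visible.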

mlintro 
Introduction to Machine Learning 
7 hours 
This training course is for people who would like to apply basic Machine Learning techniques in practical applications.
Audience
Data scientists and statisticians who have some familiarity with machine learning and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give a practical introduction to machine learning to participants interested in applying the methods at work.
Sector-specific examples are used to make the training relevant to the audience.
Naive Bayes
Multinomial models
Bayesian categorical data analysis
Discriminant analysis
Linear regression
Logistic regression
GLM
EM Algorithm
Mixed Models
Additive Models
Classification
KNN
Ridge regression
Clustering

webappsr 
Building Web Applications in R with Shiny 
7 hours 
Description:
This course is designed to teach R users how to create web apps without needing to learn cross-browser HTML, JavaScript, and CSS.
Objective:
Covers the basics of how Shiny apps work.
Covers all commonly used input/output/rendering/paneling functions from the Shiny library.
An overview of Shiny
Installation of Shiny for local use
Basic Shiny concepts
Basic control accessories: buttons, sliders, drop-down menus
Program structure: ui.R, server.R
Building first application
Running your application
Customizing interface
HTML links in Shiny
JavaScript and Shiny
Advanced control accessories
Showing and Hiding elements of UI
Dynamic user interfaces
Advanced reactivity
Animation
Downloading and uploading data
Sharing Shiny web applications
An overview of Shiny extensions

BigData_ 
A practical introduction to Data Analysis and Big Data 
35 hours 
Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools.
Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class.
The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability.
Audience
Developers / programmers
IT consultants
Format of the course
Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress.
Introduction to Data Analysis and Big Data
What makes Big Data "big"?
Velocity, Volume, Variety, Veracity (VVVV)
Limits to traditional Data Processing
Distributed Processing
Statistical Analysis
Types of Machine Learning Analysis
Data Visualization
Languages used for Data Analysis
R language
Why R for Data Analysis?
Data manipulation, calculation and graphical display
Python
Why Python for Data Analysis?
Manipulating, processing, cleaning, and crunching data
Approaches to Data Analysis
Statistical Analysis
Time Series analysis
Forecasting with Correlation and Regression models
Inferential Statistics (estimating)
Descriptive Statistics in Big Data sets (e.g. calculating mean)
Machine Learning
Supervised vs unsupervised learning
Classification and clustering
Estimating cost of specific methods
Filtering
Natural Language Processing
Processing text
Understanding the meaning of the text
Automatic text generation
Sentiment analysis / Topic analysis
Computer Vision
Acquiring, processing, analyzing, and understanding images
Reconstructing, interpreting and understanding 3D scenes
Using image data to make decisions
Big Data infrastructure
Data Storage
Relational databases (SQL)
MySQL
Postgres
Oracle
Non-relational databases (NoSQL)
Cassandra
MongoDB
Neo4j
Understanding the nuances
Hierarchical databases
Object-oriented databases
Document-oriented databases
Graph-oriented databases
Other
Distributed Processing
Hadoop
HDFS as a distributed filesystem
MapReduce for distributed processing
Spark
All-in-one in-memory cluster computing framework for large-scale data processing
Structured streaming
Spark SQL
Machine Learning libraries: MLlib
Graph processing with GraphX
Scalability
Public cloud
AWS, Google, Aliyun, etc.
Private cloud
OpenStack, Cloud Foundry, etc.
Autoscalability
Choosing the right solution for the problem
The future of Big Data
Closing remarks
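To make the "MapReduce for distributed processing" topic above more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases applied to a word count. It is an illustration only: the helper names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are assumptions for this example, and the code does not use Hadoop or any distributed runtime.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Hypothetical input: each string stands in for one input split.
documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data across the network; here everything runs in one process so the data flow is easy to follow.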

xcelsius 
Xcelsius 
14 hours 
Description:
In this Xcelsius Training course, students will use Xcelsius Present to create interactive visualizations for presenting complex data in a simple way, and to conduct analysis to make critical decisions. Students will also create complete dashboards that present business, project, and human resources information, all consolidated and presented in a user-friendly manner. Finally, students will publish dashboards into various file formats such as Adobe Flash, Microsoft Office PowerPoint, Adobe PDF, and also to the web.
Objectives:
Upon successful completion of this course, students will be able to:
Explore the Xcelsius workspace and an already created dashboard.
Create simple visualizations.
Conduct data analysis using Xcelsius components that give dynamic functionality to the specified data.
Create a Project Management dashboard.
Create a dashboard to consolidate and present the Human Resources information of an organization.
Finalize dashboards and export them to different file formats.
Audience:
This course is designed for professionals who conduct data analysis and need to present robust and timely data in an interactive display.
1: Getting Started with Xcelsius
Explore the Xcelsius Interface
Explore a Dashboard
2: Creating Simple and Interactive Visualizations
Create a Simple Xcelsius Chart
Manage Personal Finance Using Value Box
Organize Levels of Information Using Filters
Conduct a Comparative Study Using List Builder and Line Chart
3: Conducting Data Analysis
Conduct Trend Analysis Using Combo Box
Conduct Demand Analysis Using Label Based Menu
Conduct a Region Based Demand Analysis Using Maps
Forecast Revenue Using Sliders and Gauge
4: Creating a Project Management Dashboard
Drill Down the Status of Current Projects Using the Drill Down Function
Analyze Resource Efficiency Using Fisheye Picture Menu and Other Tools
Analyze Resource Utilization Using Combination Chart
5: Creating a Human Resources Dashboard
Create an Organization Dashboard Using Organization Chart
Conduct Attrition Analysis
6: Finalizing Dashboards
Create a Snapshot
Publish Dashboards

appliedml 
Applied Machine Learning 
14 hours 
This training course is for people who would like to apply Machine Learning in practical applications.
Audience
This course is for data scientists and statisticians who have some familiarity with statistics and know how to program in R (or Python or another chosen language). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization.
The purpose is to give practical applications of Machine Learning to participants interested in applying the methods at work.
Sector-specific examples are used to make the training relevant to the audience.
Naive Bayes
Multinomial models
Bayesian categorical data analysis
Discriminant analysis
Linear regression
Logistic regression
GLM
EM Algorithm
Mixed Models
Additive Models
Classification
KNN
Bayesian Graphical Models
Factor Analysis (FA)
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Support Vector Machines (SVM) for regression and classification
Boosting
Ensemble models
Neural networks
Hidden Markov Models (HMM)
State Space Models
Clustering

rintrob 
Introductory R for Biologists 
28 hours 
I. Introduction and preliminaries
1. Overview
Making R more friendly, R and available GUIs
RStudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Good programming practice: self-contained scripts, good readability (e.g. structured scripts, documentation, markdown)
Installing packages; CRAN and Bioconductor
2. Reading data
Txt files (read.delim)
CSV files
3. Simple manipulations; numbers and vectors + arrays
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function + simple operations on arrays e.g. multiplication, transposition
Other types of objects
4. Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
Working with data frames
Attaching arbitrary lists
Managing the search path
5. Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Character manipulation, stringr package
Short introduction to grep and regexpr
6. More on Reading data
XLS, XLSX files
readr and readxl packages
SPSS, SAS, Stata, … and data in other formats
Exporting data to txt, csv and other formats
6. Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Introduction to apply, lapply, sapply, tapply
7. Functions
Creating functions
Optional arguments and default values
Variable number of arguments
Scope and its consequences
8. Simple graphics in R
Creating a Graph
Density Plots
Dot Plots
Bar Plots
Line Charts
Pie Charts
Boxplots
Scatter Plots
Combining Plots
II. Statistical analysis in R
1. Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
2. Testing of Hypotheses
Tests about a Population Mean
Likelihood Ratio Test
One- and two-sample tests
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov One-Sample Statistic
Wilcoxon Signed-Rank Test
Two-Sample Test
Wilcoxon Rank Sum Test
Mann-Whitney Test
Kolmogorov-Smirnov Test
3. Multiple Testing of Hypotheses
Type I Error and FDR
ROC curves and AUC
Multiple Testing Procedures (BH, Bonferroni etc.)
4. Linear regression models
Generic functions for extracting model information
Updating fitted models
Generalized linear models
Families
The glm() function
Classification
Logistic Regression
Linear Discriminant Analysis
Unsupervised learning
Principal Components Analysis
Clustering Methods (k-means, hierarchical clustering, k-medoids)
5. Survival analysis (survival package)
Survival objects in R
Kaplan-Meier estimate, log-rank test, parametric regression
Confidence bands
Censored (interval censored) data analysis
Cox PH models, constant covariates
Cox PH models, time-dependent covariates
Simulation: Model comparison (Comparing regression models)
6. Analysis of Variance
One-Way ANOVA
Two-Way Classification of ANOVA
MANOVA
III. Worked problems in bioinformatics
Short introduction to limma package
Microarray data analysis workflow
Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
Data processing (QC, normalisation, differential expression)
Volcano plot
Clustering examples + heatmaps

Piwik 
Getting started with Piwik 
21 hours 
Audience
Web analysts
Data analysts
Market researchers
Marketing and sales professionals
System administrators
Format of course
Part lecture, part discussion, heavy hands-on practice
Introduction to Piwik
Why use Piwik?
Piwik vs Google Analytics
Setting up Piwik
Selecting which websites to monitor
Working with the dashboard
Understanding visitor activity
Actions
Referrals
Generating reports

sixsigmabb 
Six Sigma Black Belt 
84 hours 
Six Sigma is a data-driven approach that tackles variation to improve the performance of products, services and processes, combining practical problem solving and the best scientific approaches found in experimentation and optimisation of systems. The approach has been widely and successfully applied in industry, notably by Motorola, AlliedSignal & General Electric.
Black Belt is a qualification for improvement managers in a Six Sigma organisation. You will learn the tools and techniques to take an improvement project through the Define, Measure, Analyse, Improve and Control phases (DMAIC). These techniques include Process Mapping, Measurement System Evaluation, Regression Analysis, Design of Experiments, Statistical Tolerancing, Monte Carlo Simulation and Lean Thinking.
The content of the course takes the participants through the DMAIC phases as well as introducing subjects such as Lean Thinking, Design for Six Sigma and discussing important leadership issues and experiences in deploying a Six Sigma programme.
Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define, Measure, Analyse, Improve, Control (DMAIC) approach, enabling participants to take part in and lead waste and defect reduction projects and initiatives.
Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well-scoped process improvement projects related to their regular job function.
Week 3 Expert: provides regression, design of experiments and data analysis techniques to enable participants to tackle complex problem-solving projects that require understanding of the relationships between multiple variables.
The trainer has 16 years' experience with Six Sigma and, as well as leading the deployment of Six Sigma at a number of businesses, has trained and coached over 300 Black Belts. Here are a few comments from previous participants:
“Probably the most valuable course I will ever pass”
“The content was very well delivered. The examples very relevant. Thank you”
“The course was excellent and I am able to use part of it to coach my lean teams here” (Company supervisor who attended with KTP associate)
Block 1
Day 1
Introduction to Six Sigma
Project Chartering & VOC
Process Mapping
Stakeholder analysis
Day 2
Team Start Up
Prioritisation Matrix
Lean Thinking
Value Stream Mapping
Day 3
Data Collection
Minitab and Graphical Analysis
Descriptive Statistics
Day 4
Measurement System Evaluation
Process Capability Cp, Cpk
Six Sigma Metrics
Day 5
5 Why
FMEA
Block 2
Day 1
Review of Block 1
Multi-vari
Inferential Statistics
Intro to Hypothesis Testing
Day 2
2-sample t-tests
F-tests
Hypothesis Testing – Chi-Square
Day 3
Hypothesis Testing – ANOVA
Day 4
Correlation and Regression
Multiple Regression
Introduction to Design Of Experiments
Day 5
Mistake Proofing
Control Plans
Control Charts
Block 3
Day 1
Review of Block 2
2K Factorial Experiments
Box-Cox Transformations
Hypothesis Testing – Non-Parametric
Day 2
2K Factorial Experiments
Fractional Factorial Experiments
Day 3
Noise, Blocking, Robustness
Centre Points
General Full Factorial Experiments
Day 4
Response Surface Experiments
Implementing Improvements
Creative Solutions
Day 5
Intro to Design for Six Sigma
Statistical Tolerancing
Monte Carlo Simulation
Certification
Six Sigma is a practical qualification. To demonstrate knowledge of what has been learnt on the course, you will need to undertake two coursework projects. There is no report to produce, but you will be required to give a PowerPoint presentation to the trainer and examiner showing results and method. The projects can cover work you would complete in your normal job; however, you will need to show use of the DMAIC problem-solving approach and application of Six Sigma and Lean tools. This provides a good balance between the practical approach and more rigorous analysis, which together lead to robust solutions. You will be able to contact the trainer to discuss how Six Sigma tools could benefit your project. Examples of projects from previous participants include:
Formulating cream texture for seasonality in dairy feeds.
Housing Association complaints reduction
Multivariable (cost, efficiency, size) optimisation of a fuel cell
Job Scheduling improvement in a factory
Ambulance waiting time reduction
Reduction in resin thickness variation in glass manufacture
NobleProg & Redlands provide Black Belt certification. For delegates that require independent accreditation, NobleProg & Redlands have partnered with the British Quality Foundation (BQF) to provide Lean Six Sigma Black Belt certification. Certification requires passing an exam at the end of the course and completing and presenting two improvement projects that demonstrate understanding and application of the Six Sigma approach and techniques.
An additional charge of £600 plus VAT is levied for BQF independent accreditation. 
dataminr 
Data Mining with R 
14 hours 
Sources of methods
Artificial intelligence
Machine learning
Statistics
Sources of data
Preprocessing of data
Data Import/Export
Data Exploration and Visualization
Dimensionality Reduction
Dealing with missing values
R Packages
Data mining main tasks
Automatic or semi-automatic analysis of large quantities of data
Extracting previously unknown interesting patterns
groups of data records (cluster analysis)
unusual records (anomaly detection)
dependencies (association rule mining)
Data mining
Anomaly detection (Outlier/change/deviation detection)
Association rule learning (Dependency modeling)
Clustering
Classification
Regression
Summarization
Frequent Pattern Mining
Text Mining
Decision Trees
Regression
Neural Networks
Sequence Mining
Frequent Pattern Mining
Data dredging, data fishing, data snooping 
bigddbsysfun 
Big Data & Database Systems Fundamentals 
14 hours 
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Data Warehousing Concepts
What is a Data Warehouse?
Difference between OLTP and Data Warehousing
Data Acquisition
Data Extraction
Data Transformation
Data Loading
Data Marts
Dependent vs Independent Data Marts
Database design
ETL Testing Concepts:
Introduction
Software development life cycle
Testing methodologies
ETL Testing Work Flow Process
ETL Testing Responsibilities in DataStage
Big Data Fundamentals
Big Data and its role in the corporate world
The phases of development of a Big Data strategy within a corporation
Explain the rationale underlying a holistic approach to Big Data
Components needed in a Big Data Platform
Big Data storage solutions
Limits of Traditional Technologies
Overview of database types
NoSQL Databases
Hadoop
MapReduce
Apache Spark 
datamodeling 
Pattern Recognition 
35 hours 
This course provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics.
The course is interactive and includes plenty of hands-on exercises, instructor feedback, and testing of knowledge and skills acquired.
Audience
Data analysts
PhD students, researchers and practitioners
Introduction
Probability theory, model selection, decision and information theory
Probability distributions
Linear models for regression and classification
Neural networks
Kernel methods
Sparse kernel machines
Graphical models
Mixture models and EM
Approximate inference
Sampling methods
Continuous latent variables
Sequential data
Combining models

excelstatsda 
Excel For Statistical Data Analysis 
14 hours 
Audience
Analysts, researchers, scientists, graduates and students and anyone who is interested in learning how to facilitate statistical analysis in Microsoft Excel.
Course Objectives
This course will help improve your familiarity with Excel and statistics and as a result increase the effectiveness and efficiency of your work or research.
This course describes how to use the Analysis ToolPak in Microsoft Excel, its statistical functions, and how to perform basic statistical procedures. It also explains what Excel's limitations are and how to overcome them.
Aggregating Data in Excel
Statistical Functions
Outlines
Subtotals
Pivot Tables
Data Relation Analysis
Normal Distribution
Descriptive Statistics
Linear Correlation
Regression Analysis
Covariance
Analysing Data in Time
Trends/Regression line
Linear, Logarithmic, Polynomial, Power, Exponential, Moving Average Smoothing
Seasonal fluctuations analysis
Comparing Populations
Confidence Interval for the Mean
Test of Hypothesis Concerning the Population Mean
Difference Between Mean of Two Populations
ANOVA: Analysis of Variances
Goodness-of-Fit Test for Discrete Random Variables
Test of Independence: Contingency Tables
Test of Hypothesis Concerning the Variances of Two Populations
Forecasting
Extrapolation

datascience 
Data Science Training 
21 hours 
Aim:
Obtaining the required knowledge for the application of Data Science methods, and obtaining consultancy for establishing a Data Science team in an insurance company.
Order:
2-3 days of training and consulting in Data Science:
One goal is to obtain consultancy on the introduction and establishment of Data Science, and of the statistical environment R as a Data Science tool, within a company / organization.
Another goal is the prediction of typical Key Performance Indicators (KPIs) and their confidence intervals with R. Suitable reporting and communication of these KPIs to the management board should also be trained. On the basis of use cases derived from actual problems in Actuarial Science and Data Science, the respective methods and their implementation in R should be trained and discussed.
Content:
1.) Modelling KPIs
1a.) Based on a use case, the modelling of the respective KPIs in R shall be discussed.
In particular, the following topics are to be covered:
- Using R as a tool to analyze the performance of insurance portfolios
- Suitable data organization within R
- Application of Bayesian theory (preferably using the Stan library in R)
- Validation of statistical models
- Suitable reporting of KPIs, visualization and communication of models and statistical results to the management board
Target group: Data Scientists
2.) Establishing a Data Science team within an organization
Based on practical experience, it should be taught how to establish a Data Science team, and R as a Data Science tool, within a larger company.
In particular, the following topics are to be covered:
- Required hardware and software
- Definition of interfaces to other teams (Data Integration / Data Governance / IT)
- Standardization (Projects / Coding Styles / Methods)
- Information Management
- Documentation, reproducibility, allocation of tasks
- Networking
- Compliance
Target group: Data Scientists, management board
3.) Claims reserving with R using state-of-the-art methods
Using the ChainLadder R package, reserving shall be conducted. The focus lies on:
- Application of state-of-the-art claims reserving methods, including
  o Basic ChainLadder
  o Mack ChainLadder
  o Generalized linear modelling
  o Bayesian approach
- Estimation of claim severity in the case of quickly growing portfolios
- Prediction of future claim severity in the case of a fixed portfolio
- Modelling cancellation
Target group: Data Scientists, Actuaries
Extent:
2-3 days of training / consulting
Requirements
- In-house training is preferred
- Training is based on real-life insurance data / experience
MLFWR1 
Machine Learning Fundamentals with R 
14 hours 
The aim of this course is to provide basic proficiency in applying Machine Learning methods in practice. Through the use of the R programming platform and its various libraries, and based on a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results.
Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Science applications.
Introduction to Applied Machine Learning
Statistical learning vs. Machine learning
Iteration and evaluation
Bias-Variance tradeoff
Regression
Linear regression
Generalizations and Nonlinearity
Exercises
Classification
Bayesian refresher
Naive Bayes
Logistic regression
K-Nearest neighbors
Exercises
Cross-validation and Resampling
Cross-validation approaches
Bootstrap
Exercises
Unsupervised Learning
K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means
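As a rough illustration of the cross-validation approaches listed in this outline, the sketch below splits sample indices into k folds, each of which serves once as the test set. It is written in plain Python rather than R purely for illustration; the function name `k_fold_indices` is an assumption for this example, and the sketch does not shuffle the data, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, roughly equal folds."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a model's error over the folds uses every observation for validation once.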

rprogda 
R Programming for Data Analysis 
14 hours 
This course is part of the Data Scientist skill set (Domain: Data and Technology).
Introduction and preliminaries
Making R more friendly, R and available GUIs
RStudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Character manipulation, stringr package
Reading data
Txt files
CSV files
XLS, XLSX files
SPSS, SAS, Stata, … and data in other formats
Exporting data to txt, csv and other formats
Accessing data from databases using SQL language
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Basic visualisation graphs
Multivariate relations with the lattice and ggplot2 packages
Using graphics parameters
Graphics parameters list
Automated and interactive reporting
Combining output from R with text

kdd 
Knowledge Discovery in Databases (KDD) 
21 hours 
Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.
In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.
Audience
Data analysts or anyone interested in learning how to interpret data to solve problems
Format of the course
After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
Introduction
KDD vs data mining
Establishing the application domain
Establishing relevant prior knowledge
Understanding the goal of the investigation
Creating a target data set
Data cleaning and preprocessing
Data reduction and projection
Choosing the data mining task
Choosing the data mining algorithms
Interpreting the mined patterns 
stats2 
Statistics Level 2 
28 hours 
This training course covers advanced statistics. It explains most of the tools commonly used in research, analysis and forecasting. It provides short explanations of the theory behind the formulas.
This course does not relate to any specific field of knowledge, but can be tailored if all the delegates have the same background and goals.
Some basic computer tools are used during this course (notably Excel and OpenOffice).
Describing Bivariate Data
Introduction to Bivariate Data
Values of the Pearson Correlation
Guessing Correlations Simulation
Properties of Pearson's r
Computing Pearson's r
Restriction of Range Demo
Variance Sum Law II
Exercises
Probability
Introduction
Basic Concepts
Conditional Probability Demo
Gambler's Fallacy Simulation
Birthday Demonstration
Binomial Distribution
Binomial Demonstration
Base Rates
Bayes' Theorem Demonstration
Monty Hall Problem Demonstration
Exercises
Normal Distributions
Introduction
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises
Sampling Distributions
Introduction
Basic Demo
Sample Size Demo
Central Limit Theorem Demo
Sampling Distribution of the Mean
Sampling Distribution of Difference Between Means
Sampling Distribution of Pearson's r
Sampling Distribution of a Proportion
Exercises
Estimation
Introduction
Degrees of Freedom
Characteristics of Estimators
Bias and Variability Simulation
Confidence Intervals
Exercises
Logic of Hypothesis Testing
Introduction
Significance Testing
Type I and Type II Errors
One- and Two-Tailed Tests
Interpreting Significant Results
Interpreting Non-Significant Results
Steps in Hypothesis Testing
Significance Testing and Confidence Intervals
Misconceptions
Exercises
Testing Means
Single Mean
t Distribution Demo
Difference between Two Means (Independent Groups)
Robustness Simulation
All Pairwise Comparisons Among Means
Specific Comparisons
Difference between Two Means (Correlated Pairs)
Correlated t Simulation
Specific Comparisons (Correlated Observations)
Pairwise Comparisons (Correlated Observations)
Exercises
Power
Introduction
Factors Affecting Power
Why power matters
Exercises
Prediction
Introduction to Simple Linear Regression
Linear Fit Demo
Partitioning Sums of Squares
Standard Error of the Estimate
Prediction Line Demo
Inferential Statistics for b and r
Exercises
ANOVA
Introduction
ANOVA Designs
One-Factor ANOVA (Between-Subjects)
One-Way Demo
Multi-Factor ANOVA (Between-Subjects)
Unequal Sample Sizes
Tests Supplementing ANOVA
Within-Subjects ANOVA
Power of Within-Subjects Designs Demo
Exercises
Chi Square
Chi Square Distribution
One-Way Tables
Testing Distributions Demo
Contingency Tables
2 x 2 Table Simulation
Exercises

mlentre 
Machine Learning Concepts for Entrepreneurs and Managers 
21 hours 
This training course is for people who would like to apply Machine Learning in practical applications for their team. The training does not dive into technicalities; it revolves around basic concepts and their business and operational applications.
Target Audience
Investors and AI entrepreneurs
Managers and Engineers whose company is venturing into AI space
Business Analysts & Investors
Introduction to Neural Networks
Introduction to Applied Machine Learning
Statistical learning vs. Machine learning
Iteration and evaluation
Bias-Variance tradeoff
Machine Learning with Python
Choice of libraries
Add-on tools
Machine learning Concepts and Applications
Regression
Linear regression
Generalizations and Nonlinearity
Use cases
Classification
Bayesian refresher
Naive Bayes
Logistic regression
K-Nearest neighbors
Use Cases
Cross-validation and Resampling
Cross-validation approaches
Bootstrap
Use Cases
Unsupervised Learning
K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means
Short Introduction to NLP methods
word and sentence tokenization
text classification
sentiment analysis
spelling correction
information extraction
parsing
meaning extraction
question answering
Artificial Intelligence & Deep Learning
Technical Overview
R vs. Python
Caffe vs. TensorFlow
Various Machine Learning Libraries

octaveda 
Octave for Data Analysis 
14 hours 
Audience:
This course is for data scientists and statisticians who have some familiarity with statistical methods and would like to use the Octave programming language at work.
The purpose of this course is to give a practical introduction to Octave programming to participants interested in using this programming language at work.
environment
data types:
numeric
string, arrays
matrices
variables
expressions
control flow
functions
exception handling
debugging
input/output
linear algebra
optimization
statistical distributions
regression
plotting

dmmlr 
Data Mining & Machine Learning with R 
14 hours 
Introduction to Data mining and Machine Learning
Statistical learning vs. Machine learning
Iteration and evaluation
Bias-Variance tradeoff
Regression
Linear regression
Generalizations and Nonlinearity
Exercises
Classification
Bayesian refresher
Naive Bayes
Discriminant analysis
Logistic regression
K-Nearest neighbors
Support Vector Machines
Neural networks
Decision trees
Exercises
Cross-validation and Resampling
Cross-validation approaches
Bootstrap
Exercises
Unsupervised Learning
K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means
Advanced topics
Ensemble models
Mixed models
Boosting
Examples
Multidimensional reduction
Factor Analysis
Principal Component Analysis
Examples

druid 
Druid: Build a fast, real-time data analysis system 
21 hours 
Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.
In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.
Audience
Application developers
Software engineers
Technical consultants
DevOps professionals
Architecture engineers
Format of the course
Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Introduction
Installing and starting Druid
Druid architecture and design
Real-time ingestion of event data
Sharding and indexing
Loading data
Querying data
Visualizing data
Running a distributed cluster
Druid + Apache Hive
Druid + Apache Kafka
Druid + others
Troubleshooting
Administrative tasks

stats1 
Statistics Level 1 
14 hours 
This course has been created for people who require general statistics skills. It can be tailored to a specific area of expertise such as market research, biology, manufacturing, public sector research, etc.
Introduction
Descriptive Statistics
Inferential Statistics
Sampling Demonstration
Variables
Percentiles
Measurement
Levels of Measurement
Measurement Demonstration
Basics of Data Collection
Distributions
Summation Notation
Linear Transformations
Exercises
Graphing Distributions
Qualitative Variables
Quantitative Variables
Stem and Leaf Displays
Histograms
Frequency Polygons
Box Plots
Box Plot Demonstration
Bar Charts
Line Graphs
Exercises
Summarizing Distributions
Central Tendency
What is Central Tendency
Measures of Central Tendency
Balance Scale Simulation
Absolute Difference Simulation
Squared Differences Simulation
Median and Mean
Mean and Median Simulation
Additional Measures
Comparing measures
Variability
Measures of Variability
Estimating Variance Simulation
Shape
Comparing Distributions Demo
Effects of Transformations
Variance Sum Law I
Exercises
Normal Distributions
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises

sixsigmagb 
Six Sigma Green Belt 
70 hours 
Green Belts participate in and lead Lean and Six Sigma projects from within their regular job function. They can tackle projects as part of a cross-functional team or projects scoped within their normal job.
Each session of Green Belt training is separated by 3 or 4 weeks when the Green Belts apply their training to their improvement projects. We recommend supporting the Green Belts on their projects in between training sessions and holding stage gate reviews along with leadership and Lean Six Sigma Champions to ensure DMAIC methodology is being rigorously applied.
Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling participants to take part in and lead waste and defect reduction projects and initiatives.
Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well-scoped process improvement projects related to their regular job function.
Block 1
Day 1
Introduction to Six Sigma
Project Chartering & VOC
Process Mapping
Stakeholder analysis
Day 2
Team Start Up
Prioritisation Matrix
Lean Thinking
Value Stream Mapping
Day 3
Data Collection
Minitab and Graphical Analysis
Descriptive Statistics
Day 4
Measurement System Evaluation
Process Capability Cp, Cpk
Six Sigma Metrics
Day 5
5 Why
FMEA
Block 2
Day 1
Review of Block 1
Multi-vari
Inferential Statistics
Intro to Hypothesis Testing
Day 2
2-sample t-tests
F tests
Hypothesis Testing – Chi Sq
Day 3
Hypothesis Testing – ANOVA
Day 4
Correlation and Regression
Multiple Regression
Introduction to Design Of Experiments
Day 5
Mistake Proofing
Control Plans
Control Charts
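
As an illustration of the process capability indices listed under Block 1, Day 4, the following minimal Python sketch computes Cp and Cpk from a sample. The function name, the specification limits and the measurements are hypothetical. The overall sample standard deviation is used here as a simple stand-in; tools such as Minitab typically estimate Cp and Cpk from within-subgroup variation, so results may differ.

```python
import statistics


def process_capability(data, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper spec limits.

    Assumption: the overall sample standard deviation is used as the
    spread estimate (a simplification for illustration only).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    # Cp compares the specification width with the process spread (6 sigma).
    cp = (usl - lsl) / (6 * sd)
    # Cpk also penalises a process that is not centred between the limits.
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    return cp, cpk


# Hypothetical measurements and specification limits.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
cp, cpk = process_capability(measurements, lsl=9.0, usl=11.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cpk can never exceed Cp; the two coincide only when the process mean sits exactly midway between the specification limits.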

statsqa 
Statistical Quality Analysis 
7 hours 
This course covers the fundamentals of statistical process control and how these quality tools can provide the necessary evidence to improve and control processes. Learn when and where to use the various types of control charts available in Minitab for your own processes, and how to use capability analysis tools to evaluate those processes.
Gage R&R,
Destructive Testing,
Gage Linearity and Bias,
Attribute Agreement,
Variables and Attribute Control Charts,
Capability Analysis for Normal, Nonnormal and Attribute data

predmodr 
Predictive Modelling with R 
14 hours 
Problems facing forecasters
Customer demand planning
Investor uncertainty
Economic planning
Seasonal changes in demand/utilization
Roles of risk and uncertainty
Time series Forecasting
Seasonal adjustment
Moving average
Exponential smoothing
Extrapolation
Linear prediction
Trend estimation
Stationarity and ARIMA modelling
Econometric methods (causal methods)
Regression analysis
Multiple linear regression
Multiple nonlinear regression
Regression validation
Forecasting from regression
Judgemental methods
Surveys
Delphi method
Scenario building
Technology forecasting
Forecast by analogy
Simulation and other methods
Simulation
Prediction market
Probabilistic forecasting and Ensemble forecasting
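
To make two of the time series topics above concrete, here is a minimal sketch of a trailing moving average and simple exponential smoothing. It is written in plain Python for illustration only (the course itself uses R); the function names and the demand figures are hypothetical, and the smoother is initialised with the first observation, which is one common convention among several.

```python
def moving_average(series, window):
    """Trailing moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def simple_exponential_smoothing(series, alpha):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly demand figures.
demand = [120, 132, 101, 134, 90, 110, 140]
print(moving_average(demand, window=3))
print(simple_exponential_smoothing(demand, alpha=0.3))
```

A larger `alpha` makes the smoothed series react faster to recent observations; a smaller value gives a smoother, slower-moving series.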

mtstatda 
Minitab for Statistical Data Analysis 
14 hours 
The course is aimed at anyone interested in statistical analysis. It provides familiarity with Minitab and will increase the effectiveness and efficiency of your data analysis and improve your knowledge of statistics.
Chapter 1: Descriptive Statistics and Graphical Analysis
1.1 Introduction
1.1.1 Learning Objectives
1.2 Types of Data
1.2.1 Basic Concepts
1.2.2 Data Types
1.2.3 Quiz: Types of Data
1.3 Using Graphs to Analyze Data
1.3.1 Basic Concepts
1.3.2 Bar Charts and Pareto Charts
1.3.3 Pie Charts
1.3.4 Histograms
1.3.5 Dotplots
1.3.6 Individual Value Plots
1.3.7 Boxplots
1.3.8 Time Series Plots
1.3.9 Quiz: Using Graphs to Analyze Data
1.3.10 Minitab Tools: Bar Chart
1.3.11 Minitab Tools: Pie Chart
1.3.12 Minitab Tools: Histogram
1.3.13 Minitab Tools: Dotplot
1.3.14 Minitab Tools: Individual Value Plot
1.3.15 Minitab Tools: Boxplot
1.3.16 Minitab Tools: Time Series Plot
1.3.17 Exercise: Graphical Analysis
1.4 Using Statistics to Analyze Data
1.4.1 Basic Concepts
1.4.2 Mean and Median
1.4.3 Range, Variance, and Standard Deviation
1.4.4 Quiz: Using Statistics to Analyze Data
1.4.5 Minitab Tools: Display Descriptive Statistics
1.4.6 Exercise: Descriptive Statistics
1.5 Summary
1.5.1 Objectives Review
Chapter 2: Statistical Inference
2.1 Introduction
2.1.1 Learning Objectives
2.2 Fundamentals of Statistical Inference
2.2.1 Basic Concepts
2.2.2 Random Samples
2.2.3 Quiz: Fundamentals of Statistical Inference
2.2.4 Minitab Tools: Random Sampling
2.3 Sampling Distributions
2.3.1 Basic Concepts
2.3.2 Sampling Distribution of the Mean
2.3.3 Quiz: Sampling Distributions
2.4 Normal Distribution
2.4.1 Basic Concepts
2.4.2 Probabilities Associated with a Normal Distribution
2.4.3 Probabilities Associated with the Sample Mean
2.4.4 Quiz: Normal Distribution
2.4.5 Minitab Tools: Cumulative Probabilities with a Normal Distribution
2.4.6 Exercise: Probabilities and Normal Distributions
2.5 Summary
2.5.1 Objectives Review
Chapter 3: Hypothesis Tests and Confidence Intervals
3.1 Introduction
3.1.1 Learning Objectives
3.2 Tests and Confidence Intervals
3.2.1 Confidence Intervals
3.2.2 Hypothesis Testing
3.2.3 Using Hypothesis Testing to Make Decisions
3.2.4 Type I and Type II Errors and Power
3.2.5 Quiz: Tests and Confidence Intervals
3.3 1-Sample t-Test
3.3.1 Basic Concepts
3.3.2 Individual Value Plots
3.3.3 1-Sample t-Test Results
3.3.4 Assumptions
3.3.5 Quiz: 1-Sample t-Test
3.3.6 Minitab Tools: 1-Sample t-Test
3.3.7 Exercise: 1-Sample t-Test
3.4 2 Variances Test
3.4.1 Basic Concepts
3.4.2 Boxplots
3.4.3 2 Variances Test Results
3.4.4 Assumptions
3.4.5 Quiz: 2 Variances Test
3.4.6 Minitab Tools: 2 Variances Test
3.4.7 Exercise: 2 Variances Test
3.5 2-Sample t-Test
3.5.1 Basic Concepts
3.5.2 Individual Value Plot
3.5.3 2-Sample t-Test Results
3.5.4 Assumptions
3.5.5 Quiz: 2-Sample t-Test
3.5.6 Minitab Tools: 2-Sample t-Test
3.5.7 Exercise: 2-Sample t-Test
3.6 Paired t-Test
3.6.1 Basic Concepts
3.6.2 Individual Value Plots
3.6.3 Paired t-Test Results
3.6.4 Assumptions
3.6.5 Quiz: Paired t-Test
3.6.6 Minitab Tools: Paired t-Test
3.6.7 Exercise: Paired t-Test
3.7 1 Proportion Test
3.7.1 Basic Concepts
3.7.2 1 Proportion Test Results
3.7.3 Assumptions
3.7.4 Quiz: 1 Proportion Test
3.7.5 Minitab Tools: 1 Proportion Test
3.7.6 Exercise: 1 Proportion Test
3.8 2 Proportions Test
3.8.1 Basic Concepts
3.8.2 2 Proportions Test Results
3.8.3 Assumptions
3.8.4 Quiz: 2 Proportions Test
3.8.5 Minitab Tools: 2 Proportions Test
3.8.6 Exercise: 2 Proportions Test
3.9 Chi-Square Test
3.9.1 Basic Concepts
3.9.2 Chi-Square Test Results
3.9.3 Assumptions
3.9.4 Quiz: Chi-Square Test
3.9.5 Minitab Tools: Chi-Square Test
3.9.6 Exercise: Chi-Square Test
3.10 Summary
3.10.1 Objectives Review
Chapter 4: Control Charts
4.1 Introduction
4.1.1 Learning Objectives
4.2 Statistical Process Control
4.2.1 Basic Concepts
4.2.2 Patterns in Control Charts
4.2.3 Quiz: Statistical Process Control
4.3 Control Charts for Variables Data in Subgroups
4.3.1 Basic Concepts
4.3.2 R Charts
4.3.3 S Charts
4.3.4 Xbar Charts
4.3.5 Quiz: Control Charts for Variables Data in Subgroups
4.3.6 Minitab Tools: Xbar-R Chart
4.3.7 Exercise: Xbar-R Chart
4.4 Control Charts for Individual Observations
4.4.1 Basic Concepts
4.4.2 Moving Range Charts
4.4.3 Individuals Charts
4.4.4 Quiz: Control Charts for Individual Observations
4.4.5 Minitab Tools: I-MR Chart
4.4.6 Exercise: I-MR Chart
4.5 Control Charts for Attribute Data
4.5.1 Basic Concepts
4.5.2 NP and P Charts
4.5.3 C and U Charts
4.5.4 Quiz: Control Charts for Attributes Data
4.5.5 Minitab Tools: P Chart
4.5.6 Exercise: P Chart
4.6 Summary
4.6.1 Objectives Review
Chapter 5: Process Capability
5.1 Introduction
5.1.1 Learning Objectives
5.2 Process Capability for Normal Data
5.2.1 Basic Concepts
5.2.2 Assumptions
5.2.3 Testing for Normality
5.2.4 Quiz: Process Capability for Normal Data
5.2.5 Minitab Tools: Normality Test
5.2.6 Exercise: Assumptions for Process Capability
5.3 Capability Indices
5.3.1 Potential Capability: Cp and Cpk
5.3.2 Process Performance: Pp and Ppk
5.3.3 Sigma Level
5.3.4 Quiz: Capability Indices
5.3.5 Minitab Tools: Cp and Pp
5.3.6 Minitab Tools: Sigma Level
5.3.7 Exercise: Process Capability for Normal Data
5.4 Process Capability for Nonnormal Data
5.4.1 Transformations and Alternate Distributions
5.4.2 Box-Cox Transformation
5.4.3 Johnson Transformation
5.4.4 Alternate Distributions
5.4.5 Quiz: Process Capability for Nonnormal Data
5.4.6 Minitab Tools: Box-Cox Transformation
5.4.7 Minitab Tools: Johnson Transformation
5.4.8 Minitab Tools: Capability Analysis with Johnson Transformation
5.4.9 Minitab Tools: Alternate Distributions
5.4.10 Minitab Tools: Capability Analysis with Alternate Distributions
5.4.11 Exercise: Process Capability with Data Transformations
5.4.12 Exercise: Process Capability with Alternate Distributions
5.5 Summary
5.5.1 Objectives Review
Chapter 6: Analysis of Variance (ANOVA)
6.1 Introduction
6.1.1 Learning Objectives
6.2 Fundamentals of ANOVA
6.2.1 Basic Concepts
6.2.2 Graphs and Summary Statistics
6.2.3 Quiz: Fundamentals of ANOVA
6.3 One-Way ANOVA
6.3.1 Hypothesis Tests
6.3.2 F-Statistics and P-Values
6.3.3 Multiple Comparisons
6.3.4 Assumptions and Residual Plots
6.3.5 Quiz: One-Way ANOVA
6.3.6 Minitab Tools: One-Way ANOVA
6.3.7 Exercise: One-Way ANOVA
6.4 Two-Way ANOVA
6.4.1 Basic Concepts
6.4.2 Graphs
6.4.3 Hypothesis Tests
6.4.4 F-Statistics and P-Values
6.4.5 Assumptions and Residual Plots
6.4.6 Quiz: Two-Way ANOVA
6.4.7 Minitab Tools: Two-Way ANOVA
6.4.8 Exercise: Two-Way ANOVA
6.5 Summary
6.5.1 Summary of ANOVA
Chapter 7: Correlation and Regression
7.1 Introduction
7.1.1 Learning Objectives
7.2 Relationship Between Two Quantitative Variables
7.2.1 Basic Concepts
7.2.2 Scatterplot
7.2.3 Correlation
7.2.4 Quiz: Relationship Between Two Quantitative Variables
7.2.5 Minitab Tools: Scatterplot
7.2.6 Minitab Tools: Correlation
7.2.7 Exercise: Scatterplots and Correlation
7.3 Simple Regression
7.3.1 Basic Concepts
7.3.2 Regression
7.3.3 Hypothesis Tests and R²
7.3.4 Assumptions and Residual Plots
7.3.5 Quiz: Simple Regression
7.3.6 Minitab Tools: Simple Regression
7.3.7 Exercise: Simple Regression
7.4 Summary
7.4.1 Objectives Review
Chapter 8: Measurement Systems Analysis
8.1 Introduction
8.1.1 Learning Objectives
8.2 Fundamentals of Measurement Systems Analysis
8.2.1 Basic Concepts
8.2.2 Accuracy
8.2.3 Precision
8.2.4 Comparing Accuracy and Precision
8.2.5 Quiz: Fundamentals of Measurement Systems Analysis
8.3 Repeatability and Reproducibility
8.3.1 Basic Concepts
8.3.2 Gage R&R Studies
8.3.3 Quiz: Repeatability and Reproducibility
8.4 Graphical Analysis of a Gage R&R Study
8.4.1 Basic Concepts
8.4.2 Components of Variation
8.4.3 Xbar and R Charts
8.4.4 Interaction between Operator and Part
8.4.5 Comparative Plots
8.4.6 Gage Run Charts
8.4.7 Quiz: Graphical Analysis of a Gage R&R Study
8.4.8 Minitab Tools: Crossed Gage R&R Study
8.4.9 Minitab Tools: Gage Run Chart
8.4.10 Exercise: Graphical Analysis of a Gage R&R Study
8.5 Variation
8.5.1 Standard Deviation and Study Variation
8.5.2 Tolerance
8.5.3 Process Variation
8.5.4 Quiz: Variation
8.5.5 Exercise: Numerical Analysis of a Gage R&R Study
8.6 ANOVA with a Gage R&R Study
8.6.1 Variance Components
8.6.2 Analysis of Variance Tables
8.6.3 Quiz: ANOVA with a Gage R&R Study
8.6.4 Exercise: ANOVA Output for a Gage R&R Study
8.7 Gage Linearity and Bias Study
8.7.1 Basic Concepts
8.7.2 Gage Linearity
8.7.3 Gage Bias
8.7.4 Quiz: Gage Linearity and Bias Study
8.7.5 Minitab Tools: Gage Linearity and Bias Study
8.7.6 Exercise: Gage Linearity and Bias Study
8.8 Attribute Agreement Analysis
8.8.1 Basic Concepts
8.8.2 Binary Data
8.8.3 Nominal Data
8.8.4 Ordinal Data
8.8.5 Quiz: Attribute Agreement Analysis
8.8.6 Minitab Tools: Attribute Agreement Analysis with Binary Data
8.8.7 Minitab Tools: Attribute Agreement Analysis with Nominal Data
8.8.8 Minitab Tools: Attribute Agreement Analysis with Ordinal Data
8.8.9 Exercise: Attribute Agreement Analysis
8.9 Summary
8.9.1 Objectives Review
Chapter 9: Design of Experiments
9.1 Introduction
9.1.1 Learning Objectives
9.2 Factorial Designs
9.2.1 Basic Concepts
9.2.2 Creating Full Factorial Designs
9.2.3 Analyzing Full Factorial Designs
9.2.4 Quiz: Factorial Designs
9.2.5 Minitab Tools: Create a Full Factorial Design
9.2.6 Minitab Tools: Analyze a Full Factorial Design
9.2.7 Exercise: Create a Full Factorial Design
9.2.8 Exercise: Analyze a Full Factorial Design
9.3 Blocking and Incorporating Center Points
9.3.1 Blocking
9.3.2 Center Points
9.3.3 Analyzing Designs with Blocks and Center Points
9.3.4 Quiz: Blocking and Incorporating Center Points
9.3.5 Minitab Tools: Create a Factorial Design with Blocks and Center Points
9.3.6 Minitab Tools: Analyze a Factorial Design with Blocks and Center Points
9.3.7 Exercise: Create a Factorial Design with Blocks and Center Points
9.3.8 Exercise: Analyze a Factorial Design with Blocks and Center Points
9.4 Fractional Factorial Designs
9.4.1 Basic Concepts
9.4.2 Creating Fractional Factorial Designs
9.4.3 Analyzing Fractional Factorial Designs
9.4.4 Quiz: Fractional Factorial Designs
9.4.5 Minitab Tools: Create a Fractional Factorial Design
9.4.6 Minitab Tools: Analyze a Fractional Factorial Design
9.5 Response Optimization
9.5.1 Response Optimization
9.5.2 Quiz: Response Optimization
9.5.3 Minitab Tools: Response Optimization
9.5.4 Exercise: Response Optimization
9.6 Summary
9.6.1 Objectives Review

advspsspas 
Advanced Statistics using SPSS Predictive Analytics Software 
28 hours 
Goal:
Master the skills needed to work independently with SPSS at an advanced level, using dialog boxes and command language syntax for the selected analytical techniques.
Audience:
Analysts, researchers, scientists, students and all those who want to learn to use the SPSS package at an advanced level and learn the selected statistical models. The training works through universal analysis problems and it is dedicated to a specific industry.
Preparation of a database for analysis
management of data collection
operations on variables
transforming variables with selected functions (logarithmic, exponential, etc.)
Parametric and nonparametric statistics, or how to fit a model to the data
measurement scale
distribution type
outliers and influential observations
sample size
central limit theorem
Studying differences between statistical characteristics
tests based on the mean and median
Analysis of correlation and similarities
correlations
principal component analysis
cluster analysis
Prediction: simple and multivariate regression analysis
method of least squares
Linear Model
instrumental variable regression models (dummy, effect, orthogonal coding)
Statistical Inference

datacolmtd 
Data Collection Methods 
14 hours 
Method of data collection
Survey design (including questionnaire and question design)
Different types of surveys (cross-sectional/time series/panel)
Measurement bias
Framing bias
Response bias
Nonresponse analysis
Methods used to help correct for bias (e.g. weighting)
Data linkage (e.g. linking survey data with administrative data)
Assessing data quality & validating data

tableauvra 
Visual Reporting and Analysis with Tableau 
7 hours 
Connecting to Data
Connecting to various databases – data connection types
Multiple data sources & data blending
Creating Basic Visualizations
Sorting, Filtering, Organizing data
Using Multiple Measures on the Same Axis
Showing the Relationship between Numerical Values
Mapping Data Geographically
Tableau geocoding – advanced mapping + using Background Images
Basic calculations and aggregations
Parameters, reference lines
Overview of additional visualizations
Dashboards: quick filters, actions, and parameters
Advanced calculations
Tips & tricks – parameters, calculations, sorting, filtering etc.
Best practices when using Tableau

excelafd 
Analysing Financial Data in Excel 
14 hours 
Audience
Financial or market analysts, managers, accountants
Course Objectives
Facilitate and automate all kinds of financial analysis with Microsoft Excel
Advanced functions
Logical functions
Math and statistical functions
Financial functions
Lookups and data tables
Using lookup functions
Using MATCH and INDEX
Advanced list management
Validating cell entries
Exploring database functions
PivotTables and PivotCharts
Creating Pivot Tables
Calculated Item and Calculated Field
Working with External Data
Exporting and importing
Exporting and importing XML data
Querying external databases
Linking to a database
Linking to an XML data source
Analysing online data (Web Queries)
Analytical options
Goal Seek
Solver
The Analysis ToolPak
Scenarios
Macros and custom functions
Running and recording a macro
Working with VBA code
Creating functions
Conditional formatting and SmartArt
Conditional formatting with graphics
SmartArt graphics

nlpwithr 
NLP: Natural Language Processing with R 
21 hours 
It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data.
This course centers around extracting insights and meaning from this data. Utilizing the R Language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are available in various languages per customer requirements.
By the end of this training participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on its significance.
Audience
Linguists and programmers
Format of the course
Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
Introduction
NLP and R vs Python
Installing and configuring RStudio
Installing R packages related to Natural Language Processing (NLP).
An overview of R’s text manipulation capabilities
Getting started with an NLP project in R
Reading and importing data files into R
Text manipulation with R
Document clustering in R
Part-of-speech tagging in R
Sentence parsing in R
Working with regular expressions in R
Named-entity recognition in R
Topic modeling in R
Text classification in R
Working with very large data sets
Visualizing your results
Optimization
Integrating R with other languages (Java, Python, etc.)
Closing remarks

descstats 
Descriptive Statistics 
14 hours 
This course will cover averages (mean, median, mode, proportions, etc.), dispersions (variance, standard deviation, etc.), contingency tables (cross tabs, etc.), and graphs/charts.
Types of data
Distributions
Central tendency – mean, median, mode
Measures of dispersion – variance, standard deviation
Standard error
Central Limit Theorem and Law of Large Numbers
Confidence intervals, p values
Hypothesis testing, statistical significance
Covariance and correlation
Causal versus descriptive inference
Stated versus revealed preference
Choosing optimal sample size ex ante
Output (tables and graphs)
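
As a small worked example of several topics in this outline (central tendency, dispersion, standard error and confidence intervals), the sketch below summarises a sample using only the Python standard library. The function name and the data are hypothetical. The interval uses the normal approximation; for small samples a t-based interval would usually be more appropriate.

```python
import math
import statistics


def summarize(sample, confidence=0.95):
    """Return basic descriptive statistics and a normal-approximation CI."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)                 # standard error of the mean
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(sample),
        "sd": sd,
        "se": se,
        "ci": (mean - z * se, mean + z * se),
    }


# Hypothetical sample.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(scores).items():
    print(name, value)
```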

spssanal 
Statistical Analysis using SPSS 
21 hours 
Getting started with SPSS
Obtaining, Editing, and Saving Statistical Output
Manipulating Data
Descriptive Statistics Procedures
Evaluating Score Distribution Assumptions
t Tests
Univariate Group Differences: ANOVA and ANCOVA
Multivariate Group Differences: MANOVA
Nonparametric procedures for analysing frequency data
Correlations
Regression with Quantitative Variables
Regression with Categorical Variables
Principal Components Analysis and Factor Analysis

mrkfct 
Market Forecasting 
14 hours 
Audience
This course has been created for analysts and forecasters who want to introduce or improve forecasting, whether related to sales forecasting, economic forecasting, technology forecasting, supply chain management, or demand or supply forecasting.
Description
This course guides delegates through a series of methodologies, frameworks and algorithms which are useful when choosing how to predict the future based on historical data.
It uses standard tools like Microsoft Excel or some Open Source programs (notably R project).
The principles covered in this course can be implemented by any software (e.g. SAS, SPSS, Statistica, MINITAB ...)
Problems facing forecasters
Customer demand planning
Investor uncertainty
Economic planning
Seasonal changes in demand/utilization
Roles of risk and uncertainty
Time series methods
Moving average
Exponential smoothing
Extrapolation
Linear prediction
Trend estimation
Growth curve
Econometric methods (causal methods)
Regression analysis using linear regression or nonlinear regression
Autoregressive moving average (ARMA)
Autoregressive integrated moving average (ARIMA)
Econometrics
Judgemental methods
Surveys
Delphi method
Scenario building
Technology forecasting
Forecast by analogy
Simulation and other methods
Simulation
Prediction market
Probabilistic forecasting and Ensemble forecasting
Reference class forecasting

StaEcoMod 
Statistical and Econometric Modelling 
21 hours 
The Nature of Econometrics and Economic Data
Econometrics and models
Steps in econometric modelling
Types of economic data: time series, cross-sectional, panel
Causality in econometric analysis
Specification and Data Issues
Functional form
Proxy variables
Measurement error in variables
Missing data, outliers, influential observations
Regression Analysis
Estimation
Ordinary least squares (OLS) estimators
Classical OLS assumptions
Gauss-Markov Theorem
Best Linear Unbiased Estimators
Inference
Testing statistical significance of parameters, t-test (single, group)
Confidence intervals
Testing multiple linear restrictions, F-test
Goodness of fit
Testing functional form
Missing variables
Binary variables
Testing for violation of assumptions and their implications:
Heteroscedasticity
Autocorrelation
Multicollinearity
Endogeneity
Other Estimation techniques
Instrumental Variables Estimation
Generalised Least Squares
Maximum Likelihood
Generalised Method of Moments
Models for Binary Response Variables
Linear Probability Model
Probit Model
Logit Model
Estimation
Interpretation of parameters, Marginal Effects
Goodness of Fit
Limited Dependent Variables
Tobit Model
Truncated Normal Distribution
Interpretation of Tobit Model
Specification and Estimation Issues
Time Series Models
Characteristics of Time Series
Decomposition of Time Series
Exponential Smoothing
Stationarity
ARIMA models
Cointegration
ECM model
Predictive Analysis
Forecasting, Planning and Goals
Steps in Forecasting
Evaluating Forecast Accuracy
Residual Diagnostics
Prediction Intervals

intror 
Introduction to R with Time Series Analysis 
21 hours 
Introduction and preliminaries
Making R more friendly, R and available GUIs
RStudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Character manipulation, stringr package
Reading data
Txt files
CSV files
XLS, XLSX files
SPSS, SAS, Stata and other data formats
Exporting data to txt, csv and other formats
Accessing data from databases using SQL language
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Basic visualisation graphs
Multivariate relations with lattice and ggplot2 packages
Using graphics parameters
Graphics parameters list
Time series Forecasting
Seasonal adjustment
Moving average
Exponential smoothing
Extrapolation
Linear prediction
Trend estimation
Stationarity and ARIMA modelling
Econometric methods (causal methods)
Regression analysis
Multiple linear regression
Multiple nonlinear regression
Regression validation
Forecasting from regression

statsres 
Statistics for Researchers 
35 hours 
This course aims to give researchers an understanding of the principles of statistical design and analysis and their relevance to research in a range of scientific disciplines.
It covers selected probability and statistical methods, mainly through examples. The training consists of around 30% lectures and 70% guided quizzes and labs.
For closed courses, we can tailor the examples and materials to a specific field (such as psychology tests, the public sector, biology or genetics).
For public courses, mixed examples are used.
Although various software packages are used during this course (from Microsoft Excel to SPSS, Statgraphics, etc.), its main focus is on understanding the principles and processes that guide research, reasoning and conclusions.
This course can be delivered as a blended course, i.e. with homework and assignments.
Scientific Method, Probability & Statistics
Very short history of statistics
Why we can be "confident" about the conclusions
Probability and decision making
Preparation for research (deciding "what" and "how")
The big picture: research is a part of a process with inputs and outputs
Gathering data
Questionnaires and measurement
What to measure
Observational Studies
Design of Experiments
Analysis of Data and Graphical Methods
Research Skills and Techniques
Research Management
Describing Bivariate Data
Introduction to Bivariate Data
Values of the Pearson Correlation
Guessing Correlations Simulation
Properties of Pearson's r
Computing Pearson's r
Restriction of Range Demo
Variance Sum Law II
Exercises
Probability
Introduction
Basic Concepts
Conditional Probability Demo
Gambler's Fallacy Simulation
Birthday Demonstration
Binomial Distribution
Binomial Demonstration
Base Rates
Bayes' Theorem Demonstration
Monty Hall Problem Demonstration
Exercises
Normal Distributions
Introduction
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises
Sampling Distributions
Introduction
Basic Demo
Sample Size Demo
Central Limit Theorem Demo
Sampling Distribution of the Mean
Sampling Distribution of Difference Between Means
Sampling Distribution of Pearson's r
Sampling Distribution of a Proportion
Exercises
Estimation
Introduction
Degrees of Freedom
Characteristics of Estimators
Bias and Variability Simulation
Confidence Intervals
Exercises
Logic of Hypothesis Testing
Introduction
Significance Testing
Type I and Type II Errors
One- and Two-Tailed Tests
Interpreting Significant Results
Interpreting Non-Significant Results
Steps in Hypothesis Testing
Significance Testing and Confidence Intervals
Misconceptions
Exercises
Testing Means
Single Mean
t Distribution Demo
Difference between Two Means (Independent Groups)
Robustness Simulation
All Pairwise Comparisons Among Means
Specific Comparisons
Difference between Two Means (Correlated Pairs)
Correlated t Simulation
Specific Comparisons (Correlated Observations)
Pairwise Comparisons (Correlated Observations)
Exercises
Power
Introduction
Example Calculations
Factors Affecting Power
Exercises
Prediction
Introduction to Simple Linear Regression
Linear Fit Demo
Partitioning Sums of Squares
Standard Error of the Estimate
Prediction Line Demo
Inferential Statistics for b and r
Exercises
ANOVA
Introduction
ANOVA Designs
One-Factor ANOVA (Between-Subjects)
One-Way Demo
Multi-Factor ANOVA (Between-Subjects)
Unequal Sample Sizes
Tests Supplementing ANOVA
Within-Subjects ANOVA
Power of Within-Subjects Designs Demo
Exercises
Chi Square
Chi Square Distribution
One-Way Tables
Testing Distributions Demo
Contingency Tables
2 x 2 Table Simulation
Exercises
Case Studies
Analysis of selected case studies

mrkanar
Marketing Analytics using R 
21 hours 
Audience:
Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals.
Overview:
The course follows the customer life cycle from acquiring new customers, managing the existing customers for profitability, retaining good customers, and finally understanding which customers are leaving us and why. We will be working with real (if anonymous) data from a variety of industries including telecommunications, insurance, media, and high tech.
Format:
Instructor-led training over the course of five half-day sessions with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course.
Part 1: Inflow: acquiring new customers
Our focus is direct marketing so we will not look at advertising campaigns but instead focus on understanding marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course.
We look at measuring and improving campaign effectiveness, including:
The importance of test and control groups. Universal control group.
Techniques: Lift curves, AUC
Return on investment. Optimizing marketing spend.
Part 2: Base Management: managing existing customers
Considering the cost of acquiring new customers, for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include:
1. Cross-selling and up-selling: Offering the right product or service to the customer at the right time.
a. Techniques: RFM models. Multinomial regression.
b. Value of lifetime purchases.
2. Customer segmentation: Understanding the types of customers that you have.
a. Classification models using first simple decision trees, and then random forests and other, newer techniques.
Part 3: Retention: Keeping your good customers
Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity-to-churn models, including:
Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool)
Tuning models (caret) and introduction to ensemble models.
Part 4: Outflow: Understanding who is leaving and why
Customers will leave you – that is a fact of life. What is important is to understand who is leaving and why. Is it low-value customers who are leaving or is it your best customers? Are they leaving to competitors or because they no longer need your products and services? Topics include:
Customer lifetime value models: Combining value of purchases with propensity to churn and the cost of servicing and retaining the customer.
Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.)
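The customer lifetime value idea in Part 4 (combining the value of purchases with propensity to churn and the cost of servicing the customer) can be sketched in a few lines. This is only an illustrative Python sketch, not the course's own model: the function name, the constant annual churn probability, the discount rate and the fixed horizon are all assumptions made for the example.

```python
def customer_lifetime_value(annual_margin, annual_service_cost,
                            churn_probability, discount_rate=0.1,
                            horizon_years=5):
    """Discounted expected net value of one customer (hypothetical model).

    Assumes a constant yearly churn probability and constant yearly
    margin and servicing cost; real models would estimate these per customer.
    """
    retention = 1.0 - churn_probability
    survival = 1.0   # probability the customer is still active in a given year
    value = 0.0
    for year in range(horizon_years):
        net = annual_margin - annual_service_cost
        # Expected net value this year, discounted back to today
        value += survival * net / (1.0 + discount_rate) ** year
        survival *= retention
    return value


# Hypothetical customer: margin 100, servicing cost 20, 25% yearly churn
print(round(customer_lifetime_value(100.0, 20.0, 0.25), 2))
```

A higher churn probability shrinks every later term of the sum, which is why retention and lifetime value are treated together.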

datashrinkgov 
Data Shrinkage for Government 
14 hours 
Why shrink data
Relational databases
Introduction
Aggregation and disaggregation
Normalisation and denormalisation
Null values and zeroes
Joining data
Complex joins
Cluster analysis
Applications
Strengths and weaknesses
Measuring distance
Hierarchical clustering
K-means and derivatives
Applications in Government
Factor analysis
Concepts
Exploratory factor analysis
Confirmatory factor analysis
Principal component analysis
Correspondence analysis
Software
Applications in Government
Predictive analytics
Timelines and naming conventions
Holdout samples
Weights of evidence
Information value
Scorecard building demonstration using a spreadsheet
Regression in predictive analytics
Logistic regression in predictive analytics
Decision Trees in predictive analytics
Neural networks
Measuring accuracy
Applications in Government

kylin 
Apache Kylin: From classic OLAP to real-time data warehouse
14 hours 
Apache Kylin is an extreme, distributed analytics engine for big data.
In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.
By the end of this training, participants will be able to:
Consume real-time streaming data using Kylin
Utilize Apache Kylin's powerful features, including snowflake schema support, a rich SQL interface, Spark cubing and sub-second query latency
Note
We use the latest version of Kylin (as of this writing, Apache Kylin v2.0)
Audience
Big data engineers
Big Data analysts
Format of the course
Part lecture, part discussion, exercises and heavy hands-on practice
To request a customized course outline for this training, please contact us.

sspsspas
Statistics with SPSS Predictive Analytics Software 
14 hours 
Goal:
Learning to work with SPSS independently
Audience:
Analysts, researchers, scientists, students and all those who want to learn to use the SPSS package and popular data mining techniques.
Using the program
The dialog boxes
entering / importing data
the concept of a variable and measurement scales
preparing a database
Generating tables and graphs
formatting the report
Command language syntax
automated analysis
storing and modifying procedures
creating your own analytical procedures
Data Analysis
descriptive statistics
Key terms: e.g. variable, hypothesis, statistical significance
measures of central tendency
measures of dispersion
standardization
Introduction to researching relationships between variables
correlational and experimental methods
Summary: Case study and discussion

surveyrste 
Survey Research, Sampling Techniques & Estimation 
14 hours 
Survey research:
Principle of sample survey design and implementation
survey preliminaries
sampling methods (probability & nonprobability methods)
population & sampling frames
survey data collection methods
Questionnaire design
Design and writing of questionnaires
Pretests & piloting
Planning & organisation of surveys
Minimising errors, bias & nonresponse at the design stage
Survey data processing
Commissioning surveys/research
Sampling Techniques & Estimation:
Sampling techniques and their strengths/weaknesses (may overlap above sampling methods)
Simple Random Sampling
Unequal Probability Sampling
Stratified Sampling (with proportional to size & disproportional selection)
Systematic Sampling
Cluster sampling
Multistage Sampling
Quota Sampling
Estimation
Methods of estimating sample sizes
Estimating population parameters using sample estimates
Variance and confidence intervals estimation
Estimating bias/precision
Methods of correcting bias
Methods of handling missing data
Nonresponse analysis

tbladv 
Tableau Advanced 
14 hours 
Introduction and Getting Started
Filtering, Sorting & Grouping
Advanced options for filtering and hiding
Understanding many options for ordering and grouping your data
Sort, Groups, Bins, Sets
Interrelation between all options
Working with Data in Tableau
Dimensions versus Measures
Data types, Discrete versus Continuous
Joining Database sources
Inner, Left, Right join
Blending different data sources in a single worksheet
Working with extracts instead of live connections
Data quality problems
Metadata and sharing a connection
Calculations on Data and Statistics
Row-level calculations
Aggregate calculations
Arithmetic, string, date calculations
Custom aggregations and calculated fields
Control-flow calculations
What is behind the scene
Advanced Statistics
Working with dates and times
Table Calculations
Quick table calculations
Scope and direction
Addressing and partitioning
Advanced table calculations
Advanced Geo techniques
Building basic maps
Geographic fields, map options
Customizing a geographic view
Web Map Service
Visualizing non-geographical data with background images
Mapping tips
Distance Calculations
Parameters in Tableau
Creating parameters
Parameters in calculated fields
Parameter control options
Enhancing analysis and visualizations with parameters
Building Advanced Chart Visualizations
Bar chart variations: bullet, bar-in-bar, highlight charts
Date and time visualizations, Gantt charts
Stacked bars, treemaps, area charts, pie charts
Heat map
KPI chart
Pareto chart
Bullet chart
Advanced formatting
Labels
Legends
Highlighting
Annotations
Telling a data story with Dashboards
Dashboard framework
Filter actions
Highlight actions
URL actions
Cascading filters
Trends and Forecasting
Understanding and Customizing trend lines
Distributions
Forecasting
Integrating Tableau and R for advanced data analytics
Possibility to include different data analytics methods in R at participants' request

sixsigmayb 
Six Sigma Yellow Belt 
21 hours 
Yellow Belt covers the basics of the Six Sigma Define, Measure, Analyse, Improve, Control (DMAIC) approach, enabling delegates to take part in and lead team-based waste and defect reduction projects and initiatives. In addition, emphasis is placed on applying the problem-solving tools in daily roles.
At the end of the course you will be equipped to look at your immediate team and role, determine what can be improved and create a business improvement project on a selected opportunity that is aligned to customer requirements.
You will be able to analyse the process using visualization tools, identify the waste (non-value-adding) components and work to eliminate these from the process. You will apply root cause analysis techniques to identify the underlying causes of defects in the process.
The course uses simulations, case study exercises and work-based projects to enable delegates to 'learn through doing'.
Notes: This course has a minimum class size of 4. If requested, it can be delivered in 2 days with some reductions to the course content and level of detail in some areas, notably customer needs, graphical analysis and process handover.
An overview of project selection and scoping
Understanding customer needs and how they impact project aims
Discovering processes using visualisation techniques
Understanding the causes of work and how to simplify
Finding and removing process waste
Graphical analysis to understand process performance
Problem solving tools to determine root cause
Basic solution creation
Piloting & implementation
Process handover

samr 
Statistical Analysis in Market Research
28 hours 
Goal: Improving the skills of researchers who study consumer behaviour towards products and services
Audience: Researchers, market analysts, managers and employees of marketing and sales departments (primarily in pharmaceuticals and FMCG), students of socio-economic disciplines, and anyone interested in market research
Module 1 Quantitative research
Preliminary processing of results
checking the accuracy of the database
handling missing data
weighting observations
Statistical models
multiple regression
conjoint analysis
classification trees
Automate procedures in tracking studies
Analysis of data from a marketing experiment
Reporting and drawing conclusions
Module 2 Qualitative Research
Transforming qualitative data into quantitative data
Statistical models for qualitative data

ModelFore4Gov 
Modelling and Forecasting for Government 
14 hours 
Modelling in government
Hypothesis testing
Why test a hypothesis?
Type I and type II errors
Estimating the tax gap
Case studies using econometric & regression models in government
Forecasting and time series models in government
Shocks, trends and seasonality
Forecasting tax receipts using regression and econometric modelling
Sensitivity analysis and validation
Prediction validation techniques.
Hold out samples
Prediction intervals
Comparative analysis of forecasts
Forecast Performance Measures

dataar 
Data Analytics With R 
21 hours 
R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real-world data.
Audience
Developers / data analysts
Duration
3 days
Format
Lectures and hands-on labs
Day One: Language Basics
Course Introduction
About Data Science
Data Science Definition
Process of Doing Data Science.
Introducing R Language
Variables and Types
Control Structures (Loops / Conditionals)
R Scalars, Vectors, and Matrices
Defining R Vectors
Matrices
String and Text Manipulation
Character data type
File IO
Lists
Functions
Introducing Functions
Closures
lapply/sapply functions
DataFrames
Labs for all sections
Day Two: Intermediate R Programming
DataFrames and File I/O
Reading data from files
Data Preparation
Built-in Datasets
Visualization
Graphics Package
plot() / barplot() / hist() / boxplot() / scatter plot
Heat Map
ggplot2 package ( qplot(), ggplot())
Exploration With Dplyr
Labs for all sections
Day Three: Advanced Programming With R
Statistical Modeling With R
Statistical Functions
Dealing With NA
Distributions (Binomial, Poisson, Normal)
Regression
Introducing Linear Regressions
Recommendations
Text Processing (tm package / Wordclouds)
Clustering
Introduction to Clustering
K-Means
Classification
Introduction to Classification
Naive Bayes
Decision Trees
Training using caret package
Evaluating Algorithms
R and Big Data
Connecting R to databases
Big Data Ecosystem
Labs for all sections

statsman 
Statistics for Managers 
35 hours 
This course has been created for decision makers whose primary goal is not to do the calculation and the analysis, but to understand them.
The course uses a lot of pictures, diagrams, computer simulations, anecdotes and a sense of humour to explain the concepts and pitfalls of statistics.
Introduction to Statistics
What are Statistics?
Importance of Statistics
Descriptive Statistics
Inferential Statistics
Variables
Percentiles
Measurement
Levels of Measurement
Basics of Data Collection
Distributions
Summation Notation
Linear Transformations
Common Pitfalls
Biased samples
Average, mean or median?
Misleading graphs
Semi-attached figures
Third variable problem
Ceteris paribus
Errors in reasoning
Understanding confidence level
Understanding Results
Describing Bivariate Data
Probability
Normal Distributions
Sampling Distributions
Estimation
Logic of Hypothesis Testing
Testing Means
Power
Prediction
ANOVA
Chi Square
Case Studies
Discussion about case studies chosen by the delegates.

rneuralnet 
Neural Network in R 
14 hours 
This course is an introduction to applying neural networks to real-world problems using the R software.
Introduction to Neural Networks
What are Neural Networks
What is the current status of applying neural networks
Neural Networks vs regression models
Supervised and Unsupervised learning
Overview of packages available
nnet, neuralnet and others
differences between packages and their limitations
Visualizing neural networks
Applying Neural Networks
Concept of neurons and neural networks
A simplified model of the brain
Capabilities of a neuron
XOR problem and the nature of the distribution of values
The polymorphic nature of the sigmoid function
Other activation functions
Construction of neural networks
Concept of connected neurons
Neural network as nodes
Building a network
Neurons
Layers
Weights
Input and output data
Range 0 to 1
Normalization
Learning Neural Networks
Backward Propagation
Propagation steps
Network training algorithms
range of application
Estimation
Problems with the possibility of approximation by
Examples
OCR and image pattern recognition
Other applications
Implementing a neural network modeling job predicting stock prices of listed

ImpEvalQuatAnal 
Impact Evaluation – Quantitative Analysis 
14 hours 
This course covers Impact evaluation and does not cover the broader design of evaluations.
Why evaluate
The evaluation lifecycle
Process and Impact evaluation
Counterfactuals and baselines
Exploring your options
Randomised control trial
Difference in differences (with practical exercise)
Regression discontinuity design
Propensity score matching
Interrupted time series
Instrumental variables

datavisR1 
Introduction to Data Visualization with R 
28 hours 
This course is intended for data engineers, decision makers and data analysts. It shows you how to create effective plots in RStudio that appeal to decision makers and help them uncover hidden information and make the right decisions.
Day 1:
overview of R programming
introduction to data visualization
scatter plots and clusters
the use of noise and jitter
Day 2:
other types of 2D and 3D plots
histograms
heat charts
categorical data plotting
Day 3:
plotting KPIs with data
R and X chart examples
dashboards
parallel axes
mixing categorical data with numeric data
Day 4:
different hats of data visualization
disguised and hidden trends
case studies
saving plots and loading Excel files

pgmt 
The Practitioner’s Guide to Multivariate Techniques 
14 hours 
The introduction of the digital computer, and now the widespread availability of computer packages, has opened up a hitherto difficult area of statistics: multivariate analysis. Previously the formidable computing effort associated with these procedures presented a real barrier. That barrier has now disappeared and the analyst can therefore concentrate on an appreciation and an interpretation of the findings.
Multivariate Analysis of Variance (MANOVA)
Whereas the Analysis of Variance technique (ANOVA) investigates possible systematic differences between prescribed groups of individuals on a single variable, the technique of Multivariate Analysis of Variance is simply an extension of that procedure to numerous variates viewed collectively. These variates could be distinct in nature, for example Height, Weight etc., or repeated measures of a single variate over time or over space. When the variates are repeated measures over time or space, the analyses may often be reduced to a succession of univariate analyses, with easier interpretation. This procedure is often referred to as Repeated Measure Analysis.
Principal Component Analysis
If only two variates are recorded for a number of individuals, the data may conveniently be represented on a two-dimensional plot. If there are ‘p’ variates then one could imagine a plot of the data in ‘p’-dimensional space. The technique of Principal Component Analysis corresponds to a rotation of the axes so that the maximum amounts of variation are progressively represented along the new axes. It has been described as ‘peering into multidimensional space, from every conceivable angle, and selecting as the viewing angle that which contains the maximum amount of variation’. The aim therefore is a reduction of the dimensionality of multivariate data. If, for example, a very high percentage (say 90%) of the variability is contained in the first two principal components, a plot of these components would be a virtually complete pictorial representation of the variability.
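The rotation of axes described above can be illustrated for the simplest case of two variates. The following is a minimal Python sketch, not a replacement for a statistics package: the function name `pca_2d` and the height/weight figures are hypothetical, and the closed-form eigenvalue formula applies only to a 2 x 2 covariance matrix.

```python
import math


def pca_2d(points):
    """Principal components of two-variate data via the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix:
    # the variances along the two rotated axes
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Rotation angle (radians) of the first principal axis
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return {
        "eigenvalues": (lam1, lam2),
        "angle": angle,
        "explained": lam1 / (lam1 + lam2),  # share of total variation on axis 1
    }


# Hypothetical (height cm, weight kg) observations
data = [(160, 55), (165, 60), (170, 68), (175, 72), (180, 80), (185, 83)]
result = pca_2d(data)
print(f"share of variation on first component: {result['explained']:.3f}")
```

When the share on the first component is close to 1, a one-dimensional plot along that axis already represents nearly all of the variability, which is the dimensionality reduction the paragraph describes.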
Discriminant Analysis
Suppose that several variates are observed on individuals from two identified groups. The technique of discriminant analysis involves calculating that linear function of the variates that best separates out the groups. The linear function may therefore be used to identify group membership simply from the pattern of variates. Various methods are available to estimate the success in general of this identification procedure.
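As a hedged illustration of "that linear function of the variates that best separates out the groups", here is a small Python sketch of Fisher's linear discriminant for two groups measured on two variates. The function names, the midpoint threshold rule and the sample data are assumptions made for the example; statistical packages offer more general and better-validated routines.

```python
def fisher_discriminant(group_a, group_b):
    """Return (w, threshold) for two groups of two-variate observations."""
    def mean(group):
        n = len(group)
        return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n)

    def scatter(group, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in group)
        syy = sum((p[1] - m[1]) ** 2 for p in group)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in group)
        return sxx, sxy, syy

    ma, mb = mean(group_a), mean(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    # Pooled within-group scatter matrix [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = inverse(within-group scatter) * (mean_a - mean_b)
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Assumed rule: cut halfway between the projected group means
    threshold = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    return w, threshold


def classify(point, w, threshold):
    """Assign a new observation to group 'A' or 'B' from its discriminant score."""
    score = w[0] * point[0] + w[1] * point[1]
    return "A" if score > threshold else "B"


# Hypothetical measurements for two identified groups
group_a = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (5.5, 3.5)]
group_b = [(1.0, 1.0), (2.0, 2.0), (1.5, 0.5), (2.5, 1.5)]
w, threshold = fisher_discriminant(group_a, group_b)
print(classify((5.0, 2.0), w, threshold))
```

Re-classifying the original observations, or better a held-out set, is one simple way to estimate how successful the identification procedure is.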
Canonical Variate Analysis
Canonical Variate Analysis is in essence an extension of Discriminant Analysis to accommodate the situation where there are more than two groups of individuals.
Cluster Analysis
Cluster Analysis, as the name suggests, involves identifying groupings (or clusters) of individuals in multidimensional space. Since here there is no ‘a priori’ grouping of individuals, the identification of so-called clusters is a subjective process subject to various assumptions. Most computer packages offer several clustering procedures that may often give differing results. However, the pictorial representation of the so-called ‘clusters’, in diagrams called dendrograms, provides a very useful diagnostic.
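To make the idea of alternative clustering procedures concrete, the sketch below implements a bare-bones agglomerative (hierarchical) clustering in Python. It is an illustration under stated assumptions only: the function name `agglomerate` and the sample points are hypothetical, the distance is Euclidean, and the `linkage` argument switches between single linkage (`min`) and complete linkage (`max`). The returned merge history is the information a dendrogram displays.

```python
import math


def agglomerate(points, linkage=min):
    """Merge clusters step by step; return a list of (members_a, members_b, distance).

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Distance between two clusters under the chosen linkage rule
                d = linkage(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Hypothetical individuals measured on two variates
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerate(points):
    print(a, b, round(d, 2))
```

Running the same data with `linkage=max` changes the heights at which groups join and can change which groups join first, which is one source of the differing results mentioned above.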
Factor Analysis
If ‘p’ variates are observed on each of ‘n’ individuals, the technique of factor analysis attempts to identify say ‘r’ (< p) so-called factors which determine to a large extent the variate values. The implicit assumption here therefore is that the entire array of ‘p’ variates is controlled by ‘r’ factors. For example, the ‘p’ variates could represent the performance of students in numerous examination subjects, and we wish to determine whether a few attributes such as numerical ability and linguistic ability could account for much of the variability. The difficulties here stem from the fact that the so-called factors are not directly observable, and indeed may not really exist.
Factor analysis has been viewed very suspiciously over the years, because of the measure of speculation involved in the identification of factors. One popular numerical procedure starts with the rotation of axes using principal components (described above) followed by a rotation of the factors identified.

apacheh
Administrator Training for Apache Hadoop 
35 hours 
Audience:
The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment
Goal:
Deep knowledge of Hadoop cluster administration.
1: HDFS (17%)
Describe the function of HDFS Daemons
Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
Identify current features of computing systems that motivate a system like Apache Hadoop.
Classify major goals of HDFS Design
Given a scenario, identify appropriate use case for HDFS Federation
Identify components and daemons of an HDFS HA-Quorum cluster
Analyze the role of HDFS security (Kerberos)
Determine the best data serialization choice for a given scenario
Describe file read and write paths
Identify the commands to manipulate files in the Hadoop File System Shell
2: YARN and MapReduce version 2 (MRv2) (17%)
Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
Understand basic design strategy for MapReduce v2 (MRv2)
Determine how YARN handles resource allocations
Identify the workflow of MapReduce job running on YARN
Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN.
3: Hadoop Cluster Planning (16%)
Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
Analyze the choices in selecting an OS
Understand kernel tuning and disk swapping
Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
4: Hadoop Cluster Installation and Administration (25%)
Given a scenario, identify how the cluster will handle disk and machine failures
Analyze a logging configuration and logging configuration file format
Understand the basics of Hadoop metrics and cluster health monitoring
Identify the function and purpose of available tools for cluster monitoring
Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
Identify the function and purpose of available tools for managing the Apache Hadoop file system
5: Resource Management (10%)
Understand the overall design goals of each of Hadoop's schedulers
Given a scenario, determine how the FIFO Scheduler allocates cluster resources
Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
Given a scenario, determine how the Capacity Scheduler allocates cluster resources
6: Monitoring and Logging (15%)
Understand the functions and features of Hadoop’s metric collection abilities
Analyze the NameNode and JobTracker Web UIs
Understand how to monitor cluster Daemons
Identify and monitor CPU usage on master nodes
Describe how to monitor swap and memory allocation on all nodes
Identify how to view and manage Hadoop’s log files
Interpret a log file

rlang 
R 
21 hours 
Day 1
Introduction and preliminaries
Making R more friendly, R and available GUIs
RStudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Ordered and unordered factors
A specific example
The function tapply() and ragged arrays
Ordered factors
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
Mixed vector and array arithmetic. The recycling rule
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Day 2
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Character manipulation, stringr package
Reading data
Txt files
CSV files
XLS, XLSX files
SPSS, SAS, Stata,… and other formats data
Exporting data to txt, csv and other formats
Accessing data from databases using SQL language
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Day 3
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Statistical analysis in R
Linear regression models
Generic functions for extracting model information
Updating fitted models
Generalized linear models
Families
The glm() function
Classification
Logistic Regression
Linear Discriminant Analysis
Unsupervised learning
Principal Components Analysis
Clustering Methods (k-means, hierarchical clustering, k-medoids)
Survival analysis
Survival objects in R
Kaplan-Meier estimate
Confidence bands
Cox PH models, constant covariates
Cox PH models, time-dependent covariates
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Basic visualisation graphs
Multivariate relations with the lattice and ggplot2 packages
Using graphics parameters
Graphics parameters list
Automated and interactive reporting
Combining output from R with text
Creating HTML and PDF documents

advr
Advanced R 
7 hours 
RStudio IDE
Data manipulation with dplyr, tidyr, reshape2
Object oriented programming in R
Performance profiling
Exception handling
Debugging R code
Creating R packages
Reproducible research with knitr and RMarkdown
C/C++ coding in R
Writing and compiling C/C++ code from R

statdm 
Statistical Thinking for Decision Makers 
7 hours 
This course has been created for decision makers whose primary goal is not to do the calculations and the analysis themselves, but to understand them and to be able to choose which statistical methods are relevant to the strategic planning of their organization.
For example, a prospective participant may need to decide how many samples must be collected before deciding whether or not to launch a product.
If you need a longer course that covers the very basics of statistical thinking, have a look at the 5-day "Statistics for Managers" training.
What statistics can offer to Decision Makers
Descriptive Statistics
Basic statistics - which statistics (e.g. median, average, percentiles) are most relevant to different distributions
Graphs - the significance of getting them right (e.g. how the way a graph is created reflects the decision)
Variable types - which variables are easier to deal with
Ceteris paribus, things are always in motion
Third variable problem - how to find the real influencer
Inferential Statistics
Probability value - what a p-value means
Repeated experiment - how to interpret the results of repeated experiments
Data collection - you can minimize bias, but not get rid of it
Understanding confidence level
Statistical Thinking
Decision making with limited information
How to check how much information is enough
Prioritizing goals based on probability and potential return (benefit/cost ratio, decision trees)
How errors add up
Butterfly effect
Black swans
What is Schrödinger's cat and what is Newton's Apple in business
Cassandra Problem - how to measure a forecast if the course of action has changed
Google Flu Trends - how it went wrong
How decisions make forecast outdated
Forecasting  methods and practicality
ARIMA
Why naive forecasts are usually more responsive
How far into the past should a forecast look?
Why more data can mean a worse forecast
Statistical Methods useful for Decision Makers
Describing Bivariate Data
Univariate data and bivariate data
Probability
Why things differ each time we measure them
Normal Distributions and normally distributed errors
Estimation
Independent sources of information and degrees of freedom
Logic of Hypothesis Testing
What can be proven, and why it is always the opposite of what we want (falsification)
Interpreting the results of Hypothesis Testing
Testing Means
Power
How to determine a good (and cheap) sample size
False positives and false negatives, and why it is always a trade-off

tableau1 
Data analysis with Tableau 
14 hours 
Connecting to various databases
Data connection types
Working with single data sources
Multiple data sources & data blending
Tableau geocoding
Advanced mapping + using Background Images
Overview of additional visualizations
Dashboards: quick filters, actions, and parameters
Advanced calculations
Parameters, calculations, sorting, filtering etc.
Best practices when using Tableau R programming

dsbda 
Data Science for Big Data Analytics 
35 hours 
Introduction to Data Science for Big Data Analytics
Data Science Overview
Big Data Overview
Data Structures
Drivers and complexities of Big Data
Big Data ecosystem and a new approach to analytics
Key technologies in Big Data
Data Mining process and problems
Association Pattern Mining
Data Clustering
Outlier Detection
Data Classification
Introduction to Data Analytics lifecycle
Discovery
Data preparation
Model planning
Model building
Presentation/Communication of results
Operationalization
Exercise: Case study
From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology.
Getting started with R
Installing R and RStudio
Features of R language
Objects in R
Data in R
Data manipulation
Big data issues
Exercises
Getting started with Hadoop
Installing Hadoop
Understanding Hadoop modes
HDFS
MapReduce architecture
Hadoop related projects overview
Writing programs in Hadoop MapReduce
Exercises
Integrating R and Hadoop with RHadoop
Components of RHadoop
Installing RHadoop and connecting with Hadoop
The architecture of RHadoop
Hadoop streaming with R
Data analytics problem solving with RHadoop
Exercises
Preprocessing and preparing data
Data preparation steps
Feature extraction
Data cleaning
Data integration and transformation
Data reduction – sampling, feature subset selection, dimensionality reduction
Discretization and binning
Exercises and Case study
Exploratory data analytic methods in R
Descriptive statistics
Exploratory data analysis
Visualization – preliminary steps
Visualizing single variable
Examining multiple variables
Statistical methods for evaluation
Hypothesis testing
Exercises and Case study
Data Visualizations
Basic visualizations in R
Packages for data visualization: ggplot2, lattice, plotly
Formatting plots in R
Advanced graphs
Exercises
Regression (Estimating future values)
Linear regression
Use cases
Model description
Diagnostics
Problems with linear regression
Shrinkage methods, ridge regression, the lasso
Generalizations and nonlinearity
Regression splines
Local polynomial regression
Generalized additive models
Regression with RHadoop
Exercises and Case study
Classification
The classification related problems
Bayesian refresher
Naïve Bayes
Logistic regression
K-nearest neighbors
Decision trees algorithm
Neural networks
Support vector machines
Diagnostics of classifiers
Comparison of classification methods
Scalable classification algorithms
Exercises and Case study
Assessing model performance and selection
Bias, Variance and model complexity
Accuracy vs Interpretability
Evaluating classifiers
Measures of model/algorithm performance
Holdout method of validation
Cross-validation
Tuning machine learning algorithms with the caret package
Visualizing model performance with Profit, ROC and Lift curves
Ensemble Methods
Bagging
Random Forests
Boosting
Gradient boosting
Exercises and Case study
Support vector machines for classification and regression
Maximal Margin classifiers
Support vector classifiers
Support vector machines
SVMs for classification problems
SVMs for regression problems
Exercises and Case study
Identifying unknown groupings within a data set
Feature Selection for Clustering
Representative-based algorithms: k-means, k-medoids
Hierarchical algorithms: agglomerative and divisive methods
Probability-based algorithms: EM
Density-based algorithms: DBSCAN, DENCLUE
Cluster validation
Advanced clustering concepts
Clustering with RHadoop
Exercises and Case study
Discovering connections with Link Analysis
Link analysis concepts
Metrics for analyzing networks
The PageRank algorithm
Hyperlink-Induced Topic Search (HITS)
Link Prediction
Exercises and Case study
Association Pattern Mining
Frequent Pattern Mining Model
Scalability issues in frequent pattern mining
Brute Force algorithms
Apriori algorithm
The FP-growth approach
Evaluation of Candidate Rules
Applications of Association Rules
Validation and Testing
Diagnostics
Association rules with R and Hadoop
Exercises and Case study
Constructing recommendation engines
Understanding recommender systems
Data mining techniques used in recommender systems
Recommender systems with the recommenderlab package
Evaluating the recommender systems
Recommendations with RHadoop
Exercise: Building a recommendation engine
Text analysis
Text analysis steps
Collecting raw text
Bag of words
Term Frequency–Inverse Document Frequency (TF-IDF)
Determining Sentiments
Exercises and Case study

bigdatar 
Programming with Big Data in R 
21 hours 
Introduction to Programming with Big Data in R (pbdR)
Setting up your environment to use pbdR
Scope and tools available in pbdR
Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
Using pbdR MPI
Parallel processing
Point-to-point communication
Send Matrices
Summing Matrices
Collective communication
Summing Matrices with Reduce
Scatter / Gather
Other MPI communications
Distributed Matrices
Creating a distributed diagonal matrix
SVD of a distributed matrix
Building a distributed matrix in parallel
Statistics Applications
Monte Carlo Integration
Reading Datasets
Reading on all processes
Broadcasting from one process
Reading partitioned data
Distributed Regression
Distributed Bootstrap

rprogadv 
Advanced R Programming 
7 hours 
This course is for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R coding skills.
The purpose is to give a practical advanced R programming course to participants interested in applying the methods at work.
Sector-specific examples are used to make the training relevant to the audience.
R's environment
Object oriented programming in R
S3
S4
Reference classes
Performance profiling
Exception handling
Debugging R code
Creating R packages
Unit testing
C/C++ coding in R
SEXPRs
Calling dynamically loaded libraries from R
Writing and compiling C/C++ code from R
Improving R's performance with a C++ linear algebra library

datama 
Data Mining and Analysis 
28 hours 
Objective:
Delegates will be able to analyse big data sets, extract patterns, and choose the variables that drive the results, so that they can build a new model that produces predictive forecasts.
Data preprocessing
Data Cleaning
Data integration and transformation
Data reduction
Discretization and concept hierarchy generation
Statistical inference
Probability distributions, Random variables, Central limit theorem
Sampling
Confidence intervals
Statistical Inference
Hypothesis testing
Multivariate linear regression
Specification
Subset selection
Estimation
Validation
Prediction
Classification methods
Logistic regression
Linear discriminant analysis
K-nearest neighbours
Naive Bayes
Comparison of Classification methods
Neural Networks
Fitting neural networks
Issues in training neural networks
Decision trees
Regression trees
Classification trees
Trees Versus Linear Models
Bagging, Random Forests, Boosting
Bagging
Random Forests
Boosting
Support Vector Machines and Flexible Discriminants
Maximal Margin classifier
Support vector classifiers
Support vector machines
SVMs for two or more classes
Relationship to logistic regression
Principal Components Analysis
Clustering
K-means clustering
K-medoids clustering
Hierarchical clustering
Density based clustering
Model Assessment and Selection
Bias, Variance and Model complexity
In-sample prediction error
The Bayesian approach
Cross-validation
Bootstrap methods

tidyverse 
Introduction to Data Visualization with Tidyverse and R 
7 hours 
The Tidyverse is a collection of versatile R packages for cleaning, processing, modeling, and visualizing data. Some of the packages included are: ggplot2, dplyr, tidyr, readr, purrr, and tibble.
In this instructor-led, live training, participants will learn how to manipulate and visualize data using the tools included in the Tidyverse.
By the end of this training, participants will be able to:
Perform data analysis and create appealing visualizations
Draw useful conclusions from various datasets of sample data
Filter, sort and summarize data to answer exploratory questions
Turn processed data into informative line plots, bar plots, histograms
Import and filter data from diverse data sources, including Excel, CSV, and SPSS files
Audience
Beginners to the R language
Beginners to data analysis and data visualization
Format of the course
Part lecture, part discussion, exercises and heavy hands-on practice
Introduction
Tidyverse vs traditional R plotting
Setting up your working environment
Preparing the dataset
Importing and filtering data
Wrangling the data
Visualizing the data (graphs, scatter plots)
Grouping and summarizing the data
Visualizing the data (line plots, bar plots, histograms, boxplots)
Working with nonstandard data
Closing remarks

rintro 
Introduction to R 
21 hours 
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining.
This course covers the manipulation of objects in R including reading data, accessing R packages, writing R functions, and making informative graphs. It includes analyzing data using common statistical models. The course teaches how to use the R software (http://www.r-project.org) both on a command line and in a graphical user interface (GUI).
Introduction and preliminaries
Making R more friendly, R and available GUIs
The R environment
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Ordered and unordered factors
A specific example
The function tapply() and ragged arrays
Ordered factors
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
Mixed vector and array arithmetic. The recycling rule
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Reading data from files
The read.table() function
The scan() function
Accessing built-in datasets
Loading data from other R packages
Editing data
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Statistical models in R
Defining statistical models; formulae
Contrasts
Linear models
Generic functions for extracting model information
Analysis of variance and model comparison
ANOVA tables
Updating fitted models
Generalized linear models
Families
The glm() function
Nonlinear least squares and maximum likelihood models
Least squares
Maximum likelihood
Some nonstandard models
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Low-level plotting commands
Mathematical annotation
Hershey vector fonts
Interacting with graphics
Using graphics parameters
Permanent changes: The par() function
Temporary changes: Arguments to graphics functions
Graphics parameters list
Graphical elements
Axes and tick marks
Figure margins
Multiple figure environment
Device drivers
PostScript diagrams for typeset documents
Multiple graphics devices
Dynamic graphics
Packages
Standard packages
Contributed packages and CRAN
Namespaces

rdataana 
R for Data Analysis and Research 
7 hours 
Audience
managers
developers
scientists
students
Format of the course
online instruction and discussion OR face-to-face workshops
The list below gives an idea of the topics that will be covered in the workshop.
The number of topics that will be covered depends on the duration of the workshop (i.e. one, two or three days). In a one or two day workshop it may not be possible to cover all topics, and so the workshop will be tailored to suit the specific needs of the learners.
A first R session
Syntax for analysing one dimensional data arrays
Syntax for analysing two dimensional data arrays
Reading and writing data files
Subsetting data, sorting, ranking and ordering data
Merging arrays
Set membership
The main statistical functions in R
The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals)
Ordinary Least Squares Regression
T-tests, Analysis of Variance and Multivariate Analysis of Variance
Chi-square tests for categorical variables
Writing functions in R
Writing software (scripts) in R
Control structures (e.g. Loops)
Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts)
Graphical User Interfaces for R

frcr 
Forecasting with R 
14 hours 
This course enables delegates to fully automate the process of forecasting with R.
Forecasting with R
Introduction to Forecasting
Exponential Smoothing
ARIMA models
The forecast package
Package 'forecast'
accuracy
Acf
arfima
Arima
arima.errors
auto.arima
bats
BoxCox
BoxCox.lambda
croston
CV
dm.test
dshw
ets
fitted.Arima
forecast
forecast.Arima
forecast.bats
forecast.ets
forecast.HoltWinters
forecast.lm
forecast.stl
forecast.StructTS
gas
gold
logLik.ets
ma
meanf
monthdays
msts
na.interp
naive
ndiffs
nnetar
plot.bats
plot.ets
plot.forecast
rwf
seasadj
seasonaldummy
seasonplot
ses
simulate.ets
sindexf
splinef
subset.ts
taylor
tbats
thetaf
tsdisplay
tslm
wineind
woolyrnq
