Lecture 2: Machine Learning

Data Science

J Mwaura

ML Terminology

Machine Learning, Data Science, Data Mining, Data Analysis, Statistical Learning, Knowledge Discovery in Databases, Pattern Discovery

Digital Data Sources

Photos/images

Video

Tweets

Satellite data

Data Types

Texts & numbers

Graphs & tables

Clickstreams & transactions

Images & videos

Data Science Process

  1. Data collection
  2. Data preparation
  3. Exploratory data analysis
  4. Machine learning
  5. Visualization
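
The five steps above can be sketched end to end on a toy problem. This is only an illustration: the records (hours studied, exam score), the use of `None` for missing values, the least-squares line and the text bar chart are all assumptions chosen to keep the example small, not part of the lecture material.

```python
import statistics

# 1. Data collection: hypothetical (hours studied, exam score) records.
#    None marks a missing measurement.
raw = [(1.0, 52.0), (2.0, 55.0), (3.0, None), (4.0, 61.0), (5.0, 64.0), (None, 70.0)]

# 2. Data preparation: drop records with a missing value.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Exploratory data analysis: simple summary statistics.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
summary = {"n": len(clean), "mean_x": statistics.mean(xs), "mean_y": statistics.mean(ys)}
print(summary)

# 4. Machine learning: fit y = slope * x + intercept by ordinary least squares.
mean_x, mean_y = summary["mean_x"], summary["mean_y"]
slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# 5. Visualization: a text bar chart of each score next to the fitted value.
for x, y in clean:
    bar = "#" * round(y / 5)
    print(f"x={x:.0f} | {bar} {y:.0f} (fit {slope * x + intercept:.1f})")
```

In practice each step would use dedicated tools (for example a plotting library for step 5), but the order of the steps stays the same.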

Machine Learning

A subfield of artificial intelligence (AI)

Machine learning involves writing programs that automatically improve their performance as they are exposed to information in data

Learning is achieved via a parameterized model whose tunable parameters are automatically adjusted to optimize a chosen performance criterion
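
A minimal sketch of this idea, assuming a linear model y = w*x + b as the parameterized model, mean squared error as the performance criterion, and plain gradient descent as the adjustment rule. The data, learning rate and step count are hypothetical values picked for illustration.

```python
# Hypothetical training data generated by y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def mse(w, b):
    """Performance criterion: mean squared error of the model w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(learning_rate=0.05, steps=2000):
    """Adjust the tunable parameters w and b by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit()
print(f"w={w:.3f}, b={b:.3f}, mse={mse(w, b):.6f}")
```

The program is never told the rule behind the data. It only sees the examples and the error, and the parameters move toward the values that make the error small.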

Machine Learning

Machine learning sub-fields:

  1. Supervised learning - learn from a labeled training set, e.g. logistic regression, support vector machines, decision trees, random forests, etc.
  2. Unsupervised learning - learn from an unlabeled training set, e.g. k-means clustering and kernel density estimation
  3. Reinforcement learning - learn from reward feedback obtained by interacting with an environment
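
The difference between the first two sub-fields can be sketched on the same toy data. The supervised part below uses a nearest-centroid classifier, which is a simpler stand-in and not one of the methods listed above. The unsupervised part is a bare-bones k-means with two clusters. The data, the label names and the function names are hypothetical.

```python
# Hypothetical 1-D measurements forming two well-separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
# Labels are used only by the supervised part.
labels = ["small", "small", "small", "large", "large", "large"]


def mean(values):
    return sum(values) / len(values)


# Supervised: learn one centroid per class from the labeled examples.
def fit_nearest_centroid(xs, ys):
    return {label: mean([x for x, y in zip(xs, ys) if y == label]) for label in set(ys)}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Unsupervised: k-means with two clusters, no labels involved.
def kmeans_two_clusters(xs, iterations=10):
    centers = [min(xs), max(xs)]  # simple deterministic starting centers
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


centroids = fit_nearest_centroid(points, labels)
print(predict(centroids, 1.1), predict(centroids, 4.9))
print(kmeans_two_clusters(points))
```

Both parts end up with similar group centers here, but only the supervised model can name the groups, because only it was given the labels.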

Applications of ML

  1. Detecting faces in images
  2. Handwriting recognition
  3. Digit recognition on checks and license plates
  4. Scene classification
  5. etc...

Statistics vs Machine Learning

    Statistics

  • Hypothesis testing
  • Experimental design
  • Regression e.g. linear, logistic
  • Principal component analysis

    Machine Learning

  • Decision trees & Rule induction
  • Neural Networks & SVMs
  • Clustering methods & Association rules
  • Feature selection & Visualization
  • Graphical models & Genetic algorithms

End of Lecture 2

That's it!

Queries about this Lesson, please send them to: jmwaura.uni@gmail.com

References

  • Introduction to Data Science, Laura Igual and Santi Seguí
  • The principles of geostatistical analysis, Geostatistical Analyst
  • A Practical Guide to Geostatistical Mapping, Tomislav Hengl
Courtesy of Open School