Introduction

Python ecosystem for machine learning

  1. Python and its growing use for machine learning.
  2. SciPy and the functionality it provides through NumPy, Matplotlib and Pandas.
  3. scikit-learn, which provides the machine learning algorithms.

Python

Python is a general-purpose, interpreted programming language. It is easy to learn and use, primarily because the language focuses on readability.

It is a popular language in general, consistently appearing in the top 10 programming languages in surveys on Stack Overflow. It is a dynamic language that is well suited to interactive development and quick prototyping, yet powerful enough to support the development of large applications. It is also widely used for machine learning and data science because of its excellent library support and because it is a general-purpose programming language (unlike R or MATLAB).

SciPy

SciPy is an ecosystem of Python libraries for mathematics, science and engineering. It is an add-on to Python that you will need for machine learning. The SciPy ecosystem is comprised of the following core modules relevant to machine learning:

  - NumPy: A foundation for SciPy that allows you to efficiently work with data in arrays.
  - Matplotlib: Allows you to create 2D charts and plots from data.
  - Pandas: Tools and data structures to organize and analyze your data.

To be effective at machine learning in Python you must install and become familiar with SciPy. Specifically:

  - You will prepare your data as NumPy arrays for modeling in machine learning algorithms.
  - You will use Matplotlib (and wrappers of Matplotlib in other frameworks) to create plots and charts of your data.
  - You will use Pandas to load, explore, and better understand your data.
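
The three roles above can be sketched in a few lines. This is a minimal illustration, not taken from the notes: the tiny data set and its column names (height_cm, weight_kg, label) are made up for the example, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data set: two input columns and a class label.
data = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 58.0, 69.0, 80.0],
    "label": [0, 0, 1, 1],
})

# Pandas: explore the data with summary statistics.
summary = data.describe()

# NumPy: convert the columns to arrays ready for a modeling library.
X = data[["height_cm", "weight_kg"]].to_numpy()
y = data["label"].to_numpy()

# Matplotlib: plot the two inputs against each other, colored by label.
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y)
ax.set_xlabel("height_cm")
ax.set_ylabel("weight_kg")

print(summary)
print(X.shape, y.shape)
```

In an interactive session you would typically drop the backend line and call plt.show() to view the chart.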

scikit-learn

The scikit-learn library is how you can develop and practice machine learning in Python. It is built upon and requires the SciPy ecosystem. The name scikit suggests that it is a SciPy plug-in or toolkit. The focus of the library is machine learning algorithms for classification, regression, clustering and more. It also provides tools for related tasks such as evaluating models, tuning parameters and pre-processing data.
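
As a rough sketch of how these pieces fit together, the example below pre-processes data, tunes one parameter, and evaluates a classifier. The choice of the bundled iris data set, logistic regression, and the candidate values for C are assumptions made for illustration; they are not prescribed by these notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small data set that ships with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the final evaluation uses unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Pre-processing and the classifier chained into a single estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Parameter tuning: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Model evaluation: accuracy of the best pipeline on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Chaining the scaler and the model in one pipeline means the scaler is fit only on the training folds during cross-validation, which avoids leaking information from the validation data.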

Analyze Data

ch 3, 4, 5

Prepare Data

ch 6, 7

Evaluate Algorithms

ch 8, 9, 10, 11, 12, 13

Improve Results

ch 14, 15

Present Results

ch 16, 17