IT Guy

IT, AI / Machine Learning, IoT, Project Management, Programming, ITIL, and more

English-language book - Introduction to Machine Learning with Python

Introduction to Machine Learning with Python: A Guide for Data Scientists

Table of Contents

1. Introduction

1.1. Why Machine Learning?
1.2. Why Python?
1.3. scikit-learn
1.4. Essential Libraries and Tools

Jupyter Notebook, NumPy, SciPy, matplotlib, pandas, mglearn
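
A quick way to confirm that these libraries are installed is to import each one and print its version, in the spirit of the book's own setup check. A minimal sketch (mglearn is the authors' companion package, installable with pip install mglearn; exact versions depend on your environment):

    import sys
    import numpy as np
    import scipy as sp
    import matplotlib
    import pandas as pd
    import sklearn
    import mglearn  # companion package with the book's plotting helpers and datasets

    # Print the version of each core library used throughout the book.
    print("Python:", sys.version.split()[0])
    print("NumPy:", np.__version__)
    print("SciPy:", sp.__version__)
    print("matplotlib:", matplotlib.__version__)
    print("pandas:", pd.__version__)
    print("scikit-learn:", sklearn.__version__)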

1.5. Python 2 Versus Python 3
1.6. Versions Used in this Book
1.7. A First Application: Classifying Iris Species

Predicting iris species with the k-NN (k-nearest neighbors) method using the Iris dataset
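
The chapter's worked example boils down to a few lines of scikit-learn. A minimal sketch (k=1 as in the book; the accuracy you see will vary slightly with the train/test split):

    # Classify iris species with a k-nearest-neighbors model (k=1).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)

    print("Test set accuracy: {:.2f}".format(knn.score(X_test, y_test)))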

2. Supervised Learning

2.1. Classification and Regression
2.2. Generalization, Overfitting, and Underfitting
2.2.1. Relation of Model Complexity to Dataset Size
2.3. Supervised Machine Learning Algorithms
2.3.1. Some Sample Datasets
2.3.2. k-Nearest Neighbors
2.3.3. Linear Models
2.3.4. Naive Bayes Classifiers
2.3.5. Decision Trees
2.3.6. Ensembles of Decision Trees
2.3.7. Kernelized Support Vector Machines
2.3.8. Neural Networks (Deep Learning)
2.4. Uncertainty Estimates from Classifiers
2.4.1. The Decision Function
2.4.2. Predicting Probabilities
2.4.3. Uncertainty in Multiclass Classification

3. Unsupervised Learning and Preprocessing

3.1. Types of Unsupervised Learning
3.2. Challenges in Unsupervised Learning
3.3. Preprocessing and Scaling
3.3.1. Different Kinds of Preprocessing
3.3.2. Applying Data Transformations
3.3.3. Scaling Training and Test Data the Same Way
3.3.4. The Effect of Preprocessing on Supervised Learning
3.4. Dimensionality Reduction, Feature Extraction, and Manifold Learning
3.4.1. Principal Component Analysis (PCA)
3.4.2. Non-Negative Matrix Factorization (NMF)
3.4.3. Manifold Learning with t-SNE
3.5. Clustering
3.5.1. k-Means Clustering
3.5.2. Agglomerative Clustering
3.5.3. DBSCAN
3.5.4. Comparing and Evaluating Clustering Algorithms
3.5.5. Summary of Clustering Methods

4. Representing Data and Engineering Features

4.1. Categorical Variables
4.1.1. One-Hot-Encoding (Dummy Variables)
4.1.2. Numbers Can Encode Categoricals
4.2. OneHotEncoder and ColumnTransformer: Categorical Variables with scikit-learn
4.3. Convenient ColumnTransformer creation with make_column_transformer
4.4. Binning, Discretization, Linear Models, and Trees
4.5. Interactions and Polynomials
4.6. Univariate Nonlinear Transformations
4.7. Automatic Feature Selection
4.7.1. Univariate Statistics
4.7.2. Model-Based Feature Selection
4.7.3. Iterative Feature Selection
4.8. Utilizing Expert Knowledge

5. Model Evaluation and Improvement

5.1. Cross-Validation
5.1.1. Cross-Validation in scikit-learn
5.1.2. Benefits of Cross-Validation
5.1.3. Stratified k-Fold Cross-Validation and Other Strategies
5.2. Grid Search
5.2.1. Simple Grid Search
5.2.2. The Danger of Overfitting the Parameters and the Validation Set
5.2.3. Grid Search with Cross-Validation
5.3. Evaluation Metrics and Scoring
5.3.1. Keep the End Goal in Mind
5.3.2. Metrics for Binary Classification
5.3.3. Metrics for Multiclass Classification
5.3.4. Regression Metrics
5.3.5. Using Evaluation Metrics in Model Selection

6. Algorithm Chains and Pipelines

6.1. Parameter Selection with Preprocessing
6.2. Building Pipelines
6.3. Using Pipelines in Grid Searches
6.4. The General Pipeline Interface
6.4.1. Convenient Pipeline Creation with make_pipeline
6.4.2. Accessing Step Attributes
6.4.3. Accessing Attributes in a Pipeline inside GridSearchCV
6.5. Grid-Searching Preprocessing Steps and Model Parameters
6.6. Grid-Searching Which Model To Use
6.6.1. Avoiding Redundant Computation

7. Working with Text Data

7.1. Types of Data Represented as Strings
7.2. Example Application: Sentiment Analysis of Movie Reviews
7.3. Representing Text Data as a Bag of Words
7.4. Stopwords
7.5. Rescaling the Data with tf–idf
7.6. Investigating Model Coefficients
7.7. Bag-of-Words with More Than One Word (n-Grams)
7.8. Advanced Tokenization, Stemming, and Lemmatization
7.9. Topic Modeling and Document Clustering

Latent Dirichlet Allocation
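
For reference, a minimal sketch of topic modeling with Latent Dirichlet Allocation in scikit-learn. The four-document corpus below is made up purely for illustration; the book applies the same idea to the movie review data used throughout the chapter:

    # Topic modeling with Latent Dirichlet Allocation on a toy corpus.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the movie was a gripping thriller with a great plot",
        "a boring film with flat acting and a weak story",
        "the model learns topics from word counts in documents",
        "bag of words features feed the topic model",
    ]

    # LDA works on raw term counts, not tf-idf weights.
    vect = CountVectorizer(stop_words="english")
    X = vect.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)  # document-topic distribution

    # Show the top five words for each learned topic.
    words = vect.get_feature_names_out()  # scikit-learn >= 1.0
    for i, topic in enumerate(lda.components_):
        top = topic.argsort()[::-1][:5]
        print("Topic {}: {}".format(i, ", ".join(words[j] for j in top)))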

8. Wrapping Up

8.1. Approaching a Machine Learning Problem
8.1.1. Humans in the Loop
8.2. From Prototype to Production
8.3. Testing Production Systems
8.4. Building Your Own Estimator
8.5. Where to Go from Here
8.5.1. Theory
8.5.2. Other Machine Learning Frameworks and Packages
8.5.3. Ranking, Recommender Systems, and Other Kinds of Learning
8.5.4. Probabilistic Modeling, Inference, and Probabilistic Programming
8.5.5. Neural Networks
8.5.6. Scaling to Larger Datasets
8.5.7. Honing Your Skills
8.6. Conclusion