Testing modules and packages
Preparing a module or package for publication
Uploading your work to GitHub
Submitting to the Python Package Index
- Start from https://pypi.python.org/pypi!
- Example LICENSE.txt: MIT license
Using pip to download and install modules and packages

9. Modular Programming as a Foundation for Good Programming Technique

2018-12-19

Python - Matplotlib reference information

AI / Machine Learning Python

Link

Official

matplotlib.org
matplotlib.org - User's Guide
matplotlib.org - Gallery
- Click any plot to see the code behind it!

Others

Top 50 matplotlib Visualizations

Reference books

Safari Online - Matplotlib 3.0 Cookbook

Matplotlib 3.0 Cookbook: Over 150 recipes to create highly detailed interactive visualizations using Python (English Edition)

Author: Srinivasa Rao Poladi
Publisher: Packt Publishing
Release date: 2018/10/23
Format: Kindle edition

2018-12-19

Well-known sample datasets for machine learning

AI / Machine Learning Python

Iris dataset
Wisconsin Breast Cancer dataset
- Dataset overview
- Technical Info
Boston Housing dataset
- Dataset overview
- Technical Info

Iris dataset

Dataset overview

For supervised learning, classification (three-classes)
target (classes) : setosa(0), versicolor(1), virginica(2)
feature : sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)

Technical Info

import : from sklearn.datasets import load_iris
iris_dataset.keys() : dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
iris_dataset['data'].shape : (150, 4)
iris_dataset['target'].shape : (150,)
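
The values listed above can be checked directly. This is a minimal sketch, assuming scikit-learn is installed; the variable name iris_dataset simply follows the notes above.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data; the returned object behaves like a dictionary.
iris_dataset = load_iris()

print(iris_dataset.keys())
print(iris_dataset['target_names'])   # class names, indexed by target value
print(iris_dataset['feature_names'])  # the four measured features
print(iris_dataset['data'].shape)     # (number of samples, number of features)
print(iris_dataset['target'].shape)
```

Newer scikit-learn releases may return a few more keys than the ones noted here (for example 'frame'), so the exact key list depends on the installed version.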

Books that cover it

[Introduction to Machine Learning with Python]

Usage example

k-nearest neighbors classification algorithm

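from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
# Hold out 25% of the samples (the default split) as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Classify each sample by its single nearest training neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))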
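
Once a model like this is fitted, it can label a flower it has never seen. The sketch below is self-contained and assumes scikit-learn and NumPy are installed; the measurements in X_new are a made-up sample, not data from the original notes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
# scikit-learn expects a 2-D array, so the single sample is wrapped in a list.
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])

prediction = knn.predict(X_new)
predicted_name = iris_dataset['target_names'][prediction[0]]
print("Predicted class index:", prediction)
print("Predicted species:", predicted_name)
```

The predicted index is mapped back to a species name through target_names, which is why the class order setosa(0), versicolor(1), virginica(2) matters.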

Wisconsin Breast Cancer dataset

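Dataset overview

For supervised learning, binary classification of breast tumors
target (classes) : malignant(0), benign(1)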

Technical Info

cancer.keys(): dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
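
A quick way to get a feel for this dataset is to look at its shape and class balance. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable name cancer follows the note above, and counts is just an illustrative helper name.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

print(cancer['data'].shape)      # (number of samples, number of features)
print(cancer['target_names'])    # class names, indexed by target value

# Count how many samples fall into each class.
counts = {
    str(name): int(count)
    for name, count in zip(cancer['target_names'], np.bincount(cancer['target']))
}
print(counts)
```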

Boston Housing dataset

Dataset overview

Technical Info

2018-12-18

English-language book - Introduction to Machine Learning with Python

AI / Machine Learning Python

Introduction to Machine Learning with Python

Introduction to Machine Learning with Python: A Guide for Data Scientists

Author: Andreas C. Mueller, Sarah Guido
Publisher: O'Reilly Media
Release date: 2016/10/21
Format: Paperback

Safari Books Online

1. Introduction

1.1. Why Machine Learning?

1.2. Why Python?

1.3. scikit-learn

1.4. Essential Libraries and Tools

Jupyter Notebook, NumPy, SciPy, matplotlib, pandas, mglearn

1.5. Python 2 Versus Python 3

1.6. Versions Used in this Book

1.7. A First Application: Classifying Iris Species

Predicting iris species with the k-NN method, using the Iris dataset

2. Supervised Learning

2.1. Classification and Regression

2.2. Generalization, Overfitting, and Underfitting

2.2.1. Relation of Model Complexity to Dataset Size

2.3. Supervised Machine Learning Algorithms

2.3.1. Some Sample Datasets

2.3.2. k-Nearest Neighbors

2.3.3. Linear Models

2.3.4. Naive Bayes Classifiers

2.3.5. Decision Trees

2.3.6. Ensembles of Decision Trees

2.3.7. Kernelized Support Vector Machines

2.3.8. Neural Networks (Deep Learning)

2.4. Uncertainty Estimates from Classifiers

2.4.1. The Decision Function

2.4.2. Predicting Probabilities

2.4.3. Uncertainty in Multiclass Classification

3. Unsupervised Learning and Preprocessing

3.1. Types of Unsupervised Learning

3.2. Challenges in Unsupervised Learning

3.3. Preprocessing and Scaling

3.3.1. Different Kinds of Preprocessing

3.3.2. Applying Data Transformations

3.3.3. Scaling Training and Test Data the Same Way
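
The idea behind this section title can be sketched in a few lines: the scaler learns its parameters from the training set only and then applies that same transformation to the test set. This is an illustration under my own assumptions (MinMaxScaler on the breast cancer data, random_state=1), not necessarily the exact example from the book.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], random_state=1)

scaler = MinMaxScaler()
scaler.fit(X_train)                        # learn per-feature min/max from training data only

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training min/max, do not refit

# Training features land exactly in [0, 1]; test features may fall slightly outside.
print(X_train_scaled.min(), X_train_scaled.max())
print(X_test_scaled.min(), X_test_scaled.max())
```

Refitting the scaler on the test set would shift the test points relative to the training points, which is the mistake this section warns about.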

3.3.4. The Effect of Preprocessing on Supervised Learning

3.4. Dimensionality Reduction, Feature Extraction, and Manifold Learning

3.4.1. Principal Component Analysis (PCA)

3.4.2. Non-Negative Matrix Factorization (NMF)

3.4.3. Manifold Learning with t-SNE

3.5. Clustering

3.5.1. k-Means Clustering

3.5.2. Agglomerative Clustering

3.5.3. DBSCAN

3.5.4. Comparing and Evaluating Clustering Algorithms

3.5.5. Summary of Clustering Methods

4. Representing Data and Engineering Features

4.1. Categorical Variables

4.1.1. One-Hot-Encoding (Dummy Variables)

4.1.2. Numbers Can Encode Categoricals

4.2. OneHotEncoder and ColumnTransformer: Categorical Variables with scikit-learn

4.3. Convenient ColumnTransformer creation with make_columntransformer
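
Note that the function in scikit-learn is spelled make_column_transformer (in sklearn.compose). A minimal sketch of how it can be used, assuming a reasonably recent scikit-learn and pandas; the tiny DataFrame and its column names are invented for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric column and one categorical column.
df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'workclass': ['Private', 'State-gov', 'Private', 'Self-emp'],
})

# Each tuple is (transformer, columns); step names are generated automatically.
ct = make_column_transformer(
    (StandardScaler(), ['age']),
    (OneHotEncoder(), ['workclass']),
)

X = ct.fit_transform(df)
print(X.shape)   # one scaled column plus one column per workclass category
```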

4.4. Binning, Discretization, Linear Models, and Trees

4.5. Interactions and Polynomials

4.6. Univariate Nonlinear Transformations

4.7. Automatic Feature Selection

4.7.1. Univariate Statistics

4.7.2. Model-Based Feature Selection

4.7.3. Iterative Feature Selection

4.8. Utilizing Expert Knowledge

5. Model Evaluation and Improvement

5.1. Cross-Validation

5.1.1. Cross-Validation in scikit-learn

5.1.2. Benefits of Cross-Validation

5.1.3. Stratified k-Fold Cross-Validation and Other Strategies

5.2. Grid Search

5.2.1. Simple Grid Search

5.2.2. The Danger of Overfitting the Parameters and the Validation Set

5.2.3. Grid Search with Cross-Validation
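
As a rough sketch of what grid search with cross-validation looks like in scikit-learn: GridSearchCV tries every parameter combination with k-fold cross-validation on the training data, then refits the best one. The SVC model and the parameter values below are my own illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0)

# Candidate values for the two SVC parameters to tune.
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': [0.01, 0.1, 1, 10, 100],
}

# 5-fold cross-validation on the training set for each of the 25 combinations.
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

test_score = grid_search.score(X_test, y_test)
print("Best parameters:", grid_search.best_params_)
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))
print("Test set score: {:.2f}".format(test_score))
```

The test set is only touched once, at the end, so it stays an honest estimate of how the chosen parameters generalize.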

5.3. Evaluation Metrics and Scoring

5.3.1. Keep the End Goal in Mind

5.3.2. Metrics for Binary Classification

5.3.3. Metrics for Multiclass Classification

5.3.4. Regression Metrics

5.3.5. Using Evaluation Metrics in Model Selection

6. Algorithm Chains and Pipelines

6.1. Parameter Selection with Preprocessing

6.2. Building Pipelines

6.3. Using Pipelines in Grid Searches

6.4. The General Pipeline Interface

6.4.1. Convenient Pipeline Creation with make_pipeline
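
A small sketch of make_pipeline, assuming scikit-learn is installed. The choice of MinMaxScaler followed by SVC on the breast cancer data is my own illustration of chaining a preprocessing step with a model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], random_state=0)

# make_pipeline names each step after its class, in lowercase.
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)   # fits the scaler, then the SVC on the scaled data

step_names = [name for name, _ in pipe.steps]
test_score = pipe.score(X_test, y_test)
print(step_names)
print("Test score: {:.2f}".format(test_score))
```

Because the scaler lives inside the pipeline, it is fitted only on whatever data fit receives, which keeps test data out of the preprocessing step.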

6.4.2. Accessing Step Attributes

6.4.3. Accessing Attributes in a Pipeline inside GridSearchCV

6.5. Grid-Searching Preprocessing Steps and Model Parameters

6.6. Grid-Searching Which Model To Use

6.6.1. Avoiding Redundant Computation

7. Working with Text Data

This chapter is quite difficult...

7.1. Types of Data Represented as Strings

7.2. Example Application: Sentiment Analysis of Movie Reviews

7.3. Representing Text Data as a Bag of Words

7.4. Stopwords

7.5. Rescaling the Data with tf–idf

7.6. Investigating Model Coefficients

7.7. Bag-of-Words with More Than One Word (n-Grams)

7.8. Advanced Tokenization, Stemming, and Lemmatization

7.9. Topic Modeling and Document Clustering

Latent Dirichlet Allocation

8. Wrapping Up

8.1. Approaching a Machine Learning Problem

8.1.1. Humans in the Loop

8.2. From Prototype to Production

8.3. Testing Production Systems

8.4. Building Your Own Estimator

8.5. Where to Go from Here

8.5.1. Theory

8.5.2. Other Machine Learning Frameworks and Packages

8.5.3. Ranking, Recommender Systems, and Other Kinds of Learning

8.5.4. Probabilistic Modeling, Inference, and Probabilistic Programming

8.5.5. Neural Networks

8.5.6. Scaling to Larger Datasets

8.5.7. Honing Your Skills

8.6. Conclusion

2018-12-18

English-language book - Hands-On Data Analysis with NumPy and pandas

AI / Machine Learning Python

Hands-On Data Analysis with NumPy and pandas

Hands-On Data Analysis with NumPy and pandas: Implement Python packages from data manipulation to processing (English Edition)

Author: Curtis Miller
Publisher: Packt Publishing
Release date: 2018/06/29
Format: Kindle edition

Safari Books Online

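Impressions

The print edition runs to about 168 pages. Above all, it is easy to follow (and the content does not go very deep).
If you jumped straight into machine learning algorithms and crashed (!), this is a good book for going back to basics and learning the fundamentals of NumPy and pandas.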

1. SETTING UP A PYTHON DATA ANALYSIS ENVIRONMENT

2. DIVING INTO NUMPY

NumPy arrays
Special numeric values
Creating NumPy arrays
Creating ndarray

3. OPERATIONS ON NUMPY ARRAYS

Operations on NumPy Arrays
Selecting elements explicitly
Slicing arrays with colons
Advanced indexing
Expanding arrays
Arithmetic and linear algebra with arrays
Arithmetic with two equal-shaped arrays
Broadcasting
Linear algebra
Employing array methods and functions
Array methods
Vectorization with ufuncs
Custom ufuncs
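
A few of the topics in this chapter fit into one short sketch. It assumes only NumPy; the function clip_at_three is a made-up example, and np.frompyfunc is just one way to build a custom ufunc, not necessarily the one the book uses.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])     # shape (3,)

# Slicing arrays with colons: all rows, last two columns.
tail = a[:, 1:]

# Broadcasting: the 1-D row is stretched across both rows of a.
summed = a + row

# Vectorization with a built-in ufunc instead of a Python loop.
roots = np.sqrt(a)

# Custom ufunc built from a plain Python function (hypothetical example).
def clip_at_three(x):
    return x if x < 3 else 3

clip_ufunc = np.frompyfunc(clip_at_three, 1, 1)   # 1 input, 1 output
clipped = clip_ufunc(a).astype(int)               # frompyfunc returns an object array

print(tail)
print(summed)
print(roots)
print(clipped)
```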

4. PANDAS ARE FUN! WHAT IS PANDAS?

What does pandas do?
Exploring series and DataFrame objects
Creating series
Creating DataFrames
Adding data
Saving DataFrames
Subsetting your data
Subsetting a series
Indexing methods
Slicing a DataFrame

5. ARITHMETIC, FUNCTION APPLICATION, AND MAPPING WITH PANDAS

Arithmetic, Function Application, and Mapping with pandas
Arithmetic
Arithmetic with DataFrames
Vectorization with DataFrames
DataFrame function application
Handling missing data in a pandas DataFrame
Deleting missing information
Filling missing information
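
Deleting versus filling missing values can be compared side by side. A minimal sketch, assuming pandas and NumPy; the small DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [4.0, 5.0, np.nan],
})

# Deleting missing information: keep only rows with no NaN at all.
dropped = df.dropna()

# Filling missing information: replace each NaN with its column mean.
filled = df.fillna(df.mean())

print(dropped)
print(filled)
```

Dropping is simple but throws away two of the three rows here, which is why filling is often the more practical choice on small datasets.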

6. MANAGING, INDEXING, AND PLOTTING

Managing, Indexing, and Plotting
Index sorting
Sorting by values
Hierarchical indexing
Slicing a series with a hierarchical index
Plotting with pandas
Plotting methods
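
Index sorting, sorting by values, and slicing a series with a hierarchical index can be sketched together. This assumes pandas only; the index labels and values are made up for illustration.

```python
import pandas as pd

# A series with a two-level (hierarchical) index, deliberately unsorted.
index = pd.MultiIndex.from_tuples(
    [('b', 2), ('a', 1), ('b', 1), ('a', 2)],
    names=['letter', 'number'])
s = pd.Series([40, 10, 30, 20], index=index)

# Index sorting: order by the outer level, then the inner level.
sorted_s = s.sort_index()

# Slicing with the outer level returns a series indexed by the inner level.
a_part = sorted_s.loc['a']

# A full tuple selects a single value.
one_value = sorted_s.loc[('b', 2)]

# Sorting by values ignores the index order.
by_value = s.sort_values()

print(sorted_s)
print(a_part)
print(one_value)
```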
