IT Guy

IT, AI / Machine Learning, IoT, Project Management, programming, ITIL, and more

TIOBE Index

What is the TIOBE Index?

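  • A ranking site, and the index it publishes, that measures the popularity of programming languages
  • Updated every month, so it shows which programming languages are currently in demand
  • Also names a "Programming Language of the Year" annually; the 2018 award went to Python.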

Link

www.tiobe.com

English-language book - Modular Programming with Python

Modular Programming with Python

Reference

pypi.org

Impressions

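The book clearly walks through the whole process, up to the point where your package can actually be downloaded with pip.
Chapter 8 is especially recommended.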

Table of Contents

1. Introducing Modular Programming

2. Writing Your First Modular Program

3. Using Modules and Packages

4. Using Modules for Real-World Programming

5. Working with Module Patterns

6. Creating Reusable Modules

7. Advanced Module Techniques

8. Testing and Deploying Modules

  • Testing modules and packages
  • Preparing a module or package for publication
  • Uploading your work to GitHub
  • Submitting to the Python Package Index
  • Using pip to download and install modules and packages
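To illustrate the first bullet, here is a minimal sketch of unit-testing a module with Python's standard-library `unittest` framework. The function `average` and the class `AverageTests` are hypothetical examples and do not come from the book. In a real package the function would live in its own module, and the test file would import it.

```python
import unittest


# Hypothetical module-level function standing in for code from your own module.
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    """Each method whose name starts with "test" is run as one test case."""

    def test_simple_mean(self):
        self.assertEqual(average([1, 2, 3]), 2)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            average([])
```

If you save the tests in a file whose name starts with `test` (e.g. `test_average.py`), running `python -m unittest` in that directory will discover and run them.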

9. Modular Programming as a Foundation for Good Programming Technique

Python - Matplotlib References

Link

Official

Others

Reference books

Matplotlib 3.0 Cookbook: Over 150 recipes to create highly detailed interactive visualizations using Python (English Edition)

Well-known sample datasets for Machine Learning

Iris dataset

Dataset overview

  • For supervised learning, classification (three classes)
  • target (classes) : setosa(0), versicolor(1), virginica(2)
  • feature : sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)

Technical Info

  • import : from sklearn.datasets import load_iris
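  • keys() : dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']) (the exact keys depend on the scikit-learn version; newer releases add more, such as 'frame')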
  • iris_dataset['data'].shape : (150, 4)
  • iris_dataset['target'].shape : (150,)
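You can check the values above by loading the dataset and inspecting it. This sketch assumes scikit-learn is installed. The printed key list varies slightly between scikit-learn versions.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# The returned Bunch object behaves like a dict; key names vary slightly by version
print(iris_dataset.keys())
print(iris_dataset['data'].shape)      # (n_samples, n_features) -> (150, 4)
print(iris_dataset['target'].shape)    # one class label per sample -> (150,)
print(iris_dataset['target_names'])    # class index -> species name
print(iris_dataset['feature_names'])   # column names of the feature matrix
```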

Books that use this dataset

  • [Introduction to Machine Learning with Python]

Usage example

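k-nearest neighbors classification algorithm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
# Default split: 75% training data, 25% test data
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)  # 1-nearest-neighbor classifier
knn.fit(X_train, y_train)

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))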
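Once trained, the same model can classify a new measurement. The sample values below are made up for illustration. The four features follow the order listed above: sepal length, sepal width, petal length and petal width, all in cm. This sketch repeats the training steps so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Hypothetical new flower; scikit-learn expects a 2-D array (n_samples, n_features)
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(X_new)
print("Predicted class:", prediction)
print("Predicted species:", iris_dataset['target_names'][prediction])
```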

Wisconsin Breast Cancer dataset

Dataset overview

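  • For supervised learning, binary classification (two classes)
  • target (classes) : malignant(0), benign(1)
  • 569 samples, each with 30 numeric features describing the tumor (e.g. mean radius)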

Technical Info

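  • import : from sklearn.datasets import load_breast_cancer
  • cancer.keys(): dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename']) (varies by scikit-learn version)
  • cancer['data'].shape : (569, 30)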
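The following minimal sketch loads the dataset and counts the samples in each class. It assumes scikit-learn is installed and uses `np.bincount` to tally the integer labels.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
print(cancer['data'].shape)   # (n_samples, n_features)

# np.bincount counts how many samples carry each integer label (0, 1)
counts = np.bincount(cancer['target'])
print({str(name): int(n) for name, n in zip(cancer['target_names'], counts)})
```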

Boston Housing dataset

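Dataset overview

  • For supervised learning, regression (predicting median home values in Boston neighborhoods)
  • 506 samples, 13 features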

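Technical Info

  • load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so current scikit-learn releases cannot load this dataset through sklearn.datasets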

English-language book - Introduction to Machine Learning with Python

Introduction to Machine Learning with Python

Introduction to Machine Learning with Python: A Guide for Data Scientists

Table of Contents

1. Introduction

1.1. Why Machine Learning?
1.2. Why Python?
1.3. scikit-learn
1.4. Essential Libraries and Tools

Jupyter Notebook, NumPy, SciPy, matplotlib, pandas, mglearn

1.5. Python 2 Versus Python 3
1.6. Versions Used in this Book
1.7. A First Application: Classifying Iris Species

Predicting iris species with k-NN using the Iris dataset

2. Supervised Learning

2.1. Classification and Regression
2.2. Generalization, Overfitting, and Underfitting
2.2.1. Relation of Model Complexity to Dataset Size
2.3. Supervised Machine Learning Algorithms
2.3.1. Some Sample Datasets
2.3.2. k-Nearest Neighbors
2.3.3. Linear Models
2.3.4. Naive Bayes Classifiers
2.3.5. Decision Trees
2.3.6. Ensembles of Decision Trees
2.3.7. Kernelized Support Vector Machines
2.3.8. Neural Networks (Deep Learning)
2.4. Uncertainty Estimates from Classifiers
2.4.1. The Decision Function
2.4.2. Predicting Probabilities
2.4.3. Uncertainty in Multiclass Classification

3. Unsupervised Learning and Preprocessing

3.1. Types of Unsupervised Learning
3.2. Challenges in Unsupervised Learning
3.3. Preprocessing and Scaling
3.3.1. Different Kinds of Preprocessing
3.3.2. Applying Data Transformations
3.3.3. Scaling Training and Test Data the Same Way
3.3.4. The Effect of Preprocessing on Supervised Learning
3.4. Dimensionality Reduction, Feature Extraction, and Manifold Learning
3.4.1. Principal Component Analysis (PCA)
3.4.2. Non-Negative Matrix Factorization (NMF)
3.4.3. Manifold Learning with t-SNE
3.5. Clustering
3.5.1. k-Means Clustering
3.5.2. Agglomerative Clustering
3.5.3. DBSCAN
3.5.4. Comparing and Evaluating Clustering Algorithms
3.5.5. Summary of Clustering Methods
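Section 3.3.3 stresses that the scaler should be fitted on the training data only, and that the same transformation should then be applied to the test data. The sketch below shows this rule using scikit-learn's `StandardScaler` and the Breast Cancer dataset described above. Other scalers such as `MinMaxScaler` follow the same fit/transform pattern. The code is an illustration, not the book's listing.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], random_state=0)

scaler = StandardScaler()
scaler.fit(X_train)                        # learn mean/std from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training statistics

# Training features now have mean ~0 and std ~1; test features only approximately
print(X_train_scaled.mean(axis=0).round(2)[:5])
print(X_test_scaled.mean(axis=0).round(2)[:5])
```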

4. Representing Data and Engineering Features

4.1. Categorical Variables
4.1.1. One-Hot-Encoding (Dummy Variables)
4.1.2. Numbers Can Encode Categoricals
4.2. OneHotEncoder and ColumnTransformer: Categorical Variables with scikit-learn
4.3. Convenient ColumnTransformer creation with make_column_transformer
4.4. Binning, Discretization, Linear Models, and Trees
4.5. Interactions and Polynomials
4.6. Univariate Nonlinear Transformations
4.7. Automatic Feature Selection
4.7.1. Univariate Statistics
4.7.2. Model-Based Feature Selection
4.7.3. Iterative Feature Selection
4.8. Utilizing Expert Knowledge
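One-hot encoding (4.1.1) replaces a categorical column with one 0/1 column per category. Here is a minimal sketch of one way to do this with `pandas.get_dummies`. The column names and values are hypothetical toy data. The book also covers scikit-learn's `OneHotEncoder` for the same purpose.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and one numeric column
data = pd.DataFrame({
    'workclass': ['Private', 'Self-emp', 'Private', 'Government'],
    'hours_per_week': [40, 50, 38, 40],
})

# One indicator column per category of "workclass"; numeric columns are kept as-is
encoded = pd.get_dummies(data, columns=['workclass'])
print(encoded)
```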

5. Model Evaluation and Improvement

5.1. Cross-Validation
5.1.1. Cross-Validation in scikit-learn
5.1.2. Benefits of Cross-Validation
5.1.3. Stratified k-Fold Cross-Validation and Other Strategies
5.2. Grid Search
5.2.1. Simple Grid Search
5.2.2. The Danger of Overfitting the Parameters and the Validation Set
5.2.3. Grid Search with Cross-Validation
5.3. Evaluation Metrics and Scoring
5.3.1. Keep the End Goal in Mind
5.3.2. Metrics for Binary Classification
5.3.3. Metrics for Multiclass Classification
5.3.4. Regression Metrics
5.3.5. Using Evaluation Metrics in Model Selection
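Sections 5.1 and 5.2 can be summarized in a few calls. This sketch uses `cross_val_score` for k-fold cross-validation and `GridSearchCV` for grid search with cross-validation. A separate test set is held out, so the final score is not influenced by the parameter search (5.2.2). The choice of `SVC` and the parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

iris = load_iris()

# 5-fold cross-validation; for classifiers the folds are stratified by default
scores = cross_val_score(SVC(), iris.data, iris.target, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy: {:.2f}".format(scores.mean()))

# Grid search with cross-validation on the training set only
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test set score: {:.2f}".format(grid.score(X_test, y_test)))
```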

6. Algorithm Chains and Pipelines

6.1. Parameter Selection with Preprocessing
6.2. Building Pipelines
6.3. Using Pipelines in Grid Searches
6.4. The General Pipeline Interface
6.4.1. Convenient Pipeline Creation with make_pipeline
6.4.2. Accessing Step Attributes
6.4.3. Accessing Attributes in a Pipeline inside GridSearchCV
6.5. Grid-Searching Preprocessing Steps and Model Parameters
6.6. Grid-Searching Which Model To Use
6.6.1. Avoiding Redundant Computation
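The main point of chapter 6 is to put preprocessing inside the pipeline, so that it is refit within each cross-validation split. The minimal sketch below combines `make_pipeline` with `GridSearchCV`. The scaler, model and parameter grid are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)

# make_pipeline names each step after its lowercased class name: "minmaxscaler", "svc"
pipe = make_pipeline(MinMaxScaler(), SVC())

# Parameters of a step are addressed as <step name>__<parameter>
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': [0.1, 1, 10]}

# The scaler is refit inside every CV split, so validation folds never leak into scaling
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test set score: {:.2f}".format(grid.score(X_test, y_test)))
```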

7. Working with Text Data

This chapter is quite difficult...

7.1. Types of Data Represented as Strings
7.2. Example Application: Sentiment Analysis of Movie Reviews
7.3. Representing Text Data as a Bag of Words
7.4. Stopwords
7.5. Rescaling the Data with tf–idf
7.6. Investigating Model Coefficients
7.7. Bag-of-Words with More Than One Word (n-Grams)
7.8. Advanced Tokenization, Stemming, and Lemmatization
7.9. Topic Modeling and Document Clustering

Latent Dirichlet Allocation
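Most of this chapter, including topic modeling, builds on the bag-of-words representation from 7.3. Here is a minimal sketch of that representation with scikit-learn's `CountVectorizer` on a two-sentence toy corpus. With the default settings, text is lowercased and only tokens of two or more word characters are kept, so "a" is dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy two-document corpus
docs = ["The fool doth think he is wise,",
        "but the wise man knows himself to be a fool"]

vect = CountVectorizer()
bag_of_words = vect.fit_transform(docs)   # sparse matrix: (n_documents, n_vocabulary)

print(sorted(vect.vocabulary_))   # learned vocabulary (token -> column index mapping)
print(bag_of_words.toarray())     # word counts per document
```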

8. Wrapping Up

8.1. Approaching a Machine Learning Problem
8.1.1. Humans in the Loop
8.2. From Prototype to Production
8.3. Testing Production Systems
8.4. Building Your Own Estimator
8.5. Where to Go from Here
8.5.1. Theory
8.5.2. Other Machine Learning Frameworks and Packages
8.5.3. Ranking, Recommender Systems, and Other Kinds of Learning
8.5.4. Probabilistic Modeling, Inference, and Probabilistic Programming
8.5.5. Neural Networks
8.5.6. Scaling to Larger Datasets
8.5.7. Honing Your Skills
8.6. Conclusion

English-language book - Hands-On Data Analysis with NumPy and pandas

Hands-On Data Analysis with NumPy and pandas

Impressions

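The print edition is about 168 pages. Above all, it is easy to understand, and the content does not go very deep.
If you jumped straight into Machine Learning algorithms and crashed and burned (!), this book is a good way to go back and learn the basics of NumPy and pandas.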

Table of Contents

1. SETTING UP A PYTHON DATA ANALYSIS ENVIRONMENT

2. DIVING INTO NUMPY

  • NumPy arrays
  • Special numeric values
  • Creating NumPy arrays
  • Creating ndarray
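The topics of chapter 2 can be illustrated with a few lines of NumPy. This is an independent sketch, not code from the book.

```python
import numpy as np

# Creating arrays from Python lists and with helper functions
a = np.array([[1, 2, 3], [4, 5, 6]])
zeros = np.zeros((2, 3))
evens = np.arange(0, 10, 2)        # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values, both endpoints included

print(a.shape, a.ndim, a.dtype)    # shape, number of dimensions, element type
print(evens)
print(grid)

# Special numeric values: nan (not a number) and inf (infinity)
special = np.array([1.0, np.nan, np.inf])
print(np.isnan(special), np.isinf(special))
print(np.nan == np.nan)            # nan never compares equal, not even to itself
```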

3. OPERATIONS ON NUMPY ARRAYS

  • Operations on NumPy Arrays
  • Selecting elements explicitly
  • Slicing arrays with colons
  • Advanced indexing
  • Expanding arrays
  • Arithmetic and linear algebra with arrays
  • Arithmetic with two equal-shaped arrays
  • Broadcasting
  • Linear algebra
  • Employing array methods and functions
  • Array methods
  • Vectorization with ufuncs
  • Custom ufuncs
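This sketch covers indexing, broadcasting, linear algebra and a custom ufunc. For the custom ufunc it uses `np.frompyfunc`, which is one way to turn a Python function into a ufunc. The book may use a different approach, such as `np.vectorize`.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

# Selecting elements, slicing, and advanced indexing
print(a[1, 2])        # single element (row 1, column 2)
print(a[:, 1:3])      # all rows, columns 1 and 2
print(a[[0, 2]])      # fancy indexing: rows 0 and 2
print(a[a > 6])       # boolean mask indexing

# Element-wise arithmetic and broadcasting
print(a + np.ones((3, 4)))                 # equal shapes: element by element
print(a * np.array([1, 10, 100, 1000]))    # shape (4,) is broadcast across each row

# Linear algebra: (3, 4) @ (4, 3) -> (3, 3)
print(a @ a.T)

# Custom ufunc from a Python function (1 input, 1 output); returns an object array
clip_to_five = np.frompyfunc(lambda x: min(x, 5), 1, 1)
print(clip_to_five(a))
```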

4. PANDAS ARE FUN! WHAT IS PANDAS?

  • What does pandas do?
  • Exploring series and DataFrame objects
  • Creating series
  • Creating DataFrames
  • Adding data
  • Saving DataFrames
  • Subsetting your data
  • Subsetting a series
  • Indexing methods
  • Slicing a DataFrame
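The basics of chapter 4 (Series, DataFrames, adding data, saving, and subsetting) can be sketched in a few lines. The data is hypothetical. Calling `to_csv()` with no path returns the CSV text instead of writing a file.

```python
import pandas as pd

# A Series is a labeled 1-D array
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A DataFrame is a table of columns sharing one index (toy data)
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Carol'],
                   'age': [34, 28, 45]},
                  index=['r1', 'r2', 'r3'])

df['senior'] = df['age'] > 40   # adding a column
print(s['b'])                   # subsetting a Series by label
print(df.loc['r2', 'age'])      # label-based indexing
print(df.iloc[0, 0])            # position-based indexing
print(df.loc['r1':'r2'])        # label slices include both endpoints
print(df[df['age'] > 30])       # boolean subsetting

csv_text = df.to_csv()          # saving: pass a file path to write to disk instead
print(csv_text)
```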

5. ARITHMETIC, FUNCTION APPLICATION, AND MAPPING WITH PANDAS

  • Arithmetic, Function Application, and Mapping with pandas
  • Arithmetic
  • Arithmetic with DataFrames
  • Vectorization with DataFrames
  • DataFrame function application
  • Handling missing data in a pandas DataFrame
  • Deleting missing information
  • Filling missing information
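This minimal sketch covers the arithmetic and missing-data topics of chapter 5 on a small hypothetical DataFrame. Filling with the column mean is only one possible strategy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0],
                   'y': [4.0, 5.0, np.nan]})

# Arithmetic is element-wise and aligned on labels; NaN propagates
print(df + 1)
print(df['x'] * df['y'])

# Function application: apply a function to each column (axis=0 is the default)
print(df.apply(lambda col: col.max() - col.min()))

# Deleting missing information: drop any row that contains NaN
print(df.dropna())

# Filling missing information: replace NaN with each column's mean
print(df.fillna(df.mean()))
```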

6. MANAGING, INDEXING, AND PLOTTING

  • Managing, Indexing, and Plotting
  • Index sorting
  • Sorting by values
  • Hierarchical indexing
  • Slicing a series with a hierarchical index
  • Plotting with pandas
  • Plotting methods
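Sorting and hierarchical indexing from chapter 6 can be sketched as follows. The city and sales figures are hypothetical. Plotting (`Series.plot()` / `DataFrame.plot()`) draws through matplotlib and is left out of this sketch.

```python
import pandas as pd

# A Series with a two-level (hierarchical) index: (city, year); toy numbers
idx = pd.MultiIndex.from_tuples(
    [('Tokyo', 2019), ('Osaka', 2019), ('Tokyo', 2018), ('Osaka', 2018)],
    names=['city', 'year'])
sales = pd.Series([120, 80, 100, 70], index=idx)

print(sales.sort_index())                   # index sorting: by city, then year
print(sales.sort_values(ascending=False))   # sorting by values, largest first
print(sales.sort_index().loc['Tokyo'])      # slicing on the outer index level
print(sales.xs(2019, level='year'))         # cross-section on the inner level
```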