IT Guy

IT, Project Management, IoT, programming, ITIL, and more

Foreign-language book - Machine Learning with Python Cookbook

Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning

Review

Contains about 200 easy-to-follow code samples. Very good for anyone who wants to study the subject seriously!

Table of Contents

1. Vectors, Matrices and Arrays [numpy]
  • Basics : np.array(), scipy.sparse.csr_matrix(matrix), matrix.shape, matrix.size, matrix.ndim, np.vectorize(lambda function), np.max(matrix), np.min(matrix), np.mean(matrix), np.var(matrix), np.std(matrix)
  • Advanced functions and methods : matrix.reshape(m, n), matrix.T, matrix.flatten(), np.linalg.matrix_rank(matrix), np.linalg.det(matrix), matrix.diagonal(), matrix.trace(), np.linalg.eig(matrix), np.dot(vector_a, vector_b), np.add(matrix_a, matrix_b), np.subtract(matrix_a, matrix_b), np.dot(matrix_a, matrix_b), np.linalg.inv(matrix)
  • Random : np.random.seed(n), np.random.random(n), np.random.logistic(...), np.random.uniform(...)
2. Loading Data [scikit-learn, pandas]
  • "toy" datasets : load_boston, load_iris, load_digits, data, target
  • make_regression, make_classification, make_blobs
  • pandas.read_csv(url), pandas.read_json(url), pandas.read_sql_query(...)
3. Data Wrangling [pandas]
  • Basics : pd.read_csv(url), pd.DataFrame(), dataframe['Col1'] = [val1, val2], dataframe.shape, dataframe.describe()
  • Search : dataframe.iloc[1:4], dataframe.loc["Allen"], dataframe[(dataframe['Sex'] == 'female')]
  • Replace / Delete / Merge... : dataframe.replace(1, "One"), dataframe.drop_duplicates(), pd.concat(...), pd.merge(...)
4. Handling Numerical Data [pandas, scikit-learn]
5. Handling Categorical Data [pandas, scikit-learn]
6. Handling Text [NLTK, scikit-learn]
  • Python standard string methods : string.strip() for string in text_data, string.replace(".", "") for string in strip_whitespace, string.upper()
  • Regular Expression : re.sub
  • scraping HTML : BeautifulSoup
  • NLTK(Natural Language Toolkit)
7. Handling Dates and Times [pandas]
8. Handling Images [OpenCV, matplotlib]
  • OpenCV : cv2.imread("images/plane.jpg"), cv2.imwrite("images/plane_new.jpg", image), cv2.resize(image, (256, 256)), image_cropped = image[:,:128], cv2.blur(image, (5,5))
9. Dimensionality Reduction Using Feature Extraction [scikit-learn]
10. Dimensionality Reduction Using Feature Selection [scikit-learn]
11. Model Evaluation [scikit-learn]
12. Model Selection [scikit-learn]
13. Linear Regression [scikit-learn]
14. Trees and Forests [scikit-learn]
15. K-Nearest Neighbors [scikit-learn]
16. Logistic Regression [scikit-learn]
17. Support Vector Machines [scikit-learn]
18. Naive Bayes [scikit-learn]
19. Clustering [scikit-learn]
20. Neural Networks [keras]
21. Saving and Loading Trained Models [scikit-learn, keras]
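
To give a feel for the chapter 1 entries above, here is a minimal sketch that strings several of the listed numpy calls together. The matrix values and variable names are my own assumptions for illustration, not samples taken from the book.

```python
import numpy as np

# Example data (assumed for illustration; not from the book)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 10]])
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# Basics: describe the array
shape = matrix.shape          # (3, 3)
size = matrix.size            # 9 elements
ndim = matrix.ndim            # 2 dimensions

# Apply a Python function element-wise with np.vectorize
add_100 = np.vectorize(lambda x: x + 100)
shifted = add_100(matrix)

# Summary statistics over all elements
largest = np.max(matrix)      # 10
smallest = np.min(matrix)     # 1
mean = np.mean(matrix)
variance = np.var(matrix)
std_dev = np.std(matrix)

# Reshaping and linear algebra
reshaped = matrix.reshape(1, 9)       # same data, 1 row x 9 columns
transposed = matrix.T
flattened = matrix.flatten()
rank = np.linalg.matrix_rank(matrix)  # 3 (full rank)
det = np.linalg.det(matrix)           # about -3, up to floating-point error
diagonal = matrix.diagonal()          # array([1, 5, 10])
trace = matrix.trace()                # 16
eigenvalues, eigenvectors = np.linalg.eig(matrix)
inverse = np.linalg.inv(matrix)

# Products and element-wise arithmetic
dot_product = np.dot(vector_a, vector_b)   # 1*4 + 2*5 + 3*6 = 32
summed = np.add(matrix, matrix)
difference = np.subtract(matrix, matrix)
identity_check = np.dot(matrix, inverse)   # approximately the identity matrix

# Reproducible random numbers: fix the seed first
np.random.seed(0)
random_floats = np.random.random(3)        # 3 floats in [0.0, 1.0)
random_uniform = np.random.uniform(1.0, 2.0, 3)
```

Seeding with np.random.seed(n) before drawing values makes the random arrays repeat from run to run, which is handy when comparing results against a book or a colleague's output.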

Setting up an environment for the samples (my own setup)

  • OS : Windows 10
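  • Python : Python 3.6.7 64-bit (I don't use Anaconda because it takes up too much disk space. Standard Python plus pip is all you need!)
    • As of December 2018, tensorflow does not yet support Python 3.7, so I installed 3.6.x.
    • When reinstalling Python, you have to clean out the following locations by hand even after running the uninstaller, or you are in for a lot of trouble.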
      • C:\Users\sejahn\AppData\Local\Programs\Python
      • C:\Users\sejahn\AppData\Local\pip
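      • Registry entries that contain Python (found with Regedit)
    • PIP : the pip bundled with Python 3.6.x. I upgraded it with "python -m pip install --upgrade pip".
  • Library : see the pip freeze output below (running pip install without specifying versions worked fine)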
cycler==0.10.0
kiwisolver==1.0.1
matplotlib==3.0.2
numpy==1.15.4
pandas==0.23.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
scikit-learn==0.20.1
scipy==1.1.0
six==1.11.0
sklearn==0.0
  • Editor : anything would probably do, but I use Sublime Text
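
To compare another machine against the pip freeze list above, a small check like the following may help. This is only a sketch: the helper name installed_versions is hypothetical, and it reads each module's __version__ attribute, which most of these libraries expose but which is not guaranteed for every package.

```python
import importlib
import sys


def installed_versions(module_names):
    """Return {module name: version string, or None if the import fails}."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed in this environment
        else:
            # Fall back to "unknown" when a module has no __version__
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Import names, not pip names: scikit-learn is imported as "sklearn"
modules = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]

print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name, version in installed_versions(modules).items():
    print(name, version if version is not None else "(not installed)")
```

Any module reported as "(not installed)" can be added with pip install using its pip name (for example scikit-learn rather than sklearn).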