IT Guy

IT, AI / Machine Learning, IoT, Project Management, programming, ITIL, and so on

Well-known sample datasets for machine learning

Iris dataset

Dataset overview

  • For supervised learning, classification (three classes)
  • target (classes) : setosa(0), versicolor(1), virginica(2)
  • feature : sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)

Technical Info

  • import : from sklearn.datasets import load_iris
  • iris_dataset.keys() : dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
  • iris_dataset['data'].shape : (150, 4)
  • iris_dataset['target'].shape : (150,)
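
The items above can be checked with a short script. This is a minimal sketch, assuming scikit-learn is installed; the variable name iris_dataset follows the usage in this post. The exact set of keys depends on the scikit-learn version (newer releases may return extra keys such as 'frame'), so the script prints whatever is present rather than assuming a fixed list.

```python
from sklearn.datasets import load_iris

# Load the Iris dataset as a dictionary-like Bunch object
iris_dataset = load_iris()

# The available keys vary between scikit-learn versions
print(sorted(iris_dataset.keys()))

# Class names and feature names, as listed in the overview
print(iris_dataset['target_names'])
print(iris_dataset['feature_names'])

# 150 samples with 4 features each, and one label per sample
print(iris_dataset['data'].shape)
print(iris_dataset['target'].shape)
```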

Books covering this dataset

  • [Introduction to Machine Learning with Python]

Usage example

k-nearest neighbors classification algorithm
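from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
# Split into training and test sets (by default 75% / 25%)
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

# Classify each sample by the label of its single nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))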
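
Once the classifier is fitted, it can also label a flower that is not in the dataset. The following is a rough sketch of that step; the measurements in X_new are made-up values chosen for illustration, not data from the post, and the script repeats the training so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Hypothetical new sample: sepal length, sepal width, petal length, petal width (cm).
# scikit-learn expects a 2D array of shape (n_samples, n_features).
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])

# predict returns one class index per row of X_new
prediction = knn.predict(X_new)
predicted_name = iris_dataset['target_names'][prediction[0]]

print("Predicted class index:", prediction[0])
print("Predicted class name:", predicted_name)
```

The short petal in this made-up sample is typical of setosa, so that is the class the nearest neighbor is expected to come from.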

Wisconsin Breast Cancer dataset

Dataset overview

  • Breast cancer tumor data for supervised binary classification (malignant or benign)

Technical Info

  • cancer.keys(): dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
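
As with Iris, the dataset can be loaded and inspected in a few lines. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable name cancer follows the usage above, and counts is a helper name introduced only for this example. The keys returned may differ slightly between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin Breast Cancer dataset as a dictionary-like Bunch object
cancer = load_breast_cancer()

# The available keys vary between scikit-learn versions
print(sorted(cancer.keys()))

# Number of samples and number of features
print(cancer['data'].shape)

# Count how many samples belong to each class
counts = {
    str(name): int(count)
    for name, count in zip(cancer['target_names'], np.bincount(cancer['target']))
}
print(counts)
```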

Boston Housing dataset

Dataset overview

Technical Info