2017-11-20

【C#】 Changing JPEG quality

C# OpenCvSharp

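OpenCvSharp's ImWrite saves an image, but I looked into how to change the JPEG quality level (default 95).
Passing an ImageEncodingParam with ImwriteFlags.JpegQuality to ImEncode produces JPEG data at the requested quality. Decoding that buffer and then calling ImWrite without parameters would re-encode the image at the default quality of 95, so the code below writes the encoded bytes to the file directly; passing the same parameter to ImWrite also works.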

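// Requires: using System.IO; using OpenCvSharp;
// Load the image
Mat image = Cv2.ImRead("lenna.png");
// Encode as JPEG at quality 50 (ImEncode allocates the output array itself)
var param = new ImageEncodingParam(ImwriteFlags.JpegQuality, 50);
byte[] buffer;
Cv2.ImEncode(".jpg", image, out buffer, param);
// Save the encoded bytes as-is; decoding and calling ImWrite without
// parameters would re-encode the image at the default quality of 95
File.WriteAllBytes("compressed_lenna.jpg", buffer);
// One-step alternative: Cv2.ImWrite("compressed_lenna.jpg", image, param);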

2017-11-17

【Anomaly Detection】 Outlier detection with GMM (Gaussian Mixture Model)

Python, Machine Learning, Anomaly Detection

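Let me try outlier detection with a GMM. Unlike LOF and iForest, I couldn't find a ready-made implementation, so I built one on top of scikit-learn's GaussianMixture class. First, I create GMMAnomalyDetector, a class that performs outlier detection with a GMM, in gmmanomalydetector.py.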

import numpy as np
from sklearn.mixture import GaussianMixture

class GMMAnomalyDetector:
    def __init__(self, max_n_component, covariance_type='full'):
        self._max_n_component = max_n_component
        self._covariance_type = covariance_type
        self._best_gmm = None
        self.best_n_component = -1

    def fit(self, X):
        # Search 2..max_n_component for the number of components with the lowest BIC
        lowest_bic = np.inf
        for n_component in range(2, self._max_n_component + 1):
            gmm = GaussianMixture(n_components=n_component, covariance_type=self._covariance_type)
            gmm.fit(X)
            bic = gmm.bic(X)
            if bic < lowest_bic:
                lowest_bic = bic
                self._best_gmm = gmm
                self.best_n_component = n_component

    def predict(self, X, contamination=0.1):
        # Treat the lowest-scoring `contamination` fraction as anomalies
        scores = np.exp(self._best_gmm.score_samples(X))
        ordered_scores = np.argsort(scores)
        anomaly_indices = ordered_scores[:int(len(scores)*contamination + 0.5)]
        # Following scikit-learn, return 1 for normal and -1 for anomalous
        # (plain int: the np.int alias was removed in NumPy 1.24)
        prediction = np.ones((len(scores)), dtype=int)
        prediction[anomaly_indices] = -1
        return prediction
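fit keeps the model with the lowest BIC (Bayesian Information Criterion), $\mathrm{BIC} = -2\ln L + p\ln n$, where $L$ is the likelihood, $p$ the number of free parameters and $n$ the number of samples. For covariance_type='full' with $k$ components in $d$ dimensions, $p = kd + k\,d(d+1)/2 + (k-1)$ (means, covariance matrices, mixture weights). A minimal sketch that recomputes this by hand and compares it with GaussianMixture.bic; the data and random_state are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def manual_bic(gmm, X):
    """BIC of a fitted full-covariance GaussianMixture, computed by hand."""
    n, d = X.shape
    k = gmm.n_components
    # score() returns the mean log-likelihood per sample
    total_log_likelihood = gmm.score(X) * n
    # means + full covariance matrices + free mixture weights
    n_params = k * d + k * d * (d + 1) / 2 + (k - 1)
    return -2 * total_log_likelihood + n_params * np.log(n)

rng = np.random.default_rng(0)
X = np.r_[0.3 * rng.standard_normal((100, 2)) + 2,
          0.3 * rng.standard_normal((100, 2)) - 2]

for k in range(2, 5):
    gmm = GaussianMixture(n_components=k, covariance_type='full', random_state=0).fit(X)
    print(k, round(manual_bic(gmm, X), 2), round(gmm.bic(X), 2))
```

The $p\ln n$ term is what stops the search from always picking max_n_component: each extra component must raise the log-likelihood enough to pay for its parameters.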

Next, here is main.py, an example that uses the GMMAnomalyDetector class to detect outliers.

import numpy as np
import matplotlib.pyplot as plt
from gmmanomalydetector import GMMAnomalyDetector

np.random.seed(42)

# Generate train data
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X + 2, X - 2, X_outliers]

# fit the model
clf = GMMAnomalyDetector(max_n_component=10)
clf.fit(X)
# 20 of the 220 points (about 9%) are anomalous
contamination = 0.09
y_pred = clf.predict(X, contamination)
# Normal points are output as 1 and anomalies as -1
ANOMALY_DATA = -1
predicted_outlier_index = np.where(y_pred == ANOMALY_DATA)
predicted_outlier = X[predicted_outlier_index]

plt.title("Gaussian Mixture Model (k=" + str(clf.best_n_component) + ')')

a = plt.scatter(X[:200, 0], X[:200, 1], c='yellow',
                edgecolor='k', s=30, marker='o')
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=30, marker='o')
c = plt.scatter(predicted_outlier[:, 0], predicted_outlier[:, 1], c='blue',
                edgecolor='k', s=10, marker='x')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a, b, c],
           ["normal observations",
            "abnormal observations",
            "observations predicted as abnormal"],
           loc="upper left", prop={'size': 12})
plt.show()

Since best_n_component is 5, a mixture of 5 components appears to have been judged the best fit.
[Figure: Gaussian Mixture Model (k=5) result plot]
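One consequence of how predict is written: it ranks the samples it is given and flags the lowest `contamination` fraction of them, so the result depends on the batch. Passing a single new point flags nothing, since `int(1 * 0.09 + 0.5) == 0`. To score new data one point at a time, one option is to fix a log-likelihood threshold at fit time. The class below is a hypothetical variant, not the author's code, and uses a fixed number of components for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class ThresholdGMMDetector:
    """Hypothetical variant: the anomaly threshold is fixed when fitting."""

    def __init__(self, n_components, contamination=0.1, random_state=0):
        self.gmm = GaussianMixture(n_components=n_components, random_state=random_state)
        self.contamination = contamination
        self.threshold_ = None

    def fit(self, X):
        self.gmm.fit(X)
        # Log-likelihood below which the lowest `contamination` fraction of training data falls
        self.threshold_ = np.quantile(self.gmm.score_samples(X), self.contamination)
        return self

    def predict(self, X):
        # 1 = normal, -1 = anomalous, as in scikit-learn
        return np.where(self.gmm.score_samples(X) < self.threshold_, -1, 1)

rng = np.random.default_rng(42)
X = np.r_[0.3 * rng.standard_normal((100, 2)) + 2,
          0.3 * rng.standard_normal((100, 2)) - 2,
          rng.uniform(-4, 4, size=(20, 2))]

detector = ThresholdGMMDetector(n_components=2, contamination=0.09).fit(X)
print(detector.predict(np.array([[2.0, 2.0]])))   # a point at a cluster center
print(detector.predict(np.array([[4.5, -4.5]])))  # a point far from both clusters
```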

2017-11-16

【Anomaly Detection】 Outlier detection with One class SVM

Python, Machine Learning, Anomaly Detection

Notes from trying One class SVM, one of the outlier detection methods.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm

np.random.seed(42)

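# Generate train data: two normal clusters around (2, 2) and (-2, -2)
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
ANOMALY_DATA_COUNT = 20
X_outliers = np.random.uniform(low=-4, high=4, size=(ANOMALY_DATA_COUNT, 2))
X = np.r_[X + 2, X - 2, X_outliers]

# fit the model on all 220 points (fitting on the unshifted X, which is
# centered at the origin, would make both normal clusters look abnormal)
clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma='auto')
clf.fit(X)

y_pred = clf.predict(X)
# Normal points are labeled 1 and anomalies -1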
ANOMALY_DATA = -1
predicted_outlier_index = np.where(y_pred == ANOMALY_DATA)
predicted_outlier = X[predicted_outlier_index]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("One Class SVM")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X[:200, 0], X[:200, 1], c='yellow',
                edgecolor='k', s=30, marker='o')
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=30, marker='o')
c = plt.scatter(predicted_outlier[:, 0], predicted_outlier[:, 1], c='blue',
                edgecolor='k', s=10, marker='x')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a, b, c],
           ["normal observations",
            "abnormal observations",
            "observations predicted as abnormal"],
           loc="upper left", prop={'size': 12})
plt.show()

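One of the hyperparameters, nu, takes values in $0 < \nu \leq 1$. It is an upper bound on the fraction of training points treated as outliers (and a lower bound on the fraction of support vectors), so it lets you tell the model roughly what share of the data is anomalous.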

(Reference: an article on qiita.com)

Here 20 of the 220 points (about 9%) are anomalies, so I set nu=0.1.
[Figure: One Class SVM result, nu=0.1]
Setting nu=0.5 as a test does indeed increase the number of points judged abnormal.
[Figure: One Class SVM result, nu=0.5]
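The role of nu can be checked numerically: on the training data, the fraction of points predicted as -1 should not exceed nu, and the fraction of support vectors should not fall below it, up to the solver's numerical tolerance. A minimal sketch on the same kind of data (220 points, 20 uniform outliers); the 0.05 slack in the checks is an arbitrary allowance for that tolerance.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(42)
X = np.r_[0.3 * rng.standard_normal((100, 2)) + 2,
          0.3 * rng.standard_normal((100, 2)) - 2,
          rng.uniform(-4, 4, size=(20, 2))]

results = {}
for nu in (0.1, 0.5):
    clf = svm.OneClassSVM(nu=nu, kernel='rbf', gamma='auto').fit(X)
    # Fraction of training points labeled -1 (outside the learned region)
    outlier_fraction = np.mean(clf.predict(X) == -1)
    # Fraction of training points that became support vectors
    support_fraction = len(clf.support_) / len(X)
    results[nu] = (outlier_fraction, support_fraction)
    print(f"nu={nu}: outliers {outlier_fraction:.3f}, support vectors {support_fraction:.3f}")
```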

2017-11-14

【Anomaly Detection】 Outlier detection with Fast ABOD (Angle Based Outlier Detection)

Python, Machine Learning, Anomaly Detection

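I read the paper on ABOD, one of the outlier detection methods, and wanted to try it, but I couldn't find a corresponding method in scikit-learn, so I implemented Fast ABOD. ABOD costs $O(n^{3})$ while Fast ABOD costs $O(n^{2}+nk^{2})$, so it is cheaper to compute (though how well it approximates ABOD depends on k).
First, implement Fast ABOD in fastabod.py as follows.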

import numpy as np
import itertools
from sklearn.neighbors import NearestNeighbors

class FastABOD:
    def __init__(self, n_neighbors):
        self.n_neighbors = n_neighbors

    def fit_predict(self, X, contamination=0.1):
        # Find the k nearest neighbors of each point (normally the first is the point itself)
        k_nearest = NearestNeighbors(n_neighbors=self.n_neighbors).fit(X)
        distances, indices = k_nearest.kneighbors(X)
        # All pairs of neighbors 1..k-1 (skipping the point itself) for the angle computation
        numbers = [i + 1 for i in range(distances.shape[1] - 1)]
        combs = list(itertools.combinations(numbers, 2))
        # Compute the ABOF of every point
        abofs = []
        for i in range(len(X)):
            x = X[indices[i]]
            abof = self._compute_abof(x, combs)
            abofs.append(abof)
        # Treat the lowest `contamination` fraction of ABOF scores as anomalies
        ordered_abofs = np.argsort(abofs)
        anomaly_indices = ordered_abofs[:int(len(abofs)*contamination + 0.5)]
        # Following scikit-learn, return 1 for normal and -1 for anomalous
        prediction = np.ones((len(abofs)), dtype=int)
        prediction[anomaly_indices] = -1
        return prediction

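    def _compute_abof(self, x, combs):
        # x[0] is the point A and x[1:] its neighbors. Returns the variance of
        # <AB, AC> / (|AB|^2 |AC|^2) over neighbor pairs, weighted by 1 / (|AB| |AC|)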
        numerator1 = 0
        numerator2 = 0
        denominator1 = 0
        for comb in combs:
            AB = x[comb[0]] - x[0]
            AC = x[comb[1]] - x[0]
            AB_norm = np.linalg.norm(AB)
            AC_norm = np.linalg.norm(AC)
            a = 1 / (AB_norm * AC_norm)
            b = np.dot(AB, AC) / ((AB_norm ** 2) * (AC_norm ** 2))
            numerator1 += a * (b ** 2)
            denominator1 += a
            numerator2 += a * b
        denominator2 = denominator1
        return numerator1 / denominator1 - (numerator2 / denominator2) ** 2
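For reference, _compute_abof evaluates the angle-based outlier factor of a point $A$ (x[0]) over pairs of its neighbors $B, C$ as a weighted variance, where a in the code is $w_{BC}$ and b is $f_{BC}$:

$$
w_{BC} = \frac{1}{\|\overrightarrow{AB}\|\,\|\overrightarrow{AC}\|}, \qquad
f_{BC} = \frac{\langle \overrightarrow{AB}, \overrightarrow{AC} \rangle}{\|\overrightarrow{AB}\|^{2}\,\|\overrightarrow{AC}\|^{2}}
$$

$$
\mathrm{ABOF}(A) = \frac{\sum_{(B,C)} w_{BC}\, f_{BC}^{2}}{\sum_{(B,C)} w_{BC}} - \left( \frac{\sum_{(B,C)} w_{BC}\, f_{BC}}{\sum_{(B,C)} w_{BC}} \right)^{2}
$$

A point far from the data sees all its neighbors within a narrow angle and at long distances, so $f_{BC}$ varies little and the ABOF is small; that is why the lowest scores are flagged as anomalies.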

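Next, here is main.py, an example that uses the FastABOD class to detect outliers. (ABOD's selling point is that it stays accurate on high-dimensional data, so applying it to two-dimensional data like this is probably not the best showcase; please bear with me.)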

import numpy as np
import matplotlib.pyplot as plt
from fastabod import FastABOD

np.random.seed(42)

# Generate train data
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X + 2, X - 2, X_outliers]

# fit the model
clf = FastABOD(n_neighbors=10)
# 20 of the 220 points (about 9%) are anomalous
contamination = 0.09
y_pred = clf.fit_predict(X, contamination)
# Normal points are output as 1 and anomalies as -1
ANOMALY_DATA = -1
predicted_outlier_index = np.where(y_pred == ANOMALY_DATA)
predicted_outlier = X[predicted_outlier_index]

plt.title("Fast Angle Based Outlier Detection (FastABOD)")

a = plt.scatter(X[:200, 0], X[:200, 1], c='yellow',
                edgecolor='k', s=30, marker='o')
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=30, marker='o')
c = plt.scatter(predicted_outlier[:, 0], predicted_outlier[:, 1], c='blue',
                edgecolor='k', s=10, marker='x')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a, b, c],
           ["normal observations",
            "abnormal observations",
            "observations predicted as abnormal"],
           loc="upper left", prop={'size': 12})
plt.show()

[Figure: Fast Angle Based Outlier Detection (FastABOD) result plot]
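FastABOD only scores the points passed to fit_predict, so unlike the other posts there is no decision function for drawing contour lines. A minimal sketch that scores arbitrary query points against the training data, assuming the k nearest training points of each query point serve as its neighbors; abof_scores is a hypothetical helper, not part of the original class. It also assumes no query point coincides exactly with a training point (that would divide by zero).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def abof_scores(X_train, X_query, n_neighbors=10):
    """Approximate ABOF of each query point using its k nearest training points."""
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_train)
    _, indices = nn.kneighbors(X_query)
    iu, ju = np.triu_indices(n_neighbors, k=1)  # all neighbor pairs (B, C) with B < C
    scores = np.empty(len(X_query))
    for q, idx in enumerate(indices):
        diffs = X_train[idx] - X_query[q]             # vectors AB for every neighbor B
        norms = np.linalg.norm(diffs, axis=1)
        dots = diffs @ diffs.T                        # <AB, AC> for every pair
        w = 1.0 / (norms[iu] * norms[ju])             # weights 1 / (|AB| |AC|)
        f = dots[iu, ju] / (norms[iu] ** 2 * norms[ju] ** 2)
        mean = np.sum(w * f) / np.sum(w)
        scores[q] = np.sum(w * f ** 2) / np.sum(w) - mean ** 2  # weighted variance
    return scores

rng = np.random.default_rng(42)
X = np.r_[0.3 * rng.standard_normal((100, 2)) + 2,
          0.3 * rng.standard_normal((100, 2)) - 2,
          rng.uniform(-4, 4, size=(20, 2))]

# Score a 50x50 grid, as in the other posts, e.g. for plt.contourf(xx, yy, Z)
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = abof_scores(X, np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

ABOF values span several orders of magnitude, so contouring np.log(Z) may be easier to read.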

2017-11-13

【Eye Tracking】 Data worth saving for analysis

Eye Tracking

While reading this, I learned that Tobii Studio 2.0 outputs the following parameters. It also appears to record events such as KeyPressed.

Timestamp	Number	FixationIndex	GazePointX	GazePointY	Event	StimuliName	AoiNames
267	16	4	674	374		xxx.png	Content
284	17	4	678	373		xxx.png	Content
301	18	4	656	369		xxx.png	Content
317	19	4	661	369		xxx.png	Content
334	20	4	655	365		xxx.png	Content
351	21	4	654	370		xxx.png	Content
367	22	0	0	0		xxx.png	Content
384	23	5	677	380	KeyPressed	xxx.png	Content
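A minimal sketch of reading such an export with Python's csv module, assuming it is tab-separated with the header shown above and that Timestamp is in milliseconds (the unit is an assumption; it is not stated in the export). Using the sample rows above, it collects the KeyPressed events and the average interval between samples.

```python
import csv
import io

# The sample export above, rebuilt as tab-separated text (Event is empty except on the last row)
header = ["Timestamp", "Number", "FixationIndex", "GazePointX", "GazePointY",
          "Event", "StimuliName", "AoiNames"]
rows = [
    ["267", "16", "4", "674", "374", "", "xxx.png", "Content"],
    ["284", "17", "4", "678", "373", "", "xxx.png", "Content"],
    ["301", "18", "4", "656", "369", "", "xxx.png", "Content"],
    ["317", "19", "4", "661", "369", "", "xxx.png", "Content"],
    ["334", "20", "4", "655", "365", "", "xxx.png", "Content"],
    ["351", "21", "4", "654", "370", "", "xxx.png", "Content"],
    ["367", "22", "0", "0", "0", "", "xxx.png", "Content"],
    ["384", "23", "5", "677", "380", "KeyPressed", "xxx.png", "Content"],
]
text = "\n".join("\t".join(row) for row in [header] + rows)

# For a real export, pass open(path, newline="") instead of io.StringIO(text)
records = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
timestamps = [int(r["Timestamp"]) for r in records]
key_presses = [int(r["Timestamp"]) for r in records if r["Event"] == "KeyPressed"]
# Average time between consecutive samples
mean_interval = (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)

print(key_presses)              # → [384]
print(round(mean_interval, 2))  # → 16.71
```

If Timestamp is indeed in milliseconds, an interval of about 16.7 ms corresponds to a sampling rate of roughly 60 Hz.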

2017-11-12

【Anomaly Detection】 Outlier detection with Isolation Forest

Python, Machine Learning, Anomaly Detection

After reading the slides below on Isolation Forest, one of the outlier detection methods, I wanted to try it, and found an example in scikit-learn, so here are my notes.

"Outlier Detection Algorithm: Isolation Forest" (slides by 翔吾大澤, www.slideshare.net)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

np.random.seed(42)

# Generate train data
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
ANOMALY_DATA_COUNT = 20
X_outliers = np.random.uniform(low=-4, high=4, size=(ANOMALY_DATA_COUNT, 2))
X = np.r_[X + 2, X - 2, X_outliers]

# fit the model
clf = IsolationForest(n_estimators=100, max_samples=100)
clf.fit(X)
y_pred = clf.predict(X)
# Normal points are labeled 1 and anomalies -1
ANOMALY_DATA = -1
predicted_outlier_index = np.where(y_pred == ANOMALY_DATA)
predicted_outlier = X[predicted_outlier_index]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.title("IsolationForest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X[:200, 0], X[:200, 1], c='yellow',
                edgecolor='k', s=30, marker='o')
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=30, marker='o')
c = plt.scatter(predicted_outlier[:, 0], predicted_outlier[:, 1], c='blue',
                edgecolor='k', s=10, marker='x')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a, b, c],
           ["normal observations",
            "abnormal observations",
            "observations predicted as abnormal"],
           loc="upper left", prop={'size': 12})
plt.show()

[Figure: IsolationForest result plot]
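The anomaly score in the original Isolation Forest paper is $s(x, n) = 2^{-E[h(x)]/c(n)}$, where $E[h(x)]$ is the average path length of $x$ over the trees and $c(n) = 2H(n-1) - 2(n-1)/n$ normalizes by the average path length for a sample of size $n$, with $H(i) \approx \ln i + 0.5772156649$. Scores near 1 indicate anomalies, and a path of average length gives exactly 0.5 (scikit-learn's IsolationForest.score_samples returns the negative of this score). A minimal sketch of the formula; the function names are illustrative, and n=100 mirrors max_samples=100 above.

```python
import math

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """c(n): average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value; the logarithmic approximation is poor for tiny n
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n-1) approximated as ln(n-1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); shorter paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 100  # max_samples=100 in the code above
print(round(average_path_length(n), 4))  # → 8.3647
for h in (2.0, average_path_length(n), 15.0):
    print(round(h, 2), round(anomaly_score(h, n), 3))
```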

2017-11-08

【Anomaly Detection】 Outlier detection with LOF (Local Outlier Factor)

Python, Machine Learning, Anomaly Detection

After reading the slides below on LOF, one of the outlier detection methods, I wanted to try it, and found an example in scikit-learn, so here are my notes.

"Outlier Detection Algorithm: Local Outlier Factor" (slides by 翔吾大澤, www.slideshare.net)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

np.random.seed(42)

# Generate train data
X = 0.3 * np.random.randn(100, 2)
# Generate some abnormal novel observations
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X + 2, X - 2, X_outliers]

# fit the model
clf = LocalOutlierFactor(n_neighbors=20)
y_pred = clf.fit_predict(X)
# Normal points are output as 1 and anomalies as -1
ANOMALY_DATA = -1
predicted_outlier_index = np.where(y_pred == ANOMALY_DATA)
predicted_outlier = X[predicted_outlier_index]

# plot the level sets of the decision function
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
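# Scoring unseen grid points needs a separate model with novelty=True
# (scikit-learn >= 0.20; older versions exposed the private _decision_function)
grid_clf = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X)
Z = grid_clf.decision_function(np.c_[xx.ravel(), yy.ravel()])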
Z = Z.reshape(xx.shape)

plt.title("Local Outlier Factor (LOF)")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)

a = plt.scatter(X[:200, 0], X[:200, 1], c='yellow',
                edgecolor='k', s=30, marker='o')
b = plt.scatter(X[200:, 0], X[200:, 1], c='red',
                edgecolor='k', s=30, marker='o')
c = plt.scatter(predicted_outlier[:, 0], predicted_outlier[:, 1], c='blue',
                edgecolor='k', s=10, marker='x')
plt.axis('tight')
plt.xlim((-5, 5))
plt.ylim((-5, 5))
plt.legend([a, b, c],
           ["normal observations",
            "abnormal observations",
            "observations predicted as abnormal"],
           loc="upper left", prop={'size': 12})
plt.show()

[Figure: Local Outlier Factor (LOF) result plot]
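To see what fit_predict computes, LOF can be written out with NearestNeighbors: the k-distance of each point, the reachability distance $\max(\text{k-distance}(B), d(A, B))$, the local reachability density (the inverse of the mean reachability distance), and finally the mean ratio of the neighbors' densities to the point's own density. A minimal sketch on the same kind of data, compared against scikit-learn's negative_outlier_factor_, which stores the negated LOF; local_outlier_factor is an illustrative helper name.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def local_outlier_factor(X, k=20):
    """LOF of every row of X, written out step by step."""
    # kneighbors() without arguments excludes each point from its own neighbor list
    distances, indices = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()
    k_distance = distances[:, -1]  # distance to the k-th nearest neighbor
    # reach-dist(A, B) = max(k-distance(B), d(A, B))
    reach_dist = np.maximum(distances, k_distance[indices])
    # Local reachability density; the tiny constant guards against duplicate points
    lrd = 1.0 / (reach_dist.mean(axis=1) + 1e-10)
    # LOF(A) = mean over neighbors B of lrd(B) / lrd(A); close to 1 inside a cluster
    return (lrd[indices] / lrd[:, None]).mean(axis=1)

rng = np.random.default_rng(42)
X = np.r_[0.3 * rng.standard_normal((100, 2)) + 2,
          0.3 * rng.standard_normal((100, 2)) - 2,
          rng.uniform(-4, 4, size=(20, 2))]

lof = local_outlier_factor(X, k=20)
clf = LocalOutlierFactor(n_neighbors=20).fit(X)
print(np.allclose(lof, -clf.negative_outlier_factor_))
```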

旅行好きなソフトエンジニアの備忘録 (Notes of a Travel-Loving Software Engineer)

I've started keeping notes on programming and technology.
