2018-04-02

【Python】A note on loss functions for imbalanced two-class segmentation problems

Python Keras Deep Learning

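This paper proposes a loss function for imbalanced two-class segmentation problems, so here is a note on it. In deep-learning segmentation, when the data is extremely imbalanced (for example, almost every pixel in the image is 0 and only a few are 1), training does not go well without some extra care. The paper tries to sidestep this problem by modifying the loss function.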

In the article below, I used the following Dice coefficient for the segmentation loss function.

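from keras import backend as K

def dice_coef(y_true, y_pred):
    # Flatten both masks so the sums run over every pixel
    y_true = K.flatten(y_true)
    y_pred = K.flatten(y_pred)
    intersection = K.sum(y_true * y_pred)
    # The +1 in the denominator avoids division by zero on empty masks
    return 2.0 * intersection / (K.sum(y_true) + K.sum(y_pred) + 1)

def dice_coef_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)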

ni4muraano.hatenablog.com

The paper proposes a function called the Tversky loss function, which looks like the following. It seemed familiar, and it turns out to be a modified version of IoU.

ALPHA = 0.3 # value from 0 to 1.0; increase ALPHA to favor precision
BETA = 1.0 - ALPHA # value from 0 to 1.0; decrease ALPHA to favor recall

def tversky_index(y_true, y_pred):
    y_true = K.flatten(y_true)
    y_pred = K.flatten(y_pred)
    intersection = K.sum(y_true * y_pred)
    false_positive = K.sum((1.0 - y_true) * y_pred)
    false_negative = K.sum(y_true * (1.0 - y_pred))
    return intersection / (intersection + ALPHA*false_positive + BETA*false_negative)

def tversky_loss(y_true, y_pred):
    return 1.0 - tversky_index(y_true, y_pred)
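
To see how the two weights behave, here is a minimal NumPy sketch of the same formula. The function name `tversky_index_np` and the toy masks are made up for illustration and are not from the paper. With both weights at 0.5 the index reduces to the Dice coefficient, and with both at 1.0 it reduces to IoU.

```python
import numpy as np

def tversky_index_np(y_true, y_pred, alpha, beta):
    # NumPy restatement of the formula above, for checking values by hand
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    tp = np.sum(y_true * y_pred)            # true positives
    fp = np.sum((1.0 - y_true) * y_pred)    # false positives
    fn = np.sum(y_true * (1.0 - y_pred))    # false negatives
    return tp / (tp + alpha * fp + beta * fn)

# Hypothetical toy masks: 2 foreground pixels out of 8, one of them missed
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 0])

dice = tversky_index_np(y_true, y_pred, 0.5, 0.5)  # same as 2*TP / (2*TP + FP + FN)
iou = tversky_index_np(y_true, y_pred, 1.0, 1.0)   # same as TP / (TP + FP + FN)
recall_focused = tversky_index_np(y_true, y_pred, 0.3, 0.7)
precision_focused = tversky_index_np(y_true, y_pred, 0.7, 0.3)
print(dice, iou, recall_focused, precision_focused)
```

Because the only error here is a missed pixel (a false negative), the recall-focused setting scores lower than the precision-focused one, so its loss is larger.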

2018-03-18

【Python】How to generate one-hot encodings for an array in numpy? - 101 Numpy Exercises

Python

Q:

Compute the one-hot encoding.
(Keras's np_utils.to_categorical would do the job, but this is a note for when Keras is not in use.)
Input:

arr = np.random.randint(1,4, size=6)
arr
#> array([2, 3, 2, 2, 2, 1])

Output:

#> array([[ 0.,  1.,  0.],
#>        [ 0.,  0.,  1.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 1.,  0.,  0.]])

A:

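# Input:
import numpy as np
arr = np.random.randint(1,4, size=6)
arr
#> array([2, 3, 2, 2, 2, 1])

# Solution:
def one_hot_encodings(arr):
    # Index each value by its position among the sorted unique values,
    # so the values need not be exactly 1..n
    uniqs, inverse = np.unique(arr, return_inverse=True)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    out[np.arange(arr.shape[0]), inverse] = 1
    return out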

one_hot_encodings(arr)
#> array([[ 0.,  1.,  0.],
#>        [ 0.,  0.,  1.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 0.,  1.,  0.],
#>        [ 1.,  0.,  0.]])

# Method 2:
(arr[:, None] == np.unique(arr)).view(np.int8)
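
Both methods above size the output from `np.unique(arr)`, so the number of columns depends on which values happen to appear in the sample. If the number of classes is known in advance, the following sketch may be more predictable. `one_hot_fixed` and its `num_classes` parameter are hypothetical names, and labels are assumed to be integers in 1..num_classes.

```python
import numpy as np

def one_hot_fixed(arr, num_classes):
    # Row k-1 of the identity matrix is the one-hot vector for label k
    # (assumes integer labels in 1..num_classes)
    return np.eye(num_classes)[np.asarray(arr) - 1]

arr = np.array([2, 3, 2, 2, 2, 1])
encoded = one_hot_fixed(arr, 3)
print(encoded)

# A label that is absent from the sample still gets its own column
partial = one_hot_fixed(np.array([3, 3, 2]), 3)
print(partial.shape)
```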

www.machinelearningplus.com

2018-03-12

【Python】How to find the most frequent value in a numpy array? - 101 Numpy Exercises

Python

Q:

Find the most frequent petal length value in iris.

A:

# Input:
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Solution:
# Petal length is column 2 (column 3 is petal width)
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])
#> b'1.5'
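
The same `np.unique` and `argmax` idea can be tried on a small in-memory array without downloading anything. This is only a sketch: `most_frequent` is a hypothetical helper name and the sample values are made up.

```python
import numpy as np

def most_frequent(values):
    # np.unique returns the sorted unique values; counts[i] is how often vals[i] occurs
    vals, counts = np.unique(values, return_counts=True)
    # argmax returns the first maximum, so ties resolve to the smallest value
    return vals[np.argmax(counts)]

sample = np.array([1.4, 1.5, 1.4, 1.3, 1.5, 1.4])
print(most_frequent(sample))

tie = np.array([2, 1, 2, 1])
print(most_frequent(tie))
```

Note the tie behavior: when several values share the top count, only the smallest one is reported.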

www.machinelearningplus.com

2018-03-12

【Python】How to sort a 2D array by a column? - 101 Numpy Exercises

Python

Q:

Sort the iris dataset by the values in the sepallength column.

A:

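import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Sort by column position 0: SepalLength
# (the values are byte strings here; byte order matches numeric order
# because every sepal length in the file has the form d.d)
print(iris[iris[:,0].argsort()][:20])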
#> [[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
#>  [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
#>  [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa']
#>  [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
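
The indexing trick is easier to follow on a small numeric array. The sketch below uses made-up data, and passing `kind='stable'` is my own choice so that rows with equal keys keep their original relative order.

```python
import numpy as np

data = np.array([[5.1, 3.5],
                 [4.4, 2.9],
                 [4.9, 3.0],
                 [4.4, 3.2]])

# argsort gives the row order that sorts column 0; kind='stable' keeps
# the two 4.4 rows in their original relative order
order = data[:, 0].argsort(kind='stable')
sorted_rows = data[order]
print(sorted_rows)
```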

www.machinelearningplus.com

2018-03-12

【Python】How to get the second largest value of an array when grouped by another array? - 101 Numpy Exercises

Python

Q:

What is the second longest petallength among setosa?

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

A:

# Import iris keeping the text column intact
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')

# Solution
# Get the petal length column for the setosa rows
petal_len_setosa = iris[iris[:, 4] == b'Iris-setosa', 2].astype('float')

# Get the second largest distinct value (np.unique returns sorted values)
np.unique(petal_len_setosa)[-2]
#> 1.7
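
As a self-contained illustration of the same pattern (mask by group, then take the second largest distinct value), here is a small sketch. The helper `second_largest_in_group` and the toy arrays are hypothetical.

```python
import numpy as np

values = np.array([1.4, 1.9, 1.7, 4.5, 1.9, 5.0])
groups = np.array(['a', 'a', 'a', 'b', 'a', 'b'])

def second_largest_in_group(values, groups, name):
    # Keep only the values whose group label matches, then deduplicate and sort
    distinct = np.unique(values[groups == name])
    if distinct.size < 2:
        raise ValueError("need at least two distinct values in the group")
    return distinct[-2]

print(second_largest_in_group(values, groups, 'a'))
print(second_largest_in_group(values, groups, 'b'))
```

Deduplicating first matters: group 'a' contains 1.9 twice, and without `np.unique` the second-to-last sorted element would be 1.9 again.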

www.machinelearningplus.com

2018-03-11

【Python】How to convert a numeric to a categorical (text) array? - 101 Numpy Exercises

Python

Q:

Bin the 3rd column of iris_2d as follows:

less than 3 --> 'small'
3-5 --> 'medium'
5 or more --> 'large'

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

A:

# Input
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])

# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]

# View
petal_length_cat[:4]
#> ['small', 'small', 'small', 'small']

www.machinelearningplus.com
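
How `np.digitize` treats values that sit exactly on a bin edge is easy to get wrong, so here is a small sketch with made-up lengths. It uses only the two inner edges and indexes a label array directly instead of going through a dict, which is a variation of my own.

```python
import numpy as np

lengths = np.array([1.4, 3.0, 4.9, 5.0, 6.9])
edges = [3, 5]

# With the default right=False: 0 for x < 3, 1 for 3 <= x < 5, 2 for x >= 5
idx = np.digitize(lengths, edges)
labels = np.array(['small', 'medium', 'large'])
cats = labels[idx]
print(idx)
print(cats)
```

A value equal to an edge falls into the upper bin, which matches the "5 or more --> 'large'" rule in the question.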

2018-03-11

【Python】How to drop rows that contain a missing value from a numpy array? - 101 Numpy Exercises

Python

Q:

Extract only the rows of iris_2d that contain no nan.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

A:

# Input
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Inject nan at random positions so there is something to drop
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Solution
# No direct numpy function for this.
# Build a mask that is True for rows without any nan
no_nan_in_row = ~np.isnan(iris_2d).any(axis=1)
# The rows shown depend on where the random nans landed
iris_2d[no_nan_in_row][:5]
#> array([[ 4.9,  3. ,  1.4,  0.2],
#>        [ 4.7,  3.2,  1.3,  0.2],
#>        [ 4.6,  3.1,  1.5,  0.2],
#>        [ 5. ,  3.6,  1.4,  0.2],
#>        [ 5.4,  3.9,  1.7,  0.4]])
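
Since the output above depends on random positions, a small deterministic sketch may help when checking the mask by hand. The array below is made up, and counting the dropped rows is an extra step that is not part of the original exercise.

```python
import numpy as np

data = np.array([[5.1, 3.5, 1.4],
                 [np.nan, 3.0, 1.4],
                 [4.7, 3.2, np.nan],
                 [4.6, 3.1, 1.5]])

# True for every row that has at least one nan
has_nan = np.isnan(data).any(axis=1)
clean = data[~has_nan]
dropped = int(has_nan.sum())
print(clean)
print(dropped)
```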

旅行好きなソフトエンジニアの備忘録 (A travel-loving software engineer's memorandum)

I have started keeping notes on programming and other technical topics.
