2018-05-19

[Statistics] A test on the population number of defects

C# Statistics

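Suppose that inspecting a product has historically turned up N defects per unit. A countermeasure was introduced, and to verify its effect M units were sampled and L defects were found in total. Can we say that the number of defects has decreased? I was given this question and looked it up in the book 入門統計解析法, where page 226 onward covers exactly this case, so I'm noting the code for the decision step here.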

static void Main(string[] args)
{
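    // Historically, 3 defects were found per product
    int conventionalDefectNumberPerProduct = 3;
    // After the process improvement, 10 products were sampled and 18 defects were found
    int currentDefectNumber = 18;
    int productNumber = 10;
    // Did the number of defects decrease?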
    Console.WriteLine(DoesDefectNumberDecrease(conventionalDefectNumberPerProduct, currentDefectNumber, productNumber));
}

static bool DoesDefectNumberDecrease(int conventionalDefectNumberPerProduct, int currentDefectNumber, int productNumber)
{
    // Critical value at significance level 0.05 (one-sided, lower tail)
    const double criticalValue = -1.645;

    // Estimated defects per product; the +0.5 is a continuity correction
    double lambda = (currentDefectNumber + 0.5) / productNumber;
    double u0 = (lambda - conventionalDefectNumberPerProduct) / Math.Sqrt((double)conventionalDefectNumberPerProduct / productNumber);
    return u0 <= criticalValue;
}
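To make the arithmetic easier to follow, here is a minimal Python sketch of the same decision rule applied to the numbers above. The function name `defect_decrease_test` and the returned p-value are my own additions for illustration (the C# code only compares u0 against the critical value), and the p-value assumes the standard normal approximation.

```python
import math


def defect_decrease_test(lambda0, defects, n_products, critical_value=-1.645):
    """One-sided test: has the number of defects per product decreased?

    lambda0    : historical number of defects per product
    defects    : total defects found in the sample
    n_products : number of sampled products
    """
    # Estimated defects per product, with the same +0.5 correction as the C# code
    lambda_hat = (defects + 0.5) / n_products
    # Test statistic under the null hypothesis (mean lambda0)
    u0 = (lambda_hat - lambda0) / math.sqrt(lambda0 / n_products)
    # Lower-tail probability of the standard normal (hypothetical extra output)
    p_value = 0.5 * (1.0 + math.erf(u0 / math.sqrt(2.0)))
    return u0, p_value, u0 <= critical_value


u0, p_value, decreased = defect_decrease_test(3, 18, 10)
print(f"u0 = {u0:.3f}, p = {p_value:.4f}, decreased = {decreased}")
```

With 18 defects in 10 products against a historical 3 per product, u0 comes out at about -2.10, which is below -1.645, so the test concludes that the defects decreased.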

入門統計解析法

Author: 永田靖
Publisher: 日科技連出版社
Release date: 1992/04/01
Format: Book

2018-05-14

[C#] How to divide a TimeSpan

I wanted to compute the average of several TimeSpan values, so I looked into how to divide a TimeSpan and found that it can be done as follows.

// 60 seconds
var t1 = new TimeSpan(0, 0, 60);
// Divide Ticks and pass the result to the TimeSpan constructor
var t2 = new TimeSpan(t1.Ticks/10);
// Prints 6
Console.WriteLine(t2.Seconds);
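Since the original goal was averaging several time spans, here is a rough sketch of the same tick-based idea, written in Python for comparison. The helpers `to_ticks`, `from_ticks`, `divide_span` and `average_span` are hypothetical names of mine. The sketch assumes non-negative spans and uses the fact that one .NET tick is 100 nanoseconds, i.e. 10 ticks per microsecond.

```python
from datetime import timedelta

TICKS_PER_MICROSECOND = 10  # one .NET tick is 100 ns


def to_ticks(span):
    """Convert a timedelta to an integer number of 100 ns ticks."""
    return (span // timedelta(microseconds=1)) * TICKS_PER_MICROSECOND


def from_ticks(ticks):
    """Convert integer ticks back to a timedelta (microsecond resolution)."""
    return timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def divide_span(span, divisor):
    """Mirror of `new TimeSpan(t1.Ticks / divisor)` for non-negative spans."""
    return from_ticks(to_ticks(span) // divisor)


def average_span(spans):
    """Average several spans by summing ticks and dividing by the count."""
    total_ticks = sum(to_ticks(s) for s in spans)
    return from_ticks(total_ticks // len(spans))


print(divide_span(timedelta(seconds=60), 10).total_seconds())
print(average_span([timedelta(seconds=60),
                    timedelta(seconds=90),
                    timedelta(seconds=30)]).total_seconds())
```

Summing the ticks first and dividing once keeps everything in integer arithmetic until the final conversion, which is the same reason the C# snippet works on `Ticks`.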

Source: stackoverflow.com

2018-05-13

[C#] Adding Charts dynamically

C# WPF LiveCharts

I wanted to draw pie charts, but the number of charts is not known until the app is running, so I looked into how to add Charts dynamically. The charting library I used is LiveCharts.Wpf.

First, the XAML. Charts are added dynamically to the StackPanel1 defined below.

<Window x:Class="LiveChartsExample.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:LiveChartsExample"
        xmlns:lvc="clr-namespace:LiveCharts.Wpf;assembly=LiveCharts.Wpf"
        mc:Ignorable="d"
        Title="MainWindow" Height="350" Width="525">
    <StackPanel Name="StackPanel1" Orientation="Horizontal">
    </StackPanel>
</Window>

Next, the C# source.

using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media;

using LiveCharts;
using LiveCharts.Wpf;

namespace LiveChartsExample
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();

            var random = new Random();
            // As an example, draw four pie charts
            for (int i = 0; i < 4; ++i)
            {
                // Define the grid
                var grid = new Grid();
                grid.Width = 150;
                var row1 = new RowDefinition();
                row1.Height = new GridLength(30);
                var row2 = new RowDefinition();
                row2.Height = new GridLength(150);
                grid.RowDefinitions.Add(row1);
                grid.RowDefinitions.Add(row2);

                // TextBlock used as the title
                var title = new TextBlock();
                title.Text = "Chart" + i.ToString();
                title.FontSize = 20;
                title.Foreground = new SolidColorBrush(Colors.Black);
                title.HorizontalAlignment = HorizontalAlignment.Center;

                // Define the pie chart
                var chart = new PieChart();
                chart.Name = "Chart" + i.ToString();
                chart.StartingRotationAngle = 0;
                chart.Width = 150;
                chart.Height = 150;
                chart.MouseDown += Chart_MouseDown;
                chart.LegendLocation = LegendLocation.Bottom;

                var series1 = new PieSeries()
                {
                    Title = "A",
                    Values = new ChartValues<int> { random.Next(100) },
                    DataLabels = true,
                    LabelPoint = point => string.Format("{0} ({1:P})", point.Y, point.Participation)
                };
                var series2 = new PieSeries()
                {
                    Title = "B",
                    Values = new ChartValues<int> { random.Next(100) },
                    DataLabels = true,
                    LabelPoint = point => string.Format("{0} ({1:P})", point.Y, point.Participation)
                };
                chart.Series.Add(series1);
                chart.Series.Add(series2);

                // Place the title in row 0 of the grid and the chart in row 1
                Grid.SetRow(title, 0);
                Grid.SetRow(chart, 1);
                grid.Children.Add(title);
                grid.Children.Add(chart);

                // Add the grid to the stack panel
                StackPanel1.Children.Add(grid);
            }
        }

        private void Chart_MouseDown(object sender, System.Windows.Input.MouseButtonEventArgs e)
        {
            PieChart chart = sender as PieChart;

            // Write the behavior for when the chart is clicked below
        }

    }
}

Running the code above displays four pie charts.
Still, does LiveCharts really require this much work just to give a chart a title? The PieChart class has no Title property, and I could not figure it out from the examples on the site, so here I place a TextBlock and a Chart in a Grid so that the TextBlock looks like the chart's title.

2018-05-03

[Python] Trying out PCANet

Python Deep Learning

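I was looking into ways to extract image features with unsupervised learning, found the article below, and tried it on MNIST. The article links to a GitHub repository; copy pcanet.py from there. qiita.com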

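Using the copied pcanet.py, you can write and run the main.py below to check the result of feature extraction ⇒ classification. However, PCANet's transform method is very slow (about one hour in total for train, validation, and test), so I use only part of the MNIST data and did no hyperparameter tuning. In this setup the accuracy was 94.7% for HOG ⇒ classification and 96.4% for PCANet ⇒ classification. Next, I would like to look into whether there are other networks like PCANet.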

import numpy as np
from datetime import datetime
from keras.datasets import mnist
from pcanet import PCANet
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from skimage.feature import hog


def get_part_of_mnist(ratio_to_use=0.1):
    test_size = 1.0 - ratio_to_use
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=test_size, stratify=y_train)
    X_validation, X_dummy, y_validation, y_dummy = train_test_split(X_validation, y_validation, test_size=test_size, stratify=y_validation)
    X_test, X_dummy, y_test, y_dummy = train_test_split(X_test, y_test, test_size=test_size, stratify=y_test)
    return X_train, y_train, X_validation, y_validation, X_test, y_test


def extract_feature_using_pcanet(X_train, X_validation, X_test):
    # Arguments are basically passed as tuple in the form (height, width) but int is also allowed.
    # If int is given, the parameter will be converted into (size, size) implicitly.
    pcanet = PCANet(
        image_shape=X_train.shape[1],  # the size of an input image
        # kernel size, kernel step size, and the number of filters in the first layer, respectively
        filter_shape_l1=2, step_shape_l1=1, n_l1_output=4,
        # kernel size, kernel step size, and the number of filters in the second layer, respectively
        filter_shape_l2=2, step_shape_l2=1, n_l2_output=4,
        block_shape=2  # the size of area to calculate histogram
    )

    # Check whether all pixels can be considered. Raise ValueError if the structure is not valid.
    # Calling this function is optional. PCANet works without this line.
    pcanet.validate_structure()

    pcanet.fit(X_train)  # Train PCANet

    # Trained PCANet behaves as a transformer from images into features.
    # The input is a 3D array of shape (n_images, height, width), which is transformed into feature vectors.
    X_train = pcanet.transform(X_train)
    X_validation = pcanet.transform(X_validation)
    X_test = pcanet.transform(X_test)

    return X_train, X_validation, X_test


def extract_feature_using_hog(X_train, X_validation, X_test):
    features_train = []
    for image in X_train:
        feature = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3))
        features_train.append(feature)
    features_validation = []
    for image in X_validation:
        feature = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3))
        features_validation.append(feature)
    features_test = []
    for image in X_test:
        feature = hog(image, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3))
        features_test.append(feature)
    return np.array(features_train), np.array(features_validation), np.array(features_test)


if __name__ == '__main__':
    # Load part of the MNIST data
    X_train, y_train, X_validation, y_validation, X_test, y_test = get_part_of_mnist()

    # Generate features with PCANet
    X_train, X_validation, X_test = extract_feature_using_pcanet(X_train, X_validation, X_test)
    # Generate features with HOG
    #X_train, X_validation, X_test = extract_feature_using_hog(X_train, X_validation, X_test)

    # Create the LightGBM classifier
    model = LGBMClassifier(objective='multiclass',
                           num_leaves=31,
                           learning_rate=0.1,
                           n_estimators=300)
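    # Note: LightGBM 4.0 removed early_stopping_rounds from fit();
    # there, pass callbacks=[lightgbm.early_stopping(5)] instead.
    model.fit(X_train, y_train,
              eval_set=[(X_validation, y_validation)],
              eval_metric='multi_logloss',
              early_stopping_rounds=5)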
    y_pred = model.predict(X_test)
    print(accuracy_score(y_test, y_pred)*100)
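As a side note on the HOG baseline, the following sketch works out how long the HOG feature vector is for a 28x28 MNIST image with the parameters used above. `hog_feature_length` is a hypothetical helper of mine, not part of scikit-image; it assumes the usual layout in which cells are formed by integer division of the image size and blocks slide one cell at a time.

```python
def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Length of a flattened HOG descriptor under the assumed cell/block layout."""
    # Whole cells that fit in each direction; leftover edge pixels are ignored
    n_cells_row = image_shape[0] // pixels_per_cell[0]
    n_cells_col = image_shape[1] // pixels_per_cell[1]
    # Blocks slide over the cell grid with a stride of one cell
    n_blocks_row = n_cells_row - cells_per_block[0] + 1
    n_blocks_col = n_cells_col - cells_per_block[1] + 1
    return (n_blocks_row * n_blocks_col
            * cells_per_block[0] * cells_per_block[1]
            * orientations)


# Parameters used in extract_feature_using_hog above
print(hog_feature_length((28, 28), 9, (8, 8), (3, 3)))
```

Under these assumptions a 28x28 image yields only 3x3 cells and a single block, so each image is described by 81 values. This is worth keeping in mind when comparing the HOG and PCANet accuracies, since neither feature extractor was tuned.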

2018-04-22

[PyTorch] Dealing with out of memory errors when the model is in eval mode

Python PyTorch Deep Learning

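With PyTorch, I was running out of GPU memory in eval mode: memory kept being consumed without being freed, even though this did not happen in train mode. Looking into it, I found advice that Variable takes an argument called volatile and that setting it to True would help, and indeed the out of memory error went away. volatile=True tells autograd not to record the computation graph, so intermediate results are not kept alive. Note that PyTorch 0.4 and later deprecate volatile in favor of the torch.no_grad() context manager.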

# Switch to eval mode
model.eval()
# In eval mode, volatile=True lets the GPU memory be released
X = Variable(X, volatile=True)

stackoverflow.com

2018-04-12

A site that explains transaction isolation levels clearly

RDBMS SQL

I never really understood what terms like Repeatable Read meant, but the site below helped me out. qiita.com

2018-04-02

[Python] Notes on a loss function for imbalanced two-class segmentation problems

Python Keras Deep Learning

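This paper proposes a loss function for imbalanced two-class segmentation problems, so I'm noting it here. In deep-learning-based segmentation, when the data is extremely imbalanced (for example, almost all of the image is 0 and only a small part is 1), training does not go well without some extra care. The paper tries to avoid this problem by modifying the loss function.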

In the article linked below, I used the following Dice coefficient as the segmentation loss.

from keras import backend as K

def dice_coef(y_true, y_pred):
    y_true = K.flatten(y_true)
    y_pred = K.flatten(y_pred)
    intersection = K.sum(y_true * y_pred)
    return 2.0 * intersection / (K.sum(y_true) + K.sum(y_pred) + 1)

def dice_coef_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

ni4muraano.hatenablog.com
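To see why plain accuracy is misleading on imbalanced masks while the Dice coefficient is not, here is a small pure-Python sketch that applies the same formula as `dice_coef` above to flat lists. The function `dice_coef_list` and the toy masks are made up for illustration.

```python
def dice_coef_list(y_true, y_pred):
    """Same formula as dice_coef above, applied to flat Python lists."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return 2.0 * intersection / (sum(y_true) + sum(y_pred) + 1)


def pixel_accuracy(y_true, y_pred):
    """Fraction of pixels whose label matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced mask: only 2 of 10 pixels are foreground
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
all_background = [0] * 10

print(pixel_accuracy(y_true, all_background))   # high despite missing every foreground pixel
print(dice_coef_list(y_true, all_background))   # zero overlap
print(dice_coef_list(y_true, y_true))           # perfect prediction
```

Predicting all background scores 0.8 on accuracy but 0 on Dice. One thing the sketch also shows: because the +1 is added only to the denominator, even a perfect prediction of this tiny mask gives 0.8 rather than 1.0, so the loss never reaches exactly zero on small masks.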

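The paper proposes a function called the Tversky loss function, which looks like the following. I thought I had seen it somewhere before, and it turns out to be a modified version of IoU.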

ALPHA = 0.3 # value between 0 and 1.0; increase ALPHA to favor precision
BETA = 1.0 - ALPHA # value between 0 and 1.0; decrease ALPHA to favor recall

def tversky_index(y_true, y_pred):
    y_true = K.flatten(y_true)
    y_pred = K.flatten(y_pred)
    intersection = K.sum(y_true * y_pred)
    false_positive = K.sum((1.0 - y_true) * y_pred)
    false_negative = K.sum(y_true * (1.0 - y_pred))
    return intersection / (intersection + ALPHA*false_positive + BETA*false_negative)

def tversky_loss(y_true, y_pred):
    return 1.0 - tversky_index(y_true, y_pred)
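The relationship to IoU can be checked numerically. Below is a minimal pure-Python sketch of the same index on flat lists; `tversky_index_list` and the toy masks are hypothetical, chosen only to show that alpha = beta = 1 reproduces IoU and alpha = beta = 0.5 reproduces the Dice coefficient (without the +1 smoothing term).

```python
def tversky_index_list(y_true, y_pred, alpha, beta):
    """Same formula as tversky_index above, applied to flat Python lists."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    false_positive = sum((1.0 - t) * p for t, p in zip(y_true, y_pred))
    false_negative = sum(t * (1.0 - p) for t, p in zip(y_true, y_pred))
    return intersection / (intersection + alpha * false_positive + beta * false_negative)


# Toy masks: 2 true positives, 1 false positive, 2 false negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

iou = 2 / (4 + 3 - 2)          # |A and B| / |A or B|
dice = 2 * 2 / (4 + 3)         # 2|A and B| / (|A| + |B|)

print(tversky_index_list(y_true, y_pred, 1.0, 1.0), iou)
print(tversky_index_list(y_true, y_pred, 0.5, 0.5), dice)
print(tversky_index_list(y_true, y_pred, 0.3, 0.7))
```

With ALPHA = 0.3 and BETA = 0.7 as in the post, false negatives are weighted more heavily than false positives, so missing foreground pixels costs more than over-predicting them.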

Notes of a Travel-Loving Software Engineer

I've started keeping notes on programming and other technical topics