R のバックアップソース(No.8)

#contents

* R 本体（Windows版）のアップデート 22 October 2017 [#r50e5362]

[[Updating R from R (on Windows) &#8211; using the {installr} package>https://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/]]~
[[Windows用Rのバージョンアップ>http://d.hatena.ne.jp/ryamada/20140522/1400554341]]

  install.packages("installr")
  library(installr)
  updateR() 


* 『「蜂群＋平均値」に折れ線グラフを重ねる』をふたつ重ねる 2017年8月7日 [#mc957cf6]

- 横軸はカテゴリ
- 一番左のカテゴリは少し性質が違うので、左からひとつめと左からふたつめをつなぐ折れ線は他と別の破線で
- 『「蜂群＋平均値」に折れ線グラフを重ねる』をずらしてふたつ重ねる
- 平均値はグラフ作成の途中で計算させる
- 「ずらし」を沢山使うので、カテゴリを factor と score を使って振り分け

&ref(beeswarmmeanplot.png);

  # DATA HERE
  # cat0
  control0 <- c(3986, 4246, 5102, 4920, 6102)
  test0 <- c(3055, 2203, 2948, 3108, 5261)
  #
  # cat1
  control1 <- c(1848, 434, 2072, 1786, 2642)
  test1 <- c(1220, 1185, 1702, 1212, 1515)
  #
  # cat2
  control2 <- c(2981, 2217, 2880, 2962, 3764)
  test2 <- c(1138, 672, 3669, 472, 512)
  #
  # cat3
  control3 <- c(1453, 2846, 2652, 1447, 2261)
  test3 <- c(1520, 2082, 1006, 1566, 1224)
  #
  # End of DATA HERE
  ###################################################
  #
  # Draw graphs
  #
  library(beeswarm)
  #
  ## 軸の余裕分 myxlimadd
  myxlimadd <- 1
  #
  ###################################################
  #
  postscript("out.eps", horizontal=FALSE, height=6, width=5, pointsize=15)
  #
  ###################################################
  #
  mytitle <- "My Title"
  #
  # 縦軸の範囲を決めておく：繰り返し使えるように
  yminmax <- c(0,7000)
  #
  # 縦軸の目盛刻み（ラベルにもなる）
  ytick <- c(0,1000,2000,3000,4000,5000,6000,7000)
  #
  # 横軸のカテゴリ ＋ beewarm用隙間（5）
  myxlim <- 11 + myxlimadd
  #
  ## 横軸の目盛刻み
  xtick <- c(1.5,4.5,7.5,10.5)
  #
  # 左：control
  #
  # データを描く横軸の目盛上の位置：左
  myx <- c(1,4,7,10)
  # data
  data0 <- control0
  data1 <- control1
  data2 <- control2
  data3 <- control3
  #
  # 「1-3」データのみ：「0」データは NA とする
  meanright <- c(NA,mean(data1), mean(data2), mean(data3))
  #
  # 「1-3」データ左端と「0」データ。ほかのデータは NA とする
  meanleft <- c(mean(data0),mean(data1),NA,NA)
  #
  #
  # 「1-3」データ折れ線
  # 
  # cex=3, col="gray" サイズ10倍、色グレー
  plot(meanright~myx,type="b",pch="-", cex=3, col="gray", ylim=yminmax,
    xlim=c(1,myxlim),xlab="",ylab="",axes=FALSE, main=mytitle)
  # 
  # 「1-3」データ左端と「0」データ折れ線
  # 
  par(new=T)
  # 
  # cex=3, col="gray" サイズ10倍、色グレー
  plot(meanleft~myx,type="b",pch="-", cex=3, col="gray", ylim=yminmax,
    xlim=c(1,myxlim),xlab="",ylab="", lty=2, axes=FALSE)
  # 
  # beeswarm で同じ点を打つために繰り返し数を入れておく
  condition <- factor(c(
  rep(1, length(data0)), # 「0」
  # dummy                      
  2,3,
  # dummy                      
  rep(4, length(data1)), # 「1」
  # dummy                      
  5,6,
  # dummy                      
  rep(7, length(data2)), # 「2」
  # dummy                      
  8,9,
  # dummy                      
  rep(10, length(data3)), # 「3」
  11 # dummy
  ))
  # 
  score <- c(
           data0,
           NA,NA,
           data1,
           NA,NA,
           data2,
           NA,NA,
           data3,
           NA
           )
  # 
  # 蜂群プロット
  # pch=5：ダイヤモンド◇
  beeswarm( score~condition, pch=5, xlim=c(1,myxlim),ylim=yminmax,
    axes=FALSE, add=TRUE, correl="wrap")
  # 
  ###################################################
  #右にずらして重ねる
  # 
  par(new=T)
  # 
  # data
  data0 <- test0
  data1 <- test1
  data2 <- test2
  data3 <- test3
  # 
  # データを描く横軸の目盛上の位置：右
  myx <- c(2,5,8,11)
  # 
  # 「1-3」データのみ：「0」データは NA とする
  meanright <- c(NA,mean(data1), mean(data2), mean(data3))
  #
  # 「1-3」データ左端と「0」データ。ほかのデータは NA とする
  meanleft <- c(mean(data0),mean(data1),NA,NA)
  #
  # 「1-3」データ折れ線
  # cex=3, col="gray" サイズ10倍、色グレー
  plot(meanright~myx,type="b",pch="-", cex=3, col="gray",
    ylim=yminmax,xlim=c(1,myxlim),xlab="",ylab="",axes=F)
  #
  # 「1-3」データ左端と「0」データ折れ線
  #
  par(new=T)
  #
  # cex=3, col="gray" サイズ10倍、色グレー
  plot(meanleft~myx,type="b",pch="-", cex=3, col="gray",
    ylim=yminmax,xlim=c(1,myxlim),xlab="",ylab="", lty=2, axes=F)
  #
  # beeswarm で同じ点を打つために繰り返し数を入れておく
  condition <- factor(c(
  1, # dummy
  rep(2, length(data0)), # 「0」
  # dummy
  3,4,
  # dummy
  rep(5, length(data1)), # 「1」
  # dummy
  6,7,
  # dummy
  rep(8, length(data2)), # 「2」
  # dummy
  9,10,
  # dummy
  rep(11, length(data3)) # 「3」
  ))
  #
  score <- c(
           NA,
           data0,
           NA,NA,
           data1,
           NA,NA,
           data2,
           NA,NA,
           data3
           )
  #
  #pch=1：白抜き丸○
  beeswarm( score~condition, pch=1, xlim=c(1,myxlim),ylim=yminmax,
    axes=FALSE, add=TRUE, correl="wrap")
  #
  # 縦軸：side=2
  # las = 1：X軸、Y軸ともに数値ラベルを水平に書く
  # line=2：縦軸を左にずらす
  axis(2,ytick,las=1,line=1)
  #
  # 横軸：side=1
  axis(1,xtick,labels=FALSE)
  ###################################################
  #
  # postscript (eps) 出力終了
  dev.off()


*survival analysis (11 April 2004)[#h9e1c007]

ショウジョウバエの配偶行動を見るときに、観察時間内に交尾しないものがある。そこで、交尾したものを「dead」、交尾しなかったものを「survive」とみなしてsurvival analysisを行う。

Kaplan-Meier estimatorを計算
 > library(survival)
 > mydata <- read.table("data.txt", header=TRUE)
 > mydata
 > sink("out.txt")
 > myfit <- survfit( Surv(Time, Copl)~Age, data=mydata)
 > myfit
 > summary(myfit)

プロット
 > plot(myfit)

Log Rank test
 > survdiff( Surv(Time, Copl)~Age, data=mydata)

データの形式~
Copl=1は交尾。Copl=0は交尾せず（ここでは観察時間が120）。
 Age	Time	Copl
 2	41	1
 .
 .
 .
 2     120      0
 .
 .
 .

- ''Tomaru, M.'', Oguma, Y. & Watada, M. 2006. Courtship in '''Drosophila quadrilineata''' with a unique male behavioral element, abdomen bending. Journal of Ethology. 24: 133--139. [[doi: 10.1007/s10164-005-0172-4:http://dx.doi.org/10.1007/s10164-005-0172-4]]

**データの扱いについてのメモ 11 July 2011 [#u9273631]

 mytabledf <- data.frame(summary(myfit)$table)

とすると、myfit の出力テーブルがデータフレームになる。
str(mytabledf)で何が含まれているか分かる（records, n.max, n.start, events, median, X0.95LCL, X0.95UCL です）。
そこで、例えば、

 mytabledf$median

で、（Kaplan-Meier estimate の）median だけが出てくる。


*Comparing Regressions [#md8494d1]

**異なる条件下で得られた複数の回帰直線を比較する（01 February 2008） [#rda59cbd]
-傾きに差はある？
-切片に差はある？

長くなったのでこちら -> [[複数の回帰直線の比較>R_ComparingRegressions]]


*Contingency Table [#t9c03134]
**Fisher test (03 Oct 2005) [#te11009a]
 > mydata <- matrix(c(4,6,3,4), nr=2)
 > fisher.test(mydata)

**Fisher test for 2 by n (17 September 2010) [#kb93275c]
 > mydata <- matrix(c(15,19,10,15,3,8,3,3,9,2,2,0,1,3,2,4,9,7,5,7,4,10,4,4,2,2,4,2,2,1), nr=15)
 > mydata
 > fisher.test(mydata, workspace=20000000)
workspaceを大きく取らないとデータによってはエラーになる場合がある

*Goodness of Fit (14 January 2016) [#n418c3d5]
適合性の検定

メンデルの法則などのようなものの検定

Zar (1981) Example 5.1 のデータ~
null hypothesis (H&subsc{0};) は 3:1 （＝メンデル分離するかどうかを検定）

参考にしたページ~
An R Companion for the Handbook of Biological Statistics 
by Salvatore S. Mangiafico~
[[G-test of Goodness-of-Fit>http://rcompanion.org/rcompanion/b_04.html]]

**カイ二乗検定 [#o2122b54]

 > obs <- c(84,16)
 > hyp <- c(3/4,1/4)
 > chisq.test(obs, p=hyp)

手計算と同じことをする場合

 > exp = sum(obs)*hyp
 > dif <- obs - exp
 > dif2 <- dif*dif
 > dif2div <- dif2 / exp
 > chi <- sum(dif2div)
 > pchisq(chi,df=1,lower.tail=FALSE)

しかし ……、自由度が 1 のときは Yates correction を用いるのが望ましい。
ところが、chisq.test は contingency table では、correction=T 
オプションを付ければ Yates correction を行うが、
適合性の検定では行わない。~
そこで、手計算と同じようにせざるを得ない。~
差分の絶対値から0.5を引きそれを二乗するように式を変えればよい（dif2corrected）。

 > obs <- c(84,16)
 > hyp <- c(3/4,1/4)
 > exp = sum(obs)*hyp
 > dif <- obs - exp
 > dif2corrected <- (abs(dif)-0.5)*(abs(dif)-0.5)
 > dif2correcteddiv <- dif2corrected / exp
 > chic <- sum(dif2correcteddiv)
 > pchisq(chic,df=1,lower.tail=FALSE)

**G検定を行いたいなら [#g69a6ee3]

 > G = 2 * sum(obs * log(obs / exp))
 > a <- 2
 > n <- sum(obs)
 > df <- a-1
 > q <- 1 +((a*a-1)/(6*n*df))
 > Gadj <- G/q

ここでは、Gadj は Williams' correction~
a はカテゴリ数~
log が自然対数、常用対数は log10~


*ノンパラメトリック検定 [#t1badf9b]

** Kruskal-Wallis test と multiple comaprison 2017年10月11日 [#c4fb3ebb]

Kruskal-Wallis test（ノンパラメトリック分散分析）と
それに引き続く multiple comaprison（多重比較）を行う。

multiple comaprison は、
Zar (2010)((Zar, J. H. (2010) Biostatistical Analysis, 5th ed, Pearson Prentice-Hall, Upper Saddle River, NJ))
に
「ほぼ」((Zar には、P値の補正に Benjamini-Hochberg method を使うべきとは書いていない。実質的には使っているのだと思うけれども。))
従って、Dunn's method を利用。~
R では、
Benjamini-Hochberg method によって補正された P値も表示してくれる。~
Kruskal-Wallis test はデフォルトインストールだけで使える。~
multiple comaprison は
[[An R Companion for the Handbook of Biological Statistics, Kruskal-Wallis Test>https://rcompanion.org/rcompanion/d_06.html]]
に従って、FSA パッケージを利用。

データ（mydata.txt）の形式
  numdata [tab] category
  1.6059  [tab] Cat1
  ...     [tab] ...


R code
  # install.packages("FSA")
  # https://rcompanion.org/rcompanion/d_06.html
  # 
  x <- read.table ("mydata.txt", header=TRUE)
  kruskal.test(numdata ~ category, data = x)
  # 
  library(FSA)
  dunnTest(numdata ~ category, data = x, method="bh")

分布を確認するためにボックスプロットしてみるなら
  boxplot(numdata ~ category, data = x)

分布を確認するために蜂群＋ボックスプロットしてみるなら
  library(beeswarm)
  boxplot(x$numdata ~ x$category)
  beeswarm(x$numdata ~ x$category, corral="wrap", add=T)
（beeswarm が、「x$」方式でないと動かないようなので、boxplotもそうしてみた）


** ２標本検定 2017年8月7日 [#u79b3da1]

- Wilcoxon-Mann-Whitney test または、Exact Wilcoxon rank sum test、U-test
- Brunner-Munzel test
- t検定も続けて、分散の比較もしておく

  myx <- c(48, 34, 72, 86, 42)
  myy <- c(35, 46 ,78, 29, 31)
  #
  # Exact Wilcoxon Mann Whitney test
  library(exactRankTests) # wilcox.exact
  wilcox.exact(myx,myy,exact=TRUE)
  #
  library(coin)
  wilcox_test(
      c(myx,myy) ~ factor(c(rep("myx",length(myx)), rep("myy",length(myy)))),
              distribution="exact")
  #
  # Brunner-Munzel test
  library(lawstat) # brunner.munzel.test
  brunner.munzel.test(myx, myy)
  #
  # Brunner-Munzel test
  library(nparcomp) # npar.t.test
  writeLines("\n\nBrunner-Munzel test (npar.t.test)\n")
  mydata <- data.frame(myxy=c(myx, myy),
      cond=c(rep("myx",length(myx)),rep("myy",length(myy))))
  myresults <- npar.t.test(myxy~cond, data = mydata,
      method = "t.app", alternative = "two.sided")
  summary(myresults)
  #
  # permuted Brunner-Munzel test
  writeLines("\n\nBrunner-Munzel test (studentized permutation test)\n")
  myresults <- npar.t.test(myxy~cond, data = mydata,
      method = "permu", alternative = "two.sided")
  summary(myresults)
  #
  # t test
  t.test(myx, myy, var.equal=TRUE)
  # t test with unequal variance, Welch t test
  t.test(myx, myy, var.equal=FALSE)
  # F test for variance
  var.test(myx, myy)


ところが
 > library(exactRankTests) # wilcox.exact
 Package 'exactRankTests' is no longer under development.
 Please consider using package 'coin' instead.
となる

library coin を使うのは
[[奥村さんの解説>https://oku.edu.mie-u.ac.jp/~okumura/stat/wmw.html]] 
に従ってはいるが、出力が~
exactRankTests は、W値~
coin を使った場合は、Z値~
となる。

古くにU検定を教わった身には、W値のほうが、正規化されたZ値よりも気分がいい。
それに現時点では開発停止と書いてあるけれども、package `exactRankTests' は
2017-03-01版がある。~
参照：https://cran.r-project.org/web/packages/exactRankTests/index.html 

U検定：
exactRankTests の wilcox.exact の結果と
library coin を使った結果とは一致するはず~
//
Brunner-Munzel test 検定：
library lawstat の brunner.munzel.test の結果と
library nparcomp の npar.t.test を使った結果とは一致するはず~
//
t検定：
t.test で、「var.equal=TRUE」を指定しなくてもデフォルトでこれ~
//
ウェルチの t検定：
t.test で、「var.equal=FALSE」を指定


*** Brunner-Munzel検定 [#md492c58]

[[奥村さんの解説>https://oku.edu.mie-u.ac.jp/~okumura/stat/brunner-munzel.html]]
を見ていただくとして。

論文

- Brunner, E., & Munzel, U. (2000). The Nonparametric Behrens-Fisher Problem : Asymptotic Theory and a Small-Sample Approximation. Biometrical Journal, 42, 17&#8211;25. Doi: 10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U
- 名取真人. (2014). マン・ホイットニーのU検定と不等分散時における代表値の検定法. 霊長類研究, 30(1), 173&#8211;185. Doi: [[10.2354/psj.30.006>https://doi.org/10.2354/psj.30.006]]

**Kruskal-Wallis test (09 May 2004) [#a267b855]

3つ以上の$Ageの$Durationについて検定。Kruskal-Wallis検定の後に、ペアワイズでWilcoxon Rank Sum Test（＝U検定）をする。おまけで最後に分散分析。
 > sink("out.txt")
 > mydata2 <- read.table("data2.txt", header=TRUE)
 > mydata2
 > summary(mydata2)
 > kruskal.test(mydata2$Duration, mydata2$Age)
 > pairwise.wilcox.test(mydata2$Duration, mydata2$Age)
 > summary(aov(mydata2$Duration~mydata2$Age))

データの形式
 Age Duration 
 Age2      6
 .
 .
 .

- ''Tomaru, M.'', Oguma, Y. & Watada, M. 2006. Courtship in '''Drosophila quadrilineata''' with a unique male behavioral element, abdomen bending. Journal of Ethology. 24: 133--139. [[doi: 10.1007/s10164-005-0172-4:http://dx.doi.org/10.1007/s10164-005-0172-4]]

カテゴリ毎にサマリを出力（追記 22 February 2015）
 > by(mydata2, mydata2$Age, summary)


*パラメトリック検定 [#ta052279]

*データのまとめ (09 May 2004) [#xaf6ba55]

各$Ageの$Durationについて、
データのまとめ、平均、標準偏差の出力。~
 > sink("out.txt")
 > mydata <- read.table("data.txt", header=TRUE, fill=TRUE)
 > mydata
 > summary(mydata)
 > mean(mydata, na.rm=TRUE)
 > sd(mydata, na.rm=TRUE)

データの形式
 Age2 Age4 Age6 Age8 Age10
 9    6    7    4    8
 .
 .
 .

- ''Tomaru, M.'', Oguma, Y. & Watada, M. 2006. Courtship in '''Drosophila quadrilineata''' with a unique male behavioral element, abdomen bending. Journal of Ethology. 24: 133--139. [[doi: 10.1007/s10164-005-0172-4:http://dx.doi.org/10.1007/s10164-005-0172-4]]

*tips [#afeef77e]

**二進数 (23November2013) [#hf255012]
10進数を二進数のベクトルに変換する~
パッケージ sfsmisc のインストールが必要

 > library(sfsmisc) 
 > mydigits <- digitsBase(1101, base=2)
 > mydigits
 Class 'basedInt'(base = 2) [1:1]
       [,1]
  [1,]    1
  [2,]    0
  [3,]    0
  [4,]    0
  [5,]    1
  [6,]    0
  [7,]    0
  [8,]    1
  [9,]    1
 [10,]    0
 [11,]    1

 > mydigits <- digitsBase(1101, base=2,13) ← ベクトルの長さを指定したいとき


二進数に限らず利用できる
 
**重複しないランダムな数 (23November2013) [#n84fa29a]

 > sample(0:20,3)
 [1]  0 11  6
0から20のうちランダムに3つの数字を重複せずに取り出す

**行列の特定の数を置換 (23November2013) [#tb5633a2]

 x[(x==0)]<-- -1
xという名前の行列の要素のうち、0のものを-1に置換


*ESS [#s80ba893]

** 03 Oct 2005 [#q15bf9ff]
iESS起動
 M-x R
コマンド入力後
 C-c C-j
 C-m （改行）
で、１行ずつコマンドを実行する。

*コマンドメモ (30 March 2004) [#ve4bd0d4]
 > q() ← quit
 > source("commands.R")  ← コマンドを書き込んだソースを読み実行
 > sink("out.txt") ← ファイルに出力
 > sink() ← 出力をコンソールに戻す
 > objects() ← 現在あるオブジェクトを表示
 > mydata <- read.table("data.txt", header=TRUE) ← ファイルからデータを読み込む
 > library(ctest) ← ライブラリを使う
 > t.test(A, B) ← t検定（unpaired）。※ A、Bにはデータを入れておく
 > var.test(A, B) ← F検定
 > t.test(A, B, var.equal=TRUE) ← t検定、等分散
 > wilcox.test(A, B) ← U検定（＝Wilcoxon test）
 > ks.test(A, B) ← Kolmogorov-Smirnov test
 > getwd() ← working directoryの表示
 > setwd("c:/cygwin/home/tomaru/stats") ← working directoryの変更

追記 12 July 2013
 > length(X) ← データの数
 > mean(X) ← 平均
 > sd(X)/(length(X)^0.5) ← 標準誤差

追記 22 February 2015~
 > help.start() ← htmlヘルプを起動

**コマンドラインで利用する（Windows） 05September2013 [#q746f9e7]

cygwin shell または、コマンドプロンプトで
 rscript command.R
 rscript command.R & ← バックグラウンドで実行する場合

※ pathを通しておかなければなりません（MS-DOS！のようですが）。

参考：[[R の入出力画面を経由せずに実行：他のプログラムからの呼び出し:http://takenaka-akio.org/doc/r_auto/chapter_14.html]]

*Rのリンク [#fd0daed0]

- [[RjpWiki>http://www.okada.jp.org/RWiki/]]


|Today:&counter(today); |Yesterday:&counter(yesterday); |Total:&counter(); since 04 April 2006|