Biostat/Biomath M257 Homework 5

Due May 26 @ 11:59PM

Author

Student Name and UID

Published

May 25, 2023

System information (for reproducibility):

versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 9 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Load packages:

using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
  Activating project at `~/Documents/github.com/ucla-biostat-257/2023spring/hw/hw5`
Status `~/Documents/github.com/ucla-biostat-257/2023spring/hw/hw5/Project.toml`
  [f65535da] Convex v0.15.3
  [a93c6f00] DataFrames v1.5.0
  [87dc4568] HiGHS v1.5.1
  [b99e6be6] Hypatia v0.7.2
  [4076af6c] JuMP v1.11.0
  [2f354839] Pajarito v0.8.2
  [08abe8d2] PrettyTables v2.2.4
  [3eaba693] StatsModels v0.7.2

In this exercise, we practice using disciplined convex programming (SDP in particular) to solve optimal design problems.

1 Introduction to optimal design

Consider a linear model \[\begin{eqnarray*} y_i = \mathbf{x}_i^T \boldsymbol{\beta} + \epsilon_i, \quad i = 1,\ldots, n, \end{eqnarray*}\] where \(\epsilon_i\) are independent Gaussian noises with common variance \(\sigma^2\). It is well known that the least squares estimate \(\hat{\boldsymbol{\beta}}\) is unbiased and has covariance \(\sigma^2 (\sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T)^{-1}\).

In exact optimal design, given total number of \(n\) allowable experiments, we want to choose among a list of \(m\) candidate design points \(\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}\) such that the covariance matrix is minimized in some sense. In mathematical terms, we want to find an integer vector \(\mathbf{n} = (n_1, \ldots, n_m)\) such that \(n_i \ge 0\), \(\sum_{i=1}^m n_i = n\), and the matrix \(\mathbf{V} = \left( \sum_{i=1}^m n_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}\) is “small”.

In approximate optimal design, we want to find a probability vector \(\mathbf{p} = (p_1, \ldots, p_m)\) such that \(p_i \ge 0\), \(\sum_{i=1}^m p_i = 1\), and the matrix \(\mathbf{V} = \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1}\) is “small”.

Commonly used optimal design criteria include:

  • In \(D\)-optimal design, we minimize the determinant of \(\mathbf{V}\) \[\begin{eqnarray*} &\text{minimize}& \det \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*}\]

  • In \(E\)-optimal design, we minimize the spectral norm, i.e., the maximum eigenvalue of \(\mathbf{V}\) \[\begin{eqnarray*} &\text{minimize}& \lambda_{\text{max}} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*}\] Statistically we are minimizing the maximum variance of \(\sum_{j=1}^p a_j \text{var}(\hat \beta_j)\) over all vectors \(\mathbf{a}\) with unit norm.

  • In \(A\)-optimal design, we minimize the trace of \(\mathbf{V}\) \[\begin{eqnarray*} &\text{minimize}& \text{tr} \left( \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T \right)^{-1} \\ &\text{subject to}& p_i \ge 0, \sum_{i=1}^m p_i = 1. \end{eqnarray*}\] Statistically we are minimizing the total variance \(\sum_{j=1}^p \text{var}(\hat \beta_j)\).

2 Q1 (10 pts) 3x4 factorial design

A drug company ask you to help design a two factor clinical trial, in which treatment A has three levels (A1, A2, and A3) and treatment B has four levels (B1, B2, B3, and B4). Drug company also tells you that the treatment combination A3:B4 has undesirable side effects so we ignore this design point.

Using dummy coding with A1 and B1 as the baseline levels, find the matrix \(C\) with each row a unique design point.

3 Q2 (30 pts) Find approximate optimal designs

Using semidefinite programming (SDP) software to find the approximate D-, E-, and A-optimal designs for this clinical trial.

Hint: This is what I got, which may or may not be correct.

Approximate Optimal Design
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ design_pt │   D_opt │   E_opt │   A_opt │ D_opt_n │ E_opt_n │ A_opt_n │
│    String │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│      A1B1 │   0.082 │   0.272 │   0.200 │       8 │      27 │      20 │
│      A2B1 │   0.082 │   0.152 │   0.101 │       8 │      15 │      10 │
│      A3B1 │   0.097 │   0.114 │   0.104 │      10 │      11 │      10 │
│      A1B2 │   0.082 │   0.057 │   0.086 │       8 │       6 │       9 │
│      A2B2 │   0.082 │   0.039 │   0.051 │       8 │       4 │       5 │
│      A3B2 │   0.097 │   0.057 │   0.068 │      10 │       6 │       7 │
│      A1B3 │   0.082 │   0.057 │   0.086 │       8 │       6 │       9 │
│      A2B3 │   0.082 │   0.039 │   0.051 │       8 │       4 │       5 │
│      A3B3 │   0.097 │   0.057 │   0.068 │      10 │       6 │       7 │
│      A1B4 │   0.109 │   0.081 │   0.106 │      11 │       8 │      11 │
│      A2B4 │   0.109 │   0.073 │   0.080 │      11 │       7 │       8 │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

4 Q3 (30 pts) Find exact optimal designs

Using mixed-integer semidefinite programming (SDP) software to find the exact D-, E-, and A-optimal designs for this clinical trial with \(n=100\).

Hint: This is what I got. Apparently I haven’t got the E-optimal design right yet.

Exact Optimal Design
┌───────────┬───────┬───────┬───────┐
│ design_pt │ D_opt │ E_opt │ A_opt │
│    String │ Int64 │ Int64 │ Int64 │
├───────────┼───────┼───────┼───────┤
│      A1B1 │     8 │    90 │    20 │
│      A2B1 │     8 │     1 │    10 │
│      A3B1 │    10 │     1 │    10 │
│      A1B2 │     8 │     1 │     9 │
│      A2B2 │     8 │     1 │     5 │
│      A3B2 │    10 │     1 │     7 │
│      A1B3 │     8 │     1 │     9 │
│      A2B3 │     8 │     1 │     5 │
│      A3B3 │    10 │     1 │     7 │
│      A1B4 │    11 │     1 │    10 │
│      A2B4 │    11 │     1 │     8 │
└───────────┴───────┴───────┴───────┘

5 Q4 (30 bonus points) Optimal design with nuisance parameters

Suppose the regression coefficients of linear model \(\boldsymbol{\beta}\) is partitioned as \(\boldsymbol{\beta} = (\boldsymbol{\beta}_0^T, \boldsymbol{\beta}_1^T)^T\), where \(\boldsymbol{\beta}_0\) are nuisance parameters and \(\boldsymbol{\beta}_1\) are parameters of primary interest. Given an approximate design \(\mathbf{p} = (p_1, \ldots, p_m)\), let the information matrix be partitioned accordingly \[ \mathbf{I}(\mathbf{p}) = \sum_{i=1}^m p_i \mathbf{x}_i \mathbf{x}_i^T = \begin{pmatrix} \mathbf{I}_{00} & \mathbf{I}_{01} \\ \mathbf{I}_{10} & \mathbf{I}_{11} \end{pmatrix}. \] Then the information matrix for \(\boldsymbol{\beta}_1\) adjusted for nuisance parameter \(\boldsymbol{\beta}_0\) is \[ \mathbf{I}_{1 \mid 0}(\mathbf{p}) = \mathbf{I}_{11} - \mathbf{I}_{10} \mathbf{I}_{00}^{-1} \mathbf{I}_{01}. \]

Revisiting the 3x4 factorial design problem in Q1, suppose the drug company only cares about the estimation of A treatment effects. Find the approximate D-, E-, and A-optimal designs.