Kan Li

AI/Automation Adoption in Pharma

Kan Li — 2025-06-06T00:00:00+00:00

\onehalfspacing

Industry Observation of AI/Automation Adoption in Pharmaceutical and Healthcare Statistical Sectors

📋 EXECUTIVE SUMMARY

Market Reality

The pharmaceutical industry demonstrates strong interest in AI/automation technologies (18% of conference papers) but faces a critical implementation gap. Only 4% of initiatives had successfully reach production scale by the publication time, reveals a fundamental disconnect between research to operational deployment.

Key Strategic Findings

🚨 IMPLEMENTATION CRISIS

64.7% of initiatives remain in theoretical/research phase
22.9% achieve pilot implementation
8.4% reach proof-of-concept stage
Only 4.0% achieve production scale deployment

💰 BUSINESS IMPACT GAPS

Only 1.8% of papers provide quantitative ROI metrics
Average time savings reported: 54.3% (ranging 40-80%)
Limited evidence of measurable business transformation
Weak correlation between technical innovation and business outcomes

🔧 TECHNOLOGY LANDSCAPE

SAS maintains market dominance (71.8% of papers) but facing competitive pressure
Python experiencing rapid growth (15.3%, +105% year-over-year)
Machine learning applications concentrated in data quality and regulatory compliance
Limited adoption of advanced AI techniques in production environments

⚖️ REGULATORY CHALLENGES

41.3% of initiatives focus on EMA compliance frameworks
Only 0.5% adequately address AI/ML validation requirements
Regulatory uncertainty significantly hampering advancement
Critical gap between AI capabilities and validation standards

📊 Key Findings Overview

Dataset Statistics

Total Papers Analyzed: 3,058 conference papers and presentations
AI/Automation Relevant: 550 papers (18.0%)
Conference Coverage: PharmaSUG, PHUSE, WUSS, SESUG, MWSUG (2023-2025)

Technology Adoption Maturity

Maturity Level	Papers	Percentage	Status
Research/Theoretical	356	64.7%	⚪ Early Stage
Pilot Implementation	126	22.9%	🟡 Promising
Proof of Concept	46	8.4%	🟠 Emerging
Production Scale	22	4.0%	🔴 Critical Gap

Maturity Distribution Deep Dive

Research/Theoretical (64.7%):

Academic and exploratory research
Proof-of-concept demonstrations
Technology feasibility assessments
Regulatory framework discussions

Pilot Implementation (22.9%):

Limited scope deployments
Internal validation studies
Process optimization trials
Technology evaluation projects

Proof of Concept (8.4%):

End-to-end workflow demonstrations
Business case development
Stakeholder validation
Pre-production testing

Production Scale (4.0%):

Live operational systems
Regulatory-validated implementations
Business-critical applications
Scalable deployment architectures

🎯 Critical Industry Insights

1. Implementation Reality Check

96% of AI initiatives haven’t reached production scale by the time of publication
Most projects remain stuck in research/proof-of-concept phases
Significant gap between technical interest and operational deployment
Success rate varies significantly by conference (17-43% pilot/production)

2. Technology Stack Analysis

Technology	Papers	Market Share	Trend
SAS	395	71.8%	Dominant but stable
Python	84	15.3%	Growing rapidly (+105% YoY)
R	48	8.7%	Stable niche
TensorFlow	116	21.1%	Leading ML framework
Cloud Platforms	8	1.5%	Significant opportunity

Technology Evolution Trajectory:

Migration from monolithic SAS to hybrid SAS-Python architectures
Increased adoption of cloud-based analytics platforms
Growing interest in MLOps and production AI workflows
Regulatory pressure driving validated AI platform development

3. ROI Quantification Challenges

Only 10 papers (1.8%) provide quantitative ROI metrics
Limited business case development
Documented benefits include:
- Time savings: Up to 33% reduction
- Cost savings: 33-60% reduction in specific use cases
- Error reduction: Up to 60% decrease in data quality issues

ROI Measurement Gaps:

Lack of standardized metrics across organizations
Limited long-term outcome tracking
Difficulty in isolating AI impact from other process improvements
Regulatory validation costs not adequately captured

Value Realization Barriers:

Regulatory validation requirements increasing implementation costs
Skills gap in production-scale AI deployment
Integration challenges with legacy systems
Change management and organizational resistance

🏢 Application Domain Analysis

Primary Use Cases by Volume

SDTM/ADaM Creation (23.3% - 128 papers)
- Automated data standardization
- CDISC compliance tools
- Example: “aCRF Copilot: AI/ML Assisted CRF Annotation”
Data Visualization (21.1% - 116 papers)
- Dashboard automation
- Reporting enhancement
- Analytics platforms
Regulatory Submission (18.5% - 102 papers)
- eSubmission automation
- Compliance validation
- Example: “Front-Loading Statistical Programming Using Automated Synthetic Data”
Clinical Data Quality & Validation (10.9% - 60 papers)
- Data cleaning automation
- Quality control systems
- Validation frameworks
Safety & Pharmacovigilance: 6.4% (35 papers)
Statistical Programming: 7.1% (39 papers)

High-Impact Use Cases

AI-powered pharmacovigilance (6.4% but high maturity)
Statistical programming automation (7.1% with strong ROI potential)
Trial design optimization (10.4% with regulatory focus)

⚖️ Regulatory Compliance Landscape

Compliance Focus Areas

Category	Papers	Coverage	Assessment
EMA Compliance	227	41.3%	Strong focus
CDISC Standards	162	29.5%	Well-established
Data Integrity	118	21.5%	Moderate attention
FDA Validation	51	9.3%	Limited coverage
AI/ML Validation	3	0.5%	Critical gap

Current Regulatory Environment

FDA Perspective:

Growing acceptance of AI/ML in drug development
Guidance documents under development
Focus on validation and transparency requirements
Pilot programs for AI-assisted regulatory submissions

EMA Framework:

41.3% of papers reference EMA compliance considerations
Emphasis on data integrity and audit trails
Requirement for human oversight in AI decisions
Gradual evolution toward AI-friendly regulations

ICH Guidelines:

Integration of AI considerations into existing guidelines
Focus on Good Manufacturing Practice (GMP) compliance
Data integrity requirements for AI-generated data
Quality by design principles for AI systems

Validation Framework Requirements

Current Gaps:

Only 0.5% of papers address comprehensive AI/ML validation
Limited guidance on AI system qualification
Inconsistent approaches to AI validation across organizations
Regulatory uncertainty hampering investment decisions

Emerging Standards:

ISO 13485 adaptation for AI medical devices
ICH Q12 lifecycle management for AI systems
FDA Computer Software Assurance framework
Industry-specific validation protocols development

📈 Technology Adoption Trends (2023-2025)

Year-over-Year Growth

Metric	2023	2024	2025*	Growth Rate
Total AI Papers	134	245	171	+82.8%
ML-Focused Papers	52	112	64	+115.4%
Python Adoption	19	39	25	+105.3%
Production Scale	6	5	11	Variable

*2025 data partial (through conference dates)

Innovation Acceleration

Machine learning focus doubled in 2024
Python adoption more than doubled
Production deployments remain inconsistent
Strong momentum in automation tools

🌐 Vendor Ecosystem Mapping

Technology Platform Providers

Enterprise Analytics Platforms:

SAS: Market leader with 395 paper mentions
Oracle: 45 mentions, strong database integration
IBM: 23 mentions, focus on cognitive computing
Microsoft: 13 mentions, cloud-first approach

Cloud Platform Providers:

AWS: 6 mentions, highest production success rate
Microsoft Azure: Integrated with Office 365 ecosystem
Google Cloud: Limited pharmaceutical presence
Alibaba Cloud: Emerging in Asia-Pacific markets

Specialized Pharma Solutions:

Veeva Systems: CRM and regulatory solutions
Medidata: Clinical trial management platforms
IQVIA: Integrated data and analytics services
Palantir: Advanced analytics and AI platforms

Service Provider Landscape

Global CROs:

Contract Research Organizations leading in implementation
Strong regulatory expertise and operational capabilities
Partnership opportunities with technology providers
Competitive advantage in practical AI deployment

Systems Integrators:

Critical role in bridging technology and regulatory requirements
Emerging specialization in pharma AI validation
Growing market for regulatory-compliant AI implementations
Partnership opportunities with both pharma and tech companies

Vendor Ecosystem Gaps

Limited regulatory compliance support
Insufficient production-scale guidance
Fragmented technology landscape
Weak ROI measurement tools

🎯 Strategic Recommendations

For C-Suite Executives

Immediate Actions (0-6 months)

Establish quantitative ROI measurement frameworks
Assess current AI initiative maturity levels
Invest in regulatory validation capabilities
Develop multi-platform technology strategies

Medium-term Priorities (6-18 months)

Bridge proof-of-concept to production gap
Implement AI governance frameworks
Build internal AI/ML validation expertise
Establish vendor ecosystem partnerships

For IT/Data Leaders

Technology Strategy

Plan hybrid SAS-Python migration strategies
Evaluate cloud-based analytics platforms
Address ML frameworks skills gap
Prioritize data visualization infrastructure

Implementation Focus

Automated SDTM/ADaM generation tools
Clinical data quality automation
Regulatory submission platforms
Real-world evidence analytics

For Regulatory Affairs

Critical Needs

Develop AI/ML validation frameworks
Create model transparency standards
Establish regulatory review processes
Build FDA/EMA guidance implementation

💼 Market Opportunities

High-Potential Investment Areas

Regulatory-compliant AI/ML platforms
- Market gap: Only 0.5% focus on validation
- Opportunity: $XX billion market potential
Automated clinical data tools
- Strong demand: 23.3% of use cases
- ROI documented: Up to 60% cost savings
AI-powered pharmacovigilance
- Regulatory requirement
- High impact on patient safety
Cloud-based analytics platforms
- Current adoption: <2%
- Modernization opportunity

🚨 Industry Challenges & Risks

Implementation Barriers

The 96% Problem: Failure to reach production scale
Skills Gap: Limited AI/ML validation expertise
Regulatory Uncertainty: Unclear guidance for AI models
ROI Documentation: Insufficient business case development
Technology Fragmentation: Multiple platforms increase complexity

Risk Mitigation Strategies

Establish structured implementation methodologies
Develop regulatory validation frameworks
Create quantitative ROI measurement systems
Build hybrid technology capabilities
Foster industry-wide standards development

📋Discussions

Success Factors for Market Leaders

Organizational Capabilities:

Regulatory Expertise: Deep understanding of pharmaceutical regulatory requirements
Technical Excellence: Production-scale AI deployment capabilities
Change Management: Organizational readiness for AI transformation
Partnership Strategy: Effective collaboration with technology providers and CROs

Competitive Advantages:

First-mover advantage in regulatory-compliant AI solutions
Established relationships with regulatory agencies
Proven track record of production AI implementations
Comprehensive AI validation and qualification capabilities

Market Predictions

Technology Evolution:

Hybrid SAS-Python architectures will become industry standard
Cloud-based AI platforms will dominate new implementations
AI validation will become a specialized professional discipline
Regulatory AI will emerge as a distinct technology category

Competitive Dynamics:

Companies bridging the 96% implementation gap will gain significant market share
CROs will emerge as AI implementation leaders
Technology providers will consolidate around pharmaceutical-specific solutions
Regulatory expertise will become a key competitive differentiator

Analysis Limitations

Data Source Constraints:

Geographic bias toward North American and European conferences
Limited representation from Asia-Pacific and emerging markets
English-language conference focus may miss regional innovations
35.2% missing abstracts potentially affecting analysis completeness

Methodological Limitations:

Keyword-based classification may miss nuanced applications
Maturity assessment based on self-reported descriptions
Limited verification of actual implementation claims
No longitudinal tracking of project outcomes

Industry Representation Gaps:

Not all pharmaceutical companies participate in the conferences
Competitive considerations may limit disclosure of advanced AI implementations
Mix of vendor marketing versus genuine user experience reports
Potential over-representation of research-focused versus commercial applications

🎯 Conclusion

The pharmaceutical industry stands at a critical inflection point in AI/automation adoption. While technical interest is extremely high (18% of conference papers), the industry faces a massive implementation challenge with 96% of initiatives failing to reach production scale.

Organizations that can bridge the proof-of-concept to production gap while maintaining regulatory compliance will gain significant competitive advantage in the evolving pharmaceutical landscape.

This report represents analysis of publicly available conference presentations and abstracts. All findings should be validated through primary research and due diligence appropriate to specific organizational contexts and strategic decisions.

Just too lazy!! (nothing in it)

Kan Li — 2022-10-28T00:00:00+00:00

Rcpp - Using C++ and R code Together

Kan Li — 2016-11-07T00:00:00+00:00

This tutorial was presented as part of PH 1930 Statistical Computing. The examples for this session are available on Github at the link.

Overview
Configure the environment
Rcpp and RcppArmadillo
Getting started
First example
Parallelization
MCMC for GLMM

While it is possible to write C or Fortran code for use in R as covered previously in this course, Rcpp provides a clean, approachable API that lets we write high-performance code much easily and joyfully.

##Overview

In general, R is not a fast or most memory efficient language – but it is very easy to use, to share, and makes very pretty output. C++ is very fast, but not so easy to use, and debugging could be painful. Rcpp supports implementing R functions in C++ for high performance computing and to easily interface R with external libraries.

##Configure the environment for using C++ code with R

I only test the tool on Windows and Linux. No matter which platform you prefer, RStudio is highly recommended to do the work. For a Windows user, you will need to make sure you have the latest release of R (3.2.0+) and will also need to install the Rtools library before you can use any packages with C++ code in them. Usually RStudio will help to check whether the Rtools are installed and install it for you when you first use Rcpp. If you are Linux user, everything probably just works. You may also want to visit this blog post which has more information on making C++ work with R under Windows/Mac if you end up encountering any weirdness.

##Rcpp and RcppArmadillo

In this session we’ll learn how to improve our R code performance by rewriting key functions in C++. Typical bottlenecks that C++ can address include:

Loops that can’t be easily vectorized because subsequent iterations depend on previous ones.
Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than that in R.
Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.

To use Rcpp, a working knowledge of C++ is helpful, but not essential. Many good tutorials and references are freely available, including http://www.learncpp.com/ and http://www.cplusplus.com/. You may also enjoy Dirk Eddelbuettel’s Seamless R and C++ integration with Rcpp, which goes into much detail into all aspects of Rcpp.

RcppArmadillo is an add-on package that gives you access to tons of useful linear algebra functionality in C++. In particular it makes passing in and working with arrays, matrices and vectors pretty easy. You can check out the Armadillo docs here: http://arma.sourceforge.net/docs.html. This is where I go to look up functions to see how to use them or if they exist.

Although the base Rcpp package provides its own data structures (e.g., array, matrix, list) that can be passed easily between R and C++, the Armadillo data structures provided by the RcppArmadillo package are really nice and easier to use. Additionally, the Armadillo data structures are native C++ data structures, have multi-thread support and less unpredictable bad errors are encountered in practice. So I suggest using Armadillo data structures in most cases.

##Getting started

Before we start the first example, we install the two R package using


install.packages("Rcpp")
install.packages("RcppArmadillo")

in Rstudio command line.

We start by creating a new file in RStudio by clicking on the icon in the top left corner of the screen and select C++ File from the drop down menu.

A sample timesTwo* C++ program will pop up. We will now need to save that file somewhere. Once we have done so, we can click on the Source button and you will see that RStudio automatically calls Rcpp::sourceCpp() on the file.

This will compile the function and make it so we can access the C++ code from R - much easier than Calling C from R we taught before. The example can help you to check everything needed are correctly installed on your computer.

Some other things to note:

This statement include Rcpp header file in the program. The function is similar as library( ) in R. It also tells us that we can call any Rcpp constructs by their given name without the Rcpp:: prefix.


#include 
using namespace Rcpp;

We also put this statement before each function we want to make available to R:


//[[Rcpp::export]]

We can also define multiple C++ functions in the same file (not necessarily recommended unless some of them will be used by the main function), so we can put one in front of each one we want to make visible.

We can include R code blocks in C++ files processed with sourceCpp . The R code will be automatically run after the compilation.


/*** R
timesTwo(42)
*/

##First example We introduce our first example of calculating inner product of two vectors, and use data structure of Armadillo. The C++ code is:


#include 
// [[Rcpp::depends( RcppArmadillo)]]
using namespace arma;

// [[Rcpp::export]]
arma::mat inner(arma::vec x, arma::vec y){
  arma::mat ip=x.t()*y;
  return(ip);
}

/*** R
vecA = rnorm(100000)
*/

Note that we no longer use the:

#include

statement as in the example code provided by RStudio, but replace it with:


#include 
//[[Rcpp::depends(RcppArmadillo)]]

Also note that for all of the objects, we have to specify their type as we define the argument. This is a feature of C++ that is different from R where we just create objects without having to specify their type. Some basic C++ Armadillo objects we can pass in:

For decimal numbers like 1.2347 we need to use the double declaration, followed by the name of the argument (e.g.. my_double)
For integers (whole numbers) like 26 we use the int declaration, followed by the argument.
For numeric vectors, we use the arma::vec declaration, followed by the argument. This code should crash if you try to pass in anything other than a numeric vector (can contain doubles or integers). When you refer to the i th elements of a vector such as x , we use x(i) instead of x[i] as in R.
For numeric matrices, we use the arma::mat declaration, followed by the argument. Again, make sure it is just numbers in there.

We then write following R code and run it.


> library(Rcpp)
> library(RcppArmadillo)

> sourceCpp('test.cpp')
> vecA = rnorm(100000)

> inner(vecA,vecA)
         [,1]
[1,] 100317.2

##Parallelization in Rcpp via OpenMP

Now we make a small extension of the example and simply introduce using OpenMP to parallelize our C++ function. In order for the compiler to recognize OpenMP, we need to include the OpenMP header flag. Additionally, Rcpp has a plugin that will automatically add the OpenMP compiler flags. Add the following to the top of your C++ program.


#include 
// [[Rcpp::plugins(openmp)]]

We also have to set some compiler flags in R code. Add the following to your R code.


Sys.setenv("PKG_CXXFLAGS" = "-fopenmp")
Sys.setenv("PKG_LIBS" = "-fopenmp")

The example we are going to use is element-wised sum square of a vector. It is a special case of the inner product of two same vectors. In serial way, we can realize this in Rcpp like so:


// [[Rcpp::export]]
double sumsq_serial(vec x)
{
  double sum = 0.0;
  for (int i=0; i



The function  sq()  is simply squaring a number as

double sq(double x){
  return(x*x);
}


I do not put  // [[Rcpp::export]]  before this function because I will never call it from R. Calling such a function will slow down the performance of the code, comparing with  sum += x(i)*x(i); . I include a function here to ascertain that each thread can correctly recognizes the user defined function when doing parallel computing later.

And in parallel using OpenMP, we just add the basic compiler pragma:

// [[Rcpp::export]]
double sumsq_parallel(vec x, int ncores)
{
  double sum = 0.0;
  omp_set_num_threads(ncores);
  #pragma omp parallel for shared(x) reduction(+:sum)
  for (int i=0; i


Basically, we identify the loop in which we want to parallelize and add the following lines above it

omp_set_num_threads(int)
#pragma omp parallel for


The first line selects the number of cores to use.  This should not exceed the number of cores in your system.  If we do not write this line, all the cores available on your computer will be used. The only real trick here in this example is making use of the  shared()  and  reduction()  keyword, and it pretty much does what it looks like. We’re telling the compiler that while each iteration of the loop should be run in parallel, they share the same information from vector  x , and in the end we want the private (by thread) copies of the variable  sum  to be added up.  The code  cout <<  i <<  ", "omp_get_thread_num() <<  " of " <<  omp_get_num_threads() <<  endl;  is C++ standard way to do  print() . This code just tells which thread is currently working on the i th iteration, and the total number of thread are working together.

A small experiment is used to compare different methods proposed. A small set of micro-benchmarks in a variety of methods are conducted.

library(rbenchmark)
bench = benchmark(
		  vecA%*%vecA,
          inner(vecA,vecA),
          sumsq_parallel(vecA,1),
          sumsq_parallel(vecA,2),
          sumsq_parallel(vecA,3),
          sumsq_parallel(vecA,4), order = NULL, replications=1000)

print(bench[,1:4])

# test replications elapsed relative
# 1           vecA %*% vecA         1000   0.721    4.682
# 2       inner(vecA, vecA)         1000   0.229    1.487
# 3 sumsq_parallel(vecA, 1)         1000   0.319    2.071
# 4 sumsq_parallel(vecA, 2)         1000   0.203    1.318
# 5 sumsq_parallel(vecA, 3)         1000   0.154    1.000
# 6 sumsq_parallel(vecA, 4)         1000   0.157    1.019


We can see that C++ functions are much faster than R function  %*% . As more cores were used in parallel computing, the execution time becomes shorter. Because I only had 4 cores on my desktop, the maximal number of core I tested was 4. It seems requiring all the resources on your computer to do the work may slow down the overall performance.


##MCMC for GLMM


This tutorial session has only touched on a small part of Rcpp, giving you the basic tools to rewrite poorly performing R code in C++. With the basic knowledge of Rcpp, it will be much easier for you to understand the example of MCMC for generalized linear mixed model using Adaptive rejection Metropolis sampling (ARMS) I provide here . Three methods are compared. The first method is pure R code, using the  arms  function in R  Hi  package. The second method is writing log density functions in C++, but sampling process is still coded in R with modified  arms  function.  The third method is coding both log density functions and sampling process in Rcpp/C++. The pure R example takes 36.14 mins. The hybrid method takes 9.80 mins and the C++ code used only 1.17 mins.

The log density function coded in R:

logden.beta < - function(x, myu) {
  myu.long < - as.vector(mapply(rep, myu, nobs))
  temp < - as.vector(design.matrix %*% as.matrix(x, ncol=1)) + myu.long
  return(sum(y*temp - log(1+exp(temp))))
}

logden.u < - function(x, mybeta, myy, mycovariate, mytau) {
  temp <- mycovariate %*% as.matrix(mybeta, ncol=1) + x
  return(sum(myy * temp - log(1+exp(temp))) - 0.5*x^2*mytau)
}


The log density function coded in C++:


// [[Rcpp::export]]
double logden_beta(vec x, vec myu_long, mat design_matrix, vec y){
  vec temp = design_matrix*x + myu_long;
  vec one; one.ones(size(temp));
  vec logden = y%temp - log(one + exp(temp));
  return(sum(logden));
}

// [[Rcpp::export]]
double logden_u(double x, vec mybeta, vec myy, mat mycovariate, double mytau){
  vec x_long(size(myy)); x_long.fill(x);
  vec temp = mycovariate*mybeta + x_long;
  vec one; one.ones(size(temp));
  vec logden = myy%temp - log(one + exp(temp));
  return(sum(logden)-0.5*pow(x,2)*mytau);
}



##Common Pitfalls



  The number one pitfall I have made while working with C++ and R is forgetting that while R starts vector and matrix indexes from 1, C++ (along with pretty much every other programming language), starts indexes from zero, so you need to plan accordingly.
  In C++, we need to define the variable before use it while R does not require that.
  Don’t forget to write the semicolon after each line of the code in C++.



Resources


For additional information and advanced use, following resources provide a helpful introduction:


  Rcpp: Seamless R and C++ Integration
  Rcpp website
  Rcpp book
  Rcpp Quick Reference Guide
  High performance functions with Rcpp
  Introduction to Rcpp Attributes
  Gallery of examples
  Armadillo C++ linear algebra library
  Guide into OpenMP
  Introduction to MPI and OpenMP



PART - Parallelizing MCMC with Random Partition Trees
Kan Li — 2016-10-06T00:00:00+00:00
Background: 
In particular, conventional MCMC algorithms are computationally very expensive for large data sets. A promising approach to solve this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws are then aggregated via some combining rules to obtain the final approximation. This work is testing the method of parallelizing MCMC with Random Partition Trees (PART) proposed by Xiangyu Wang and David Dunson ( [paper] , [code] ) in a simulation setting of a multi-level item response theory (MLIRT) model .

Methods:
For the simulation study, I generate 100 datasets. Each dataset contain 1200 subjects with 7200 observations.  There are 23 parameters need to be estimated. The dataset is randomly split into 6 subsets. For each subset chains, I run the MLIRT model [code] for 1000 iterations after 3000 burn-in. Thinning by 1 results in 1000 samples.  The posteriors samples of each subset chain are aggregated by PART-KD. The results of PART are obtained with minimal cut length as 0.01, minimal fraction block as 0.1 and 10 trees. The model inference using 1200 subjects takes around 27 hours to finish, while each sub-chain (using 200 subjects) takes 2 to 3 hours to finish, and could be run in parallel. I repeat the process on all 100 datasets and obtain 100 copies of aggregated posterior each has 10000 samples.

Results: 
The simulation results is summarized in the blow table. Majority of the parameters have a small bias and the coverage probability is around 0.95. 
 


The posteriors are plot in figures. The grey lines are the density of 100 copies of aggregated posterior. The solid wide line is density of combine the 100 copies together. The vertical line is the true value of the parameter.



	 
	   	
	    
	    
	    	    
	    

	





In conclusion, the PART method could dramatically decreases the computation time for Bayesian inference of large data set and provides relative good estimations (after aggregating sub-chains) in a moderate complex model.






Classification of Alzheimer’s Disease patient base on gene expression data
Kan Li — 2016-08-10T00:00:00+00:00
The work investigates the sparse classification methods for high dimensional data. Here I provide a toy example to classify patient with Alzheimer’s Disease (AD) using gene expression data (select 500 gene expression). Two methods were adopted to carry out classification: sparse logistic regression and sparse proximal support vector machine (SVM). I assessed how well the methods can discriminate between patients based on classification accuracy, sensitivity and specificity. I conducted a 5-fold cross validation to avoid overestimation of the performance. All gene expression measures were scaled to zero mean and unit variance. The gglasso package is used. The optimal value of penalty parameter lambda was selected from a series of lambdas which gave the minimum cross validation error on training data set. The simulation results were presented in below tables.

 
 
 


The 500 gene expression gave a relative poor discriminative capability. The mean classification accuracy was around 0.62 via both methods. The sensitivity was surprising low, about 0.2, which indicated a poor capability of identifying AD patients. The high specificity and low sensitivity indicated that the model was inclined to classify patient as cognitive normal. The frequency of cognitive normal patient was 273/447 = 0.611, which was close to the mean accuracies we got from the models. The classification model can be viewed as simple guessing.

I then conducted functional principal component (FPC) analysis on the scaled 500 gene expression data. I selected the first 2 FPC scores which explained 95% of variance in the data. Hotelling’s t-test was conducted between case and control group on FPC scores. The test statistic is 1.49 with p-value 0.226, which indicates that there was on significant different in the gene expression sequence between cognitive normal patients and AD patients. This somehow explains the poor performance of the models in the first analysis.

In conclusion, using all 500 gene expression may not help to identify AD patient. Fitting the model with selected significant genes may improve the discriminative capability. The R code for this toy example is provided.




Beta Rectangular (BR) Regression
Kan Li — 2016-04-02T00:00:00+00:00
Background: CAMD of Critical Path Institute developed a quantitative model to describe the progression of cognitive changes in mild to moderate to test and optimize operating characteristics of trial designs for AD (via simulations based on the model). The model and the simulation tool got approval from FDA as “Fit for Purpose”. The original ADAS-cog progression model was based on Beta regression. However, as noted by Hahn (2008), Garcia et al. (2011), and Bayes et al. (2012), the Beta distribution considers neither the tail area events nor the outlying events, fails to represent excess variability and overoccurrence of tail-area events, which could limit its applications for modeling the proportional data (data in the open unit interval (0, 1) such as ADAS-cog score after being divided by 70). An alternative to the Beta distribution is the Beta rectangular (BR) distribution, which assigns more weight to extremal tail-area events and more probability to the outliers and extremal events. Specifically, the BR distribution has a shape parameter \theta  (between 0 and 1) controlling the thickness of the tails. When, \theta = 0, the BR distribution reduces to the Beta distribution. In general, when 0<\theta <1, the BR distribution has heavier tails than its Beta distribution counterpart. Please refer to Figure 2 in Wang and Luo (2016) for a visual comparison of Beta and BR distributions.

In the analysis, I compared of the Beta regression model and Beta-rectangular regression model in describing the longitudinal progression of the ADAS-cog in AD patients. I used the testing data set (with ADNI study and 49 meta studies) provided in the tool development group web page, combining with the Study 1013 from Coalition Against Major Diseases (CAMD) database required from Critical Path Institute.  The total observations were 6339 from 1005 subjects in 51 studies.

Results: The table below presents the mean of estimated posterior of parameters directly from BUGS models. The “Original Model” refers to the original Beta regression model. The “BR” refers to the Beta-rectangular model. The BUGS code for this Beta-rectangular model is provided [code]. 


 


The table indicated that the shape parameter \theta = 0.1184 > 0, and the deviance information criterion (DIC, a Bayesian model selection tool and smaller values indicate a better model fit) in favor of the Beta-rectangular regression model over the Beta regression model. These facts suggested that BR model provided a better model fit in describing longitudinal progression of the ADAS-cog and the existence of potential outliers and heavier tails than what the Beta distribution can model. It worth to include more well processed clinical trial data to do further analysis.



IEOR PhD 101
Kan Li — 2012-05-16T00:00:00+00:00

  注： 本文首发于一亩三分田数学统计IEOR版 细说IEOR PhD


在上一篇里我对打算在毕业之后找工作的IE master同学提了一些个人建议。其中许多建议和应该思考的问题都不只是限于IE专业的学生。希望拿到AD的同学及早的开始规划这两年的学习。同时感谢大家对上一篇的支持。

这里另起一篇要谈谈IE 的Phd都是做什么的。 内容对master的同学帮助不大，不过有时间了解了解也没坏处。


  Phd


IE起源于传统制造业，具体传统上IE是做什么的请自己wiki一下。随着制造业在美国的落寞，IE也变成了一个夕阳学科，或者说很成熟了， 所以如今的IE已经很少再做制造业方面的研究（比如生产排程），培养一些master 输入业界已经够用了。

作为有志向攻读Phd的青年，有必要了解一下如今的IEOR倾向的研究方向。

Operations Research/Management science是IE的核心。 就是应用 mathematical modeling，statistics， simulation 来为某个应用领域做决策的学科。其中大家听说的manufactory, supply chain, healthcare, revenue management, disaster respond等等方向都是OR/MS 作为依托的应用领域。有些系在某应用领域项目比较多，研究的老师比较多，带头人就愿意专门分出来，成立个组什么的。 但实质差别不大，只是学的应用领域的东西不同。OR是所有IE系都有的， 只是应用领域各自都有特点和侧重。OR方向我会之后详细讲解。 类似的还有商学院 operations management/decision science 专业，很多数学课都会来IE系来学， 但他们不要求特别高的数学理论， 而侧重商业分析。

剩下的方向都是小众，你只有在不多的学校找到相应的研究方向。我对每个方向和每个学校并不是特别清楚，所以大概介绍一下。 你可以去 IIE 网站 了解更多IE的东西。

Statistics，IE 也研究统计， 主要是研究制造业中 quality control， process improvement, reliability analysis 。代表学校ASU， 做集成电路生产制造， rutgers。

Stochastic/simulation,  研究仿真理论， 理论性比较强。 Northwestern 仿真界的牛校。

Human factor, 我只知道是设计产品更符合人体工效学，使人操作更顺手，降低疲劳积累速度，减少失误，从而提高工作效率。代表学校，VT， Wisconsin-M, OSU

Information system，这个方向在IE系成气候的并不多。代表学校，Wisconsin-M 有研究healthcare information system， USC

Manufactory， 这个现在主要研究制造技术（快速成型, 机械切割什么的），纳米材料，材料加工等。 和ME有很多重叠的。 一些老牌的IE系还保留着这方面的研究，比如PSU，PITT。 所以一些做材料或是ME的可以考虑申请这些IE系，竞争小些。

还有一些更小众的研究领域，像是 Energy， 比如 iowa 有研究电力系统的， UT – Austin 有研究石油领域的。具体干什么我不清楚。还有 complex system/social network , 比如 西北。

其实现在的研究越来越多的都是跨学科研究了，各个院系研究的领域很大因素取决于它周围的资源。 比如Cornell，Columbia 就比较多的是做 金融， risk management， revenue management； Phoenix由于有很多电子制造业还是西部distribution center， 所以ASU研究电子器材制造，质量控制，供应链比较突出； U Michigan， Wisconsin， Pitt 由于都有附属的大型医学中心， 所以研究偏向healthcare。具体各系老师做什么，你只有和该系的老师学生进一步交流才能知道了。很多学校网站上老师的介绍都是百年不更新，所以不要太依赖那个信息。另一种方法是把教授名字输入google scholar 中，年限定2009年以后，看看他这几年都发了什么文章，也可判断他最近在做什么。

这里我来具体讲一下Operations Research 一些情况。如果你对OR有兴趣，你就要先知道一下INFORMS 协会, 你可以找他们的网站，从那你能了解更多关于OR的东西。

OR的研究领域

现在OR的常见的研究领域是 logistics/supply chain, 还有Health care。 当然还有别的， Phd是个把问题越研究越专的东西，各方向下面越分越细，由于我做的是healthcare方向的东西，所以其他方向具体些研究什么问题，用什么方法，我就说不了那么详细了。

Logistics 也在走下坡，因为研究的已经差不多了。supply chain 研究的主要是inventory control，distribution network， coordination等， 很多学校都有。我只说自己觉的有一个研究趋势，是把商业供应的研究成果扩展到humanitarian and non-profit 供应链。 GIT 一直是supply chain 领域的领导者，当然他是IE学科的领导者，Northwestern 也有研究小组， 还有Berkley，法国 INSEAD 商学院都有教授做。这里我们可以看到，牛校总是走在前沿，这也与他们有一定经济实力拉的来项目有关。毕竟投钱做non-profit 事情的机构还是少。 这个领域主要研究疫苗或是救灾物资的储备，选址，灾难发生后的evacuation， last mile distribution，vehicle scheduling ，routing，GIS的应用；灾后debride 的处理， NGO之间的合作。 目标是以有限的物资最快最好的服务尽可能多的目标群体，控制成本，减少浪费， 同时要兼顾公平等人道主义问题。如果你对这方面感兴趣，可以关注一下上面几个学校相关小组的页面。

Health care 不能说是新兴的，但是也是很多学校逐渐涉猎的。很多学校都在招具有这方面研究经验的教师， 原因是做医疗有钱。 IE教授拉钱是很辛苦的，一般从NSF 申请到的一个科研项目三年只有30多万。但是healthcare 那边的从CDC和各种基金会拉到的项目经费都是按百万计算的，能养活一大批人。IE教授是拉不来这样项目的，一般都是由医学院或是公卫那边的PI牵头，IE这边给他们打工，不过也能分得挺大一杯粥了。这存在一个问题就是，发paper你永远名字写不到前面，因为你只是在做一种技术支持的感觉。当然你也可以把你技术支持部分的内容放大，独立发些文章。那IE 在health care 具体做什么？ 一般和医学院合作的话 研究therapy  decision making, medical decision making. 比如研究化疗应该什么时候开始，器官移植匹配度达到怎样就选择移植，用药每次要多大计量，每一疗程出现什么症状采取什么疗法 等等， 通过dynamic模型来预测优化治疗效果。 和医院合作的话，研究优化医院系统，提高医疗资源利用，减少浪费，OR（operation room） 的排程，healthcare quality, 主要运用process improvement， simulation 的东西比较多。 和公卫合作，主要是health policy 的制定。 像pandemic 的预测， influenza dynamic modeling，预测该用哪种vaccine 类型，物资储备、资源分配、人员调度等优化。大概就这些范围，如果有兴趣可以下载一些Michigan，Pitt，W-M， NCSU相关领域教授的文章看看introduction。

OR都学些什么

上面讲的都是OR现在热门的应用领域， 那OR都具体学什么？ 其实OR就是把上面那些情景中的问题抽象成数学模型，然后再解数学模型的学科。OR对数学要求很高，不但要你熟练掌握各种类型的模型，抽象实际问题的能力，还要具备解数学模型的理论推导能力，还有优化算法的能力。编程方面的要求就是能把你的数学公式转变成机器语言，调用solver， 或是自己写算法，让计算机求解。

OR 一般要解决的都是 optimization 的问题， 根据不同的问题，采用不同的两类模型，确定性模型 和概率模型。IEOR Phd修的课和master修的课差别很大，一些学校要求学生修过 real analysis。

确定性模型一般都要学习optimization 的课，一般都有 Linear optimization,  non- linear optimization, integer programing, network optimization， dynamic optimization等。像上面提到的supply chain 领域，如不牵扯需求预测， 多用确定性模型。

概率模型一般都要学  Probability theory，Statistics，Stochastic process, queuing theory, Markov decision process 等。像解决healthcare领域的问题，多会用到概率模型。

一般每一系列中前面几门都是必修的，靠后的课程是你根据你要研究的方向才选是多修确定性模型还是概率模型的课。当然每一系列还有一些更高级的课，通常是把上面课程中的某一章抽出来再往细将，一般比较理论。

另外一般还会学到statistics 和 simulation。 因为另一种解决optimization问题的方法是simulation。 一般在要研究的系统非常复杂的情景（如医院系统，某一地区的disaster respond），无法建立数学模型的情况采用。 Simulation 和 mathematical modeling 是很不一样的方法。 Simulation 其实是一种枚举的方法，没有最优解，只能说是所得结果中最好的。 常常要对上千次的仿真结果做统计分析，确定那个解。

作为OR的Phd 一般都需要你掌握的软件有，
CPLEX  (这个是解确定性模型的软件，不是开源的，你可以先学习另外一种软件和语言GLPK/GMPL， 这个是开源的，先了解一下是怎么回事)
MATLAB (很多时候用他来编算法，解问题，相比快捷)
C++ 得拿得出手啊， 很多数学运算软件都有C++API
Excel/VBA( excel 要用好)
Arena(商业仿真软件，实际研究中不太会用到，但是一般是IE  under的课，学会了可以找找有没TA的机会)

IEOR Phd 怎么申请

Phd 的申请不是很乐观，如果不是清华，上交，复旦，同济，西安交大，北大学IE，数学，物流，学习成绩处于上游的同学我的建议还是谨慎申请排名前15的phd.  因为一般ie系每年招生只在5-10个之间，除了GIT这种称谓IE school的，好的年份最多都不会超过15个。我们就按每个系50%招的是中国学生，名额也不够呀。你可以上学校网站上找在读phd，有些人会有主页或cv， 看看这些人的背景，和你自己比较比较。

另外一种方法就是曲线救国，先申master， 找教授参与项目，不要怕吃亏，没钱也白干，而且还要干的很积极。一般教授也不是瞎子，看你这么用功，他要有余钱或是ta机会也一般会帮你的。还有要向教授表明自己想读phd， 修phd的课，一方面看你是不是真喜欢这领域，另一方面验证你学phd的课不会吃力。如果你这两点做好了，考qualify 或是 重申phd 都是顺水推舟的事，考过qualify 一般都会给你钱的。一般qualify都是第一学年结束的夏天。当然还要注意，用这种方式的人越来越多，master 都海招了，大家都这样做你就不一定就能凸显出来。

IEOR Phd 干什么工作?

具体Phd 干什么工作由于我还没经历过找工作， 所以也谈不周详。
不过有这样几种选择， 去业界的话
航空公司做航班调度排程.
UPS、Fedex这样的物流公司做operations research analysis.
Consultant。 还是推荐 ZS
医疗机构，如果你研究的是healthcare 相关的

找教职的话， 一般排名10以前的去排名20-30的找教职，排名20以前的去排名30-40的找教职。当然你也不一定非要选择IE系教职。也可以考虑先去做几年post-do ， 或是去哪个研究机构历练两年，heathcare的比如Merck Research Laboratories 或者CDC。然后根据情况看你再回不回学校教书。

IEOR Phd 相对master 就要好找工作了，一般科研结构起薪平均在7万。 我也不知道问题究竟出在哪了，可能中国学生多半表现自己的欲望和技巧要差一些，找工作一般感觉要比其他国家的Phd逊色。一般OR Phd 的大部分是土耳其，伊朗，中国学生。Phd其实更注重口语的，所以练口语也是很重要的，建议和给master的一样。



IEOR 101
Kan Li — 2012-05-13T00:00:00+00:00

  注： 本文首发于一亩三分田数学统计IEOR版 细说IEOR


偶然从Warald 的blog看到这个论坛和这个版块， 发现关于IEOR的帖子很少，多半都是提问贴，而且回答也多少有失准确性。所以想着可以结合自己的经历和所见所闻给大家讲讲IEOR的东西，希望对后来的申请者和学弟学妹们有所帮助。我所发现的这个问题其实是所有留学论坛的通病，参与发帖与互动的基本都是还未踏上美洲大陆的申请者，然而真正的过来人，如果不是当年的版主或是什么，基本很少会再回来。所以，如果一亩三分田没有相关的什么规定，我倒是不介意你转载到其他论坛上去让更多的人有所收获。能标上出处也好。

由于一路工科走过来，文笔上大家就不要强求了。 而且我只能抽空写写，想到哪里写哪里，留言提问可能也没时间回答，所以抱歉抱歉。当然，我也不能说我在这学科有多资深，除了我的领域，很多地方也了解的不全面或是压根不了解（比如human factor 方向, 印象中有这个方向的学校不多，比如VT， Wisconsin- Madison），也可能含个人见解和局限性，如果你有不同意见，也很乐意交流。

由于Master 和 Phd 基本是两条不同的道路，所以我也分开来侧重介绍。


  Master


Master 主要的培养方向是在业界找工作，录取条件相对较低，所以很多申请Master的一般都能有不错的AD。 我在申请方面的建议就是，如果你觉得你条件还行，GPA，GRE够高，手头有钱，大胆！胆大！

对于master来说， 如果你不是IE学院下特殊项目， 比如finance engineering (Columbia, Cornell) ，并不明确的分什么方向， 大家都修一样的必修课，一般是基础的probability，statistics，manufactory system，operations research然后就是根据你想做的工作选择其他一些选修课，比如去做supply chain (facility layout，inventory control，supply chain)， 去manufactory做质量管理(total quality management，6 sigma)，去医院做health care方面 (个别学校有这个条件) health care management， health care quality，  还有一些通识类比如，decision science，simulation，project management， data base 等等， 都是很基础的。

由于IE是个小系，很多排名10名以后的IE系根本就没能力开那么多master的课让你选，所以你基本不得不把系里开的master的课都得上过一边，别无选择。当然很多学校允许你到外系去修一两门课， 这样我建议可以去商学院选选，还有就是到计算机或是信息科学系学学data mining， database 之类的。一般修够30个学分（10门课）就能毕业。

IE 的master相对是比较好读的，如果你家底殷实，主要目的是体验美国生活，顺便捞个文凭，那IE是个好选择，去名校的概率也大。如果你觉得五六十万对你家来说还是挺吃力的，之后想在美国找个工作回报父母呀，我建议你还是不要考虑IE了，因为IE真心不好找工作。 IE看着是个万精油的专业，可万精油也意味着啥都不如人家专业的。我知道有些人会想我工科背景，学些management的东西都好呀，可等你找工作的时候，你会感觉到这两年怎么啥都没学到，management（还有商学院的finance和marketing）对于中国人，特别是没有工作经验的这群学生来说真的就是鬼扯呢。你会发现你竞争不过美国硕士，美国本科生，印度同学（基本都是学EE，ME的），还有国内读过IE本科的同胞。 当然有些人是找到工作了，一种原因是他有比较多cs背景，每天做的编程的工作；一种原因是他有一些名企的相关实习或几年的工作经历；一种原因他是“人家家孩子”，他做什么就是比你强。

好，如果你觉得自己是一个别无长处的IE master学生, 这里我有些建议告诉你该怎么做（其他专业也可借鉴）。如果你拿到了某个学校的master并且准备去了，就去网页上找系里面中国学长学姐的联系方式（最好是PhD）。 我提的这个建议可能会被这些学长学姐打的，因为就我本身来说也是不喜欢有人打扰的。所以你发一次没回就不要再发了。不回的意思就是我没工夫搭理你，我啥时候心情好了再考虑回不回你。第一封用英文，因为学校邮箱很多显示不了中文。信里要有礼貌，别一上来就让人感觉你有所求。信里可以了介绍一下你，问其他可以用中文的联系方式。之后的联系中问些生活方面的信息，然后了解一下哪些课需要TA，grader， 哪些课的TA经常大家不愿做； 至于哪个老板做什么，可能不可能要学生，我觉得这种问题就不要问了，因为对你来说要RA渠道获得钱的可能性几乎没有。  知道哪些课需要TA或是grader，先看看自己有没修过这些课，修过太好，没修过赶紧看看。

当你踏上美洲大陆的那一刻，你就要试着让自己变成一个外向的人。对人热情，有问题就问，放开，不要扭扭捏捏，没啥不好意思的。 及早的去学院报到，找小秘聊，聊学校提供得能为我所用得各种资源（计算机，图书馆，就业指导中心，健身房乱七八糟的）。找系主任聊、透漏你想做grader，TA的意思，聊你想找工作的打算。找管研究生教学的教授聊，聊选课，透漏你要TA，grader的意思。要不要的上不重要，重要的是让他们知道有你这么个人认存在。而且我一直觉得在国外机会都是自己争取的，中国学生很多都怕这个怕那个，不知道在怕什么，这一点就输给印度学生了。脸皮要厚，事情成不成一回事，咱先问一下也损失不了个咱啥。找工作也一样。

如果你的志向是继续Phd，你的选课和所要做的事情是不一样的， 这个我之后会说。

读master应该做什么？

对于致力于毕业找工作的同学，你在美国学好课的同时，最重要的是交际。 这里的交际不是你跟中国狐朋狗友开party！ 而是广泛的认识人，美国人，其他国家的人，练口语，练到你能清楚地表达你的想法让别人听懂，练到你做presentation自如不怯场，练到你听得明白别人的问题，并能很快反应出来用英语回答。至于口语音调美不美都是次要的。

这里最直接的资源就是教会，我不是个教徒，也没怎么去过，不过我认识的人都从这段经历中收获很多。有的美国教会会举办很多活动邀请国际学生参加，意在让你感受上帝的仁爱。每周会组织学习学习圣经啊，讨论讨论其他文化还有新闻问题呀，做些游戏什么的。节日里便会邀请你去hosting family一起过节。在这个过程中你就练了口语，了解了美国文化，至于你信不信教一般是没人强迫你的。如果你排斥这种方法，那也只有多和本班的美国同学交朋友了，做课程项目的时候多和他们一组，虽然和美国人一组效率很低，但是他们有他们的资源。中国学生许多因为觉得自己口语不行，文化差异大而选择中国人一活做课程项目，这是毫无意义的。要想成功，就要更多的暴露自己的弱点。

除了好的GPA 和 口语能力外， 对行业的了解和面试技巧都是要及早准备的。一般IE master 项目是1年半到两年，由于只是休学分，有些也可以暑假修课，一年毕业，不过我不推荐。由于短学制，所以你一来的那个fall就应该开始准备材料CV之类去找实习，去学校的就业指导中心改CV， cover letter， 参加模拟面试之类的。 在秋季的校园招聘中去了解来的都是哪些企业，当地企业，他们要什么样得人，有没有实习机会。 另一个重要资源是  学校 的 co-op  program（co-op 是什么自己查），这个和你专业直接对口，干的好了以后可能直接留到那公司。coop一般是面向本科生的，不过还是我那句话，机会都是要出来的，问啊！问啊！长嘴光出气那！美国没有说什么规定是死的。如果你能在来年春季或是夏季做个实习的话，之后找工作就会顺很多。所以你要想找工作的话第一年就按我上面说的做。

其实我不提倡找工作的再去操心哪个小组招不招做研究的。这些只是个人想法。因为IE的科研研究经历和项目对你找工作基本没价值，另外你第一年能找到给你钱的老板的机会不大，因为项目funding的支出和分配，哪一部分干什么干什么早就是定好的。但是这也不妨碍多去了解各个老板都是做什么的。哪个我比较感兴趣就去聊聊，万一找不到实习什么的还可能有条路，就是看他暑假要不要人，打个工。或者他带什么课要不要TA之类的。所以和偏应用研究的老师处好关系也是有益无害的。

IE Master 找什么样的工作？

IE是个万精油，所以你如果觉的你哪个方面有所长，能和人家专业的拼拼，那你可以去试一下。 有人说我学OR找什么样的工作，答案是没有你能做的工作。这里不是说学OR找不到工作，而是你要具体到什么应用领域。真正设置专门做OR的单位很少，而且可以说要的都是Phd，招人的一般是国防部门（你没戏），带研究中心的大型医疗机构，大咨询公司。至于master，Consultant喜欢要IE的，但是进去很难，他们一般喜欢有美国本科学历的，而且这是个跟人打交道的行业，所以我上面交代的准备工作没做好的话，你也不要对他报什么希望，参考公司ZS associate。 传统的制造业里，如果你本科不是IE或是没相关实习工作经验的比较难。做supply chain 的一般是商学院的会和你竞争，比没优势，你可以选择偏向做modeling的职位，比如Fedex 有operations research analysis I 这样的职位，做供应链的建模仿真，所以这就要求你学好线性规划，network，库存管理，simulation还有相关解线性规划问题的软件和仿真软件，对计算机编程要求高。另外一行能找到工作的是做质量管理，这就要凸现你统计的优势，上好TQM，6sigma，考个green belt，熟练应用其中process improvement的方法和模型，用好minitab。找到这种职位工作的几率比较大。 如果你对这些没兴趣，其他我真心觉的你应该去学计算机。

其实我觉的在美国找工作，不分专业，第一就是口语，第二就是和系主任拉好关系，借别人的关系网，自己去多问多争取，机会不会自己来敲门。