B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yücesan, eds.

STATISTICAL ANALYSIS OF SIMULATION OUTPUT DATA:

THE PRACTICAL STATE OF THE ART

Averill M. Law

Averill M. Law & Associates

4729 East Sunrise Drive, #462

Tucson, AZ 85718, USA

ABSTRACT

One of the most important but neglected aspects of a simulation study is the proper design and analysis of simulation experiments. In this tutorial we give a state-of-the-art presentation of what the practitioner really needs to know to be successful. We will discuss how to choose the simulation run length, the warmup-period duration (if any), and the required number of model replications (each using different random numbers). The talk concludes with a discussion of three critical pitfalls in simulation output-data analysis.

1 INTRODUCTION

of fact, a very common mode of operation is to make a single simulation run of somewhat arbitrary length

samples from probability distributions are typically used to drive a simulation model through time, these estimates are just particular realizations of random variables that may have large variances. As a result, these estimates could, in a particular simulation run, differ greatly from the corresponding true characteristics for the model. The net effect is, of course, that there could be a significant probability of making erroneous inferences about the system under study.
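The variability of single-run estimates can be seen in a small experiment. The sketch below is an illustration, not part of the original tutorial. It assumes an M/M/1 queue (exponential interarrival and service times) with arrival rate 0.9 and service rate 1.0, starts each run with an empty system, and generates delays in queue with Lindley's recursion. The function name `average_delay`, the run length of 500 customers, and the seeds are hypothetical choices made only for the illustration.

```python
import random

def average_delay(seed, num_customers=500, arrival_rate=0.9, service_rate=1.0):
    """Average delay in queue of the first num_customers customers in one run.

    Assumption: M/M/1 queue that starts empty at time zero.
    """
    rng = random.Random(seed)      # a different seed gives different random numbers
    delay = 0.0                    # the first customer finds the system empty
    total = 0.0
    for _ in range(num_customers):
        total += delay
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Lindley's recursion: D_{i+1} = max(0, D_i + S_i - A_{i+1})
        delay = max(0.0, delay + service - interarrival)
    return total / num_customers

# Ten single runs of the same model, differing only in the random numbers used.
estimates = [average_delay(seed) for seed in range(10)]
print("single-run estimates:", [round(e, 2) for e in estimates])
print("smallest:", round(min(estimates), 2), "largest:", round(max(estimates), 2))
```

Each printed value is what an analyst would report after one run of this length. The spread between the smallest and largest values gives a feel for how far a single estimate may lie from the true model characteristic.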

We now describe more precisely the random nature of simulation output. Let $Y_1, Y_2, \ldots$ be an output stochastic process [see, for example, Section 4.3 in Law (2007)] from a single simulation run. For example, $Y_i$ might be the delay in queue for the $i$th job to arrive at a single-server queueing system. Alternatively, $Y_i$ might be the total cost of operating an inventory system in the $i$th month. The $Y_i$'s are random variables that will not, in general, be independent or identically distributed (IID). Thus, many of the formulas from classical statistics (see Section 2) will not be directly applicable to the analysis of simulation output data.
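Both departures from the IID assumption can be checked numerically. The following is a minimal sketch under stated assumptions, not the author's procedure: the queue is taken to be M/M/1 with arrival rate 0.9 and service rate 1.0, it starts empty, and the helper names `delays_in_queue` and `lag1_correlation` are hypothetical. The first check estimates the correlation between successive delays within one run. The second compares the average delay of an early customer with that of a later customer, averaging over many runs that use different random numbers.

```python
import random
import statistics

def delays_in_queue(seed, m, arrival_rate=0.9, service_rate=1.0):
    """Delays in queue Y_1, ..., Y_m from one run (assumed M/M/1, starts empty)."""
    rng = random.Random(seed)
    delays = []
    d = 0.0                                   # Y_1 = 0 for an empty system
    for _ in range(m):
        delays.append(d)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        d = max(0.0, d + service - interarrival)   # Lindley's recursion
    return delays

def lag1_correlation(ys):
    """Sample correlation between successive observations Y_i and Y_{i+1}."""
    mean = statistics.fmean(ys)
    num = sum((a - mean) * (b - mean) for a, b in zip(ys, ys[1:]))
    den = sum((y - mean) ** 2 for y in ys)
    return num / den

# Not independent: successive delays within one run are positively correlated.
one_run = delays_in_queue(seed=1, m=2000)
rho1 = lag1_correlation(one_run)
print("lag-1 correlation within a run:", round(rho1, 3))

# Not identically distributed: the mean of Y_i depends on i when starting empty.
runs = [delays_in_queue(seed, m=200) for seed in range(500)]
mean_early = statistics.fmean(r[1] for r in runs)     # 2nd customer
mean_late = statistics.fmean(r[199] for r in runs)    # 200th customer
print("mean delay of customer 2:", round(mean_early, 2))
print("mean delay of customer 200:", round(mean_late, 2))
```

A lag-1 correlation well above zero and a clear gap between the two means are the symptoms that make classical IID formulas unsuitable for raw within-run output.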

978-1-4244-9864-2/10/$26.00 ©2010 IEEE

Example 1. For the queueing system mentioned above, the delays in queue will not be independent, since a large delay for one customer waiting in queue will tend to be followed by a large delay for the next customer waiting in queue. Suppose that the simulation is started at time zero with no customers in the system, as is usually the case. Then the delays in queue at the beginning of the simulation will tend to be smaller than later delays and, thus, the delays are not identically distributed. Let $y_{11}, y_{12}, \ldots, y_{1m}$ be a realization of the random variables $Y_1, Y_2, \ldots, Y_m$ resulting from running the simulation with a particular set of random numbers $u_{11}, u_{12}, \ldots$. If we run the simulation with a different set of random numbers $u_{21}, u_{22}, \ldots$, then we will obtain a different realization $y_{21}, y_{22}, \ldots, y_{2m}$ of the random variables $Y_1, Y_2, \ldots, Y_m$. (The two realizations are not the same since the different random numbers used in the two runs produce different samples from the input probability distributions.) In general, suppose that we make $n$ independent replications (runs) of the simulation (i.e., different random numbers are used for each replication, each replication uses the same initial conditions, and the statistical counters for the simulation are reset at the beginning of each replication) each of length $m$, resulting in the observations:...