Myth: A factorial experiment is essentially an RCT with a lot of experimental conditions, and therefore is extremely difficult to power.

The RCT and the factorial experiment have very different logical underpinnings.

In an RCT the primary objective is direct comparison of a small number of experimental conditions, whereas in a factorial experiment, the primary objective is estimation of main effects and interactions. These estimates are obtained by combining experimental conditions in a principled way by means of factorial analysis of variance (ANOVA). In fact, individual experimental conditions of a factorial experiment are NEVER directly compared in a factorial ANOVA (which can be very counterintuitive for those trained in RCTs).

This difference in the underlying logic extends to how RCTs and factorial experiments are powered. An RCT with a small number of subjects per experimental condition is unlikely to have sufficient statistical power. In contrast, a factorial experiment with a small number of subjects per condition may have excellent statistical power. Why? Power for estimation of main effects and interactions in factorial ANOVA is based on comparison of combinations of experimental conditions, not direct comparison of individual conditions. So the number of subjects in each individual condition does not matter; what matters for power is the total sample size per level of a factor, across all experimental conditions.
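As a rough illustration of the per-level logic (not drawn from the sources cited here; the effect size d = 0.5, the normal-approximation power formula, and the helper names are illustrative assumptions), compare the power of a main-effect test in a small-cell factorial experiment with the power of a direct comparison of two of its conditions:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_group_power(d, n_per_group, z=1.96):
    # Normal-approximation power for a two-sided two-sample comparison
    # of means at alpha = .05, standardized effect size d.
    ncp = d * sqrt(n_per_group / 2)
    return norm_cdf(ncp - z) + norm_cdf(-ncp - z)

# A 2^3 factorial with only 16 subjects per condition has 128 subjects
# in total, so each main effect compares 64 vs. 64 subjects:
p_main_effect = two_group_power(0.5, 64)     # roughly .81

# Directly comparing two individual conditions (RCT logic) would
# pit only 16 vs. 16 subjects:
p_two_conditions = two_group_power(0.5, 16)  # roughly .29
```

The same 16 subjects per condition yield good power for main effects but poor power for pairwise condition comparisons, which is exactly why the two designs must be powered differently.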

Read an informal introduction to factorial experiments aimed at those with a background in the RCT. For more, read Collins, Dziak, and Li (2009) and Chapters 3 and 4 in Collins (2018).

Myth: Factorial experimental designs require larger numbers of subjects than available alternative designs.

When used to address suitable research questions, balanced factorial experimental designs often require many fewer subjects than alternative designs. For a brief explanation, see Collins et al. (2014); for a more extensive explanation, see Collins, Dziak, and Li (2009) and Collins (2018).

For example, Collins et al. (2011) wanted to use a factorial experiment to examine six components under consideration for inclusion in a clinic-based smoking cessation intervention. They found that whereas conducting individual experiments on each of the components would have required over 3,000 subjects, with a factorial design they would have sufficient power with about 500 subjects. In other words, conducting a factorial experiment rather than six individual experiments meant that they needed about 2,500 fewer subjects.

Myth: If you want to add a factor to a balanced factorial experiment, you will have to increase the number of subjects dramatically to maintain power.

If the factor to be added has an expected effect size no smaller than that of the factor with the smallest effect size that is already in the experiment, power will be about the same without any increase in the number of subjects.

If the factor to be added has a smaller anticipated effect size than those upon which the power analysis was previously based, it will be necessary to increase the sample size accordingly to maintain power. However, unless the anticipated effect size of the new effect is considerably smaller, the required increase will be modest. For more about this, see Collins et al. (2014), Collins, Dziak, and Li (2009), and Collins (2018).
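As a back-of-the-envelope sketch of why (normal approximation, two-sided alpha = .05, 80% power; the effect sizes 0.5 and 0.45 are illustrative assumptions, not values from the sources cited), the total N required for a main effect depends only on its effect size, not on how many factors share those subjects:

```python
from math import ceil

def total_n_for_main_effect(d, z_alpha=1.96, z_beta=0.84):
    # Normal-approximation total N (two-sided alpha = .05, 80% power).
    # Each main effect compares N/2 vs. N/2 subjects, so only d and N
    # matter -- the number of factors sharing those N subjects does not.
    n_per_level = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return 2 * ceil(n_per_level)

# Adding a factor with the same expected effect size: no change needed.
n_old = total_n_for_main_effect(0.5)    # 126
# A somewhat smaller effect size for the new factor: a modest increase.
n_new = total_n_for_main_effect(0.45)   # 156
```

Doubling the number of experimental conditions by adding the factor leaves these totals unchanged; only a drop in the smallest anticipated effect size drives N up.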

The power of a factorial experiment depends on the overall sample size per level of each factor, not the number of experimental conditions or the number of subjects in each condition (except to the extent that these impact overall per-level sample size). Scientists whose backgrounds are primarily in designs like the RCT often find this counterintuitive.

Read an informal introduction to factorial experiments aimed at those with a background in the RCT.

Myth: The only reason to conduct a factorial experiment is to test for interactions between factors.

Even if it were somehow known with certainty that there were no interactions between factors, a factorial experiment might still be attractive if it required fewer research subjects than the alternatives being considered.

In fact, in some ways not expecting any interactions is an ideal scenario for the use of factorial designs, because it provides a great justification for the use of extremely efficient fractional factorial designs. (A brief introduction to fractional factorial designs can be found in Collins, Dziak, & Li, 2009; and Chapter 5 of Collins, 2018.)
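As an illustrative NumPy sketch (the generator E = ABCD is a standard textbook choice, not taken from the sources above), a half fraction can be built by aliasing a fifth factor with the highest-order interaction of a full design:

```python
import numpy as np
from itertools import product

# Full 2^4 design in effect (-1, +1) coding: 16 runs for factors A-D.
runs = np.array(list(product([-1, 1], repeat=4)))
A, B, C, D = runs.T

# The generator E = ABCD turns this into a 2^(5-1) half fraction:
# five factors studied in 16 runs instead of the 32 a full 2^5 needs.
E = A * B * C * D
design = np.column_stack([A, B, C, D, E])

# The main-effect columns remain balanced and mutually orthogonal; the
# price of the savings is that E's main effect is aliased with the
# ABCD interaction -- harmless if higher-order interactions are absent.
```

When no interactions are expected, this aliasing costs nothing, which is why the no-interaction scenario is so favorable for fractional designs.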

Myth: There is always less statistical power for interactions than for main effects in a factorial ANOVA. Power decreases as the order of the interaction increases.

When effect coding is used in a 2^k design, statistical power is the same for all regression coefficients of the same size (exactly so assuming equal ns across experimental conditions, and approximately so otherwise), whether they correspond to main effects or interactions, and irrespective of the order of the interaction.

Of course the effect sizes for interactions may be smaller than those for the main effects in a given study, and the effect sizes for higher-order interactions may be smaller than those for lower-order interactions. (This is consistent with the sparsity, or Pareto, principle in engineering.) If that is the case, then the power will be lower for the smaller effect sizes. But the lower power is due to the smaller effect size, not to anything inherent about interactions or the use of a factorial design.

Note that the regression coefficient is not the only way to express the effect size of an interaction. This is explained in Chapter 4 of Collins (2018).
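A quick way to see the equal-power property is to build the full effect-coded model matrix for a small design and check its orthogonality. This NumPy sketch (illustrative; one observation per condition for simplicity) uses a 2^3 design:

```python
import numpy as np
from itertools import product, combinations

# 2^3 full factorial in effect coding: 8 runs, factors A, B, C.
runs = np.array(list(product([-1, 1], repeat=3)))

# Model matrix with intercept, all main effects, and all interactions.
cols, names = [np.ones(8)], ["intercept"]
for k in (1, 2, 3):
    for idx in combinations(range(3), k):
        cols.append(runs[:, list(idx)].prod(axis=1))
        names.append("*".join("ABC"[i] for i in idx))
X = np.column_stack(cols)

# X'X = 8I, so every regression coefficient -- main effect, two-way,
# or three-way interaction -- has the same variance sigma^2 / 8 and
# therefore the same power for a given coefficient size.
```

Because the columns are orthogonal with equal norms, nothing about an interaction's order changes its coefficient's standard error; only the size of the coefficient itself matters.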

Myth: Any interaction between factors necessarily makes interpretation of main effects impossible.

It is always important to consider interactions thoughtfully when interpreting main effects. However, when effect coding is used in a balanced factorial experiment, the main effects are interpretable whether or not there are interactions.

We recommend use of effect (-1,1) coding for component selection experiments in MOST. When effect coding is used and there are equal ns per condition, main effects and interactions are uncorrelated. This makes main effects more readily interpretable.

When dummy (0,1) coding is used many of the effects being tested are highly correlated. This can lead to interpretational difficulties.
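A small NumPy check (illustrative; a 2x2 design with one observation per condition) makes the contrast between the two coding schemes concrete:

```python
import numpy as np
from itertools import product

# 2x2 design, one observation per condition.
ab = np.array(list(product([0, 1], repeat=2)))   # dummy (0, 1) codes
a_d, b_d = ab.T
dummy = np.column_stack([a_d, b_d, a_d * b_d])

eff = 2 * ab - 1                                  # effect (-1, +1) codes
effect = np.column_stack([eff[:, 0], eff[:, 1], eff[:, 0] * eff[:, 1]])

# Dummy coding: the interaction column is strongly correlated with
# each main-effect column (r is about .58 here).
r_dummy = np.corrcoef(dummy, rowvar=False)

# Effect coding: all three columns are mutually uncorrelated.
r_effect = np.corrcoef(effect, rowvar=False)
```

The uncorrelated effect-coded columns are what let main effects be estimated and interpreted separately from interactions in a balanced experiment.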

Dummy coding and effect coding produce estimates of different effects, and thus the ANOVA results must be interpreted differently. For a detailed explanation, see Kugler et al. (2018).


Collins, L. M. (2018). Optimization of behavioral, biobehavioral, and biomedical interventions: The multiphase optimization strategy (MOST). New York: Springer.

Collins, L. M., Baker, T. B., Mermelstein, R. J., Piper, M. E., Jorenby, D. E., Smith, S. S., Schlam, T. R., Cook, J. W., & Fiore, M. C. (2011). The multiphase optimization strategy for engineering effective tobacco use interventions. Annals of Behavioral Medicine, 41, 208-226.

Collins, L. M., Dziak, J. J., Kugler, K. C., & Trail, J. B. (2014). Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine, 47, 498-504.

Collins, L. M., Dziak, J. J., & Li, R. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14, 202-224.

Kugler, K. C., Dziak, J. J., & Trail, J. B. (2018). Coding and interpretation of effects in analysis of data from a factorial experiment. In Collins, L. M., & Kugler, K. C. (Eds.), Optimization of behavioral, biobehavioral, and biomedical interventions: Advanced topics. New York: Springer.