To overcome variability, estimate scene characteristics, and compress sensory input, perceptual systems pool data into statistical summaries. Despite growing evidence for statistical representations in perception, the underlying mechanisms remain poorly understood. One example of such representations occurs in auditory scenes, where background texture appears to be represented with time-averaged sound statistics. We probed the averaging mechanism using “texture steps,” textures containing subtle shifts in stimulus statistics. Although generally imperceptible, steps occurring in the previous several seconds biased texture judgments, indicative of a multi-second averaging window. Listeners seemed unable to willfully extend or restrict this window but showed signatures of longer integration times for temporally variable textures. In all cases the measured timescales were substantially longer than previously reported integration times in the auditory system. Integration also showed signs of being restricted to sound elements attributed to a common source. The results suggest an integration process that depends on stimulus characteristics, integrating over longer extents when it benefits statistical estimation of variable signals and selectively integrating stimulus components likely to have a common cause in the world. Our methodology could be naturally extended to examine statistical representations of other types of sensory signals.

Sound texture perception is thought to be mediated by time-averaged sound statistics. McWalter and McDermott use texture “steps” to reveal an obligatory multi-second averaging process whose extent depends on texture variability. Averaging excludes other concurrent sounds, implicating texture perception as inseparable from auditory scene analysis.