The perceptual organization of two-tone sequences into auditory streams was investigated using a modeling framework consisting of an auditory pre-processing front end [Dau et al., J. Acoust. Soc. Am. 102, 2892–2905 (1997)] combined with a temporal coherence-analysis back end [Elhilali et al., Neuron 61, 317–329 (2009)]. Two experimental paradigms were considered: (i) stream segregation as a function of tone repetition time (TRT) and frequency separation (Δf) and (ii) grouping of distant spectral components based on onset/offset synchrony. The simulated and experimental results of the present study supported the hypothesis that forward masking enhances the ability to perceptually segregate spectrally close tone sequences. Furthermore, the modeling suggested that effects of neural adaptation and processing through modulation-frequency selective filters may enhance the sensitivity to onset asynchrony of spectral components, facilitating the listeners' ability to segregate temporally overlapping sounds into separate auditory objects. Overall, the modeling framework may be useful to study the contributions of bottom-up auditory features to "primitive" grouping, also in more complex acoustic scenarios than those considered here.