This paper presents a systematic method for optimally lumping a large number of components so as to minimize the loss of information. In principle, a rigorous composition-based model is preferable for describing a system accurately. However, computational cost and numerical issues restrict the use of such models in process modeling, simulation, and design. A pseudo-component approach, which lumps the many components of a system into a much smaller number of hypothetical groups, reduces the dimensionality at the cost of losing information. Moreover, the lumping scheme is commonly determined by empirical and heuristic approaches. Given an objective function defined with a linear weighting rule, the optimal lumping problem is formulated as a mixed-integer nonlinear programming (MINLP) problem in both discrete and continuous settings. A reformulation of the original problem is also presented, which significantly reduces the number of independent variables. An application to a system with 144 components demonstrates that the optimal lumping problem can be solved efficiently with a stochastic optimization method, the Tabu Search (TS) algorithm. The case study also reveals that the discrete formulation is preferable because its search space is smaller than that of the continuous formulation.
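To make the discrete setting concrete, the following is a minimal sketch of a Tabu Search over component-to-lump assignments. It is an illustration under stated assumptions, not the formulation used in the paper: the objective (`lumping_loss`, a fraction-weighted squared deviation of a single scalar property from its lump average), the move definition (reassigning one component), and all names and parameters (`z`, `p`, `n_lumps`, `tenure`, `n_candidates`) are hypothetical choices made only for this example. The one element taken from the text is the linear weighting rule, interpreted here as a fraction-weighted mean of the member properties.

```python
import random
from collections import deque


def lumping_loss(assign, z, p, n_lumps):
    """Assumed objective: z-weighted squared deviation from the lump average."""
    weight = [0.0] * n_lumps
    moment = [0.0] * n_lumps
    for a, zi, pi in zip(assign, z, p):
        weight[a] += zi
        moment[a] += zi * pi
    # Linear weighting rule (as assumed here): the lump property is the
    # z-weighted mean of its members; empty lumps contribute nothing.
    lump_p = [m / w if w > 0 else 0.0 for m, w in zip(moment, weight)]
    return sum(zi * (pi - lump_p[a]) ** 2 for a, zi, pi in zip(assign, z, p))


def tabu_search(z, p, n_lumps, n_iter=200, n_candidates=30, tenure=5, seed=0):
    """Search integer assignments (component -> lump) with a short tabu list."""
    rng = random.Random(seed)
    n = len(z)
    current = [i % n_lumps for i in range(n)]  # arbitrary starting scheme
    best = current[:]
    best_loss = lumping_loss(best, z, p, n_lumps)
    tabu = deque(maxlen=tenure)  # indices of recently moved components

    for _ in range(n_iter):
        move, move_loss = None, float("inf")
        # Sample a random neighborhood: move one component to another lump.
        for _ in range(n_candidates):
            i = rng.randrange(n)
            k = rng.randrange(n_lumps)
            if k == current[i]:
                continue
            trial = current[:]
            trial[i] = k
            loss = lumping_loss(trial, z, p, n_lumps)
            # Aspiration: a tabu move is accepted only if it beats the best so far.
            if i in tabu and loss >= best_loss:
                continue
            if loss < move_loss:
                move, move_loss = (i, k), loss
        if move is None:
            continue
        # Take the best sampled move even if it worsens the current solution,
        # which is what lets the search leave local minima.
        i, k = move
        current[i] = k
        tabu.append(i)
        if move_loss < best_loss:
            best, best_loss = current[:], move_loss
    return best, best_loss
```

Calling `tabu_search(z, p, n_lumps=3)` with a list of mole fractions `z` and a matching list of property values `p` returns the best assignment found and its loss. Because each component carries one integer label, the search space in this sketch is finite, which loosely mirrors the abstract's argument for the discrete formulation; a continuous variant would instead search over real-valued membership weights.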