We propose, numerically analyze, and experimentally demonstrate a low-complexity, modulation-order-independent, non-data-aided (NDA), feed-forward carrier phase recovery (CPR) algorithm. The algorithm enables synchronous decoding of arbitrary square quadrature amplitude modulation (QAM) constellations and is suitable for a realistic hardware implementation based on block-wise parallel processing. The method is based on principal component analysis (PCA). At low signal-to-noise ratio (SNR) values it outperforms the well-known and widely used blind phase search (BPS) algorithm, showing a much lower cycle slip rate (CSR) both numerically and experimentally. For operation at higher SNR values, we also propose a hybrid two-stage scheme that combines the PCA-based method with BPS, and we benchmark the performance of both the simple and hybrid schemes against two-stage BPS (2S-BPS). The complexity of the simple and hybrid methods is evaluated against that of 2S-BPS, and computational complexity savings of 92% and 40%, respectively, are expected.