When multiple cores share data, they must synchronize access to it, as it might otherwise become corrupt. Although this is also a problem on single-core systems with multiple processes and/or threads, it is more prominent on multicore systems, as the cores operate truly concurrently.
Synchronization adds processing overhead, which is why systems try to keep synchronization to a minimum. Most implementations are optimized for the average case, ensuring that, on average, the overhead of accessing shared data is kept minimal. However, for real-time systems, and especially for hard real-time systems, the worst-case execution time is more important than the average case.
This thesis investigates synchronization on multicore real-time systems. In particular, the thesis explores on-chip hardware solutions that ensure that the synchronization overhead is minimal, especially in the worst case. This potentially allows a multicore real-time system to support more tasks, improving utilization. Additionally, the solutions ensure that every operation is bounded, so that no task waits indefinitely to be serviced, i.e., the solutions are starvation free.
As a first effort, a hardware locking unit for a Java processor is presented that reduces the synchronization overhead compared to the existing locking solutions on that processor. This solution is then used to compare a preferred locking protocol for real-time systems with simple, non-preemptive critical sections. Improving on this solution, the Hardlock is presented as a general real-time hardware locking solution. The Hardlock is not processor specific, and the overhead of acquiring and releasing locks is minimized. Finally, a real-time transactional memory unit is presented as an alternative to locks. The presented unit is built upon an existing Java unit but improves on it by not being Java specific and by allowing increased concurrency. The unit is compared with locks built upon compare-and-swap, a typical atomic primitive.
All solutions are tested using synthetic benchmarks that try to maximize the congestion. Additionally, the Java solution is tested using a 3D printer use case.
Although the Java solution performs better than the existing solutions, the Hardlock has the best lock performance, with a worst-case uncontended lock acquisition of 2 clock cycles and a lock release of 1 clock cycle. Overall, using fast locks for short, non-preemptive critical sections yields the best performance, whereas for long critical sections the performance of lock acquisition and release is insignificant. Finally, the transactional memory unit is shown to perform comparably with the compare-and-swap locks under high congestion, whereas at lower congestion the transactional memory performs significantly better.
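For readers unfamiliar with the software baseline, the following is a minimal sketch of a lock built upon compare-and-swap, written with C11 atomics. It is not the implementation evaluated in the thesis; the names `cas_lock_t`, `cas_try_lock`, `cas_lock`, and `cas_unlock` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} cas_lock_t;

static void cas_lock_init(cas_lock_t *l) {
    atomic_init(&l->held, 0);
}

/* Single attempt: atomically change the flag from 0 to 1.
 * Returns true if this caller acquired the lock. */
static bool cas_try_lock(cas_lock_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Spin until the lock is acquired. The number of retries is not
 * bounded when other cores keep winning the compare-and-swap. */
static void cas_lock(cas_lock_t *l) {
    while (!cas_try_lock(l)) {
        /* busy-wait */
    }
}

/* Release the lock; the caller must currently hold it. */
static void cas_unlock(cas_lock_t *l) {
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

A critical section is then bracketed by `cas_lock(&l)` and `cas_unlock(&l)`. Because the retry loop in `cas_lock` has no inherent bound under contention, a plain spin lock of this kind does not by itself give the starvation freedom that the hardware solutions are designed to provide.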
Series: DTU Compute PHD-2019