Power and performance optimization of multicore systems:

The research in this area aims to analyze, develop and implement power- and performance-aware scheduling algorithms for embedded multiprocessor systems constrained by parameters such as quality, hardware resources and temperature. Exploiting the typical characteristics of an application is also an integral part of our scheduling strategy. With the continuous growth of homogeneous and heterogeneous MPSoCs, application mapping has become an important design problem which, if solved in conjunction with the power reduction problem, could improve the effectiveness of the power reduction algorithm. In this work, we try to solve the scheduling and mapping problems jointly.

We have developed an adaptive low-power scheduling framework that dynamically reduces system power in the presence of the variable workload exhibited by many modern data- and control-intensive applications in domains such as DSP, automotive and scientific computing. We have tested this framework on various applications, including an MPEG decoder.

Relevant publications:

  • Parth Malani, Prakash Mukre, Qinru Qiu and Qing Wu, “Adaptive scheduling and voltage scaling for multiprocessor real-time applications with non-deterministic workload”, in proceedings of IEEE/ACM Design, Automation and Test in Europe (DATE) conference and exhibition, Munich, Germany, March 2008. [pdf]
  • Parth Malani, Prakash Mukre and Qinru Qiu, “Power optimization for conditional task graphs in DVS enabled multiprocessor systems”, in proceedings of IEEE/IFIP International Conference on Very Large Scale Integration (VLSI-SoC), Atlanta, GA, October 2007. [pdf]
  • Parth Malani, Prakash Mukre and Qinru Qiu, “Profile-based low power scheduling of Conditional Task Graph: A Communication Aware Approach”, in proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), New Orleans, LA, May 2007. [pdf]

 

Video processing system design and optimization:

Video processing is a front-runner among modern multimedia applications. Its dynamic workload and heavy computational requirements have led chip designers to opt for dedicated video processing systems in many cases. Managing power and performance in such systems is challenging, especially when stringent Quality of Service (QoS) guarantees must be met.

This research targets power and performance optimization of video processing systems. Developing a mapping technique that partitions the application across the set of processing cores is also an essential part of this work. We have devised a macroblock-level partitioning technique for a video decoder application running on a shared-memory multiprocessor system, which maximizes performance by reducing bus contention. We have also developed a statistical workload prediction model for the MPEG-2 video decoder that predicts the decode time of future video frames with very high accuracy. The model is used to estimate the computational requirements at run time and to reduce power by adjusting the processor speed to match the predicted processing demand.
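
The prediction-then-scaling idea can be illustrated with a small sketch. It is not the published model: it assumes, purely for illustration, that decode cycles grow linearly with the compressed frame size, and the function names, the 10% safety margin and the frequency levels are all hypothetical.

```python
from typing import Sequence, Tuple

def predict_cycles(frame_bits: int, history: Sequence[Tuple[int, float]]) -> float:
    """Predict decode cycles for a frame from (frame_bits, cycles) history.

    Assumption: cycles = k * frame_bits, with k fitted by least squares
    through the origin. The real model may use other frame features.
    """
    num = sum(bits * cycles for bits, cycles in history)
    den = sum(bits * bits for bits, _ in history)
    k = num / den if den else 0.0
    return k * frame_bits

def pick_frequency(pred_cycles: float, deadline_s: float,
                   levels_hz: Sequence[float], margin: float = 1.1) -> float:
    """Return the lowest frequency level that meets the frame deadline.

    The margin (hypothetical value) pads the prediction against errors.
    If no level is fast enough, fall back to the highest one.
    """
    required_hz = pred_cycles * margin / deadline_s
    for f in sorted(levels_hz):
        if f >= required_hz:
            return f
    return max(levels_hz)

if __name__ == "__main__":
    # Made-up history of previously decoded frames: (bits, cycles).
    history = [(1000, 2.0e6), (2000, 4.0e6)]
    cycles = predict_cycles(3000, history)
    # 30 frames per second gives a per-frame deadline of 1/30 s.
    freq = pick_frequency(cycles, 1 / 30, [100e6, 200e6, 400e6])
    print(f"predicted cycles: {cycles:.0f}, chosen frequency: {freq / 1e6:.0f} MHz")
```

Choosing the lowest feasible level reflects the usual motivation for voltage and frequency scaling: running slower at a lower voltage saves power as long as the frame still finishes before its deadline.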

Relevant publications:

  • Parth Malani, Ying Tan and Qinru Qiu, “Resource-aware high performance scheduling for embedded MPSoCs with the application of MPEG decoding”, in proceedings of IEEE International Conference on Multimedia and Expo (ICME), Beijing, China, July 2007. [pdf]
  • Ying Tan, Parth Malani, Qinru Qiu and Qing Wu, “Workload prediction and dynamic voltage scaling for MPEG decoder”, in proceedings of IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), Yokohama, Japan, January 2006. [pdf]

 

Application development on parallel systems:

This research project focuses on the development, implementation and optimization of a set of cognitive computing algorithms on a heterogeneous multiprocessor cluster consisting of 24 Cell Broadband Engine (CellBE) processors and an Intel Xeon head node. Each CellBE element is a Sony PlayStation 3, which provides a PowerPC unit (PPU) and six special-purpose processing cores (SPUs) to meet the high computation demand of the application. The cluster head is a separate Intel Xeon machine with two quad-core processors, which manages application partitioning, communication and the user interface. The entire setup thus has 176 physical processing cores.
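
The core count follows directly from the configuration described above. The short check below simply reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Configuration taken from the description above.
CELL_NODES = 24          # PlayStation 3 consoles in the cluster
PPU_PER_NODE = 1         # one PowerPC unit per CellBE
SPU_PER_NODE = 6         # SPUs available to applications per CellBE
XEON_PROCESSORS = 2      # processors in the head node
CORES_PER_XEON = 4       # quad-core

def total_cores() -> int:
    """Sum the CellBE cores and the head-node cores."""
    cell_cores = CELL_NODES * (PPU_PER_NODE + SPU_PER_NODE)  # 24 * 7 = 168
    head_cores = XEON_PROCESSORS * CORES_PER_XEON            # 2 * 4 = 8
    return cell_cores + head_cores

if __name__ == "__main__":
    print(total_cores())  # 176
```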

Performance is optimized through SIMD and parallel programming on the CellBE elements and by managing inter-node communication efficiently. Inter-node communication is handled through MPI, while the CellBE elements use DMA transfers for on-chip communication. The application partitioning strategy is crucial on such a heterogeneous system because it determines the computation-to-communication ratio, and choosing it requires careful analysis of the application algorithm. For more details on this project please visit the AMPS Lab Research webpage.

 
 
© 2009 Parth Malani
Updated: 06/29/09