1. Bimodal predictors – 1 level, PC referenced
2. Local predictors – 2 level, PC referenced
3. Global predictors – 2 level with no PC reference
Each type of predictor has its own merits and demerits. To delve further into the subject, we simulated and measured the performance of a few more branch predictors that use a reference to the PC (program counter) along with the global history table. Dr. Scott McFarling, in one of his papers, suggested two schemes that combine the PC with the global history table to improve branch prediction accuracy.
Global predictors using address information [Scott McFarling Report TN-36 1993]
Global Select (Gselect):
The Gselect prediction scheme is an improved version of the global predictor that uses both the global history table and the current address of the branch to index the 2nd-level 2-bit saturating counters.
As seen in Fig. 1, the Gselect predictor concatenates the last few (low-order) bits of the branch address with the global history bits to index a particular 2-bit saturating counter in the 2nd-level counter table.
This scheme shows some improvement over global predictors, because a purely global predictor has no information about which branch is currently being executed.
Fig. 1- Global Select Predictors
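The concatenation described above can be sketched in C as follows. This is only an illustrative sketch, not code from McFarling's report or from any simulator: the function name gselect_index, its parameters, and the right shift by 2 (which assumes 4-byte-aligned instructions) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: form a Gselect index by concatenating the low
 * pc_bits of the branch address with the low hist_bits of the global
 * history register (ghr). The ">> 2" drops the byte offset and assumes
 * 4-byte-aligned instructions; other ISAs would need a different shift. */
static uint32_t gselect_index(uint32_t pc, uint32_t ghr,
                              unsigned pc_bits, unsigned hist_bits)
{
    assert(pc_bits > 0 && hist_bits > 0 && pc_bits + hist_bits < 32);

    uint32_t pc_part   = (pc >> 2) & ((1u << pc_bits) - 1u);
    uint32_t hist_part = ghr & ((1u << hist_bits) - 1u);

    /* PC bits occupy the upper part of the index, history the lower part. */
    return (pc_part << hist_bits) | hist_part;
}
```

With 4 PC bits and 4 history bits the index is 8 bits wide, so it selects one of 256 counters.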
Global Share (Gshare):
This is another scheme proposed by Scott McFarling that takes advantage of both the global history and the program counter. For the same number of 2nd-level 2-bit saturating counters, Gshare uses more bits of both the PC and the global history than Gselect does.
Fig. 2- Global Share Predictors
Fig. 2 shows that m bits of the program counter are XORed with n bits (m = n) of the global history table to index a particular 2-bit saturating counter in the 2nd-level counter table.
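A minimal C sketch of this XOR indexing is given below. The function name gshare_index and the shift by 2 (assuming 4-byte-aligned instructions) are assumptions for illustration and are not taken from the simulator source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: form a Gshare index by XORing the low `bits` bits
 * of the branch address with the low `bits` bits of the global history
 * register (ghr). The ">> 2" assumes 4-byte-aligned instructions. */
static uint32_t gshare_index(uint32_t pc, uint32_t ghr, unsigned bits)
{
    assert(bits > 0 && bits < 32);

    uint32_t mask = (1u << bits) - 1u;

    /* XOR first, then mask: equivalent to masking both operands first. */
    return ((pc >> 2) ^ ghr) & mask;
}
```

Because the two fields overlap instead of sitting side by side, an 8-bit index here uses 8 PC bits and 8 history bits, whereas a concatenated index of the same width could only use 4 of each.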
The SimpleScalar simulator, which we used for our analysis, supports the Gshare algorithm but not Gselect. Because we wanted to compare the performance statistics of Gshare and Gselect, we made a few modifications to the code, without changing the working flow of the simulator, to implement the Gselect algorithm. We implemented these changes in the branch prediction module of the simulator, bpred.c. The functions we modified were bpred_lookup(), which is used to get the predicted PC, and bpred_update(), which is used to update the history bits.
After implementing Gselect, we compared the performance of Gselect and Gshare using two integer benchmarks, bzip and gzip.
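To make the lookup/update split concrete, the following standalone C sketch shows how a table of 2-bit saturating counters and a global history register might be read at prediction time and written once the branch outcome is known. It is not the SimpleScalar code and does not reproduce the signatures of bpred_lookup() or bpred_update(); the type toy_predictor, the functions toy_init, toy_predict and toy_update, and the 256-entry table size are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 8                    /* assumed size: 2^8 = 256 counters */
#define TABLE_SIZE (1u << TABLE_BITS)

typedef struct {
    uint8_t  counters[TABLE_SIZE];      /* 2-bit counters, values 0..3 */
    uint32_t ghr;                       /* global history, newest outcome in bit 0 */
} toy_predictor;

static void toy_init(toy_predictor *p)
{
    memset(p->counters, 1, sizeof p->counters);  /* start at weakly not-taken */
    p->ghr = 0;
}

/* Lookup step: counter values 2 and 3 predict taken, 0 and 1 not taken. */
static int toy_predict(const toy_predictor *p, uint32_t index)
{
    assert(index < TABLE_SIZE);
    return p->counters[index] >= 2;
}

/* Update step: saturate the counter toward the actual outcome, then
 * shift the outcome into the global history register. */
static void toy_update(toy_predictor *p, uint32_t index, int taken)
{
    assert(index < TABLE_SIZE);
    uint8_t *c = &p->counters[index];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
    p->ghr = (p->ghr << 1) | (taken ? 1u : 0u);
}
```

In such a sketch, switching between Gshare and Gselect only changes how the index passed to toy_predict and toy_update is computed; the counter table and history register stay the same.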
Fig. 3- Performance Statistics of Gshare & Gselect Predictors for gzip Integer benchmark
The graphs in Fig. 3 and Fig. 4 show the statistics of the Gshare and Gselect branch prediction schemes under different configurations.
Fig. 4- Performance Statistics of Gshare & Gselect Predictors for bzip Integer benchmark
The variable parameter was the number of global history bits. For Gshare we varied it from 8 to 14 bits, thus referencing 256 to 16384 2nd-level counters. For Gselect we used 4 to 7 history bits, concatenated with the same number of PC bits, which again referenced 256 to 16384 2nd-level counters.
The results were intuitive. For every number of history bits we tried, Gshare outperformed Gselect. This is in line with our expectations, as XORing n history bits with n PC bits retains more information than concatenating n/2 bits of each. This leads to better prediction accuracy for Gshare than for Gselect.
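The table sizes quoted here follow directly from the index width, since an index of k bits addresses 2^k counters. The small C sketch below works through that arithmetic. The function names are hypothetical, and the Gselect case assumes the index is the history bits plus an equal number of PC bits.

```c
#include <assert.h>

/* Gshare: PC and history are XORed, so the index width equals the
 * number of history bits. */
static unsigned long gshare_counters(unsigned hist_bits)
{
    assert(hist_bits < 32);
    return 1ul << hist_bits;
}

/* Gselect: PC and history are concatenated, so the index width is the
 * sum of the two field widths. */
static unsigned long gselect_counters(unsigned hist_bits, unsigned pc_bits)
{
    assert(hist_bits + pc_bits < 32);
    return 1ul << (hist_bits + pc_bits);
}
```

For example, gshare_counters(8) and gselect_counters(4, 4) both give 256, and gshare_counters(14) and gselect_counters(7, 7) both give 16384, matching the two ends of the range used in the experiments.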
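One way to see why concatenating fewer bits of each field can hurt is to look at a single aliasing case. In the hedged C sketch below, two branches whose addresses differ only above the low 4 PC bits share the same global history; an 8-bit Gselect index (4 + 4 bits) maps them to the same counter, while an 8-bit Gshare index keeps them apart. The function names, the example addresses, and the shift by 2 (4-byte-aligned instructions) are assumptions chosen for illustration, and Gshare can of course still alias for other address/history pairs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Gselect index: low pc_bits of the PC concatenated with
 * the low hist_bits of the global history. */
static uint32_t gselect_index(uint32_t pc, uint32_t ghr,
                              unsigned pc_bits, unsigned hist_bits)
{
    assert(pc_bits + hist_bits < 32);
    uint32_t pc_part   = (pc >> 2) & ((1u << pc_bits) - 1u);
    uint32_t hist_part = ghr & ((1u << hist_bits) - 1u);
    return (pc_part << hist_bits) | hist_part;
}

/* Hypothetical Gshare index: low `bits` bits of PC XOR history. */
static uint32_t gshare_index(uint32_t pc, uint32_t ghr, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return ((pc >> 2) ^ ghr) & ((1u << bits) - 1u);
}

/* Returns 1 if the two branches share a Gselect counter but use distinct
 * Gshare counters, for an 8-bit index in both schemes. */
static int gselect_only_alias(uint32_t pc_a, uint32_t pc_b, uint32_t ghr)
{
    int same_gselect = gselect_index(pc_a, ghr, 4, 4) ==
                       gselect_index(pc_b, ghr, 4, 4);
    int same_gshare  = gshare_index(pc_a, ghr, 8) ==
                       gshare_index(pc_b, ghr, 8);
    return same_gselect && !same_gshare;
}
```

Calling gselect_only_alias(0x100, 0x140, 0x3) returns 1: after the shift the two addresses are 0x40 and 0x50, which agree in their low 4 bits but differ in bit 4, a bit that only the Gshare index can see.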
Overall Performance Comparison:
To compare all the branch prediction schemes discussed so far in this paper, we kept the Level 2 table size fixed for all the schemes. The size chosen was 2048 2-bit saturating counters, which we found to be optimal by running various configurations of the individual schemes.
Fig. 5- Configurations for different Branch Prediction Schemes...