VLIW processors, architecture and CPU features

VLIW processors have a number of advantages and disadvantages compared with other processor types. They have been used not only in CPUs but also as shader units in GPUs and in DSPs.

Today, VLIW designs appear to have disappeared from PC hardware. Nevertheless, they remain a valid option when designing new processors for other segments of the hardware market.

How does a VLIW processor work?


In a superscalar processor, the typical approach to instruction-level parallelism (ILP), instructions are fetched and processed individually, each going through its own instruction cycle, whether execution is in-order or out-of-order. A VLIW processor instead groups several instructions into one long instruction word and sends them together to the different execution units available in the processor.

To achieve this, VLIW processors rely heavily on the compiler to generate the binary code. The compiler groups the individual instructions into a single long instruction, always taking into account how busy each execution unit will be at each point in execution, which depends on the number of clock cycles each instruction requires.

Because instructions can take different numbers of clock cycles, this is a performance problem: for several clock cycles some execution units will have nothing to do and will execute a NOP instruction, meaning that the unit performs no operation during that cycle. This makes VLIW processors highly dependent on the compiler for maximum efficiency.
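As a rough illustration of a long instruction word with unused slots, here is a minimal Python sketch. The slot layout (two load units, one multiplier, one adder) and the names `SLOTS`, `make_bundle` and `utilization` are assumptions made for the example, not the layout of any real VLIW machine.

```python
NOP = "nop"

# Hypothetical machine: one slot per execution unit, in a fixed order.
SLOTS = ("load", "load", "mul", "add")

def make_bundle(ops):
    """Put each (unit, text) op in the first free slot of its unit type.

    Slots that receive no operation keep their NOP.
    """
    bundle = [NOP] * len(SLOTS)
    for unit, text in ops:
        for i, slot_unit in enumerate(SLOTS):
            if slot_unit == unit and bundle[i] == NOP:
                bundle[i] = text
                break
        else:
            raise ValueError(f"no free {unit} slot for {text!r}")
    return bundle

def utilization(bundles):
    """Fraction of slots that hold real work rather than a NOP."""
    total = sum(len(b) for b in bundles)
    used = sum(op != NOP for b in bundles for op in b)
    return used / total

program = [
    make_bundle([("load", "load A1"), ("load", "load B1")]),
    make_bundle([("mul", "mul A1, B1")]),
]

for bundle in program:
    print(bundle)
print(utilization(program))
```

In this toy program only three of the eight slots do useful work. Raising that fraction is the compiler's job.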

Advantages and disadvantages of a VLIW design

The main advantages are the following:

  • The hardware in charge of decoding instructions is much simpler than in an ILP or TLP CPU. This leaves more free space on the chip for execution units, so more instructions can be executed at the same time.
  • The extra space also allows for a larger number of registers, which helps support the kind of speculative execution typical of out-of-order processors without the need for a reorder buffer.

As for the disadvantages, the first is that a much more complex compiler is required. The second is the one mentioned earlier: the execution units are wasted to a greater degree, since most of them spend a good part of the time idle.


To understand this better, imagine a VLIW instruction that groups three instructions: the first needs 4 cycles to execute, the second 7 cycles, and the third 10 cycles. The execution unit handling the first instruction will spend 6 clock cycles doing nothing and the one handling the second will spend 3, all because the third needs 10 cycles to complete.
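The arithmetic in this example can be checked with a few lines of Python. The function name `idle_cycles` is made up for the sketch, and the model assumes the whole group waits for its slowest member.

```python
def idle_cycles(latencies):
    """Idle cycles per unit when a group retires only once its slowest
    operation has finished (the assumption behind the example above)."""
    group_time = max(latencies)
    return [group_time - lat for lat in latencies]

latencies = [4, 7, 10]
idle = idle_cycles(latencies)

# Share of all unit-cycles in the group that are spent idle.
wasted = sum(idle) / (len(latencies) * max(latencies))

print(idle)    # [6, 3, 0]
print(wasted)
```

Under that assumption, 9 of the 30 available unit-cycles (30%) are spent idle in this group.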

In addition, although the binaries do not change at the instruction level, a new CPU design may increase or decrease the number of cycles an existing instruction takes. This makes a different compiler necessary even for new iterations of the same processor, which makes it difficult to launch more advanced versions of a processor and in many cases requires a binary-to-binary compiler that reorders the instructions for the new CPU.

Generation of instructions by the compiler

To make this easier to understand, we have prepared two lists: the first shows execution on a superscalar (ILP) processor, the second on a VLIW-type CPU.

Starting with an ILP-type processor, its list of instructions would be the following:

  1. Load A1
  2. Load B1
  3. Load A2
  4. Load B2
  5. Multiply the values of A1 and B1
  6. Add the values of A2 and B2
  7. Add A1 and A2
  8. Load A3
  9. Load B3
  10. Multiply A3 by B3
  11. Add B1 and B2

A VLIW processor, on the other hand, groups several of these instructions into one:

  1. Load A1 and B1 simultaneously.
  2. Load A2 and B2, multiply A1 and B1, add A2 and B2.
  3. Load A3, B3, multiply A3 by B3 and add B1 and B2.

Having grouped the 11 instructions into just three very long instructions means that the time each VLIW instruction requires will be at most the time taken by the most complex instruction in its group.

Memory access in this type of processor
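To give a feel for what the compiler does when it forms these groups, the following Python sketch runs a simple greedy scheduler over the eleven-instruction list above. Everything about the machine is an assumption made for illustration: two load units, one multiplier and one adder per long instruction, and a rule that an operation may only use results produced in an earlier long instruction. Under these stricter rules the sketch yields five groups rather than the three built by hand, because it never places an operation alongside the loads that feed it.

```python
# Hypothetical machine: slots available per long instruction (an assumption).
UNITS = {"load": 2, "mul": 1, "add": 1}

# (id, unit, ids of the instructions whose results it needs),
# numbered as in the eleven-instruction list above.
PROGRAM = [
    (1, "load", ()), (2, "load", ()), (3, "load", ()), (4, "load", ()),
    (5, "mul", (1, 2)), (6, "add", (3, 4)), (7, "add", (1, 3)),
    (8, "load", ()), (9, "load", ()), (10, "mul", (8, 9)),
    (11, "add", (2, 4)),
]

def schedule(program, units):
    """Greedy scheduling: fill each long instruction in program order."""
    done = set()            # ids already placed in earlier groups
    pending = list(program)
    groups = []
    while pending:
        free = dict(units)  # fresh slot budget for this group
        group, rest = [], []
        for ins in pending:
            ident, unit, deps = ins
            # Needs a free slot, and operands from strictly earlier groups.
            if free[unit] > 0 and all(d in done for d in deps):
                free[unit] -= 1
                group.append(ident)
            else:
                rest.append(ins)
        done.update(group)
        groups.append(group)
        pending = rest
    return groups

groups = schedule(PROGRAM, UNITS)
for n, group in enumerate(groups, start=1):
    print(n, group)
```

Changing the slot counts in `UNITS` or relaxing the dependency rule changes how tightly the instructions pack, which is the trade-off a VLIW compiler has to manage.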


As mentioned earlier, VLIW processors depend on the compiler, which often adds NOP instructions to the code during compilation. The reason is that building a VLIW CPU with variable-size instructions is extremely complex, so instead a fixed size in bits is defined for the instruction word, and the CPU fetches that amount of data from memory in each cycle.

This means that VLIW processors require much wider data buses than conventional CPUs, because they fetch a large number of bits every time they read new instructions to execute. This is their great Achilles heel, since the ILP processors common in PC CPUs use narrower data widths and therefore simpler memory controllers.

VLIW processors normally fetch the next instructions to be executed while the current VLIW instruction is executing. Because several instructions are grouped into one, the fetch time for each individual instruction is reduced.
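A small Python sketch can show why a fixed size simplifies fetching. The widths used here (four 32-bit slots, giving a 128-bit instruction word), the all-zero NOP encoding and the helper names are assumptions chosen for the example, not the format of any particular processor.

```python
SLOT_BITS = 32           # assumed width of one operation
SLOTS_PER_WORD = 4       # assumed number of slots per long instruction
NOP = 0x00000000         # assumed encoding of a NOP
WORD_BYTES = SLOTS_PER_WORD * SLOT_BITS // 8

def encode_word(ops):
    """Pad the operations with NOPs up to the fixed width, then pack to bytes."""
    if len(ops) > SLOTS_PER_WORD:
        raise ValueError("too many operations for one long instruction")
    words = list(ops) + [NOP] * (SLOTS_PER_WORD - len(ops))
    return b"".join(w.to_bytes(SLOT_BITS // 8, "little") for w in words)

def fetch(memory, index):
    """Long instruction number `index` always starts at index * WORD_BYTES,
    so no length decoding is needed before fetching it."""
    return memory[index * WORD_BYTES:(index + 1) * WORD_BYTES]

# Two long instructions holding two and one real operations respectively.
memory = encode_word([0x11, 0x22]) + encode_word([0x33])

print(len(memory))            # 32 bytes, although only 3 slots do real work
print(fetch(memory, 1).hex())
```

The fetch logic is trivial because every long instruction has the same size. The cost is that the NOP padding is stored in memory and fetched along with the useful operations.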