PC Architecture. Chapter 15. The evolution of the Pentium 4. A book by Michael B. Karbo

Copyright Michael Karbo and ELI Aps., Denmark, Europe.


    Chapter 15. Evolution of the Pentium 4

    As mentioned earlier, the older P6 architecture was released back in 1995. Up to 2002, Pentium III processors were sold alongside the Pentium 4. In practice, that means Intel’s sixth CPU generation lasted seven years.

    Similarly, we may expect the seventh-generation Pentium 4 to dominate the market for a number of years. The processors may still be called Pentium 4, but they come in a lot of varieties.

    A major modification comes with the version using 0.065 micron process technology. It will allow higher clock frequencies, but there will also be a number of other improvements.

    Hyper-Threading Technology is a very exciting feature, which can be briefly outlined as follows: In order to exploit the powerful pipeline in the Pentium 4, the processor has been designed to process two threads at the same time. Threads are series of software instructions. Normal processors can only process one thread at a time.

    In servers, where several processors are installed on the same motherboard (MP systems), several threads can be processed at the same time. However, this requires that the programs be set up to exploit the MP system, as discussed on page 31.

    The new thing is that a single Pentium 4 can function logically as if there were two physical processors in the PC. The processor core (with its long pipelines) is simply so powerful that it can, in many cases, act as two processors. It’s a bit like one person being able to carry on two independent telephone conversations at the same time.

    Figure 110. The Pentium 4 is ready for MP functions.

    Hyper-Threading works very well in Intel’s Prescott versions of the Pentium 4. You gain performance when you run more than one task at a time. If you have two programs working simultaneously, both putting heavy pressure on the CPU, you will benefit from this technology. But you need an MP-compatible operating system (like Windows XP Professional) to benefit from it.

    The next step in this evolution is the production of dual-core processors. AMD produces Opteron chips which hold two processors in one chip. Intel is working on dual-core versions of the Pentium 4 (with the codename “Smithfield”). These chips will find use in servers and high-performance PCs. A dual-core Pentium 4 with Hyper-Threading enabled will in fact operate as a virtual quad-core processor.

    Figure 111. A dual-core processor with Hyper-Threading operates as a virtual quad-core processor.
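
    The arithmetic behind the “virtual quad-core” description can be sketched as follows. This is only a minimal illustration: the function name logical_processors and its parameters are hypothetical, and the sketch assumes that Hyper-Threading presents each physical core as exactly two logical processors, as described above.

```python
import os


def logical_processors(physical_cores: int, hyper_threading: bool) -> int:
    """Return how many processors the operating system would see.

    Assumption: Hyper-Threading presents each physical core as two
    logical processors; without it, each core appears once.
    """
    threads_per_core = 2 if hyper_threading else 1
    return physical_cores * threads_per_core


if __name__ == "__main__":
    # Single-core Pentium 4 with Hyper-Threading: looks like two processors.
    print(logical_processors(1, hyper_threading=True))
    # Dual-core chip with Hyper-Threading: looks like four processors.
    print(logical_processors(2, hyper_threading=True))
    # For comparison, the count reported on the machine running this script.
    print(os.cpu_count())
```

    On a real system, Python’s os.cpu_count() reports the number of logical processors, so any Hyper-Threading units are included in that figure.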

    Intel also produces EE-versions of the Pentium 4. EE is for Extreme Edition, and these processors are extremely speedy versions carrying 2 MB of L2 cache. 

    In late 2004 Intel changed the socket design of the Pentium 4. The new processors have no “pins”; they connect directly to the socket through small contact points on the surface of the processor.

    Figure 112. The LGA 775 socket for Pentium 4.


    The last processors I will discuss are the popular Athlon and Athlon 64 series (or K7 and K8).

    It was a big effort on the part of the relatively small manufacturer, AMD, when they challenged the giant Intel with a completely new processor design.

    The first models were released in 1999, at a time when Intel was the completely dominant supplier of PC processors. AMD set their sights high – they wanted to make a better processor than the Pentium II, and yet cheaper at the same time. There was a fierce battle between AMD and Intel between 1999 and 2001, and one would have to say that AMD was the victor. They certainly took a large part of the market from Intel.

    The original 1999 Athlon was very powerfully equipped with pipelines and computing units:

  • Three instruction decoders which translated x86 CISC program instructions into the more efficient RISC instructions (ROPs), nine of which could be executed at the same time.

  • Could handle up to 72 instructions (ROPs, out of order) at the same time (the Pentium III could manage 40, the K6-2 only 24).

  • Very strong FPU performance, with three simultaneous instructions.

    All in all, the Athlon was in a class above the Pentium II and III in those years. Since Athlon processors were sold at competitive prices, they were incredibly successful. AMD also launched the Duron line of processors as the counterpart to Intel’s Celeron, and was just as successful with it.

    Figure 113. Athlon was a huge success for AMD. During 2001-2002, the Athlon XP was in strong competition with the Pentium 4.


    Athlon XP versus Pentium 4

    The Athlon processor came in various versions. It started as a Slot A module (see Fig. 107 on page 42). It was then moved to Socket A, when the L2 cache was integrated.

    In 2001, a new Athlon XP version was released, which included improvements like a new Hardware Auto Data Prefetch Unit and a bigger Translation Look-aside Buffer. The Athlon XP was much less advanced than the Pentium 4, but quite superior at clock frequencies below 2000 MHz. A 1667 MHz version of the Athlon XP was sold as 2000+. This indicates that the processor performs, as a minimum, like a 2000 MHz Pentium 4.

    Later we saw Athlons in other versions. The latest was based on a new core called “Barton”. It was introduced in 2003 with an L2 cache of 512 KB. AMD tried to sell the 2166 MHz version under the brand 3000+. It did not work. A Pentium 4 running at 3000 MHz had no problems outperforming the Athlon.
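
    The relationship between the model numbers and the real clock frequencies can be illustrated with a little arithmetic. The sketch below uses only the two pairs of figures quoted above; the function name rating_per_mhz is hypothetical, and the ratio is just a convenient way of comparing the two labels, not a measure AMD itself published.

```python
def rating_per_mhz(model_rating: int, clock_mhz: int) -> float:
    """Ratio of the advertised model number to the actual clock in MHz."""
    return model_rating / clock_mhz


# (model number, actual clock in MHz); both pairs are taken from the text.
models = {
    "Athlon XP 2000+": (2000, 1667),
    "Athlon XP 3000+ (Barton)": (3000, 2166),
}

for name, (rating, clock) in models.items():
    ratio = rating_per_mhz(rating, clock)
    print(f"{name}: {ratio:.2f} rating points per MHz")
```

    The 3000+ label claims roughly 1.39 rating points per MHz against roughly 1.20 for the 2000+, which shows how much more the later model number asked of each clock cycle.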

    Opteron/Athlon 64

    AMD’s 8th generation CPU was released in 2003. It is based on a completely new core called Hammer.

    The new series of 64-bit processors comprises the Athlon 64, Athlon 64 FX and Opteron. These CPUs have a new design in two areas:

  • The memory controller is integrated in the CPU. Traditionally this function has been housed in the north bridge, but now it is placed inside the processor.

  • AMD introduces a completely new 64-bit set of instructions.

    Moving the memory controller into the CPU is a great innovation. It gives much more efficient communication between CPU and RAM (which has to be ECC DDR SDRAM, that is, 72-bit modules with error correction).

    Every time the CPU has to fetch data from normal RAM, it has to first send a request to the chipset’s controller. It has to then wait for the controller to fetch the desired data – and that can take a long time, resulting in wasted clock ticks and reduced CPU efficiency. By building the memory controller directly into the CPU, this waste is reduced. The CPU is given much more direct access to RAM. And that should reduce latency time and increase the effective bandwidth.

    The Athlon 64 processors are designed for 64-bit applications. These should be more powerful than the existing 32-bit software. We will probably see plenty of new 64-bit software in the future, since Intel is releasing 64-bit processors compatible with the Athlon 64 series.

    Figure 114. In the Athlon 64 the memory controller is located inside the processor. Hence, the RAM modules interface directly with the CPU.

    Overall, the Athlon 64 is an updated Athlon processor with an integrated north bridge and 64-bit instructions. Other new features are:

  • Support for SSE2 instructions and 16 registers for this.

  • Dual-channel interface to DDR RAM, giving a 128-bit memory bus, although the discount version, the Athlon 64, keeps the 64-bit bus.

  • Communication to and from the south bridge via a new HyperTransport bus, operating with high-speed serial transfer.

  • New sockets of 754 and 940 pins.
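
    To give a feeling for what the wider memory bus means, the theoretical peak bandwidth can be worked out from the bus width and the transfer rate. The sketch below is only an illustration: the text does not state a memory speed, so the figure of 400 million transfers per second (DDR400) is an assumption, and the function name peak_bandwidth_gb_s is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumption (not from the text): DDR400 memory, 400 million transfers/s.
ASSUMED_TRANSFERS_PER_SECOND = 400e6

single_channel = peak_bandwidth_gb_s(64, ASSUMED_TRANSFERS_PER_SECOND)
dual_channel = peak_bandwidth_gb_s(128, ASSUMED_TRANSFERS_PER_SECOND)
print(f"64-bit bus:  {single_channel:.1f} GB/s")
print(f"128-bit bus: {dual_channel:.1f} GB/s")
```

    Whatever memory speed is assumed, doubling the bus width from 64 to 128 bits doubles the theoretical peak; real-world throughput will be lower.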

    A complete line of chips

    AMD expects to use the K8 core in all types of processors:


    The Opteron is the most expensive and advanced version to be used in multi-processor servers. The models are called 200, 400 and 800, and they use 2, 4 or 8 CPUs on the same motherboard – without use of a north bridge.

    All processors share a common memory of up to 64 GB. Each Opteron has three HyperTransport I/O channels, each of which can move 6.4 GB/second.
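
    The combined I/O capacity of one Opteron follows directly from these two figures. The sketch below simply multiplies them; the function name aggregate_io_bandwidth is hypothetical, and the result assumes that all three links run at full speed at the same time.

```python
HT_LINKS_PER_OPTERON = 3      # from the text
GB_PER_SECOND_PER_LINK = 6.4  # from the text


def aggregate_io_bandwidth(links: int, gb_per_second_per_link: float) -> float:
    """Sum of the per-link rates, assuming all links run at full speed."""
    return links * gb_per_second_per_link


total = aggregate_io_bandwidth(HT_LINKS_PER_OPTERON, GB_PER_SECOND_PER_LINK)
print(f"{total:.1f} GB/s across {HT_LINKS_PER_OPTERON} links")
```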

    The Athlon FX is an Opteron for use in single-processor configurations, high-end PCs and workstations. It has a dual RAM interface, but only one HyperTransport link.

    The Athlon 64 is the discount version, with reduced performance and lower prices. It has only a 64-bit RAM interface and a smaller L2 cache.

    Figure 115. Three versions of the latest AMD processor.

    Historical overview

    I will close this review with a graphical summary of a number of different CPUs from the last 25 years. The division into generations is not always crystal clear, but I have tried to present things in a straightforward and reasonably accurate way:

    Figure 116. There are scores of different processors. A selection of them is shown here, divided into generations.

    But what is the most powerful CPU in the world? IBM’s Power4 must be a strong contender. It is a monster made up of 8 integrated 64-bit processor cores. It has to be installed in a 5,200 pin socket, uses 500 watts of power (there are 680 million transistors), and connects to a 32 MB L3 cache, which it controls itself. Good night to Pentium.
