ARM Overview and History

Introduction to ARM Processor
Most mobile device manufacturers use at least one type of ARM processor. Applications range from consumer electronics such as tablets, mobile phones, PDAs, washing machines and handheld gaming consoles to networking equipment such as routers and many robotic applications.

ARM is a 32-bit RISC instruction set architecture (ISA) developed by ARM Holdings. The name stands for Advanced RISC Machine, and before that Acorn RISC Machine. ARM processors can be used in almost any domain; they were originally conceived by Acorn Computers as processors for desktop personal computers, a market that came to be dominated by the x86 family used in IBM PC compatible and, for a period, Apple Macintosh computers. The relative simplicity of ARM processors made them suitable for low-power applications, which has made them dominant in the mobile and embedded electronics market as relatively low-cost, small microprocessors and microcontrollers. They are mainly used in handheld devices, consumer electronics and robotics.

What is ARM?
The ARM processor is a family of CPUs based on the RISC architecture and developed by ARM Holdings; ARM stands for Advanced RISC Machine. ARM processors are mostly used in electronic devices such as smartphones, tablets and consumer goods. Their small size, reduced complexity and low power consumption make them suitable for many devices: because of their reduced instruction set they require fewer transistors, which enables a smaller die size for the integrated circuit. ARM designs 32-bit and 64-bit multi-core RISC processors. RISC processors are designed to perform a smaller number of types of instructions so that they can operate at a higher speed, performing millions of instructions per second.

Why ARM?
  • High performance:
    • Load/store register based architecture.
    • Pipeline: parallel processing of Fetch, Decode and Execute operations.
    • 32-bit fixed length instructions.
    • Single-cycle execution of most, but not all, instructions.
    • Conditional execution of instructions.
    • Addition of a barrel shifter and hardware multiply-accumulate.
    • ARM is one of the most licensed, and thus most widespread, processor cores in the world.
  • Thumb architecture – 16-bit instruction set for code efficiency
  • On-chip debug – reduces development time
  • Small die size: for example, the ARM7TDMI core is 0.53 mm² in a 0.18 µm process
  • Low power: for example, the ARM7TDMI core consumes 0.25 mW/MHz in a 0.18 µm process
  • Used especially in portable devices due to low power consumption and reasonable performance
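
The pipeline point in the list above can be illustrated with a small timing sketch. It assumes an idealized three-stage pipeline (Fetch, Decode, Execute) in which every stage takes one cycle and there are no stalls or branches; real cores deviate from this. The function names are hypothetical and chosen only for illustration.

```python
STAGES = ("Fetch", "Decode", "Execute")

def pipeline_schedule(n_instructions):
    """Return {cycle: [(instruction_index, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i reaches stage s in this cycle
            schedule.setdefault(cycle, []).append((i, stage))
    return schedule

def total_cycles(n_instructions):
    """Cycles to finish n instructions: pipeline fill plus one per extra instruction."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1

if __name__ == "__main__":
    # Show which instruction occupies which stage in each cycle.
    for cycle, work in sorted(pipeline_schedule(4).items()):
        print(cycle, work)
    print("cycles for 4 instructions:", total_cycles(4))
```

Once the pipeline is full, one instruction completes per cycle under these assumptions, which is where the "single cycle execution of most instructions" figure comes from.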

History
  • ARM was developed at Acorn Computers Limited of Cambridge, England, between 1983 and 1985, after RISC concepts had been introduced at Stanford and Berkeley around 1980.
  • Acorn started the project in 1983 and by 1985 had designed the first commercial RISC machine, the Acorn RISC Machine (ARM), which was later renamed the Advanced RISC Machine.
  • ARM is the industry’s leading provider of 16/32 bit embedded RISC microprocessor solutions.
  • ARM cores are licensed to partners, who fabricate ICs and sell them to customers.
  • The company is best known for its processors, although it also designs, licenses and sells software development tools under the RealView and Keil brands, as well as systems and platforms, system-on-chip infrastructure and software.
  • ARM does not make ICs itself; it licenses its cores to silicon vendors such as Atmel and NXP.
  • These companies make the ICs.
  • Examples are the LPC21xx series from NXP and the AT91RM9200 from Atmel.

ARM Based products


ARM Features
  • RISC Architecture
  • All instructions are 32 bits long
  • Most instructions execute in a single cycle
  • Most instructions can be conditionally executed
  • A load/store architecture
    • Data processing instructions act only on registers.
  • Three operand format
  • Combined ALU and shifter for high-speed bit manipulation
  • Memory access instructions with powerful auto-indexing addressing modes
  • 32-bit and 8-bit data types
    • Also 16-bit data types from ARM architecture v4 onward
  • Flexible multiple-register load and store instructions
  • 32-bit general-purpose processor
  • ARM has a 32-bit architecture but also supports 16-bit and 8-bit data types
  • ARM can be configured for little-endian or big-endian byte order in memory
  • High performance, low power consumption and small size
  • Large, regular register file
  • Pipelining
  • Uniform, fixed-length (32-bit) instructions
  • 3-address instruction format
  • Simple addressing modes
  • Conditional execution of instructions
  • Control over both the ALU and the shifter in every data processing instruction
  • Multiple-register load/store instructions
  • The ability to perform a general shift operation and an ALU operation in a single instruction that executes in a single clock cycle
  • Coprocessor interface for extending the instruction set via coprocessors
  • Thumb architecture (dense 16-bit compressed instruction set)
  • Hardware virtualization support (in later architecture versions)
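
Conditional execution, listed above, means an instruction takes effect only if the condition flags (N, Z, C, V) satisfy its condition code. The following is a minimal sketch of that check, assuming the flags come from a 32-bit compare (a subtraction that only sets flags). The helper names `cmp_flags` and `condition_passed` are hypothetical; this is a model for illustration, not a description of any particular core.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model the flags set by comparing a with b (computing a - b on 32 bits)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "N": result >> 31,                        # result is negative
        "Z": int(result == 0),                    # result is zero
        "C": int(a >= b),                         # no borrow (unsigned a >= b)
        "V": ((a ^ b) & (a ^ result)) >> 31,      # signed overflow
    }

CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}

def condition_passed(cond, flags):
    """Return True if an instruction with this condition code would execute."""
    return CONDITIONS[cond](flags)

if __name__ == "__main__":
    flags = cmp_flags(3, 5)
    print(condition_passed("LT", flags), condition_passed("GE", flags))
```

Because the check is part of the instruction itself, short if/else sequences can be written without branches, which avoids pipeline refills.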

Coprocessors
  • Up to 16 coprocessors can be defined
  • Coprocessors expand the ARM instruction set
  • Each coprocessor can have up to 16 private registers of any reasonable size
  • Load/store architecture


Barrel Shifter
A unique and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in one of the source registers left or right by a specific number of positions before it enters the ALU. This shift increases the power and the flexibility of many data processing operations.
PRE   r5 = 5
      r7 = 8
      MOV r7, r5, LSL #2   ; r7 = r5 * 4 = (r5 << 2)
POST  r5 = 5
      r7 = 20

Mnemonic   Description
LSL        Logical shift left
LSR        Logical shift right
ASR        Arithmetic shift right
ROR        Rotate right
RRX        Rotate right extended
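
The five shift operations in the table can be modelled on 32-bit values as follows. This is a minimal sketch assuming shift amounts between 0 and 31 and ignoring the carry-out for everything except RRX; the function names simply mirror the mnemonics and are not part of any library.

```python
MASK = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK

def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> n

def asr(value, n):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative number
    return (value >> n) & MASK

def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK

def rrx(value, carry_in):
    """Rotate right by one through the carry flag; returns (result, carry_out)."""
    value &= MASK
    carry_out = value & 1
    return ((carry_in & 1) << 31) | (value >> 1), carry_out

if __name__ == "__main__":
    r5 = 5
    r7 = lsl(r5, 2)               # models MOV r7, r5, LSL #2
    print(r7)                     # prints 20
```

The last lines reproduce the MOV example from the text: shifting left by two positions multiplies by four.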

Instruction Set Architectures
  • Compiler options dictate which components are compiled into ARM or Thumb
  • Developers choose which components require higher performance or reduced memory
  • Switching between states is performed using the BX (Branch and Exchange) instruction

Most ARMs implement two instruction sets
  • 32-bit ARM instruction set
  • 16-bit Thumb instruction set
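
The state switch performed by BX can be sketched as follows. On a BX, bit 0 of the target address selects the instruction set (1 for Thumb, 0 for ARM) and is cleared before the address is loaded into the PC. The function `bx` below is a hypothetical, simplified model that ignores alignment checks on ARM-state targets.

```python
def bx(target):
    """Model BX: return (new_pc, state) for a branch-and-exchange to target."""
    state = "Thumb" if target & 1 else "ARM"   # bit 0 selects the instruction set
    new_pc = target & 0xFFFFFFFE               # bit 0 is not part of the address
    return new_pc, state

if __name__ == "__main__":
    print(bx(0x8001))   # branch to 0x8000 and continue in Thumb state
    print(bx(0x8000))   # branch to 0x8000 and continue in ARM state
```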

ARM 32-bit instruction set:
  • The ARM instruction set provides the complete range of instructions for efficient embedded software. The ARM instruction set:
    • Promotes efficient implementation in low power, high performance cores
    • Provides compilers with a range of instructions suited to the generation of efficient code.
    • Contains instructions for media operations, DSP functions, system control and co-processor operations
  • All instructions are 32 bit in length
  • All instructions must be word aligned
  • PC value is stored in bits [31:2]; bits [1:0] are zero

THUMB 16-bit Instruction Set (T variant):
  • Instructions are 16 bits long
  • Instructions are halfword aligned
  • PC value is stored in bits [31:1]; bit [0] is zero
  • Contains a subset of the ARM instruction set, compressed into 16-bit instructions
  • Promotes high code density: Thumb code is typically 65% to 70% of the size of equivalent ARM code
  • Maximizes the use of on-chip memories
  • Re-encoded subset of the ARM instruction set
  • Half the size of ARM instructions (16 bits)
  • Greater code density
  • On execution, 16-bit Thumb instructions are transparently decompressed to full 32-bit ARM instructions without loss of performance
  • Has all the advantages of a 32-bit core
  • Lower performance in time-critical code
  • Does not include some instructions needed for exception handling
  • Needs about 40% more instructions than equivalent ARM code
  • Uses about 30% less external memory power than ARM code
  • With 32-bit memory, ARM code is about 40% faster than Thumb code
  • With 16-bit memory, Thumb code is about 45% faster than ARM code
  • For best performance, use 32-bit memory and ARM code
  • For best cost and power efficiency, use 16-bit memory and Thumb code
  • In a typical embedded system:
    • Use ARM code in 32-bit on-chip memory for small speed-critical routines
    • Use Thumb code in 16-bit off-chip memory for large non-critical routines
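
The code size figure in the list above follows from two of its other numbers: Thumb instructions are half as wide as ARM instructions, but a Thumb program needs roughly 40% more of them. A short calculation, using only those figures from the list (the function name is hypothetical), shows how they combine:

```python
def thumb_relative_size(extra_instructions=0.40, arm_bits=32, thumb_bits=16):
    """Thumb code size as a fraction of the equivalent ARM code size."""
    arm_size = 1.0 * arm_bits                              # one ARM instruction
    thumb_size = (1.0 + extra_instructions) * thumb_bits   # 1.4 Thumb instructions
    return thumb_size / arm_size

if __name__ == "__main__":
    print(f"{thumb_relative_size():.2f}")   # prints 0.70
```

The result, about 70%, sits at the upper end of the 65% to 70% range quoted above.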

ARM Architecture
  • 32-bit RISC processor core (32-bit instructions)
    • ARM is not a pure RISC design; it adds some enhancements to meet the requirements of embedded systems.
  • A large uniform register file
  • ARM is a 32-bit load/store architecture: data processing instructions operate only on register contents, not directly on memory.
  • Uniform and fixed-length instructions
  • Instructions are 32 bits long
  • 8/16/32-bit data types
  • Pipelined (ARM7: 3-stage, ARM9: 5-stage)
  • Good speed/power consumption ratio
  • 7 modes of operation
  • High code density
  • Mostly single-cycle execution
  • Clock speeds from 1 MHz to 1.25 GHz
  • 32-bit barrel shifter
  • Conditional execution of most instructions

Enhancement to basic RISC features
  • Control over the ALU and shifter in every data processing operation, to maximize their usage.
  • Auto-increment and auto-decrement addressing modes to optimize program loops.
  • Load and store multiple instructions to maximize data throughput.
  • Conditional execution of instructions to maximize execution throughput.
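
Two of the enhancements above, auto-increment addressing and load multiple, can be modelled with a small register and memory simulation. This is a sketch under simplifying assumptions: memory is a dictionary of word-aligned addresses, registers are a 16-entry list, and the helpers `ldr_post` and `ldmia` are hypothetical models of a post-indexed load and a load-multiple with base writeback, not real library calls.

```python
def ldr_post(regs, mem, rd, rn, offset):
    """Model a post-indexed load: rd = [rn], then rn = rn + offset."""
    regs[rd] = mem[regs[rn]]
    regs[rn] = (regs[rn] + offset) & 0xFFFFFFFF

def ldmia(regs, mem, rn, reglist, writeback=True):
    """Model load-multiple, increment after: lowest register gets lowest address."""
    addr = regs[rn]
    for r in sorted(reglist):
        regs[r] = mem[addr]
        addr += 4
    if writeback:
        regs[rn] = addr & 0xFFFFFFFF

def sum_words(mem, base, count):
    """Sum count words starting at base, one post-indexed load per iteration."""
    regs = [0] * 16
    regs[1] = base
    total = 0
    for _ in range(count):
        ldr_post(regs, mem, 0, 1, 4)   # the pointer update costs no extra instruction
        total = (total + regs[0]) & 0xFFFFFFFF
    return total

if __name__ == "__main__":
    memory = {0x100: 10, 0x104: 20, 0x108: 30}
    print(sum_words(memory, 0x100, 3))
```

In the loop, the load and the pointer increment happen in one modelled instruction, which is the saving the auto-increment modes provide; `ldmia` goes further by filling several registers with one instruction.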


Figure 1: Evolution of ARM Architecture

ARM Architecture Versions

Architecture version   Bit width   Family
ARM-Version1           32/26       ARM1
ARM-Version2           32/26       ARM2, ARM3
ARM-Version3           32/26       ARM6, ARM7
ARM-Version4           32          StrongARM, ARM7TDMI, ARM9TDMI
ARM-Version5           32          ARM7EJ, ARM9E, ARM10E
ARM-Version6           32          ARM11, ARM Cortex-M
ARM-Version7           32          ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
ARM-Version8           64/32       ARM Cortex-A, ARM Cortex-M, ARM Cortex-R

Figure 2: Architecture Versions
  • Version 1 (1983-85)
    • 26-bit addressing, no multiply or coprocessor
  • Version 2
    • Adds a 32-bit result multiply instruction and coprocessor support
  • Version 3
    • 32-bit addressing
  • Version 4
    • Adds signed and unsigned halfword and signed byte load and store instructions
  • Version 4T
    • Introduces Thumb, the 16-bit compressed form of the instruction set
  • Version 5T
    • Superset of 4T, adding new instructions
  • Version 5TE
    • Adds signal processing instruction set extensions
  • Examples:
    • ARM6: v3
    • ARM7: v3, ARM7TDMI: v4T
    • StrongARM: v4
    • ARM9E-S: v5TE
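
The practical difference between the 26-bit addressing of Version 1 and the 32-bit addressing introduced in Version 3 is the size of the byte-addressable space. A quick calculation (helper names are hypothetical) makes this concrete:

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits

def to_mib(n_bytes):
    """Convert a byte count to mebibytes (2**20 bytes)."""
    return n_bytes // (1 << 20)

if __name__ == "__main__":
    print(to_mib(address_space_bytes(26)))   # prints 64
    print(to_mib(address_space_bytes(32)))   # prints 4096
```

So 26-bit addressing covers 64 MiB, while 32-bit addressing covers 4 GiB.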

ARM Nomenclature
ARM7TDMI: an ARM7 family processor with
T – Thumb instruction set
D – Debug support (on-chip JTAG debug)
M – Enhanced multiplier (64-bit result)
I – EmbeddedICE in-circuit emulation logic inside the core
































