aarch64 EL0 mode and user space range

I'm currently porting the SO3 operating system to the AArch64 architecture, using QEMU/virt64 emulation. Everything went well as long as I stayed in kernel space, running code in EL1. But when I started tackling the user space side, I ran into a couple of issues and discovered various things. I would like to make sure that I understood them correctly.

(SO3 mechanisms are very similar to Linux, in a much simplified form. We use 48-bit VA addressing, and TTBR0_EL1 and TTBR1_EL1 point to the same page table at the moment.)

First, it is impossible to run kernel code in EL0 (that is, code located at addresses > 0xffff00..), right? Currently, the prologue of the first user thread starts executing in the kernel address space, but in user mode, so I would need to remap some pages into the application area. Or did I miss some feature that ARMv8 supports?

Second, I noticed that in EL0, any access to a memory location whose VA has bit 47 set leads to an exception (taken to EL1). I placed the top of the stack in the last pages of the user range. Any idea why it fails? (SCTLR/TCR are configured in much the same way as in Linux.)

Both cases lead to esr_el1: 0x92000044 (MMU fault).
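As a reading aid, the reported syndrome can be split into its fields. The sketch below assumes the ESR_ELx layout from the Arm architecture manual (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits [5:0]). The macro names are made up for illustration; they are not SO3 or Linux identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macros, following the ESR_ELx field layout. */
#define ESR_EC(esr)        (((esr) >> 26) & 0x3fu)   /* exception class */
#define ESR_IL(esr)        (((esr) >> 25) & 0x1u)    /* 1 = 32-bit instruction */
#define ESR_ISS(esr)       ((esr) & 0x1ffffffu)      /* instruction specific syndrome */

/* Data abort ISS fields. */
#define ESR_ISS_WNR(esr)   (((esr) >> 6) & 0x1u)     /* 1 = fault caused by a write */
#define ESR_ISS_DFSC(esr)  ((esr) & 0x3fu)           /* data fault status code */

#define ESR_EC_DABT_LOWER  0x24u  /* data abort taken from a lower exception level */
#define DFSC_TRANS_L0      0x04u  /* translation fault, level 0 */

/* Value reported in the question. */
#define ESR_REPORTED       UINT64_C(0x92000044)

/* Compile-time checks of how the reported value decodes. */
static_assert(ESR_EC(ESR_REPORTED) == ESR_EC_DABT_LOWER, "EC: data abort, lower EL");
static_assert(ESR_ISS_WNR(ESR_REPORTED) == 1, "WnR: write access");
static_assert(ESR_ISS_DFSC(ESR_REPORTED) == DFSC_TRANS_L0, "DFSC: translation fault, L0");
```

Under these assumptions, 0x92000044 reads as a data abort taken from a lower exception level, caused by a write, with a level 0 translation fault as the status code.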

ADDITIONS

There are four translation levels.

Furthermore, something weird: if I set bit 6 (AP[1]) in the TTE of a block descriptor, it fails when the MMU switches to the L0 page table. It only works for a page TTE (level 3).

Here are the values of system registers:

  • TCR_EL1: 0x15b5503510
  • SCTLR_EL1: 0x34f4d91d

For TCR, this corresponds to:

#define TCR_CACHE_FLAGS (TCR_IRGN_WBWA | TCR_ORGN_WBWA)
#define TCR_TG_FLAGS    (TCR_TG0_4K | TCR_TG1_4K)
#define TCR_SMP_FLAGS   (TCR_SH0_INNER | TCR_SH1_INNER)

tcr = TCR_CACHE_FLAGS | TCR_SMP_FLAGS | TCR_TG_FLAGS | TCR_ASID16 | TCR_A1;
tcr |= TCR_TxSZ(48) | (TCR_PS_BITS_256TB << TCR_IPS_SHIFT);

and SCTLR to:

#define SCTLR_EL1_SET   (SCTLR_ELx_M    | SCTLR_ELx_C    | SCTLR_ELx_SA   |\
             SCTLR_EL1_SA0  | SCTLR_EL1_SED  | SCTLR_ELx_I    |\
             SCTLR_EL1_DZE  | SCTLR_EL1_UCT           |\
             SCTLR_EL1_NTWE | SCTLR_ELx_IESB | SCTLR_EL1_SPAN |\
             ENDIAN_SET_EL1 | SCTLR_EL1_UCI  | SCTLR_EL1_RES1)

The memory region attributes are taken from Linux:

#define MT_NORMAL       0
#define MT_NORMAL_TAGGED    1
#define MT_NORMAL_NC        2
#define MT_NORMAL_WT        3
#define MT_DEVICE_nGnRnE    4
#define MT_DEVICE_nGnRE     5
#define MT_DEVICE_GRE       6

/* MAIR_ELx memory attributes (used by Linux) */
#define MAIR_ATTR_DEVICE_nGnRnE     UL(0x00)
#define MAIR_ATTR_DEVICE_nGnRE      UL(0x04)
#define MAIR_ATTR_DEVICE_GRE        UL(0x0c)
#define MAIR_ATTR_NORMAL_NC     UL(0x44)
#define MAIR_ATTR_NORMAL_WT     UL(0xbb)
#define MAIR_ATTR_NORMAL_TAGGED     UL(0xf0)
#define MAIR_ATTR_NORMAL        UL(0xff)
#define MAIR_ATTR_MASK          UL(0xff)

/* Position the attr at the correct index */
#define MAIR_ATTRIDX(attr, idx)     ((attr) << ((idx) * 8))

#define MAIR_EL1_SET                            \
    (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |  \
     MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |    \
     MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |        \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |      \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |            \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |      \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
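
To make the resulting register value concrete, here is a minimal, self-contained sketch that re-creates the listing above with fixed-width constants and reads an attribute back by index. It assumes the definitions are exactly those shown (including index 1, MT_NORMAL_TAGGED, being programmed with the plain Normal attribute 0xff); `mair_attr_at` is a hypothetical helper, not an SO3 or Linux function.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute indices, as in the listing above. */
#define MT_NORMAL           0
#define MT_NORMAL_TAGGED    1
#define MT_NORMAL_NC        2
#define MT_NORMAL_WT        3
#define MT_DEVICE_nGnRnE    4
#define MT_DEVICE_nGnRE     5
#define MT_DEVICE_GRE       6

/* Attribute encodings, as in the listing above. */
#define MAIR_ATTR_DEVICE_nGnRnE  UINT64_C(0x00)
#define MAIR_ATTR_DEVICE_nGnRE   UINT64_C(0x04)
#define MAIR_ATTR_DEVICE_GRE     UINT64_C(0x0c)
#define MAIR_ATTR_NORMAL_NC      UINT64_C(0x44)
#define MAIR_ATTR_NORMAL_WT      UINT64_C(0xbb)
#define MAIR_ATTR_NORMAL         UINT64_C(0xff)

/* Each attribute occupies one byte of MAIR_EL1, selected by its index. */
#define MAIR_ATTRIDX(attr, idx)  ((attr) << ((idx) * 8))

#define MAIR_EL1_SET                                              \
    (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |    \
     MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |      \
     MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |          \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |            \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |                  \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |            \
     MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))

/* The value these definitions produce. */
static_assert(MAIR_EL1_SET == UINT64_C(0x000c0400bb44ffff), "unexpected MAIR value");

/* Hypothetical helper: read back the attribute byte stored at index idx (0..7). */
static inline uint8_t mair_attr_at(uint64_t mair, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(mair >> (idx * 8));
}
```

With these definitions, `mair_attr_at(MAIR_EL1_SET, MT_NORMAL)` yields 0xff, which is the attribute a TTE with memory type index 0 selects.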

The kernel code is mapped as MT_NORMAL.

The TTEs are configured with the following functions:

static inline void set_pte_table(u64 *pte, enum dcache_option option)
{
    u64 attrs = PTE_TABLE_NS;

    *pte |= PTE_TYPE_TABLE;
    *pte |= attrs;
}

static inline void set_pte_block(u64 *pte, enum dcache_option option)
{
    u64 attrs = PTE_BLOCK_MEMTYPE(option);

    /* Kernel-only R/W: PTE_BLOCK_AP1 (EL0 access) is not set for blocks */
    *pte |= PTE_TYPE_BLOCK | PTE_BLOCK_AF | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_NS;
    *pte |= attrs;
}

static inline void set_pte_page(u64 *pte, enum dcache_option option)
{
    u64 attrs = PTE_BLOCK_MEMTYPE(option);

    /* Set the PTE with R/W permissions for both kernel and user mode */
    *pte |= PTE_TYPE_PAGE | PTE_BLOCK_AF | PTE_BLOCK_INNER_SHARE | PTE_BLOCK_NS | PTE_BLOCK_AP1;
    *pte |= attrs;
}

If I try to set PTE_BLOCK_AP1 in the set_pte_block function, it fails.

There are 1 best solutions below

The translation regime that applies to EL1 and EL0 is one and the same. As such, you can absolutely run code mapped in the kernel address range at EL0, provided you configure the memory system correctly. You want to make sure that:

  • AP[1] (bit 6) in the TTE is 1.
  • If the page shall be executable, UXN (bit 54) in the TTE is 0.
  • None of the page tables have their APTable or UXNTable bits set to a value that doesn't allow them to map pages as userland accessible / userland executable.
  • If you're on ARMv8.1 or later, PAN is disabled. To do this, run msr pan, #0 and set SCTLR_EL1.SPAN (bit 23) to 1 (otherwise PAN will be re-enabled on each exception entry).
  • If your target has FEAT_E0PD (mandatory in ARMv8.5, optional in ARMv8.4), make sure that TCR_EL1.E0PD1 (bit 56) is 0 (assuming your kernel is in the upper half of the address space, otherwise it's E0PD0, bit 55).

(Bits are numbered starting from 0, same as in the ARMv8 reference manual.)
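
The descriptor-level items of this checklist can be sketched as a small check over the table descriptors on a walk plus the final block or page descriptor. This is only an illustration under stated assumptions: it covers the AP[1], UXN, APTable and UXNTable bullets and ignores PAN and E0PD, which live in system registers. The bit positions for APTable[0] (bit 61) and UXNTable (bit 60) are taken from the VMSAv8-64 table descriptor format; all macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Block/page (leaf) descriptor bits. */
#define TTE_AP1       (UINT64_C(1) << 6)   /* AP[1]: 1 = accessible from EL0 */
#define TTE_UXN       (UINT64_C(1) << 54)  /* UXN: 1 = not executable at EL0 */

/* Table descriptor bits (hierarchical controls). */
#define TTE_UXNTABLE  (UINT64_C(1) << 60)  /* 1 = nothing below is EL0-executable */
#define TTE_APTABLE0  (UINT64_C(1) << 61)  /* APTable[0]: 1 = no EL0 access below */

static_assert(TTE_AP1 == UINT64_C(0x40), "AP[1] is bit 6");

/*
 * Returns true if the descriptors satisfy the checklist for a mapping that
 * EL0 may access and, when want_exec is set, also execute.
 * tables[] holds the table descriptors traversed before reaching the leaf.
 */
static inline bool el0_mapping_ok(const uint64_t *tables, size_t n_tables,
                                  uint64_t leaf, bool want_exec)
{
    for (size_t i = 0; i < n_tables; i++) {
        if (tables[i] & TTE_APTABLE0)
            return false;               /* a table level revokes EL0 access */
        if (want_exec && (tables[i] & TTE_UXNTABLE))
            return false;               /* a table level revokes EL0 execution */
    }
    if (!(leaf & TTE_AP1))
        return false;                   /* leaf is kernel-only */
    if (want_exec && (leaf & TTE_UXN))
        return false;                   /* leaf is not EL0-executable */
    return true;
}
```

For a 4 KB granule with four levels, a page mapping would pass the three table descriptors (levels 0 to 2) and the level 3 page descriptor; a level 2 block mapping would pass two table descriptors and the block descriptor.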

For your second question we'd need to know the exact value you load into TCR_EL1, but my guess is that T0SZ has a value that makes the TTBR0-mapped address space smaller than 48 bits. In either case, ESR_EL1 will hold the exception syndrome when read at the exception vector.
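
The question lists TCR_EL1 as 0x15b5503510, so the size fields can be read off directly. The sketch below assumes the standard TCR_EL1 layout (T0SZ in bits [5:0], T1SZ in bits [21:16]) and that an address range covers 64 - TxSZ bits; the macro and function names are hypothetical, and it only tells you what the posted value encodes, not what the register holds when the fault occurs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field accessors for TCR_EL1. */
#define TCR_T0SZ(tcr)  ((tcr) & 0x3fu)           /* size offset for the TTBR0 range */
#define TCR_T1SZ(tcr)  (((tcr) >> 16) & 0x3fu)   /* size offset for the TTBR1 range */

/* Value posted in the question. */
#define TCR_REPORTED   UINT64_C(0x15b5503510)

static_assert(TCR_T0SZ(TCR_REPORTED) == 16, "T0SZ of the posted value");
static_assert(TCR_T1SZ(TCR_REPORTED) == 16, "T1SZ of the posted value");

/* Highest virtual address translated through TTBR0 for a given TCR value. */
static inline uint64_t ttbr0_last_va(uint64_t tcr)
{
    unsigned bits = 64u - (unsigned)TCR_T0SZ(tcr);
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Lowest virtual address translated through TTBR1 for a given TCR value. */
static inline uint64_t ttbr1_first_va(uint64_t tcr)
{
    unsigned bits = 64u - (unsigned)TCR_T1SZ(tcr);
    return bits >= 64 ? 0 : ~((UINT64_C(1) << bits) - 1);
}
```

Read this way, the posted value encodes T0SZ = T1SZ = 16, that is, 48-bit ranges ending at 0x0000ffffffffffff for TTBR0 and starting at 0xffff000000000000 for TTBR1. It is still worth reading TCR_EL1 back on the target to confirm that this is the value in effect when the fault is taken.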