# CVE-2021-1905: Qualcomm Adreno GPU memory mapping use-after-free
*Ben Hawkes, Project Zero*

## The Basics

**Disclosure or Patch Date:** 1 May 2021

**Product:** Qualcomm Adreno GPU

**Advisory:** https://www.qualcomm.com/company/product-security/bulletins/may-2021-bulletin

**Affected Versions:** Prior to Android 2021-05-01 security patch level

Note: the Qualcomm Adreno GPU kernel driver may be used on platforms other than Android, but the following analysis was performed with Android in mind, since Android is a high-priority area of interest for Project Zero.

**First Patched Version:** Android 2021-05-01 security patch level

**Issue/Bug Report:** N/A

**Patch CL:**\
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=d236d315145f8250523ce9e14897d62e5d6639fc\
https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=ec3c8cf016991818ca286c4fd92255393c211405

**Bug-Introducing CL:** N/A

**Reporter(s):** N/A

## The Code

**Proof-of-concept:** N/A

**Exploit sample:** N/A

**Did you have access to the exploit sample when doing the analysis?** No

## The Vulnerability

**Bug class:** use-after-free (UaF)

**Vulnerability details:**

There are two conditions required to trigger this vulnerability.

The first condition is to trigger a state error in a core GPU structure used to track GPU mappings. A GPU shared mapping with multiple VMAs (Linux kernel virtual memory areas) is created (e.g. by splitting a larger mapping). One of the mappings is then closed, which results in the `kgsl_gpumem_vm_close` function being called via the registered `struct vm_operations_struct`. `kgsl_gpumem_vm_close` then clears the `entry->memdesc.useraddr` field of the GPU shared mapping's `struct kgsl_mem_entry`. Unfortunately, this has an unintended logical effect on the remaining VMA: the entry structure is shared, and this field is used to check whether the entry is already mapped.

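As a purely illustrative, self-contained model of this first condition (the struct and field names mirror those in the analysis above, but the layout, values, and the simplified close handler are assumptions made for illustration, not driver code):

```c
#include <stdio.h>

/* Simplified stand-ins for the driver structures named above; only the
 * field relevant to the bug is modelled. */
struct kgsl_memdesc { unsigned long useraddr; };
struct kgsl_mem_entry { struct kgsl_memdesc memdesc; };

/* Analogue of kgsl_gpumem_vm_close(): clears the shared "already mapped"
 * marker even though another VMA may still map the same entry. */
static void vm_close_model(struct kgsl_mem_entry *entry)
{
    entry->memdesc.useraddr = 0;
}

int main(void)
{
    /* One entry shared by two VMAs, e.g. after splitting a larger mapping. */
    struct kgsl_mem_entry entry = { .memdesc = { .useraddr = 0x40000000UL } };

    vm_close_model(&entry);  /* close only one of the two VMAs */

    /* The second VMA is still live, but the shared entry now claims the
     * memory is no longer mapped. */
    printf("entry looks %s\n", entry.memdesc.useraddr ? "mapped" : "unmapped");
    return 0;
}
```
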
Specifically, this means that `get_mmap_entry` will successfully return this entry when the GPU mapping is mapped for a second time. This occurs in both `kgsl_mmap` and `kgsl_get_unmapped_area`, but the latter is the more interesting path for this attack.

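Continuing the simplified model, the effect on the "already mapped" check that `get_mmap_entry` performs can be sketched as below. The function shape, helper name, and return codes are assumptions for illustration only; the real function performs considerably more validation:

```c
#include <errno.h>
#include <stdio.h>

struct kgsl_memdesc { unsigned long useraddr; };
struct kgsl_mem_entry { struct kgsl_memdesc memdesc; };

/* Illustrative analogue of the check in get_mmap_entry(): an entry whose
 * useraddr has been cleared is treated as unmapped and handed out again,
 * even though a live VMA still references it. */
static int get_mmap_entry_model(struct kgsl_mem_entry *entry,
                                struct kgsl_mem_entry **out_entry)
{
    if (entry->memdesc.useraddr != 0)
        return -EBUSY;      /* still mapped: a second mmap is refused */

    *out_entry = entry;     /* looks unmapped: mapping proceeds again */
    return 0;
}

int main(void)
{
    /* After the first condition, useraddr is 0, so the check passes. */
    struct kgsl_mem_entry entry = { .memdesc = { .useraddr = 0 } };
    struct kgsl_mem_entry *out = NULL;

    printf("second mmap %s\n",
           get_mmap_entry_model(&entry, &out) == 0 ? "allowed" : "refused");
    return 0;
}
```
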
The `kgsl_get_unmapped_area` function is called by the Linux kernel's mmap implementation. A semaphore (`mmap_sem`) is held, which prevents multiple threads in the same process from calling this function concurrently; however, `mmap_sem` is per-process, so it does not serialize calls made from different processes. In the Qualcomm GPU design, multiple processes can share the same GPU address space (for example, a child process forked after the KGSL file descriptor is opened), and so multiple VMAs in different processes can share the same underlying `struct kgsl_mem_entry`.

The second condition is to trigger a race in `kgsl_get_unmapped_area` between two processes trying to map the same GPU mapping at the same time. Because the first condition has already been triggered, the same `struct kgsl_mem_entry` can be operated on by both processes simultaneously. Since no locks are held on this structure, this can lead to unexpected behavior.

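Putting the two conditions together, the overall trigger sequence might look roughly like the sketch below. This is not a proof-of-concept (none was available for this analysis): `alloc_gpu_buffer()` is a hypothetical placeholder for the KGSL allocation ioctls, whose details are intentionally omitted.

```c
/* Hypothetical trigger-shape sketch of the two conditions described above. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_SIZE (4 * 4096UL)

/* Placeholder: allocate a GPU shared buffer on fd and return the mmap
 * offset that identifies it. A real trigger would issue KGSL ioctls here. */
static int alloc_gpu_buffer(int fd, off_t *offset_out)
{
    (void)fd;
    (void)offset_out;
    return -1;
}

int main(void)
{
    int fd = open("/dev/kgsl-3d0", O_RDWR);
    off_t off;

    if (fd < 0 || alloc_gpu_buffer(fd, &off) != 0)
        return 1;

    /* Condition 1: map the buffer, split the mapping into multiple VMAs by
     * unmapping a middle page, and let kgsl_gpumem_vm_close run for the
     * removed VMA while the rest of the mapping stays alive. */
    char *addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, off);
    if (addr == MAP_FAILED)
        return 1;
    munmap(addr + 4096, 4096);

    /* Condition 2: a child forked after opening the fd shares the same GPU
     * address space, so parent and child can race kgsl_get_unmapped_area
     * by mapping the same (now inconsistent) entry at the same time. */
    pid_t pid = fork();
    mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    if (pid > 0)
        wait(NULL);
    return 0;
}
```
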
There are a number of paths that could be explored to exploit this issue, such as using an error path to call `kgsl_iommu_put_gpuaddr` on a successfully allocated mapping.

**Patch analysis:**

Although only one patch is listed in the Qualcomm advisory, we believe both patches listed above are relevant to this issue. The first patch changes the way `kgsl_gpumem_vm_close` accounts for the fact that multiple VMAs may point to the same GPU shared mapping. The second patch adds locking to the `memdesc` field of the `struct kgsl_mem_entry`, which aims to prevent similar race conditions in memory management routines.

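The exact fixes are best read at the patch links above. As a rough, purely illustrative model of what such per-VMA accounting could look like (clearing the shared field only when the last VMA referencing the entry is closed), consider the sketch below; the counter scheme here is an assumption for illustration only and is not the actual patch code:

```c
#include <stdio.h>

/* Illustrative model only: a per-entry count of open VMAs, so that the
 * shared "already mapped" marker is cleared only when the last VMA that
 * references the entry goes away. Not the actual Qualcomm patch. */
struct kgsl_memdesc { unsigned long useraddr; };
struct kgsl_mem_entry {
    struct kgsl_memdesc memdesc;
    int vma_count;                 /* number of live VMAs for this entry */
};

static void vm_open_model(struct kgsl_mem_entry *entry)
{
    entry->vma_count++;
}

static void vm_close_model(struct kgsl_mem_entry *entry)
{
    if (--entry->vma_count == 0)   /* only the final close clears state */
        entry->memdesc.useraddr = 0;
}

int main(void)
{
    struct kgsl_mem_entry entry = { { 0x40000000UL }, 0 };

    vm_open_model(&entry);         /* original mapping */
    vm_open_model(&entry);         /* second VMA created by the split */
    vm_close_model(&entry);        /* closing one VMA... */

    /* ...no longer makes the still-mapped entry look unmapped. */
    printf("entry looks %s\n", entry.memdesc.useraddr ? "mapped" : "unmapped");
    return 0;
}
```
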
**Thoughts on how this vuln might have been found _(fuzzing, code auditing, variant analysis, etc.)_:**

Given the complex interplay of first triggering one condition and then winning a race, this issue would be challenging to fuzz, but it might be possible with a well-crafted fuzzer designed specifically for the Qualcomm GPU driver (e.g. by biasing the generated system calls toward process management, memory management, and well-formed KGSL ioctls).

It is possible that this issue was found manually, either by observing the lack of locking on the shared `struct kgsl_mem_entry` and working backward to establish a path to triggering this, or by observing the suspicious state management in `kgsl_gpumem_vm_close` and building the attack up from there.

**(Historical/present/future) context of bug:**

A different use-after-free (UaF) vulnerability was discovered by Man Yue Mo of the GitHub Security Lab and has since been fixed. That vulnerability was in a different part of the GPU memory management code and was not known to be exploited in-the-wild. His write-up of the attack can be found [here](https://securitylab.github.com/research/one_day_short_of_a_fullchain_android/).

Another issue, CVE-2021-1906, was fixed by Qualcomm at the same time and reported as exploited in-the-wild. That issue is believed to be related to CVE-2020-11261 (also marked as exploited in-the-wild) and does not appear to be directly useful by itself.

## The Exploit

(The terms *exploit primitive*, *exploit strategy*, *exploit technique*, and *exploit flow* are [defined here](https://googleprojectzero.blogspot.com/2020/06/a-survey-of-recent-ios-kernel-exploits.html).)

**Exploit strategy (or strategies):** N/A

**Exploit flow:** N/A

**Known cases of the same exploit flow:** N/A

**Part of an exploit chain?** N/A

## The Next Steps

### Variant analysis

**Areas/approach for variant analysis (and why):**

Generally, all of the structures that can be shared between multiple processes (such as `struct kgsl_process_private`) should be carefully investigated for state assumptions, reference-counting issues, and race conditions.

**Found variants:**

A cursory review of relevant structure members and memory management related ioctls and callbacks didn't surface any variants of this issue.

### Structural improvements

What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.?

**Ideas to kill the bug class:**

In this case it's hard to say whether the attack would have proceeded via the classical memory corruption route (e.g. using the freed object to achieve arbitrary R/W) or via a GPU-specific approach (such as granting arbitrary physical memory R/W to an attacker-controlled GPU context). In the former case, upcoming memory tagging designs would likely help. The latter approach would require further study.

**Ideas to mitigate the exploit flow:** N/A

**Other potential improvements:** N/A

### 0-day detection methods

What are potential detection methods for similar 0-days? Meaning are there any ideas of how this exploit or similar exploits could be detected **as a 0-day**?

Kernel crash log analysis might be one approach, but establishing the root cause of an issue like this using only crash output would be challenging. Runtime anomaly detection might be another option, but would require specialist tooling.

## Other References
