| 1 | + | # CVE-2021-1048: refcount increment on mid-destruction file |
| 2 | + | *Jann Horn* |
| 3 | + | |
| 4 | + | ## The Basics |
| 5 | + | |
| 6 | + | **Disclosure or Patch Date:** it's complicated (but the Android bulletin is from 6 November 2021) |
| 7 | + | |
| 8 | + | **Product:** Android / Linux kernel |
| 9 | + | |
| 10 | + | **Advisory:** [ASB 2021-11](https://source.android.com/security/bulletin/2021-11-01#kernel-components_1) |
| 11 | + | |
| 12 | + | **Affected Versions (upstream Linux):** |
| 13 | + | - 5.9-rc2 - 5.9-rc3 (mainline: only release candidates affected) |
| 14 | + | - 5.8.4 - 5.8.7 (short-lived stable branch) |
| 15 | + |   - date range: 2020-08-26 - 2020-09-09 |
| 16 | + | - 5.7.18 and higher (short-lived stable branch, EOL before fix) |
| 17 | + |   - date range: 2020-08-26 - EOL |
| 18 | + | - 5.4.61 - 5.4.63 (LTS stable branch) |
| 19 | + |   - date range: 2020-08-26 - 2020-09-09 |
| 20 | + | - 4.19.142 - 4.19.143 (LTS stable branch) |
| 21 | + |   - date range: 2020-08-26 - 2020-09-09 |
| 22 | + | - 4.14.195 - 4.14.196 |
| 23 | + |   - date range: 2020-08-26 - 2020-09-09 |
| 24 | + | - 4.9.234 - 4.9.235 |
| 25 | + |   - date range: 2020-08-26 - 2020-09-12 |
| 26 | + | - 4.4.234 - 4.4.235 |
| 27 | + |   - date range: 2020-08-26 - 2020-09-12 |
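
As a rough sketch, the upstream stable ranges above can be written as a lookup table. The names `AFFECTED_STABLE` and `is_affected_upstream` are made up for this illustration. Release candidates and vendor suffixes are not handled, and the result says nothing about Android device kernels, whose status depends on which patches were cherrypicked.

```python
# Affected upstream stable ranges, transcribed from the list above.
# Keys are (major, minor); values are (first affected, last affected) patch levels.
# None as the upper bound means the branch went EOL without a fixed release.
AFFECTED_STABLE = {
    (5, 8): (4, 7),
    (5, 7): (18, None),
    (5, 4): (61, 63),
    (4, 19): (142, 143),
    (4, 14): (195, 196),
    (4, 9): (234, 235),
    (4, 4): (234, 235),
}


def is_affected_upstream(version: str) -> bool:
    """Return True if an upstream stable release such as '4.14.195' is in an affected range."""
    major, minor, patch = (int(part) for part in version.split("."))
    bounds = AFFECTED_STABLE.get((major, minor))
    if bounds is None:
        # Branch not in the table: not affected as far as the list above goes.
        return False
    first, last = bounds
    return patch >= first and (last is None or patch <= last)
```

For example, `is_affected_upstream("4.14.195")` is true, while `is_affected_upstream("4.14.197")` is false because 4.14.197 is the first patched release on that branch.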
| 28 | + | |
| 29 | + | **Affected Versions (Android devices):** possibly some Android devices before SPL 2021-11-06, depending on LTS syncs |
| 30 | + | |
| 31 | + | **First Patched Version:** |
| 32 | + | - upstream: 5.9-rc4, 5.8.8, 5.4.64, 4.19.144, 4.14.197, 4.9.236, 4.4.236 |
| 33 | + | - Android devices: SPL 2021-11-06 or earlier (see "context of bug" section for explanation) |
| 34 | + | |
| 35 | + | **Issue/Bug Report:** unknown |
| 36 | + | |
| 37 | + | **Patch CL:** https://git.kernel.org/linus/77f4689de17c |
| 38 | + | |
| 39 | + | **Bug-Introducing CL:** https://git.kernel.org/linus/a9ed4a6560b8 (bugfix for another memory corruption) |
| 40 | + | |
| 41 | + | **Reporter(s):** unknown |
| 42 | + | |
| 43 | + | ## The Code |
| 44 | + | |
| 45 | + | **Proof-of-concept:** N/A |
| 46 | + | |
| 47 | + | **Exploit sample:** N/A |
| 48 | + | |
| 49 | + | **Did you have access to the exploit sample when doing the analysis?** no |
| 50 | + | |
| 51 | + | ## The Vulnerability |
| 52 | + | |
| 53 | + | **Bug class:** object state confusion leading to use-after-free |
| 54 | + | |
| 55 | + | **Vulnerability details:** |
| 56 | + | |
| 57 | + | `ep_loop_check_proc()` increments the refcount of a file with `get_file()`. |
| 58 | + | However, `get_file()` is only allowed when the caller already holds a refcounted |
| 59 | + | reference to the file. `ep_loop_check_proc()` instead relies on holding `ep->mtx` |
| 60 | + | to protect its weak reference to the file from concurrent removal by |
| 61 | + | `eventpoll_release()`. That keeps the `struct file` from being freed, but it |
| 62 | + | doesn't prevent encountering a file whose refcount has already dropped to zero. |
| 63 | + | |
| 64 | + | Here is a diagram of the relevant lifetime states of `struct file`: |
| 65 | + | |
| 66 | + | ![](CVE-2021-1048-file-states.png) |
| 67 | + | |
| 68 | + | Essentially, `get_file()` is called on an object that may be in a state in which |
| 69 | + | `get_file()` is not permitted. |
| 70 | + | |
| 71 | + | **Patch analysis:** |
| 72 | + | |
| 73 | + | `get_file()` is replaced with `get_file_rcu()`, which increments the refcount only if it |
| 74 | + | is non-zero and is therefore valid for (a superset of) all possible states of the file. |
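
The difference between the two helpers can be illustrated with a toy user-space model. This is only a sketch, not kernel code: `ToyFile` and its methods are hypothetical names modelled on the kernel helpers, the model is single-threaded, and atomicity is ignored. It relies on `get_file_rcu()` being an increment-unless-zero operation on the file's refcount, which is how it was defined at the time of the fix.

```python
class ToyFile:
    """Toy stand-in for the refcount of a `struct file`; not kernel code."""

    def __init__(self) -> None:
        # The creator holds the first reference.
        self.count = 1

    def fput(self) -> None:
        """Drop a reference; once the count is zero, teardown is pending."""
        self.count -= 1

    def get_file(self) -> None:
        """Unconditional increment: only valid if the caller already holds a reference."""
        self.count += 1

    def get_file_rcu(self) -> bool:
        """Increment unless the count is zero; report whether a reference was taken."""
        if self.count == 0:
            return False
        self.count += 1
        return True
```

In this model, after the last `fput()` a holder of a weak pointer (like an epoll item) that calls `get_file()` moves the count from 0 back to 1 on an object that is already on its way to being freed. `get_file_rcu()` returns `False` in the same situation, so the caller can skip the file.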
| 75 | + | |
| 76 | + | **Thoughts on how this vuln might have been found _(fuzzing, code auditing, variant analysis, etc.)_:** |
| 77 | + | Since the bug was quickly fixed in upstream Linux, but not in all Android |
| 78 | + | devices, there's a good chance that the attackers specifically searched for |
| 79 | + | memory corruption fixes that are present upstream but not in Android devices. |
| 80 | + | |
| 81 | + | This reminds me of |
| 82 | + | https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html , |
| 83 | + | another case where a bug was fixed upstream but not in all Android kernels. |
| 84 | + | |
| 85 | + | **(Historical/present/future) context of bug:** |
| 86 | + | |
| 87 | + | The commit that introduced the bug (and fixed another one) was included in the |
| 88 | + | Android Security Bulletin for December 2020, forcing all Android vendors to |
| 89 | + | include that commit. However, the fix for this bug, despite quickly landing in |
| 90 | + | upstream stable kernels (see "Affected Versions" above), was only included in an |
| 91 | + | Android Security Bulletin in November 2021. |
| 92 | + | |
| 93 | + | This means that devices by Android vendors who only cherrypick bugfixes |
| 94 | + | referenced in Android Security Bulletins, rather than pulling the complete |
| 95 | + | Android common kernel tree, will have been vulnerable for almost a year, even |
| 96 | + | though upstream stable releases (and Android common kernels) were only affected |
| 97 | + | for ~2-3 weeks. |
| 98 | + | |
| 99 | + | That doesn't necessarily mean that all Android devices were affected that long |
| 100 | + | though; for example, Pixel 4 XL devices seem to have been patched in their |
| 101 | + | March 2021 security update through the periodic LTS update from 4.14.191 to |
| 102 | + | 4.14.199. |
| 103 | + | The kernel versions that were shipped to Pixel 4 XL devices are (from running |
| 104 | + | `strings` on `boot.img` in the firmware images): |
| 105 | + | |
| 106 | + | - in the December 2020 update: `4.14.191-gf6c9439f069c-ab6924784` (still vulnerable?) |
| 107 | + | - in the January 2021 update: `4.14.191-gd36f32db91a3-ab6960308` (still vulnerable?) |
| 108 | + | - in the February 2021 update: `4.14.191-gd36f32db91a3-ab7006457` (still vulnerable?) |
| 109 | + | - in the March 2021 update: `4.14.199-g815ef3fd6754-ab7079165` (fixed) |
| 110 | + | - in the April 2021 update: `4.14.199-gb0863551cb91-ab7132611` (fixed) |
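
The same check can be sketched in Python instead of `strings`. This assumes that the release string is stored uncompressed in the image and that it has the shape seen in the list above; `KERNEL_RELEASE` and `kernel_releases` are hypothetical names.

```python
import re

# Pattern modelled on the strings listed above, e.g. 4.14.191-gf6c9439f069c-ab6924784.
# It is an assumption that other builds follow the same shape.
KERNEL_RELEASE = re.compile(rb"\d+\.\d+\.\d+-g[0-9a-f]{12}-ab\d+")


def kernel_releases(image: bytes) -> list[str]:
    """Return the distinct kernel release strings found in raw image bytes, sorted."""
    found = {match.group(0).decode("ascii") for match in KERNEL_RELEASE.finditer(image)}
    return sorted(found)
```

For example, `kernel_releases(open("boot.img", "rb").read())` would list the release strings found in a firmware image.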
| 111 | + | |
| 112 | + | |
| 113 | + | ## The Exploit |
| 114 | + | |
| 115 | + | (The terms *exploit primitive*, *exploit strategy*, *exploit technique*, and *exploit flow* are [defined here](https://googleprojectzero.blogspot.com/2020/06/a-survey-of-recent-ios-kernel-exploits.html).) |
| 116 | + | |
| 117 | + | **Exploit strategy (or strategies):** N/A - no exploit sample to analyze |
| 118 | + | |
| 119 | + | **Exploit flow:** |
| 120 | + | |
| 121 | + | **Known cases of the same exploit flow:** |
| 122 | + | |
| 123 | + | **Part of an exploit chain?** |
| 124 | + | |
| 125 | + | ## The Next Steps |
| 126 | + | |
| 127 | + | ### Variant analysis |
| 128 | + | |
| 129 | + | **Areas/approach for variant analysis (and why):** |
| 130 | + | |
| 131 | + | I think there are two approaches for variant analysis here: |
| 132 | + | |
| 133 | + | 1. Check whether any Linux kernel patches listed in Android Security Bulletins |
| 134 | + |    are referenced in the `Fixes:` tag of other commits, and verify for any hits |
| 135 | + |    that the followup either isn't security-relevant or has also been included in an ASB. |
| 136 | + | 2. Check whether there are any other codepaths that extract a file from an epoll |
| 137 | + |    item and assume that its refcount is non-zero. |
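
Approach 1 can be sketched as a small scan over commit messages. This is a minimal illustration under stated assumptions: `find_followups`, `same_commit` and `FIXES_TAG` are hypothetical names, the caller supplies `(commit_id, message)` pairs (for example collected from `git log`), and commit IDs are matched by prefix because `Fixes:` tags usually carry abbreviated IDs.

```python
import re

# Matches tag lines of the form: Fixes: a9ed4a6560b8 ("epoll: Keep a reference on files added to the check list")
FIXES_TAG = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE)


def same_commit(a: str, b: str) -> bool:
    """Treat two hex IDs as the same commit if one is a prefix of the other."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def find_followups(commits, bulletin_ids):
    """Map each bulletin commit ID to the IDs of commits whose Fixes: tag names it.

    commits: iterable of (commit_id, full_commit_message) pairs.
    bulletin_ids: abbreviated or full IDs of commits listed in bulletins.
    """
    followups = {bid: [] for bid in bulletin_ids}
    for commit_id, message in commits:
        for fixed in FIXES_TAG.findall(message):
            for bid in bulletin_ids:
                if same_commit(fixed, bid):
                    followups[bid].append(commit_id)
    return followups
```

Every non-empty list in the result is a hit that still needs manual review: the scan only finds followups, it cannot tell whether a followup is security-relevant or already covered by a bulletin.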
| 138 | + | |
| 139 | + | **Found variants:** |
| 140 | + | |
| 141 | + | I found no variants with clear security implications. |
| 142 | + | |
| 143 | + | Re #1, the following upstream Linux commits referenced in bulletins from 2020 |
| 144 | + | and 2021 are referenced by followup fix commits: |
| 145 | + | |
| 146 | + | - d0cb50185ae9 (`do_last(): fetch directory ->i_mode and ->i_uid before it's too late`) |
| 147 | + |   - followup: 6404674acd59 (`vfs: fix do_last() regression`) |
| 148 | + |     - reported by syzkaller: https://syzkaller.appspot.com/bug?extid=190005201ced78a74ad6 |
| 149 | + |     - looks like just a NULL deref when racing? |
| 150 | + | - 07e6124a1a46 (`vt: selection, close sel_buffer race`) |
| 151 | + |   - followup: e8c75a30a23c (`vt: selection, push sel_lock up`) |
| 152 | + |     - deadlock fix |
| 153 | + |   - followup: 4b70dd57a15d (`vt: selection, push console lock down`) |
| 154 | + |     - deadlock fix |
| 155 | + | - 594cc251fdd0 (`make 'user_access_begin()' do 'access_ok()'`) |
| 156 | + |   - followup: ab10ae1c3bef (`lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()`) |
| 157 | + |     - looks like a powerpc-specific performance regression fix? |
| 158 | + | - 6d390e4b5d48 (`locks: fix a potential use-after-free problem when wakeup a waiter`) |
| 159 | + |   - followup: dcf23ac3e846 (`locks: reinstate locks_delete_block optimization`) |
| 160 | + |     - performance regression fix |
| 161 | + | - a9ed4a6560b8 (`epoll: Keep a reference on files added to the check list`) |
| 162 | + |   - followup: 77f4689de17c (`fix regression in "epoll: Keep a reference on files added to the check list"`) |
| 163 | + |     - original case |
| 164 | + | - 21998a351512 (`x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.`) |
| 165 | + |   - followup: 33fc379df76b (`x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb`) |
| 166 | + |     - fixes incorrect reporting of speculation mitigation status on X86 |
| 167 | + |   - followup: 1978b3a53a74 (`x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP`) |
| 168 | + |     - fixes not being able to turn on IBPB on X86 |
| 169 | + | - 8019ad13ef7f (`futex: Fix inode life-time issue`) |
| 170 | + |   - followup: 8d67743653dc (`futex: Unbreak futex hashing`) |
| 171 | + |     - performance regression fix, theoretically also correctness fix |
| 172 | + | |
| 173 | + | Re #2: The only place that looks vaguely interesting in that regard is |
| 174 | + | `ep_item_poll()`: From what I can tell, it can invoke `vfs_poll()` on a file |
| 175 | + | whose refcount is already zero, but only before the file's `->release()` handler |
| 176 | + | is called. But I think that's fine. |
| 177 | + | |
| 178 | + | ### Structural improvements |
| 179 | + | |
| 180 | + | What are structural improvements such as ways to kill the bug class, prevent the introduction of this vulnerability, mitigate the exploit flow, make this type of vulnerability harder to exploit, etc.? |
| 181 | + | |
| 182 | + | **Ideas to kill the bug class:** |
| 183 | + | In my opinion, the bug class here is "object state confusion", and killing the |
| 184 | + | bug class would have to involve using static analysis and annotations to |
| 185 | + | sanity-check whether object states match the requirements. |
| 186 | + | |
| 187 | + | **Ideas to mitigate the exploit flow:** N/A |
| 188 | + | |
| 189 | + | **Other potential improvements:** |
| 190 | + | When cherrypicking specific security fixes, it would probably be a good idea to |
| 191 | + | at least monitor the upstream repository for commits that refer to the |
| 192 | + | cherrypicked patch with `Fixes:`. |
| 193 | + | |
| 194 | + | ### 0-day detection methods |
| 195 | + | |
| 196 | + | What are potential detection methods for similar 0-days? That is, are there any ideas for how this exploit or similar exploits could be detected **as a 0-day**? |
| 197 | + | |
| 198 | + | ## Other References |
| 199 | + | |