Reverie
Reverie is a user space system-call interception framework for Linux. It can be used to intercept, modify, or elide a syscall before the kernel executes it. In essence, Reverie sits at the boundary between user space and kernel space.
Some potential use cases include:
- Observability tools, like `strace`.
- Failure injection to test error handling logic.
- Manipulating scheduling decisions to expose concurrency bugs.
See the `reverie-examples` directory for examples of tools that can be built with this library.
Features
- Ergonomic syscall handling. It is easy to modify syscall arguments or return values, inject multiple syscalls, or suppress the syscall entirely.
- Async-await usage allows blocking syscalls to be handled without blocking other guest threads.
- Can intercept CPUID and RDTSC instructions.
- Typed syscalls. Every syscall has a wrapper to make it easier to access pointer values. This also enables strace-like pretty-printing for free.
- Avoid intercepting syscalls we don't care about. For example, if we only care about `sys_open`, we can avoid paying the cost of intercepting other syscalls.
- Can act as a GDB server. This allows connection via the GDB client where you can step through the process that is being traced by Reverie.
Terminology and Background
Clients of the Reverie library write tools. A tool runs a shell command in an instrumented manner, creating a guest process tree composed of multiple guest threads and processes. Each Reverie tool is written as a set of callbacks (i.e. handlers), which are invoked each time a guest thread encounters a trappable event such as a system call or inbound signal. The tool can stipulate exactly which event streams it subscribes to. The tool itself is stateful, maintaining state between consecutive invocations.
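The callback model described above can be illustrated with a small, self-contained sketch. It does not use the Reverie crates: `Event`, `Action`, `Tool`, `FailOpen`, and `run` are all hypothetical names invented for this illustration, and the real API differs (for example, real handlers are async). The sketch only shows the shape of the idea: a stateful tool receives each trapped event and decides whether to let it through or suppress it.

```rust
/// A simplified stand-in for an intercepted system call (hypothetical type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Open { path: String },
    Write { fd: i32, len: usize },
}

/// What a handler decides to do with an event (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    /// Let the kernel execute the call unchanged.
    PassThrough,
    /// Skip the call and hand this return value back to the guest.
    Suppress(i64),
}

/// A tool is a set of callbacks plus state that survives between invocations.
trait Tool {
    fn handle_event(&mut self, event: &Event) -> Action;
}

/// Example tool: counts every event and fails each `open` of one chosen path.
struct FailOpen {
    blocked: String,
    seen: usize,
}

impl Tool for FailOpen {
    fn handle_event(&mut self, event: &Event) -> Action {
        // State persists across invocations of the callback.
        self.seen += 1;
        match event {
            // -2 is -ENOENT on Linux: pretend the file does not exist.
            Event::Open { path } if *path == self.blocked => Action::Suppress(-2),
            _ => Action::PassThrough,
        }
    }
}

/// Feed a sequence of events through a tool, as a tracer loop would.
fn run(tool: &mut dyn Tool, events: &[Event]) -> Vec<Action> {
    events.iter().map(|e| tool.handle_event(e)).collect()
}
```

Calling `run` with a `FailOpen` tool and a mixed list of events yields one `Action` per event, and the tool's `seen` counter reflects how many callbacks fired.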
Building and Testing
Reverie needs the following system-level dependencies:
sudo apt install pkg-config libunwind-dev
(These are required to get backtraces from the guest process.)
To test, run:
cargo test -- --test-threads=1
To run the `strace` example:
cd reverie-examples
cargo run --bin strace -- ls
Usage
Currently, there is only the `reverie-ptrace` backend which uses `ptrace` to intercept syscalls. Copy one of the example tools to a new Rust project (e.g. `cargo init`). You’ll see that it depends both on the general `reverie` crate for the API and on the specific backend implementation crate, `reverie_ptrace`.
Performance
Since `ptrace` adds significant overhead when the guest has a syscall-heavy workload, Reverie will add similarly-significant overhead. The slowdown depends on how many syscalls are being performed and are intercepted by the tool.
The primary way you can improve performance with the current implementation is to implement the `subscriptions` callback, specifying a minimal set of syscalls that are actually required by your tool.
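To make the effect of a narrow subscription concrete, here is a minimal model of the idea. It is not the Reverie API: `Sysno`, `Subscription`, and `intercepted` are hypothetical types and functions defined only for this sketch. It counts how many syscalls in a trace would reach the tool (and so pay the interception cost) under a given subscription set.

```rust
use std::collections::HashSet;

/// A tiny, hypothetical set of syscall numbers for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Sysno {
    Open,
    Read,
    Write,
    Close,
}

/// The set of syscalls a tool asks to be notified about (hypothetical type).
struct Subscription {
    syscalls: HashSet<Sysno>,
}

impl Subscription {
    /// Start with nothing subscribed, then opt in to what the tool needs.
    fn none() -> Self {
        Subscription { syscalls: HashSet::new() }
    }

    /// Builder-style opt-in for a single syscall.
    fn syscall(mut self, sysno: Sysno) -> Self {
        self.syscalls.insert(sysno);
        self
    }

    /// Would this syscall be delivered to the tool?
    fn wants(&self, sysno: Sysno) -> bool {
        self.syscalls.contains(&sysno)
    }
}

/// Count the calls in `trace` that would be intercepted under `sub`.
/// Every other call would run at native speed.
fn intercepted(sub: &Subscription, trace: &[Sysno]) -> usize {
    trace.iter().filter(|s| sub.wants(**s)).count()
}
```

For a trace dominated by `Read` and `Write`, a tool that subscribes only to `Open` would see a small fraction of the calls, which is where the performance gain comes from.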
Overall architecture
When implementing a Reverie tool, there are three main components of the tool to consider:
- The process-level state,
- the thread-level state, and
- the global state (which is shared among all processes and threads in the traced process tree).
This separation of process-, thread-, and global-state is meant to provide an abstraction that allows future Reverie backends to be used without requiring the tool to be rewritten.
Process State
Whenever a new process is spawned (i.e., when `fork` or `clone` is called by the guest), a new instance of the process state struct is created and managed by the Reverie backend.
Thread State
When a syscall is intercepted, it is always associated with the thread that called it.
Global State
The global state is accessed via RPC messages. Since a future Reverie backend may use in-guest syscall interception, the syscall handler code may not be running in the same address space. Thus, all shared state is communicated via RPC messages. (There is, however, currently only a single ptrace-based backend where all tracer code is in the same address space.)
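The split between local and global state can be sketched with standard-library channels. This is an analogy only, not how Reverie implements its RPC: `Request`, `serve`, `Handler`, and `demo` are hypothetical names, and a thread plus `std::sync::mpsc` stands in for whatever transport a backend might use. Handlers keep their own local state directly, while the global counter is reachable only by sending a message and waiting for the reply.

```rust
use std::sync::mpsc;
use std::thread;

/// A request sent from a handler to the global state (hypothetical type).
enum Request {
    /// Add `n` to the global counter and reply with the new total.
    Increment(u64, mpsc::Sender<u64>),
}

/// The global state lives in one place and is reached only through messages.
/// Returns the final total once every sender has been dropped.
fn serve(requests: mpsc::Receiver<Request>) -> u64 {
    let mut total = 0;
    for req in requests {
        match req {
            Request::Increment(n, reply) => {
                total += n;
                // The caller may have gone away; ignore a failed reply.
                let _ = reply.send(total);
            }
        }
    }
    total
}

/// Per-thread handler state: a local counter plus a handle to the global state.
struct Handler {
    local_count: u64,
    rpc: mpsc::Sender<Request>,
}

impl Handler {
    /// Update local state directly and global state via a request/reply round trip.
    fn on_syscall(&mut self) -> u64 {
        self.local_count += 1;
        let (reply_tx, reply_rx) = mpsc::channel();
        self.rpc
            .send(Request::Increment(1, reply_tx))
            .expect("global state is gone");
        reply_rx.recv().expect("no reply from global state")
    }
}

/// Two handlers share one global counter; returns (a's local, b's local, global).
fn demo() -> (u64, u64, u64) {
    let (tx, rx) = mpsc::channel();
    let server = thread::spawn(move || serve(rx));
    let mut a = Handler { local_count: 0, rpc: tx.clone() };
    let mut b = Handler { local_count: 0, rpc: tx };
    a.on_syscall();
    a.on_syscall();
    b.on_syscall();
    let locals = (a.local_count, b.local_count);
    // Dropping both handlers closes the channel so `serve` returns.
    drop(a);
    drop(b);
    (locals.0, locals.1, server.join().unwrap())
}
```

Because handlers never touch the global counter directly, the same handler code would keep working if the global state moved to another process; only the transport behind `rpc` would change.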
Platform and Architecture Support
Reverie currently only supports the following platforms and architectures:
Platform | Architecture | Notes |
---|---|---|
Linux | x86-64 | Full support |
Linux | aarch64 | Missing timers & cpuid/rdtsc interception |
Other platforms and architectures are currently unplanned.
Future Plans
- Add a more performant backend. The rough goal is to have handlers executing in the guest with close to regular function call overhead. Global state and its methods will still be centralized, but the RPC/IPC mechanism between the guest and the centralized tool process will become much more efficient.
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for guidance.
License
Reverie is BSD 2-Clause licensed as found in the LICENSE file.