Buffer Sharing and Synchronization
The dma-buf subsystem provides the framework for sharing buffers for hardware (DMA) access across multiple device drivers and subsystems, and for synchronizing asynchronous hardware access.
This is used, for example, by drm “prime” multi-GPU support, but is of course not limited to GPU use cases.
The three main components of this are: (1) dma-buf, representing a sg_table and exposed to userspace as a file descriptor to allow passing between devices, (2) fence, which provides a mechanism to signal when one device has finished access, and (3) reservation, which manages the shared or exclusive fence(s) associated with the buffer.
Reservation Objects
The reservation object provides a mechanism to manage shared and exclusive fences associated with a buffer. A reservation object can have one exclusive fence attached (normally associated with write operations) or N shared fences (read operations). The RCU mechanism is used to protect read access to fences from locked write-side updates.
See struct dma_resv for more details.
Parameters
struct dma_resv *obj
the reservation object
Parameters
struct dma_resv *obj
the reservation object
Reserve space to add shared fences to a dma_resv.
Parameters
struct dma_resv *obj
reservation object
unsigned int num_fences
number of fences we want to add
Description
Should be called before dma_resv_add_shared_fence(). Must be called with obj locked through dma_resv_lock().
Note that the preallocated slots need to be re-reserved if obj is unlocked at any time before calling dma_resv_add_shared_fence(). This is validated when CONFIG_DEBUG_MUTEXES is enabled.
RETURNS Zero for success, or -errno
reset shared fences for debugging
Parameters
struct dma_resv *obj
the dma_resv object to reset
Description
Reset the number of pre-reserved shared slots to test that drivers do correct slot allocation using dma_resv_reserve_shared(). See also dma_resv_list.shared_max.
Add a fence to a shared slot
Parameters
struct dma_resv *obj
the reservation object
struct dma_fence *fence
the shared fence to add
Description
Add a fence to a shared slot. obj must be locked with dma_resv_lock(), and dma_resv_reserve_shared() must have been called beforehand.
See also dma_resv.fence for a discussion of the semantics.
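The split between a fallible reservation step and an infallible add step can be illustrated with a small userspace model. This is only a sketch: struct model_resv, model_reserve_shared() and model_add_shared() are hypothetical names invented for this example, fence ids stand in for fence pointers, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a shared fence list; not kernel code. */
struct model_resv {
	unsigned int shared_count;	/* slots currently in use */
	unsigned int shared_max;	/* slots currently allocated */
	int *shared;			/* fence ids stand in for fence pointers */
};

/* Fallible step: make room for num_fences additional entries. */
static int model_reserve_shared(struct model_resv *obj, unsigned int num_fences)
{
	unsigned int needed = obj->shared_count + num_fences;
	int *grown;

	if (needed <= obj->shared_max)
		return 0;

	grown = realloc(obj->shared, needed * sizeof(*grown));
	if (!grown)
		return -ENOMEM;

	obj->shared = grown;
	obj->shared_max = needed;
	return 0;
}

/* Infallible step: only consumes a slot that was reserved beforehand. */
static void model_add_shared(struct model_resv *obj, int fence_id)
{
	assert(obj->shared_count < obj->shared_max);
	obj->shared[obj->shared_count++] = fence_id;
}
```

In this model all allocation failures surface in the reservation step, so the add step can run after the point where the caller can no longer back out.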
void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
Add an exclusive fence.
Parameters
struct dma_resv *obj
the reservation object
struct dma_fence *fence
the exclusive fence to add
Description
Add a fence to the exclusive slot. obj must be locked with dma_resv_lock().
Note that this function replaces all fences attached to obj, see also dma_resv.fence_excl for a discussion of the semantics.
struct dma_fence *dma_resv_iter_first_unlocked(struct dma_resv_iter *cursor)
first fence in an unlocked dma_resv obj.
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Returns the first fence from an unlocked dma_resv obj.
struct dma_fence *dma_resv_iter_next_unlocked(struct dma_resv_iter *cursor)
next fence in an unlocked dma_resv obj.
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Returns the next fence from an unlocked dma_resv obj.
struct dma_fence *dma_resv_iter_first(struct dma_resv_iter *cursor)
first fence from a locked dma_resv object
Parameters
struct dma_resv_iter *cursor
cursor to record the current position
Description
Return the first fence in the dma_resv object while holding the dma_resv.lock.
struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor)
next fence from a locked dma_resv object
Parameters
struct dma_resv_iter *cursor
cursor to record the current position
Description
Return the next fence from the dma_resv object while holding the dma_resv.lock.
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
Copy all fences from src to dst.
Parameters
struct dma_resv *dst
the destination reservation object
struct dma_resv *src
the source reservation object
Description
Copy all fences from src to dst. dst-lock must be held.
int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **fence_excl, unsigned int *shared_count, struct dma_fence ***shared)
Get an object’s shared and exclusive fences without update side lock held
Parameters
struct dma_resv *obj
the reservation object
struct dma_fence **fence_excl
the returned exclusive fence (or NULL)
unsigned int *shared_count
the number of shared fences returned
struct dma_fence ***shared
the array of shared fence ptrs returned (array is krealloc’d to the required size, and must be freed by caller)
Description
Retrieve all fences from the reservation object. If the pointer for the exclusive fence is not specified the fence is put into the array of the shared fences as well. Returns either zero or -ENOMEM.
long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout)
Wait on a reservation object’s shared and/or exclusive fences.
Parameters
struct dma_resv *obj
the reservation object
bool wait_all
if true, wait on all fences, else wait on just exclusive fence
bool intr
if true, do interruptible wait
unsigned long timeout
timeout value in jiffies or zero to return immediately
Description
Callers are not required to hold specific locks, but may already hold dma_resv_lock().
RETURNS
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or greater than zero on success.
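Because the return value mixes three meanings (error, timeout, success), callers have to test its sign and not just its truth value. The sketch below is a userspace illustration only: model_classify_wait() and the MODEL_ names are hypothetical, and MODEL_ERESTARTSYS mirrors the in-kernel value 512 because userspace headers do not export ERESTARTSYS.

```c
#include <assert.h>

/* Mirrors the in-kernel ERESTARTSYS value; an assumption made for userspace. */
#define MODEL_ERESTARTSYS 512

enum model_wait_outcome {
	MODEL_WAIT_INTERRUPTED,	/* -ERESTARTSYS */
	MODEL_WAIT_ERROR,	/* any other negative value */
	MODEL_WAIT_TIMED_OUT,	/* exactly zero */
	MODEL_WAIT_SIGNALED,	/* greater than zero */
};

/* Map a wait return value onto the outcomes described above. */
static enum model_wait_outcome model_classify_wait(long ret)
{
	if (ret == -MODEL_ERESTARTSYS)
		return MODEL_WAIT_INTERRUPTED;
	if (ret < 0)
		return MODEL_WAIT_ERROR;
	if (ret == 0)
		return MODEL_WAIT_TIMED_OUT;
	return MODEL_WAIT_SIGNALED;
}
```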
bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
Test if a reservation object’s fences have been signaled.
Parameters
struct dma_resv *obj
the reservation object
bool test_all
if true, test all fences, otherwise only test the exclusive fence
Description
Callers are not required to hold specific locks, but may already hold dma_resv_lock().
RETURNS
True if all fences signaled, else false.
struct dma_resv_list
a list of shared fences
Definition
struct dma_resv_list {
    struct rcu_head rcu;
    u32 shared_count, shared_max;
    struct dma_fence __rcu *shared[];
};
Members
rcu
for internal use
shared_count
number of fences currently stored in the shared fence table
shared_max
number of allocated slots, used for growing the shared fence table
shared
shared fence table
struct dma_resv
a reservation object manages fences for a buffer
Definition
struct dma_resv {
    struct ww_mutex lock;
    seqcount_ww_mutex_t seq;
    struct dma_fence __rcu *fence_excl;
    struct dma_resv_list __rcu *fence;
};
Members
lock
Update side lock. Don’t use directly, instead use the wrapper functions like dma_resv_lock() and dma_resv_unlock().
Drivers which use the reservation object to manage memory dynamically also use this lock to protect buffer object state like placement, allocation policies or throughout command submission.
seq
Sequence count for managing RCU read-side synchronization, allows read-only access to fence_excl and fence while ensuring we take a consistent snapshot.
fence_excl
The exclusive fence, if there is one currently.
There are two ways to update this fence:
First by calling dma_resv_add_excl_fence(), which replaces all fences attached to the reservation object. To guarantee that no fences are lost, this new fence must signal only after all previous fences, both shared and exclusive, have signalled. In some cases it is convenient to achieve that by attaching a struct dma_fence_array with all the new and old fences.
Alternatively the fence can be set directly, which leaves the shared fences unchanged. To guarantee that no fences are lost, this new fence must signal only after the previous exclusive fence has signalled. Since the shared fences are staying intact, it is not necessary to maintain any ordering against those. If semantically only a new access is added without actually treating the previous one as a dependency the exclusive fences can be strung together using struct dma_fence_chain.
Note that the actual semantics of what an exclusive or shared fence means are defined by the user; for reservation objects shared across drivers see dma_buf.resv.
fence
List of current shared fences.
There are no ordering constraints of shared fences against the exclusive fence slot. If a waiter needs to wait for all access, it has to wait for both sets of fences to signal.
A new fence is added by calling dma_resv_add_shared_fence(). Since this often needs to be done past the point of no return in command submission it cannot fail, and therefore sufficient slots need to be reserved by calling dma_resv_reserve_shared().
Note that the actual semantics of what an exclusive or shared fence means are defined by the user; for reservation objects shared across drivers see dma_buf.resv.
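The "no fences are lost" rule for replacing all fences can be pictured as a composite that only counts as signaled once every member has signaled. The following userspace sketch assumes a hypothetical struct model_fence with a single flag; it is not struct dma_fence_array, it only mirrors the condition such a composite has to satisfy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence: just a completion flag. */
struct model_fence {
	bool signaled;
};

/*
 * A composite of old and new fences is complete only when every member
 * has signaled, so installing it as the sole exclusive fence cannot hide
 * work that is still outstanding.
 */
static bool model_all_signaled(const struct model_fence *const *fences,
			       size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!fences[i]->signaled)
			return false;
	}
	return true;
}
```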
Description
There are multiple uses for this, with sometimes slightly different rules in how the fence slots are used.
One use is to synchronize cross-driver access to a struct dma_buf, either for dynamic buffer management or just to handle implicit synchronization between different users of the buffer in userspace. See dma_buf.resv for a more in-depth discussion.
The other major use is to manage access and locking within a driver in a buffer based memory manager. struct ttm_buffer_object is the canonical example here, since this is where reservation objects originated from. But use in drivers is spreading and some drivers also manage struct drm_gem_object with the same scheme.
struct dma_resv_iter
current position into the dma_resv fences
Definition
struct dma_resv_iter {
    struct dma_resv *obj;
    bool all_fences;
    struct dma_fence *fence;
    unsigned int seq;
    unsigned int index;
    struct dma_resv_list *fences;
    unsigned int shared_count;
    bool is_restarted;
};
Members
obj
The dma_resv object we iterate over
all_fences
If all fences should be returned
fence
the currently handled fence
seq
sequence number to check for modifications
index
index into the shared fences
fences
the shared fences; private, MUST not dereference
shared_count
number of shared fences
is_restarted
true if this is the first returned fence
Description
Don’t touch this directly in the driver, use the accessor function instead.
void dma_resv_iter_begin(struct dma_resv_iter *cursor, struct dma_resv *obj, bool all_fences)
initialize a dma_resv_iter object
Parameters
struct dma_resv_iter *cursor
The dma_resv_iter object to initialize
struct dma_resv *obj
The dma_resv object which we want to iterate over
bool all_fences
If all fences should be returned or just the exclusive one
void dma_resv_iter_end(struct dma_resv_iter *cursor)
cleanup a dma_resv_iter object
Parameters
struct dma_resv_iter *cursor
the dma_resv_iter object which should be cleaned up
Description
Make sure that the reference to the fence in the cursor is properly dropped.
bool dma_resv_iter_is_exclusive(struct dma_resv_iter *cursor)
test if the current fence is the exclusive one
Parameters
struct dma_resv_iter *cursor
the cursor of the current position
Description
Returns true if the currently returned fence is the exclusive one.
bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor)
test if this is the first fence after a restart
Parameters
struct dma_resv_iter *cursor
the cursor with the current position
Description
Return true if this is the first fence in an iteration after a restart.
dma_resv_for_each_fence_unlocked
dma_resv_for_each_fence_unlocked (cursor, fence)
unlocked fence iterator
Parameters
cursor
a struct dma_resv_iter pointer
fence
the current fence
Description
Iterate over the fences in a struct dma_resv object without holding the dma_resv.lock and using RCU instead. The cursor needs to be initialized with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside the iterator a reference to the dma_fence is held and the RCU lock dropped. When the dma_resv is modified the iteration starts over again.
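The restart behaviour can be modelled in userspace with a sequence number that every modification bumps. The sketch below is single-threaded and purely illustrative: struct model_obj, struct model_iter and the model_ functions are hypothetical, fence ids replace fence pointers, and neither RCU nor reference counting is modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_FENCES 8

/* Hypothetical object: a fence table plus a modification counter. */
struct model_obj {
	unsigned int seq;	/* bumped by every modification */
	unsigned int count;
	int fences[MODEL_MAX_FENCES];
};

struct model_iter {
	const struct model_obj *obj;
	unsigned int seq;	/* sequence number the walk started from */
	unsigned int index;
	bool is_restarted;
};

static void model_obj_add(struct model_obj *obj, int fence)
{
	assert(obj->count < MODEL_MAX_FENCES);
	obj->fences[obj->count++] = fence;
	obj->seq++;
}

static void model_iter_begin(struct model_iter *it, const struct model_obj *obj)
{
	it->obj = obj;
	it->seq = obj->seq;
	it->index = 0;
	it->is_restarted = true;	/* the first fence counts as a start */
}

/* Returns true and stores the next fence id, or false at the end of the walk. */
static bool model_iter_next(struct model_iter *it, int *fence)
{
	if (it->seq != it->obj->seq) {
		/* The object changed under us: start over from the beginning. */
		it->seq = it->obj->seq;
		it->index = 0;
		it->is_restarted = true;
	} else if (it->index > 0) {
		it->is_restarted = false;
	}

	if (it->index >= it->obj->count)
		return false;

	*fence = it->obj->fences[it->index++];
	return true;
}
```

A caller that accumulates state across fences would reset that state whenever is_restarted is set, since fences already seen are returned again after a restart.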
dma_resv_for_each_fence
dma_resv_for_each_fence (cursor, obj, all_fences, fence)
fence iterator
Parameters
cursor
a struct dma_resv_iter pointer
obj
a dma_resv object pointer
all_fences
true if all fences should be returned
fence
the current fence
Description
Iterate over the fences in a struct dma_resv object while holding the dma_resv.lock. all_fences controls if the shared fences are returned as well. The cursor initialisation is part of the iterator and the fence stays valid as long as the lock is held and so no extra reference to the fence is taken.
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Locks the reservation object for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
As the reservation object may be locked by multiple parties in an undefined order, a struct ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.
When a die situation is indicated by returning -EDEADLK, all locks held by ctx must be unlocked and then dma_resv_lock_slow() called on obj.
Unlocked by calling dma_resv_unlock().
See also dma_resv_lock_interruptible() for the interruptible variant.
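The back-off sequence on -EDEADLK (drop everything, slow-lock the contended object, then retry the rest) can be sketched as a single-threaded userspace model. Everything here is hypothetical: struct model_ctx, struct model_obj and the model_ functions are invented for illustration, a lower stamp means an older context, and sleeping is modelled by pretending the other holder has released the lock. It is not the ww_mutex implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_ctx {
	unsigned long stamp;	/* lower stamp = older context */
	int backoffs;		/* how often this context had to back off */
};

struct model_obj {
	const struct model_ctx *owner;
};

/* Fast path: a younger context must back off from an older holder. */
static int model_lock(struct model_obj *obj, struct model_ctx *ctx)
{
	if (obj->owner == ctx)
		return -EALREADY;
	if (obj->owner && obj->owner->stamp < ctx->stamp)
		return -EDEADLK;
	/* Free, or held by a younger context (a real lock would sleep here). */
	obj->owner = ctx;
	return 0;
}

/* Slow path: model "sleep until available" as the holder releasing. */
static void model_lock_slow(struct model_obj *obj, struct model_ctx *ctx)
{
	obj->owner = ctx;
}

static void model_unlock(struct model_obj *obj)
{
	obj->owner = NULL;
}

/* Lock all objects, backing off and retrying whenever -EDEADLK is seen. */
static int model_lock_all(struct model_obj **objs, int count,
			  struct model_ctx *ctx)
{
	int contended = -1;
	int i, j, ret;

retry:
	if (contended != -1)
		model_lock_slow(objs[contended], ctx);

	for (i = 0; i < count; i++) {
		if (i == contended)
			continue;

		ret = model_lock(objs[i], ctx);
		if (ret == 0)
			continue;

		/* Drop everything held so far, including the slow-locked one. */
		for (j = 0; j < i; j++)
			model_unlock(objs[j]);
		if (contended != -1 && contended >= i)
			model_unlock(objs[contended]);

		if (ret != -EDEADLK)
			return ret;

		ctx->backoffs++;
		contended = i;
		goto retry;
	}
	return 0;
}
```

Taking the contended object first on the retry means the loop resumes with the lock it previously lost already in hand.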
int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx)
lock the reservation object
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Locks the reservation object interruptible for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
As the reservation object may be locked by multiple parties in an undefined order, a struct ww_acquire_ctx is passed to unwind if a cycle is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation object may be locked by itself by passing NULL as ctx.
When a die situation is indicated by returning -EDEADLK, all locks held by ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on obj.
Unlocked by calling dma_resv_unlock().
void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx)
slowpath lock the reservation object
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Acquires the reservation object after a die case. This function will sleep until the lock becomes available. See dma_resv_lock() as well.
See also dma_resv_lock_slow_interruptible() for the interruptible variant.
int dma_resv_lock_slow_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx)
slowpath lock the reservation object, interruptible
Parameters
struct dma_resv *obj
the reservation object
struct ww_acquire_ctx *ctx
the locking context
Description
Acquires the reservation object interruptibly after a die case. This function will sleep until the lock becomes available. See dma_resv_lock_interruptible() as well.
Parameters
struct dma_resv *obj
the reservation object
Description
Tries to lock the reservation object for exclusive access and modification. Note, that the lock is only against other writers, readers will run concurrently with a writer under RCU. The seqlock is used to notify readers if they overlap with a writer.
Also note that since no context is provided, no deadlock protection is possible, which is also not needed for a trylock.
Returns true if the lock was acquired, false otherwise.
Parameters
struct dma_resv *obj
the reservation object
Description
Returns true if the mutex is locked, false if unlocked.
struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
returns the context used to lock the object
Parameters
struct dma_resv *obj
the reservation object
Description
Returns the context used to lock a reservation object or NULL if no context was used or the object is not locked at all.
WARNING: This interface is pretty horrible, but TTM needs it because it doesn’t pass the struct ww_acquire_ctx around in some very long callchains. Everyone else just uses it to check whether they’re holding a reservation or not.
Parameters
struct dma_resv *obj
the reservation object
Description
Unlocks the reservation object following exclusive access.
Parameters
struct dma_resv *obj
the reservation object
Description
Returns the exclusive fence (if any). Caller must either hold the object’s lock through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), or one of the variants of each.
RETURNS The exclusive fence or NULL
struct dma_fence *dma_resv_get_excl_unlocked(struct dma_resv *obj)
get the reservation object’s exclusive fence, without lock held.
Parameters
struct dma_resv *obj
the reservation object
Description
If there is an exclusive fence, this atomically increments its reference count and returns it.
RETURNS The exclusive fence or NULL if none
get the reservation object’s shared fence list
Parameters
struct dma_resv *obj
the reservation object
Description
Returns the shared fence list. Caller must either hold the object’s lock through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), or one of the variants of each.
DMA Fences
DMA fences, represented by struct dma_fence, are the kernel internal synchronization primitive for DMA operations like GPU rendering, video encoding/decoding, or displaying buffers on a screen.
A fence is initialized using dma_fence_init() and completed using dma_fence_signal(). Fences are associated with a context, allocated through dma_fence_context_alloc(), and all fences on the same context are fully ordered.
Since the purpose of fences is to facilitate cross-device and cross-application synchronization, there are multiple ways to use one:
Individual fences can be exposed as a sync_file, accessed as a file descriptor from userspace, created by calling sync_file_create(). This is called explicit fencing, since userspace passes around explicit synchronization points.
Some subsystems also have their own explicit fencing primitives, like drm_syncobj. Compared to sync_file, a drm_syncobj allows the underlying fence to be updated.
Then there’s also implicit fencing, where the synchronization points are implicitly passed around as part of shared dma_buf instances. Such implicit fences are stored in struct dma_resv through the dma_buf.resv pointer.
DMA Fence Cross-Driver Contract
Since dma_fence provides a cross driver contract, all drivers must follow the same rules:
Fences must complete in a reasonable time. Fences which represent kernels and shaders submitted by userspace, which could run forever, must be backed up by timeout and gpu hang recovery code. Minimally that code must prevent further command submission and force complete all in-flight fences, e.g. when the driver or hardware do not support gpu reset, or if the gpu reset failed for some reason. Ideally the driver supports gpu recovery which only affects the offending userspace context, and no other userspace submissions.
Drivers may have different ideas of what completion within a reasonable time means. Some hang recovery code uses a fixed timeout, others a mix between observing forward progress and increasingly strict timeouts. Drivers should not try to second guess timeout handling of fences from other drivers.
To ensure there are no deadlocks of dma_fence_wait() against other locks, drivers should annotate all code required to reach dma_fence_signal(), which completes the fences, with dma_fence_begin_signalling() and dma_fence_end_signalling().
Drivers are allowed to call dma_fence_wait() while holding dma_resv_lock(). This means any code required for fence completion cannot acquire a dma_resv lock. Note that this also pulls in the entire established locking hierarchy around dma_resv_lock() and dma_resv_unlock().
Drivers are allowed to call dma_fence_wait() from their shrinker callbacks. This means any code required for fence completion cannot allocate memory with GFP_KERNEL.
Drivers are allowed to call dma_fence_wait() from their mmu_notifier, respectively mmu_interval_notifier, callbacks. This means any code required for fence completion cannot allocate memory with GFP_NOFS or GFP_NOIO. Only GFP_ATOMIC is permissible, which might fail.
Note that only GPU drivers have a reasonable excuse for both requiring mmu_interval_notifier and shrinker callbacks at the same time as having to track asynchronous compute work using dma_fence. No driver outside of drivers/gpu should ever call dma_fence_wait() in such contexts.
DMA Fence Signalling Annotations
Proving correctness of all the kernel code around dma_fence through code review and testing is tricky for a few reasons:
It is a cross-driver contract, and therefore all drivers must follow the same rules for lock nesting order, calling contexts for various functions and anything else significant for in-kernel interfaces. But it is also impossible to test all drivers in a single machine, hence brute-force N vs. N testing of all combinations is impossible. Even just limiting to the possible combinations is infeasible.
There is an enormous amount of driver code involved. For render drivers there’s the tail of command submission, after fences are published, scheduler code, interrupt and workers to process job completion, and timeout, gpu reset and gpu hang recovery code. Plus for integration with core mm we have mmu_notifier, respectively mmu_interval_notifier, and shrinker. For modesetting drivers there’s the commit tail functions between when fences for an atomic modeset are published, and when the corresponding vblank completes, including any interrupt processing and related workers. Auditing all that code, across all drivers, is not feasible.
Due to how many other subsystems are involved and the locking hierarchies this pulls in there is extremely thin wiggle-room for driver-specific differences. dma_fence interacts with almost all of the core memory handling through page fault handlers via dma_resv, dma_resv_lock() and dma_resv_unlock(). On the other side it also interacts through all allocation sites through mmu_notifier and shrinker.
Furthermore lockdep does not handle cross-release dependencies, which means any deadlocks between dma_fence_wait() and dma_fence_signal() can’t be caught at runtime with some quick testing. The simplest example is one thread waiting on a dma_fence while holding a lock:
lock(A);
dma_fence_wait(B);
unlock(A);
while the other thread is stuck trying to acquire the same lock, which prevents it from signalling the fence the previous thread is stuck waiting on:
lock(A);
unlock(A);
dma_fence_signal(B);
By manually annotating all code relevant to signalling a dma_fence we can teach lockdep about these dependencies, which also helps with the validation headache since now lockdep can check all the rules for us:
cookie = dma_fence_begin_signalling();
lock(A);
unlock(A);
dma_fence_signal(B);
dma_fence_end_signalling(cookie);
For using dma_fence_begin_signalling() and dma_fence_end_signalling() to annotate critical sections the following rules need to be observed:
All code necessary to complete a dma_fence must be annotated, from the point where a fence is accessible to other threads, to the point where dma_fence_signal() is called. Un-annotated code can contain deadlock issues, and due to the very strict rules and many corner cases it is infeasible to catch these just with review or normal stress testing.
struct dma_resv deserves a special note, since the readers are only protected by rcu. This means the signalling critical section starts as soon as the new fences are installed, even before dma_resv_unlock() is called.
The only exceptions are fast paths and opportunistic signalling code, which calls dma_fence_signal() purely as an optimization, but is not required to guarantee completion of a dma_fence. The usual example is a wait IOCTL which calls dma_fence_signal(), while the mandatory completion path goes through a hardware interrupt and possible job completion worker.
To aid composability of code, the annotations can be freely nested, as long as the overall locking hierarchy is consistent. The annotations also work both in interrupt and process context. Due to implementation details this requires that callers pass an opaque cookie from dma_fence_begin_signalling() to dma_fence_end_signalling().
Validation against the cross driver contract is implemented by priming lockdep with the relevant hierarchy at boot-up. This means even just testing with a single device is enough to validate a driver, at least as far as deadlocks with dma_fence_wait() against dma_fence_signal() are concerned.
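Why nesting needs a cookie can be shown with a toy userspace model: the outermost begin opens the section and every nested begin only reports that one is already open, so the matching end knows whether it has anything to close. The names model_in_signalling, model_begin_signalling() and model_end_signalling() are hypothetical, and a plain flag stands in for the lockdep state the real annotations operate on.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "a signalling annotation is currently held". */
static bool model_in_signalling;

/* Returns the cookie: true means an outer section is already open. */
static bool model_begin_signalling(void)
{
	if (model_in_signalling)
		return true;

	model_in_signalling = true;
	return false;
}

/* Closes the section only if the matching begin actually opened it. */
static void model_end_signalling(bool cookie)
{
	if (cookie)
		return;	/* nested call: the outermost section stays open */

	model_in_signalling = false;
}
```

With this shape a helper can annotate its own signalling path without knowing whether its caller already did so.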
DMA Fences Functions Reference
Parameters
void
no arguments
Description
Return a stub fence which is already signaled. The fence’s timestamp corresponds to the first time after boot this function is called.
Parameters
void
no arguments
Description
Return a newly allocated and signaled stub fence.
u64 dma_fence_context_alloc(unsigned num)
allocate an array of fence contexts
Parameters
unsigned num
amount of contexts to allocate
Description
This function will return the first index of the number of fence contexts allocated. The fence context is used for setting dma_fence.context to a unique number by passing the context to dma_fence_init().
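Handing out a contiguous range of context numbers and returning its first index maps naturally onto an atomic fetch-and-add. The userspace sketch below uses C11 atomics; model_context_counter and model_context_alloc() are hypothetical names, and the starting value of 1 is an assumption of the model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter; the initial value is an assumption. */
static _Atomic uint64_t model_context_counter = 1;

/*
 * Reserve num consecutive context numbers and return the first one.
 * The caller owns the range [first, first + num).
 */
static uint64_t model_context_alloc(unsigned int num)
{
	assert(num > 0);
	return atomic_fetch_add(&model_context_counter, num);
}
```

A driver-like caller with three engines would call model_context_alloc(3) once and use first, first + 1 and first + 2 as the per-engine contexts.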
bool dma_fence_begin_signalling(void)
begin a critical DMA fence signalling section
Parameters
void
no arguments
Description
Drivers should use this to annotate the beginning of any code section required to eventually complete dma_fence by calling dma_fence_signal().
The end of these critical sections is annotated with dma_fence_end_signalling().
Return
Opaque cookie needed by the implementation, which needs to be passed to dma_fence_end_signalling().
void dma_fence_end_signalling(bool cookie)
end a critical DMA fence signalling section
Parameters
bool cookie
opaque cookie from
dma_fence_begin_signalling()
Description
Closes a critical section annotation opened by dma_fence_begin_signalling().
int dma_fence_signal_timestamp_locked(struct dma_fence *fence, ktime_t timestamp)
signal completion of a fence
Parameters
struct dma_fence *fence
the fence to signal
ktime_t timestamp
fence signal timestamp in kernel’s CLOCK_MONOTONIC time domain
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time. Set the timestamp provided as the fence signal timestamp.
Unlike dma_fence_signal_timestamp(), this function must be called with dma_fence.lock held.
Returns 0 on success and a negative error value when fence has been signalled already.
int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp)
signal completion of a fence
Parameters
struct dma_fence *fence
the fence to signal
ktime_t timestamp
fence signal timestamp in kernel’s CLOCK_MONOTONIC time domain
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time. Set the timestamp provided as the fence signal timestamp.
Returns 0 on success and a negative error value when fence has been signalled already.
Parameters
struct dma_fence *fence
the fence to signal
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time.
Unlike dma_fence_signal(), this function must be called with dma_fence.lock held.
Returns 0 on success and a negative error value when fence has been signalled already.
Parameters
struct dma_fence *fence
the fence to signal
Description
Signal completion for software callbacks on a fence, this will unblock dma_fence_wait() calls and run all the callbacks added with dma_fence_add_callback(). Can be called multiple times, but since a fence can only go from the unsignaled to the signaled state and not back, it will only be effective the first time.
Returns 0 on success and a negative error value when fence has been signalled already.
signed long dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
sleep until the fence gets signaled or until timeout elapses
Parameters
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
Description
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success. Other error values may be returned on custom implementations.
Performs a synchronous wait on this fence. It is assumed the caller directly or indirectly (buf-mgr between reservation and committing) holds a reference to the fence, otherwise the fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait() and dma_fence_wait_any_timeout().
Parameters
struct kref *kref
Description
This is the default release function for dma_fence. Drivers shouldn’t call this directly, but instead call dma_fence_put().
Parameters
struct dma_fence *fence
fence to release
Description
This is the default implementation for dma_fence_ops.release. It calls kfree_rcu() on fence.
Parameters
struct dma_fence *fence
the fence to enable
Description
This requests that sw signaling be enabled, to make the fence complete as soon as possible. This calls dma_fence_ops.enable_signaling internally.
int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb, dma_fence_func_t func)
add a callback to be called when the fence is signaled
Parameters
struct dma_fence *fence
the fence to wait on
struct dma_fence_cb *cb
the callback to register
dma_fence_func_t func
the function to call
Description
Add a software callback to the fence. The caller should keep a reference to the fence.
cb will be initialized by dma_fence_add_callback(), no initialization by the caller is required. Any number of callbacks can be registered to a fence, but a callback can only be registered to one fence at a time.
If fence is already signaled, this function will return -ENOENT (and not call the callback).
Note that the callback can be called from an atomic context or irq context.
Returns 0 in case of success, -ENOENT if the fence is already signaled and -EINVAL in case of error.
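The interplay between one-shot signalling and callback registration can be sketched in userspace as follows. All names (struct model_fence, struct model_cb, model_add_callback(), model_signal(), model_count_cb()) are hypothetical, there is no locking, and returning -EINVAL from a repeated signal is a choice of this model standing in for "a negative error value".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fence;
struct model_cb;
typedef void (*model_func_t)(struct model_fence *fence, struct model_cb *cb);

struct model_cb {
	struct model_cb *next;
	model_func_t func;
};

struct model_fence {
	bool signaled;
	struct model_cb *cb_list;
};

/* Registers cb; refuses with -ENOENT once the fence has signaled. */
static int model_add_callback(struct model_fence *fence, struct model_cb *cb,
			      model_func_t func)
{
	if (fence->signaled)
		return -ENOENT;

	cb->func = func;		/* cb is initialized here, not by the caller */
	cb->next = fence->cb_list;
	fence->cb_list = cb;
	return 0;
}

/* Signals once and runs every callback; later calls have no effect. */
static int model_signal(struct model_fence *fence)
{
	struct model_cb *cb;

	if (fence->signaled)
		return -EINVAL;

	fence->signaled = true;
	cb = fence->cb_list;
	fence->cb_list = NULL;
	while (cb) {
		struct model_cb *next = cb->next;

		cb->func(fence, cb);
		cb = next;
	}
	return 0;
}

/* Example callback used for illustration: counts invocations. */
static int model_calls;

static void model_count_cb(struct model_fence *fence, struct model_cb *cb)
{
	(void)fence;
	(void)cb;
	model_calls++;
}
```

A caller that receives -ENOENT knows the callback will never run and has to perform the follow-up work itself.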
Parameters
struct dma_fence *fence
the dma_fence to query
Description
This wraps dma_fence_get_status_locked() to return the error status condition on a signaled fence. See dma_fence_get_status_locked() for more details.
Returns 0 if the fence has not yet been signaled, 1 if the fence has been signaled without an error condition, or a negative error code if the fence has been completed with an error.
bool dma_fence_remove_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
remove a callback from the signaling list
Parameters
struct dma_fence *fence
the fence to wait on
struct dma_fence_cb *cb
the callback to remove
Description
Remove a previously queued callback from the fence. This function returns true if the callback is successfully removed, or false if the fence has already been signaled.
WARNING: Cancelling a callback should only be done if you really know what you’re doing, since deadlocks and race conditions could occur all too easily. For this reason, it should only ever be done on hardware lockup recovery, with a reference held to the fence.
Behaviour is undefined if cb has not been added to fence using dma_fence_add_callback() beforehand.
signed long dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
default sleep until the fence gets signaled or until timeout elapses
Parameters
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
Description
Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success. If timeout is zero, the value 1 is returned if the fence is already signaled, for consistency with other functions taking a jiffies timeout.
-
signed long dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, bool intr, signed long timeout, uint32_t *idx)¶
sleep until any fence gets signaled or until timeout elapses
Parameters
struct dma_fence **fences
array of fences to wait on
uint32_t count
number of fences to wait on
bool intr
if true, do an interruptible wait
signed long timeout
timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
uint32_t *idx
used to store the first signaled fence index, meaningful only on positive return
Description
Returns -EINVAL on custom fence wait implementation, -ERESTARTSYS if interrupted, 0 if the wait timed out, or the remaining timeout in jiffies on success.
Synchronously waits for the first fence in the array to be signaled. The caller needs to hold a reference to all fences in the array, otherwise a fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait() and dma_fence_wait_timeout().
-
void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops, spinlock_t *lock, u64 context, u64 seqno)¶
Initialize a custom fence.
Parameters
struct dma_fence *fence
the fence to initialize
const struct dma_fence_ops *ops
the dma_fence_ops for operations on this fence
spinlock_t *lock
the irqsafe spinlock to use for locking this fence
u64 context
the execution context this fence is run on
u64 seqno
a linear increasing sequence number for this context
Description
Initializes an allocated fence. The caller doesn’t have to keep its refcount after committing with this fence, but it will need to hold a refcount again if dma_fence_ops.enable_signaling gets called.
context and seqno are used for easy comparison between fences, making it possible to check which fence is later by simply using dma_fence_later().
-
struct dma_fence¶
software synchronization primitive
Definition
struct dma_fence {
    spinlock_t *lock;
    const struct dma_fence_ops *ops;
    union {
        struct list_head cb_list;
        ktime_t timestamp;
        struct rcu_head rcu;
    };
    u64 context;
    u64 seqno;
    unsigned long flags;
    struct kref refcount;
    int error;
};
Members
lock
spin_lock_irqsave used for locking
ops
dma_fence_ops associated with this fence
{unnamed_union}
anonymous
cb_list
list of all callbacks to call
timestamp
Timestamp when the fence was signaled.
rcu
used for releasing fence with kfree_rcu
context
execution context this fence belongs to, returned by dma_fence_context_alloc()
seqno
the sequence number of this fence inside the execution context, can be compared to decide which fence would be signaled later.
flags
A mask of DMA_FENCE_FLAG_* defined below
refcount
refcount for this fence
error
Optional, only valid if < 0, must be set before calling dma_fence_signal, indicates that the fence has completed with an error.
Description
The flags member must be manipulated and read using the appropriate atomic ops (bit_*), so taking the spinlock will not be needed most of the time.
DMA_FENCE_FLAG_SIGNALED_BIT - fence is already signaled
DMA_FENCE_FLAG_TIMESTAMP_BIT - timestamp recorded for fence signaling
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called
DMA_FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the implementer of the fence for its own purposes. Can be used in different ways by different fence implementers, so do not rely on this.
Since atomic bitops are used, DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT being set does not guarantee that enable_signaling has been called. In particular, if dma_fence_signal was called right before this bit was set, it would have been able to set DMA_FENCE_FLAG_SIGNALED_BIT before enable_signaling was called. Adding a check for DMA_FENCE_FLAG_SIGNALED_BIT after setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that after dma_fence_signal was called, any enable_signaling call will have either been completed, or never called at all.
-
struct dma_fence_cb¶
callback for dma_fence_add_callback()
Definition
struct dma_fence_cb {
    struct list_head node;
    dma_fence_func_t func;
};
Members
node
used by dma_fence_add_callback() to append this struct to fence::cb_list
func
dma_fence_func_t to call
Description
This struct will be initialized by dma_fence_add_callback(), additional data can be passed along by embedding dma_fence_cb in another struct.
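The embedding pattern mentioned above can be sketched in plain userspace C. The types below (struct model_fence, struct model_fence_cb, model_container_of(), struct my_waiter) are hypothetical stand-ins; in a driver the kernel's struct dma_fence_cb and container_of() would take their place. The callback type mirrors the shape of dma_fence_func_t, which receives the fence and the callback structure.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the shape matters here. */
struct model_fence {
    int signaled;
};

struct model_fence_cb;
typedef void (*model_fence_func_t)(struct model_fence *fence,
                                   struct model_fence_cb *cb);

struct model_fence_cb {
    model_fence_func_t func;
};

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver structure carrying extra data for the callback. */
struct my_waiter {
    int completed_jobs;
    struct model_fence_cb cb; /* embedded, not a pointer */
};

/* The callback only receives cb and recovers the enclosing my_waiter. */
static void my_waiter_done(struct model_fence *fence,
                           struct model_fence_cb *cb)
{
    struct my_waiter *w = model_container_of(cb, struct my_waiter, cb);

    (void)fence;
    w->completed_jobs++;
}
```

Because the callback structure lives inside the caller's own structure, no separate allocation or lookup table is needed to find the per-waiter data.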
-
struct dma_fence_ops¶
operations implemented for fence
Definition
struct dma_fence_ops {
    bool use_64bit_seqno;
    const char * (*get_driver_name)(struct dma_fence *fence);
    const char * (*get_timeline_name)(struct dma_fence *fence);
    bool (*enable_signaling)(struct dma_fence *fence);
    bool (*signaled)(struct dma_fence *fence);
    signed long (*wait)(struct dma_fence *fence, bool intr, signed long timeout);
    void (*release)(struct dma_fence *fence);
    void (*fence_value_str)(struct dma_fence *fence, char *str, int size);
    void (*timeline_value_str)(struct dma_fence *fence, char *str, int size);
};
Members
use_64bit_seqno
True if this dma_fence implementation uses 64bit seqno, false otherwise.
get_driver_name
Returns the driver name. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.
This callback is mandatory.
get_timeline_name
Return the name of the context this fence belongs to. This is a callback to allow drivers to compute the name at runtime, without having to store it permanently for each fence, or build a cache of some sort.
This callback is mandatory.
enable_signaling
Enable software signaling of fence.
For fence implementations that have the capability for hw->hw signaling, they can implement this op to enable the necessary interrupts, or insert commands into cmdstream, etc, to avoid these costly operations for the common case where only hw->hw synchronization is required. This is called in the first dma_fence_wait() or dma_fence_add_callback() path to let the fence implementation know that there is another driver waiting on the signal (i.e. hw->sw case).
This function can be called from atomic context, but not from irq context, so normal spinlocks can be used.
A return value of false indicates the fence already passed, or some failure occurred that made it impossible to enable signaling. True indicates successful enabling.
dma_fence.error may be set in enable_signaling, but only when false is returned.
Since many implementations can call dma_fence_signal() even before enable_signaling has been called, there’s a race window where the dma_fence_signal() might result in the final fence reference being released and its memory freed. To avoid this, implementations of this callback should grab their own reference using dma_fence_get(), to be released when the fence is signalled (through e.g. the interrupt handler).
This callback is optional. If this callback is not present, then the driver must always have signaling enabled.
signaled
Peek whether the fence is signaled, as a fastpath optimization for e.g. dma_fence_wait() or dma_fence_add_callback(). Note that this callback does not need to make any guarantees beyond that a fence once indicated as signalled must always return true from this callback. This callback may return false even if the fence has completed already; in this case information hasn’t propagated through the system yet. See also dma_fence_is_signaled().
May set dma_fence.error if returning true.
This callback is optional.
wait
Custom wait implementation, defaults to dma_fence_default_wait() if not set.
Deprecated and should not be used by new implementations. Only used by existing implementations which need special handling for their hardware reset procedure.
Must return -ERESTARTSYS if intr is true and the wait was interrupted, the remaining jiffies if the fence has signaled, or 0 if the wait timed out. Can also return other error values on custom implementations, which should be treated as if the fence is signaled. For example a hardware lockup could be reported like that.
release
Called on destruction of fence to release additional resources. Can be called from irq context. This callback is optional. If it is NULL, then dma_fence_free() is instead called as the default implementation.
fence_value_str
Callback to fill in free-form debug info specific to this fence, like the sequence number.
This callback is optional.
timeline_value_str
Fills in the current value of the timeline as a string, like the sequence number. Note that the specific fence passed to this function should not matter, drivers should only use it to look up the corresponding timeline structures.
Parameters
struct dma_fence *fence
fence to reduce refcount of
Parameters
struct dma_fence *fence
fence to increase refcount of
Description
Returns the same fence, with refcount increased by 1.
-
struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)¶
get a fence from a dma_resv_list with rcu read lock
Parameters
struct dma_fence *fence
fence to increase refcount of
Description
Function returns NULL if no refcount could be obtained, or the fence.
-
struct dma_fence *dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)¶
acquire a reference to an RCU tracked fence
Parameters
struct dma_fence __rcu **fencep
pointer to fence to increase refcount of
Description
Function returns NULL if no refcount could be obtained, or the fence. This function handles acquiring a reference to a fence that may be reallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU), so long as the caller is using RCU on the pointer to the fence.
An alternative mechanism is to employ a seqlock to protect a bunch of fences, such as used by struct dma_resv. When using a seqlock, the seqlock must be taken before and checked after a reference to the fence is acquired (as shown here).
The caller is required to hold the RCU read lock.
-
bool dma_fence_is_signaled_locked(struct dma_fence *fence)¶
Return an indication if the fence is signaled yet.
Parameters
struct dma_fence *fence
the fence to check
Description
Returns true if the fence was already signaled, false if not. Since this function doesn’t enable signaling, it is not guaranteed to ever return true if dma_fence_add_callback(), dma_fence_wait() or dma_fence_enable_sw_signaling() haven’t been called before.
This function requires dma_fence.lock to be held.
See also dma_fence_is_signaled().
-
bool dma_fence_is_signaled(struct dma_fence *fence)¶
Return an indication if the fence is signaled yet.
Parameters
struct dma_fence *fence
the fence to check
Description
Returns true if the fence was already signaled, false if not. Since this function doesn’t enable signaling, it is not guaranteed to ever return true if dma_fence_add_callback(), dma_fence_wait() or dma_fence_enable_sw_signaling() haven’t been called before.
It’s recommended for seqno fences to call dma_fence_signal when the operation is complete. This makes it possible to prevent issues from wraparound between time of issue and time of use by checking the return value of this function before calling hardware-specific wait instructions.
See also dma_fence_is_signaled_locked().
-
bool __dma_fence_is_later(u64 f1, u64 f2, const struct dma_fence_ops *ops)¶
return if f1 is chronologically later than f2
Parameters
u64 f1
the first fence’s seqno
u64 f2
the second fence’s seqno from the same context
const struct dma_fence_ops *ops
dma_fence_ops associated with the seqno
Description
Returns true if f1 is chronologically later than f2. Both fences must be from the same context, since a seqno is not common across contexts.
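A minimal userspace sketch of such a comparison follows. It assumes that implementations without 64-bit sequence numbers compare only the lower 32 bits using a signed difference, which tolerates wraparound; model_seqno_is_later() and its boolean parameter are hypothetical stand-ins for the function above and for dma_fence_ops.use_64bit_seqno.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of a seqno comparison: a plain 64-bit compare when the fence
 * implementation uses 64-bit sequence numbers, otherwise a compare of the
 * lower 32 bits that still gives the expected answer across a wraparound.
 */
static bool model_seqno_is_later(uint64_t f1, uint64_t f2,
                                 bool use_64bit_seqno)
{
    if (use_64bit_seqno)
        return f1 > f2;

    /*
     * The unsigned subtraction wraps modulo 2^32; interpreting the result
     * as signed (two's complement, as on all common compilers) yields a
     * small positive value when f1 is slightly ahead of f2.
     */
    return (int32_t)((uint32_t)f1 - (uint32_t)f2) > 0;
}
```

With 32-bit sequence numbers, a seqno of 2 issued shortly after 0xfffffffe is treated as later, whereas the same two values compared as 64-bit numbers are ordered numerically.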
-
bool dma_fence_is_later(struct dma_fence *f1, struct dma_fence *f2)¶
return if f1 is chronologically later than f2
Parameters
struct dma_fence *f1
the first fence from the same context
struct dma_fence *f2
the second fence from the same context
Description
Returns true if f1 is chronologically later than f2. Both fences must be from the same context, since a seqno is not re-used across contexts.
-
struct dma_fence *dma_fence_later(struct dma_fence *f1, struct dma_fence *f2)¶
return the chronologically later fence
Parameters
struct dma_fence *f1
the first fence from the same context
struct dma_fence *f2
the second fence from the same context
Description
Returns NULL if both fences are signaled, otherwise the fence that would be signaled last. Both fences must be from the same context, since a seqno is not re-used across contexts.
Parameters
struct dma_fence *fence
the dma_fence to query
Description
Drivers can supply an optional error status condition before they signal the fence (to indicate whether the fence was completed due to an error rather than success). The value of the status condition is only valid if the fence has been signaled, so dma_fence_get_status_locked() first checks the signal state before reporting the error status.
Returns 0 if the fence has not yet been signaled, 1 if the fence has been signaled without an error condition, or a negative error code if the fence has been completed with an error.
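The three-way result described above can be modelled in a few lines of userspace C. This is a sketch only: struct model_fence and model_fence_get_status() are hypothetical stand-ins that capture the documented return values, without the locking or signaled-bit handling of the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the state relevant to the status. */
struct model_fence {
    bool signaled;
    int error; /* only meaningful if < 0 */
};

/*
 * Returns 0 while the fence is unsignaled, 1 once it signaled without an
 * error, or the negative error code it completed with.
 */
static int model_fence_get_status(const struct model_fence *fence)
{
    if (!fence->signaled)
        return 0; /* error is not valid yet, so it is not reported */

    return fence->error < 0 ? fence->error : 1;
}
```

Note that an error stored on a fence that has not signaled yet is not reported: the signal state is checked first.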
Parameters
struct dma_fence *fence
the dma_fence
int error
the error to store
Description
Drivers can supply an optional error status condition before they signal the fence, to indicate that the fence was completed due to an error rather than success. This must be set before signaling (so that the value is visible before any waiters on the signal callback are woken). This helper exists to help catch erroneous setting of dma_fence.error.
Parameters
struct dma_fence *fence
the fence to wait on
bool intr
if true, do an interruptible wait
Description
This function will return -ERESTARTSYS if interrupted by a signal, or 0 if the fence was signaled. Other error values may be returned on custom implementations.
Performs a synchronous wait on this fence. It is assumed the caller directly or indirectly holds a reference to the fence, otherwise the fence might be freed before return, resulting in undefined behavior.
See also dma_fence_wait_timeout() and dma_fence_wait_any_timeout().
DMA Fence Array¶
-
struct dma_fence_array *dma_fence_array_create(int num_fences, struct dma_fence **fences, u64 context, unsigned seqno, bool signal_on_any)¶
Create a custom fence array
Parameters
int num_fences
[in] number of fences to add in the array
struct dma_fence **fences
[in] array containing the fences
u64 context
[in] fence context to use
unsigned seqno
[in] sequence number to use
bool signal_on_any
[in] signal on any fence in the array
Description
Allocate a dma_fence_array object and initialize the base fence with dma_fence_init().
In case of error it returns NULL.
The caller should allocate the fences array with num_fences size and fill it with the fences it wants to add to the object. Ownership of this array is taken and dma_fence_put() is used on each fence on release.
If signal_on_any is true the fence array signals if any fence in the array signals, otherwise it signals when all fences in the array signal.
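One way to picture the signal_on_any behaviour is a pending counter that starts at 1 for "any" and at num_fences for "all", and is decremented each time a member fence signals. The userspace model below is a sketch under that assumption; struct model_fence_array and its helpers are hypothetical names, and the real structure uses an atomic counter and deferred work rather than a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bookkeeping of a fence array. */
struct model_fence_array {
    unsigned int num_fences;
    unsigned int num_pending;
    bool signaled;
};

static void model_array_init(struct model_fence_array *a,
                             unsigned int num_fences, bool signal_on_any)
{
    a->num_fences = num_fences;
    /* "any": one member is enough; "all": every member must signal. */
    a->num_pending = signal_on_any ? 1 : num_fences;
    a->signaled = false;
}

/* Called once for each member fence that signals. */
static void model_array_member_signaled(struct model_fence_array *a)
{
    if (a->signaled)
        return;
    if (a->num_pending > 0 && --a->num_pending == 0)
        a->signaled = true;
}
```

With three member fences, the "all" variant signals after the third call, while the "any" variant signals after the first.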
-
bool dma_fence_match_context(struct dma_fence *fence, u64 context)¶
Check if all fences are from the given context
Parameters
struct dma_fence *fence
[in] fence or fence array
u64 context
[in] fence context to check all fences against
Description
Checks the provided fence or, for a fence array, all fences in the array against the given context. Returns false if any fence is from a different context.
-
struct dma_fence_array_cb¶
callback helper for fence array
Definition
struct dma_fence_array_cb {
    struct dma_fence_cb cb;
    struct dma_fence_array *array;
};
Members
cb
fence callback structure for signaling
array
reference to the parent fence array object
-
struct dma_fence_array¶
fence to represent an array of fences
Definition
struct dma_fence_array {
    struct dma_fence base;
    spinlock_t lock;
    unsigned num_fences;
    atomic_t num_pending;
    struct dma_fence **fences;
    struct irq_work work;
};
Members
base
fence base class
lock
spinlock for fence handling
num_fences
number of fences in the array
num_pending
fences in the array still pending
fences
array of the fences
work
internal irq_work function
Parameters
struct dma_fence *fence
fence to test
Description
Return true if it is a dma_fence_array and false otherwise.
-
struct dma_fence_array *to_dma_fence_array(struct dma_fence *fence)¶
cast a fence to a dma_fence_array
Parameters
struct dma_fence *fence
fence to cast to a dma_fence_array
Description
Returns NULL if the fence is not a dma_fence_array, or the dma_fence_array otherwise.
DMA Fence Chain¶
Parameters
struct dma_fence *fence
current chain node
Description
Walk the chain to the next node. Returns the next fence or NULL if we are at the end of the chain. Garbage collects chain nodes which are already signaled.
-
int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)¶
find fence chain node by seqno
Parameters
struct dma_fence **pfence
pointer to the chain node where to start
uint64_t seqno
the sequence number to search for
Description
Advance the fence pointer to the chain node which will signal this sequence number. If no sequence number is provided then this is a no-op.
Returns -EINVAL if the fence is not a chain node or the sequence number has not yet advanced far enough.
-
void dma_fence_chain_init(struct dma_fence_chain *chain, struct dma_fence *prev, struct dma_fence *fence, uint64_t seqno)¶
initialize a fence chain
Parameters
struct dma_fence_chain *chain
the chain node to initialize
struct dma_fence *prev
the previous fence
struct dma_fence *fence
the current fence
uint64_t seqno
the sequence number to use for the fence chain
Description
Initialize a new chain node and either start a new chain or add the node to the existing chain of the previous fence.
-
struct dma_fence_chain¶
fence to represent a node of a fence chain
Definition
struct dma_fence_chain {
    struct dma_fence base;
    struct dma_fence __rcu *prev;
    u64 prev_seqno;
    struct dma_fence *fence;
    union {
        struct dma_fence_cb cb;
        struct irq_work work;
    };
    spinlock_t lock;
};
Members
base
fence base class
prev
previous fence of the chain
prev_seqno
original previous seqno before garbage collection
fence
encapsulated fence
{unnamed_union}
anonymous
cb
callback for signaling
This is used to add the callback for signaling the completion of the fence chain. Never used at the same time as the irq work.
work
irq work item for signaling
Irq work structure to allow us to add the callback without running into lock inversion. Never used at the same time as the callback.
lock
spinlock for fence handling
-
struct dma_fence_chain *to_dma_fence_chain(struct dma_fence *fence)¶
cast a fence to a dma_fence_chain
Parameters
struct dma_fence *fence
fence to cast to a dma_fence_chain
Description
Returns NULL if the fence is not a dma_fence_chain, or the dma_fence_chain otherwise.
-
struct dma_fence_chain *dma_fence_chain_alloc(void)¶
Parameters
void
no arguments
Description
Returns a new struct dma_fence_chain object or NULL on failure.
-
void dma_fence_chain_free(struct dma_fence_chain *chain)¶
Parameters
struct dma_fence_chain *chain
chain node to free
Description
Frees up an allocated but not used struct dma_fence_chain object. This doesn’t need an RCU grace period since the fence was never initialized nor published. After dma_fence_chain_init() has been called the fence must be released by calling dma_fence_put(), and not through this function.
-
dma_fence_chain_for_each¶
dma_fence_chain_for_each (iter, head)
iterate over all fences in chain
Parameters
iter
current fence
head
starting point
Description
Iterate over all fences in the chain. We keep a reference to the current fence while inside the loop which must be dropped when breaking out.
DMA Fence uABI/Sync File¶
Parameters
struct dma_fence *fence
fence to add to the sync_fence
Description
Creates a sync_file containing fence. This function acquires an additional reference of fence for the newly-created sync_file, if it succeeds. The sync_file can be released with fput(sync_file->file). Returns the sync_file or NULL in case of error.
Parameters
int fd
sync_file fd to get the fence from
Description
Ensures fd references a valid sync_file and returns a fence that represents all fences in the sync_file. On error NULL is returned.
-
struct sync_file¶
sync file to export to the userspace
Definition
struct sync_file {
    struct file *file;
    char user_name[32];
#ifdef CONFIG_DEBUG_FS
    struct list_head sync_file_list;
#endif
    wait_queue_head_t wq;
    unsigned long flags;
    struct dma_fence *fence;
    struct dma_fence_cb cb;
};
Members
file
file representing this fence
user_name
Name of the sync file provided by userspace, for merged fences. Otherwise generated through driver callbacks (in which case the entire array is 0).
sync_file_list
membership in global file list
wq
wait queue for fence signaling
flags
flags for the sync_file
fence
fence with the fences in the sync_file
cb
fence callback information
Description
flags: POLL_ENABLED: whether userspace is currently poll()’ing or not
Indefinite DMA Fences¶
At various times struct dma_fence with an indefinite time until dma_fence_wait() finishes have been proposed. Examples include:
Future fences, used in HWC1 to signal when a buffer isn’t used by the display any longer, and created with the screen update that makes the buffer visible. The time this fence completes is entirely under userspace’s control.
Proxy fences, proposed to handle drm_syncobj for which the fence has not yet been set. Used to asynchronously delay command submission.
Userspace fences or gpu futexes, fine-grained locking within a command buffer that userspace uses for synchronization across engines or with the CPU, which are then imported as a DMA fence for integration into existing winsys protocols.
Long-running compute command buffers, while still using traditional end of batch DMA fences for memory management instead of context preemption DMA fences which get reattached when the compute job is rescheduled.
Common to all these schemes is that userspace controls the dependencies of these fences and controls when they fire. Mixing indefinite fences with normal in-kernel DMA fences does not work, even when a fallback timeout is included to protect against malicious userspace:
Only the kernel knows about all DMA fence dependencies, userspace is not aware of dependencies injected due to memory management or scheduler decisions.
Only userspace knows about all dependencies in indefinite fences and when exactly they will complete, the kernel has no visibility.
Furthermore the kernel has to be able to hold up userspace command submission for memory management needs, which means we must support indefinite fences being dependent upon DMA fences. If the kernel also supports indefinite fences in the kernel like a DMA fence, like any of the above proposals would, there is the potential for deadlocks.
This means that the kernel might accidentally create deadlocks through memory management dependencies which userspace is unaware of, which randomly hangs workloads until the timeout kicks in, even though from userspace’s perspective these workloads do not contain a deadlock. In such a mixed fencing architecture there is no single entity with knowledge of all dependencies. Therefore preventing such deadlocks from within the kernel is not possible.
The only solution to avoid dependency loops is by not allowing indefinite fences in the kernel. This means:
No future fences, proxy fences or userspace fences imported as DMA fences, with or without a timeout.
No DMA fences that signal end of batchbuffer for command submission where userspace is allowed to use userspace fencing or long running compute workloads. This also means no implicit fencing for shared buffers in these cases.
Recoverable Hardware Page Faults Implications¶
Modern hardware supports recoverable page faults, which has a lot of implications for DMA fences.
First, a pending page fault obviously holds up the work that’s running on the accelerator and a memory allocation is usually required to resolve the fault. But memory allocations are not allowed to gate completion of DMA fences, which means any workload using recoverable page faults cannot use DMA fences for synchronization. Synchronization fences controlled by userspace must be used instead.
On GPUs this poses a problem, because current desktop compositor protocols on Linux rely on DMA fences, which means without an entirely new userspace stack built on top of userspace fences, they cannot benefit from recoverable page faults. Specifically this means implicit synchronization will not be possible. The exception is when page faults are only used as migration hints and never to on-demand fill a memory request. For now this means recoverable page faults on GPUs are limited to pure compute workloads.
Furthermore GPUs usually have shared resources between the 3D rendering and compute side, like compute units or command submission engines. If both a 3D job with a DMA fence and a compute workload using recoverable page faults are pending they could deadlock:
The 3D workload might need to wait for the compute job to finish and release hardware resources first.
The compute workload might be stuck in a page fault, because the memory allocation is waiting for the DMA fence of the 3D workload to complete.
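The circular wait described by these two points can be made concrete with a tiny wait-for graph. The sketch below is purely illustrative and not part of any kernel API: each entity waits on at most one other entity, and model_waits_in_cycle() (a hypothetical helper) follows those edges to see whether a wait chain leads back to its starting point.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_WAIT (-1)

/*
 * waits_on[i] is the index of the single entity that entity i waits for,
 * or MODEL_NO_WAIT. Returns true if following the edges from start ever
 * comes back to start, i.e. start is part of a deadlock cycle.
 */
static bool model_waits_in_cycle(const int *waits_on, int count, int start)
{
    int cur = waits_on[start];
    int steps;

    /* A cycle through start closes within count steps, if it exists. */
    for (steps = 0; steps < count && cur != MODEL_NO_WAIT; steps++) {
        if (cur == start)
            return true;
        cur = waits_on[cur];
    }
    return false;
}
```

Modelling the 3D job, the compute job and the memory allocation as three entities, the edges 3D job -> compute job -> memory allocation -> 3D job form a cycle. Removing the first edge, for example by giving the 3D job its own hardware resources, breaks it.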
There are a few options to prevent this problem, one of which drivers need to ensure:
Compute workloads can always be preempted, even when a page fault is pending and not yet repaired. Not all hardware supports this.
DMA fence workloads and workloads which need page fault handling have independent hardware resources to guarantee forward progress. This could be achieved e.g. through dedicated engines and minimal compute unit reservations for DMA fence workloads.
The reservation approach could be further refined by only reserving the hardware resources for DMA fence workloads when they are in-flight. This must cover the time from when the DMA fence is visible to other threads up to the moment when the fence is completed through dma_fence_signal().
As a last resort, if the hardware provides no useful reservation mechanics, all workloads must be flushed from the GPU when switching between jobs requiring DMA fences or jobs requiring page fault handling: This means all DMA fences must complete before a compute job with page fault handling can be inserted into the scheduler queue. And vice versa, before a DMA fence can be made visible anywhere in the system, all compute workloads must be preempted to guarantee all pending GPU page faults are flushed.
A fairly theoretical option would be to untangle these dependencies when allocating memory to repair hardware page faults, either through separate memory blocks or runtime tracking of the full dependency graph of all DMA fences. This would have a very wide impact on the kernel, since resolving the page on the CPU side can itself involve a page fault. It is much more feasible and robust to limit the impact of handling hardware page faults to the specific driver.
Note that workloads that run on independent hardware like copy engines or other GPUs do not have any impact. This allows us to keep using DMA fences internally in the kernel even for resolving hardware page faults, e.g. by using copy engines to clear or copy memory needed to resolve the page fault.
In some ways this page fault problem is a special case of the Indefinite DMA Fences discussions: Indefinite fences from compute workloads are allowed to depend on DMA fences, but not the other way around. And not even the page fault problem is new, because some other CPU thread in userspace might hit a page fault which holds up a userspace fence; supporting page faults on GPUs doesn’t add anything fundamentally new.