Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (synced 2025-01-09 14:43:16 +00:00)
Merge tag 'drm-intel-next-2019-11-01-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
UAPI Changes:

- Make context persistence optional
  Allow userspace to tie the context lifetime to FD lifetime, effectively
  allowing Ctrl-C killing of a process to also clean up the hardware
  immediately.
  Compute changes: https://github.com/intel/compute-runtime/pull/228
  The compute driver is shipping in Ubuntu. uAPI acked by Mesa folks.

- Put future HW and their uAPIs under STAGING & BROKEN
  Introduces the DRM_I915_UNSTABLE Kconfig menu for working on the new uAPI
  for future HW in upstream. We already disable driver loading by default
  until the platform is deemed ready; this is a second level of protection
  based on a compile-time switch (STAGING & BROKEN).

- Under DRM_I915_UNSTABLE: Add the fake lmem region on iGFX
  Fake local memory region on integrated GPU through cmdline:
  memmap=2G$16G i915.fake_lmem_start=0x400000000
  Currently allows testing non-mappable GGTT behavior and running the kernel
  selftests for local memory.

Driver Changes:

- Fix Bugzilla #112084: VGA external monitor not working (Ville)
- Add support for half float framebuffers (Ville)
- Add perf support on TGL (Lionel)
- Replace hangcheck by heartbeats (Chris)
- Allow SPT PCH on all AML devices (James)
- Add new CNL PCH for CML platform (Imre)
- Allow 100 ms (Kconfig) for workloads to exit before reset (Chris, Jon, Joonas)
- Forcibly pre-empt a context after 100 ms (Kconfig) of delay (Chris)
- Make timeslice duration Kconfig configurable (Chris)
- Whitelist PS_(DEPTH|INVOCATION)_COUNT for Tigerlake (Tapani)
- Support creating LMEM objects in kernel (Matt A)
- Adjust the location of RING_MI_MODE in the context image for TGL (Chris)
- Handle AUX interrupts for TC ports (Matt R)
- Add support for devices without mappable GGTT aperture (Daniele)
- Rename "inject_load_failure" module parameter to "inject_probe_failure" (Janusz)
- Handle fused off HDCP, FBC, DMC and DSC (Jose)
- Add support to one DP-MST stream on Tigerlake (Lucas)
- Add HuC firmware (and GuC) for TGL (Daniele)
- Allow ICL+ DSI on any pipe (Ville)
- Check some transcoder timing minimum limits (Ville)
- Don't set queue_priority_hint if we don't kick the submission (Chris)
- Introduce barrier pulses along engines to flush idle/in-flight requests (Chris)
- Drop assertion that ce->pin_mutex guards state updates (Chris)
- Cancel banned contexts on schedule-out (Chris)
- Cancel contexts when hangchecking is disabled (Chris)
- Catch GTT fault errors for gen11+ planes (Matt R)
- Print in debugfs if PSR is not enabled because of sink (Jose)
- Do not set MOCS control values on dgfx (Lucas)
- Setup io-mapping for LMEM (Abdiel)
- Support kernel mapping of LMEM objects (Abdiel)
- Add LMEM selftests (Matt A)
- Initialise PMU spinlock before registering (Chris)
- Clear DKL_TX_PMD_LANE_SUS before programming TC voltage swing (Jose)
- Flip interpretation of ips fmin/fmax to max rps (Chris)
- Add VBT compression parameter block definition (Jani)
- Limit the blitter sizes to ensure low preemption latency (Chris)
- Fixup block_size rounding on BLT (Matt A)
- Don't try to place HWS in non-existing mappable region (Michal Wa)
- Don't allocate the ring in stolen if we lack aperture (Matt A)
- Add AUX B & C to DC_OFF_POWER_DOMAINS for Tigerlake (Matt R)
- Avoid HPD poll detect triggering a new detect cycle (Imre)
- Document the userspace fail with possible_crtcs (Ville)
- Drop lrc header page now unused by GuC (Daniele)
- Do not switch aux to TBT mode for non-TC ports (Jose)
- Restructure code to avoid depending on i915 but smaller structs (Chris, Tvrtko, Andi)
- Remove pm park/unpark notifications (Chris)
- Avoid lockdep cross-contamination between object types (Chris)
- Restructure DSC code (Jani)
- Fix deadlock in early workload shadow (Zhenyu)
- Split the legacy submission backend from the common CS ring buffer (Chris)
- Move intel_engine_context_in/out into intel_lrc.c (Tvrtko)
- Describe perf/wakeref structure members in documentation (Anna)
- Update renamed header file names in documentation (Anna)
- Add debugs to distinguish a cd2x update from a full cdclk pll update (Ville)
- Rework atomic global state locking (Ville)
- Allow planes to declare their minimum acceptable cdclk (Ville)
- Eliminate skl_check_pipe_max_pixel_rate() and simplify skl_max_scale() (Ville)
- Making loglevel of PSR2/SU logs same (Ap)
- Capture aux page table error register (Lionel)
- Add is_dgfx to device info (Jose)
- Split gen11_irq_handler to make it shareable (Lucas)
- Encapsulate kconfig constant values inside boolean predicates (Chris)
- Split memory_region initialisation into its own file (Chris)
- Use _PICK() for CHICKEN_TRANS() and add CHICKEN_TRANS_D (Ville)
- Add perf helper macros for comparing with whitelisted registers (Umesh)
- Fix i915_inject_load_error() name to read *_probe_* (Janusz)
- Drop unused AUX register offsets (Matt R)
- Provide more information on DP AUX failures (Matt R)
- Add GAM/SFC instdone to error state (Mika)
- Always track callers to intel_rps_mark_interactive() (Chris)
- Nuke 'mode' argument to intel_get_load_detect_pipe() (Ville)
- Simplify LVDS crtc_mask and pipe_mask setup (Ville)
- Stop frobbing crtc->base.mode (Ville)
- Do s/crtc_mask/pipe_mask/ (Ville)
- Split detaching and removing the vma (Chris)
- Selftest improvements (Chris, Tvrtko, Mika, Matt A, Lionel)
- GuC code improvements (Rob, Andi, Daniele)
- Check against i915_selftest only under CONFIG_SELFTEST (Chris)
- Refine occupancy test in kill_context() (Chris)
- Start kthreads before stopping (Chris)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101104718.GA14323@jlahtine-desk.ger.corp.intel.com
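For context on the first UAPI item above, here is a minimal userspace sketch (illustrative only, not part of this tag). It assumes the I915_CONTEXT_PARAM_PERSISTENCE context parameter introduced by the "Make context persistence optional" work, plus an already-open i915 DRM fd and a context id obtained elsewhere. Clearing persistence ties the context's lifetime to the fd, so killing the owning process also cleans up its work on the GPU.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>  /* assumes a uAPI header that already defines the new param */

/* Illustrative helper: opt a context out of persistence so its requests are
 * cancelled when the DRM file descriptor (e.g. the owning process) goes away. */
static int i915_context_set_nonpersistent(int drm_fd, uint32_t ctx_id)
{
        struct drm_i915_gem_context_param arg = {
                .ctx_id = ctx_id,
                .param = I915_CONTEXT_PARAM_PERSISTENCE,
                .value = 0,  /* 0: do not keep the context alive past fd close */
        };

        return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}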
This commit is contained in: commit 2ef4144d1e
@@ -550,9 +550,9 @@ i915 Perf Stream
This section covers the stream-semantics-agnostic structures and functions
for representing an i915 perf stream FD and associated file operations.

.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
.. kernel-doc:: drivers/gpu/drm/i915/i915_perf_types.h
   :functions: i915_perf_stream
.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
.. kernel-doc:: drivers/gpu/drm/i915/i915_perf_types.h
   :functions: i915_perf_stream_ops

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
@@ -577,7 +577,7 @@ for representing an i915 perf stream FD and associated file operations.
i915 Perf Observation Architecture Stream
-----------------------------------------

.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.h
.. kernel-doc:: drivers/gpu/drm/i915/i915_perf_types.h
   :functions: i915_oa_ops

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
@@ -148,3 +148,9 @@ menu "drm/i915 Profile Guided Optimisation"
        depends on DRM_I915
        source "drivers/gpu/drm/i915/Kconfig.profile"
endmenu

menu "drm/i915 Unstable Evolution"
        visible if EXPERT && STAGING && BROKEN
        depends on DRM_I915
        source "drivers/gpu/drm/i915/Kconfig.unstable"
endmenu
@@ -36,6 +36,7 @@ config DRM_I915_DEBUG
        select DRM_I915_SELFTEST
        select DRM_I915_DEBUG_RUNTIME_PM
        select DRM_I915_DEBUG_MMIO
        select BROKEN # for prototype uAPI
        default n
        help
          Choose this option to turn on extra driver debugging that may affect
@@ -12,6 +12,29 @@ config DRM_I915_USERFAULT_AUTOSUSPEND
          May be 0 to disable the extra delay and solely use the device level
          runtime pm autosuspend delay tunable.

config DRM_I915_HEARTBEAT_INTERVAL
        int "Interval between heartbeat pulses (ms)"
        default 2500 # milliseconds
        help
          The driver sends a periodic heartbeat down all active engines to
          check the health of the GPU and undertake regular house-keeping of
          internal driver state.

          May be 0 to disable heartbeats and therefore disable automatic GPU
          hang detection.

config DRM_I915_PREEMPT_TIMEOUT
        int "Preempt timeout (ms, jiffy granularity)"
        default 100 # milliseconds
        help
          How long to wait (in milliseconds) for a preemption event to occur
          when submitting a new context via execlists. If the current context
          does not hit an arbitration point and yield to HW before the timer
          expires, the HW will be reset to allow the more important context
          to execute.

          May be 0 to disable the timeout.

config DRM_I915_SPIN_REQUEST
        int "Busywait for request completion (us)"
        default 5 # microseconds
@@ -25,3 +48,29 @@ config DRM_I915_SPIN_REQUEST
          May be 0 to disable the initial spin. In practice, we estimate
          the cost of enabling the interrupt (if currently disabled) to be
          a few microseconds.

config DRM_I915_STOP_TIMEOUT
        int "How long to wait for an engine to quiesce gracefully before reset (ms)"
        default 100 # milliseconds
        help
          By stopping submission and sleeping for a short time before resetting
          the GPU, we allow the innocent contexts also on the system to quiesce.
          It is then less likely for a hanging context to cause collateral
          damage as the system is reset in order to recover. The corollary is
          that the reset itself may take longer and so be more disruptive to
          interactive or low latency workloads.

config DRM_I915_TIMESLICE_DURATION
        int "Scheduling quantum for userspace batches (ms, jiffy granularity)"
        default 1 # milliseconds
        help
          When two user batches of equal priority are executing, we will
          alternate execution of each batch to ensure forward progress of
          all users. This is necessary in some cases where there may be
          an implicit dependency between those batches that requires
          concurrent execution in order for them to proceed, e.g. they
          interact with each other via userspace semaphores. Each context
          is scheduled for execution for the timeslice duration, before
          switching to the next context.

          May be 0 to disable timeslicing.
29  drivers/gpu/drm/i915/Kconfig.unstable  Normal file
@@ -0,0 +1,29 @@
# SPDX-License-Identifier: GPL-2.0-only
config DRM_I915_UNSTABLE
        bool "Enable unstable API for early prototype development"
        depends on EXPERT
        depends on STAGING
        depends on BROKEN # should never be enabled by distros!
        # We use the dependency on !COMPILE_TEST to not be enabled in
        # allmodconfig or allyesconfig configurations
        depends on !COMPILE_TEST
        default n
        help
          Enable prototype uAPI under general discussion before they are
          finalized. Such prototypes may be withdrawn or substantially
          changed before release. They are only enabled here so that a wide
          number of interested parties (userspace driver developers) can
          verify that the uAPI meet their expectations. These uAPI should
          never be used in production.

          Recommended for driver developers _only_.

          If in the slightest bit of doubt, say "N".

config DRM_I915_UNSTABLE_FAKE_LMEM
        bool "Enable the experimental fake lmem"
        depends on DRM_I915_UNSTABLE
        default n
        help
          Convert some system memory into a fake local memory region for
          testing.
@@ -78,22 +78,24 @@ gt-y += \
        gt/intel_breadcrumbs.o \
        gt/intel_context.o \
        gt/intel_engine_cs.o \
        gt/intel_engine_pool.o \
        gt/intel_engine_heartbeat.o \
        gt/intel_engine_pm.o \
        gt/intel_engine_pool.o \
        gt/intel_engine_user.o \
        gt/intel_gt.o \
        gt/intel_gt_irq.o \
        gt/intel_gt_pm.o \
        gt/intel_gt_pm_irq.o \
        gt/intel_gt_requests.o \
        gt/intel_hangcheck.o \
        gt/intel_llc.o \
        gt/intel_lrc.o \
        gt/intel_mocs.o \
        gt/intel_rc6.o \
        gt/intel_renderstate.o \
        gt/intel_reset.o \
        gt/intel_ringbuffer.o \
        gt/intel_mocs.o \
        gt/intel_ring.o \
        gt/intel_ring_submission.o \
        gt/intel_rps.o \
        gt/intel_sseu.o \
        gt/intel_timeline.o \
        gt/intel_workarounds.o
@@ -119,6 +121,7 @@ gem-y += \
        gem/i915_gem_internal.o \
        gem/i915_gem_object.o \
        gem/i915_gem_object_blt.o \
        gem/i915_gem_lmem.o \
        gem/i915_gem_mman.o \
        gem/i915_gem_pages.o \
        gem/i915_gem_phys.o \
@@ -147,6 +150,7 @@ i915-y += \
        i915_scheduler.o \
        i915_trace_points.o \
        i915_vma.o \
        intel_region_lmem.o \
        intel_wopcm.o

# general-purpose microcontroller (GuC) support
@@ -243,7 +247,8 @@ i915-y += \
        oa/i915_oa_cflgt2.o \
        oa/i915_oa_cflgt3.o \
        oa/i915_oa_cnl.o \
        oa/i915_oa_icl.o
        oa/i915_oa_icl.o \
        oa/i915_oa_tgl.o
i915-y += i915_perf.o

# Post-mortem debug and GPU hang state capture
@ -1584,7 +1584,7 @@ void icl_dsi_init(struct drm_i915_private *dev_priv)
|
||||
encoder->get_hw_state = gen11_dsi_get_hw_state;
|
||||
encoder->type = INTEL_OUTPUT_DSI;
|
||||
encoder->cloneable = 0;
|
||||
encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
encoder->pipe_mask = ~0;
|
||||
encoder->power_domain = POWER_DOMAIN_PORT_DSI;
|
||||
encoder->get_power_domains = gen11_dsi_get_power_domains;
|
||||
|
||||
|
@ -429,6 +429,13 @@ void intel_atomic_state_clear(struct drm_atomic_state *s)
|
||||
struct intel_atomic_state *state = to_intel_atomic_state(s);
|
||||
drm_atomic_state_default_clear(&state->base);
|
||||
state->dpll_set = state->modeset = false;
|
||||
state->global_state_changed = false;
|
||||
state->active_pipes = 0;
|
||||
memset(&state->min_cdclk, 0, sizeof(state->min_cdclk));
|
||||
memset(&state->min_voltage_level, 0, sizeof(state->min_voltage_level));
|
||||
memset(&state->cdclk.logical, 0, sizeof(state->cdclk.logical));
|
||||
memset(&state->cdclk.actual, 0, sizeof(state->cdclk.actual));
|
||||
state->cdclk.pipe = INVALID_PIPE;
|
||||
}
|
||||
|
||||
struct intel_crtc_state *
|
||||
@ -442,3 +449,40 @@ intel_atomic_get_crtc_state(struct drm_atomic_state *state,
|
||||
|
||||
return to_intel_crtc_state(crtc_state);
|
||||
}
|
||||
|
||||
int intel_atomic_lock_global_state(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_crtc *crtc;
|
||||
|
||||
state->global_state_changed = true;
|
||||
|
||||
for_each_intel_crtc(&dev_priv->drm, crtc) {
|
||||
int ret;
|
||||
|
||||
ret = drm_modeset_lock(&crtc->base.mutex,
|
||||
state->base.acquire_ctx);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int intel_atomic_serialize_global_state(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_crtc *crtc;
|
||||
|
||||
state->global_state_changed = true;
|
||||
|
||||
for_each_intel_crtc(&dev_priv->drm, crtc) {
|
||||
struct intel_crtc_state *crtc_state;
|
||||
|
||||
crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
|
||||
if (IS_ERR(crtc_state))
|
||||
return PTR_ERR(crtc_state);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -16,6 +16,7 @@ struct drm_crtc_state;
|
||||
struct drm_device;
|
||||
struct drm_i915_private;
|
||||
struct drm_property;
|
||||
struct intel_atomic_state;
|
||||
struct intel_crtc;
|
||||
struct intel_crtc_state;
|
||||
|
||||
@ -46,4 +47,8 @@ int intel_atomic_setup_scalers(struct drm_i915_private *dev_priv,
|
||||
struct intel_crtc *intel_crtc,
|
||||
struct intel_crtc_state *crtc_state);
|
||||
|
||||
int intel_atomic_lock_global_state(struct intel_atomic_state *state);
|
||||
|
||||
int intel_atomic_serialize_global_state(struct intel_atomic_state *state);
|
||||
|
||||
#endif /* __INTEL_ATOMIC_H__ */
|
||||
|
@ -138,6 +138,44 @@ unsigned int intel_plane_data_rate(const struct intel_crtc_state *crtc_state,
|
||||
return cpp * crtc_state->pixel_rate;
|
||||
}
|
||||
|
||||
bool intel_plane_calc_min_cdclk(struct intel_atomic_state *state,
|
||||
struct intel_plane *plane)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(plane->base.dev);
|
||||
const struct intel_plane_state *plane_state =
|
||||
intel_atomic_get_new_plane_state(state, plane);
|
||||
struct intel_crtc *crtc = to_intel_crtc(plane_state->base.crtc);
|
||||
struct intel_crtc_state *crtc_state;
|
||||
|
||||
if (!plane_state->base.visible || !plane->min_cdclk)
|
||||
return false;
|
||||
|
||||
crtc_state = intel_atomic_get_new_crtc_state(state, crtc);
|
||||
|
||||
crtc_state->min_cdclk[plane->id] =
|
||||
plane->min_cdclk(crtc_state, plane_state);
|
||||
|
||||
/*
|
||||
* Does the cdclk need to be bumbed up?
|
||||
*
|
||||
* Note: we obviously need to be called before the new
|
||||
* cdclk frequency is calculated so state->cdclk.logical
|
||||
* hasn't been populated yet. Hence we look at the old
|
||||
* cdclk state under dev_priv->cdclk.logical. This is
|
||||
* safe as long we hold at least one crtc mutex (which
|
||||
* must be true since we have crtc_state).
|
||||
*/
|
||||
if (crtc_state->min_cdclk[plane->id] > dev_priv->cdclk.logical.cdclk) {
|
||||
DRM_DEBUG_KMS("[PLANE:%d:%s] min_cdclk (%d kHz) > logical cdclk (%d kHz)\n",
|
||||
plane->base.base.id, plane->base.name,
|
||||
crtc_state->min_cdclk[plane->id],
|
||||
dev_priv->cdclk.logical.cdclk);
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
int intel_plane_atomic_check_with_state(const struct intel_crtc_state *old_crtc_state,
|
||||
struct intel_crtc_state *new_crtc_state,
|
||||
const struct intel_plane_state *old_plane_state,
|
||||
@ -151,6 +189,7 @@ int intel_plane_atomic_check_with_state(const struct intel_crtc_state *old_crtc_
|
||||
new_crtc_state->nv12_planes &= ~BIT(plane->id);
|
||||
new_crtc_state->c8_planes &= ~BIT(plane->id);
|
||||
new_crtc_state->data_rate[plane->id] = 0;
|
||||
new_crtc_state->min_cdclk[plane->id] = 0;
|
||||
new_plane_state->base.visible = false;
|
||||
|
||||
if (!new_plane_state->base.crtc && !old_plane_state->base.crtc)
|
||||
|
@ -47,5 +47,7 @@ int intel_plane_atomic_calc_changes(const struct intel_crtc_state *old_crtc_stat
|
||||
struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *old_plane_state,
|
||||
struct intel_plane_state *plane_state);
|
||||
bool intel_plane_calc_min_cdclk(struct intel_atomic_state *state,
|
||||
struct intel_plane *plane);
|
||||
|
||||
#endif /* __INTEL_ATOMIC_PLANE_H__ */
|
||||
|
@ -28,6 +28,7 @@
|
||||
#include <drm/i915_component.h>
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "intel_atomic.h"
|
||||
#include "intel_audio.h"
|
||||
#include "intel_display_types.h"
|
||||
#include "intel_lpe_audio.h"
|
||||
@ -818,13 +819,8 @@ retry:
|
||||
to_intel_atomic_state(state)->cdclk.force_min_cdclk =
|
||||
enable ? 2 * 96000 : 0;
|
||||
|
||||
/*
|
||||
* Protects dev_priv->cdclk.force_min_cdclk
|
||||
* Need to lock this here in case we have no active pipes
|
||||
* and thus wouldn't lock it during the commit otherwise.
|
||||
*/
|
||||
ret = drm_modeset_lock(&dev_priv->drm.mode_config.connection_mutex,
|
||||
&ctx);
|
||||
/* Protects dev_priv->cdclk.force_min_cdclk */
|
||||
ret = intel_atomic_lock_global_state(to_intel_atomic_state(state));
|
||||
if (!ret)
|
||||
ret = drm_atomic_commit(state);
|
||||
|
||||
|
@ -1918,6 +1918,19 @@ static int intel_pixel_rate_to_cdclk(const struct intel_crtc_state *crtc_state)
|
||||
return DIV_ROUND_UP(pixel_rate * 100, 90);
|
||||
}
|
||||
|
||||
static int intel_planes_min_cdclk(const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
|
||||
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
|
||||
struct intel_plane *plane;
|
||||
int min_cdclk = 0;
|
||||
|
||||
for_each_intel_plane_on_crtc(&dev_priv->drm, crtc, plane)
|
||||
min_cdclk = max(crtc_state->min_cdclk[plane->id], min_cdclk);
|
||||
|
||||
return min_cdclk;
|
||||
}
|
||||
|
||||
int intel_crtc_compute_min_cdclk(const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv =
|
||||
@ -1986,6 +1999,9 @@ int intel_crtc_compute_min_cdclk(const struct intel_crtc_state *crtc_state)
|
||||
IS_GEMINILAKE(dev_priv))
|
||||
min_cdclk = max(158400, min_cdclk);
|
||||
|
||||
/* Account for additional needs from the planes */
|
||||
min_cdclk = max(intel_planes_min_cdclk(crtc_state), min_cdclk);
|
||||
|
||||
if (min_cdclk > dev_priv->max_cdclk_freq) {
|
||||
DRM_DEBUG_KMS("required cdclk (%d kHz) exceeds max (%d kHz)\n",
|
||||
min_cdclk, dev_priv->max_cdclk_freq);
|
||||
@ -2007,11 +2023,20 @@ static int intel_compute_min_cdclk(struct intel_atomic_state *state)
|
||||
sizeof(state->min_cdclk));
|
||||
|
||||
for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
|
||||
int ret;
|
||||
|
||||
min_cdclk = intel_crtc_compute_min_cdclk(crtc_state);
|
||||
if (min_cdclk < 0)
|
||||
return min_cdclk;
|
||||
|
||||
if (state->min_cdclk[i] == min_cdclk)
|
||||
continue;
|
||||
|
||||
state->min_cdclk[i] = min_cdclk;
|
||||
|
||||
ret = intel_atomic_lock_global_state(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
||||
min_cdclk = state->cdclk.force_min_cdclk;
|
||||
@ -2034,7 +2059,7 @@ static int intel_compute_min_cdclk(struct intel_atomic_state *state)
|
||||
* future platforms this code will need to be
|
||||
* adjusted.
|
||||
*/
|
||||
static u8 bxt_compute_min_voltage_level(struct intel_atomic_state *state)
|
||||
static int bxt_compute_min_voltage_level(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_crtc *crtc;
|
||||
@ -2047,11 +2072,21 @@ static u8 bxt_compute_min_voltage_level(struct intel_atomic_state *state)
|
||||
sizeof(state->min_voltage_level));
|
||||
|
||||
for_each_new_intel_crtc_in_state(state, crtc, crtc_state, i) {
|
||||
int ret;
|
||||
|
||||
if (crtc_state->base.enable)
|
||||
state->min_voltage_level[i] =
|
||||
crtc_state->min_voltage_level;
|
||||
min_voltage_level = crtc_state->min_voltage_level;
|
||||
else
|
||||
state->min_voltage_level[i] = 0;
|
||||
min_voltage_level = 0;
|
||||
|
||||
if (state->min_voltage_level[i] == min_voltage_level)
|
||||
continue;
|
||||
|
||||
state->min_voltage_level[i] = min_voltage_level;
|
||||
|
||||
ret = intel_atomic_lock_global_state(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
||||
min_voltage_level = 0;
|
||||
@ -2195,20 +2230,24 @@ static int skl_modeset_calc_cdclk(struct intel_atomic_state *state)
|
||||
static int bxt_modeset_calc_cdclk(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
int min_cdclk, cdclk, vco;
|
||||
int min_cdclk, min_voltage_level, cdclk, vco;
|
||||
|
||||
min_cdclk = intel_compute_min_cdclk(state);
|
||||
if (min_cdclk < 0)
|
||||
return min_cdclk;
|
||||
|
||||
min_voltage_level = bxt_compute_min_voltage_level(state);
|
||||
if (min_voltage_level < 0)
|
||||
return min_voltage_level;
|
||||
|
||||
cdclk = bxt_calc_cdclk(dev_priv, min_cdclk);
|
||||
vco = bxt_calc_cdclk_pll_vco(dev_priv, cdclk);
|
||||
|
||||
state->cdclk.logical.vco = vco;
|
||||
state->cdclk.logical.cdclk = cdclk;
|
||||
state->cdclk.logical.voltage_level =
|
||||
max(dev_priv->display.calc_voltage_level(cdclk),
|
||||
bxt_compute_min_voltage_level(state));
|
||||
max_t(int, min_voltage_level,
|
||||
dev_priv->display.calc_voltage_level(cdclk));
|
||||
|
||||
if (!state->active_pipes) {
|
||||
cdclk = bxt_calc_cdclk(dev_priv, state->cdclk.force_min_cdclk);
|
||||
@ -2225,23 +2264,6 @@ static int bxt_modeset_calc_cdclk(struct intel_atomic_state *state)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int intel_lock_all_pipes(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_crtc *crtc;
|
||||
|
||||
/* Add all pipes to the state */
|
||||
for_each_intel_crtc(&dev_priv->drm, crtc) {
|
||||
struct intel_crtc_state *crtc_state;
|
||||
|
||||
crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
|
||||
if (IS_ERR(crtc_state))
|
||||
return PTR_ERR(crtc_state);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int intel_modeset_all_pipes(struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
@ -2308,48 +2330,63 @@ int intel_modeset_calc_cdclk(struct intel_atomic_state *state)
|
||||
return ret;
|
||||
|
||||
/*
|
||||
* Writes to dev_priv->cdclk.logical must protected by
|
||||
* holding all the crtc locks, even if we don't end up
|
||||
* Writes to dev_priv->cdclk.{actual,logical} must protected
|
||||
* by holding all the crtc mutexes even if we don't end up
|
||||
* touching the hardware
|
||||
*/
|
||||
if (intel_cdclk_changed(&dev_priv->cdclk.logical,
|
||||
&state->cdclk.logical)) {
|
||||
ret = intel_lock_all_pipes(state);
|
||||
if (ret < 0)
|
||||
if (intel_cdclk_changed(&dev_priv->cdclk.actual,
|
||||
&state->cdclk.actual)) {
|
||||
/*
|
||||
* Also serialize commits across all crtcs
|
||||
* if the actual hw needs to be poked.
|
||||
*/
|
||||
ret = intel_atomic_serialize_global_state(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
} else if (intel_cdclk_changed(&dev_priv->cdclk.logical,
|
||||
&state->cdclk.logical)) {
|
||||
ret = intel_atomic_lock_global_state(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
} else {
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (is_power_of_2(state->active_pipes)) {
|
||||
if (is_power_of_2(state->active_pipes) &&
|
||||
intel_cdclk_needs_cd2x_update(dev_priv,
|
||||
&dev_priv->cdclk.actual,
|
||||
&state->cdclk.actual)) {
|
||||
struct intel_crtc *crtc;
|
||||
struct intel_crtc_state *crtc_state;
|
||||
|
||||
pipe = ilog2(state->active_pipes);
|
||||
crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
|
||||
crtc_state = intel_atomic_get_new_crtc_state(state, crtc);
|
||||
if (crtc_state &&
|
||||
drm_atomic_crtc_needs_modeset(&crtc_state->base))
|
||||
|
||||
crtc_state = intel_atomic_get_crtc_state(&state->base, crtc);
|
||||
if (IS_ERR(crtc_state))
|
||||
return PTR_ERR(crtc_state);
|
||||
|
||||
if (drm_atomic_crtc_needs_modeset(&crtc_state->base))
|
||||
pipe = INVALID_PIPE;
|
||||
} else {
|
||||
pipe = INVALID_PIPE;
|
||||
}
|
||||
|
||||
/* All pipes must be switched off while we change the cdclk. */
|
||||
if (pipe != INVALID_PIPE &&
|
||||
intel_cdclk_needs_cd2x_update(dev_priv,
|
||||
&dev_priv->cdclk.actual,
|
||||
&state->cdclk.actual)) {
|
||||
ret = intel_lock_all_pipes(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
if (pipe != INVALID_PIPE) {
|
||||
state->cdclk.pipe = pipe;
|
||||
|
||||
DRM_DEBUG_KMS("Can change cdclk with pipe %c active\n",
|
||||
pipe_name(pipe));
|
||||
} else if (intel_cdclk_needs_modeset(&dev_priv->cdclk.actual,
|
||||
&state->cdclk.actual)) {
|
||||
/* All pipes must be switched off while we change the cdclk. */
|
||||
ret = intel_modeset_all_pipes(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
state->cdclk.pipe = INVALID_PIPE;
|
||||
|
||||
DRM_DEBUG_KMS("Modeset required for cdclk change\n");
|
||||
}
|
||||
|
||||
DRM_DEBUG_KMS("New cdclk calculated to be logical %u kHz, actual %u kHz\n",
|
||||
|
@ -844,7 +844,7 @@ load_detect:
|
||||
}
|
||||
|
||||
/* for pre-945g platforms use load detect */
|
||||
ret = intel_get_load_detect_pipe(connector, NULL, &tmp, ctx);
|
||||
ret = intel_get_load_detect_pipe(connector, &tmp, ctx);
|
||||
if (ret > 0) {
|
||||
if (intel_crt_detect_ddc(connector))
|
||||
status = connector_status_connected;
|
||||
@ -864,6 +864,13 @@ load_detect:
|
||||
|
||||
out:
|
||||
intel_display_power_put(dev_priv, intel_encoder->power_domain, wakeref);
|
||||
|
||||
/*
|
||||
* Make sure the refs for power wells enabled during detect are
|
||||
* dropped to avoid a new detect cycle triggered by HPD polling.
|
||||
*/
|
||||
intel_display_power_flush_work(dev_priv);
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
@ -994,9 +1001,9 @@ void intel_crt_init(struct drm_i915_private *dev_priv)
|
||||
crt->base.type = INTEL_OUTPUT_ANALOG;
|
||||
crt->base.cloneable = (1 << INTEL_OUTPUT_DVO) | (1 << INTEL_OUTPUT_HDMI);
|
||||
if (IS_I830(dev_priv))
|
||||
crt->base.crtc_mask = BIT(PIPE_A);
|
||||
crt->base.pipe_mask = BIT(PIPE_A);
|
||||
else
|
||||
crt->base.crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
crt->base.pipe_mask = ~0;
|
||||
|
||||
if (IS_GEN(dev_priv, 2))
|
||||
connector->interlace_allowed = 0;
|
||||
|
@ -1905,6 +1905,9 @@ intel_ddi_transcoder_func_reg_val_get(const struct intel_crtc_state *crtc_state)
|
||||
} else if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_DP_MST)) {
|
||||
temp |= TRANS_DDI_MODE_SELECT_DP_MST;
|
||||
temp |= DDI_PORT_WIDTH(crtc_state->lane_count);
|
||||
|
||||
if (INTEL_GEN(dev_priv) >= 12)
|
||||
temp |= TRANS_DDI_MST_TRANSPORT_SELECT(crtc_state->cpu_transcoder);
|
||||
} else {
|
||||
temp |= TRANS_DDI_MODE_SELECT_DP_SST;
|
||||
temp |= DDI_PORT_WIDTH(crtc_state->lane_count);
|
||||
@ -2234,7 +2237,7 @@ static void intel_ddi_get_power_domains(struct intel_encoder *encoder,
|
||||
/*
|
||||
* VDSC power is needed when DSC is enabled
|
||||
*/
|
||||
if (crtc_state->dsc_params.compression_enable)
|
||||
if (crtc_state->dsc.compression_enable)
|
||||
intel_display_power_get(dev_priv,
|
||||
intel_dsc_power_domain(crtc_state));
|
||||
}
|
||||
@ -2838,6 +2841,8 @@ tgl_dkl_phy_ddi_vswing_sequence(struct intel_encoder *encoder, int link_clock,
|
||||
for (ln = 0; ln < 2; ln++) {
|
||||
I915_WRITE(HIP_INDEX_REG(tc_port), HIP_INDEX_VAL(tc_port, ln));
|
||||
|
||||
I915_WRITE(DKL_TX_PMD_LANE_SUS(tc_port), 0);
|
||||
|
||||
/* All the registers are RMW */
|
||||
val = I915_READ(DKL_TX_DPCNTL0(tc_port));
|
||||
val &= ~dpcnt_mask;
|
||||
@ -3870,12 +3875,12 @@ static i915_reg_t
|
||||
gen9_chicken_trans_reg_by_port(struct drm_i915_private *dev_priv,
|
||||
enum port port)
|
||||
{
|
||||
static const i915_reg_t regs[] = {
|
||||
[PORT_A] = CHICKEN_TRANS_EDP,
|
||||
[PORT_B] = CHICKEN_TRANS_A,
|
||||
[PORT_C] = CHICKEN_TRANS_B,
|
||||
[PORT_D] = CHICKEN_TRANS_C,
|
||||
[PORT_E] = CHICKEN_TRANS_A,
|
||||
static const enum transcoder trans[] = {
|
||||
[PORT_A] = TRANSCODER_EDP,
|
||||
[PORT_B] = TRANSCODER_A,
|
||||
[PORT_C] = TRANSCODER_B,
|
||||
[PORT_D] = TRANSCODER_C,
|
||||
[PORT_E] = TRANSCODER_A,
|
||||
};
|
||||
|
||||
WARN_ON(INTEL_GEN(dev_priv) < 9);
|
||||
@ -3883,7 +3888,7 @@ gen9_chicken_trans_reg_by_port(struct drm_i915_private *dev_priv,
|
||||
if (WARN_ON(port < PORT_A || port > PORT_E))
|
||||
port = PORT_A;
|
||||
|
||||
return regs[port];
|
||||
return CHICKEN_TRANS(trans[port]);
|
||||
}
|
||||
|
||||
static void intel_enable_ddi_hdmi(struct intel_encoder *encoder,
|
||||
@ -4683,7 +4688,6 @@ void intel_ddi_init(struct drm_i915_private *dev_priv, enum port port)
|
||||
struct intel_encoder *intel_encoder;
|
||||
struct drm_encoder *encoder;
|
||||
bool init_hdmi, init_dp, init_lspcon = false;
|
||||
enum pipe pipe;
|
||||
enum phy phy = intel_port_to_phy(dev_priv, port);
|
||||
|
||||
init_hdmi = port_info->supports_dvi || port_info->supports_hdmi;
|
||||
@ -4735,8 +4739,7 @@ void intel_ddi_init(struct drm_i915_private *dev_priv, enum port port)
|
||||
intel_encoder->power_domain = intel_port_to_power_domain(port);
|
||||
intel_encoder->port = port;
|
||||
intel_encoder->cloneable = 0;
|
||||
for_each_pipe(dev_priv, pipe)
|
||||
intel_encoder->crtc_mask |= BIT(pipe);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
|
||||
if (INTEL_GEN(dev_priv) >= 11)
|
||||
intel_dig_port->saved_port_bits = I915_READ(DDI_BUF_CTL(port)) &
|
||||
|
@ -55,6 +55,8 @@
|
||||
#include "display/intel_tv.h"
|
||||
#include "display/intel_vdsc.h"
|
||||
|
||||
#include "gt/intel_rps.h"
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "i915_trace.h"
|
||||
#include "intel_acpi.h"
|
||||
@ -88,7 +90,17 @@ static const u32 i8xx_primary_formats[] = {
|
||||
DRM_FORMAT_XRGB8888,
|
||||
};
|
||||
|
||||
/* Primary plane formats for gen >= 4 */
|
||||
/* Primary plane formats for ivb (no fp16 due to hw issue) */
|
||||
static const u32 ivb_primary_formats[] = {
|
||||
DRM_FORMAT_C8,
|
||||
DRM_FORMAT_RGB565,
|
||||
DRM_FORMAT_XRGB8888,
|
||||
DRM_FORMAT_XBGR8888,
|
||||
DRM_FORMAT_XRGB2101010,
|
||||
DRM_FORMAT_XBGR2101010,
|
||||
};
|
||||
|
||||
/* Primary plane formats for gen >= 4, except ivb */
|
||||
static const u32 i965_primary_formats[] = {
|
||||
DRM_FORMAT_C8,
|
||||
DRM_FORMAT_RGB565,
|
||||
@ -96,6 +108,7 @@ static const u32 i965_primary_formats[] = {
|
||||
DRM_FORMAT_XBGR8888,
|
||||
DRM_FORMAT_XRGB2101010,
|
||||
DRM_FORMAT_XBGR2101010,
|
||||
DRM_FORMAT_XBGR16161616F,
|
||||
};
|
||||
|
||||
static const u64 i9xx_format_modifiers[] = {
|
||||
@ -2971,6 +2984,8 @@ static int i9xx_format_to_fourcc(int format)
|
||||
return DRM_FORMAT_XRGB2101010;
|
||||
case DISPPLANE_RGBX101010:
|
||||
return DRM_FORMAT_XBGR2101010;
|
||||
case DISPPLANE_RGBX161616:
|
||||
return DRM_FORMAT_XBGR16161616F;
|
||||
}
|
||||
}
|
||||
|
||||
@ -3154,6 +3169,7 @@ static void intel_plane_disable_noatomic(struct intel_crtc *crtc,
|
||||
intel_set_plane_visible(crtc_state, plane_state, false);
|
||||
fixup_active_planes(crtc_state);
|
||||
crtc_state->data_rate[plane->id] = 0;
|
||||
crtc_state->min_cdclk[plane->id] = 0;
|
||||
|
||||
if (plane->id == PLANE_PRIMARY)
|
||||
intel_pre_disable_primary_noatomic(&crtc->base);
|
||||
@ -3577,6 +3593,53 @@ int skl_check_plane_surface(struct intel_plane_state *plane_state)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void i9xx_plane_ratio(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int cpp = fb->format->cpp[0];
|
||||
|
||||
/*
|
||||
* g4x bspec says 64bpp pixel rate can't exceed 80%
|
||||
* of cdclk when the sprite plane is enabled on the
|
||||
* same pipe. ilk/snb bspec says 64bpp pixel rate is
|
||||
* never allowed to exceed 80% of cdclk. Let's just go
|
||||
* with the ilk/snb limit always.
|
||||
*/
|
||||
if (cpp == 8) {
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
} else {
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
}
|
||||
}
|
||||
|
||||
static int i9xx_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
unsigned int pixel_rate;
|
||||
unsigned int num, den;
|
||||
|
||||
/*
|
||||
* Note that crtc_state->pixel_rate accounts for both
|
||||
* horizontal and vertical panel fitter downscaling factors.
|
||||
* Pre-HSW bspec tells us to only consider the horizontal
|
||||
* downscaling factor here. We ignore that and just consider
|
||||
* both for simplicity.
|
||||
*/
|
||||
pixel_rate = crtc_state->pixel_rate;
|
||||
|
||||
i9xx_plane_ratio(crtc_state, plane_state, &num, &den);
|
||||
|
||||
/* two pixels per clock with double wide pipe */
|
||||
if (crtc_state->double_wide)
|
||||
den *= 2;
|
||||
|
||||
return DIV_ROUND_UP(pixel_rate * num, den);
|
||||
}
|
||||
|
||||
unsigned int
|
||||
i9xx_plane_max_stride(struct intel_plane *plane,
|
||||
u32 pixel_format, u64 modifier,
|
||||
@ -3659,6 +3722,9 @@ static u32 i9xx_plane_ctl(const struct intel_crtc_state *crtc_state,
|
||||
case DRM_FORMAT_XBGR2101010:
|
||||
dspcntr |= DISPPLANE_RGBX101010;
|
||||
break;
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
dspcntr |= DISPPLANE_RGBX161616;
|
||||
break;
|
||||
default:
|
||||
MISSING_CASE(fb->format->format);
|
||||
return 0;
|
||||
@ -3681,7 +3747,8 @@ int i9xx_check_plane_surface(struct intel_plane_state *plane_state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv =
|
||||
to_i915(plane_state->base.plane->dev);
|
||||
int src_x, src_y;
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
int src_x, src_y, src_w;
|
||||
u32 offset;
|
||||
int ret;
|
||||
|
||||
@ -3692,9 +3759,14 @@ int i9xx_check_plane_surface(struct intel_plane_state *plane_state)
|
||||
if (!plane_state->base.visible)
|
||||
return 0;
|
||||
|
||||
src_w = drm_rect_width(&plane_state->base.src) >> 16;
|
||||
src_x = plane_state->base.src.x1 >> 16;
|
||||
src_y = plane_state->base.src.y1 >> 16;
|
||||
|
||||
/* Undocumented hardware limit on i965/g4x/vlv/chv */
|
||||
if (HAS_GMCH(dev_priv) && fb->format->cpp[0] == 8 && src_w > 2048)
|
||||
return -EINVAL;
|
||||
|
||||
intel_add_fb_offsets(&src_x, &src_y, plane_state, 0);
|
||||
|
||||
if (INTEL_GEN(dev_priv) >= 4)
|
||||
@ -5592,10 +5664,6 @@ static int skl_update_scaler_plane(struct intel_crtc_state *crtc_state,
|
||||
case DRM_FORMAT_ARGB8888:
|
||||
case DRM_FORMAT_XRGB2101010:
|
||||
case DRM_FORMAT_XBGR2101010:
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
case DRM_FORMAT_ABGR16161616F:
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
case DRM_FORMAT_ARGB16161616F:
|
||||
case DRM_FORMAT_YUYV:
|
||||
case DRM_FORMAT_YVYU:
|
||||
case DRM_FORMAT_UYVY:
|
||||
@ -5611,6 +5679,13 @@ static int skl_update_scaler_plane(struct intel_crtc_state *crtc_state,
|
||||
case DRM_FORMAT_XVYU12_16161616:
|
||||
case DRM_FORMAT_XVYU16161616:
|
||||
break;
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
case DRM_FORMAT_ABGR16161616F:
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
case DRM_FORMAT_ARGB16161616F:
|
||||
if (INTEL_GEN(dev_priv) >= 11)
|
||||
break;
|
||||
/* fall through */
|
||||
default:
|
||||
DRM_DEBUG_KMS("[PLANE:%d:%s] FB:%d unsupported scaling format 0x%x\n",
|
||||
intel_plane->base.base.id, intel_plane->base.name,
|
||||
@ -9359,7 +9434,6 @@ static bool wrpll_uses_pch_ssc(struct drm_i915_private *dev_priv,
|
||||
static void lpt_init_pch_refclk(struct drm_i915_private *dev_priv)
|
||||
{
|
||||
struct intel_encoder *encoder;
|
||||
bool pch_ssc_in_use = false;
|
||||
bool has_fdi = false;
|
||||
|
||||
for_each_intel_encoder(&dev_priv->drm, encoder) {
|
||||
@ -9387,22 +9461,24 @@ static void lpt_init_pch_refclk(struct drm_i915_private *dev_priv)
|
||||
* clock hierarchy. That would also allow us to do
|
||||
* clock bending finally.
|
||||
*/
|
||||
dev_priv->pch_ssc_use = 0;
|
||||
|
||||
if (spll_uses_pch_ssc(dev_priv)) {
|
||||
DRM_DEBUG_KMS("SPLL using PCH SSC\n");
|
||||
pch_ssc_in_use = true;
|
||||
dev_priv->pch_ssc_use |= BIT(DPLL_ID_SPLL);
|
||||
}
|
||||
|
||||
if (wrpll_uses_pch_ssc(dev_priv, DPLL_ID_WRPLL1)) {
|
||||
DRM_DEBUG_KMS("WRPLL1 using PCH SSC\n");
|
||||
pch_ssc_in_use = true;
|
||||
dev_priv->pch_ssc_use |= BIT(DPLL_ID_WRPLL1);
|
||||
}
|
||||
|
||||
if (wrpll_uses_pch_ssc(dev_priv, DPLL_ID_WRPLL2)) {
|
||||
DRM_DEBUG_KMS("WRPLL2 using PCH SSC\n");
|
||||
pch_ssc_in_use = true;
|
||||
dev_priv->pch_ssc_use |= BIT(DPLL_ID_WRPLL2);
|
||||
}
|
||||
|
||||
if (pch_ssc_in_use)
|
||||
if (dev_priv->pch_ssc_use)
|
||||
return;
|
||||
|
||||
if (has_fdi) {
|
||||
@ -10871,7 +10947,7 @@ static void i845_update_cursor(struct intel_plane *plane,
|
||||
unsigned long irqflags;
|
||||
|
||||
if (plane_state && plane_state->base.visible) {
|
||||
unsigned int width = drm_rect_width(&plane_state->base.src);
|
||||
unsigned int width = drm_rect_width(&plane_state->base.dst);
|
||||
unsigned int height = drm_rect_height(&plane_state->base.dst);
|
||||
|
||||
cntl = plane_state->ctl |
|
||||
@ -11252,7 +11328,6 @@ static int intel_modeset_disable_planes(struct drm_atomic_state *state,
|
||||
}
|
||||
|
||||
int intel_get_load_detect_pipe(struct drm_connector *connector,
|
||||
const struct drm_display_mode *mode,
|
||||
struct intel_load_detect_pipe *old,
|
||||
struct drm_modeset_acquire_ctx *ctx)
|
||||
{
|
||||
@ -11359,10 +11434,8 @@ found:
|
||||
|
||||
crtc_state->base.active = crtc_state->base.enable = true;
|
||||
|
||||
if (!mode)
|
||||
mode = &load_detect_mode;
|
||||
|
||||
ret = drm_atomic_set_mode_for_crtc(&crtc_state->base, mode);
|
||||
ret = drm_atomic_set_mode_for_crtc(&crtc_state->base,
|
||||
&load_detect_mode);
|
||||
if (ret)
|
||||
goto fail;
|
||||
|
||||
@ -11706,6 +11779,7 @@ int intel_plane_atomic_calc_changes(const struct intel_crtc_state *old_crtc_stat
|
||||
plane_state->base.visible = visible = false;
|
||||
crtc_state->active_planes &= ~BIT(plane->id);
|
||||
crtc_state->data_rate[plane->id] = 0;
|
||||
crtc_state->min_cdclk[plane->id] = 0;
|
||||
}
|
||||
|
||||
if (!was_visible && !visible)
|
||||
@ -12072,11 +12146,6 @@ static int intel_crtc_atomic_check(struct intel_atomic_state *state,
|
||||
if (INTEL_GEN(dev_priv) >= 9) {
|
||||
if (mode_changed || crtc_state->update_pipe)
|
||||
ret = skl_update_scaler_crtc(crtc_state);
|
||||
|
||||
if (!ret)
|
||||
ret = icl_check_nv12_planes(crtc_state);
|
||||
if (!ret)
|
||||
ret = skl_check_pipe_max_pixel_rate(crtc, crtc_state);
|
||||
if (!ret)
|
||||
ret = intel_atomic_setup_scalers(dev_priv, crtc,
|
||||
crtc_state);
|
||||
@ -12425,6 +12494,12 @@ static bool check_digital_port_conflicts(struct intel_atomic_state *state)
|
||||
unsigned int used_mst_ports = 0;
|
||||
bool ret = true;
|
||||
|
||||
/*
|
||||
* We're going to peek into connector->state,
|
||||
* hence connection_mutex must be held.
|
||||
*/
|
||||
drm_modeset_lock_assert_held(&dev->mode_config.connection_mutex);
|
||||
|
||||
/*
|
||||
* Walk the connector list instead of the encoder
|
||||
* list to detect the problem on ddi platforms
|
||||
@ -13712,11 +13787,6 @@ static int intel_modeset_checks(struct intel_atomic_state *state)
|
||||
struct intel_crtc *crtc;
|
||||
int ret, i;
|
||||
|
||||
if (!check_digital_port_conflicts(state)) {
|
||||
DRM_DEBUG_KMS("rejecting conflicting digital port configuration\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
/* keep the current setting */
|
||||
if (!state->cdclk.force_min_cdclk_changed)
|
||||
state->cdclk.force_min_cdclk = dev_priv->cdclk.force_min_cdclk;
|
||||
@ -13725,7 +13795,6 @@ static int intel_modeset_checks(struct intel_atomic_state *state)
|
||||
state->active_pipes = dev_priv->active_pipes;
|
||||
state->cdclk.logical = dev_priv->cdclk.logical;
|
||||
state->cdclk.actual = dev_priv->cdclk.actual;
|
||||
state->cdclk.pipe = INVALID_PIPE;
|
||||
|
||||
for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
|
||||
new_crtc_state, i) {
|
||||
@ -13738,6 +13807,12 @@ static int intel_modeset_checks(struct intel_atomic_state *state)
|
||||
state->active_pipe_changes |= BIT(crtc->pipe);
|
||||
}
|
||||
|
||||
if (state->active_pipe_changes) {
|
||||
ret = intel_atomic_lock_global_state(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
||||
ret = intel_modeset_calc_cdclk(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
@ -13790,12 +13865,49 @@ static void intel_crtc_check_fastset(const struct intel_crtc_state *old_crtc_sta
|
||||
new_crtc_state->has_drrs = old_crtc_state->has_drrs;
|
||||
}
|
||||
|
||||
static int intel_atomic_check_planes(struct intel_atomic_state *state)
|
||||
static int intel_crtc_add_planes_to_state(struct intel_atomic_state *state,
|
||||
struct intel_crtc *crtc,
|
||||
u8 plane_ids_mask)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_plane *plane;
|
||||
|
||||
for_each_intel_plane_on_crtc(&dev_priv->drm, crtc, plane) {
|
||||
struct intel_plane_state *plane_state;
|
||||
|
||||
if ((plane_ids_mask & BIT(plane->id)) == 0)
|
||||
continue;
|
||||
|
||||
plane_state = intel_atomic_get_plane_state(state, plane);
|
||||
if (IS_ERR(plane_state))
|
||||
return PTR_ERR(plane_state);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool active_planes_affects_min_cdclk(struct drm_i915_private *dev_priv)
|
||||
{
|
||||
/* See {hsw,vlv,ivb}_plane_ratio() */
|
||||
return IS_BROADWELL(dev_priv) || IS_HASWELL(dev_priv) ||
|
||||
IS_CHERRYVIEW(dev_priv) || IS_VALLEYVIEW(dev_priv) ||
|
||||
IS_IVYBRIDGE(dev_priv);
|
||||
}
|
||||
|
||||
static int intel_atomic_check_planes(struct intel_atomic_state *state,
|
||||
bool *need_modeset)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(state->base.dev);
|
||||
struct intel_crtc_state *old_crtc_state, *new_crtc_state;
|
||||
struct intel_plane_state *plane_state;
|
||||
struct intel_plane *plane;
|
||||
struct intel_crtc *crtc;
|
||||
int i, ret;
|
||||
|
||||
ret = icl_add_linked_planes(state);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
for_each_new_intel_plane_in_state(state, plane, plane_state, i) {
|
||||
ret = intel_plane_atomic_check(state, plane);
|
||||
if (ret) {
|
||||
@ -13805,6 +13917,41 @@ static int intel_atomic_check_planes(struct intel_atomic_state *state)
|
||||
}
|
||||
}
|
||||
|
||||
for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
|
||||
new_crtc_state, i) {
|
||||
u8 old_active_planes, new_active_planes;
|
||||
|
||||
ret = icl_check_nv12_planes(new_crtc_state);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
/*
|
||||
* On some platforms the number of active planes affects
|
||||
* the planes' minimum cdclk calculation. Add such planes
|
||||
* to the state before we compute the minimum cdclk.
|
||||
*/
|
||||
if (!active_planes_affects_min_cdclk(dev_priv))
|
||||
continue;
|
||||
|
||||
old_active_planes = old_crtc_state->active_planes & ~BIT(PLANE_CURSOR);
|
||||
new_active_planes = new_crtc_state->active_planes & ~BIT(PLANE_CURSOR);
|
||||
|
||||
if (hweight8(old_active_planes) == hweight8(new_active_planes))
|
||||
continue;
|
||||
|
||||
ret = intel_crtc_add_planes_to_state(state, crtc, new_active_planes);
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* active_planes bitmask has been updated, and potentially
|
||||
* affected planes are part of the state. We can now
|
||||
* compute the minimum cdclk for each plane.
|
||||
*/
|
||||
for_each_new_intel_plane_in_state(state, plane, plane_state, i)
|
||||
*need_modeset |= intel_plane_calc_min_cdclk(state, plane);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -13839,7 +13986,7 @@ static int intel_atomic_check(struct drm_device *dev,
|
||||
struct intel_crtc_state *old_crtc_state, *new_crtc_state;
|
||||
struct intel_crtc *crtc;
|
||||
int ret, i;
|
||||
bool any_ms = state->cdclk.force_min_cdclk_changed;
|
||||
bool any_ms = false;
|
||||
|
||||
/* Catch I915_MODE_FLAG_INHERITED */
|
||||
for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
|
||||
@ -13873,10 +14020,22 @@ static int intel_atomic_check(struct drm_device *dev,
|
||||
any_ms = true;
|
||||
}
|
||||
|
||||
if (any_ms && !check_digital_port_conflicts(state)) {
|
||||
DRM_DEBUG_KMS("rejecting conflicting digital port configuration\n");
|
||||
ret = EINVAL;
|
||||
goto fail;
|
||||
}
|
||||
|
||||
ret = drm_dp_mst_atomic_check(&state->base);
|
||||
if (ret)
|
||||
goto fail;
|
||||
|
||||
any_ms |= state->cdclk.force_min_cdclk_changed;
|
||||
|
||||
ret = intel_atomic_check_planes(state, &any_ms);
|
||||
if (ret)
|
||||
goto fail;
|
||||
|
||||
if (any_ms) {
|
||||
ret = intel_modeset_checks(state);
|
||||
if (ret)
|
||||
@ -13885,14 +14044,6 @@ static int intel_atomic_check(struct drm_device *dev,
|
||||
state->cdclk.logical = dev_priv->cdclk.logical;
|
||||
}
|
||||
|
||||
ret = icl_add_linked_planes(state);
|
||||
if (ret)
|
||||
goto fail;
|
||||
|
||||
ret = intel_atomic_check_planes(state);
|
||||
if (ret)
|
||||
goto fail;
|
||||
|
||||
ret = intel_atomic_check_crtcs(state);
|
||||
if (ret)
|
||||
goto fail;
|
||||
@ -13973,9 +14124,6 @@ static void intel_pipe_fastset(const struct intel_crtc_state *old_crtc_state,
|
||||
struct intel_crtc *crtc = to_intel_crtc(new_crtc_state->base.crtc);
|
||||
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
|
||||
|
||||
/* drm_atomic_helper_update_legacy_modeset_state might not be called. */
|
||||
crtc->base.mode = new_crtc_state->base.mode;
|
||||
|
||||
/*
|
||||
* Update pipe size and adjust fitter if needed: the reason for this is
|
||||
* that in compute_mode_changes we check the native mode (not the pfit
|
||||
@ -14237,8 +14385,8 @@ static void intel_crtc_enable_trans_port_sync(struct intel_crtc *crtc,
|
||||
static void intel_set_dp_tp_ctl_normal(struct intel_crtc *crtc,
|
||||
struct intel_atomic_state *state)
|
||||
{
|
||||
struct drm_connector *uninitialized_var(conn);
|
||||
struct drm_connector_state *conn_state;
|
||||
struct drm_connector *conn;
|
||||
struct intel_dp *intel_dp;
|
||||
int i;
|
||||
|
||||
@ -14670,6 +14818,14 @@ static void intel_atomic_track_fbs(struct intel_atomic_state *state)
|
||||
plane->frontbuffer_bit);
|
||||
}
|
||||
|
||||
static void assert_global_state_locked(struct drm_i915_private *dev_priv)
|
||||
{
|
||||
struct intel_crtc *crtc;
|
||||
|
||||
for_each_intel_crtc(&dev_priv->drm, crtc)
|
||||
drm_modeset_lock_assert_held(&crtc->base.mutex);
|
||||
}
|
||||
|
||||
static int intel_atomic_commit(struct drm_device *dev,
|
||||
struct drm_atomic_state *_state,
|
||||
bool nonblock)
|
||||
@ -14735,7 +14891,9 @@ static int intel_atomic_commit(struct drm_device *dev,
|
||||
intel_shared_dpll_swap_state(state);
|
||||
intel_atomic_track_fbs(state);
|
||||
|
||||
if (state->modeset) {
|
||||
if (state->global_state_changed) {
|
||||
assert_global_state_locked(dev_priv);
|
||||
|
||||
memcpy(dev_priv->min_cdclk, state->min_cdclk,
|
||||
sizeof(state->min_cdclk));
|
||||
memcpy(dev_priv->min_voltage_level, state->min_voltage_level,
|
||||
@ -14782,7 +14940,7 @@ static int do_rps_boost(struct wait_queue_entry *_wait,
|
||||
* vblank without our intervention, so leave RPS alone.
|
||||
*/
|
||||
if (!i915_request_started(rq))
|
||||
gen6_rps_boost(rq);
|
||||
intel_rps_boost(rq);
|
||||
i915_request_put(rq);
|
||||
|
||||
drm_crtc_vblank_put(wait->crtc);
|
||||
@ -14863,7 +15021,7 @@ static void intel_plane_unpin_fb(struct intel_plane_state *old_plane_state)
|
||||
static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj)
|
||||
{
|
||||
struct i915_sched_attr attr = {
|
||||
.priority = I915_PRIORITY_DISPLAY,
|
||||
.priority = I915_USER_PRIORITY(I915_PRIORITY_DISPLAY),
|
||||
};
|
||||
|
||||
i915_gem_object_wait_priority(obj, 0, &attr);
|
||||
@ -14976,7 +15134,7 @@ intel_prepare_plane_fb(struct drm_plane *plane,
|
||||
* maximum clocks following a vblank miss (see do_rps_boost()).
|
||||
*/
|
||||
if (!intel_state->rps_interactive) {
|
||||
intel_rps_mark_interactive(dev_priv, true);
|
||||
intel_rps_mark_interactive(&dev_priv->gt.rps, true);
|
||||
intel_state->rps_interactive = true;
|
||||
}
|
||||
|
||||
@ -15001,7 +15159,7 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
|
||||
struct drm_i915_private *dev_priv = to_i915(plane->dev);
|
||||
|
||||
if (intel_state->rps_interactive) {
|
||||
intel_rps_mark_interactive(dev_priv, false);
|
||||
intel_rps_mark_interactive(&dev_priv->gt.rps, false);
|
||||
intel_state->rps_interactive = false;
|
||||
}
|
||||
|
||||
@ -15009,44 +15167,6 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
|
||||
intel_plane_unpin_fb(old_plane_state);
|
||||
}
|
||||
|
||||
int
|
||||
skl_max_scale(const struct intel_crtc_state *crtc_state,
|
||||
const struct drm_format_info *format)
|
||||
{
|
||||
struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
|
||||
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
|
||||
int max_scale;
|
||||
int crtc_clock, max_dotclk, tmpclk1, tmpclk2;
|
||||
|
||||
if (!crtc_state->base.enable)
|
||||
return DRM_PLANE_HELPER_NO_SCALING;
|
||||
|
||||
crtc_clock = crtc_state->base.adjusted_mode.crtc_clock;
|
||||
max_dotclk = to_intel_atomic_state(crtc_state->base.state)->cdclk.logical.cdclk;
|
||||
|
||||
if (IS_GEMINILAKE(dev_priv) || INTEL_GEN(dev_priv) >= 10)
|
||||
max_dotclk *= 2;
|
||||
|
||||
if (WARN_ON_ONCE(!crtc_clock || max_dotclk < crtc_clock))
|
||||
return DRM_PLANE_HELPER_NO_SCALING;
|
||||
|
||||
/*
|
||||
* skl max scale is lower of:
|
||||
* close to 3 but not 3, -1 is for that purpose
|
||||
* or
|
||||
* cdclk/crtc_clock
|
||||
*/
|
||||
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv) ||
|
||||
!drm_format_info_is_yuv_semiplanar(format))
|
||||
tmpclk1 = 0x30000 - 1;
|
||||
else
|
||||
tmpclk1 = 0x20000 - 1;
|
||||
tmpclk2 = (1 << 8) * ((max_dotclk << 8) / crtc_clock);
|
||||
max_scale = min(tmpclk1, tmpclk2);
|
||||
|
||||
return max_scale;
|
||||
}
|
||||
|
||||
/**
|
||||
* intel_plane_destroy - destroy a plane
|
||||
* @plane: plane to destroy
|
||||
@ -15101,6 +15221,7 @@ static bool i965_plane_format_mod_supported(struct drm_plane *_plane,
|
||||
case DRM_FORMAT_XBGR8888:
|
||||
case DRM_FORMAT_XRGB2101010:
|
||||
case DRM_FORMAT_XBGR2101010:
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
return modifier == DRM_FORMAT_MOD_LINEAR ||
|
||||
modifier == I915_FORMAT_MOD_X_TILED;
|
||||
default:
|
||||
@ -15321,8 +15442,26 @@ intel_primary_plane_create(struct drm_i915_private *dev_priv, enum pipe pipe)
|
||||
}
|
||||
|
||||
if (INTEL_GEN(dev_priv) >= 4) {
|
||||
formats = i965_primary_formats;
|
||||
num_formats = ARRAY_SIZE(i965_primary_formats);
|
||||
/*
|
||||
* WaFP16GammaEnabling:ivb
|
||||
* "Workaround : When using the 64-bit format, the plane
|
||||
* output on each color channel has one quarter amplitude.
|
||||
* It can be brought up to full amplitude by using pipe
|
||||
* gamma correction or pipe color space conversion to
|
||||
* multiply the plane output by four."
|
||||
*
|
||||
* There is no dedicated plane gamma for the primary plane,
|
||||
* and using the pipe gamma/csc could conflict with other
|
||||
* planes, so we choose not to expose fp16 on IVB primary
|
||||
* planes. HSW primary planes no longer have this problem.
|
||||
*/
|
||||
if (IS_IVYBRIDGE(dev_priv)) {
|
||||
formats = ivb_primary_formats;
|
||||
num_formats = ARRAY_SIZE(ivb_primary_formats);
|
||||
} else {
|
||||
formats = i965_primary_formats;
|
||||
num_formats = ARRAY_SIZE(i965_primary_formats);
|
||||
}
|
||||
modifiers = i9xx_format_modifiers;
|
||||
|
||||
plane->max_stride = i9xx_plane_max_stride;
|
||||
@ -15331,6 +15470,15 @@ intel_primary_plane_create(struct drm_i915_private *dev_priv, enum pipe pipe)
|
||||
plane->get_hw_state = i9xx_plane_get_hw_state;
|
||||
plane->check_plane = i9xx_plane_check;
|
||||
|
||||
if (IS_BROADWELL(dev_priv) || IS_HASWELL(dev_priv))
|
||||
plane->min_cdclk = hsw_plane_min_cdclk;
|
||||
else if (IS_IVYBRIDGE(dev_priv))
|
||||
plane->min_cdclk = ivb_plane_min_cdclk;
|
||||
else if (IS_CHERRYVIEW(dev_priv) || IS_VALLEYVIEW(dev_priv))
|
||||
plane->min_cdclk = vlv_plane_min_cdclk;
|
||||
else
|
||||
plane->min_cdclk = i9xx_plane_min_cdclk;
|
||||
|
||||
plane_funcs = &i965_plane_funcs;
|
||||
} else {
|
||||
formats = i8xx_primary_formats;
|
||||
@ -15342,6 +15490,7 @@ intel_primary_plane_create(struct drm_i915_private *dev_priv, enum pipe pipe)
|
||||
plane->disable_plane = i9xx_disable_plane;
|
||||
plane->get_hw_state = i9xx_plane_get_hw_state;
|
||||
plane->check_plane = i9xx_plane_check;
|
||||
plane->min_cdclk = i9xx_plane_min_cdclk;
|
||||
|
||||
plane_funcs = &i8xx_plane_funcs;
|
||||
}
|
||||
@ -15693,7 +15842,7 @@ static u32 intel_encoder_possible_crtcs(struct intel_encoder *encoder)
|
||||
u32 possible_crtcs = 0;
|
||||
|
||||
for_each_intel_crtc(dev, crtc) {
|
||||
if (encoder->crtc_mask & BIT(crtc->pipe))
|
||||
if (encoder->pipe_mask & BIT(crtc->pipe))
|
||||
possible_crtcs |= drm_crtc_mask(&crtc->base);
|
||||
}
|
||||
|
||||
@ -16294,6 +16443,21 @@ intel_mode_valid(struct drm_device *dev,
|
||||
mode->vtotal > vtotal_max)
|
||||
return MODE_V_ILLEGAL;
|
||||
|
||||
if (INTEL_GEN(dev_priv) >= 5) {
|
||||
if (mode->hdisplay < 64 ||
|
||||
mode->htotal - mode->hdisplay < 32)
|
||||
return MODE_H_ILLEGAL;
|
||||
|
||||
if (mode->vtotal - mode->vdisplay < 5)
|
||||
return MODE_V_ILLEGAL;
|
||||
} else {
|
||||
if (mode->htotal - mode->hdisplay < 32)
|
||||
return MODE_H_ILLEGAL;
|
||||
|
||||
if (mode->vtotal - mode->vdisplay < 3)
|
||||
return MODE_V_ILLEGAL;
|
||||
}
|
||||
|
||||
return MODE_OK;
|
||||
}
|
||||
|
||||
@ -17224,13 +17388,16 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
|
||||
struct intel_plane *plane;
|
||||
int min_cdclk = 0;
|
||||
|
||||
memset(&crtc->base.mode, 0, sizeof(crtc->base.mode));
|
||||
if (crtc_state->base.active) {
|
||||
intel_mode_from_pipe_config(&crtc->base.mode, crtc_state);
|
||||
crtc->base.mode.hdisplay = crtc_state->pipe_src_w;
|
||||
crtc->base.mode.vdisplay = crtc_state->pipe_src_h;
|
||||
intel_mode_from_pipe_config(&crtc_state->base.adjusted_mode, crtc_state);
|
||||
WARN_ON(drm_atomic_set_mode_for_crtc(&crtc_state->base, &crtc->base.mode));
|
||||
struct drm_display_mode mode;
|
||||
|
||||
intel_mode_from_pipe_config(&crtc_state->base.adjusted_mode,
|
||||
crtc_state);
|
||||
|
||||
mode = crtc_state->base.adjusted_mode;
|
||||
mode.hdisplay = crtc_state->pipe_src_w;
|
||||
mode.vdisplay = crtc_state->pipe_src_h;
|
||||
WARN_ON(drm_atomic_set_mode_for_crtc(&crtc_state->base, &mode));
|
||||
|
||||
/*
|
||||
* The initial mode needs to be set in order to keep
|
||||
@ -17245,17 +17412,9 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
|
||||
|
||||
intel_crtc_compute_pixel_rate(crtc_state);
|
||||
|
||||
min_cdclk = intel_crtc_compute_min_cdclk(crtc_state);
|
||||
if (WARN_ON(min_cdclk < 0))
|
||||
min_cdclk = 0;
|
||||
|
||||
intel_crtc_update_active_timings(crtc_state);
|
||||
}
|
||||
|
||||
dev_priv->min_cdclk[crtc->pipe] = min_cdclk;
|
||||
dev_priv->min_voltage_level[crtc->pipe] =
|
||||
crtc_state->min_voltage_level;
|
||||
|
||||
for_each_intel_plane_on_crtc(&dev_priv->drm, crtc, plane) {
|
||||
const struct intel_plane_state *plane_state =
|
||||
to_intel_plane_state(plane->base.state);
|
||||
@ -17267,8 +17426,34 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
|
||||
if (plane_state->base.visible)
|
||||
crtc_state->data_rate[plane->id] =
|
||||
4 * crtc_state->pixel_rate;
|
||||
/*
|
||||
* FIXME don't have the fb yet, so can't
|
||||
* use plane->min_cdclk() :(
|
||||
*/
|
||||
if (plane_state->base.visible && plane->min_cdclk) {
|
||||
if (crtc_state->double_wide ||
|
||||
INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
|
||||
crtc_state->min_cdclk[plane->id] =
|
||||
DIV_ROUND_UP(crtc_state->pixel_rate, 2);
|
||||
else
|
||||
crtc_state->min_cdclk[plane->id] =
|
||||
crtc_state->pixel_rate;
|
||||
}
|
||||
DRM_DEBUG_KMS("[PLANE:%d:%s] min_cdclk %d kHz\n",
|
||||
plane->base.base.id, plane->base.name,
|
||||
crtc_state->min_cdclk[plane->id]);
|
||||
}
|
||||
|
||||
if (crtc_state->base.active) {
|
||||
min_cdclk = intel_crtc_compute_min_cdclk(crtc_state);
|
||||
if (WARN_ON(min_cdclk < 0))
|
||||
min_cdclk = 0;
|
||||
}
|
||||
|
||||
dev_priv->min_cdclk[crtc->pipe] = min_cdclk;
|
||||
dev_priv->min_voltage_level[crtc->pipe] =
|
||||
crtc_state->min_voltage_level;
|
||||
|
||||
intel_bw_crtc_update(bw_state, crtc_state);
|
||||
|
||||
intel_pipe_config_sanity_check(dev_priv, crtc_state);
|
||||
|
@ -509,7 +509,6 @@ void vlv_wait_port_ready(struct drm_i915_private *dev_priv,
|
||||
struct intel_digital_port *dport,
|
||||
unsigned int expected_mask);
|
||||
int intel_get_load_detect_pipe(struct drm_connector *connector,
|
||||
const struct drm_display_mode *mode,
|
||||
struct intel_load_detect_pipe *old,
|
||||
struct drm_modeset_acquire_ctx *ctx);
|
||||
void intel_release_load_detect_pipe(struct drm_connector *connector,
|
||||
@ -563,8 +562,6 @@ void intel_crtc_arm_fifo_underrun(struct intel_crtc *crtc,
|
||||
|
||||
u16 skl_scaler_calc_phase(int sub, int scale, bool chroma_center);
|
||||
int skl_update_scaler_crtc(struct intel_crtc_state *crtc_state);
|
||||
int skl_max_scale(const struct intel_crtc_state *crtc_state,
|
||||
const struct drm_format_info *format);
|
||||
u32 glk_plane_color_ctl(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state);
|
||||
u32 glk_plane_color_ctl_crtc(const struct intel_crtc_state *crtc_state);
|
||||
|
@ -2682,6 +2682,8 @@ void intel_display_power_put(struct drm_i915_private *dev_priv,
|
||||
TGL_PW_2_POWER_DOMAINS | \
|
||||
BIT_ULL(POWER_DOMAIN_MODESET) | \
|
||||
BIT_ULL(POWER_DOMAIN_AUX_A) | \
|
||||
BIT_ULL(POWER_DOMAIN_AUX_B) | \
|
||||
BIT_ULL(POWER_DOMAIN_AUX_C) | \
|
||||
BIT_ULL(POWER_DOMAIN_INIT))
|
||||
|
||||
#define TGL_DDI_IO_D_TC1_POWER_DOMAINS ( \
|
||||
|
@@ -128,7 +128,8 @@ struct intel_encoder {

	enum intel_output_type type;
	enum port port;
	unsigned int cloneable;
	u16 cloneable;
	u8 pipe_mask;
	enum intel_hotplug_state (*hotplug)(struct intel_encoder *encoder,
					    struct intel_connector *connector,
					    bool irq_received);
@@ -187,7 +188,6 @@ struct intel_encoder {
	 * device interrupts are disabled.
	 */
	void (*suspend)(struct intel_encoder *);
	int crtc_mask;
	enum hpd_pin hpd_pin;
	enum intel_display_power_domain power_domain;
	/* for communication with audio component; protected by av_mutex */
@ -506,6 +506,14 @@ struct intel_atomic_state {
|
||||
|
||||
bool rps_interactive;
|
||||
|
||||
/*
|
||||
* active_pipes
|
||||
* min_cdclk[]
|
||||
* min_voltage_level[]
|
||||
* cdclk.*
|
||||
*/
|
||||
bool global_state_changed;
|
||||
|
||||
/* Gen9+ only */
|
||||
struct skl_ddb_values wm_results;
|
||||
|
||||
@ -932,6 +940,8 @@ struct intel_crtc_state {
|
||||
|
||||
struct intel_crtc_wm_state wm;
|
||||
|
||||
int min_cdclk[I915_MAX_PLANES];
|
||||
|
||||
u32 data_rate[I915_MAX_PLANES];
|
||||
|
||||
/* Gamma mode programmed on the pipe */
|
||||
@ -986,8 +996,8 @@ struct intel_crtc_state {
|
||||
bool dsc_split;
|
||||
u16 compressed_bpp;
|
||||
u8 slice_count;
|
||||
} dsc_params;
|
||||
struct drm_dsc_config dp_dsc_cfg;
|
||||
struct drm_dsc_config config;
|
||||
} dsc;
|
||||
|
||||
/* Forward Error correction State */
|
||||
bool fec_enable;
|
||||
@ -1077,6 +1087,8 @@ struct intel_plane {
|
||||
bool (*get_hw_state)(struct intel_plane *plane, enum pipe *pipe);
|
||||
int (*check_plane)(struct intel_crtc_state *crtc_state,
|
||||
struct intel_plane_state *plane_state);
|
||||
int (*min_cdclk)(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state);
|
||||
};
|
||||
|
||||
struct intel_watermark_params {
|
||||
|
@ -1179,18 +1179,20 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp)
|
||||
{
|
||||
struct drm_i915_private *i915 = dp_to_i915(intel_dp);
|
||||
i915_reg_t ch_ctl = intel_dp->aux_ch_ctl_reg(intel_dp);
|
||||
const unsigned int timeout_ms = 10;
|
||||
u32 status;
|
||||
bool done;
|
||||
|
||||
#define C (((status = intel_uncore_read_notrace(&i915->uncore, ch_ctl)) & DP_AUX_CH_CTL_SEND_BUSY) == 0)
|
||||
done = wait_event_timeout(i915->gmbus_wait_queue, C,
|
||||
msecs_to_jiffies_timeout(10));
|
||||
msecs_to_jiffies_timeout(timeout_ms));
|
||||
|
||||
/* just trace the final value */
|
||||
trace_i915_reg_rw(false, ch_ctl, status, sizeof(status), true);
|
||||
|
||||
if (!done)
|
||||
DRM_ERROR("dp aux hw did not signal timeout!\n");
|
||||
DRM_ERROR("%s did not complete or timeout within %ums (status 0x%08x)\n",
|
||||
intel_dp->aux.name, timeout_ms, status);
|
||||
#undef C
|
||||
|
||||
return status;
|
||||
@ -1291,6 +1293,9 @@ static u32 skl_get_aux_send_ctl(struct intel_dp *intel_dp,
|
||||
u32 unused)
|
||||
{
|
||||
struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
|
||||
struct drm_i915_private *i915 =
|
||||
to_i915(intel_dig_port->base.base.dev);
|
||||
enum phy phy = intel_port_to_phy(i915, intel_dig_port->base.port);
|
||||
u32 ret;
|
||||
|
||||
ret = DP_AUX_CH_CTL_SEND_BUSY |
|
||||
@ -1303,7 +1308,8 @@ static u32 skl_get_aux_send_ctl(struct intel_dp *intel_dp,
|
||||
DP_AUX_CH_CTL_FW_SYNC_PULSE_SKL(32) |
|
||||
DP_AUX_CH_CTL_SYNC_PULSE_SKL(32);
|
||||
|
||||
if (intel_dig_port->tc_mode == TC_PORT_TBT_ALT)
|
||||
if (intel_phy_is_tc(i915, phy) &&
|
||||
intel_dig_port->tc_mode == TC_PORT_TBT_ALT)
|
||||
ret |= DP_AUX_CH_CTL_TBT_IO;
|
||||
|
||||
return ret;
|
||||
@ -1888,6 +1894,9 @@ static bool intel_dp_source_supports_dsc(struct intel_dp *intel_dp,
|
||||
{
|
||||
struct drm_i915_private *dev_priv = dp_to_i915(intel_dp);
|
||||
|
||||
if (!INTEL_INFO(dev_priv)->display.has_dsc)
|
||||
return false;
|
||||
|
||||
/* On TGL, DSC is supported on all Pipes */
|
||||
if (INTEL_GEN(dev_priv) >= 12)
|
||||
return true;
|
||||
@ -2080,10 +2089,10 @@ static int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
|
||||
pipe_config->lane_count = limits->max_lane_count;
|
||||
|
||||
if (intel_dp_is_edp(intel_dp)) {
|
||||
pipe_config->dsc_params.compressed_bpp =
|
||||
pipe_config->dsc.compressed_bpp =
|
||||
min_t(u16, drm_edp_dsc_sink_output_bpp(intel_dp->dsc_dpcd) >> 4,
|
||||
pipe_config->pipe_bpp);
|
||||
pipe_config->dsc_params.slice_count =
|
||||
pipe_config->dsc.slice_count =
|
||||
drm_dp_dsc_sink_max_slice_count(intel_dp->dsc_dpcd,
|
||||
true);
|
||||
} else {
|
||||
@ -2104,10 +2113,10 @@ static int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
|
||||
DRM_DEBUG_KMS("Compressed BPP/Slice Count not supported\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
pipe_config->dsc_params.compressed_bpp = min_t(u16,
|
||||
pipe_config->dsc.compressed_bpp = min_t(u16,
|
||||
dsc_max_output_bpp >> 4,
|
||||
pipe_config->pipe_bpp);
|
||||
pipe_config->dsc_params.slice_count = dsc_dp_slice_count;
|
||||
pipe_config->dsc.slice_count = dsc_dp_slice_count;
|
||||
}
|
||||
/*
|
||||
* VDSC engine operates at 1 Pixel per clock, so if peak pixel rate
|
||||
@ -2115,8 +2124,8 @@ static int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
|
||||
* then we need to use 2 VDSC instances.
|
||||
*/
|
||||
if (adjusted_mode->crtc_clock > dev_priv->max_cdclk_freq) {
|
||||
if (pipe_config->dsc_params.slice_count > 1) {
|
||||
pipe_config->dsc_params.dsc_split = true;
|
||||
if (pipe_config->dsc.slice_count > 1) {
|
||||
pipe_config->dsc.dsc_split = true;
|
||||
} else {
|
||||
DRM_DEBUG_KMS("Cannot split stream to use 2 VDSC instances\n");
|
||||
return -EINVAL;
|
||||
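Worked example for the split decision above (illustration only, all clock values are assumptions): the VDSC engine processes one pixel per clock, so with a platform max cdclk of roughly 652800 kHz a mode whose crtc_clock is around 1188000 kHz (a 4k@120-class timing) cannot be handled by a single instance. If the sink allows slice_count > 1 the stream is split across two VDSC engines (dsc.dsc_split = true); otherwise the configuration is rejected with -EINVAL.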
@ -2128,16 +2137,16 @@ static int intel_dp_dsc_compute_config(struct intel_dp *intel_dp,
|
||||
DRM_DEBUG_KMS("Cannot compute valid DSC parameters for Input Bpp = %d "
|
||||
"Compressed BPP = %d\n",
|
||||
pipe_config->pipe_bpp,
|
||||
pipe_config->dsc_params.compressed_bpp);
|
||||
pipe_config->dsc.compressed_bpp);
|
||||
return ret;
|
||||
}
|
||||
|
||||
pipe_config->dsc_params.compression_enable = true;
|
||||
pipe_config->dsc.compression_enable = true;
|
||||
DRM_DEBUG_KMS("DP DSC computed with Input Bpp = %d "
|
||||
"Compressed Bpp = %d Slice Count = %d\n",
|
||||
pipe_config->pipe_bpp,
|
||||
pipe_config->dsc_params.compressed_bpp,
|
||||
pipe_config->dsc_params.slice_count);
|
||||
pipe_config->dsc.compressed_bpp,
|
||||
pipe_config->dsc.slice_count);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -2211,15 +2220,15 @@ intel_dp_compute_link_config(struct intel_encoder *encoder,
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (pipe_config->dsc_params.compression_enable) {
|
||||
if (pipe_config->dsc.compression_enable) {
|
||||
DRM_DEBUG_KMS("DP lane count %d clock %d Input bpp %d Compressed bpp %d\n",
|
||||
pipe_config->lane_count, pipe_config->port_clock,
|
||||
pipe_config->pipe_bpp,
|
||||
pipe_config->dsc_params.compressed_bpp);
|
||||
pipe_config->dsc.compressed_bpp);
|
||||
|
||||
DRM_DEBUG_KMS("DP link rate required %i available %i\n",
|
||||
intel_dp_link_required(adjusted_mode->crtc_clock,
|
||||
pipe_config->dsc_params.compressed_bpp),
|
||||
pipe_config->dsc.compressed_bpp),
|
||||
intel_dp_max_data_rate(pipe_config->port_clock,
|
||||
pipe_config->lane_count));
|
||||
} else {
|
||||
@ -2377,8 +2386,8 @@ intel_dp_compute_config(struct intel_encoder *encoder,
|
||||
pipe_config->limited_color_range =
|
||||
intel_dp_limited_color_range(pipe_config, conn_state);
|
||||
|
||||
if (pipe_config->dsc_params.compression_enable)
|
||||
output_bpp = pipe_config->dsc_params.compressed_bpp;
|
||||
if (pipe_config->dsc.compression_enable)
|
||||
output_bpp = pipe_config->dsc.compressed_bpp;
|
||||
else
|
||||
output_bpp = intel_dp_output_bpp(pipe_config, pipe_config->pipe_bpp);
|
||||
|
||||
@ -3102,7 +3111,7 @@ void intel_dp_sink_set_decompression_state(struct intel_dp *intel_dp,
|
||||
{
|
||||
int ret;
|
||||
|
||||
if (!crtc_state->dsc_params.compression_enable)
|
||||
if (!crtc_state->dsc.compression_enable)
|
||||
return;
|
||||
|
||||
ret = drm_dp_dpcd_writeb(&intel_dp->aux, DP_DSC_ENABLE,
|
||||
@ -5688,6 +5697,12 @@ out:
|
||||
if (status != connector_status_connected && !intel_dp->is_mst)
|
||||
intel_dp_unset_edid(intel_dp);
|
||||
|
||||
/*
|
||||
* Make sure the refs for power wells enabled during detect are
|
||||
* dropped to avoid a new detect cycle triggered by HPD polling.
|
||||
*/
|
||||
intel_display_power_flush_work(dev_priv);
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
@ -7560,11 +7575,11 @@ bool intel_dp_init(struct drm_i915_private *dev_priv,
|
||||
intel_encoder->power_domain = intel_port_to_power_domain(port);
|
||||
if (IS_CHERRYVIEW(dev_priv)) {
|
||||
if (port == PORT_D)
|
||||
intel_encoder->crtc_mask = BIT(PIPE_C);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_C);
|
||||
else
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
} else {
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
}
|
||||
intel_encoder->cloneable = 0;
|
||||
intel_encoder->port = port;
|
||||
|
@ -600,8 +600,6 @@ intel_dp_create_fake_mst_encoder(struct intel_digital_port *intel_dig_port, enum
|
||||
struct intel_dp_mst_encoder *intel_mst;
|
||||
struct intel_encoder *intel_encoder;
|
||||
struct drm_device *dev = intel_dig_port->base.base.dev;
|
||||
struct drm_i915_private *dev_priv = to_i915(dev);
|
||||
enum pipe pipe_iter;
|
||||
|
||||
intel_mst = kzalloc(sizeof(*intel_mst), GFP_KERNEL);
|
||||
|
||||
@ -619,8 +617,15 @@ intel_dp_create_fake_mst_encoder(struct intel_digital_port *intel_dig_port, enum
|
||||
intel_encoder->power_domain = intel_dig_port->base.power_domain;
|
||||
intel_encoder->port = intel_dig_port->base.port;
|
||||
intel_encoder->cloneable = 0;
|
||||
for_each_pipe(dev_priv, pipe_iter)
|
||||
intel_encoder->crtc_mask |= BIT(pipe_iter);
|
||||
/*
|
||||
* This is wrong, but broken userspace uses the intersection
|
||||
* of possible_crtcs of all the encoders of a given connector
|
||||
* to figure out which crtcs can drive said connector. What
|
||||
* should be used instead is the union of possible_crtcs.
|
||||
* To keep such userspace functioning we must misconfigure
|
||||
* this to make sure the intersection is not empty :(
|
||||
*/
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
|
||||
intel_encoder->compute_config = intel_dp_mst_compute_config;
|
||||
intel_encoder->disable = intel_mst_disable_dp;
|
||||
|
@ -526,16 +526,31 @@ static void hsw_ddi_wrpll_disable(struct drm_i915_private *dev_priv,
|
||||
val = I915_READ(WRPLL_CTL(id));
|
||||
I915_WRITE(WRPLL_CTL(id), val & ~WRPLL_PLL_ENABLE);
|
||||
POSTING_READ(WRPLL_CTL(id));
|
||||
|
||||
/*
|
||||
* Try to set up the PCH reference clock once all DPLLs
|
||||
* that depend on it have been shut down.
|
||||
*/
|
||||
if (dev_priv->pch_ssc_use & BIT(id))
|
||||
intel_init_pch_refclk(dev_priv);
|
||||
}
|
||||
|
||||
static void hsw_ddi_spll_disable(struct drm_i915_private *dev_priv,
|
||||
struct intel_shared_dpll *pll)
|
||||
{
|
||||
enum intel_dpll_id id = pll->info->id;
|
||||
u32 val;
|
||||
|
||||
val = I915_READ(SPLL_CTL);
|
||||
I915_WRITE(SPLL_CTL, val & ~SPLL_PLL_ENABLE);
|
||||
POSTING_READ(SPLL_CTL);
|
||||
|
||||
/*
|
||||
* Try to set up the PCH reference clock once all DPLLs
|
||||
* that depend on it have been shut down.
|
||||
*/
|
||||
if (dev_priv->pch_ssc_use & BIT(id))
|
||||
intel_init_pch_refclk(dev_priv);
|
||||
}
|
||||
|
||||
static bool hsw_ddi_wrpll_get_hw_state(struct drm_i915_private *dev_priv,
|
||||
|
@@ -147,11 +147,11 @@ enum intel_dpll_id {
	 */
	DPLL_ID_ICL_MGPLL4 = 6,
	/**
	 * @DPLL_ID_TGL_TCPLL5: TGL TC PLL port 5 (TC5)
	 * @DPLL_ID_TGL_MGPLL5: TGL TC PLL port 5 (TC5)
	 */
	DPLL_ID_TGL_MGPLL5 = 7,
	/**
	 * @DPLL_ID_TGL_TCPLL6: TGL TC PLL port 6 (TC6)
	 * @DPLL_ID_TGL_MGPLL6: TGL TC PLL port 6 (TC6)
	 */
	DPLL_ID_TGL_MGPLL6 = 8,
};
|
||||
@ -337,6 +337,11 @@ struct intel_shared_dpll {
|
||||
* @info: platform specific info
|
||||
*/
|
||||
const struct dpll_info *info;
|
||||
|
||||
/**
|
||||
* @wakeref: In some platforms a device-level runtime pm reference may
|
||||
* need to be grabbed to disable DC states while this DPLL is enabled
|
||||
*/
|
||||
intel_wakeref_t wakeref;
|
||||
};
|
||||
|
||||
|
@ -505,7 +505,7 @@ void intel_dvo_init(struct drm_i915_private *dev_priv)
|
||||
intel_encoder->type = INTEL_OUTPUT_DVO;
|
||||
intel_encoder->power_domain = POWER_DOMAIN_PORT_OTHER;
|
||||
intel_encoder->port = port;
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
|
||||
switch (dvo->type) {
|
||||
case INTEL_DVO_CHIP_TMDS:
|
||||
|
@ -922,7 +922,7 @@ static void intel_hdcp_prop_work(struct work_struct *work)
|
||||
bool is_hdcp_supported(struct drm_i915_private *dev_priv, enum port port)
|
||||
{
|
||||
/* PORT E doesn't have HDCP, and PORT F is disabled */
|
||||
return INTEL_GEN(dev_priv) >= 9 && port < PORT_E;
|
||||
return INTEL_INFO(dev_priv)->display.has_hdcp && port < PORT_E;
|
||||
}
|
||||
|
||||
static int
|
||||
|
@ -2626,6 +2626,12 @@ out:
|
||||
if (status != connector_status_connected)
|
||||
cec_notifier_phys_addr_invalidate(intel_hdmi->cec_notifier);
|
||||
|
||||
/*
|
||||
* Make sure the refs for power wells enabled during detect are
|
||||
* dropped to avoid a new detect cycle triggered by HPD polling.
|
||||
*/
|
||||
intel_display_power_flush_work(dev_priv);
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
@ -3277,11 +3283,11 @@ void intel_hdmi_init(struct drm_i915_private *dev_priv,
|
||||
intel_encoder->port = port;
|
||||
if (IS_CHERRYVIEW(dev_priv)) {
|
||||
if (port == PORT_D)
|
||||
intel_encoder->crtc_mask = BIT(PIPE_C);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_C);
|
||||
else
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
} else {
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
}
|
||||
intel_encoder->cloneable = 1 << INTEL_OUTPUT_ANALOG;
|
||||
/*
|
||||
|
@ -899,12 +899,10 @@ void intel_lvds_init(struct drm_i915_private *dev_priv)
|
||||
intel_encoder->power_domain = POWER_DOMAIN_PORT_OTHER;
|
||||
intel_encoder->port = PORT_NONE;
|
||||
intel_encoder->cloneable = 0;
|
||||
if (HAS_PCH_SPLIT(dev_priv))
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
else if (IS_GEN(dev_priv, 4))
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
if (INTEL_GEN(dev_priv) < 4)
|
||||
intel_encoder->pipe_mask = BIT(PIPE_B);
|
||||
else
|
||||
intel_encoder->crtc_mask = BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
|
||||
drm_connector_helper_add(connector, &intel_lvds_connector_helper_funcs);
|
||||
connector->display_info.subpixel_order = SubPixelHorizontalRGB;
|
||||
|
@ -30,6 +30,7 @@
|
||||
#include <drm/i915_drm.h>
|
||||
|
||||
#include "gem/i915_gem_pm.h"
|
||||
#include "gt/intel_ring.h"
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "i915_reg.h"
|
||||
|
@ -76,7 +76,7 @@ static bool intel_psr2_enabled(struct drm_i915_private *dev_priv,
|
||||
const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
/* Cannot enable DSC and PSR2 simultaneously */
|
||||
WARN_ON(crtc_state->dsc_params.compression_enable &&
|
||||
WARN_ON(crtc_state->dsc.compression_enable &&
|
||||
crtc_state->has_psr2);
|
||||
|
||||
switch (dev_priv->psr.debug & I915_PSR_DEBUG_MODE_MASK) {
|
||||
@ -623,7 +623,7 @@ static bool intel_psr2_config_valid(struct intel_dp *intel_dp,
|
||||
* resolution requires DSC to be enabled, priority is given to DSC
|
||||
* over PSR2.
|
||||
*/
|
||||
if (crtc_state->dsc_params.compression_enable) {
|
||||
if (crtc_state->dsc.compression_enable) {
|
||||
DRM_DEBUG_KMS("PSR2 cannot be enabled since DSC is enabled\n");
|
||||
return false;
|
||||
}
|
||||
@ -740,25 +740,6 @@ static void intel_psr_activate(struct intel_dp *intel_dp)
|
||||
dev_priv->psr.active = true;
|
||||
}
|
||||
|
||||
static i915_reg_t gen9_chicken_trans_reg(struct drm_i915_private *dev_priv,
|
||||
enum transcoder cpu_transcoder)
|
||||
{
|
||||
static const i915_reg_t regs[] = {
|
||||
[TRANSCODER_A] = CHICKEN_TRANS_A,
|
||||
[TRANSCODER_B] = CHICKEN_TRANS_B,
|
||||
[TRANSCODER_C] = CHICKEN_TRANS_C,
|
||||
[TRANSCODER_EDP] = CHICKEN_TRANS_EDP,
|
||||
};
|
||||
|
||||
WARN_ON(INTEL_GEN(dev_priv) < 9);
|
||||
|
||||
if (WARN_ON(cpu_transcoder >= ARRAY_SIZE(regs) ||
|
||||
!regs[cpu_transcoder].reg))
|
||||
cpu_transcoder = TRANSCODER_A;
|
||||
|
||||
return regs[cpu_transcoder];
|
||||
}
|
||||
|
||||
static void intel_psr_enable_source(struct intel_dp *intel_dp,
|
||||
const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
@ -774,8 +755,7 @@ static void intel_psr_enable_source(struct intel_dp *intel_dp,
|
||||
|
||||
if (dev_priv->psr.psr2_enabled && (IS_GEN(dev_priv, 9) &&
|
||||
!IS_GEMINILAKE(dev_priv))) {
|
||||
i915_reg_t reg = gen9_chicken_trans_reg(dev_priv,
|
||||
cpu_transcoder);
|
||||
i915_reg_t reg = CHICKEN_TRANS(cpu_transcoder);
|
||||
u32 chicken = I915_READ(reg);
|
||||
|
||||
chicken |= PSR2_VSC_ENABLE_PROG_HEADER |
|
||||
@ -1437,7 +1417,7 @@ void intel_psr_short_pulse(struct intel_dp *intel_dp)
|
||||
if (val & DP_PSR_VSC_SDP_UNCORRECTABLE_ERROR)
|
||||
DRM_DEBUG_KMS("PSR VSC SDP uncorrectable error, disabling PSR\n");
|
||||
if (val & DP_PSR_LINK_CRC_ERROR)
|
||||
DRM_ERROR("PSR Link CRC error, disabling PSR\n");
|
||||
DRM_DEBUG_KMS("PSR Link CRC error, disabling PSR\n");
|
||||
|
||||
if (val & ~errors)
|
||||
DRM_ERROR("PSR_ERROR_STATUS unhandled errors %x\n",
|
||||
|
@ -2921,7 +2921,7 @@ intel_sdvo_output_setup(struct intel_sdvo *intel_sdvo, u16 flags)
|
||||
bytes[0], bytes[1]);
|
||||
return false;
|
||||
}
|
||||
intel_sdvo->base.crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
intel_sdvo->base.pipe_mask = ~0;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
@ -322,6 +322,55 @@ bool icl_is_hdr_plane(struct drm_i915_private *dev_priv, enum plane_id plane_id)
|
||||
icl_hdr_plane_mask() & BIT(plane_id);
|
||||
}
|
||||
|
||||
static void
|
||||
skl_plane_ratio(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = to_i915(plane_state->base.plane->dev);
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
|
||||
if (fb->format->cpp[0] == 8) {
|
||||
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv)) {
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
} else {
|
||||
*num = 9;
|
||||
*den = 8;
|
||||
}
|
||||
} else {
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
}
|
||||
}
|
||||
|
||||
static int skl_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
			       const struct intel_plane_state *plane_state)
{
	struct drm_i915_private *dev_priv = to_i915(plane_state->base.plane->dev);
	unsigned int pixel_rate = crtc_state->pixel_rate;
	unsigned int src_w, src_h, dst_w, dst_h;
	unsigned int num, den;

	skl_plane_ratio(crtc_state, plane_state, &num, &den);

	/* two pixels per clock on glk+ */
	if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
		den *= 2;

	src_w = drm_rect_width(&plane_state->base.src) >> 16;
	src_h = drm_rect_height(&plane_state->base.src) >> 16;
	dst_w = drm_rect_width(&plane_state->base.dst);
	dst_h = drm_rect_height(&plane_state->base.dst);

	/* Downscaling limits the maximum pixel rate */
	dst_w = min(src_w, dst_w);
	dst_h = min(src_h, dst_h);

	return DIV64_U64_ROUND_UP(mul_u32_u32(pixel_rate * num, src_w * src_h),
				  mul_u32_u32(den, dst_w * dst_h));
}

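Worked example for the hook above (illustration only, all numbers assumed): on GLK+ an fp16 (64bpp) framebuffer gives num/den = 10/8 from skl_plane_ratio(), and the two-pixels-per-clock rule doubles den to 16. Downscaling a 3840x2160 source to a 1920x1080 destination then requires

	min_cdclk = DIV_ROUND_UP(pixel_rate * 10 * (3840 * 2160), 16 * (1920 * 1080))
	          = DIV_ROUND_UP(pixel_rate * 5, 2)

i.e. 2x downscaling in each direction quadruples the plane's cdclk demand before the 10/16 format/clock ratio is applied.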
||||
static unsigned int
|
||||
skl_plane_max_stride(struct intel_plane *plane,
|
||||
u32 pixel_format, u64 modifier,
|
||||
@ -811,6 +860,85 @@ vlv_update_clrc(const struct intel_plane_state *plane_state)
|
||||
SP_SH_SIN(sh_sin) | SP_SH_COS(sh_cos));
|
||||
}
|
||||
|
||||
static void
|
||||
vlv_plane_ratio(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
u8 active_planes = crtc_state->active_planes & ~BIT(PLANE_CURSOR);
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int cpp = fb->format->cpp[0];
|
||||
|
||||
/*
|
||||
* VLV bspec only considers cases where all three planes are
|
||||
* enabled, and cases where the primary and one sprite is enabled.
|
||||
* Let's assume the case with just two sprites enabled also
|
||||
* maps to the latter case.
|
||||
*/
|
||||
if (hweight8(active_planes) == 3) {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 11;
|
||||
*den = 8;
|
||||
break;
|
||||
case 4:
|
||||
*num = 18;
|
||||
*den = 16;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
} else if (hweight8(active_planes) == 2) {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
break;
|
||||
case 4:
|
||||
*num = 17;
|
||||
*den = 16;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int vlv_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
unsigned int pixel_rate;
|
||||
unsigned int num, den;
|
||||
|
||||
/*
|
||||
* Note that crtc_state->pixel_rate accounts for both
|
||||
* horizontal and vertical panel fitter downscaling factors.
|
||||
* Pre-HSW bspec tells us to only consider the horizontal
|
||||
* downscaling factor here. We ignore that and just consider
|
||||
* both for simplicity.
|
||||
*/
|
||||
pixel_rate = crtc_state->pixel_rate;
|
||||
|
||||
vlv_plane_ratio(crtc_state, plane_state, &num, &den);
|
||||
|
||||
return DIV_ROUND_UP(pixel_rate * num, den);
|
||||
}
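Worked example for vlv_plane_min_cdclk() above (illustration only): with all three planes enabled on a pipe and a 32bpp framebuffer, vlv_plane_ratio() returns 18/16, so min_cdclk = DIV_ROUND_UP(pixel_rate * 18, 16), i.e. cdclk must be at least 12.5% above the pipe pixel rate.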
static u32 vlv_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
u32 sprctl = 0;
|
||||
@ -1017,6 +1145,164 @@ vlv_plane_get_hw_state(struct intel_plane *plane,
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void ivb_plane_ratio(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
u8 active_planes = crtc_state->active_planes & ~BIT(PLANE_CURSOR);
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int cpp = fb->format->cpp[0];
|
||||
|
||||
if (hweight8(active_planes) == 2) {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
break;
|
||||
case 4:
|
||||
*num = 17;
|
||||
*den = 16;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 9;
|
||||
*den = 8;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void ivb_plane_ratio_scaling(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int cpp = fb->format->cpp[0];
|
||||
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 12;
|
||||
*den = 8;
|
||||
break;
|
||||
case 4:
|
||||
*num = 19;
|
||||
*den = 16;
|
||||
break;
|
||||
case 2:
|
||||
*num = 33;
|
||||
*den = 32;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
int ivb_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
unsigned int pixel_rate;
|
||||
unsigned int num, den;
|
||||
|
||||
/*
|
||||
* Note that crtc_state->pixel_rate accounts for both
|
||||
* horizontal and vertical panel fitter downscaling factors.
|
||||
* Pre-HSW bspec tells us to only consider the horizontal
|
||||
* downscaling factor here. We ignore that and just consider
|
||||
* both for simplicity.
|
||||
*/
|
||||
pixel_rate = crtc_state->pixel_rate;
|
||||
|
||||
ivb_plane_ratio(crtc_state, plane_state, &num, &den);
|
||||
|
||||
return DIV_ROUND_UP(pixel_rate * num, den);
|
||||
}
|
||||
|
||||
static int ivb_sprite_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
unsigned int src_w, dst_w, pixel_rate;
|
||||
unsigned int num, den;
|
||||
|
||||
/*
|
||||
* Note that crtc_state->pixel_rate accounts for both
|
||||
* horizontal and vertical panel fitter downscaling factors.
|
||||
* Pre-HSW bspec tells us to only consider the horizontal
|
||||
* downscaling factor here. We ignore that and just consider
|
||||
* both for simplicity.
|
||||
*/
|
||||
pixel_rate = crtc_state->pixel_rate;
|
||||
|
||||
src_w = drm_rect_width(&plane_state->base.src) >> 16;
|
||||
dst_w = drm_rect_width(&plane_state->base.dst);
|
||||
|
||||
if (src_w != dst_w)
|
||||
ivb_plane_ratio_scaling(crtc_state, plane_state, &num, &den);
|
||||
else
|
||||
ivb_plane_ratio(crtc_state, plane_state, &num, &den);
|
||||
|
||||
/* Horizontal downscaling limits the maximum pixel rate */
|
||||
dst_w = min(src_w, dst_w);
|
||||
|
||||
return DIV_ROUND_UP_ULL(mul_u32_u32(pixel_rate, num * src_w),
|
||||
den * dst_w);
|
||||
}
|
||||
|
||||
static void hsw_plane_ratio(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state,
|
||||
unsigned int *num, unsigned int *den)
|
||||
{
|
||||
u8 active_planes = crtc_state->active_planes & ~BIT(PLANE_CURSOR);
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int cpp = fb->format->cpp[0];
|
||||
|
||||
if (hweight8(active_planes) == 2) {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 10;
|
||||
*den = 8;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
} else {
|
||||
switch (cpp) {
|
||||
case 8:
|
||||
*num = 9;
|
||||
*den = 8;
|
||||
break;
|
||||
default:
|
||||
*num = 1;
|
||||
*den = 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
int hsw_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
unsigned int pixel_rate = crtc_state->pixel_rate;
|
||||
unsigned int num, den;
|
||||
|
||||
hsw_plane_ratio(crtc_state, plane_state, &num, &den);
|
||||
|
||||
return DIV_ROUND_UP(pixel_rate * num, den);
|
||||
}
|
||||
|
||||
static u32 ivb_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
|
||||
{
|
||||
u32 sprctl = 0;
|
||||
@ -1030,6 +1316,16 @@ static u32 ivb_sprite_ctl_crtc(const struct intel_crtc_state *crtc_state)
|
||||
return sprctl;
|
||||
}
|
||||
|
||||
static bool ivb_need_sprite_gamma(const struct intel_plane_state *plane_state)
|
||||
{
|
||||
struct drm_i915_private *dev_priv =
|
||||
to_i915(plane_state->base.plane->dev);
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
|
||||
return fb->format->cpp[0] == 8 &&
|
||||
(IS_IVYBRIDGE(dev_priv) || IS_HASWELL(dev_priv));
|
||||
}
|
||||
|
||||
static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
@ -1052,6 +1348,12 @@ static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
|
||||
case DRM_FORMAT_XRGB8888:
|
||||
sprctl |= SPRITE_FORMAT_RGBX888;
|
||||
break;
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
sprctl |= SPRITE_FORMAT_RGBX161616 | SPRITE_RGB_ORDER_RGBX;
|
||||
break;
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
sprctl |= SPRITE_FORMAT_RGBX161616;
|
||||
break;
|
||||
case DRM_FORMAT_YUYV:
|
||||
sprctl |= SPRITE_FORMAT_YUV422 | SPRITE_YUV_ORDER_YUYV;
|
||||
break;
|
||||
@ -1069,7 +1371,8 @@ static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
|
||||
return 0;
|
||||
}
|
||||
|
||||
sprctl |= SPRITE_INT_GAMMA_DISABLE;
|
||||
if (!ivb_need_sprite_gamma(plane_state))
|
||||
sprctl |= SPRITE_INT_GAMMA_DISABLE;
|
||||
|
||||
if (plane_state->base.color_encoding == DRM_COLOR_YCBCR_BT709)
|
||||
sprctl |= SPRITE_YUV_TO_RGB_CSC_FORMAT_BT709;
|
||||
@ -1091,12 +1394,26 @@ static u32 ivb_sprite_ctl(const struct intel_crtc_state *crtc_state,
|
||||
return sprctl;
|
||||
}
|
||||
|
||||
static void ivb_sprite_linear_gamma(u16 gamma[18])
|
||||
static void ivb_sprite_linear_gamma(const struct intel_plane_state *plane_state,
|
||||
u16 gamma[18])
|
||||
{
|
||||
int i;
|
||||
int scale, i;
|
||||
|
||||
for (i = 0; i < 17; i++)
|
||||
gamma[i] = (i << 10) / 16;
|
||||
/*
|
||||
* WaFP16GammaEnabling:ivb,hsw
|
||||
* "Workaround : When using the 64-bit format, the sprite output
|
||||
* on each color channel has one quarter amplitude. It can be
|
||||
* brought up to full amplitude by using sprite internal gamma
|
||||
* correction, pipe gamma correction, or pipe color space
|
||||
* conversion to multiply the sprite output by four."
|
||||
*/
|
||||
scale = 4;
|
||||
|
||||
for (i = 0; i < 16; i++)
|
||||
gamma[i] = min((scale * i << 10) / 16, (1 << 10) - 1);
|
||||
|
||||
gamma[i] = min((scale * i << 10) / 16, 1 << 10);
|
||||
i++;
|
||||
|
||||
gamma[i] = 3 << 10;
|
||||
i++;
|
||||
@ -1110,7 +1427,10 @@ static void ivb_update_gamma(const struct intel_plane_state *plane_state)
|
||||
u16 gamma[18];
|
||||
int i;
|
||||
|
||||
ivb_sprite_linear_gamma(gamma);
|
||||
if (!ivb_need_sprite_gamma(plane_state))
|
||||
return;
|
||||
|
||||
ivb_sprite_linear_gamma(plane_state, gamma);
|
||||
|
||||
/* FIXME these register are single buffered :( */
|
||||
for (i = 0; i < 16; i++)
|
||||
@ -1243,6 +1563,53 @@ ivb_plane_get_hw_state(struct intel_plane *plane,
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int g4x_sprite_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state)
|
||||
{
|
||||
const struct drm_framebuffer *fb = plane_state->base.fb;
|
||||
unsigned int hscale, pixel_rate;
|
||||
unsigned int limit, decimate;
|
||||
|
||||
/*
|
||||
* Note that crtc_state->pixel_rate accounts for both
|
||||
* horizontal and vertical panel fitter downscaling factors.
|
||||
* Pre-HSW bspec tells us to only consider the horizontal
|
||||
* downscaling factor here. We ignore that and just consider
|
||||
* both for simplicity.
|
||||
*/
|
||||
pixel_rate = crtc_state->pixel_rate;
|
||||
|
||||
/* Horizontal downscaling limits the maximum pixel rate */
|
||||
hscale = drm_rect_calc_hscale(&plane_state->base.src,
|
||||
&plane_state->base.dst,
|
||||
0, INT_MAX);
|
||||
if (hscale < 0x10000)
|
||||
return pixel_rate;
|
||||
|
||||
/* Decimation steps at 2x,4x,8x,16x */
|
||||
decimate = ilog2(hscale >> 16);
|
||||
hscale >>= decimate;
|
||||
|
||||
/* Starting limit is 90% of cdclk */
|
||||
limit = 9;
|
||||
|
||||
/* -10% per decimation step */
|
||||
limit -= decimate;
|
||||
|
||||
/* -10% for RGB */
|
||||
if (fb->format->cpp[0] >= 4)
|
||||
limit--; /* -10% for RGB */
|
||||
|
||||
/*
|
||||
* We should also do -10% if sprite scaling is enabled
|
||||
* on the other pipe, but we can't really check for that,
|
||||
* so we ignore it.
|
||||
*/
|
||||
|
||||
return DIV_ROUND_UP_ULL(mul_u32_u32(pixel_rate, 10 * hscale),
|
||||
limit << 16);
|
||||
}
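Worked example for g4x_sprite_min_cdclk() above (illustration only, numbers assumed): a 4x horizontal downscale gives hscale = 0x40000, so decimate = ilog2(4) = 2 and hscale is reduced back to 0x10000; the limit starts at 9 (90% of cdclk), drops to 7 after the two decimation steps and to 6 for a 32bpp RGB framebuffer, giving

	min_cdclk = DIV_ROUND_UP_ULL(pixel_rate * 10 * 0x10000, 6 << 16)
	          = DIV_ROUND_UP(pixel_rate * 10, 6)

i.e. the sprite needs cdclk to be at least ~1.67x the pipe pixel rate.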
static unsigned int
|
||||
g4x_sprite_max_stride(struct intel_plane *plane,
|
||||
u32 pixel_format, u64 modifier,
|
||||
@ -1286,6 +1653,12 @@ static u32 g4x_sprite_ctl(const struct intel_crtc_state *crtc_state,
|
||||
case DRM_FORMAT_XRGB8888:
|
||||
dvscntr |= DVS_FORMAT_RGBX888;
|
||||
break;
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
dvscntr |= DVS_FORMAT_RGBX161616 | DVS_RGB_ORDER_XBGR;
|
||||
break;
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
dvscntr |= DVS_FORMAT_RGBX161616;
|
||||
break;
|
||||
case DRM_FORMAT_YUYV:
|
||||
dvscntr |= DVS_FORMAT_YUV422 | DVS_YUV_ORDER_YUYV;
|
||||
break;
|
||||
@ -1499,6 +1872,11 @@ static bool intel_fb_scalable(const struct drm_framebuffer *fb)
|
||||
switch (fb->format->format) {
|
||||
case DRM_FORMAT_C8:
|
||||
return false;
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
case DRM_FORMAT_ARGB16161616F:
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
case DRM_FORMAT_ABGR16161616F:
|
||||
return INTEL_GEN(to_i915(fb->dev)) >= 11;
|
||||
default:
|
||||
return true;
|
||||
}
|
||||
@ -1787,6 +2165,22 @@ static int skl_plane_check_nv12_rotation(const struct intel_plane_state *plane_s
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int skl_plane_max_scale(struct drm_i915_private *dev_priv,
|
||||
const struct drm_framebuffer *fb)
|
||||
{
|
||||
/*
|
||||
* We don't yet know the final source width nor
|
||||
* whether we can use the HQ scaler mode. Assume
|
||||
* the best case.
|
||||
* FIXME need to properly check this later.
|
||||
*/
|
||||
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv) ||
|
||||
!drm_format_info_is_yuv_semiplanar(fb->format))
|
||||
return 0x30000 - 1;
|
||||
else
|
||||
return 0x20000 - 1;
|
||||
}
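A note on the constants above (illustration, not part of the patch): the return values are src/dst scale ratios in .16 fixed point, which the atomic plane helper treats as the maximum allowed downscale factor, so 0x30000 - 1 is just under 3.0 (roughly up to 3x downscaling) and 0x20000 - 1 just under 2.0 for the older-platform YUV semi-planar case.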
static int skl_plane_check(struct intel_crtc_state *crtc_state,
|
||||
struct intel_plane_state *plane_state)
|
||||
{
|
||||
@ -1804,7 +2198,7 @@ static int skl_plane_check(struct intel_crtc_state *crtc_state,
|
||||
/* use scaler when colorkey is not required */
|
||||
if (!plane_state->ckey.flags && intel_fb_scalable(fb)) {
|
||||
min_scale = 1;
|
||||
max_scale = skl_max_scale(crtc_state, fb->format);
|
||||
max_scale = skl_plane_max_scale(dev_priv, fb);
|
||||
}
|
||||
|
||||
ret = drm_atomic_helper_check_plane_state(&plane_state->base,
|
||||
@ -1979,8 +2373,10 @@ static const u64 i9xx_plane_format_modifiers[] = {
|
||||
};
|
||||
|
||||
static const u32 snb_plane_formats[] = {
|
||||
DRM_FORMAT_XBGR8888,
|
||||
DRM_FORMAT_XRGB8888,
|
||||
DRM_FORMAT_XBGR8888,
|
||||
DRM_FORMAT_XRGB16161616F,
|
||||
DRM_FORMAT_XBGR16161616F,
|
||||
DRM_FORMAT_YUYV,
|
||||
DRM_FORMAT_YVYU,
|
||||
DRM_FORMAT_UYVY,
|
||||
@ -2010,6 +2406,8 @@ static const u32 skl_plane_formats[] = {
|
||||
DRM_FORMAT_ABGR8888,
|
||||
DRM_FORMAT_XRGB2101010,
|
||||
DRM_FORMAT_XBGR2101010,
|
||||
DRM_FORMAT_XRGB16161616F,
|
||||
DRM_FORMAT_XBGR16161616F,
|
||||
DRM_FORMAT_YUYV,
|
||||
DRM_FORMAT_YVYU,
|
||||
DRM_FORMAT_UYVY,
|
||||
@ -2025,6 +2423,8 @@ static const u32 skl_planar_formats[] = {
|
||||
DRM_FORMAT_ABGR8888,
|
||||
DRM_FORMAT_XRGB2101010,
|
||||
DRM_FORMAT_XBGR2101010,
|
||||
DRM_FORMAT_XRGB16161616F,
|
||||
DRM_FORMAT_XBGR16161616F,
|
||||
DRM_FORMAT_YUYV,
|
||||
DRM_FORMAT_YVYU,
|
||||
DRM_FORMAT_UYVY,
|
||||
@ -2041,6 +2441,8 @@ static const u32 glk_planar_formats[] = {
|
||||
DRM_FORMAT_ABGR8888,
|
||||
DRM_FORMAT_XRGB2101010,
|
||||
DRM_FORMAT_XBGR2101010,
|
||||
DRM_FORMAT_XRGB16161616F,
|
||||
DRM_FORMAT_XBGR16161616F,
|
||||
DRM_FORMAT_YUYV,
|
||||
DRM_FORMAT_YVYU,
|
||||
DRM_FORMAT_UYVY,
|
||||
@ -2191,6 +2593,8 @@ static bool snb_sprite_format_mod_supported(struct drm_plane *_plane,
|
||||
switch (format) {
|
||||
case DRM_FORMAT_XRGB8888:
|
||||
case DRM_FORMAT_XBGR8888:
|
||||
case DRM_FORMAT_XRGB16161616F:
|
||||
case DRM_FORMAT_XBGR16161616F:
|
||||
case DRM_FORMAT_YUYV:
|
||||
case DRM_FORMAT_YVYU:
|
||||
case DRM_FORMAT_UYVY:
|
||||
@ -2511,6 +2915,7 @@ skl_universal_plane_create(struct drm_i915_private *dev_priv,
|
||||
plane->disable_plane = skl_disable_plane;
|
||||
plane->get_hw_state = skl_plane_get_hw_state;
|
||||
plane->check_plane = skl_plane_check;
|
||||
plane->min_cdclk = skl_plane_min_cdclk;
|
||||
if (icl_is_nv12_y_plane(plane_id))
|
||||
plane->update_slave = icl_update_slave;
|
||||
|
||||
@ -2618,6 +3023,7 @@ intel_sprite_plane_create(struct drm_i915_private *dev_priv,
|
||||
plane->disable_plane = vlv_disable_plane;
|
||||
plane->get_hw_state = vlv_plane_get_hw_state;
|
||||
plane->check_plane = vlv_sprite_check;
|
||||
plane->min_cdclk = vlv_plane_min_cdclk;
|
||||
|
||||
formats = vlv_plane_formats;
|
||||
num_formats = ARRAY_SIZE(vlv_plane_formats);
|
||||
@ -2631,6 +3037,11 @@ intel_sprite_plane_create(struct drm_i915_private *dev_priv,
|
||||
plane->get_hw_state = ivb_plane_get_hw_state;
|
||||
plane->check_plane = g4x_sprite_check;
|
||||
|
||||
if (IS_BROADWELL(dev_priv) || IS_HASWELL(dev_priv))
|
||||
plane->min_cdclk = hsw_plane_min_cdclk;
|
||||
else
|
||||
plane->min_cdclk = ivb_sprite_min_cdclk;
|
||||
|
||||
formats = snb_plane_formats;
|
||||
num_formats = ARRAY_SIZE(snb_plane_formats);
|
||||
modifiers = i9xx_plane_format_modifiers;
|
||||
@ -2642,6 +3053,7 @@ intel_sprite_plane_create(struct drm_i915_private *dev_priv,
|
||||
plane->disable_plane = g4x_disable_plane;
|
||||
plane->get_hw_state = g4x_plane_get_hw_state;
|
||||
plane->check_plane = g4x_sprite_check;
|
||||
plane->min_cdclk = g4x_sprite_min_cdclk;
|
||||
|
||||
modifiers = i9xx_plane_format_modifiers;
|
||||
if (IS_GEN(dev_priv, 6)) {
|
||||
|
@ -49,4 +49,11 @@ static inline u8 icl_hdr_plane_mask(void)
|
||||
|
||||
bool icl_is_hdr_plane(struct drm_i915_private *dev_priv, enum plane_id plane_id);
|
||||
|
||||
int ivb_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state);
|
||||
int hsw_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state);
|
||||
int vlv_plane_min_cdclk(const struct intel_crtc_state *crtc_state,
|
||||
const struct intel_plane_state *plane_state);
|
||||
|
||||
#endif /* __INTEL_SPRITE_H__ */
|
||||
|
@ -1701,7 +1701,7 @@ intel_tv_detect(struct drm_connector *connector,
|
||||
struct intel_load_detect_pipe tmp;
|
||||
int ret;
|
||||
|
||||
ret = intel_get_load_detect_pipe(connector, NULL, &tmp, ctx);
|
||||
ret = intel_get_load_detect_pipe(connector, &tmp, ctx);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
@ -1947,7 +1947,7 @@ intel_tv_init(struct drm_i915_private *dev_priv)
|
||||
intel_encoder->type = INTEL_OUTPUT_TVOUT;
|
||||
intel_encoder->power_domain = POWER_DOMAIN_PORT_OTHER;
|
||||
intel_encoder->port = PORT_NONE;
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
intel_encoder->cloneable = 0;
|
||||
intel_tv->type = DRM_MODE_CONNECTOR_Unknown;
|
||||
|
||||
|
@ -114,6 +114,7 @@ enum bdb_block_id {
|
||||
BDB_LVDS_POWER = 44,
|
||||
BDB_MIPI_CONFIG = 52,
|
||||
BDB_MIPI_SEQUENCE = 53,
|
||||
BDB_COMPRESSION_PARAMETERS = 56,
|
||||
BDB_SKIP = 254, /* VBIOS private block, ignore */
|
||||
};
|
||||
|
||||
@ -811,4 +812,55 @@ struct bdb_mipi_sequence {
|
||||
u8 data[0]; /* up to 6 variable length blocks */
|
||||
} __packed;
|
||||
|
||||
/*
|
||||
* Block 56 - Compression Parameters
|
||||
*/
|
||||
|
||||
#define VBT_RC_BUFFER_BLOCK_SIZE_1KB 0
|
||||
#define VBT_RC_BUFFER_BLOCK_SIZE_4KB 1
|
||||
#define VBT_RC_BUFFER_BLOCK_SIZE_16KB 2
|
||||
#define VBT_RC_BUFFER_BLOCK_SIZE_64KB 3
|
||||
|
||||
#define VBT_DSC_LINE_BUFFER_DEPTH(vbt_value) ((vbt_value) + 8) /* bits */
|
||||
#define VBT_DSC_MAX_BPP(vbt_value) (6 + (vbt_value) * 2)
|
||||
|
||||
struct dsc_compression_parameters_entry {
|
||||
u8 version_major:4;
|
||||
u8 version_minor:4;
|
||||
|
||||
u8 rc_buffer_block_size:2;
|
||||
u8 reserved1:6;
|
||||
|
||||
/*
|
||||
* Buffer size in bytes:
|
||||
*
|
||||
* 4 ^ rc_buffer_block_size * 1024 * (rc_buffer_size + 1) bytes
|
||||
*/
|
||||
u8 rc_buffer_size;
|
||||
u32 slices_per_line;
|
||||
|
||||
u8 line_buffer_depth:4;
|
||||
u8 reserved2:4;
|
||||
|
||||
/* Flag Bits 1 */
|
||||
u8 block_prediction_enable:1;
|
||||
u8 reserved3:7;
|
||||
|
||||
u8 max_bpp; /* mapping */
|
||||
|
||||
/* Color depth capabilities */
|
||||
u8 reserved4:1;
|
||||
u8 support_8bpc:1;
|
||||
u8 support_10bpc:1;
|
||||
u8 support_12bpc:1;
|
||||
u8 reserved5:4;
|
||||
|
||||
u16 slice_height;
|
||||
} __packed;
|
||||
|
||||
struct bdb_compression_parameters {
|
||||
u16 entry_size;
|
||||
struct dsc_compression_parameters_entry data[16];
|
||||
} __packed;
|
||||
|
||||
#endif /* _INTEL_VBT_DEFS_H_ */
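Worked example for the block 56 definitions above (illustration only; the helper below is an invented name, not part of the patch). The RC buffer size comment translates directly into code, and the two macros decode the bpp and line buffer depth fields:

	/* Illustration: 4 ^ rc_buffer_block_size * 1024 * (rc_buffer_size + 1) bytes */
	static u32 example_vbt_dsc_rc_buffer_size(u8 block_size_field, u8 size_field)
	{
		return (1u << (2 * block_size_field)) * 1024 * (size_field + 1);
	}

e.g. block_size_field = 1 (VBT_RC_BUFFER_BLOCK_SIZE_4KB) and size_field = 3 give 16384 bytes, while VBT_DSC_MAX_BPP(3) = 12 bpp and VBT_DSC_LINE_BUFFER_DEPTH(5) = 13 bits.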
@ -322,8 +322,8 @@ static int get_column_index_for_rc_params(u8 bits_per_component)
|
||||
int intel_dp_compute_dsc_params(struct intel_dp *intel_dp,
|
||||
struct intel_crtc_state *pipe_config)
|
||||
{
|
||||
struct drm_dsc_config *vdsc_cfg = &pipe_config->dp_dsc_cfg;
|
||||
u16 compressed_bpp = pipe_config->dsc_params.compressed_bpp;
|
||||
struct drm_dsc_config *vdsc_cfg = &pipe_config->dsc.config;
|
||||
u16 compressed_bpp = pipe_config->dsc.compressed_bpp;
|
||||
u8 i = 0;
|
||||
int row_index = 0;
|
||||
int column_index = 0;
|
||||
@ -332,7 +332,7 @@ int intel_dp_compute_dsc_params(struct intel_dp *intel_dp,
|
||||
vdsc_cfg->pic_width = pipe_config->base.adjusted_mode.crtc_hdisplay;
|
||||
vdsc_cfg->pic_height = pipe_config->base.adjusted_mode.crtc_vdisplay;
|
||||
vdsc_cfg->slice_width = DIV_ROUND_UP(vdsc_cfg->pic_width,
|
||||
pipe_config->dsc_params.slice_count);
|
||||
pipe_config->dsc.slice_count);
|
||||
/*
|
||||
* Slice Height of 8 works for all currently available panels. So start
|
||||
* with that if pic_height is an integral multiple of 8.
|
||||
@ -485,13 +485,13 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
{
|
||||
struct intel_crtc *crtc = to_intel_crtc(crtc_state->base.crtc);
|
||||
struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
|
||||
const struct drm_dsc_config *vdsc_cfg = &crtc_state->dp_dsc_cfg;
|
||||
const struct drm_dsc_config *vdsc_cfg = &crtc_state->dsc.config;
|
||||
enum pipe pipe = crtc->pipe;
|
||||
enum transcoder cpu_transcoder = crtc_state->cpu_transcoder;
|
||||
u32 pps_val = 0;
|
||||
u32 rc_buf_thresh_dword[4];
|
||||
u32 rc_range_params_dword[8];
|
||||
u8 num_vdsc_instances = (crtc_state->dsc_params.dsc_split) ? 2 : 1;
|
||||
u8 num_vdsc_instances = (crtc_state->dsc.dsc_split) ? 2 : 1;
|
||||
int i = 0;
|
||||
|
||||
/* Populate PICTURE_PARAMETER_SET_0 registers */
|
||||
@ -514,11 +514,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_0, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_0(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_0(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -533,11 +533,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_1, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_1(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_1(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -553,11 +553,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_2, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_2(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_2(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -573,11 +573,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_3, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_3(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_3(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -593,11 +593,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_4, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_4(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_4(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -613,11 +613,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_5, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_5(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_5(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -635,11 +635,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_6, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_6(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_6(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -655,11 +655,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_7, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_7(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_7(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -675,11 +675,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_8, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_8(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_8(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -695,11 +695,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_9, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_9(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_9(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -717,11 +717,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_10, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_10(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_10(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -740,11 +740,11 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
* If 2 VDSC instances are needed, configure PPS for second
|
||||
* VDSC
|
||||
*/
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(DSCC_PICTURE_PARAMETER_SET_16, pps_val);
|
||||
} else {
|
||||
I915_WRITE(ICL_DSC0_PICTURE_PARAMETER_SET_16(pipe), pps_val);
|
||||
if (crtc_state->dsc_params.dsc_split)
|
||||
if (crtc_state->dsc.dsc_split)
|
||||
I915_WRITE(ICL_DSC1_PICTURE_PARAMETER_SET_16(pipe),
|
||||
pps_val);
|
||||
}
|
||||
@ -763,7 +763,7 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
I915_WRITE(DSCA_RC_BUF_THRESH_0_UDW, rc_buf_thresh_dword[1]);
|
||||
I915_WRITE(DSCA_RC_BUF_THRESH_1, rc_buf_thresh_dword[2]);
|
||||
I915_WRITE(DSCA_RC_BUF_THRESH_1_UDW, rc_buf_thresh_dword[3]);
|
||||
if (crtc_state->dsc_params.dsc_split) {
|
||||
if (crtc_state->dsc.dsc_split) {
|
||||
I915_WRITE(DSCC_RC_BUF_THRESH_0,
|
||||
rc_buf_thresh_dword[0]);
|
||||
I915_WRITE(DSCC_RC_BUF_THRESH_0_UDW,
|
||||
@ -782,7 +782,7 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
rc_buf_thresh_dword[2]);
|
||||
I915_WRITE(ICL_DSC0_RC_BUF_THRESH_1_UDW(pipe),
|
||||
rc_buf_thresh_dword[3]);
|
||||
if (crtc_state->dsc_params.dsc_split) {
|
||||
if (crtc_state->dsc.dsc_split) {
|
||||
I915_WRITE(ICL_DSC1_RC_BUF_THRESH_0(pipe),
|
||||
rc_buf_thresh_dword[0]);
|
||||
I915_WRITE(ICL_DSC1_RC_BUF_THRESH_0_UDW(pipe),
|
||||
@ -824,7 +824,7 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
rc_range_params_dword[6]);
|
||||
I915_WRITE(DSCA_RC_RANGE_PARAMETERS_3_UDW,
|
||||
rc_range_params_dword[7]);
|
||||
if (crtc_state->dsc_params.dsc_split) {
|
||||
if (crtc_state->dsc.dsc_split) {
|
||||
I915_WRITE(DSCC_RC_RANGE_PARAMETERS_0,
|
||||
rc_range_params_dword[0]);
|
||||
I915_WRITE(DSCC_RC_RANGE_PARAMETERS_0_UDW,
|
||||
@ -859,7 +859,7 @@ static void intel_configure_pps_for_dsc_encoder(struct intel_encoder *encoder,
|
||||
rc_range_params_dword[6]);
|
||||
I915_WRITE(ICL_DSC0_RC_RANGE_PARAMETERS_3_UDW(pipe),
|
||||
rc_range_params_dword[7]);
|
||||
if (crtc_state->dsc_params.dsc_split) {
|
||||
if (crtc_state->dsc.dsc_split) {
|
||||
I915_WRITE(ICL_DSC1_RC_RANGE_PARAMETERS_0(pipe),
|
||||
rc_range_params_dword[0]);
|
||||
I915_WRITE(ICL_DSC1_RC_RANGE_PARAMETERS_0_UDW(pipe),
|
||||
@ -885,7 +885,7 @@ static void intel_dp_write_dsc_pps_sdp(struct intel_encoder *encoder,
|
||||
{
|
||||
struct intel_dp *intel_dp = enc_to_intel_dp(&encoder->base);
|
||||
struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
|
||||
const struct drm_dsc_config *vdsc_cfg = &crtc_state->dp_dsc_cfg;
|
||||
const struct drm_dsc_config *vdsc_cfg = &crtc_state->dsc.config;
|
||||
struct drm_dsc_pps_infoframe dp_dsc_pps_sdp;
|
||||
|
||||
/* Prepare DP SDP PPS header as per DP 1.4 spec, Table 2-123 */
|
||||
@ -909,7 +909,7 @@ void intel_dsc_enable(struct intel_encoder *encoder,
|
||||
u32 dss_ctl1_val = 0;
|
||||
u32 dss_ctl2_val = 0;
|
||||
|
||||
if (!crtc_state->dsc_params.compression_enable)
|
||||
if (!crtc_state->dsc.compression_enable)
|
||||
return;
|
||||
|
||||
/* Enable Power wells for VDSC/joining */
|
||||
@ -928,7 +928,7 @@ void intel_dsc_enable(struct intel_encoder *encoder,
|
||||
dss_ctl2_reg = ICL_PIPE_DSS_CTL2(pipe);
|
||||
}
|
||||
dss_ctl2_val |= LEFT_BRANCH_VDSC_ENABLE;
|
||||
if (crtc_state->dsc_params.dsc_split) {
|
||||
if (crtc_state->dsc.dsc_split) {
|
||||
dss_ctl2_val |= RIGHT_BRANCH_VDSC_ENABLE;
|
||||
dss_ctl1_val |= JOINER_ENABLE;
|
||||
}
|
||||
@ -944,7 +944,7 @@ void intel_dsc_disable(const struct intel_crtc_state *old_crtc_state)
|
||||
i915_reg_t dss_ctl1_reg, dss_ctl2_reg;
|
||||
u32 dss_ctl1_val = 0, dss_ctl2_val = 0;
|
||||
|
||||
if (!old_crtc_state->dsc_params.compression_enable)
|
||||
if (!old_crtc_state->dsc.compression_enable)
|
||||
return;
|
||||
|
||||
if (old_crtc_state->cpu_transcoder == TRANSCODER_EDP) {
|
||||
|
@ -1870,11 +1870,11 @@ void vlv_dsi_init(struct drm_i915_private *dev_priv)
|
||||
* port C. BXT isn't limited like this.
|
||||
*/
|
||||
if (IS_GEN9_LP(dev_priv))
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C);
|
||||
intel_encoder->pipe_mask = ~0;
|
||||
else if (port == PORT_A)
|
||||
intel_encoder->crtc_mask = BIT(PIPE_A);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_A);
|
||||
else
|
||||
intel_encoder->crtc_mask = BIT(PIPE_B);
|
||||
intel_encoder->pipe_mask = BIT(PIPE_B);
|
||||
|
||||
if (dev_priv->vbt.dsi.config->dual_link)
|
||||
intel_dsi->ports = BIT(PORT_A) | BIT(PORT_C);
|
||||
|
@ -69,8 +69,10 @@
|
||||
|
||||
#include <drm/i915_drm.h>
|
||||
|
||||
#include "gt/intel_lrc_reg.h"
|
||||
#include "gt/intel_engine_heartbeat.h"
|
||||
#include "gt/intel_engine_user.h"
|
||||
#include "gt/intel_lrc_reg.h"
|
||||
#include "gt/intel_ring.h"
|
||||
|
||||
#include "i915_gem_context.h"
|
||||
#include "i915_globals.h"
|
||||
@ -276,6 +278,153 @@ void i915_gem_context_release(struct kref *ref)
|
||||
schedule_work(&gc->free_work);
|
||||
}
|
||||
|
||||
static inline struct i915_gem_engines *
|
||||
__context_engines_static(const struct i915_gem_context *ctx)
|
||||
{
|
||||
return rcu_dereference_protected(ctx->engines, true);
|
||||
}
|
||||
|
||||
static bool __reset_engine(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct intel_gt *gt = engine->gt;
|
||||
bool success = false;
|
||||
|
||||
if (!intel_has_reset_engine(gt))
|
||||
return false;
|
||||
|
||||
if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
|
||||
&gt->reset.flags)) {
|
||||
success = intel_engine_reset(engine, NULL) == 0;
|
||||
clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
|
||||
&gt->reset.flags);
|
||||
}
|
||||
|
||||
return success;
|
||||
}
|
||||
|
||||
static void __reset_context(struct i915_gem_context *ctx,
|
||||
struct intel_engine_cs *engine)
|
||||
{
|
||||
intel_gt_handle_error(engine->gt, engine->mask, 0,
|
||||
"context closure in %s", ctx->name);
|
||||
}
|
||||
|
||||
static bool __cancel_engine(struct intel_engine_cs *engine)
|
||||
{
|
||||
/*
|
||||
* Send a "high priority pulse" down the engine to cause the
|
||||
* current request to be momentarily preempted. (If it fails to
|
||||
* be preempted, it will be reset). As we have marked our context
|
||||
* as banned, any incomplete request, including any running, will
|
||||
* be skipped following the preemption.
|
||||
*
|
||||
* If there is no hangchecking (one of the reasons why we try to
|
||||
* cancel the context) and no forced preemption, there may be no
|
||||
* means by which we reset the GPU and evict the persistent hog.
|
||||
* Ergo if we are unable to inject a preemptive pulse that can
|
||||
* kill the banned context, we fallback to doing a local reset
|
||||
* instead.
|
||||
*/
|
||||
if (IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT) &&
|
||||
!intel_engine_pulse(engine))
|
||||
return true;
|
||||
|
||||
/* If we are unable to send a pulse, try resetting this engine. */
|
||||
return __reset_engine(engine);
|
||||
}
|
||||
|
||||
static struct intel_engine_cs *__active_engine(struct i915_request *rq)
|
||||
{
|
||||
struct intel_engine_cs *engine, *locked;
|
||||
|
||||
/*
|
||||
* Serialise with __i915_request_submit() so that it sees
|
||||
* is-banned?, or we know the request is already inflight.
|
||||
*/
|
||||
locked = READ_ONCE(rq->engine);
|
||||
spin_lock_irq(&locked->active.lock);
|
||||
while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
|
||||
spin_unlock(&locked->active.lock);
|
||||
spin_lock(&engine->active.lock);
|
||||
locked = engine;
|
||||
}
|
||||
|
||||
engine = NULL;
|
||||
if (i915_request_is_active(rq) && !rq->fence.error)
|
||||
engine = rq->engine;
|
||||
|
||||
spin_unlock_irq(&locked->active.lock);
|
||||
|
||||
return engine;
|
||||
}
|
||||
|
||||
static struct intel_engine_cs *active_engine(struct intel_context *ce)
|
||||
{
|
||||
struct intel_engine_cs *engine = NULL;
|
||||
struct i915_request *rq;
|
||||
|
||||
if (!ce->timeline)
|
||||
return NULL;
|
||||
|
||||
rcu_read_lock();
|
||||
list_for_each_entry_reverse(rq, &ce->timeline->requests, link) {
|
||||
if (i915_request_completed(rq))
|
||||
break;
|
||||
|
||||
/* Check with the backend if the request is inflight */
|
||||
engine = __active_engine(rq);
|
||||
if (engine)
|
||||
break;
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
return engine;
|
||||
}
|
||||
|
||||
static void kill_context(struct i915_gem_context *ctx)
|
||||
{
|
||||
struct i915_gem_engines_iter it;
|
||||
struct intel_context *ce;
|
||||
|
||||
/*
|
||||
* If we are already banned, it was due to a guilty request causing
|
||||
* a reset and the entire context being evicted from the GPU.
|
||||
*/
|
||||
if (i915_gem_context_is_banned(ctx))
|
||||
return;
|
||||
|
||||
i915_gem_context_set_banned(ctx);
|
||||
|
||||
/*
|
||||
* Map the user's engine back to the actual engines; one virtual
|
||||
* engine will be mapped to multiple engines, and using ctx->engine[]
|
||||
* the same engine may have multiple instances in the user's map.
|
||||
* However, we only care about pending requests, so only include
|
||||
* engines on which there are incomplete requests.
|
||||
*/
|
||||
for_each_gem_engine(ce, __context_engines_static(ctx), it) {
|
||||
struct intel_engine_cs *engine;
|
||||
|
||||
/*
|
||||
* Check the current active state of this context; if we
|
||||
* are currently executing on the GPU we need to evict
|
||||
* ourselves. On the other hand, if we haven't yet been
|
||||
* submitted to the GPU or if everything is complete,
|
||||
* we have nothing to do.
|
||||
*/
|
||||
engine = active_engine(ce);
|
||||
|
||||
/* First attempt to gracefully cancel the context */
|
||||
if (engine && !__cancel_engine(engine))
|
||||
/*
|
||||
* If we are unable to send a preemptive pulse to bump
|
||||
* the context from the GPU, we have to resort to a full
|
||||
* reset. We hope the collateral damage is worth it.
|
||||
*/
|
||||
__reset_context(ctx, engine);
|
||||
}
|
||||
}
|
||||
|
||||
static void context_close(struct i915_gem_context *ctx)
|
||||
{
|
||||
struct i915_address_space *vm;
|
||||
@ -298,9 +447,47 @@ static void context_close(struct i915_gem_context *ctx)
|
||||
lut_close(ctx);
|
||||
|
||||
mutex_unlock(&ctx->mutex);
|
||||
|
||||
/*
|
||||
* If the user has disabled hangchecking, we can not be sure that
|
||||
* the batches will ever complete after the context is closed,
|
||||
* keeping the context and all resources pinned forever. So in this
|
||||
* case we opt to forcibly kill off all remaining requests on
|
||||
* context close.
|
||||
*/
|
||||
if (!i915_gem_context_is_persistent(ctx) ||
|
||||
!i915_modparams.enable_hangcheck)
|
||||
kill_context(ctx);
|
||||
|
||||
i915_gem_context_put(ctx);
|
||||
}
|
||||
|
||||
static int __context_set_persistence(struct i915_gem_context *ctx, bool state)
|
||||
{
|
||||
if (i915_gem_context_is_persistent(ctx) == state)
|
||||
return 0;
|
||||
|
||||
if (state) {
|
||||
/*
|
||||
* Only contexts that are short-lived [that will expire or be
|
||||
* reset] are allowed to survive past termination. We require
|
||||
* hangcheck to ensure that the persistent requests are healthy.
|
||||
*/
|
||||
if (!i915_modparams.enable_hangcheck)
|
||||
return -EINVAL;
|
||||
|
||||
i915_gem_context_set_persistence(ctx);
|
||||
} else {
|
||||
/* To cancel a context we use "preempt-to-idle" */
|
||||
if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
|
||||
return -ENODEV;
|
||||
|
||||
i915_gem_context_clear_persistence(ctx);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct i915_gem_context *
|
||||
__create_context(struct drm_i915_private *i915)
|
||||
{
|
||||
@ -335,6 +522,7 @@ __create_context(struct drm_i915_private *i915)
|
||||
|
||||
i915_gem_context_set_bannable(ctx);
|
||||
i915_gem_context_set_recoverable(ctx);
|
||||
__context_set_persistence(ctx, true /* cgroup hook? */);
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++)
|
||||
ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
|
||||
@ -491,6 +679,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
|
||||
return ctx;
|
||||
|
||||
i915_gem_context_clear_bannable(ctx);
|
||||
i915_gem_context_set_persistence(ctx);
|
||||
ctx->sched.priority = I915_USER_PRIORITY(prio);
|
||||
|
||||
GEM_BUG_ON(!i915_gem_context_is_kernel(ctx));
|
||||
@ -1601,6 +1790,16 @@ err_free:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int
|
||||
set_persistence(struct i915_gem_context *ctx,
|
||||
const struct drm_i915_gem_context_param *args)
|
||||
{
|
||||
if (args->size)
|
||||
return -EINVAL;
|
||||
|
||||
return __context_set_persistence(ctx, args->value);
|
||||
}
|
||||
|
||||
static int ctx_setparam(struct drm_i915_file_private *fpriv,
|
||||
struct i915_gem_context *ctx,
|
||||
struct drm_i915_gem_context_param *args)
|
||||
@ -1678,6 +1877,10 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv,
|
||||
ret = set_engines(ctx, args);
|
||||
break;
|
||||
|
||||
case I915_CONTEXT_PARAM_PERSISTENCE:
|
||||
ret = set_persistence(ctx, args);
|
||||
break;
|
||||
|
||||
case I915_CONTEXT_PARAM_BAN_PERIOD:
|
||||
default:
|
||||
ret = -EINVAL;
|
||||
@ -2130,6 +2333,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
|
||||
ret = get_engines(ctx, args);
|
||||
break;
|
||||
|
||||
case I915_CONTEXT_PARAM_PERSISTENCE:
|
||||
args->size = 0;
|
||||
args->value = i915_gem_context_is_persistent(ctx);
|
||||
break;
|
||||
|
||||
case I915_CONTEXT_PARAM_BAN_PERIOD:
|
||||
default:
|
||||
ret = -EINVAL;
|
||||
|
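For orientation, a minimal userspace sketch of the new persistence control follows; it is not part of the patch. Only the I915_CONTEXT_PARAM_PERSISTENCE parameter name and the size == 0 convention are taken from the hunks above; the rest is the standard drm_i915_gem_context_param ioctl plumbing, and the helper name is made up here.

/*
 * Sketch only: opt a context out of persistence so its outstanding
 * requests are cancelled as soon as the context (or the fd) is closed.
 */
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int context_set_persistence(int drm_fd, __u32 ctx_id, int enable)
{
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_PERSISTENCE,
		.size = 0,	/* set_persistence() rejects a non-zero size */
		.value = enable,
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}

Reading the value back goes through DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM with the same param; per the getparam hunk above it reports size = 0 and value 0/1, and per __context_set_persistence() disabling persistence needs preemption support (-ENODEV otherwise).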
@ -76,6 +76,21 @@ static inline void i915_gem_context_clear_recoverable(struct i915_gem_context *c
|
||||
clear_bit(UCONTEXT_RECOVERABLE, &ctx->user_flags);
|
||||
}
|
||||
|
||||
static inline bool i915_gem_context_is_persistent(const struct i915_gem_context *ctx)
|
||||
{
|
||||
return test_bit(UCONTEXT_PERSISTENCE, &ctx->user_flags);
|
||||
}
|
||||
|
||||
static inline void i915_gem_context_set_persistence(struct i915_gem_context *ctx)
|
||||
{
|
||||
set_bit(UCONTEXT_PERSISTENCE, &ctx->user_flags);
|
||||
}
|
||||
|
||||
static inline void i915_gem_context_clear_persistence(struct i915_gem_context *ctx)
|
||||
{
|
||||
clear_bit(UCONTEXT_PERSISTENCE, &ctx->user_flags);
|
||||
}
|
||||
|
||||
static inline bool i915_gem_context_is_banned(const struct i915_gem_context *ctx)
|
||||
{
|
||||
return test_bit(CONTEXT_BANNED, &ctx->flags);
|
||||
|
@ -137,6 +137,7 @@ struct i915_gem_context {
|
||||
#define UCONTEXT_NO_ERROR_CAPTURE 1
|
||||
#define UCONTEXT_BANNABLE 2
|
||||
#define UCONTEXT_RECOVERABLE 3
|
||||
#define UCONTEXT_PERSISTENCE 4
|
||||
|
||||
/**
|
||||
* @flags: small set of booleans
|
||||
|
@ -256,6 +256,7 @@ static const struct drm_i915_gem_object_ops i915_gem_object_dmabuf_ops = {
|
||||
struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
|
||||
struct dma_buf *dma_buf)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct dma_buf_attachment *attach;
|
||||
struct drm_i915_gem_object *obj;
|
||||
int ret;
|
||||
@ -287,7 +288,7 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
|
||||
}
|
||||
|
||||
drm_gem_private_object_init(dev, &obj->base, dma_buf->size);
|
||||
i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops);
|
||||
i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops, &lock_class);
|
||||
obj->base.import_attach = attach;
|
||||
obj->base.resv = dma_buf->resv;
|
||||
|
||||
|
@ -19,6 +19,7 @@
|
||||
#include "gt/intel_engine_pool.h"
|
||||
#include "gt/intel_gt.h"
|
||||
#include "gt/intel_gt_pm.h"
|
||||
#include "gt/intel_ring.h"
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "i915_gem_clflush.h"
|
||||
|
@ -164,6 +164,7 @@ struct drm_i915_gem_object *
|
||||
i915_gem_object_create_internal(struct drm_i915_private *i915,
|
||||
phys_addr_t size)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_gem_object *obj;
|
||||
unsigned int cache_level;
|
||||
|
||||
@ -178,7 +179,7 @@ i915_gem_object_create_internal(struct drm_i915_private *i915,
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
drm_gem_private_object_init(&i915->drm, &obj->base, size);
|
||||
i915_gem_object_init(obj, &i915_gem_object_internal_ops);
|
||||
i915_gem_object_init(obj, &i915_gem_object_internal_ops, &lock_class);
|
||||
|
||||
/*
|
||||
* Mark the object as volatile, such that the pages are marked as
|
||||
|
drivers/gpu/drm/i915/gem/i915_gem_lmem.c (new file, 99 lines)
@@ -0,0 +1,99 @@
|
||||
// SPDX-License-Identifier: MIT
|
||||
/*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#include "intel_memory_region.h"
|
||||
#include "gem/i915_gem_region.h"
|
||||
#include "gem/i915_gem_lmem.h"
|
||||
#include "i915_drv.h"
|
||||
|
||||
const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = {
|
||||
.flags = I915_GEM_OBJECT_HAS_IOMEM,
|
||||
|
||||
.get_pages = i915_gem_object_get_pages_buddy,
|
||||
.put_pages = i915_gem_object_put_pages_buddy,
|
||||
.release = i915_gem_object_release_memory_region,
|
||||
};
|
||||
|
||||
/* XXX: Time to vfunc your life up? */
|
||||
void __iomem *
|
||||
i915_gem_object_lmem_io_map_page(struct drm_i915_gem_object *obj,
|
||||
unsigned long n)
|
||||
{
|
||||
resource_size_t offset;
|
||||
|
||||
offset = i915_gem_object_get_dma_address(obj, n);
|
||||
offset -= obj->mm.region->region.start;
|
||||
|
||||
return io_mapping_map_wc(&obj->mm.region->iomap, offset, PAGE_SIZE);
|
||||
}
|
||||
|
||||
void __iomem *
|
||||
i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj,
|
||||
unsigned long n)
|
||||
{
|
||||
resource_size_t offset;
|
||||
|
||||
offset = i915_gem_object_get_dma_address(obj, n);
|
||||
offset -= obj->mm.region->region.start;
|
||||
|
||||
return io_mapping_map_atomic_wc(&obj->mm.region->iomap, offset);
|
||||
}
|
||||
|
||||
void __iomem *
|
||||
i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
|
||||
unsigned long n,
|
||||
unsigned long size)
|
||||
{
|
||||
resource_size_t offset;
|
||||
|
||||
GEM_BUG_ON(!i915_gem_object_is_contiguous(obj));
|
||||
|
||||
offset = i915_gem_object_get_dma_address(obj, n);
|
||||
offset -= obj->mm.region->region.start;
|
||||
|
||||
return io_mapping_map_wc(&obj->mm.region->iomap, offset, size);
|
||||
}
|
||||
|
||||
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
|
||||
{
|
||||
return obj->ops == &i915_gem_lmem_obj_ops;
|
||||
}
|
||||
|
||||
struct drm_i915_gem_object *
|
||||
i915_gem_object_create_lmem(struct drm_i915_private *i915,
|
||||
resource_size_t size,
|
||||
unsigned int flags)
|
||||
{
|
||||
return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_LMEM],
|
||||
size, flags);
|
||||
}
|
||||
|
||||
struct drm_i915_gem_object *
|
||||
__i915_gem_lmem_object_create(struct intel_memory_region *mem,
|
||||
resource_size_t size,
|
||||
unsigned int flags)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_private *i915 = mem->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
|
||||
if (size > BIT(mem->mm.max_order) * mem->mm.chunk_size)
|
||||
return ERR_PTR(-E2BIG);
|
||||
|
||||
obj = i915_gem_object_alloc();
|
||||
if (!obj)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
drm_gem_private_object_init(&i915->drm, &obj->base, size);
|
||||
i915_gem_object_init(obj, &i915_gem_lmem_obj_ops, &lock_class);
|
||||
|
||||
obj->read_domains = I915_GEM_DOMAIN_WC | I915_GEM_DOMAIN_GTT;
|
||||
|
||||
i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
|
||||
|
||||
i915_gem_object_init_memory_region(obj, mem, flags);
|
||||
|
||||
return obj;
|
||||
}
|
drivers/gpu/drm/i915/gem/i915_gem_lmem.h (new file, 37 lines)
@@ -0,0 +1,37 @@
|
||||
/* SPDX-License-Identifier: MIT */
|
||||
/*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#ifndef __I915_GEM_LMEM_H
|
||||
#define __I915_GEM_LMEM_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
struct drm_i915_private;
|
||||
struct drm_i915_gem_object;
|
||||
struct intel_memory_region;
|
||||
|
||||
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
|
||||
|
||||
void __iomem *i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
|
||||
unsigned long n, unsigned long size);
|
||||
void __iomem *i915_gem_object_lmem_io_map_page(struct drm_i915_gem_object *obj,
|
||||
unsigned long n);
|
||||
void __iomem *
|
||||
i915_gem_object_lmem_io_map_page_atomic(struct drm_i915_gem_object *obj,
|
||||
unsigned long n);
|
||||
|
||||
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
|
||||
|
||||
struct drm_i915_gem_object *
|
||||
i915_gem_object_create_lmem(struct drm_i915_private *i915,
|
||||
resource_size_t size,
|
||||
unsigned int flags);
|
||||
|
||||
struct drm_i915_gem_object *
|
||||
__i915_gem_lmem_object_create(struct intel_memory_region *mem,
|
||||
resource_size_t size,
|
||||
unsigned int flags);
|
||||
|
||||
#endif /* !__I915_GEM_LMEM_H */
|
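A rough kernel-side sketch of how these helpers fit together (not part of the patch): the function name is hypothetical, the helper names come from the header above, the I915_BO_ALLOC_CONTIGUOUS flag from the selftests further down, and the WC-only restriction from the i915_gem_object_map() change below.

static int lmem_write_zeroes(struct drm_i915_private *i915)
{
	struct drm_i915_gem_object *obj;
	void *vaddr;

	/* Allocate backing store from the (fake) local memory region. */
	obj = i915_gem_object_create_lmem(i915, SZ_64K, I915_BO_ALLOC_CONTIGUOUS);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	/* lmem objects can only be mapped write-combined by the kernel. */
	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC);
	if (IS_ERR(vaddr)) {
		i915_gem_object_put(obj);
		return PTR_ERR(vaddr);
	}

	memset(vaddr, 0, obj->base.size);

	i915_gem_object_unpin_map(obj);
	i915_gem_object_put(obj);
	return 0;
}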
@ -312,7 +312,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
|
||||
list_add(&obj->userfault_link, &i915->ggtt.userfault_list);
|
||||
mutex_unlock(&i915->ggtt.vm.mutex);
|
||||
|
||||
if (CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
|
||||
if (IS_ACTIVE(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND))
|
||||
intel_wakeref_auto(&i915->ggtt.userfault_wakeref,
|
||||
msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND));
|
||||
|
||||
|
@ -47,9 +47,10 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj)
|
||||
}
|
||||
|
||||
void i915_gem_object_init(struct drm_i915_gem_object *obj,
|
||||
const struct drm_i915_gem_object_ops *ops)
|
||||
const struct drm_i915_gem_object_ops *ops,
|
||||
struct lock_class_key *key)
|
||||
{
|
||||
mutex_init(&obj->mm.lock);
|
||||
__mutex_init(&obj->mm.lock, "obj->mm.lock", key);
|
||||
|
||||
spin_lock_init(&obj->vma.lock);
|
||||
INIT_LIST_HEAD(&obj->vma.list);
|
||||
|
@ -23,7 +23,8 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void);
|
||||
void i915_gem_object_free(struct drm_i915_gem_object *obj);
|
||||
|
||||
void i915_gem_object_init(struct drm_i915_gem_object *obj,
|
||||
const struct drm_i915_gem_object_ops *ops);
|
||||
const struct drm_i915_gem_object_ops *ops,
|
||||
struct lock_class_key *key);
|
||||
struct drm_i915_gem_object *
|
||||
i915_gem_object_create_shmem(struct drm_i915_private *i915,
|
||||
resource_size_t size);
|
||||
@ -461,6 +462,5 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
|
||||
int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
|
||||
unsigned int flags,
|
||||
const struct i915_sched_attr *attr);
|
||||
#define I915_PRIORITY_DISPLAY I915_USER_PRIORITY(I915_PRIORITY_MAX)
|
||||
|
||||
#endif
|
||||
|
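The new lock_class_key argument threaded through i915_gem_object_init() gives each object backend its own lockdep class for obj->mm.lock. A hypothetical backend would follow the same pattern as the shmem, stolen, userptr and lmem call sites changed in this series (sketch only; the ops and function names below are invented):

static const struct drm_i915_gem_object_ops my_backend_ops; /* hypothetical */

static struct drm_i915_gem_object *
my_backend_create(struct drm_i915_private *i915, resource_size_t size)
{
	static struct lock_class_key lock_class; /* one lockdep class per backend */
	struct drm_i915_gem_object *obj;

	obj = i915_gem_object_alloc();
	if (!obj)
		return ERR_PTR(-ENOMEM);

	drm_gem_private_object_init(&i915->drm, &obj->base, size);
	i915_gem_object_init(obj, &my_backend_ops, &lock_class);

	return obj;
}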
@ -8,6 +8,7 @@
|
||||
#include "gt/intel_engine_pm.h"
|
||||
#include "gt/intel_engine_pool.h"
|
||||
#include "gt/intel_gt.h"
|
||||
#include "gt/intel_ring.h"
|
||||
#include "i915_gem_clflush.h"
|
||||
#include "i915_gem_object_blt.h"
|
||||
|
||||
@ -16,7 +17,7 @@ struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce,
|
||||
u32 value)
|
||||
{
|
||||
struct drm_i915_private *i915 = ce->vm->i915;
|
||||
const u32 block_size = S16_MAX * PAGE_SIZE;
|
||||
const u32 block_size = SZ_8M; /* ~1ms at 8GiB/s preemption delay */
|
||||
struct intel_engine_pool_node *pool;
|
||||
struct i915_vma *batch;
|
||||
u64 offset;
|
||||
@ -29,7 +30,7 @@ struct i915_vma *intel_emit_vma_fill_blt(struct intel_context *ce,
|
||||
GEM_BUG_ON(intel_engine_is_virtual(ce->engine));
|
||||
intel_engine_pm_get(ce->engine);
|
||||
|
||||
count = div_u64(vma->size, block_size);
|
||||
count = div_u64(round_up(vma->size, block_size), block_size);
|
||||
size = (1 + 8 * count) * sizeof(u32);
|
||||
size = round_up(size, PAGE_SIZE);
|
||||
pool = intel_engine_get_pool(ce->engine, size);
|
||||
@ -200,7 +201,7 @@ struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce,
|
||||
struct i915_vma *dst)
|
||||
{
|
||||
struct drm_i915_private *i915 = ce->vm->i915;
|
||||
const u32 block_size = S16_MAX * PAGE_SIZE;
|
||||
const u32 block_size = SZ_8M; /* ~1ms at 8GiB/s preemption delay */
|
||||
struct intel_engine_pool_node *pool;
|
||||
struct i915_vma *batch;
|
||||
u64 src_offset, dst_offset;
|
||||
@ -213,7 +214,7 @@ struct i915_vma *intel_emit_vma_copy_blt(struct intel_context *ce,
|
||||
GEM_BUG_ON(intel_engine_is_virtual(ce->engine));
|
||||
intel_engine_pm_get(ce->engine);
|
||||
|
||||
count = div_u64(dst->size, block_size);
|
||||
count = div_u64(round_up(dst->size, block_size), block_size);
|
||||
size = (1 + 11 * count) * sizeof(u32);
|
||||
size = round_up(size, PAGE_SIZE);
|
||||
pool = intel_engine_get_pool(ce->engine, size);
|
||||
|
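Taking the 8 GiB/s figure in the comment at face value, the new SZ_8M chunk works out to 8 MiB / 8 GiB/s = 2^23 B / 2^33 B/s ≈ 0.98 ms of blitter work per command, i.e. roughly the "~1ms" of added preemption latency the comment states, where S16_MAX * PAGE_SIZE (~128 MiB) allowed much longer non-preemptible stretches. The matching div_u64(round_up(...)) changes ensure a size that is not a multiple of the new block size still gets a final, shorter chunk emitted rather than being truncated.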
@ -31,10 +31,11 @@ struct i915_lut_handle {
|
||||
struct drm_i915_gem_object_ops {
|
||||
unsigned int flags;
|
||||
#define I915_GEM_OBJECT_HAS_STRUCT_PAGE BIT(0)
|
||||
#define I915_GEM_OBJECT_IS_SHRINKABLE BIT(1)
|
||||
#define I915_GEM_OBJECT_IS_PROXY BIT(2)
|
||||
#define I915_GEM_OBJECT_NO_GGTT BIT(3)
|
||||
#define I915_GEM_OBJECT_ASYNC_CANCEL BIT(4)
|
||||
#define I915_GEM_OBJECT_HAS_IOMEM BIT(1)
|
||||
#define I915_GEM_OBJECT_IS_SHRINKABLE BIT(2)
|
||||
#define I915_GEM_OBJECT_IS_PROXY BIT(3)
|
||||
#define I915_GEM_OBJECT_NO_GGTT BIT(4)
|
||||
#define I915_GEM_OBJECT_ASYNC_CANCEL BIT(5)
|
||||
|
||||
/* Interface between the GEM object and its backing storage.
|
||||
* get_pages() is called once prior to the use of the associated set
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include "i915_drv.h"
|
||||
#include "i915_gem_object.h"
|
||||
#include "i915_scatterlist.h"
|
||||
#include "i915_gem_lmem.h"
|
||||
|
||||
void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
|
||||
struct sg_table *pages,
|
||||
@ -154,6 +155,16 @@ static void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj)
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
|
||||
{
|
||||
if (i915_gem_object_is_lmem(obj))
|
||||
io_mapping_unmap((void __force __iomem *)ptr);
|
||||
else if (is_vmalloc_addr(ptr))
|
||||
vunmap(ptr);
|
||||
else
|
||||
kunmap(kmap_to_page(ptr));
|
||||
}
|
||||
|
||||
struct sg_table *
|
||||
__i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
|
||||
{
|
||||
@ -169,14 +180,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
|
||||
i915_gem_object_make_unshrinkable(obj);
|
||||
|
||||
if (obj->mm.mapping) {
|
||||
void *ptr;
|
||||
|
||||
ptr = page_mask_bits(obj->mm.mapping);
|
||||
if (is_vmalloc_addr(ptr))
|
||||
vunmap(ptr);
|
||||
else
|
||||
kunmap(kmap_to_page(ptr));
|
||||
|
||||
unmap_object(obj, page_mask_bits(obj->mm.mapping));
|
||||
obj->mm.mapping = NULL;
|
||||
}
|
||||
|
||||
@ -231,7 +235,7 @@ unlock:
|
||||
}
|
||||
|
||||
/* The 'mapping' part of i915_gem_object_pin_map() below */
|
||||
static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
|
||||
static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
|
||||
enum i915_map_type type)
|
||||
{
|
||||
unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
|
||||
@ -244,6 +248,16 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
|
||||
pgprot_t pgprot;
|
||||
void *addr;
|
||||
|
||||
if (i915_gem_object_is_lmem(obj)) {
|
||||
void __iomem *io;
|
||||
|
||||
if (type != I915_MAP_WC)
|
||||
return NULL;
|
||||
|
||||
io = i915_gem_object_lmem_io_map(obj, 0, obj->base.size);
|
||||
return (void __force *)io;
|
||||
}
|
||||
|
||||
/* A single page can always be kmapped */
|
||||
if (n_pages == 1 && type == I915_MAP_WB)
|
||||
return kmap(sg_page(sgt->sgl));
|
||||
@ -285,11 +299,13 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
|
||||
enum i915_map_type type)
|
||||
{
|
||||
enum i915_map_type has_type;
|
||||
unsigned int flags;
|
||||
bool pinned;
|
||||
void *ptr;
|
||||
int err;
|
||||
|
||||
if (unlikely(!i915_gem_object_has_struct_page(obj)))
|
||||
flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE | I915_GEM_OBJECT_HAS_IOMEM;
|
||||
if (!i915_gem_object_type_has(obj, flags))
|
||||
return ERR_PTR(-ENXIO);
|
||||
|
||||
err = mutex_lock_interruptible(&obj->mm.lock);
|
||||
@ -321,10 +337,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
|
||||
goto err_unpin;
|
||||
}
|
||||
|
||||
if (is_vmalloc_addr(ptr))
|
||||
vunmap(ptr);
|
||||
else
|
||||
kunmap(kmap_to_page(ptr));
|
||||
unmap_object(obj, ptr);
|
||||
|
||||
ptr = obj->mm.mapping = NULL;
|
||||
}
|
||||
|
@ -11,25 +11,6 @@
|
||||
|
||||
#include "i915_drv.h"
|
||||
|
||||
static int pm_notifier(struct notifier_block *nb,
|
||||
unsigned long action,
|
||||
void *data)
|
||||
{
|
||||
struct drm_i915_private *i915 =
|
||||
container_of(nb, typeof(*i915), gem.pm_notifier);
|
||||
|
||||
switch (action) {
|
||||
case INTEL_GT_UNPARK:
|
||||
break;
|
||||
|
||||
case INTEL_GT_PARK:
|
||||
i915_vma_parked(i915);
|
||||
break;
|
||||
}
|
||||
|
||||
return NOTIFY_OK;
|
||||
}
|
||||
|
||||
static bool switch_to_kernel_context_sync(struct intel_gt *gt)
|
||||
{
|
||||
bool result = !intel_gt_is_wedged(gt);
|
||||
@ -56,11 +37,6 @@ static bool switch_to_kernel_context_sync(struct intel_gt *gt)
|
||||
return result;
|
||||
}
|
||||
|
||||
bool i915_gem_load_power_context(struct drm_i915_private *i915)
|
||||
{
|
||||
return switch_to_kernel_context_sync(&i915->gt);
|
||||
}
|
||||
|
||||
static void user_forcewake(struct intel_gt *gt, bool suspend)
|
||||
{
|
||||
int count = atomic_read(&gt->user_wakeref);
|
||||
@ -100,8 +76,6 @@ void i915_gem_suspend(struct drm_i915_private *i915)
|
||||
intel_gt_suspend(&i915->gt);
|
||||
intel_uc_suspend(&i915->gt.uc);
|
||||
|
||||
cancel_delayed_work_sync(&i915->gt.hangcheck.work);
|
||||
|
||||
i915_gem_drain_freed_objects(i915);
|
||||
}
|
||||
|
||||
@ -190,7 +164,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
|
||||
intel_uc_resume(&i915->gt.uc);
|
||||
|
||||
/* Always reload a context for powersaving. */
|
||||
if (!i915_gem_load_power_context(i915))
|
||||
if (!switch_to_kernel_context_sync(&i915->gt))
|
||||
goto err_wedged;
|
||||
|
||||
user_forcewake(&i915->gt, false);
|
||||
@ -207,10 +181,3 @@ err_wedged:
|
||||
}
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
void i915_gem_init__pm(struct drm_i915_private *i915)
|
||||
{
|
||||
i915->gem.pm_notifier.notifier_call = pm_notifier;
|
||||
blocking_notifier_chain_register(&i915->gt.pm_notifications,
|
||||
&i915->gem.pm_notifier);
|
||||
}
|
||||
|
@ -12,9 +12,6 @@
|
||||
struct drm_i915_private;
|
||||
struct work_struct;
|
||||
|
||||
void i915_gem_init__pm(struct drm_i915_private *i915);
|
||||
|
||||
bool i915_gem_load_power_context(struct drm_i915_private *i915);
|
||||
void i915_gem_resume(struct drm_i915_private *i915);
|
||||
|
||||
void i915_gem_idle_work_handler(struct work_struct *work);
|
||||
|
@ -465,6 +465,7 @@ create_shmem(struct intel_memory_region *mem,
|
||||
resource_size_t size,
|
||||
unsigned int flags)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_private *i915 = mem->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct address_space *mapping;
|
||||
@ -491,7 +492,7 @@ create_shmem(struct intel_memory_region *mem,
|
||||
mapping_set_gfp_mask(mapping, mask);
|
||||
GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
|
||||
|
||||
i915_gem_object_init(obj, &i915_gem_shmem_ops);
|
||||
i915_gem_object_init(obj, &i915_gem_shmem_ops, &lock_class);
|
||||
|
||||
obj->write_domain = I915_GEM_DOMAIN_CPU;
|
||||
obj->read_domains = I915_GEM_DOMAIN_CPU;
|
||||
|
@ -556,6 +556,7 @@ __i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
|
||||
struct drm_mm_node *stolen,
|
||||
struct intel_memory_region *mem)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_gem_object *obj;
|
||||
unsigned int cache_level;
|
||||
int err = -ENOMEM;
|
||||
@ -565,7 +566,7 @@ __i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
|
||||
goto err;
|
||||
|
||||
drm_gem_private_object_init(&dev_priv->drm, &obj->base, stolen->size);
|
||||
i915_gem_object_init(obj, &i915_gem_object_stolen_ops);
|
||||
i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class);
|
||||
|
||||
obj->stolen = stolen;
|
||||
obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
|
||||
|
@ -725,6 +725,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
|
||||
void *data,
|
||||
struct drm_file *file)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_private *dev_priv = to_i915(dev);
|
||||
struct drm_i915_gem_userptr *args = data;
|
||||
struct drm_i915_gem_object *obj;
|
||||
@ -769,7 +770,7 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
|
||||
return -ENOMEM;
|
||||
|
||||
drm_gem_private_object_init(dev, &obj->base, args->user_size);
|
||||
i915_gem_object_init(obj, &i915_gem_userptr_ops);
|
||||
i915_gem_object_init(obj, &i915_gem_userptr_ops, &lock_class);
|
||||
obj->read_domains = I915_GEM_DOMAIN_CPU;
|
||||
obj->write_domain = I915_GEM_DOMAIN_CPU;
|
||||
i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
|
||||
|
@ -96,6 +96,7 @@ huge_gem_object(struct drm_i915_private *i915,
|
||||
phys_addr_t phys_size,
|
||||
dma_addr_t dma_size)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_gem_object *obj;
|
||||
unsigned int cache_level;
|
||||
|
||||
@ -111,7 +112,7 @@ huge_gem_object(struct drm_i915_private *i915,
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
drm_gem_private_object_init(&i915->drm, &obj->base, dma_size);
|
||||
i915_gem_object_init(obj, &huge_ops);
|
||||
i915_gem_object_init(obj, &huge_ops, &lock_class);
|
||||
|
||||
obj->read_domains = I915_GEM_DOMAIN_CPU;
|
||||
obj->write_domain = I915_GEM_DOMAIN_CPU;
|
||||
|
@ -9,6 +9,7 @@
|
||||
#include "i915_selftest.h"
|
||||
|
||||
#include "gem/i915_gem_region.h"
|
||||
#include "gem/i915_gem_lmem.h"
|
||||
#include "gem/i915_gem_pm.h"
|
||||
|
||||
#include "gt/intel_gt.h"
|
||||
@ -149,6 +150,7 @@ huge_pages_object(struct drm_i915_private *i915,
|
||||
u64 size,
|
||||
unsigned int page_mask)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_gem_object *obj;
|
||||
|
||||
GEM_BUG_ON(!size);
|
||||
@ -165,7 +167,7 @@ huge_pages_object(struct drm_i915_private *i915,
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
drm_gem_private_object_init(&i915->drm, &obj->base, size);
|
||||
i915_gem_object_init(obj, &huge_page_ops);
|
||||
i915_gem_object_init(obj, &huge_page_ops, &lock_class);
|
||||
|
||||
i915_gem_object_set_volatile(obj);
|
||||
|
||||
@ -295,6 +297,7 @@ static const struct drm_i915_gem_object_ops fake_ops_single = {
|
||||
static struct drm_i915_gem_object *
|
||||
fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
|
||||
{
|
||||
static struct lock_class_key lock_class;
|
||||
struct drm_i915_gem_object *obj;
|
||||
|
||||
GEM_BUG_ON(!size);
|
||||
@ -313,9 +316,9 @@ fake_huge_pages_object(struct drm_i915_private *i915, u64 size, bool single)
|
||||
drm_gem_private_object_init(&i915->drm, &obj->base, size);
|
||||
|
||||
if (single)
|
||||
i915_gem_object_init(obj, &fake_ops_single);
|
||||
i915_gem_object_init(obj, &fake_ops_single, &lock_class);
|
||||
else
|
||||
i915_gem_object_init(obj, &fake_ops);
|
||||
i915_gem_object_init(obj, &fake_ops, &lock_class);
|
||||
|
||||
i915_gem_object_set_volatile(obj);
|
||||
|
||||
@ -981,7 +984,8 @@ static int gpu_write(struct intel_context *ce,
|
||||
vma->size >> PAGE_SHIFT, val);
|
||||
}
|
||||
|
||||
static int cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val)
|
||||
static int
|
||||
__cpu_check_shmem(struct drm_i915_gem_object *obj, u32 dword, u32 val)
|
||||
{
|
||||
unsigned int needs_flush;
|
||||
unsigned long n;
|
||||
@ -1013,6 +1017,51 @@ static int cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val)
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __cpu_check_lmem(struct drm_i915_gem_object *obj, u32 dword, u32 val)
|
||||
{
|
||||
unsigned long n;
|
||||
int err;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_wc_domain(obj, false);
|
||||
i915_gem_object_unlock(obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
err = i915_gem_object_pin_pages(obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
for (n = 0; n < obj->base.size >> PAGE_SHIFT; ++n) {
|
||||
u32 __iomem *base;
|
||||
u32 read_val;
|
||||
|
||||
base = i915_gem_object_lmem_io_map_page_atomic(obj, n);
|
||||
|
||||
read_val = ioread32(base + dword);
|
||||
io_mapping_unmap_atomic(base);
|
||||
if (read_val != val) {
|
||||
pr_err("n=%lu base[%u]=%u, val=%u\n",
|
||||
n, dword, read_val, val);
|
||||
err = -EINVAL;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int cpu_check(struct drm_i915_gem_object *obj, u32 dword, u32 val)
|
||||
{
|
||||
if (i915_gem_object_has_struct_page(obj))
|
||||
return __cpu_check_shmem(obj, dword, val);
|
||||
else if (i915_gem_object_is_lmem(obj))
|
||||
return __cpu_check_lmem(obj, dword, val);
|
||||
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
static int __igt_write_huge(struct intel_context *ce,
|
||||
struct drm_i915_gem_object *obj,
|
||||
u64 size, u64 offset,
|
||||
@ -1268,131 +1317,235 @@ out_device:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int igt_ppgtt_internal_huge(void *arg)
|
||||
{
|
||||
struct i915_gem_context *ctx = arg;
|
||||
struct drm_i915_private *i915 = ctx->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
static const unsigned int sizes[] = {
|
||||
SZ_64K,
|
||||
SZ_128K,
|
||||
SZ_256K,
|
||||
SZ_512K,
|
||||
SZ_1M,
|
||||
SZ_2M,
|
||||
};
|
||||
int i;
|
||||
int err;
|
||||
|
||||
/*
|
||||
* Sanity check that the HW uses huge pages correctly through internal
|
||||
* -- ensure that our writes land in the right place.
|
||||
*/
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(sizes); ++i) {
|
||||
unsigned int size = sizes[i];
|
||||
|
||||
obj = i915_gem_object_create_internal(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
return PTR_ERR(obj);
|
||||
|
||||
err = i915_gem_object_pin_pages(obj);
|
||||
if (err)
|
||||
goto out_put;
|
||||
|
||||
if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
|
||||
pr_info("internal unable to allocate huge-page(s) with size=%u\n",
|
||||
size);
|
||||
goto out_unpin;
|
||||
}
|
||||
|
||||
err = igt_write_huge(ctx, obj);
|
||||
if (err) {
|
||||
pr_err("internal write-huge failed with size=%u\n",
|
||||
size);
|
||||
goto out_unpin;
|
||||
}
|
||||
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
|
||||
i915_gem_object_put(obj);
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
out_unpin:
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
out_put:
|
||||
i915_gem_object_put(obj);
|
||||
|
||||
return err;
|
||||
}
|
||||
typedef struct drm_i915_gem_object *
|
||||
(*igt_create_fn)(struct drm_i915_private *i915, u32 size, u32 flags);
|
||||
|
||||
static inline bool igt_can_allocate_thp(struct drm_i915_private *i915)
|
||||
{
|
||||
return i915->mm.gemfs && has_transparent_hugepage();
|
||||
}
|
||||
|
||||
static int igt_ppgtt_gemfs_huge(void *arg)
|
||||
static struct drm_i915_gem_object *
|
||||
igt_create_shmem(struct drm_i915_private *i915, u32 size, u32 flags)
|
||||
{
|
||||
if (!igt_can_allocate_thp(i915)) {
|
||||
pr_info("%s missing THP support, skipping\n", __func__);
|
||||
return ERR_PTR(-ENODEV);
|
||||
}
|
||||
|
||||
return i915_gem_object_create_shmem(i915, size);
|
||||
}
|
||||
|
||||
static struct drm_i915_gem_object *
|
||||
igt_create_internal(struct drm_i915_private *i915, u32 size, u32 flags)
|
||||
{
|
||||
return i915_gem_object_create_internal(i915, size);
|
||||
}
|
||||
|
||||
static struct drm_i915_gem_object *
|
||||
igt_create_system(struct drm_i915_private *i915, u32 size, u32 flags)
|
||||
{
|
||||
return huge_pages_object(i915, size, size);
|
||||
}
|
||||
|
||||
static struct drm_i915_gem_object *
|
||||
igt_create_local(struct drm_i915_private *i915, u32 size, u32 flags)
|
||||
{
|
||||
return i915_gem_object_create_lmem(i915, size, flags);
|
||||
}
|
||||
|
||||
static u32 igt_random_size(struct rnd_state *prng,
|
||||
u32 min_page_size,
|
||||
u32 max_page_size)
|
||||
{
|
||||
u64 mask;
|
||||
u32 size;
|
||||
|
||||
GEM_BUG_ON(!is_power_of_2(min_page_size));
|
||||
GEM_BUG_ON(!is_power_of_2(max_page_size));
|
||||
GEM_BUG_ON(min_page_size < PAGE_SIZE);
|
||||
GEM_BUG_ON(min_page_size > max_page_size);
|
||||
|
||||
mask = ((max_page_size << 1ULL) - 1) & PAGE_MASK;
|
||||
size = prandom_u32_state(prng) & mask;
|
||||
if (size < min_page_size)
|
||||
size |= min_page_size;
|
||||
|
||||
return size;
|
||||
}
|
||||
|
||||
static int igt_ppgtt_smoke_huge(void *arg)
|
||||
{
|
||||
struct i915_gem_context *ctx = arg;
|
||||
struct drm_i915_private *i915 = ctx->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
static const unsigned int sizes[] = {
|
||||
SZ_2M,
|
||||
SZ_4M,
|
||||
SZ_8M,
|
||||
SZ_16M,
|
||||
SZ_32M,
|
||||
I915_RND_STATE(prng);
|
||||
struct {
|
||||
igt_create_fn fn;
|
||||
u32 min;
|
||||
u32 max;
|
||||
} backends[] = {
|
||||
{ igt_create_internal, SZ_64K, SZ_2M, },
|
||||
{ igt_create_shmem, SZ_64K, SZ_32M, },
|
||||
{ igt_create_local, SZ_64K, SZ_1G, },
|
||||
};
|
||||
int i;
|
||||
int err;
|
||||
int i;
|
||||
|
||||
/*
|
||||
* Sanity check that the HW uses huge pages correctly through gemfs --
|
||||
* ensure that our writes land in the right place.
|
||||
* Sanity check that the HW uses huge pages correctly through our
|
||||
* various backends -- ensure that our writes land in the right place.
|
||||
*/
|
||||
|
||||
if (!igt_can_allocate_thp(i915)) {
|
||||
pr_info("missing THP support, skipping\n");
|
||||
return 0;
|
||||
}
|
||||
for (i = 0; i < ARRAY_SIZE(backends); ++i) {
|
||||
u32 min = backends[i].min;
|
||||
u32 max = backends[i].max;
|
||||
u32 size = max;
|
||||
try_again:
|
||||
size = igt_random_size(&prng, min, rounddown_pow_of_two(size));
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(sizes); ++i) {
|
||||
unsigned int size = sizes[i];
|
||||
obj = backends[i].fn(i915, size, 0);
|
||||
if (IS_ERR(obj)) {
|
||||
err = PTR_ERR(obj);
|
||||
if (err == -E2BIG) {
|
||||
size >>= 1;
|
||||
goto try_again;
|
||||
} else if (err == -ENODEV) {
|
||||
err = 0;
|
||||
continue;
|
||||
}
|
||||
|
||||
obj = i915_gem_object_create_shmem(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
return PTR_ERR(obj);
|
||||
return err;
|
||||
}
|
||||
|
||||
err = i915_gem_object_pin_pages(obj);
|
||||
if (err)
|
||||
if (err) {
|
||||
if (err == -ENXIO) {
|
||||
i915_gem_object_put(obj);
|
||||
size >>= 1;
|
||||
goto try_again;
|
||||
}
|
||||
goto out_put;
|
||||
}
|
||||
|
||||
if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_2M) {
|
||||
pr_info("finishing test early, gemfs unable to allocate huge-page(s) with size=%u\n",
|
||||
size);
|
||||
if (obj->mm.page_sizes.phys < min) {
|
||||
pr_info("%s unable to allocate huge-page(s) with size=%u, i=%d\n",
|
||||
__func__, size, i);
|
||||
err = -ENOMEM;
|
||||
goto out_unpin;
|
||||
}
|
||||
|
||||
err = igt_write_huge(ctx, obj);
|
||||
if (err) {
|
||||
pr_err("gemfs write-huge failed with size=%u\n",
|
||||
size);
|
||||
goto out_unpin;
|
||||
pr_err("%s write-huge failed with size=%u, i=%d\n",
|
||||
__func__, size, i);
|
||||
}
|
||||
|
||||
out_unpin:
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
|
||||
out_put:
|
||||
i915_gem_object_put(obj);
|
||||
|
||||
if (err == -ENOMEM || err == -ENXIO)
|
||||
err = 0;
|
||||
|
||||
if (err)
|
||||
break;
|
||||
|
||||
cond_resched();
|
||||
}
|
||||
|
||||
return 0;
|
||||
return err;
|
||||
}
|
||||
|
||||
out_unpin:
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
out_put:
|
||||
i915_gem_object_put(obj);
|
||||
static int igt_ppgtt_sanity_check(void *arg)
|
||||
{
|
||||
struct i915_gem_context *ctx = arg;
|
||||
struct drm_i915_private *i915 = ctx->i915;
|
||||
unsigned int supported = INTEL_INFO(i915)->page_sizes;
|
||||
struct {
|
||||
igt_create_fn fn;
|
||||
unsigned int flags;
|
||||
} backends[] = {
|
||||
{ igt_create_system, 0, },
|
||||
{ igt_create_local, I915_BO_ALLOC_CONTIGUOUS, },
|
||||
};
|
||||
struct {
|
||||
u32 size;
|
||||
u32 pages;
|
||||
} combos[] = {
|
||||
{ SZ_64K, SZ_64K },
|
||||
{ SZ_2M, SZ_2M },
|
||||
{ SZ_2M, SZ_64K },
|
||||
{ SZ_2M - SZ_64K, SZ_64K },
|
||||
{ SZ_2M - SZ_4K, SZ_64K | SZ_4K },
|
||||
{ SZ_2M + SZ_4K, SZ_64K | SZ_4K },
|
||||
{ SZ_2M + SZ_4K, SZ_2M | SZ_4K },
|
||||
{ SZ_2M + SZ_64K, SZ_2M | SZ_64K },
|
||||
};
|
||||
int i, j;
|
||||
int err;
|
||||
|
||||
if (supported == I915_GTT_PAGE_SIZE_4K)
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* Sanity check that the HW behaves with a limited set of combinations.
|
||||
* We already have a bunch of randomised testing, which should give us
|
||||
* a decent amount of variation between runs, however we should keep
|
||||
* this to limit the chances of introducing a temporary regression, by
|
||||
* testing the most obvious cases that might make something blow up.
|
||||
*/
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(backends); ++i) {
|
||||
for (j = 0; j < ARRAY_SIZE(combos); ++j) {
|
||||
struct drm_i915_gem_object *obj;
|
||||
u32 size = combos[j].size;
|
||||
u32 pages = combos[j].pages;
|
||||
|
||||
obj = backends[i].fn(i915, size, backends[i].flags);
|
||||
if (IS_ERR(obj)) {
|
||||
err = PTR_ERR(obj);
|
||||
if (err == -ENODEV) {
|
||||
pr_info("Device lacks local memory, skipping\n");
|
||||
err = 0;
|
||||
break;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
err = i915_gem_object_pin_pages(obj);
|
||||
if (err) {
|
||||
i915_gem_object_put(obj);
|
||||
goto out;
|
||||
}
|
||||
|
||||
GEM_BUG_ON(pages > obj->base.size);
|
||||
pages = pages & supported;
|
||||
|
||||
if (pages)
|
||||
obj->mm.page_sizes.sg = pages;
|
||||
|
||||
err = igt_write_huge(ctx, obj);
|
||||
|
||||
i915_gem_object_unpin_pages(obj);
|
||||
__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
|
||||
i915_gem_object_put(obj);
|
||||
|
||||
if (err) {
|
||||
pr_err("%s write-huge failed with size=%u pages=%u i=%d, j=%d\n",
|
||||
__func__, size, pages, i, j);
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
|
||||
cond_resched();
|
||||
}
|
||||
|
||||
out:
|
||||
if (err == -ENOMEM)
|
||||
err = 0;
|
||||
|
||||
return err;
|
||||
}
|
||||
@ -1756,8 +1909,8 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
|
||||
SUBTEST(igt_ppgtt_pin_update),
|
||||
SUBTEST(igt_tmpfs_fallback),
|
||||
SUBTEST(igt_ppgtt_exhaust_huge),
|
||||
SUBTEST(igt_ppgtt_gemfs_huge),
|
||||
SUBTEST(igt_ppgtt_internal_huge),
|
||||
SUBTEST(igt_ppgtt_smoke_huge),
|
||||
SUBTEST(igt_ppgtt_sanity_check),
|
||||
};
|
||||
struct drm_file *file;
|
||||
struct i915_gem_context *ctx;
|
||||
|
@ -5,6 +5,7 @@
|
||||
|
||||
#include "i915_selftest.h"
|
||||
|
||||
#include "gt/intel_engine_user.h"
|
||||
#include "gt/intel_gt.h"
|
||||
|
||||
#include "selftests/igt_flush_test.h"
|
||||
@ -12,10 +13,9 @@
|
||||
#include "huge_gem_object.h"
|
||||
#include "mock_context.h"
|
||||
|
||||
static int igt_client_fill(void *arg)
|
||||
static int __igt_client_fill(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct drm_i915_private *i915 = arg;
|
||||
struct intel_context *ce = i915->engine[BCS0]->kernel_context;
|
||||
struct intel_context *ce = engine->kernel_context;
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct rnd_state prng;
|
||||
IGT_TIMEOUT(end);
|
||||
@ -37,7 +37,7 @@ static int igt_client_fill(void *arg)
|
||||
pr_debug("%s with phys_sz= %x, sz=%x, val=%x\n", __func__,
|
||||
phys_sz, sz, val);
|
||||
|
||||
obj = huge_gem_object(i915, phys_sz, sz);
|
||||
obj = huge_gem_object(engine->i915, phys_sz, sz);
|
||||
if (IS_ERR(obj)) {
|
||||
err = PTR_ERR(obj);
|
||||
goto err_flush;
|
||||
@ -103,6 +103,28 @@ err_flush:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int igt_client_fill(void *arg)
|
||||
{
|
||||
int inst = 0;
|
||||
|
||||
do {
|
||||
struct intel_engine_cs *engine;
|
||||
int err;
|
||||
|
||||
engine = intel_engine_lookup_user(arg,
|
||||
I915_ENGINE_CLASS_COPY,
|
||||
inst++);
|
||||
if (!engine)
|
||||
return 0;
|
||||
|
||||
err = __igt_client_fill(engine);
|
||||
if (err == -ENOMEM)
|
||||
err = 0;
|
||||
if (err)
|
||||
return err;
|
||||
} while (1);
|
||||
}
|
||||
|
||||
int i915_gem_client_blt_live_selftests(struct drm_i915_private *i915)
|
||||
{
|
||||
static const struct i915_subtest tests[] = {
|
||||
|
@ -8,13 +8,17 @@
|
||||
|
||||
#include "gt/intel_gt.h"
|
||||
#include "gt/intel_gt_pm.h"
|
||||
#include "gt/intel_ring.h"
|
||||
|
||||
#include "i915_selftest.h"
|
||||
#include "selftests/i915_random.h"
|
||||
|
||||
static int cpu_set(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 v)
|
||||
struct context {
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct intel_engine_cs *engine;
|
||||
};
|
||||
|
||||
static int cpu_set(struct context *ctx, unsigned long offset, u32 v)
|
||||
{
|
||||
unsigned int needs_clflush;
|
||||
struct page *page;
|
||||
@ -22,11 +26,11 @@ static int cpu_set(struct drm_i915_gem_object *obj,
|
||||
u32 *cpu;
|
||||
int err;
|
||||
|
||||
err = i915_gem_object_prepare_write(obj, &needs_clflush);
|
||||
err = i915_gem_object_prepare_write(ctx->obj, &needs_clflush);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
page = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
|
||||
page = i915_gem_object_get_page(ctx->obj, offset >> PAGE_SHIFT);
|
||||
map = kmap_atomic(page);
|
||||
cpu = map + offset_in_page(offset);
|
||||
|
||||
@ -39,14 +43,12 @@ static int cpu_set(struct drm_i915_gem_object *obj,
|
||||
drm_clflush_virt_range(cpu, sizeof(*cpu));
|
||||
|
||||
kunmap_atomic(map);
|
||||
i915_gem_object_finish_access(obj);
|
||||
i915_gem_object_finish_access(ctx->obj);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int cpu_get(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 *v)
|
||||
static int cpu_get(struct context *ctx, unsigned long offset, u32 *v)
|
||||
{
|
||||
unsigned int needs_clflush;
|
||||
struct page *page;
|
||||
@ -54,11 +56,11 @@ static int cpu_get(struct drm_i915_gem_object *obj,
|
||||
u32 *cpu;
|
||||
int err;
|
||||
|
||||
err = i915_gem_object_prepare_read(obj, &needs_clflush);
|
||||
err = i915_gem_object_prepare_read(ctx->obj, &needs_clflush);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
page = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
|
||||
page = i915_gem_object_get_page(ctx->obj, offset >> PAGE_SHIFT);
|
||||
map = kmap_atomic(page);
|
||||
cpu = map + offset_in_page(offset);
|
||||
|
||||
@ -68,26 +70,24 @@ static int cpu_get(struct drm_i915_gem_object *obj,
|
||||
*v = *cpu;
|
||||
|
||||
kunmap_atomic(map);
|
||||
i915_gem_object_finish_access(obj);
|
||||
i915_gem_object_finish_access(ctx->obj);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int gtt_set(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 v)
|
||||
static int gtt_set(struct context *ctx, unsigned long offset, u32 v)
|
||||
{
|
||||
struct i915_vma *vma;
|
||||
u32 __iomem *map;
|
||||
int err = 0;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(obj, true);
|
||||
i915_gem_object_unlock(obj);
|
||||
i915_gem_object_lock(ctx->obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(ctx->obj, true);
|
||||
i915_gem_object_unlock(ctx->obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
|
||||
vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, PIN_MAPPABLE);
|
||||
if (IS_ERR(vma))
|
||||
return PTR_ERR(vma);
|
||||
|
||||
@ -108,21 +108,19 @@ out_rpm:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int gtt_get(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 *v)
|
||||
static int gtt_get(struct context *ctx, unsigned long offset, u32 *v)
|
||||
{
|
||||
struct i915_vma *vma;
|
||||
u32 __iomem *map;
|
||||
int err = 0;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(obj, false);
|
||||
i915_gem_object_unlock(obj);
|
||||
i915_gem_object_lock(ctx->obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(ctx->obj, false);
|
||||
i915_gem_object_unlock(ctx->obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
|
||||
vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, PIN_MAPPABLE);
|
||||
if (IS_ERR(vma))
|
||||
return PTR_ERR(vma);
|
||||
|
||||
@ -143,73 +141,66 @@ out_rpm:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int wc_set(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 v)
|
||||
static int wc_set(struct context *ctx, unsigned long offset, u32 v)
|
||||
{
|
||||
u32 *map;
|
||||
int err;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_wc_domain(obj, true);
|
||||
i915_gem_object_unlock(obj);
|
||||
i915_gem_object_lock(ctx->obj);
|
||||
err = i915_gem_object_set_to_wc_domain(ctx->obj, true);
|
||||
i915_gem_object_unlock(ctx->obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
map = i915_gem_object_pin_map(obj, I915_MAP_WC);
|
||||
map = i915_gem_object_pin_map(ctx->obj, I915_MAP_WC);
|
||||
if (IS_ERR(map))
|
||||
return PTR_ERR(map);
|
||||
|
||||
map[offset / sizeof(*map)] = v;
|
||||
i915_gem_object_unpin_map(obj);
|
||||
i915_gem_object_unpin_map(ctx->obj);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int wc_get(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 *v)
|
||||
static int wc_get(struct context *ctx, unsigned long offset, u32 *v)
|
||||
{
|
||||
u32 *map;
|
||||
int err;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_wc_domain(obj, false);
|
||||
i915_gem_object_unlock(obj);
|
||||
i915_gem_object_lock(ctx->obj);
|
||||
err = i915_gem_object_set_to_wc_domain(ctx->obj, false);
|
||||
i915_gem_object_unlock(ctx->obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
map = i915_gem_object_pin_map(obj, I915_MAP_WC);
|
||||
map = i915_gem_object_pin_map(ctx->obj, I915_MAP_WC);
|
||||
if (IS_ERR(map))
|
||||
return PTR_ERR(map);
|
||||
|
||||
*v = map[offset / sizeof(*map)];
|
||||
i915_gem_object_unpin_map(obj);
|
||||
i915_gem_object_unpin_map(ctx->obj);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int gpu_set(struct drm_i915_gem_object *obj,
|
||||
unsigned long offset,
|
||||
u32 v)
|
||||
static int gpu_set(struct context *ctx, unsigned long offset, u32 v)
|
||||
{
|
||||
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
||||
struct i915_request *rq;
|
||||
struct i915_vma *vma;
|
||||
u32 *cs;
|
||||
int err;
|
||||
|
||||
i915_gem_object_lock(obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(obj, true);
|
||||
i915_gem_object_unlock(obj);
|
||||
i915_gem_object_lock(ctx->obj);
|
||||
err = i915_gem_object_set_to_gtt_domain(ctx->obj, true);
|
||||
i915_gem_object_unlock(ctx->obj);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
|
||||
vma = i915_gem_object_ggtt_pin(ctx->obj, NULL, 0, 0, 0);
|
||||
if (IS_ERR(vma))
|
||||
return PTR_ERR(vma);
|
||||
|
||||
rq = i915_request_create(i915->engine[RCS0]->kernel_context);
|
||||
rq = i915_request_create(ctx->engine->kernel_context);
|
||||
if (IS_ERR(rq)) {
|
||||
i915_vma_unpin(vma);
|
||||
return PTR_ERR(rq);
|
||||
@ -222,12 +213,12 @@ static int gpu_set(struct drm_i915_gem_object *obj,
|
||||
return PTR_ERR(cs);
|
||||
}
|
||||
|
||||
if (INTEL_GEN(i915) >= 8) {
|
||||
if (INTEL_GEN(ctx->engine->i915) >= 8) {
|
||||
*cs++ = MI_STORE_DWORD_IMM_GEN4 | 1 << 22;
|
||||
*cs++ = lower_32_bits(i915_ggtt_offset(vma) + offset);
|
||||
*cs++ = upper_32_bits(i915_ggtt_offset(vma) + offset);
|
||||
*cs++ = v;
|
||||
} else if (INTEL_GEN(i915) >= 4) {
|
||||
} else if (INTEL_GEN(ctx->engine->i915) >= 4) {
|
||||
*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
|
||||
*cs++ = 0;
|
||||
*cs++ = i915_ggtt_offset(vma) + offset;
|
||||
@ -252,32 +243,34 @@ static int gpu_set(struct drm_i915_gem_object *obj,
|
||||
return err;
|
||||
}
|
||||
|
||||
static bool always_valid(struct drm_i915_private *i915)
|
||||
static bool always_valid(struct context *ctx)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool needs_fence_registers(struct drm_i915_private *i915)
|
||||
static bool needs_fence_registers(struct context *ctx)
|
||||
{
|
||||
return !intel_gt_is_wedged(&i915->gt);
|
||||
struct intel_gt *gt = ctx->engine->gt;
|
||||
|
||||
if (intel_gt_is_wedged(gt))
|
||||
return false;
|
||||
|
||||
return gt->ggtt->num_fences;
|
||||
}
|
||||
|
||||
static bool needs_mi_store_dword(struct drm_i915_private *i915)
|
||||
static bool needs_mi_store_dword(struct context *ctx)
|
||||
{
|
||||
if (intel_gt_is_wedged(&i915->gt))
|
||||
if (intel_gt_is_wedged(ctx->engine->gt))
|
||||
return false;
|
||||
|
||||
if (!HAS_ENGINE(i915, RCS0))
|
||||
return false;
|
||||
|
||||
return intel_engine_can_store_dword(i915->engine[RCS0]);
|
||||
return intel_engine_can_store_dword(ctx->engine);
|
||||
}
|
||||
|
||||
static const struct igt_coherency_mode {
|
||||
const char *name;
|
||||
int (*set)(struct drm_i915_gem_object *, unsigned long offset, u32 v);
|
||||
int (*get)(struct drm_i915_gem_object *, unsigned long offset, u32 *v);
|
||||
bool (*valid)(struct drm_i915_private *i915);
|
||||
int (*set)(struct context *ctx, unsigned long offset, u32 v);
|
||||
int (*get)(struct context *ctx, unsigned long offset, u32 *v);
|
||||
bool (*valid)(struct context *ctx);
|
||||
} igt_coherency_mode[] = {
|
||||
{ "cpu", cpu_set, cpu_get, always_valid },
|
||||
{ "gtt", gtt_set, gtt_get, needs_fence_registers },
|
||||
@ -286,18 +279,37 @@ static const struct igt_coherency_mode {
|
||||
{ },
|
||||
};
|
||||
|
||||
static struct intel_engine_cs *
|
||||
random_engine(struct drm_i915_private *i915, struct rnd_state *prng)
|
||||
{
|
||||
struct intel_engine_cs *engine;
|
||||
unsigned int count;
|
||||
|
||||
count = 0;
|
||||
for_each_uabi_engine(engine, i915)
|
||||
count++;
|
||||
|
||||
count = i915_prandom_u32_max_state(count, prng);
|
||||
for_each_uabi_engine(engine, i915)
|
||||
if (count-- == 0)
|
||||
return engine;
|
||||
|
||||
return NULL;
|
||||
}

static int igt_gem_coherency(void *arg)
{
    const unsigned int ncachelines = PAGE_SIZE/64;
    I915_RND_STATE(prng);
    struct drm_i915_private *i915 = arg;
    const struct igt_coherency_mode *read, *write, *over;
    struct drm_i915_gem_object *obj;
    unsigned long count, n;
    u32 *offsets, *values;
    I915_RND_STATE(prng);
    struct context ctx;
    int err = 0;

    /* We repeatedly write, overwrite and read from a sequence of
    /*
     * We repeatedly write, overwrite and read from a sequence of
     * cachelines in order to try and detect incoherency (unflushed writes
     * from either the CPU or GPU). Each setter/getter uses our cache
     * domain API which should prevent incoherency.
@@ -311,31 +323,35 @@ static int igt_gem_coherency(void *arg)

    values = offsets + ncachelines;

    ctx.engine = random_engine(i915, &prng);
    GEM_BUG_ON(!ctx.engine);
    pr_info("%s: using %s\n", __func__, ctx.engine->name);

    for (over = igt_coherency_mode; over->name; over++) {
        if (!over->set)
            continue;

        if (!over->valid(i915))
        if (!over->valid(&ctx))
            continue;

        for (write = igt_coherency_mode; write->name; write++) {
            if (!write->set)
                continue;

            if (!write->valid(i915))
            if (!write->valid(&ctx))
                continue;

            for (read = igt_coherency_mode; read->name; read++) {
                if (!read->get)
                    continue;

                if (!read->valid(i915))
                if (!read->valid(&ctx))
                    continue;

                for_each_prime_number_from(count, 1, ncachelines) {
                    obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
                    if (IS_ERR(obj)) {
                        err = PTR_ERR(obj);
                    ctx.obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
                    if (IS_ERR(ctx.obj)) {
                        err = PTR_ERR(ctx.obj);
                        goto free;
                    }

@@ -344,7 +360,7 @@ static int igt_gem_coherency(void *arg)
                        values[n] = prandom_u32_state(&prng);

                    for (n = 0; n < count; n++) {
                        err = over->set(obj, offsets[n], ~values[n]);
                        err = over->set(&ctx, offsets[n], ~values[n]);
                        if (err) {
                            pr_err("Failed to set stale value[%ld/%ld] in object using %s, err=%d\n",
                                   n, count, over->name, err);
@@ -353,7 +369,7 @@ static int igt_gem_coherency(void *arg)
                    }

                    for (n = 0; n < count; n++) {
                        err = write->set(obj, offsets[n], values[n]);
                        err = write->set(&ctx, offsets[n], values[n]);
                        if (err) {
                            pr_err("Failed to set value[%ld/%ld] in object using %s, err=%d\n",
                                   n, count, write->name, err);
@@ -364,7 +380,7 @@ static int igt_gem_coherency(void *arg)
                    for (n = 0; n < count; n++) {
                        u32 found;

                        err = read->get(obj, offsets[n], &found);
                        err = read->get(&ctx, offsets[n], &found);
                        if (err) {
                            pr_err("Failed to get value[%ld/%ld] in object using %s, err=%d\n",
                                   n, count, read->name, err);
@@ -382,7 +398,7 @@ static int igt_gem_coherency(void *arg)
                        }
                    }

                    i915_gem_object_put(obj);
                    i915_gem_object_put(ctx.obj);
                }
            }
        }
@@ -392,7 +408,7 @@ free:
    return err;

put_object:
    i915_gem_object_put(obj);
    i915_gem_object_put(ctx.obj);
    goto free;
}

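The comment in igt_gem_coherency() above describes the pattern: seed each cacheline with a stale value, overwrite it through one access path, then read it back through another and compare. A minimal host-only sketch of that loop, where plain memory accesses stand in for the hypothetical per-mode set()/get() cache-domain callbacks:

#include <stdint.h>
#include <stdio.h>

#define NCACHELINES (4096 / 64)

/* Stand-ins for the per-mode set()/get() callbacks. */
static void mode_set(uint32_t *obj, unsigned long line, uint32_t v) { obj[line * 16] = v; }
static uint32_t mode_get(const uint32_t *obj, unsigned long line) { return obj[line * 16]; }

int main(void)
{
    uint32_t obj[NCACHELINES * 16];
    uint32_t values[NCACHELINES];
    unsigned long n;

    for (n = 0; n < NCACHELINES; n++)
        values[n] = (uint32_t)(n * 2654435761u);    /* arbitrary "random" values */

    for (n = 0; n < NCACHELINES; n++)    /* stale pass */
        mode_set(obj, n, ~values[n]);
    for (n = 0; n < NCACHELINES; n++)    /* real write pass */
        mode_set(obj, n, values[n]);

    for (n = 0; n < NCACHELINES; n++) {    /* readback pass */
        uint32_t found = mode_get(obj, n);

        if (found != values[n]) {
            printf("mismatch at line %lu: %08x != %08x\n", n, found, values[n]);
            return 1;
        }
    }
    return 0;
}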
@ -32,7 +32,6 @@ static int live_nop_switch(void *arg)
|
||||
struct drm_i915_private *i915 = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
struct i915_gem_context **ctx;
|
||||
enum intel_engine_id id;
|
||||
struct igt_live_test t;
|
||||
struct drm_file *file;
|
||||
unsigned long n;
|
||||
@ -67,7 +66,7 @@ static int live_nop_switch(void *arg)
|
||||
}
|
||||
}
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_uabi_engine(engine, i915) {
|
||||
struct i915_request *rq;
|
||||
unsigned long end_time, prime;
|
||||
ktime_t times[2] = {};
|
||||
@ -170,18 +169,24 @@ static int __live_parallel_switch1(void *data)
|
||||
struct i915_request *rq = NULL;
|
||||
int err, n;
|
||||
|
||||
for (n = 0; n < ARRAY_SIZE(arg->ce); n++) {
|
||||
i915_request_put(rq);
|
||||
err = 0;
|
||||
for (n = 0; !err && n < ARRAY_SIZE(arg->ce); n++) {
|
||||
struct i915_request *prev = rq;
|
||||
|
||||
rq = i915_request_create(arg->ce[n]);
|
||||
if (IS_ERR(rq))
|
||||
if (IS_ERR(rq)) {
|
||||
i915_request_put(prev);
|
||||
return PTR_ERR(rq);
|
||||
}
|
||||
|
||||
i915_request_get(rq);
|
||||
if (prev) {
|
||||
err = i915_request_await_dma_fence(rq, &prev->fence);
|
||||
i915_request_put(prev);
|
||||
}
|
||||
|
||||
i915_request_add(rq);
|
||||
}
|
||||
|
||||
err = 0;
|
||||
if (i915_request_wait(rq, 0, HZ / 5) < 0)
|
||||
err = -ETIME;
|
||||
i915_request_put(rq);
|
||||
@ -198,6 +203,7 @@ static int __live_parallel_switch1(void *data)
|
||||
static int __live_parallel_switchN(void *data)
|
||||
{
|
||||
struct parallel_switch *arg = data;
|
||||
struct i915_request *rq = NULL;
|
||||
IGT_TIMEOUT(end_time);
|
||||
unsigned long count;
|
||||
int n;
|
||||
@ -205,17 +211,31 @@ static int __live_parallel_switchN(void *data)
|
||||
count = 0;
|
||||
do {
|
||||
for (n = 0; n < ARRAY_SIZE(arg->ce); n++) {
|
||||
struct i915_request *rq;
|
||||
struct i915_request *prev = rq;
|
||||
int err = 0;
|
||||
|
||||
rq = i915_request_create(arg->ce[n]);
|
||||
if (IS_ERR(rq))
|
||||
if (IS_ERR(rq)) {
|
||||
i915_request_put(prev);
|
||||
return PTR_ERR(rq);
|
||||
}
|
||||
|
||||
i915_request_get(rq);
|
||||
if (prev) {
|
||||
err = i915_request_await_dma_fence(rq, &prev->fence);
|
||||
i915_request_put(prev);
|
||||
}
|
||||
|
||||
i915_request_add(rq);
|
||||
if (err) {
|
||||
i915_request_put(rq);
|
||||
return err;
|
||||
}
|
||||
}
|
||||
|
||||
count++;
|
||||
} while (!__igt_timeout(end_time, NULL));
|
||||
i915_request_put(rq);
|
||||
|
||||
pr_info("%s: %lu switches (many)\n", arg->ce[0]->engine->name, count);
|
||||
return 0;
|
||||
@ -325,6 +345,8 @@ static int live_parallel_switch(void *arg)
|
||||
get_task_struct(data[n].tsk);
|
||||
}
|
||||
|
||||
yield(); /* start all threads before we kthread_stop() */
|
||||
|
||||
for (n = 0; n < count; n++) {
|
||||
int status;
|
||||
|
||||
@ -583,7 +605,6 @@ static int igt_ctx_exec(void *arg)
|
||||
{
|
||||
struct drm_i915_private *i915 = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = -ENODEV;
|
||||
|
||||
/*
|
||||
@ -595,7 +616,7 @@ static int igt_ctx_exec(void *arg)
|
||||
if (!DRIVER_CAPS(i915)->has_logical_contexts)
|
||||
return 0;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_uabi_engine(engine, i915) {
|
||||
struct drm_i915_gem_object *obj = NULL;
|
||||
unsigned long ncontexts, ndwords, dw;
|
||||
struct i915_request *tq[5] = {};
|
||||
@ -711,7 +732,6 @@ static int igt_shared_ctx_exec(void *arg)
|
||||
struct i915_request *tq[5] = {};
|
||||
struct i915_gem_context *parent;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
struct igt_live_test t;
|
||||
struct drm_file *file;
|
||||
int err = 0;
|
||||
@ -743,7 +763,7 @@ static int igt_shared_ctx_exec(void *arg)
|
||||
if (err)
|
||||
goto out_file;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_uabi_engine(engine, i915) {
|
||||
unsigned long ncontexts, ndwords, dw;
|
||||
struct drm_i915_gem_object *obj = NULL;
|
||||
IGT_TIMEOUT(end_time);
|
||||
@ -1168,93 +1188,90 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
|
||||
const char *name,
|
||||
unsigned int flags)
|
||||
{
|
||||
struct intel_engine_cs *engine = i915->engine[RCS0];
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct i915_gem_context *ctx;
|
||||
struct intel_context *ce;
|
||||
struct intel_sseu pg_sseu;
|
||||
struct drm_file *file;
|
||||
int ret;
|
||||
int inst = 0;
|
||||
int ret = 0;
|
||||
|
||||
if (INTEL_GEN(i915) < 9 || !engine)
|
||||
if (INTEL_GEN(i915) < 9 || !RUNTIME_INFO(i915)->sseu.has_slice_pg)
|
||||
return 0;
|
||||
|
||||
if (!RUNTIME_INFO(i915)->sseu.has_slice_pg)
|
||||
return 0;
|
||||
|
||||
if (hweight32(engine->sseu.slice_mask) < 2)
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* Gen11 VME friendly power-gated configuration with half enabled
|
||||
* sub-slices.
|
||||
*/
|
||||
pg_sseu = engine->sseu;
|
||||
pg_sseu.slice_mask = 1;
|
||||
pg_sseu.subslice_mask =
|
||||
~(~0 << (hweight32(engine->sseu.subslice_mask) / 2));
|
||||
|
||||
pr_info("SSEU subtest '%s', flags=%x, def_slices=%u, pg_slices=%u\n",
|
||||
name, flags, hweight32(engine->sseu.slice_mask),
|
||||
hweight32(pg_sseu.slice_mask));
|
||||
|
||||
file = mock_file(i915);
|
||||
if (IS_ERR(file))
|
||||
return PTR_ERR(file);
|
||||
|
||||
if (flags & TEST_RESET)
|
||||
igt_global_reset_lock(&i915->gt);
|
||||
|
||||
ctx = live_context(i915, file);
|
||||
if (IS_ERR(ctx)) {
|
||||
ret = PTR_ERR(ctx);
|
||||
goto out_unlock;
|
||||
}
|
||||
i915_gem_context_clear_bannable(ctx); /* to reset and beyond! */
|
||||
|
||||
obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
|
||||
if (IS_ERR(obj)) {
|
||||
ret = PTR_ERR(obj);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
ce = i915_gem_context_get_engine(ctx, RCS0);
|
||||
if (IS_ERR(ce)) {
|
||||
ret = PTR_ERR(ce);
|
||||
goto out_put;
|
||||
}
|
||||
do {
|
||||
struct intel_engine_cs *engine;
|
||||
struct intel_context *ce;
|
||||
struct intel_sseu pg_sseu;
|
||||
|
||||
ret = intel_context_pin(ce);
|
||||
if (ret)
|
||||
goto out_context;
|
||||
engine = intel_engine_lookup_user(i915,
|
||||
I915_ENGINE_CLASS_RENDER,
|
||||
inst++);
|
||||
if (!engine)
|
||||
break;
|
||||
|
||||
/* First set the default mask. */
|
||||
ret = __sseu_test(name, flags, ce, obj, engine->sseu);
|
||||
if (ret)
|
||||
goto out_fail;
|
||||
if (hweight32(engine->sseu.slice_mask) < 2)
|
||||
continue;
|
||||
|
||||
/* Then set a power-gated configuration. */
|
||||
ret = __sseu_test(name, flags, ce, obj, pg_sseu);
|
||||
if (ret)
|
||||
goto out_fail;
|
||||
/*
|
||||
* Gen11 VME friendly power-gated configuration with
|
||||
* half enabled sub-slices.
|
||||
*/
|
||||
pg_sseu = engine->sseu;
|
||||
pg_sseu.slice_mask = 1;
|
||||
pg_sseu.subslice_mask =
|
||||
~(~0 << (hweight32(engine->sseu.subslice_mask) / 2));
|
||||
|
||||
/* Back to defaults. */
|
||||
ret = __sseu_test(name, flags, ce, obj, engine->sseu);
|
||||
if (ret)
|
||||
goto out_fail;
|
||||
pr_info("%s: SSEU subtest '%s', flags=%x, def_slices=%u, pg_slices=%u\n",
|
||||
engine->name, name, flags,
|
||||
hweight32(engine->sseu.slice_mask),
|
||||
hweight32(pg_sseu.slice_mask));
|
||||
|
||||
/* One last power-gated configuration for the road. */
|
||||
ret = __sseu_test(name, flags, ce, obj, pg_sseu);
|
||||
if (ret)
|
||||
goto out_fail;
|
||||
ce = intel_context_create(engine->kernel_context->gem_context,
|
||||
engine);
|
||||
if (IS_ERR(ce)) {
|
||||
ret = PTR_ERR(ce);
|
||||
goto out_put;
|
||||
}
|
||||
|
||||
ret = intel_context_pin(ce);
|
||||
if (ret)
|
||||
goto out_ce;
|
||||
|
||||
/* First set the default mask. */
|
||||
ret = __sseu_test(name, flags, ce, obj, engine->sseu);
|
||||
if (ret)
|
||||
goto out_unpin;
|
||||
|
||||
/* Then set a power-gated configuration. */
|
||||
ret = __sseu_test(name, flags, ce, obj, pg_sseu);
|
||||
if (ret)
|
||||
goto out_unpin;
|
||||
|
||||
/* Back to defaults. */
|
||||
ret = __sseu_test(name, flags, ce, obj, engine->sseu);
|
||||
if (ret)
|
||||
goto out_unpin;
|
||||
|
||||
/* One last power-gated configuration for the road. */
|
||||
ret = __sseu_test(name, flags, ce, obj, pg_sseu);
|
||||
if (ret)
|
||||
goto out_unpin;
|
||||
|
||||
out_unpin:
|
||||
intel_context_unpin(ce);
|
||||
out_ce:
|
||||
intel_context_put(ce);
|
||||
} while (!ret);
|
||||
|
||||
out_fail:
|
||||
if (igt_flush_test(i915))
|
||||
ret = -EIO;
|
||||
|
||||
intel_context_unpin(ce);
|
||||
out_context:
|
||||
intel_context_put(ce);
|
||||
out_put:
|
||||
i915_gem_object_put(obj);
|
||||
|
||||
@ -1262,8 +1279,6 @@ out_unlock:
|
||||
if (flags & TEST_RESET)
|
||||
igt_global_reset_unlock(&i915->gt);
|
||||
|
||||
mock_file_free(i915, file);
|
||||
|
||||
if (ret)
|
||||
pr_err("%s: Failed with %d!\n", name, ret);
|
||||
|
||||
@ -1651,7 +1666,6 @@ static int igt_vm_isolation(void *arg)
|
||||
struct drm_file *file;
|
||||
I915_RND_STATE(prng);
|
||||
unsigned long count;
|
||||
unsigned int id;
|
||||
u64 vm_total;
|
||||
int err;
|
||||
|
||||
@ -1692,7 +1706,7 @@ static int igt_vm_isolation(void *arg)
|
||||
vm_total -= I915_GTT_PAGE_SIZE;
|
||||
|
||||
count = 0;
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_uabi_engine(engine, i915) {
|
||||
IGT_TIMEOUT(end_time);
|
||||
unsigned long this = 0;
|
||||
|
||||
|
@ -301,6 +301,9 @@ static int igt_partial_tiling(void *arg)
|
||||
int tiling;
|
||||
int err;
|
||||
|
||||
if (!i915_ggtt_has_aperture(&i915->ggtt))
|
||||
return 0;
|
||||
|
||||
/* We want to check the page mapping and fencing of a large object
|
||||
* mmapped through the GTT. The object we create is larger than can
|
||||
* possibly be mmaped as a whole, and so we must use partial GGTT vma.
|
||||
@ -431,6 +434,9 @@ static int igt_smoke_tiling(void *arg)
|
||||
IGT_TIMEOUT(end);
|
||||
int err;
|
||||
|
||||
if (!i915_ggtt_has_aperture(&i915->ggtt))
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* igt_partial_tiling() does an exhastive check of partial tiling
|
||||
* chunking, but will undoubtably run out of time. Here, we do a
|
||||
@ -515,20 +521,19 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
|
||||
{
|
||||
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
struct i915_vma *vma;
|
||||
int err;
|
||||
|
||||
vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
|
||||
if (IS_ERR(vma))
|
||||
return PTR_ERR(vma);
|
||||
|
||||
err = i915_vma_pin(vma, 0, 0, PIN_USER);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_uabi_engine(engine, i915) {
|
||||
struct i915_request *rq;
|
||||
struct i915_vma *vma;
|
||||
int err;
|
||||
|
||||
vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
|
||||
if (IS_ERR(vma))
|
||||
return PTR_ERR(vma);
|
||||
|
||||
err = i915_vma_pin(vma, 0, 0, PIN_USER);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
rq = i915_request_create(engine->kernel_context);
|
||||
if (IS_ERR(rq)) {
|
||||
@ -544,12 +549,13 @@ static int make_obj_busy(struct drm_i915_gem_object *obj)
|
||||
i915_vma_unlock(vma);
|
||||
|
||||
i915_request_add(rq);
|
||||
i915_vma_unpin(vma);
|
||||
if (err)
|
||||
return err;
|
||||
}
|
||||
|
||||
i915_vma_unpin(vma);
|
||||
i915_gem_object_put(obj); /* leave it only alive via its active ref */
|
||||
|
||||
return err;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static bool assert_mmap_offset(struct drm_i915_private *i915,
|
||||
|
@ -3,40 +3,241 @@
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#include <linux/sort.h>
|
||||
|
||||
#include "gt/intel_gt.h"
|
||||
#include "gt/intel_engine_user.h"
|
||||
|
||||
#include "i915_selftest.h"
|
||||
|
||||
#include "gem/i915_gem_context.h"
|
||||
#include "selftests/igt_flush_test.h"
|
||||
#include "selftests/i915_random.h"
|
||||
#include "selftests/mock_drm.h"
|
||||
#include "huge_gem_object.h"
|
||||
#include "mock_context.h"
|
||||
|
||||
static int igt_fill_blt(void *arg)
|
||||
static int wrap_ktime_compare(const void *A, const void *B)
|
||||
{
|
||||
const ktime_t *a = A, *b = B;
|
||||
|
||||
return ktime_compare(*a, *b);
|
||||
}
|
||||
|
||||
static int __perf_fill_blt(struct drm_i915_gem_object *obj)
|
||||
{
|
||||
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
||||
int inst = 0;
|
||||
|
||||
do {
|
||||
struct intel_engine_cs *engine;
|
||||
ktime_t t[5];
|
||||
int pass;
|
||||
int err;
|
||||
|
||||
engine = intel_engine_lookup_user(i915,
|
||||
I915_ENGINE_CLASS_COPY,
|
||||
inst++);
|
||||
if (!engine)
|
||||
return 0;
|
||||
|
||||
for (pass = 0; pass < ARRAY_SIZE(t); pass++) {
|
||||
struct intel_context *ce = engine->kernel_context;
|
||||
ktime_t t0, t1;
|
||||
|
||||
t0 = ktime_get();
|
||||
|
||||
err = i915_gem_object_fill_blt(obj, ce, 0);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
err = i915_gem_object_wait(obj,
|
||||
I915_WAIT_ALL,
|
||||
MAX_SCHEDULE_TIMEOUT);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
t1 = ktime_get();
|
||||
t[pass] = ktime_sub(t1, t0);
|
||||
}
|
||||
|
||||
sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL);
|
||||
pr_info("%s: blt %zd KiB fill: %lld MiB/s\n",
|
||||
engine->name,
|
||||
obj->base.size >> 10,
|
||||
div64_u64(mul_u32_u32(4 * obj->base.size,
|
||||
1000 * 1000 * 1000),
|
||||
t[1] + 2 * t[2] + t[3]) >> 20);
|
||||
} while (1);
|
||||
}
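The reporting line above times five fills, sorts the samples, and derives the throughput from a weighted middle (t[1] + 2*t[2] + t[3]) so that a single outlier pass cannot skew the number. A small sketch of just that reduction, using hypothetical nanosecond samples and object size; the weighting mirrors the kernel expression but the inputs are invented:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

int main(void)
{
    /* Hypothetical numbers: five timed passes (ns) over a 64 MiB object. */
    uint64_t t[5] = { 9100000, 8000000, 8400000, 8200000, 35000000 };
    uint64_t size = 64ull << 20;
    uint64_t mib_per_s;

    qsort(t, 5, sizeof(t[0]), cmp_u64);

    /* Drop min and max, weight the median twice: 4 passes worth of bytes. */
    mib_per_s = (4 * size * 1000000000ull / (t[1] + 2 * t[2] + t[3])) >> 20;
    printf("fill: %llu MiB/s\n", (unsigned long long)mib_per_s);
    return 0;
}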
|
||||
|
||||
static int perf_fill_blt(void *arg)
|
||||
{
|
||||
struct drm_i915_private *i915 = arg;
|
||||
struct intel_context *ce = i915->engine[BCS0]->kernel_context;
|
||||
struct drm_i915_gem_object *obj;
|
||||
static const unsigned long sizes[] = {
|
||||
SZ_4K,
|
||||
SZ_64K,
|
||||
SZ_2M,
|
||||
SZ_64M
|
||||
};
|
||||
int i;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(sizes); i++) {
|
||||
struct drm_i915_gem_object *obj;
|
||||
int err;
|
||||
|
||||
obj = i915_gem_object_create_internal(i915, sizes[i]);
|
||||
if (IS_ERR(obj))
|
||||
return PTR_ERR(obj);
|
||||
|
||||
err = __perf_fill_blt(obj);
|
||||
i915_gem_object_put(obj);
|
||||
if (err)
|
||||
return err;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __perf_copy_blt(struct drm_i915_gem_object *src,
|
||||
struct drm_i915_gem_object *dst)
|
||||
{
|
||||
struct drm_i915_private *i915 = to_i915(src->base.dev);
|
||||
int inst = 0;
|
||||
|
||||
do {
|
||||
struct intel_engine_cs *engine;
|
||||
ktime_t t[5];
|
||||
int pass;
|
||||
|
||||
engine = intel_engine_lookup_user(i915,
|
||||
I915_ENGINE_CLASS_COPY,
|
||||
inst++);
|
||||
if (!engine)
|
||||
return 0;
|
||||
|
||||
for (pass = 0; pass < ARRAY_SIZE(t); pass++) {
|
||||
struct intel_context *ce = engine->kernel_context;
|
||||
ktime_t t0, t1;
|
||||
int err;
|
||||
|
||||
t0 = ktime_get();
|
||||
|
||||
err = i915_gem_object_copy_blt(src, dst, ce);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
err = i915_gem_object_wait(dst,
|
||||
I915_WAIT_ALL,
|
||||
MAX_SCHEDULE_TIMEOUT);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
t1 = ktime_get();
|
||||
t[pass] = ktime_sub(t1, t0);
|
||||
}
|
||||
|
||||
sort(t, ARRAY_SIZE(t), sizeof(*t), wrap_ktime_compare, NULL);
|
||||
pr_info("%s: blt %zd KiB copy: %lld MiB/s\n",
|
||||
engine->name,
|
||||
src->base.size >> 10,
|
||||
div64_u64(mul_u32_u32(4 * src->base.size,
|
||||
1000 * 1000 * 1000),
|
||||
t[1] + 2 * t[2] + t[3]) >> 20);
|
||||
} while (1);
|
||||
}
|
||||
|
||||
static int perf_copy_blt(void *arg)
|
||||
{
|
||||
struct drm_i915_private *i915 = arg;
|
||||
static const unsigned long sizes[] = {
|
||||
SZ_4K,
|
||||
SZ_64K,
|
||||
SZ_2M,
|
||||
SZ_64M
|
||||
};
|
||||
int i;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(sizes); i++) {
|
||||
struct drm_i915_gem_object *src, *dst;
|
||||
int err;
|
||||
|
||||
src = i915_gem_object_create_internal(i915, sizes[i]);
|
||||
if (IS_ERR(src))
|
||||
return PTR_ERR(src);
|
||||
|
||||
dst = i915_gem_object_create_internal(i915, sizes[i]);
|
||||
if (IS_ERR(dst)) {
|
||||
err = PTR_ERR(dst);
|
||||
goto err_src;
|
||||
}
|
||||
|
||||
err = __perf_copy_blt(src, dst);
|
||||
|
||||
i915_gem_object_put(dst);
|
||||
err_src:
|
||||
i915_gem_object_put(src);
|
||||
if (err)
|
||||
return err;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct igt_thread_arg {
|
||||
struct drm_i915_private *i915;
|
||||
struct rnd_state prng;
|
||||
unsigned int n_cpus;
|
||||
};
|
||||
|
||||
static int igt_fill_blt_thread(void *arg)
|
||||
{
|
||||
struct igt_thread_arg *thread = arg;
|
||||
struct drm_i915_private *i915 = thread->i915;
|
||||
struct rnd_state *prng = &thread->prng;
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct i915_gem_context *ctx;
|
||||
struct intel_context *ce;
|
||||
struct drm_file *file;
|
||||
unsigned int prio;
|
||||
IGT_TIMEOUT(end);
|
||||
u32 *vaddr;
|
||||
int err = 0;
|
||||
int err;
|
||||
|
||||
prandom_seed_state(&prng, i915_selftest.random_seed);
|
||||
file = mock_file(i915);
|
||||
if (IS_ERR(file))
|
||||
return PTR_ERR(file);
|
||||
|
||||
/*
|
||||
* XXX: needs some threads to scale all these tests, also maybe throw
|
||||
* in submission from higher priority context to see if we are
|
||||
* preempted for very large objects...
|
||||
*/
|
||||
ctx = live_context(i915, file);
|
||||
if (IS_ERR(ctx)) {
|
||||
err = PTR_ERR(ctx);
|
||||
goto out_file;
|
||||
}
|
||||
|
||||
prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
|
||||
ctx->sched.priority = I915_USER_PRIORITY(prio);
|
||||
|
||||
ce = i915_gem_context_get_engine(ctx, BCS0);
|
||||
GEM_BUG_ON(IS_ERR(ce));
|
||||
|
||||
do {
|
||||
const u32 max_block_size = S16_MAX * PAGE_SIZE;
|
||||
u32 sz = min_t(u64, ce->vm->total >> 4, prandom_u32_state(&prng));
|
||||
u32 phys_sz = sz % (max_block_size + 1);
|
||||
u32 val = prandom_u32_state(&prng);
|
||||
u32 val = prandom_u32_state(prng);
|
||||
u64 total = ce->vm->total;
|
||||
u32 phys_sz;
|
||||
u32 sz;
|
||||
u32 *vaddr;
|
||||
u32 i;
|
||||
|
||||
/*
|
||||
* If we have a tiny shared address space, like for the GGTT
|
||||
* then we can't be too greedy.
|
||||
*/
|
||||
if (i915_is_ggtt(ce->vm))
|
||||
total = div64_u64(total, thread->n_cpus);
|
||||
|
||||
sz = min_t(u64, total >> 4, prandom_u32_state(prng));
|
||||
phys_sz = sz % (max_block_size + 1);
|
||||
|
||||
sz = round_up(sz, PAGE_SIZE);
|
||||
phys_sz = round_up(phys_sz, PAGE_SIZE);
|
||||
|
||||
@ -98,28 +299,56 @@ err_flush:
|
||||
if (err == -ENOMEM)
|
||||
err = 0;
|
||||
|
||||
intel_context_put(ce);
|
||||
out_file:
|
||||
mock_file_free(i915, file);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int igt_copy_blt(void *arg)
|
||||
static int igt_copy_blt_thread(void *arg)
|
||||
{
|
||||
struct drm_i915_private *i915 = arg;
|
||||
struct intel_context *ce = i915->engine[BCS0]->kernel_context;
|
||||
struct igt_thread_arg *thread = arg;
|
||||
struct drm_i915_private *i915 = thread->i915;
|
||||
struct rnd_state *prng = &thread->prng;
|
||||
struct drm_i915_gem_object *src, *dst;
|
||||
struct rnd_state prng;
|
||||
struct i915_gem_context *ctx;
|
||||
struct intel_context *ce;
|
||||
struct drm_file *file;
|
||||
unsigned int prio;
|
||||
IGT_TIMEOUT(end);
|
||||
u32 *vaddr;
|
||||
int err = 0;
|
||||
int err;
|
||||
|
||||
prandom_seed_state(&prng, i915_selftest.random_seed);
|
||||
file = mock_file(i915);
|
||||
if (IS_ERR(file))
|
||||
return PTR_ERR(file);
|
||||
|
||||
ctx = live_context(i915, file);
|
||||
if (IS_ERR(ctx)) {
|
||||
err = PTR_ERR(ctx);
|
||||
goto out_file;
|
||||
}
|
||||
|
||||
prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
|
||||
ctx->sched.priority = I915_USER_PRIORITY(prio);
|
||||
|
||||
ce = i915_gem_context_get_engine(ctx, BCS0);
|
||||
GEM_BUG_ON(IS_ERR(ce));
|
||||
|
||||
do {
|
||||
const u32 max_block_size = S16_MAX * PAGE_SIZE;
|
||||
u32 sz = min_t(u64, ce->vm->total >> 4, prandom_u32_state(&prng));
|
||||
u32 phys_sz = sz % (max_block_size + 1);
|
||||
u32 val = prandom_u32_state(&prng);
|
||||
u32 val = prandom_u32_state(prng);
|
||||
u64 total = ce->vm->total;
|
||||
u32 phys_sz;
|
||||
u32 sz;
|
||||
u32 *vaddr;
|
||||
u32 i;
|
||||
|
||||
if (i915_is_ggtt(ce->vm))
|
||||
total = div64_u64(total, thread->n_cpus);
|
||||
|
||||
sz = min_t(u64, total >> 4, prandom_u32_state(prng));
|
||||
phys_sz = sz % (max_block_size + 1);
|
||||
|
||||
sz = round_up(sz, PAGE_SIZE);
|
||||
phys_sz = round_up(phys_sz, PAGE_SIZE);
|
||||
|
||||
@ -201,12 +430,85 @@ err_flush:
|
||||
if (err == -ENOMEM)
|
||||
err = 0;
|
||||
|
||||
intel_context_put(ce);
|
||||
out_file:
|
||||
mock_file_free(i915, file);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int igt_threaded_blt(struct drm_i915_private *i915,
|
||||
int (*blt_fn)(void *arg))
|
||||
{
|
||||
struct igt_thread_arg *thread;
|
||||
struct task_struct **tsk;
|
||||
I915_RND_STATE(prng);
|
||||
unsigned int n_cpus;
|
||||
unsigned int i;
|
||||
int err = 0;
|
||||
|
||||
n_cpus = num_online_cpus() + 1;
|
||||
|
||||
tsk = kcalloc(n_cpus, sizeof(struct task_struct *), GFP_KERNEL);
|
||||
if (!tsk)
|
||||
return 0;
|
||||
|
||||
thread = kcalloc(n_cpus, sizeof(struct igt_thread_arg), GFP_KERNEL);
|
||||
if (!thread) {
|
||||
kfree(tsk);
|
||||
return 0;
|
||||
}
|
||||
|
||||
for (i = 0; i < n_cpus; ++i) {
|
||||
thread[i].i915 = i915;
|
||||
thread[i].n_cpus = n_cpus;
|
||||
thread[i].prng =
|
||||
I915_RND_STATE_INITIALIZER(prandom_u32_state(&prng));
|
||||
|
||||
tsk[i] = kthread_run(blt_fn, &thread[i], "igt/blt-%d", i);
|
||||
if (IS_ERR(tsk[i])) {
|
||||
err = PTR_ERR(tsk[i]);
|
||||
break;
|
||||
}
|
||||
|
||||
get_task_struct(tsk[i]);
|
||||
}
|
||||
|
||||
yield(); /* start all threads before we kthread_stop() */
|
||||
|
||||
for (i = 0; i < n_cpus; ++i) {
|
||||
int status;
|
||||
|
||||
if (IS_ERR_OR_NULL(tsk[i]))
|
||||
continue;
|
||||
|
||||
status = kthread_stop(tsk[i]);
|
||||
if (status && !err)
|
||||
err = status;
|
||||
|
||||
put_task_struct(tsk[i]);
|
||||
}
|
||||
|
||||
kfree(tsk);
|
||||
kfree(thread);
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static int igt_fill_blt(void *arg)
|
||||
{
|
||||
return igt_threaded_blt(arg, igt_fill_blt_thread);
|
||||
}
|
||||
|
||||
static int igt_copy_blt(void *arg)
|
||||
{
|
||||
return igt_threaded_blt(arg, igt_copy_blt_thread);
|
||||
}
|
||||
|
||||
int i915_gem_object_blt_live_selftests(struct drm_i915_private *i915)
|
||||
{
|
||||
static const struct i915_subtest tests[] = {
|
||||
SUBTEST(perf_fill_blt),
|
||||
SUBTEST(perf_copy_blt),
|
||||
SUBTEST(igt_fill_blt),
|
||||
SUBTEST(igt_copy_blt),
|
||||
};
|
||||
|
@@ -22,6 +22,8 @@ mock_context(struct drm_i915_private *i915,
    INIT_LIST_HEAD(&ctx->link);
    ctx->i915 = i915;

    i915_gem_context_set_persistence(ctx);

    mutex_init(&ctx->engines_mutex);
    e = default_engines(ctx);
    if (IS_ERR(e))

@@ -13,6 +13,7 @@
#include "intel_context.h"
#include "intel_engine.h"
#include "intel_engine_pm.h"
#include "intel_ring.h"

static struct i915_global_context {
    struct i915_global base;

@@ -12,6 +12,7 @@
#include "i915_active.h"
#include "intel_context_types.h"
#include "intel_engine_types.h"
#include "intel_ring_types.h"
#include "intel_timeline_types.h"

void intel_context_init(struct intel_context *ce,
@ -19,6 +19,7 @@
|
||||
#include "intel_workarounds.h"
|
||||
|
||||
struct drm_printer;
|
||||
struct intel_gt;
|
||||
|
||||
/* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
|
||||
* but keeps the logic simple. Indeed, the whole purpose of this macro is just
|
||||
@ -89,38 +90,6 @@ struct drm_printer;
|
||||
/* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
|
||||
* do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
|
||||
*/
|
||||
enum intel_engine_hangcheck_action {
|
||||
ENGINE_IDLE = 0,
|
||||
ENGINE_WAIT,
|
||||
ENGINE_ACTIVE_SEQNO,
|
||||
ENGINE_ACTIVE_HEAD,
|
||||
ENGINE_ACTIVE_SUBUNITS,
|
||||
ENGINE_WAIT_KICK,
|
||||
ENGINE_DEAD,
|
||||
};
|
||||
|
||||
static inline const char *
|
||||
hangcheck_action_to_str(const enum intel_engine_hangcheck_action a)
|
||||
{
|
||||
switch (a) {
|
||||
case ENGINE_IDLE:
|
||||
return "idle";
|
||||
case ENGINE_WAIT:
|
||||
return "wait";
|
||||
case ENGINE_ACTIVE_SEQNO:
|
||||
return "active seqno";
|
||||
case ENGINE_ACTIVE_HEAD:
|
||||
return "active head";
|
||||
case ENGINE_ACTIVE_SUBUNITS:
|
||||
return "active subunits";
|
||||
case ENGINE_WAIT_KICK:
|
||||
return "wait kick";
|
||||
case ENGINE_DEAD:
|
||||
return "dead";
|
||||
}
|
||||
|
||||
return "unknown";
|
||||
}
|
||||
|
||||
static inline unsigned int
|
||||
execlists_num_ports(const struct intel_engine_execlists * const execlists)
|
||||
@ -206,126 +175,13 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
|
||||
#define I915_HWS_CSB_WRITE_INDEX 0x1f
|
||||
#define CNL_HWS_CSB_WRITE_INDEX 0x2f
|
||||
|
||||
struct intel_ring *
|
||||
intel_engine_create_ring(struct intel_engine_cs *engine, int size);
|
||||
int intel_ring_pin(struct intel_ring *ring);
|
||||
void intel_ring_reset(struct intel_ring *ring, u32 tail);
|
||||
unsigned int intel_ring_update_space(struct intel_ring *ring);
|
||||
void intel_ring_unpin(struct intel_ring *ring);
|
||||
void intel_ring_free(struct kref *ref);
|
||||
|
||||
static inline struct intel_ring *intel_ring_get(struct intel_ring *ring)
|
||||
{
|
||||
kref_get(&ring->ref);
|
||||
return ring;
|
||||
}
|
||||
|
||||
static inline void intel_ring_put(struct intel_ring *ring)
|
||||
{
|
||||
kref_put(&ring->ref, intel_ring_free);
|
||||
}
|
||||
|
||||
void intel_engine_stop(struct intel_engine_cs *engine);
|
||||
void intel_engine_cleanup(struct intel_engine_cs *engine);
|
||||
|
||||
int __must_check intel_ring_cacheline_align(struct i915_request *rq);
|
||||
|
||||
u32 __must_check *intel_ring_begin(struct i915_request *rq, unsigned int n);
|
||||
|
||||
static inline void intel_ring_advance(struct i915_request *rq, u32 *cs)
|
||||
{
|
||||
/* Dummy function.
|
||||
*
|
||||
* This serves as a placeholder in the code so that the reader
|
||||
* can compare against the preceding intel_ring_begin() and
|
||||
* check that the number of dwords emitted matches the space
|
||||
* reserved for the command packet (i.e. the value passed to
|
||||
* intel_ring_begin()).
|
||||
*/
|
||||
GEM_BUG_ON((rq->ring->vaddr + rq->ring->emit) != cs);
|
||||
}
|
||||
|
||||
static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
|
||||
{
|
||||
return pos & (ring->size - 1);
|
||||
}
|
||||
|
||||
static inline bool
|
||||
intel_ring_offset_valid(const struct intel_ring *ring,
|
||||
unsigned int pos)
|
||||
{
|
||||
if (pos & -ring->size) /* must be strictly within the ring */
|
||||
return false;
|
||||
|
||||
if (!IS_ALIGNED(pos, 8)) /* must be qword aligned */
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline u32 intel_ring_offset(const struct i915_request *rq, void *addr)
|
||||
{
|
||||
/* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
|
||||
u32 offset = addr - rq->ring->vaddr;
|
||||
GEM_BUG_ON(offset > rq->ring->size);
|
||||
return intel_ring_wrap(rq->ring, offset);
|
||||
}
|
||||
|
||||
static inline void
|
||||
assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
|
||||
{
|
||||
GEM_BUG_ON(!intel_ring_offset_valid(ring, tail));
|
||||
|
||||
/*
|
||||
* "Ring Buffer Use"
|
||||
* Gen2 BSpec "1. Programming Environment" / 1.4.4.6
|
||||
* Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
|
||||
* Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
|
||||
* "If the Ring Buffer Head Pointer and the Tail Pointer are on the
|
||||
* same cacheline, the Head Pointer must not be greater than the Tail
|
||||
* Pointer."
|
||||
*
|
||||
* We use ring->head as the last known location of the actual RING_HEAD,
|
||||
* it may have advanced but in the worst case it is equally the same
|
||||
* as ring->head and so we should never program RING_TAIL to advance
|
||||
* into the same cacheline as ring->head.
|
||||
*/
|
||||
#define cacheline(a) round_down(a, CACHELINE_BYTES)
|
||||
GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
|
||||
tail < ring->head);
|
||||
#undef cacheline
|
||||
}
|
||||
|
||||
static inline unsigned int
|
||||
intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
|
||||
{
|
||||
/* Whilst writes to the tail are strictly order, there is no
|
||||
* serialisation between readers and the writers. The tail may be
|
||||
* read by i915_request_retire() just as it is being updated
|
||||
* by execlists, as although the breadcrumb is complete, the context
|
||||
* switch hasn't been seen.
|
||||
*/
|
||||
assert_ring_tail_valid(ring, tail);
|
||||
ring->tail = tail;
|
||||
return tail;
|
||||
}
|
||||
|
||||
static inline unsigned int
|
||||
__intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
|
||||
{
|
||||
/*
|
||||
* "If the Ring Buffer Head Pointer and the Tail Pointer are on the
|
||||
* same cacheline, the Head Pointer must not be greater than the Tail
|
||||
* Pointer."
|
||||
*/
|
||||
GEM_BUG_ON(!is_power_of_2(size));
|
||||
return (head - tail - CACHELINE_BYTES) & (size - 1);
|
||||
}
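__intel_ring_space() above computes free space with unsigned wraparound: subtracting an extra cacheline keeps the tail from ever landing on the same cacheline as the head, per the BSpec note quoted in the comment. A small host-side sketch of the arithmetic with a hypothetical 4 KiB ring, purely to show the numbers:

#include <stdio.h>

#define CACHELINE_BYTES 64u

/* Mirror of the helper: size must be a power of two. */
static unsigned int ring_space(unsigned int head, unsigned int tail, unsigned int size)
{
    return (head - tail - CACHELINE_BYTES) & (size - 1);
}

int main(void)
{
    /* Hypothetical 4 KiB ring. */
    printf("%u\n", ring_space(0, 0, 4096));      /* 4032: one cacheline is always held back */
    printf("%u\n", ring_space(256, 1024, 4096)); /* 3264: bytes the tail may advance before head's cacheline */
    return 0;
}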
|
||||
|
||||
int intel_engines_init_mmio(struct drm_i915_private *i915);
|
||||
int intel_engines_setup(struct drm_i915_private *i915);
|
||||
int intel_engines_init(struct drm_i915_private *i915);
|
||||
void intel_engines_cleanup(struct drm_i915_private *i915);
|
||||
int intel_engines_init_mmio(struct intel_gt *gt);
|
||||
int intel_engines_setup(struct intel_gt *gt);
|
||||
int intel_engines_init(struct intel_gt *gt);
|
||||
void intel_engines_cleanup(struct intel_gt *gt);
|
||||
|
||||
int intel_engine_init_common(struct intel_engine_cs *engine);
|
||||
void intel_engine_cleanup_common(struct intel_engine_cs *engine);
|
||||
@ -434,61 +290,6 @@ void intel_engine_dump(struct intel_engine_cs *engine,
|
||||
struct drm_printer *m,
|
||||
const char *header, ...);
|
||||
|
||||
static inline void intel_engine_context_in(struct intel_engine_cs *engine)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (READ_ONCE(engine->stats.enabled) == 0)
|
||||
return;
|
||||
|
||||
write_seqlock_irqsave(&engine->stats.lock, flags);
|
||||
|
||||
if (engine->stats.enabled > 0) {
|
||||
if (engine->stats.active++ == 0)
|
||||
engine->stats.start = ktime_get();
|
||||
GEM_BUG_ON(engine->stats.active == 0);
|
||||
}
|
||||
|
||||
write_sequnlock_irqrestore(&engine->stats.lock, flags);
|
||||
}
|
||||
|
||||
static inline void intel_engine_context_out(struct intel_engine_cs *engine)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (READ_ONCE(engine->stats.enabled) == 0)
|
||||
return;
|
||||
|
||||
write_seqlock_irqsave(&engine->stats.lock, flags);
|
||||
|
||||
if (engine->stats.enabled > 0) {
|
||||
ktime_t last;
|
||||
|
||||
if (engine->stats.active && --engine->stats.active == 0) {
|
||||
/*
|
||||
* Decrement the active context count and in case GPU
|
||||
* is now idle add up to the running total.
|
||||
*/
|
||||
last = ktime_sub(ktime_get(), engine->stats.start);
|
||||
|
||||
engine->stats.total = ktime_add(engine->stats.total,
|
||||
last);
|
||||
} else if (engine->stats.active == 0) {
|
||||
/*
|
||||
* After turning on engine stats, context out might be
|
||||
* the first event in which case we account from the
|
||||
* time stats gathering was turned on.
|
||||
*/
|
||||
last = ktime_sub(ktime_get(), engine->stats.enabled_at);
|
||||
|
||||
engine->stats.total = ktime_add(engine->stats.total,
|
||||
last);
|
||||
}
|
||||
}
|
||||
|
||||
write_sequnlock_irqrestore(&engine->stats.lock, flags);
|
||||
}
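intel_engine_context_in/out above keep a nesting count and a start timestamp so busy time is only accumulated when the last active context leaves the engine. A simplified sketch of the same enter/exit accounting; now_ns() is an assumed monotonic-nanosecond source, and the real code additionally takes a seqlock and handles the enabled_at corner case:

#include <stdint.h>

struct engine_stats {
    unsigned int active;    /* contexts currently on the engine */
    uint64_t start_ns;      /* when the engine last became busy */
    uint64_t total_ns;      /* accumulated busy time */
};

/* Assumed helper: returns a monotonic timestamp in nanoseconds. */
extern uint64_t now_ns(void);

static void context_in(struct engine_stats *s)
{
    if (s->active++ == 0)    /* idle -> busy transition */
        s->start_ns = now_ns();
}

static void context_out(struct engine_stats *s)
{
    if (s->active && --s->active == 0)    /* busy -> idle transition */
        s->total_ns += now_ns() - s->start_ns;
}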
|
||||
|
||||
int intel_enable_engine_stats(struct intel_engine_cs *engine);
|
||||
void intel_disable_engine_stats(struct intel_engine_cs *engine);
|
||||
|
||||
@ -525,4 +326,22 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
|
||||
#define ENGINE_MOCK 1
|
||||
#define ENGINE_VIRTUAL 2
|
||||
|
||||
static inline bool
|
||||
intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
|
||||
{
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT))
|
||||
return false;
|
||||
|
||||
return intel_engine_has_preemption(engine);
|
||||
}
|
||||
|
||||
static inline bool
|
||||
intel_engine_has_timeslices(const struct intel_engine_cs *engine)
|
||||
{
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
|
||||
return false;
|
||||
|
||||
return intel_engine_has_semaphores(engine);
|
||||
}
|
||||
|
||||
#endif /* _INTEL_RINGBUFFER_H_ */
|
||||
|
@ -37,6 +37,7 @@
|
||||
#include "intel_context.h"
|
||||
#include "intel_lrc.h"
|
||||
#include "intel_reset.h"
|
||||
#include "intel_ring.h"
|
||||
|
||||
/* Haswell does have the CXT_SIZE register however it does not appear to be
|
||||
* valid. Now, docs explain in dwords what is in the context object. The full
|
||||
@ -308,6 +309,15 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
|
||||
engine->instance = info->instance;
|
||||
__sprint_engine_name(engine);
|
||||
|
||||
engine->props.heartbeat_interval_ms =
|
||||
CONFIG_DRM_I915_HEARTBEAT_INTERVAL;
|
||||
engine->props.preempt_timeout_ms =
|
||||
CONFIG_DRM_I915_PREEMPT_TIMEOUT;
|
||||
engine->props.stop_timeout_ms =
|
||||
CONFIG_DRM_I915_STOP_TIMEOUT;
|
||||
engine->props.timeslice_duration_ms =
|
||||
CONFIG_DRM_I915_TIMESLICE_DURATION;
|
||||
|
||||
/*
|
||||
* To be overridden by the backend on setup. However to facilitate
|
||||
* cleanup on error during setup, we always provide the destroy vfunc.
|
||||
@ -370,38 +380,40 @@ static void __setup_engine_capabilities(struct intel_engine_cs *engine)
|
||||
}
|
||||
}
|
||||
|
||||
static void intel_setup_engine_capabilities(struct drm_i915_private *i915)
|
||||
static void intel_setup_engine_capabilities(struct intel_gt *gt)
|
||||
{
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
|
||||
for_each_engine(engine, i915, id)
|
||||
for_each_engine(engine, gt, id)
|
||||
__setup_engine_capabilities(engine);
|
||||
}
|
||||
|
||||
/**
|
||||
* intel_engines_cleanup() - free the resources allocated for Command Streamers
|
||||
* @i915: the i915 devic
|
||||
* @gt: pointer to struct intel_gt
|
||||
*/
|
||||
void intel_engines_cleanup(struct drm_i915_private *i915)
|
||||
void intel_engines_cleanup(struct intel_gt *gt)
|
||||
{
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_engine(engine, gt, id) {
|
||||
engine->destroy(engine);
|
||||
i915->engine[id] = NULL;
|
||||
gt->engine[id] = NULL;
|
||||
gt->i915->engine[id] = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* intel_engines_init_mmio() - allocate and prepare the Engine Command Streamers
|
||||
* @i915: the i915 device
|
||||
* @gt: pointer to struct intel_gt
|
||||
*
|
||||
* Return: non-zero if the initialization failed.
|
||||
*/
|
||||
int intel_engines_init_mmio(struct drm_i915_private *i915)
|
||||
int intel_engines_init_mmio(struct intel_gt *gt)
|
||||
{
|
||||
struct drm_i915_private *i915 = gt->i915;
|
||||
struct intel_device_info *device_info = mkwrite_device_info(i915);
|
||||
const unsigned int engine_mask = INTEL_INFO(i915)->engine_mask;
|
||||
unsigned int mask = 0;
|
||||
@ -419,7 +431,7 @@ int intel_engines_init_mmio(struct drm_i915_private *i915)
|
||||
if (!HAS_ENGINE(i915, i))
|
||||
continue;
|
||||
|
||||
err = intel_engine_setup(&i915->gt, i);
|
||||
err = intel_engine_setup(gt, i);
|
||||
if (err)
|
||||
goto cleanup;
|
||||
|
||||
@ -436,36 +448,36 @@ int intel_engines_init_mmio(struct drm_i915_private *i915)
|
||||
|
||||
RUNTIME_INFO(i915)->num_engines = hweight32(mask);
|
||||
|
||||
intel_gt_check_and_clear_faults(&i915->gt);
|
||||
intel_gt_check_and_clear_faults(gt);
|
||||
|
||||
intel_setup_engine_capabilities(i915);
|
||||
intel_setup_engine_capabilities(gt);
|
||||
|
||||
return 0;
|
||||
|
||||
cleanup:
|
||||
intel_engines_cleanup(i915);
|
||||
intel_engines_cleanup(gt);
|
||||
return err;
|
||||
}
|
||||
|
||||
/**
|
||||
* intel_engines_init() - init the Engine Command Streamers
|
||||
* @i915: i915 device private
|
||||
* @gt: pointer to struct intel_gt
|
||||
*
|
||||
* Return: non-zero if the initialization failed.
|
||||
*/
|
||||
int intel_engines_init(struct drm_i915_private *i915)
|
||||
int intel_engines_init(struct intel_gt *gt)
|
||||
{
|
||||
int (*init)(struct intel_engine_cs *engine);
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err;
|
||||
|
||||
if (HAS_EXECLISTS(i915))
|
||||
if (HAS_EXECLISTS(gt->i915))
|
||||
init = intel_execlists_submission_init;
|
||||
else
|
||||
init = intel_ring_submission_init;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_engine(engine, gt, id) {
|
||||
err = init(engine);
|
||||
if (err)
|
||||
goto cleanup;
|
||||
@ -474,7 +486,7 @@ int intel_engines_init(struct drm_i915_private *i915)
|
||||
return 0;
|
||||
|
||||
cleanup:
|
||||
intel_engines_cleanup(i915);
|
||||
intel_engines_cleanup(gt);
|
||||
return err;
|
||||
}
|
||||
|
||||
@ -518,7 +530,7 @@ static int pin_ggtt_status_page(struct intel_engine_cs *engine,
|
||||
unsigned int flags;
|
||||
|
||||
flags = PIN_GLOBAL;
|
||||
if (!HAS_LLC(engine->i915))
|
||||
if (!HAS_LLC(engine->i915) && i915_ggtt_has_aperture(engine->gt->ggtt))
|
||||
/*
|
||||
* On g33, we cannot place HWS above 256MiB, so
|
||||
* restrict its pinning to the low mappable arena.
|
||||
@ -602,7 +614,6 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
|
||||
intel_engine_init_active(engine, ENGINE_PHYSICAL);
|
||||
intel_engine_init_breadcrumbs(engine);
|
||||
intel_engine_init_execlists(engine);
|
||||
intel_engine_init_hangcheck(engine);
|
||||
intel_engine_init_cmd_parser(engine);
|
||||
intel_engine_init__pm(engine);
|
||||
|
||||
@ -621,26 +632,26 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
|
||||
|
||||
/**
|
||||
* intel_engines_setup- setup engine state not requiring hw access
|
||||
* @i915: Device to setup.
|
||||
* @gt: pointer to struct intel_gt
|
||||
*
|
||||
* Initializes engine structure members shared between legacy and execlists
|
||||
* submission modes which do not require hardware access.
|
||||
*
|
||||
* Typically done early in the submission mode specific engine setup stage.
|
||||
*/
|
||||
int intel_engines_setup(struct drm_i915_private *i915)
|
||||
int intel_engines_setup(struct intel_gt *gt)
|
||||
{
|
||||
int (*setup)(struct intel_engine_cs *engine);
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err;
|
||||
|
||||
if (HAS_EXECLISTS(i915))
|
||||
if (HAS_EXECLISTS(gt->i915))
|
||||
setup = intel_execlists_submission_setup;
|
||||
else
|
||||
setup = intel_ring_submission_setup;
|
||||
|
||||
for_each_engine(engine, i915, id) {
|
||||
for_each_engine(engine, gt, id) {
|
||||
err = intel_engine_setup_common(engine);
|
||||
if (err)
|
||||
goto cleanup;
|
||||
@ -658,7 +669,7 @@ int intel_engines_setup(struct drm_i915_private *i915)
|
||||
return 0;
|
||||
|
||||
cleanup:
|
||||
intel_engines_cleanup(i915);
|
||||
intel_engines_cleanup(gt);
|
||||
return err;
|
||||
}
|
||||
|
||||
@ -873,6 +884,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
|
||||
return bbaddr;
|
||||
}
|
||||
|
||||
static unsigned long stop_timeout(const struct intel_engine_cs *engine)
|
||||
{
|
||||
if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* If we are doing a normal GPU reset, we can take our time and allow
|
||||
* the engine to quiesce. We've stopped submission to the engine, and
|
||||
* if we wait long enough an innocent context should complete and
|
||||
* leave the engine idle. So they should not be caught unaware by
|
||||
* the forthcoming GPU reset (which usually follows the stop_cs)!
|
||||
*/
|
||||
return READ_ONCE(engine->props.stop_timeout_ms);
|
||||
}
|
||||
|
||||
int intel_engine_stop_cs(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct intel_uncore *uncore = engine->uncore;
|
||||
@ -890,7 +916,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
|
||||
err = 0;
|
||||
if (__intel_wait_for_register_fw(uncore,
|
||||
mode, MODE_IDLE, MODE_IDLE,
|
||||
1000, 0,
|
||||
1000, stop_timeout(engine),
|
||||
NULL)) {
|
||||
GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
|
||||
err = -ETIMEDOUT;
|
||||
@ -1318,10 +1344,11 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
|
||||
unsigned int idx;
|
||||
u8 read, write;
|
||||
|
||||
drm_printf(m, "\tExeclist tasklet queued? %s (%s), timeslice? %s\n",
|
||||
drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
|
||||
yesno(test_bit(TASKLET_STATE_SCHED,
|
||||
&engine->execlists.tasklet.state)),
|
||||
enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
|
||||
repr_timer(&engine->execlists.preempt),
|
||||
repr_timer(&engine->execlists.timer));
|
||||
|
||||
read = execlists->csb_head;
|
||||
@ -1447,8 +1474,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
|
||||
drm_printf(m, "*** WEDGED ***\n");
|
||||
|
||||
drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
|
||||
drm_printf(m, "\tHangcheck: %d ms ago\n",
|
||||
jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
|
||||
|
||||
rcu_read_lock();
|
||||
rq = READ_ONCE(engine->heartbeat.systole);
|
||||
if (rq)
|
||||
drm_printf(m, "\tHeartbeat: %d ms ago\n",
|
||||
jiffies_to_msecs(jiffies - rq->emitted_jiffies));
|
||||
rcu_read_unlock();
|
||||
drm_printf(m, "\tReset count: %d (global %d)\n",
|
||||
i915_reset_engine_count(error, engine),
|
||||
i915_reset_count(error));
|
||||
|
drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c (new file, 234 lines)
@ -0,0 +1,234 @@
|
||||
/*
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#include "i915_request.h"
|
||||
|
||||
#include "intel_context.h"
|
||||
#include "intel_engine_heartbeat.h"
|
||||
#include "intel_engine_pm.h"
|
||||
#include "intel_engine.h"
|
||||
#include "intel_gt.h"
|
||||
#include "intel_reset.h"
|
||||
|
||||
/*
|
||||
* While the engine is active, we send a periodic pulse along the engine
|
||||
* to check on its health and to flush any idle-barriers. If that request
|
||||
* is stuck, and we fail to preempt it, we declare the engine hung and
|
||||
* issue a reset -- in the hope that restores progress.
|
||||
*/
|
||||
|
||||
static bool next_heartbeat(struct intel_engine_cs *engine)
|
||||
{
|
||||
long delay;
|
||||
|
||||
delay = READ_ONCE(engine->props.heartbeat_interval_ms);
|
||||
if (!delay)
|
||||
return false;
|
||||
|
||||
delay = msecs_to_jiffies_timeout(delay);
|
||||
if (delay >= HZ)
|
||||
delay = round_jiffies_up_relative(delay);
|
||||
schedule_delayed_work(&engine->heartbeat.work, delay);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static void idle_pulse(struct intel_engine_cs *engine, struct i915_request *rq)
|
||||
{
|
||||
engine->wakeref_serial = READ_ONCE(engine->serial) + 1;
|
||||
i915_request_add_active_barriers(rq);
|
||||
}
|
||||
|
||||
static void show_heartbeat(const struct i915_request *rq,
|
||||
struct intel_engine_cs *engine)
|
||||
{
|
||||
struct drm_printer p = drm_debug_printer("heartbeat");
|
||||
|
||||
intel_engine_dump(engine, &p,
|
||||
"%s heartbeat {prio:%d} not ticking\n",
|
||||
engine->name,
|
||||
rq->sched.attr.priority);
|
||||
}
|
||||
|
||||
static void heartbeat(struct work_struct *wrk)
|
||||
{
|
||||
struct i915_sched_attr attr = {
|
||||
.priority = I915_USER_PRIORITY(I915_PRIORITY_MIN),
|
||||
};
|
||||
struct intel_engine_cs *engine =
|
||||
container_of(wrk, typeof(*engine), heartbeat.work.work);
|
||||
struct intel_context *ce = engine->kernel_context;
|
||||
struct i915_request *rq;
|
||||
|
||||
if (!intel_engine_pm_get_if_awake(engine))
|
||||
return;
|
||||
|
||||
rq = engine->heartbeat.systole;
|
||||
if (rq && i915_request_completed(rq)) {
|
||||
i915_request_put(rq);
|
||||
engine->heartbeat.systole = NULL;
|
||||
}
|
||||
|
||||
if (intel_gt_is_wedged(engine->gt))
|
||||
goto out;
|
||||
|
||||
if (engine->heartbeat.systole) {
|
||||
if (engine->schedule &&
|
||||
rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
|
||||
/*
|
||||
* Gradually raise the priority of the heartbeat to
|
||||
* give high priority work [which presumably desires
|
||||
* low latency and no jitter] the chance to naturally
|
||||
* complete before being preempted.
|
||||
*/
|
||||
attr.priority = I915_PRIORITY_MASK;
|
||||
if (rq->sched.attr.priority >= attr.priority)
|
||||
attr.priority |= I915_USER_PRIORITY(I915_PRIORITY_HEARTBEAT);
|
||||
if (rq->sched.attr.priority >= attr.priority)
|
||||
attr.priority = I915_PRIORITY_BARRIER;
|
||||
|
||||
local_bh_disable();
|
||||
engine->schedule(rq, &attr);
|
||||
local_bh_enable();
|
||||
} else {
|
||||
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
||||
show_heartbeat(rq, engine);
|
||||
|
||||
intel_gt_handle_error(engine->gt, engine->mask,
|
||||
I915_ERROR_CAPTURE,
|
||||
"stopped heartbeat on %s",
|
||||
engine->name);
|
||||
}
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (engine->wakeref_serial == engine->serial)
|
||||
goto out;
|
||||
|
||||
mutex_lock(&ce->timeline->mutex);
|
||||
|
||||
intel_context_enter(ce);
|
||||
rq = __i915_request_create(ce, GFP_NOWAIT | __GFP_NOWARN);
|
||||
intel_context_exit(ce);
|
||||
if (IS_ERR(rq))
|
||||
goto unlock;
|
||||
|
||||
idle_pulse(engine, rq);
|
||||
if (i915_modparams.enable_hangcheck)
|
||||
engine->heartbeat.systole = i915_request_get(rq);
|
||||
|
||||
__i915_request_commit(rq);
|
||||
__i915_request_queue(rq, &attr);
|
||||
|
||||
unlock:
|
||||
mutex_unlock(&ce->timeline->mutex);
|
||||
out:
|
||||
if (!next_heartbeat(engine))
|
||||
i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
|
||||
intel_engine_pm_put(engine);
|
||||
}
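heartbeat() above only escalates while a previous systole is still outstanding: each interval it bumps the stalled request one step, ending at the barrier priority, and a further missed beat triggers intel_gt_handle_error(). A compact sketch of that ladder with hypothetical priority values standing in for the i915 constants:

#include <stdio.h>

/* Hypothetical stand-ins for the i915 priority constants. */
enum { PRIO_MIN = -1023, PRIO_MASK = 0, PRIO_HEARTBEAT = 1, PRIO_BARRIER = 1024 };

/* One escalation step: returns the priority to use for this beat. */
static int escalate(int current_prio)
{
    int prio = PRIO_MASK;

    if (current_prio >= prio)
        prio = PRIO_HEARTBEAT;
    if (current_prio >= prio)
        prio = PRIO_BARRIER;

    return prio;
}

int main(void)
{
    int prio = PRIO_MIN;    /* the first beat is submitted at minimum priority */
    int beat;

    for (beat = 1; prio < PRIO_BARRIER; beat++) {
        prio = escalate(prio);
        printf("beat %d -> priority %d\n", beat, prio);
    }
    /* A beat that still does not complete at PRIO_BARRIER would mean engine reset. */
    return 0;
}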
|
||||
|
||||
void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine)
|
||||
{
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
|
||||
return;
|
||||
|
||||
next_heartbeat(engine);
|
||||
}
|
||||
|
||||
void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
|
||||
{
|
||||
cancel_delayed_work(&engine->heartbeat.work);
|
||||
i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
|
||||
}
|
||||
|
||||
void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
|
||||
{
|
||||
INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
|
||||
}
|
||||
|
||||
int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
|
||||
unsigned long delay)
|
||||
{
|
||||
int err;
|
||||
|
||||
/* Send one last pulse before to cleanup persistent hogs */
|
||||
if (!delay && IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT)) {
|
||||
err = intel_engine_pulse(engine);
|
||||
if (err)
|
||||
return err;
|
||||
}
|
||||
|
||||
WRITE_ONCE(engine->props.heartbeat_interval_ms, delay);
|
||||
|
||||
if (intel_engine_pm_get_if_awake(engine)) {
|
||||
if (delay)
|
||||
intel_engine_unpark_heartbeat(engine);
|
||||
else
|
||||
intel_engine_park_heartbeat(engine);
|
||||
intel_engine_pm_put(engine);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int intel_engine_pulse(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
|
||||
struct intel_context *ce = engine->kernel_context;
|
||||
struct i915_request *rq;
|
||||
int err = 0;
|
||||
|
||||
if (!intel_engine_has_preemption(engine))
|
||||
return -ENODEV;
|
||||
|
||||
if (!intel_engine_pm_get_if_awake(engine))
|
||||
return 0;
|
||||
|
||||
if (mutex_lock_interruptible(&ce->timeline->mutex))
|
||||
goto out_rpm;
|
||||
|
||||
intel_context_enter(ce);
|
||||
rq = __i915_request_create(ce, GFP_NOWAIT | __GFP_NOWARN);
|
||||
intel_context_exit(ce);
|
||||
if (IS_ERR(rq)) {
|
||||
err = PTR_ERR(rq);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
rq->flags |= I915_REQUEST_SENTINEL;
|
||||
idle_pulse(engine, rq);
|
||||
|
||||
__i915_request_commit(rq);
|
||||
__i915_request_queue(rq, &attr);
|
||||
|
||||
out_unlock:
|
||||
mutex_unlock(&ce->timeline->mutex);
|
||||
out_rpm:
|
||||
intel_engine_pm_put(engine);
|
||||
return err;
|
||||
}
|
||||
|
||||
int intel_engine_flush_barriers(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct i915_request *rq;
|
||||
|
||||
if (llist_empty(&engine->barrier_tasks))
|
||||
return 0;
|
||||
|
||||
rq = i915_request_create(engine->kernel_context);
|
||||
if (IS_ERR(rq))
|
||||
return PTR_ERR(rq);
|
||||
|
||||
idle_pulse(engine, rq);
|
||||
i915_request_add(rq);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
||||
#include "selftest_engine_heartbeat.c"
|
||||
#endif
|
drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h (new file, 23 lines)
@@ -0,0 +1,23 @@
/*
 * SPDX-License-Identifier: MIT
 *
 * Copyright © 2019 Intel Corporation
 */

#ifndef INTEL_ENGINE_HEARTBEAT_H
#define INTEL_ENGINE_HEARTBEAT_H

struct intel_engine_cs;

void intel_engine_init_heartbeat(struct intel_engine_cs *engine);

int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
                               unsigned long delay);

void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);

int intel_engine_pulse(struct intel_engine_cs *engine);
int intel_engine_flush_barriers(struct intel_engine_cs *engine);

#endif /* INTEL_ENGINE_HEARTBEAT_H */
@ -7,11 +7,13 @@
|
||||
#include "i915_drv.h"
|
||||
|
||||
#include "intel_engine.h"
|
||||
#include "intel_engine_heartbeat.h"
|
||||
#include "intel_engine_pm.h"
|
||||
#include "intel_engine_pool.h"
|
||||
#include "intel_gt.h"
|
||||
#include "intel_gt_pm.h"
|
||||
#include "intel_rc6.h"
|
||||
#include "intel_ring.h"
|
||||
|
||||
static int __engine_unpark(struct intel_wakeref *wf)
|
||||
{
|
||||
@ -34,7 +36,7 @@ static int __engine_unpark(struct intel_wakeref *wf)
|
||||
if (engine->unpark)
|
||||
engine->unpark(engine);
|
||||
|
||||
intel_engine_init_hangcheck(engine);
|
||||
intel_engine_unpark_heartbeat(engine);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -111,7 +113,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
|
||||
i915_request_add_active_barriers(rq);
|
||||
|
||||
/* Install ourselves as a preemption barrier */
|
||||
rq->sched.attr.priority = I915_PRIORITY_UNPREEMPTABLE;
|
||||
rq->sched.attr.priority = I915_PRIORITY_BARRIER;
|
||||
__i915_request_commit(rq);
|
||||
|
||||
/* Release our exclusive hold on the engine */
|
||||
@ -158,6 +160,7 @@ static int __engine_park(struct intel_wakeref *wf)
|
||||
|
||||
call_idle_barriers(engine); /* cleanup after wedging */
|
||||
|
||||
intel_engine_park_heartbeat(engine);
|
||||
intel_engine_disarm_breadcrumbs(engine);
|
||||
intel_engine_pool_park(&engine->pool);
|
||||
|
||||
@ -188,6 +191,7 @@ void intel_engine_init__pm(struct intel_engine_cs *engine)
|
||||
struct intel_runtime_pm *rpm = engine->uncore->rpm;
|
||||
|
||||
intel_wakeref_init(&engine->wakeref, rpm, &wf_ops);
|
||||
intel_engine_init_heartbeat(engine);
|
||||
}
|
||||
|
||||
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
||||
|
@ -15,6 +15,7 @@
|
||||
#include <linux/rbtree.h>
|
||||
#include <linux/timer.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/workqueue.h>
|
||||
|
||||
#include "i915_gem.h"
|
||||
#include "i915_pmu.h"
|
||||
@ -58,6 +59,7 @@ struct i915_gem_context;
|
||||
struct i915_request;
|
||||
struct i915_sched_attr;
|
||||
struct intel_gt;
|
||||
struct intel_ring;
|
||||
struct intel_uncore;
|
||||
|
||||
typedef u8 intel_engine_mask_t;
|
||||
@ -76,40 +78,6 @@ struct intel_instdone {
|
||||
u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
|
||||
};
|
||||
|
||||
struct intel_engine_hangcheck {
|
||||
u64 acthd;
|
||||
u32 last_ring;
|
||||
u32 last_head;
|
||||
unsigned long action_timestamp;
|
||||
struct intel_instdone instdone;
|
||||
};
|
||||
|
||||
struct intel_ring {
|
||||
struct kref ref;
|
||||
struct i915_vma *vma;
|
||||
void *vaddr;
|
||||
|
||||
/*
|
||||
* As we have two types of rings, one global to the engine used
|
||||
* by ringbuffer submission and those that are exclusive to a
|
||||
* context used by execlists, we have to play safe and allow
|
||||
* atomic updates to the pin_count. However, the actual pinning
|
||||
* of the context is either done during initialisation for
|
||||
* ringbuffer submission or serialised as part of the context
|
||||
* pinning for execlists, and so we do not need a mutex ourselves
|
||||
* to serialise intel_ring_pin/intel_ring_unpin.
|
||||
*/
|
||||
atomic_t pin_count;
|
||||
|
||||
u32 head;
|
||||
u32 tail;
|
||||
u32 emit;
|
||||
|
||||
u32 space;
|
||||
u32 size;
|
||||
u32 effective_size;
|
||||
};
|
||||
|
||||
/*
|
||||
* we use a single page to load ctx workarounds so all of these
|
||||
* values are referred in terms of dwords
|
||||
@ -174,6 +142,11 @@ struct intel_engine_execlists {
|
||||
*/
|
||||
struct timer_list timer;
|
||||
|
||||
/**
|
||||
* @preempt: reset the current context if it fails to give way
|
||||
*/
|
||||
struct timer_list preempt;
|
||||
|
||||
/**
|
||||
* @default_priolist: priority list for I915_PRIORITY_NORMAL
|
||||
*/
|
||||
@ -326,6 +299,11 @@ struct intel_engine_cs {
|
||||
|
||||
intel_engine_mask_t saturated; /* submitting semaphores too late? */
|
||||
|
||||
struct {
|
||||
struct delayed_work work;
|
||||
struct i915_request *systole;
|
||||
} heartbeat;
|
||||
|
||||
unsigned long serial;
|
||||
|
||||
unsigned long wakeref_serial;
|
||||
@ -476,8 +454,6 @@ struct intel_engine_cs {
|
||||
/* status_notifier: list of callbacks for context-switch changes */
|
||||
struct atomic_notifier_head context_status_notifier;
|
||||
|
||||
struct intel_engine_hangcheck hangcheck;
|
||||
|
||||
#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
|
||||
#define I915_ENGINE_SUPPORTS_STATS BIT(1)
|
||||
#define I915_ENGINE_HAS_PREEMPTION BIT(2)
|
||||
@ -542,6 +518,13 @@ struct intel_engine_cs {
|
||||
*/
|
||||
ktime_t total;
|
||||
} stats;
|
||||
|
||||
struct {
|
||||
unsigned long heartbeat_interval_ms;
|
||||
unsigned long preempt_timeout_ms;
|
||||
unsigned long stop_timeout_ms;
|
||||
unsigned long timeslice_duration_ms;
|
||||
} props;
|
||||
};
|
||||
|
||||
static inline bool
|
||||
|
@ -9,6 +9,7 @@
|
||||
#include "intel_gt_requests.h"
|
||||
#include "intel_mocs.h"
|
||||
#include "intel_rc6.h"
|
||||
#include "intel_rps.h"
|
||||
#include "intel_uncore.h"
|
||||
#include "intel_pm.h"
|
||||
|
||||
@ -22,19 +23,17 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
|
||||
INIT_LIST_HEAD(>->closed_vma);
|
||||
spin_lock_init(>->closed_lock);
|
||||
|
||||
intel_gt_init_hangcheck(gt);
|
||||
intel_gt_init_reset(gt);
|
||||
intel_gt_init_requests(gt);
|
||||
intel_gt_pm_init_early(gt);
|
||||
|
||||
intel_rps_init_early(>->rps);
|
||||
intel_uc_init_early(>->uc);
|
||||
}
|
||||
|
||||
void intel_gt_init_hw_early(struct drm_i915_private *i915)
|
||||
{
|
||||
i915->gt.ggtt = &i915->ggtt;
|
||||
|
||||
/* BIOS often leaves RC6 enabled, but disable it for hw init */
|
||||
intel_gt_pm_disable(&i915->gt);
|
||||
}
|
||||
|
||||
static void init_unused_ring(struct intel_gt *gt, u32 base)
|
||||
@ -321,8 +320,7 @@ void intel_gt_chipset_flush(struct intel_gt *gt)
|
||||
|
||||
void intel_gt_driver_register(struct intel_gt *gt)
|
||||
{
|
||||
if (IS_GEN(gt->i915, 5))
|
||||
intel_gpu_ips_init(gt->i915);
|
||||
intel_rps_driver_register(>->rps);
|
||||
}
|
||||
|
||||
static int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size)
|
||||
@ -380,20 +378,16 @@ int intel_gt_init(struct intel_gt *gt)
|
||||
void intel_gt_driver_remove(struct intel_gt *gt)
|
||||
{
|
||||
GEM_BUG_ON(gt->awake);
|
||||
intel_gt_pm_disable(gt);
|
||||
}
|
||||
|
||||
void intel_gt_driver_unregister(struct intel_gt *gt)
|
||||
{
|
||||
intel_gpu_ips_teardown();
intel_rps_driver_unregister(&gt->rps);
}
|
||||
|
||||
void intel_gt_driver_release(struct intel_gt *gt)
|
||||
{
|
||||
/* Paranoia: make sure we have disabled everything before we exit. */
|
||||
intel_gt_pm_disable(gt);
|
||||
intel_gt_pm_fini(gt);
|
||||
|
||||
intel_gt_fini_scratch(gt);
|
||||
}
|
||||
|
||||
|
@ -46,8 +46,6 @@ void intel_gt_clear_error_registers(struct intel_gt *gt,
|
||||
void intel_gt_flush_ggtt_writes(struct intel_gt *gt);
|
||||
void intel_gt_chipset_flush(struct intel_gt *gt);
|
||||
|
||||
void intel_gt_init_hangcheck(struct intel_gt *gt);
|
||||
|
||||
static inline u32 intel_gt_scratch_offset(const struct intel_gt *gt,
|
||||
enum intel_gt_scratch_field field)
|
||||
{
|
||||
@ -59,6 +57,4 @@ static inline bool intel_gt_is_wedged(struct intel_gt *gt)
|
||||
return __intel_reset_failed(&gt->reset);
|
||||
}
|
||||
|
||||
void intel_gt_queue_hangcheck(struct intel_gt *gt);
|
||||
|
||||
#endif /* __INTEL_GT_H__ */
|
||||
|
@ -11,6 +11,7 @@
|
||||
#include "intel_gt.h"
|
||||
#include "intel_gt_irq.h"
|
||||
#include "intel_uncore.h"
|
||||
#include "intel_rps.h"
|
||||
|
||||
static void guc_irq_handler(struct intel_guc *guc, u16 iir)
|
||||
{
|
||||
@@ -77,7 +78,7 @@ gen11_other_irq_handler(struct intel_gt *gt, const u8 instance,
return guc_irq_handler(&gt->uc.guc, iir);

if (instance == OTHER_GTPM_INSTANCE)
return gen11_rps_irq_handler(gt, iir);
return gen11_rps_irq_handler(&gt->rps, iir);

WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n",
instance, iir);
|
||||
@@ -336,7 +337,7 @@ void gen8_gt_irq_handler(struct intel_gt *gt, u32 master_ctl, u32 gt_iir[4])
}

if (master_ctl & (GEN8_GT_PM_IRQ | GEN8_GT_GUC_IRQ)) {
gen6_rps_irq_handler(gt->i915, gt_iir[2]);
gen6_rps_irq_handler(&gt->rps, gt_iir[2]);
guc_irq_handler(&gt->uc.guc, gt_iir[2] >> 16);
}
}
|
||||
|
@ -12,15 +12,12 @@
|
||||
#include "intel_gt.h"
|
||||
#include "intel_gt_pm.h"
|
||||
#include "intel_gt_requests.h"
|
||||
#include "intel_llc.h"
|
||||
#include "intel_pm.h"
|
||||
#include "intel_rc6.h"
|
||||
#include "intel_rps.h"
|
||||
#include "intel_wakeref.h"
|
||||
|
||||
static void pm_notify(struct intel_gt *gt, int state)
|
||||
{
|
||||
blocking_notifier_call_chain(&gt->pm_notifications, state, gt->i915);
|
||||
}
|
||||
|
||||
static int __gt_unpark(struct intel_wakeref *wf)
|
||||
{
|
||||
struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
|
||||
@ -44,19 +41,11 @@ static int __gt_unpark(struct intel_wakeref *wf)
|
||||
gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);
|
||||
GEM_BUG_ON(!gt->awake);
|
||||
|
||||
intel_enable_gt_powersave(i915);
|
||||
|
||||
i915_update_gfx_val(i915);
|
||||
if (INTEL_GEN(i915) >= 6)
|
||||
gen6_rps_busy(i915);
|
||||
|
||||
intel_rps_unpark(&gt->rps);
|
||||
i915_pmu_gt_unparked(i915);
|
||||
|
||||
intel_gt_queue_hangcheck(gt);
|
||||
intel_gt_unpark_requests(gt);
|
||||
|
||||
pm_notify(gt, INTEL_GT_UNPARK);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -68,12 +57,11 @@ static int __gt_park(struct intel_wakeref *wf)
|
||||
|
||||
GEM_TRACE("\n");
|
||||
|
||||
pm_notify(gt, INTEL_GT_PARK);
|
||||
intel_gt_park_requests(gt);
|
||||
|
||||
i915_vma_parked(gt);
|
||||
i915_pmu_gt_parked(i915);
|
||||
if (INTEL_GEN(i915) >= 6)
|
||||
gen6_rps_idle(i915);
|
||||
intel_rps_park(&gt->rps);
|
||||
|
||||
/* Everything switched off, flush any residual interrupt just in case */
|
||||
intel_synchronize_irq(i915);
|
||||
@ -95,8 +83,6 @@ static const struct intel_wakeref_ops wf_ops = {
|
||||
void intel_gt_pm_init_early(struct intel_gt *gt)
|
||||
{
|
||||
intel_wakeref_init(&gt->wakeref, gt->uncore->rpm, &wf_ops);

BLOCKING_INIT_NOTIFIER_HEAD(&gt->pm_notifications);
|
||||
}
|
||||
|
||||
void intel_gt_pm_init(struct intel_gt *gt)
|
||||
@ -107,6 +93,7 @@ void intel_gt_pm_init(struct intel_gt *gt)
|
||||
* user.
|
||||
*/
|
||||
intel_rc6_init(&gt->rc6);
intel_rps_init(&gt->rps);
|
||||
}
|
||||
|
||||
static bool reset_engines(struct intel_gt *gt)
|
||||
@ -150,12 +137,6 @@ void intel_gt_sanitize(struct intel_gt *gt, bool force)
|
||||
engine->reset.finish(engine);
|
||||
}
|
||||
|
||||
void intel_gt_pm_disable(struct intel_gt *gt)
|
||||
{
|
||||
if (!is_mock_gt(gt))
|
||||
intel_sanitize_gt_powersave(gt->i915);
|
||||
}
|
||||
|
||||
void intel_gt_pm_fini(struct intel_gt *gt)
|
||||
{
|
||||
intel_rc6_fini(&gt->rc6);
|
||||
@ -174,9 +155,13 @@ int intel_gt_resume(struct intel_gt *gt)
|
||||
* allowing us to fixup the user contexts on their first pin.
|
||||
*/
|
||||
intel_gt_pm_get(gt);
|
||||
|
||||
intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
|
||||
intel_rc6_sanitize(&gt->rc6);

intel_rps_enable(&gt->rps);
intel_llc_enable(&gt->llc);
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
struct intel_context *ce;
|
||||
|
||||
@ -185,9 +170,7 @@ int intel_gt_resume(struct intel_gt *gt)
|
||||
ce = engine->kernel_context;
|
||||
if (ce) {
|
||||
GEM_BUG_ON(!intel_context_is_pinned(ce));
|
||||
mutex_acquire(&ce->pin_mutex.dep_map, 0, 0, _THIS_IP_);
|
||||
ce->ops->reset(ce);
|
||||
mutex_release(&ce->pin_mutex.dep_map, 0, _THIS_IP_);
|
||||
}
|
||||
|
||||
engine->serial++; /* kernel context lost */
|
||||
@ -229,8 +212,11 @@ void intel_gt_suspend(struct intel_gt *gt)
|
||||
/* We expect to be idle already; but also want to be independent */
|
||||
wait_for_idle(gt);
|
||||
|
||||
with_intel_runtime_pm(gt->uncore->rpm, wakeref)
with_intel_runtime_pm(gt->uncore->rpm, wakeref) {
intel_rps_disable(&gt->rps);
intel_rc6_disable(&gt->rc6);
intel_llc_disable(&gt->llc);
}
|
||||
}
|
||||
|
||||
void intel_gt_runtime_suspend(struct intel_gt *gt)
|
||||
|
@ -12,11 +12,6 @@
|
||||
#include "intel_gt_types.h"
|
||||
#include "intel_wakeref.h"
|
||||
|
||||
enum {
|
||||
INTEL_GT_UNPARK,
|
||||
INTEL_GT_PARK,
|
||||
};
|
||||
|
||||
static inline bool intel_gt_pm_is_awake(const struct intel_gt *gt)
|
||||
{
|
||||
return intel_wakeref_is_active(&gt->wakeref);
|
||||
@ -44,7 +39,6 @@ static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
|
||||
|
||||
void intel_gt_pm_init_early(struct intel_gt *gt);
|
||||
void intel_gt_pm_init(struct intel_gt *gt);
|
||||
void intel_gt_pm_disable(struct intel_gt *gt);
|
||||
void intel_gt_pm_fini(struct intel_gt *gt);
|
||||
|
||||
void intel_gt_sanitize(struct intel_gt *gt, bool force);
|
||||
|
@ -20,6 +20,7 @@
|
||||
#include "intel_llc_types.h"
|
||||
#include "intel_reset_types.h"
|
||||
#include "intel_rc6_types.h"
|
||||
#include "intel_rps_types.h"
|
||||
#include "intel_wakeref.h"
|
||||
|
||||
struct drm_i915_private;
|
||||
@ -27,14 +28,6 @@ struct i915_ggtt;
|
||||
struct intel_engine_cs;
|
||||
struct intel_uncore;
|
||||
|
||||
struct intel_hangcheck {
|
||||
/* For hangcheck timer */
|
||||
#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
|
||||
#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
|
||||
|
||||
struct delayed_work work;
|
||||
};
|
||||
|
||||
struct intel_gt {
|
||||
struct drm_i915_private *i915;
|
||||
struct intel_uncore *uncore;
|
||||
@ -68,7 +61,6 @@ struct intel_gt {
|
||||
struct list_head closed_vma;
|
||||
spinlock_t closed_lock; /* guards the list of closed_vma */
|
||||
|
||||
struct intel_hangcheck hangcheck;
|
||||
struct intel_reset reset;
|
||||
|
||||
/**
|
||||
@ -82,8 +74,7 @@ struct intel_gt {
|
||||
|
||||
struct intel_llc llc;
|
||||
struct intel_rc6 rc6;
|
||||
|
||||
struct blocking_notifier_head pm_notifications;
|
||||
struct intel_rps rps;
|
||||
|
||||
ktime_t last_init_time;
|
||||
|
||||
|
@ -1,361 +0,0 @@
|
||||
/*
|
||||
* Copyright © 2016 Intel Corporation
|
||||
*
|
||||
* Permission is hereby granted, free of charge, to any person obtaining a
|
||||
* copy of this software and associated documentation files (the "Software"),
|
||||
* to deal in the Software without restriction, including without limitation
|
||||
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
||||
* and/or sell copies of the Software, and to permit persons to whom the
|
||||
* Software is furnished to do so, subject to the following conditions:
|
||||
*
|
||||
* The above copyright notice and this permission notice (including the next
|
||||
* paragraph) shall be included in all copies or substantial portions of the
|
||||
* Software.
|
||||
*
|
||||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
||||
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
* IN THE SOFTWARE.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "intel_engine.h"
|
||||
#include "intel_gt.h"
|
||||
#include "intel_reset.h"
|
||||
|
||||
struct hangcheck {
|
||||
u64 acthd;
|
||||
u32 ring;
|
||||
u32 head;
|
||||
enum intel_engine_hangcheck_action action;
|
||||
unsigned long action_timestamp;
|
||||
int deadlock;
|
||||
struct intel_instdone instdone;
|
||||
bool wedged:1;
|
||||
bool stalled:1;
|
||||
};
|
||||
|
||||
static bool instdone_unchanged(u32 current_instdone, u32 *old_instdone)
|
||||
{
|
||||
u32 tmp = current_instdone | *old_instdone;
|
||||
bool unchanged;
|
||||
|
||||
unchanged = tmp == *old_instdone;
|
||||
*old_instdone |= tmp;
|
||||
|
||||
return unchanged;
|
||||
}
|
||||
|
||||
static bool subunits_stuck(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct drm_i915_private *dev_priv = engine->i915;
|
||||
const struct sseu_dev_info *sseu = &RUNTIME_INFO(dev_priv)->sseu;
|
||||
struct intel_instdone instdone;
|
||||
struct intel_instdone *accu_instdone = &engine->hangcheck.instdone;
|
||||
bool stuck;
|
||||
int slice;
|
||||
int subslice;
|
||||
|
||||
intel_engine_get_instdone(engine, &instdone);
|
||||
|
||||
/* There might be unstable subunit states even when
|
||||
* actual head is not moving. Filter out the unstable ones by
|
||||
* accumulating the undone -> done transitions and only
|
||||
* consider those as progress.
|
||||
*/
|
||||
stuck = instdone_unchanged(instdone.instdone,
|
||||
&accu_instdone->instdone);
|
||||
stuck &= instdone_unchanged(instdone.slice_common,
|
||||
&accu_instdone->slice_common);
|
||||
|
||||
for_each_instdone_slice_subslice(dev_priv, sseu, slice, subslice) {
|
||||
stuck &= instdone_unchanged(instdone.sampler[slice][subslice],
|
||||
&accu_instdone->sampler[slice][subslice]);
|
||||
stuck &= instdone_unchanged(instdone.row[slice][subslice],
|
||||
&accu_instdone->row[slice][subslice]);
|
||||
}
|
||||
|
||||
return stuck;
|
||||
}
|
||||
|
||||
static enum intel_engine_hangcheck_action
|
||||
head_stuck(struct intel_engine_cs *engine, u64 acthd)
|
||||
{
|
||||
if (acthd != engine->hangcheck.acthd) {
|
||||
|
||||
/* Clear subunit states on head movement */
|
||||
memset(&engine->hangcheck.instdone, 0,
|
||||
sizeof(engine->hangcheck.instdone));
|
||||
|
||||
return ENGINE_ACTIVE_HEAD;
|
||||
}
|
||||
|
||||
if (!subunits_stuck(engine))
|
||||
return ENGINE_ACTIVE_SUBUNITS;
|
||||
|
||||
return ENGINE_DEAD;
|
||||
}
|
||||
|
||||
static enum intel_engine_hangcheck_action
|
||||
engine_stuck(struct intel_engine_cs *engine, u64 acthd)
|
||||
{
|
||||
enum intel_engine_hangcheck_action ha;
|
||||
u32 tmp;
|
||||
|
||||
ha = head_stuck(engine, acthd);
|
||||
if (ha != ENGINE_DEAD)
|
||||
return ha;
|
||||
|
||||
if (IS_GEN(engine->i915, 2))
|
||||
return ENGINE_DEAD;
|
||||
|
||||
/* Is the chip hanging on a WAIT_FOR_EVENT?
|
||||
* If so we can simply poke the RB_WAIT bit
|
||||
* and break the hang. This should work on
|
||||
* all but the second generation chipsets.
|
||||
*/
|
||||
tmp = ENGINE_READ(engine, RING_CTL);
|
||||
if (tmp & RING_WAIT) {
|
||||
intel_gt_handle_error(engine->gt, engine->mask, 0,
|
||||
"stuck wait on %s", engine->name);
|
||||
ENGINE_WRITE(engine, RING_CTL, tmp);
|
||||
return ENGINE_WAIT_KICK;
|
||||
}
|
||||
|
||||
return ENGINE_DEAD;
|
||||
}
|
||||
|
||||
static void hangcheck_load_sample(struct intel_engine_cs *engine,
|
||||
struct hangcheck *hc)
|
||||
{
|
||||
hc->acthd = intel_engine_get_active_head(engine);
|
||||
hc->ring = ENGINE_READ(engine, RING_START);
|
||||
hc->head = ENGINE_READ(engine, RING_HEAD);
|
||||
}
|
||||
|
||||
static void hangcheck_store_sample(struct intel_engine_cs *engine,
|
||||
const struct hangcheck *hc)
|
||||
{
|
||||
engine->hangcheck.acthd = hc->acthd;
|
||||
engine->hangcheck.last_ring = hc->ring;
|
||||
engine->hangcheck.last_head = hc->head;
|
||||
}
|
||||
|
||||
static enum intel_engine_hangcheck_action
|
||||
hangcheck_get_action(struct intel_engine_cs *engine,
|
||||
const struct hangcheck *hc)
|
||||
{
|
||||
if (intel_engine_is_idle(engine))
|
||||
return ENGINE_IDLE;
|
||||
|
||||
if (engine->hangcheck.last_ring != hc->ring)
|
||||
return ENGINE_ACTIVE_SEQNO;
|
||||
|
||||
if (engine->hangcheck.last_head != hc->head)
|
||||
return ENGINE_ACTIVE_SEQNO;
|
||||
|
||||
return engine_stuck(engine, hc->acthd);
|
||||
}
|
||||
|
||||
static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
|
||||
struct hangcheck *hc)
|
||||
{
|
||||
unsigned long timeout = I915_ENGINE_DEAD_TIMEOUT;
|
||||
|
||||
hc->action = hangcheck_get_action(engine, hc);
|
||||
|
||||
/* We always increment the progress
 * if the engine is busy and still processing
 * the same request, so that no single request
 * can run indefinitely (such as a chain of
 * batches). The only time we do not increment
 * the hangcheck score on this ring is when the
 * engine is in a legitimate wait for another
 * engine. In that case the waiting engine is a
 * victim and we want to be sure we catch the
 * right culprit. Then every time we do kick
 * the ring, count that as progress, since the
 * seqno should then advance; if it does not,
 * the next pass will catch the hanging engine.
 */
|
||||
|
||||
switch (hc->action) {
|
||||
case ENGINE_IDLE:
|
||||
case ENGINE_ACTIVE_SEQNO:
|
||||
/* Clear head and subunit states on seqno movement */
|
||||
hc->acthd = 0;
|
||||
|
||||
memset(&engine->hangcheck.instdone, 0,
|
||||
sizeof(engine->hangcheck.instdone));
|
||||
|
||||
/* Intentional fall through */
|
||||
case ENGINE_WAIT_KICK:
|
||||
case ENGINE_WAIT:
|
||||
engine->hangcheck.action_timestamp = jiffies;
|
||||
break;
|
||||
|
||||
case ENGINE_ACTIVE_HEAD:
|
||||
case ENGINE_ACTIVE_SUBUNITS:
|
||||
/*
|
||||
* Seqno stuck with still active engine gets leeway,
|
||||
* in hopes that it is just a long shader.
|
||||
*/
|
||||
timeout = I915_SEQNO_DEAD_TIMEOUT;
|
||||
break;
|
||||
|
||||
case ENGINE_DEAD:
|
||||
break;
|
||||
|
||||
default:
|
||||
MISSING_CASE(hc->action);
|
||||
}
|
||||
|
||||
hc->stalled = time_after(jiffies,
|
||||
engine->hangcheck.action_timestamp + timeout);
|
||||
hc->wedged = time_after(jiffies,
|
||||
engine->hangcheck.action_timestamp +
|
||||
I915_ENGINE_WEDGED_TIMEOUT);
|
||||
}
|
||||
|
||||
static void hangcheck_declare_hang(struct intel_gt *gt,
|
||||
intel_engine_mask_t hung,
|
||||
intel_engine_mask_t stuck)
|
||||
{
|
||||
struct intel_engine_cs *engine;
|
||||
intel_engine_mask_t tmp;
|
||||
char msg[80];
|
||||
int len;
|
||||
|
||||
/* If some rings hung but others were still busy, only
|
||||
* blame the hanging rings in the synopsis.
|
||||
*/
|
||||
if (stuck != hung)
|
||||
hung &= ~stuck;
|
||||
len = scnprintf(msg, sizeof(msg),
|
||||
"%s on ", stuck == hung ? "no progress" : "hang");
|
||||
for_each_engine_masked(engine, gt, hung, tmp)
|
||||
len += scnprintf(msg + len, sizeof(msg) - len,
|
||||
"%s, ", engine->name);
|
||||
msg[len-2] = '\0';
|
||||
|
||||
return intel_gt_handle_error(gt, hung, I915_ERROR_CAPTURE, "%s", msg);
|
||||
}
|
||||
|
||||
/*
 * This is called when the chip hasn't reported back with completed
 * batchbuffers in a long time. We keep track of per-ring seqno progress and
 * if there is no progress, the hangcheck score for that ring is increased.
 * Further, acthd is inspected to see if the ring is stuck. In the stuck case
 * we kick the ring. If we see no progress on three subsequent calls
 * we assume the chip is wedged and try to fix it by resetting the chip.
 */
|
||||
static void hangcheck_elapsed(struct work_struct *work)
|
||||
{
|
||||
struct intel_gt *gt =
|
||||
container_of(work, typeof(*gt), hangcheck.work.work);
|
||||
intel_engine_mask_t hung = 0, stuck = 0, wedged = 0;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
intel_wakeref_t wakeref;
|
||||
|
||||
if (!i915_modparams.enable_hangcheck)
|
||||
return;
|
||||
|
||||
if (!READ_ONCE(gt->awake))
|
||||
return;
|
||||
|
||||
if (intel_gt_is_wedged(gt))
|
||||
return;
|
||||
|
||||
wakeref = intel_runtime_pm_get_if_in_use(gt->uncore->rpm);
|
||||
if (!wakeref)
|
||||
return;
|
||||
|
||||
/* As enabling the GPU requires fairly extensive mmio access,
|
||||
* periodically arm the mmio checker to see if we are triggering
|
||||
* any invalid access.
|
||||
*/
|
||||
intel_uncore_arm_unclaimed_mmio_detection(gt->uncore);
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
struct hangcheck hc;
|
||||
|
||||
intel_engine_breadcrumbs_irq(engine);
|
||||
|
||||
hangcheck_load_sample(engine, &hc);
|
||||
hangcheck_accumulate_sample(engine, &hc);
|
||||
hangcheck_store_sample(engine, &hc);
|
||||
|
||||
if (hc.stalled) {
|
||||
hung |= engine->mask;
|
||||
if (hc.action != ENGINE_DEAD)
|
||||
stuck |= engine->mask;
|
||||
}
|
||||
|
||||
if (hc.wedged)
|
||||
wedged |= engine->mask;
|
||||
}
|
||||
|
||||
if (GEM_SHOW_DEBUG() && (hung | stuck)) {
|
||||
struct drm_printer p = drm_debug_printer("hangcheck");
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
if (intel_engine_is_idle(engine))
|
||||
continue;
|
||||
|
||||
intel_engine_dump(engine, &p, "%s\n", engine->name);
|
||||
}
|
||||
}
|
||||
|
||||
if (wedged) {
|
||||
dev_err(gt->i915->drm.dev,
|
||||
"GPU recovery timed out,"
|
||||
" cancelling all in-flight rendering.\n");
|
||||
GEM_TRACE_DUMP();
|
||||
intel_gt_set_wedged(gt);
|
||||
}
|
||||
|
||||
if (hung)
|
||||
hangcheck_declare_hang(gt, hung, stuck);
|
||||
|
||||
intel_runtime_pm_put(gt->uncore->rpm, wakeref);
|
||||
|
||||
/* Reset timer in case GPU hangs without another request being added */
|
||||
intel_gt_queue_hangcheck(gt);
|
||||
}
|
||||
|
||||
void intel_gt_queue_hangcheck(struct intel_gt *gt)
|
||||
{
|
||||
unsigned long delay;
|
||||
|
||||
if (unlikely(!i915_modparams.enable_hangcheck))
|
||||
return;
|
||||
|
||||
/*
|
||||
* Don't continually defer the hangcheck so that it is always run at
|
||||
* least once after work has been scheduled on any ring. Otherwise,
|
||||
* we will ignore a hung ring if a second ring is kept busy.
|
||||
*/
|
||||
|
||||
delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
|
||||
queue_delayed_work(system_long_wq, &gt->hangcheck.work, delay);
|
||||
}
|
||||
|
||||
void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
|
||||
{
|
||||
memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
|
||||
engine->hangcheck.action_timestamp = jiffies;
|
||||
}
|
||||
|
||||
void intel_gt_init_hangcheck(struct intel_gt *gt)
|
||||
{
|
||||
INIT_DELAYED_WORK(&gt->hangcheck.work, hangcheck_elapsed);
|
||||
}
|
||||
|
||||
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
||||
#include "selftest_hangcheck.c"
|
||||
#endif
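
The removed file above implements the periodic hangcheck that this series replaces with heartbeats: sample the engine head each period, treat lack of movement as a stall, and escalate to a reset once the stall persists. A minimal userspace model of that policy follows; it is illustrative only, with invented names and constants, not the driver code.

#include <stdbool.h>
#include <stdio.h>

#define STALL_PERIODS 3	/* periods without progress before declaring a hang */

struct toy_hangcheck {
	unsigned int last_head;
	unsigned int periods_stuck;
};

/* Called once per hangcheck period; returns true when a hang is declared. */
static bool toy_hangcheck_sample(struct toy_hangcheck *hc, unsigned int head)
{
	if (head != hc->last_head) {
		hc->last_head = head;
		hc->periods_stuck = 0;
		return false;
	}
	return ++hc->periods_stuck >= STALL_PERIODS;
}

int main(void)
{
	struct toy_hangcheck hc = { 0 };
	unsigned int samples[] = { 16, 32, 32, 32, 32 };

	for (unsigned int i = 0; i < 5; i++)
		if (toy_hangcheck_sample(&hc, samples[i]))
			printf("hang declared at sample %u\n", i);
	return 0;
}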
|
@ -48,7 +48,7 @@ static bool get_ia_constants(struct intel_llc *llc,
|
||||
struct ia_constants *consts)
|
||||
{
|
||||
struct drm_i915_private *i915 = llc_to_gt(llc)->i915;
|
||||
struct intel_rps *rps = &i915->gt_pm.rps;
|
||||
struct intel_rps *rps = &llc_to_gt(llc)->rps;
|
||||
|
||||
if (rps->max_freq <= rps->min_freq)
|
||||
return false;
|
||||
|
@ -145,6 +145,7 @@
|
||||
#include "intel_lrc_reg.h"
|
||||
#include "intel_mocs.h"
|
||||
#include "intel_reset.h"
|
||||
#include "intel_ring.h"
|
||||
#include "intel_workarounds.h"
|
||||
|
||||
#define RING_EXECLIST_QFULL (1 << 0x2)
|
||||
@ -234,16 +235,9 @@ static void execlists_init_reg_state(u32 *reg_state,
|
||||
const struct intel_engine_cs *engine,
|
||||
const struct intel_ring *ring,
|
||||
bool close);
|
||||
|
||||
static void __context_pin_acquire(struct intel_context *ce)
|
||||
{
|
||||
mutex_acquire(&ce->pin_mutex.dep_map, 2, 0, _RET_IP_);
|
||||
}
|
||||
|
||||
static void __context_pin_release(struct intel_context *ce)
|
||||
{
|
||||
mutex_release(&ce->pin_mutex.dep_map, 0, _RET_IP_);
|
||||
}
|
||||
static void
|
||||
__execlists_update_reg_state(const struct intel_context *ce,
|
||||
const struct intel_engine_cs *engine);
|
||||
|
||||
static void mark_eio(struct i915_request *rq)
|
||||
{
|
||||
@ -256,6 +250,23 @@ static void mark_eio(struct i915_request *rq)
|
||||
i915_request_mark_complete(rq);
|
||||
}
|
||||
|
||||
static struct i915_request *
|
||||
active_request(const struct intel_timeline * const tl, struct i915_request *rq)
|
||||
{
|
||||
struct i915_request *active = rq;
|
||||
|
||||
rcu_read_lock();
|
||||
list_for_each_entry_continue_reverse(rq, &tl->requests, link) {
|
||||
if (i915_request_completed(rq))
|
||||
break;
|
||||
|
||||
active = rq;
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
return active;
|
||||
}
|
||||
|
||||
static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine)
|
||||
{
|
||||
return (i915_ggtt_offset(engine->status_page.vma) +
|
||||
@ -460,8 +471,7 @@ lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
|
||||
if (IS_GEN(engine->i915, 8))
|
||||
desc |= GEN8_CTX_L3LLC_COHERENT;
|
||||
|
||||
desc |= i915_ggtt_offset(ce->state) + LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
/* bits 12-31 */
|
||||
desc |= i915_ggtt_offset(ce->state); /* bits 12-31 */
|
||||
/*
|
||||
* The following 32bits are copied into the OA reports (dword 2).
|
||||
* Consider updating oa_get_render_ctx_id in i915_perf.c when changing
|
||||
@ -925,6 +935,61 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
|
||||
status, rq);
|
||||
}
|
||||
|
||||
static void intel_engine_context_in(struct intel_engine_cs *engine)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (READ_ONCE(engine->stats.enabled) == 0)
|
||||
return;
|
||||
|
||||
write_seqlock_irqsave(&engine->stats.lock, flags);
|
||||
|
||||
if (engine->stats.enabled > 0) {
|
||||
if (engine->stats.active++ == 0)
|
||||
engine->stats.start = ktime_get();
|
||||
GEM_BUG_ON(engine->stats.active == 0);
|
||||
}
|
||||
|
||||
write_sequnlock_irqrestore(&engine->stats.lock, flags);
|
||||
}
|
||||
|
||||
static void intel_engine_context_out(struct intel_engine_cs *engine)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (READ_ONCE(engine->stats.enabled) == 0)
|
||||
return;
|
||||
|
||||
write_seqlock_irqsave(&engine->stats.lock, flags);
|
||||
|
||||
if (engine->stats.enabled > 0) {
|
||||
ktime_t last;
|
||||
|
||||
if (engine->stats.active && --engine->stats.active == 0) {
|
||||
/*
|
||||
* Decrement the active context count and in case GPU
|
||||
* is now idle add up to the running total.
|
||||
*/
|
||||
last = ktime_sub(ktime_get(), engine->stats.start);
|
||||
|
||||
engine->stats.total = ktime_add(engine->stats.total,
|
||||
last);
|
||||
} else if (engine->stats.active == 0) {
|
||||
/*
|
||||
* After turning on engine stats, context out might be
|
||||
* the first event in which case we account from the
|
||||
* time stats gathering was turned on.
|
||||
*/
|
||||
last = ktime_sub(ktime_get(), engine->stats.enabled_at);
|
||||
|
||||
engine->stats.total = ktime_add(engine->stats.total,
|
||||
last);
|
||||
}
|
||||
}
|
||||
|
||||
write_sequnlock_irqrestore(&engine->stats.lock, flags);
|
||||
}
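
The intel_engine_context_in/out pair above accounts engine busyness: start a stopwatch when the engine goes from idle to busy, and fold the elapsed time into a running total when the last context leaves. Here is a userspace sketch of that bookkeeping; it is illustrative only, the names and the time source are invented, and the seqlock used by the driver is omitted.

#include <stdio.h>
#include <time.h>

struct toy_stats {
	int active;		/* contexts currently on the engine */
	struct timespec start;	/* when the engine last became busy */
	double total;		/* accumulated busy seconds */
};

static double toy_elapsed(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

static void toy_context_in(struct toy_stats *s)
{
	if (s->active++ == 0)
		clock_gettime(CLOCK_MONOTONIC, &s->start);
}

static void toy_context_out(struct toy_stats *s)
{
	struct timespec now;

	if (s->active && --s->active == 0) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		s->total += toy_elapsed(&s->start, &now);
	}
}

int main(void)
{
	struct toy_stats s = { 0 };

	toy_context_in(&s);
	toy_context_in(&s);	/* nested: stopwatch keeps running */
	toy_context_out(&s);
	toy_context_out(&s);	/* engine idle again: total updated */
	printf("busy for %.6f s\n", s.total);
	return 0;
}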
|
||||
|
||||
static inline struct intel_engine_cs *
|
||||
__execlists_schedule_in(struct i915_request *rq)
|
||||
{
|
||||
@ -982,6 +1047,59 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
|
||||
tasklet_schedule(&ve->base.execlists.tasklet);
|
||||
}
|
||||
|
||||
static void restore_default_state(struct intel_context *ce,
|
||||
struct intel_engine_cs *engine)
|
||||
{
|
||||
u32 *regs = ce->lrc_reg_state;
|
||||
|
||||
if (engine->pinned_default_state)
|
||||
memcpy(regs, /* skip restoring the vanilla PPHWSP */
|
||||
engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
|
||||
engine->context_size - PAGE_SIZE);
|
||||
|
||||
execlists_init_reg_state(regs, ce, engine, ce->ring, false);
|
||||
}
|
||||
|
||||
static void reset_active(struct i915_request *rq,
|
||||
struct intel_engine_cs *engine)
|
||||
{
|
||||
struct intel_context * const ce = rq->hw_context;
|
||||
u32 head;
|
||||
|
||||
/*
|
||||
* The executing context has been cancelled. We want to prevent
|
||||
* further execution along this context and propagate the error on
|
||||
* to anything depending on its results.
|
||||
*
|
||||
* In __i915_request_submit(), we apply the -EIO and remove the
|
||||
* requests' payloads for any banned requests. But first, we must
|
||||
* rewind the context back to the start of the incomplete request so
|
||||
* that we do not jump back into the middle of the batch.
|
||||
*
|
||||
* We preserve the breadcrumbs and semaphores of the incomplete
|
||||
* requests so that inter-timeline dependencies (i.e other timelines)
|
||||
* remain correctly ordered. And we defer to __i915_request_submit()
|
||||
* so that all asynchronous waits are correctly handled.
|
||||
*/
|
||||
GEM_TRACE("%s(%s): { rq=%llx:%lld }\n",
|
||||
__func__, engine->name, rq->fence.context, rq->fence.seqno);
|
||||
|
||||
/* On resubmission of the active request, payload will be scrubbed */
|
||||
if (i915_request_completed(rq))
|
||||
head = rq->tail;
|
||||
else
|
||||
head = active_request(ce->timeline, rq)->head;
|
||||
ce->ring->head = intel_ring_wrap(ce->ring, head);
|
||||
intel_ring_update_space(ce->ring);
|
||||
|
||||
/* Scrub the context image to prevent replaying the previous batch */
|
||||
restore_default_state(ce, engine);
|
||||
__execlists_update_reg_state(ce, engine);
|
||||
|
||||
/* We've switched away, so this should be a no-op, but intent matters */
|
||||
ce->lrc_desc |= CTX_DESC_FORCE_RESTORE;
|
||||
}
|
||||
|
||||
static inline void
|
||||
__execlists_schedule_out(struct i915_request *rq,
|
||||
struct intel_engine_cs * const engine)
|
||||
@ -992,6 +1110,9 @@ __execlists_schedule_out(struct i915_request *rq,
|
||||
execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
|
||||
intel_gt_pm_put(engine->gt);
|
||||
|
||||
if (unlikely(i915_gem_context_is_banned(ce->gem_context)))
|
||||
reset_active(rq, engine);
|
||||
|
||||
/*
|
||||
* If this is part of a virtual engine, its next request may
|
||||
* have been blocked waiting for access to the active context.
|
||||
@ -1345,7 +1466,7 @@ need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq)
|
||||
{
|
||||
int hint;
|
||||
|
||||
if (!intel_engine_has_semaphores(engine))
|
||||
if (!intel_engine_has_timeslices(engine))
|
||||
return false;
|
||||
|
||||
if (list_is_last(&rq->sched.link, &engine->active.requests))
|
||||
@ -1366,15 +1487,32 @@ switch_prio(struct intel_engine_cs *engine, const struct i915_request *rq)
|
||||
return rq_prio(list_next_entry(rq, sched.link));
|
||||
}
|
||||
|
||||
static bool
|
||||
enable_timeslice(const struct intel_engine_execlists *execlists)
|
||||
static inline unsigned long
|
||||
timeslice(const struct intel_engine_cs *engine)
|
||||
{
|
||||
const struct i915_request *rq = *execlists->active;
|
||||
return READ_ONCE(engine->props.timeslice_duration_ms);
|
||||
}
|
||||
|
||||
static unsigned long
|
||||
active_timeslice(const struct intel_engine_cs *engine)
|
||||
{
|
||||
const struct i915_request *rq = *engine->execlists.active;
|
||||
|
||||
if (i915_request_completed(rq))
|
||||
return false;
|
||||
return 0;
|
||||
|
||||
return execlists->switch_priority_hint >= effective_prio(rq);
|
||||
if (engine->execlists.switch_priority_hint < effective_prio(rq))
|
||||
return 0;
|
||||
|
||||
return timeslice(engine);
|
||||
}
|
||||
|
||||
static void set_timeslice(struct intel_engine_cs *engine)
|
||||
{
|
||||
if (!intel_engine_has_timeslices(engine))
|
||||
return;
|
||||
|
||||
set_timer_ms(&engine->execlists.timer, active_timeslice(engine));
|
||||
}
|
||||
|
||||
static void record_preemption(struct intel_engine_execlists *execlists)
|
||||
@ -1382,6 +1520,30 @@ static void record_preemption(struct intel_engine_execlists *execlists)
|
||||
(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
|
||||
}
|
||||
|
||||
static unsigned long active_preempt_timeout(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct i915_request *rq;
|
||||
|
||||
rq = last_active(&engine->execlists);
|
||||
if (!rq)
|
||||
return 0;
|
||||
|
||||
/* Force a fast reset for terminated contexts (ignoring sysfs!) */
|
||||
if (unlikely(i915_gem_context_is_banned(rq->gem_context)))
|
||||
return 1;
|
||||
|
||||
return READ_ONCE(engine->props.preempt_timeout_ms);
|
||||
}
|
||||
|
||||
static void set_preempt_timeout(struct intel_engine_cs *engine)
|
||||
{
|
||||
if (!intel_engine_has_preempt_reset(engine))
|
||||
return;
|
||||
|
||||
set_timer_ms(&engine->execlists.preempt,
|
||||
active_preempt_timeout(engine));
|
||||
}
|
||||
|
||||
static void execlists_dequeue(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct intel_engine_execlists * const execlists = &engine->execlists;
|
||||
@ -1521,8 +1683,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
|
||||
*/
|
||||
if (!execlists->timer.expires &&
|
||||
need_timeslice(engine, last))
|
||||
mod_timer(&execlists->timer,
|
||||
jiffies + 1);
|
||||
set_timer_ms(&execlists->timer,
|
||||
timeslice(engine));
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
@ -1757,6 +1920,8 @@ done:
|
||||
|
||||
memset(port + 1, 0, (last_port - port) * sizeof(*port));
|
||||
execlists_submit_ports(engine);
|
||||
|
||||
set_preempt_timeout(engine);
|
||||
} else {
|
||||
skip_submit:
|
||||
ring_set_paused(engine, 0);
|
||||
@ -1867,7 +2032,7 @@ static void process_csb(struct intel_engine_cs *engine)
|
||||
*/
|
||||
GEM_BUG_ON(!tasklet_is_locked(&execlists->tasklet) &&
|
||||
!reset_in_progress(execlists));
|
||||
GEM_BUG_ON(USES_GUC_SUBMISSION(engine->i915));
|
||||
GEM_BUG_ON(!intel_engine_in_execlists_submission_mode(engine));
|
||||
|
||||
/*
|
||||
* Note that csb_write, csb_status may be either in HWSP or mmio.
|
||||
@ -1944,10 +2109,7 @@ static void process_csb(struct intel_engine_cs *engine)
|
||||
execlists_num_ports(execlists) *
|
||||
sizeof(*execlists->pending));
|
||||
|
||||
if (enable_timeslice(execlists))
|
||||
mod_timer(&execlists->timer, jiffies + 1);
|
||||
else
|
||||
cancel_timer(&execlists->timer);
|
||||
set_timeslice(engine);
|
||||
|
||||
WRITE_ONCE(execlists->pending[0], NULL);
|
||||
} else {
|
||||
@ -1997,6 +2159,43 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
|
||||
}
|
||||
}
|
||||
|
||||
static noinline void preempt_reset(struct intel_engine_cs *engine)
|
||||
{
|
||||
const unsigned int bit = I915_RESET_ENGINE + engine->id;
|
||||
unsigned long *lock = &engine->gt->reset.flags;
|
||||
|
||||
if (i915_modparams.reset < 3)
|
||||
return;
|
||||
|
||||
if (test_and_set_bit(bit, lock))
|
||||
return;
|
||||
|
||||
/* Mark this tasklet as disabled to avoid waiting for it to complete */
|
||||
tasklet_disable_nosync(&engine->execlists.tasklet);
|
||||
|
||||
GEM_TRACE("%s: preempt timeout %lu+%ums\n",
|
||||
engine->name,
|
||||
READ_ONCE(engine->props.preempt_timeout_ms),
|
||||
jiffies_to_msecs(jiffies - engine->execlists.preempt.expires));
|
||||
intel_engine_reset(engine, "preemption time out");
|
||||
|
||||
tasklet_enable(&engine->execlists.tasklet);
|
||||
clear_and_wake_up_bit(bit, lock);
|
||||
}
|
||||
|
||||
static bool preempt_timeout(const struct intel_engine_cs *const engine)
|
||||
{
|
||||
const struct timer_list *t = &engine->execlists.preempt;
|
||||
|
||||
if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
|
||||
return false;
|
||||
|
||||
if (!timer_expired(t))
|
||||
return false;
|
||||
|
||||
return READ_ONCE(engine->execlists.pending[0]);
|
||||
}
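
preempt_reset() and preempt_timeout() above implement the forced-preemption policy mentioned in the changelog: when a preemption is submitted a deadline is armed, and if the outgoing context has still not yielded by the deadline the engine is reset. The sketch below restates that policy in userspace C; it is illustrative only and all names and values are invented, though the 100 ms figure mirrors the Kconfig default described in the changelog.

#include <stdbool.h>
#include <stdio.h>

struct toy_engine {
	bool preempt_pending;		/* preemption submitted, waiting for the switch */
	unsigned long preempt_deadline;	/* absolute time in ms */
};

static void toy_submit_preemption(struct toy_engine *e, unsigned long now_ms,
				  unsigned long timeout_ms)
{
	e->preempt_pending = true;
	e->preempt_deadline = now_ms + timeout_ms;
}

/* Called from a periodic tick; returns true if the engine must be reset. */
static bool toy_check_preempt_timeout(const struct toy_engine *e,
				      unsigned long now_ms)
{
	return e->preempt_pending && now_ms >= e->preempt_deadline;
}

int main(void)
{
	struct toy_engine e = { 0 };

	toy_submit_preemption(&e, 0, 100);
	printf("at 50ms: reset=%d\n", toy_check_preempt_timeout(&e, 50));
	printf("at 150ms: reset=%d\n", toy_check_preempt_timeout(&e, 150));
	return 0;
}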
|
||||
|
||||
/*
|
||||
* Check the unread Context Status Buffers and manage the submission of new
|
||||
* contexts to the ELSP accordingly.
|
||||
@ -2004,23 +2203,39 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
|
||||
static void execlists_submission_tasklet(unsigned long data)
|
||||
{
|
||||
struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
|
||||
unsigned long flags;
|
||||
bool timeout = preempt_timeout(engine);
|
||||
|
||||
process_csb(engine);
|
||||
if (!READ_ONCE(engine->execlists.pending[0])) {
|
||||
if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&engine->active.lock, flags);
|
||||
__execlists_submission_tasklet(engine);
|
||||
spin_unlock_irqrestore(&engine->active.lock, flags);
|
||||
|
||||
/* Recheck after serialising with direct-submission */
|
||||
if (timeout && preempt_timeout(engine))
|
||||
preempt_reset(engine);
|
||||
}
|
||||
}
|
||||
|
||||
static void execlists_submission_timer(struct timer_list *timer)
|
||||
static void __execlists_kick(struct intel_engine_execlists *execlists)
|
||||
{
|
||||
struct intel_engine_cs *engine =
|
||||
from_timer(engine, timer, execlists.timer);
|
||||
|
||||
/* Kick the tasklet for some interrupt coalescing and reset handling */
|
||||
tasklet_hi_schedule(&engine->execlists.tasklet);
|
||||
tasklet_hi_schedule(&execlists->tasklet);
|
||||
}
|
||||
|
||||
#define execlists_kick(t, member) \
|
||||
__execlists_kick(container_of(t, struct intel_engine_execlists, member))
|
||||
|
||||
static void execlists_timeslice(struct timer_list *timer)
|
||||
{
|
||||
execlists_kick(timer, timer);
|
||||
}
|
||||
|
||||
static void execlists_preempt(struct timer_list *timer)
|
||||
{
|
||||
execlists_kick(timer, preempt);
|
||||
}
|
||||
|
||||
static void queue_request(struct intel_engine_cs *engine,
|
||||
@ -2100,7 +2315,6 @@ set_redzone(void *vaddr, const struct intel_engine_cs *engine)
|
||||
if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
||||
return;
|
||||
|
||||
vaddr += LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
vaddr += engine->context_size;
|
||||
|
||||
memset(vaddr, POISON_INUSE, I915_GTT_PAGE_SIZE);
|
||||
@ -2112,7 +2326,6 @@ check_redzone(const void *vaddr, const struct intel_engine_cs *engine)
|
||||
if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
||||
return;
|
||||
|
||||
vaddr += LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
vaddr += engine->context_size;
|
||||
|
||||
if (memchr_inv(vaddr, POISON_INUSE, I915_GTT_PAGE_SIZE))
|
||||
@ -2727,37 +2940,28 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
|
||||
&execlists->csb_status[reset_value]);
|
||||
}
|
||||
|
||||
static struct i915_request *active_request(struct i915_request *rq)
|
||||
static int lrc_ring_mi_mode(const struct intel_engine_cs *engine)
|
||||
{
|
||||
const struct intel_context * const ce = rq->hw_context;
|
||||
struct i915_request *active = NULL;
|
||||
struct list_head *list;
|
||||
|
||||
if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
|
||||
return rq;
|
||||
|
||||
list = &i915_request_active_timeline(rq)->requests;
|
||||
list_for_each_entry_from_reverse(rq, list, link) {
|
||||
if (i915_request_completed(rq))
|
||||
break;
|
||||
|
||||
if (rq->hw_context != ce)
|
||||
break;
|
||||
|
||||
active = rq;
|
||||
}
|
||||
|
||||
return active;
|
||||
if (INTEL_GEN(engine->i915) >= 12)
|
||||
return 0x60;
|
||||
else if (INTEL_GEN(engine->i915) >= 9)
|
||||
return 0x54;
|
||||
else if (engine->class == RENDER_CLASS)
|
||||
return 0x58;
|
||||
else
|
||||
return -1;
|
||||
}
|
||||
|
||||
static void __execlists_reset_reg_state(const struct intel_context *ce,
|
||||
const struct intel_engine_cs *engine)
|
||||
{
|
||||
u32 *regs = ce->lrc_reg_state;
|
||||
int x;
|
||||
|
||||
if (INTEL_GEN(engine->i915) >= 9) {
|
||||
regs[GEN9_CTX_RING_MI_MODE + 1] &= ~STOP_RING;
|
||||
regs[GEN9_CTX_RING_MI_MODE + 1] |= STOP_RING << 16;
|
||||
x = lrc_ring_mi_mode(engine);
|
||||
if (x != -1) {
|
||||
regs[x + 1] &= ~STOP_RING;
|
||||
regs[x + 1] |= STOP_RING << 16;
|
||||
}
|
||||
}
|
||||
|
||||
@ -2766,7 +2970,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
|
||||
struct intel_engine_execlists * const execlists = &engine->execlists;
|
||||
struct intel_context *ce;
|
||||
struct i915_request *rq;
|
||||
u32 *regs;
|
||||
|
||||
mb(); /* paranoia: read the CSB pointers from after the reset */
|
||||
clflush(execlists->csb_write);
|
||||
@ -2792,19 +2995,17 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
|
||||
ce = rq->hw_context;
|
||||
GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
|
||||
|
||||
/* Proclaim we have exclusive access to the context image! */
|
||||
__context_pin_acquire(ce);
|
||||
|
||||
rq = active_request(rq);
|
||||
if (!rq) {
|
||||
if (i915_request_completed(rq)) {
|
||||
/* Idle context; tidy up the ring so we can restart afresh */
|
||||
ce->ring->head = ce->ring->tail;
|
||||
ce->ring->head = intel_ring_wrap(ce->ring, rq->tail);
|
||||
goto out_replay;
|
||||
}
|
||||
|
||||
/* Context has requests still in-flight; it should not be idle! */
|
||||
GEM_BUG_ON(i915_active_is_idle(&ce->active));
|
||||
rq = active_request(ce->timeline, rq);
|
||||
ce->ring->head = intel_ring_wrap(ce->ring, rq->head);
|
||||
GEM_BUG_ON(ce->ring->head == ce->ring->tail);
|
||||
|
||||
/*
|
||||
* If this request hasn't started yet, e.g. it is waiting on a
|
||||
@ -2845,22 +3046,15 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
|
||||
* to recreate its own state.
|
||||
*/
|
||||
GEM_BUG_ON(!intel_context_is_pinned(ce));
|
||||
regs = ce->lrc_reg_state;
|
||||
if (engine->pinned_default_state) {
|
||||
memcpy(regs, /* skip restoring the vanilla PPHWSP */
|
||||
engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
|
||||
engine->context_size - PAGE_SIZE);
|
||||
}
|
||||
execlists_init_reg_state(regs, ce, engine, ce->ring, false);
|
||||
restore_default_state(ce, engine);
|
||||
|
||||
out_replay:
|
||||
GEM_TRACE("%s replay {head:%04x, tail:%04x\n",
|
||||
GEM_TRACE("%s replay {head:%04x, tail:%04x}\n",
|
||||
engine->name, ce->ring->head, ce->ring->tail);
|
||||
intel_ring_update_space(ce->ring);
|
||||
__execlists_reset_reg_state(ce, engine);
|
||||
__execlists_update_reg_state(ce, engine);
|
||||
ce->lrc_desc |= CTX_DESC_FORCE_RESTORE; /* paranoid: GPU was reset! */
|
||||
__context_pin_release(ce);
|
||||
|
||||
unwind:
|
||||
/* Push back any incomplete requests for replay after the reset. */
|
||||
@ -3469,6 +3663,7 @@ gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
|
||||
static void execlists_park(struct intel_engine_cs *engine)
|
||||
{
|
||||
cancel_timer(&engine->execlists.timer);
|
||||
cancel_timer(&engine->execlists.preempt);
|
||||
}
|
||||
|
||||
void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
|
||||
@ -3586,7 +3781,8 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
|
||||
{
|
||||
tasklet_init(&engine->execlists.tasklet,
|
||||
execlists_submission_tasklet, (unsigned long)engine);
|
||||
timer_setup(&engine->execlists.timer, execlists_submission_timer, 0);
|
||||
timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
|
||||
timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
|
||||
|
||||
logical_ring_default_vfuncs(engine);
|
||||
logical_ring_default_irqs(engine);
|
||||
@ -3796,12 +3992,6 @@ populate_lr_context(struct intel_context *ce,
|
||||
set_redzone(vaddr, engine);
|
||||
|
||||
if (engine->default_state) {
|
||||
/*
|
||||
* We only want to copy over the template context state;
|
||||
* skipping over the headers reserved for GuC communication,
|
||||
* leaving those as zero.
|
||||
*/
|
||||
const unsigned long start = LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
void *defaults;
|
||||
|
||||
defaults = i915_gem_object_pin_map(engine->default_state,
|
||||
@ -3811,7 +4001,7 @@ populate_lr_context(struct intel_context *ce,
|
||||
goto err_unpin_ctx;
|
||||
}
|
||||
|
||||
memcpy(vaddr + start, defaults + start, engine->context_size);
|
||||
memcpy(vaddr, defaults, engine->context_size);
|
||||
i915_gem_object_unpin_map(engine->default_state);
|
||||
inhibit = false;
|
||||
}
|
||||
@ -3826,9 +4016,7 @@ populate_lr_context(struct intel_context *ce,
|
||||
|
||||
ret = 0;
|
||||
err_unpin_ctx:
|
||||
__i915_gem_object_flush_map(ctx_obj,
|
||||
LRC_HEADER_PAGES * PAGE_SIZE,
|
||||
engine->context_size);
|
||||
__i915_gem_object_flush_map(ctx_obj, 0, engine->context_size);
|
||||
i915_gem_object_unpin_map(ctx_obj);
|
||||
return ret;
|
||||
}
|
||||
@ -3845,11 +4033,6 @@ static int __execlists_context_alloc(struct intel_context *ce,
|
||||
GEM_BUG_ON(ce->state);
|
||||
context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
|
||||
|
||||
/*
|
||||
* Before the actual start of the context image, we insert a few pages
|
||||
* for our own use and for sharing with the GuC.
|
||||
*/
|
||||
context_size += LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
|
||||
context_size += I915_GTT_PAGE_SIZE; /* for redzone */
|
||||
|
||||
@ -4502,7 +4685,6 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
|
||||
bool scrub)
|
||||
{
|
||||
GEM_BUG_ON(!intel_context_is_pinned(ce));
|
||||
__context_pin_acquire(ce);
|
||||
|
||||
/*
|
||||
* We want a simple context + ring to execute the breadcrumb update.
|
||||
@ -4512,23 +4694,21 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
|
||||
* future request will be after userspace has had the opportunity
|
||||
* to recreate its own state.
|
||||
*/
|
||||
if (scrub) {
|
||||
u32 *regs = ce->lrc_reg_state;
|
||||
|
||||
if (engine->pinned_default_state) {
|
||||
memcpy(regs, /* skip restoring the vanilla PPHWSP */
|
||||
engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
|
||||
engine->context_size - PAGE_SIZE);
|
||||
}
|
||||
execlists_init_reg_state(regs, ce, engine, ce->ring, false);
|
||||
}
|
||||
if (scrub)
|
||||
restore_default_state(ce, engine);
|
||||
|
||||
/* Rerun the request; its payload has been neutered (if guilty). */
|
||||
ce->ring->head = head;
|
||||
intel_ring_update_space(ce->ring);
|
||||
|
||||
__execlists_update_reg_state(ce, engine);
|
||||
__context_pin_release(ce);
|
||||
}
|
||||
|
||||
bool
|
||||
intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine)
|
||||
{
|
||||
return engine->set_default_submission ==
|
||||
intel_execlists_set_default_submission;
|
||||
}
|
||||
|
||||
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
||||
|
@ -43,6 +43,7 @@ struct intel_engine_cs;
|
||||
#define CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT (1 << 0)
|
||||
#define CTX_CTRL_RS_CTX_ENABLE (1 << 1)
|
||||
#define CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT (1 << 2)
|
||||
#define GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE (1 << 8)
|
||||
#define RING_CONTEXT_STATUS_PTR(base) _MMIO((base) + 0x3a0)
|
||||
#define RING_EXECLIST_SQ_CONTENTS(base) _MMIO((base) + 0x510)
|
||||
#define RING_EXECLIST_CONTROL(base) _MMIO((base) + 0x550)
|
||||
@ -85,31 +86,12 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine);
|
||||
int intel_execlists_submission_init(struct intel_engine_cs *engine);
|
||||
|
||||
/* Logical Ring Contexts */
|
||||
|
||||
/*
|
||||
* We allocate a header at the start of the context image for our own
|
||||
* use, therefore the actual location of the logical state is offset
|
||||
* from the start of the VMA. The layout is
|
||||
*
|
||||
* | [guc] | [hwsp] [logical state] |
|
||||
* |<- our header ->|<- context image ->|
|
||||
*
|
||||
*/
|
||||
/* The first page is used for sharing data with the GuC */
|
||||
#define LRC_GUCSHR_PN (0)
|
||||
#define LRC_GUCSHR_SZ (1)
|
||||
/* At the start of the context image is its per-process HWS page */
|
||||
#define LRC_PPHWSP_PN (LRC_GUCSHR_PN + LRC_GUCSHR_SZ)
|
||||
#define LRC_PPHWSP_PN (0)
|
||||
#define LRC_PPHWSP_SZ (1)
|
||||
/* Finally we have the logical state for the context */
|
||||
/* After the PPHWSP we have the logical state for the context */
|
||||
#define LRC_STATE_PN (LRC_PPHWSP_PN + LRC_PPHWSP_SZ)
|
||||
|
||||
/*
|
||||
* Currently we include the PPHWSP in __intel_engine_context_size() so
|
||||
* the size of the header is synonymous with the start of the PPHWSP.
|
||||
*/
|
||||
#define LRC_HEADER_PAGES LRC_PPHWSP_PN
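
As a quick illustration of what dropping the GuC shared page means for the layout described above (assuming 4 KiB GTT pages): the PPHWSP moves to page 0 and the logical state to page 1, so the state now begins 4 KiB into the context image instead of 8 KiB. The snippet is a standalone sketch with invented names, not driver code.

#include <stdio.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_PPHWSP_PN	0u			/* was 1 when the GuC page existed */
#define TOY_STATE_PN	(TOY_PPHWSP_PN + 1u)	/* was 2 */

int main(void)
{
	printf("PPHWSP at 0x%x, logical state at 0x%x\n",
	       TOY_PPHWSP_PN * TOY_PAGE_SIZE, TOY_STATE_PN * TOY_PAGE_SIZE);
	/* prints: PPHWSP at 0x0, logical state at 0x1000 */
	return 0;
}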
|
||||
|
||||
/* Space within PPHWSP reserved to be used as scratch */
|
||||
#define LRC_PPHWSP_SCRATCH 0x34
|
||||
#define LRC_PPHWSP_SCRATCH_ADDR (LRC_PPHWSP_SCRATCH * sizeof(u32))
|
||||
@ -145,4 +127,7 @@ struct intel_engine_cs *
|
||||
intel_virtual_engine_get_sibling(struct intel_engine_cs *engine,
|
||||
unsigned int sibling);
|
||||
|
||||
bool
|
||||
intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
|
||||
|
||||
#endif /* _INTEL_LRC_H_ */
|
||||
|
@ -26,6 +26,7 @@
|
||||
#include "intel_gt.h"
|
||||
#include "intel_mocs.h"
|
||||
#include "intel_lrc.h"
|
||||
#include "intel_ring.h"
|
||||
|
||||
/* structures required */
|
||||
struct drm_i915_mocs_entry {
|
||||
@ -461,6 +462,12 @@ static void intel_mocs_init_global(struct intel_gt *gt)
|
||||
struct drm_i915_mocs_table table;
|
||||
unsigned int index;
|
||||
|
||||
/*
|
||||
* LLC and eDRAM control values are not applicable to dgfx
|
||||
*/
|
||||
if (IS_DGFX(gt->i915))
|
||||
return;
|
||||
|
||||
GEM_BUG_ON(!HAS_GLOBAL_MOCS_REGISTERS(gt->i915));
|
||||
|
||||
if (!get_mocs_settings(gt->i915, &table))
|
||||
|
@ -27,6 +27,7 @@
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "intel_renderstate.h"
|
||||
#include "intel_ring.h"
|
||||
|
||||
struct intel_renderstate {
|
||||
const struct intel_renderstate_rodata *rodata;
|
||||
|
@ -1024,8 +1024,6 @@ void intel_gt_reset(struct intel_gt *gt,
|
||||
if (ret)
|
||||
goto taint;
|
||||
|
||||
intel_gt_queue_hangcheck(gt);
|
||||
|
||||
finish:
|
||||
reset_finish(gt, awake);
|
||||
unlock:
|
||||
@ -1353,4 +1351,5 @@ void __intel_fini_wedge(struct intel_wedge_me *w)
|
||||
|
||||
#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
|
||||
#include "selftest_reset.c"
|
||||
#include "selftest_hangcheck.c"
|
||||
#endif
|
||||
|
323
drivers/gpu/drm/i915/gt/intel_ring.c
Normal file
@@ -0,0 +1,323 @@
|
||||
/*
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#include "gem/i915_gem_object.h"
|
||||
#include "i915_drv.h"
|
||||
#include "i915_vma.h"
|
||||
#include "intel_engine.h"
|
||||
#include "intel_ring.h"
|
||||
#include "intel_timeline.h"
|
||||
|
||||
unsigned int intel_ring_update_space(struct intel_ring *ring)
|
||||
{
|
||||
unsigned int space;
|
||||
|
||||
space = __intel_ring_space(ring->head, ring->emit, ring->size);
|
||||
|
||||
ring->space = space;
|
||||
return space;
|
||||
}
|
||||
|
||||
int intel_ring_pin(struct intel_ring *ring)
|
||||
{
|
||||
struct i915_vma *vma = ring->vma;
|
||||
unsigned int flags;
|
||||
void *addr;
|
||||
int ret;
|
||||
|
||||
if (atomic_fetch_inc(&ring->pin_count))
|
||||
return 0;
|
||||
|
||||
flags = PIN_GLOBAL;
|
||||
|
||||
/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
|
||||
flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
|
||||
|
||||
if (vma->obj->stolen)
|
||||
flags |= PIN_MAPPABLE;
|
||||
else
|
||||
flags |= PIN_HIGH;
|
||||
|
||||
ret = i915_vma_pin(vma, 0, 0, flags);
|
||||
if (unlikely(ret))
|
||||
goto err_unpin;
|
||||
|
||||
if (i915_vma_is_map_and_fenceable(vma))
|
||||
addr = (void __force *)i915_vma_pin_iomap(vma);
|
||||
else
|
||||
addr = i915_gem_object_pin_map(vma->obj,
|
||||
i915_coherent_map_type(vma->vm->i915));
|
||||
if (IS_ERR(addr)) {
|
||||
ret = PTR_ERR(addr);
|
||||
goto err_ring;
|
||||
}
|
||||
|
||||
i915_vma_make_unshrinkable(vma);
|
||||
|
||||
GEM_BUG_ON(ring->vaddr);
|
||||
ring->vaddr = addr;
|
||||
|
||||
return 0;
|
||||
|
||||
err_ring:
|
||||
i915_vma_unpin(vma);
|
||||
err_unpin:
|
||||
atomic_dec(&ring->pin_count);
|
||||
return ret;
|
||||
}
|
||||
|
||||
void intel_ring_reset(struct intel_ring *ring, u32 tail)
|
||||
{
|
||||
tail = intel_ring_wrap(ring, tail);
|
||||
ring->tail = tail;
|
||||
ring->head = tail;
|
||||
ring->emit = tail;
|
||||
intel_ring_update_space(ring);
|
||||
}
|
||||
|
||||
void intel_ring_unpin(struct intel_ring *ring)
|
||||
{
|
||||
struct i915_vma *vma = ring->vma;
|
||||
|
||||
if (!atomic_dec_and_test(&ring->pin_count))
|
||||
return;
|
||||
|
||||
/* Discard any unused bytes beyond that submitted to hw. */
|
||||
intel_ring_reset(ring, ring->emit);
|
||||
|
||||
i915_vma_unset_ggtt_write(vma);
|
||||
if (i915_vma_is_map_and_fenceable(vma))
|
||||
i915_vma_unpin_iomap(vma);
|
||||
else
|
||||
i915_gem_object_unpin_map(vma->obj);
|
||||
|
||||
GEM_BUG_ON(!ring->vaddr);
|
||||
ring->vaddr = NULL;
|
||||
|
||||
i915_vma_unpin(vma);
|
||||
i915_vma_make_purgeable(vma);
|
||||
}
|
||||
|
||||
static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
|
||||
{
|
||||
struct i915_address_space *vm = &ggtt->vm;
|
||||
struct drm_i915_private *i915 = vm->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct i915_vma *vma;
|
||||
|
||||
obj = ERR_PTR(-ENODEV);
|
||||
if (i915_ggtt_has_aperture(ggtt))
|
||||
obj = i915_gem_object_create_stolen(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
obj = i915_gem_object_create_internal(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
return ERR_CAST(obj);
|
||||
|
||||
/*
|
||||
* Mark ring buffers as read-only from GPU side (so no stray overwrites)
|
||||
* if supported by the platform's GGTT.
|
||||
*/
|
||||
if (vm->has_read_only)
|
||||
i915_gem_object_set_readonly(obj);
|
||||
|
||||
vma = i915_vma_instance(obj, vm, NULL);
|
||||
if (IS_ERR(vma))
|
||||
goto err;
|
||||
|
||||
return vma;
|
||||
|
||||
err:
|
||||
i915_gem_object_put(obj);
|
||||
return vma;
|
||||
}
|
||||
|
||||
struct intel_ring *
|
||||
intel_engine_create_ring(struct intel_engine_cs *engine, int size)
|
||||
{
|
||||
struct drm_i915_private *i915 = engine->i915;
|
||||
struct intel_ring *ring;
|
||||
struct i915_vma *vma;
|
||||
|
||||
GEM_BUG_ON(!is_power_of_2(size));
|
||||
GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES);
|
||||
|
||||
ring = kzalloc(sizeof(*ring), GFP_KERNEL);
|
||||
if (!ring)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
kref_init(&ring->ref);
|
||||
ring->size = size;
|
||||
|
||||
/*
|
||||
* Workaround an erratum on the i830 which causes a hang if
|
||||
* the TAIL pointer points to within the last 2 cachelines
|
||||
* of the buffer.
|
||||
*/
|
||||
ring->effective_size = size;
|
||||
if (IS_I830(i915) || IS_I845G(i915))
|
||||
ring->effective_size -= 2 * CACHELINE_BYTES;
|
||||
|
||||
intel_ring_update_space(ring);
|
||||
|
||||
vma = create_ring_vma(engine->gt->ggtt, size);
|
||||
if (IS_ERR(vma)) {
|
||||
kfree(ring);
|
||||
return ERR_CAST(vma);
|
||||
}
|
||||
ring->vma = vma;
|
||||
|
||||
return ring;
|
||||
}
|
||||
|
||||
void intel_ring_free(struct kref *ref)
|
||||
{
|
||||
struct intel_ring *ring = container_of(ref, typeof(*ring), ref);
|
||||
|
||||
i915_vma_put(ring->vma);
|
||||
kfree(ring);
|
||||
}
|
||||
|
||||
static noinline int
|
||||
wait_for_space(struct intel_ring *ring,
|
||||
struct intel_timeline *tl,
|
||||
unsigned int bytes)
|
||||
{
|
||||
struct i915_request *target;
|
||||
long timeout;
|
||||
|
||||
if (intel_ring_update_space(ring) >= bytes)
|
||||
return 0;
|
||||
|
||||
GEM_BUG_ON(list_empty(&tl->requests));
|
||||
list_for_each_entry(target, &tl->requests, link) {
|
||||
if (target->ring != ring)
|
||||
continue;
|
||||
|
||||
/* Would completion of this request free enough space? */
|
||||
if (bytes <= __intel_ring_space(target->postfix,
|
||||
ring->emit, ring->size))
|
||||
break;
|
||||
}
|
||||
|
||||
if (GEM_WARN_ON(&target->link == &tl->requests))
|
||||
return -ENOSPC;
|
||||
|
||||
timeout = i915_request_wait(target,
|
||||
I915_WAIT_INTERRUPTIBLE,
|
||||
MAX_SCHEDULE_TIMEOUT);
|
||||
if (timeout < 0)
|
||||
return timeout;
|
||||
|
||||
i915_request_retire_upto(target);
|
||||
|
||||
intel_ring_update_space(ring);
|
||||
GEM_BUG_ON(ring->space < bytes);
|
||||
return 0;
|
||||
}
|
||||
|
||||
u32 *intel_ring_begin(struct i915_request *rq, unsigned int num_dwords)
|
||||
{
|
||||
struct intel_ring *ring = rq->ring;
|
||||
const unsigned int remain_usable = ring->effective_size - ring->emit;
|
||||
const unsigned int bytes = num_dwords * sizeof(u32);
|
||||
unsigned int need_wrap = 0;
|
||||
unsigned int total_bytes;
|
||||
u32 *cs;
|
||||
|
||||
/* Packets must be qword aligned. */
|
||||
GEM_BUG_ON(num_dwords & 1);
|
||||
|
||||
total_bytes = bytes + rq->reserved_space;
|
||||
GEM_BUG_ON(total_bytes > ring->effective_size);
|
||||
|
||||
if (unlikely(total_bytes > remain_usable)) {
|
||||
const int remain_actual = ring->size - ring->emit;
|
||||
|
||||
if (bytes > remain_usable) {
|
||||
/*
|
||||
* Not enough space for the basic request. So need to
|
||||
* flush out the remainder and then wait for
|
||||
* base + reserved.
|
||||
*/
|
||||
total_bytes += remain_actual;
|
||||
need_wrap = remain_actual | 1;
|
||||
} else {
|
||||
/*
|
||||
* The base request will fit but the reserved space
|
||||
* falls off the end. So we don't need an immediate
|
||||
* wrap and only need to effectively wait for the
|
||||
* reserved size from the start of ringbuffer.
|
||||
*/
|
||||
total_bytes = rq->reserved_space + remain_actual;
|
||||
}
|
||||
}
|
||||
|
||||
if (unlikely(total_bytes > ring->space)) {
|
||||
int ret;
|
||||
|
||||
/*
|
||||
* Space is reserved in the ringbuffer for finalising the
|
||||
* request, as that cannot be allowed to fail. During request
|
||||
* finalisation, reserved_space is set to 0 to stop the
|
||||
* overallocation and the assumption is that then we never need
|
||||
* to wait (which has the risk of failing with EINTR).
|
||||
*
|
||||
* See also i915_request_alloc() and i915_request_add().
|
||||
*/
|
||||
GEM_BUG_ON(!rq->reserved_space);
|
||||
|
||||
ret = wait_for_space(ring,
|
||||
i915_request_timeline(rq),
|
||||
total_bytes);
|
||||
if (unlikely(ret))
|
||||
return ERR_PTR(ret);
|
||||
}
|
||||
|
||||
if (unlikely(need_wrap)) {
|
||||
need_wrap &= ~1;
|
||||
GEM_BUG_ON(need_wrap > ring->space);
|
||||
GEM_BUG_ON(ring->emit + need_wrap > ring->size);
|
||||
GEM_BUG_ON(!IS_ALIGNED(need_wrap, sizeof(u64)));
|
||||
|
||||
/* Fill the tail with MI_NOOP */
|
||||
memset64(ring->vaddr + ring->emit, 0, need_wrap / sizeof(u64));
|
||||
ring->space -= need_wrap;
|
||||
ring->emit = 0;
|
||||
}
|
||||
|
||||
GEM_BUG_ON(ring->emit > ring->size - bytes);
|
||||
GEM_BUG_ON(ring->space < bytes);
|
||||
cs = ring->vaddr + ring->emit;
|
||||
GEM_DEBUG_EXEC(memset32(cs, POISON_INUSE, bytes / sizeof(*cs)));
|
||||
ring->emit += bytes;
|
||||
ring->space -= bytes;
|
||||
|
||||
return cs;
|
||||
}
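
The wrap handling in intel_ring_begin() above can be followed with made-up numbers: if the packet does not fit in the space left before the end of the usable ring, the remainder is padded (MI_NOOP in the driver) and emission restarts at offset 0. The walk-through below is illustrative only, with invented values, and ignores the reserved_space and wait-for-space paths.

#include <stdio.h>

int main(void)
{
	unsigned int size = 4096;		/* power-of-two ring size */
	unsigned int effective_size = 4096;	/* no i830 workaround here */
	unsigned int emit = 4000;		/* current emission offset */
	unsigned int bytes = 256;		/* packet we want to write */

	unsigned int remain_usable = effective_size - emit;

	if (bytes > remain_usable) {
		unsigned int pad = size - emit;	/* fill to the end, then wrap */

		printf("pad %u bytes at 0x%x, wrap, restart emission at 0x0\n",
		       pad, emit);
		emit = 0;
	}
	printf("packet occupies [0x%x, 0x%x)\n", emit, emit + bytes);
	return 0;
}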
|
||||
|
||||
/* Align the ring tail to a cacheline boundary */
|
||||
int intel_ring_cacheline_align(struct i915_request *rq)
|
||||
{
|
||||
int num_dwords;
|
||||
void *cs;
|
||||
|
||||
num_dwords = (rq->ring->emit & (CACHELINE_BYTES - 1)) / sizeof(u32);
|
||||
if (num_dwords == 0)
|
||||
return 0;
|
||||
|
||||
num_dwords = CACHELINE_DWORDS - num_dwords;
|
||||
GEM_BUG_ON(num_dwords & 1);
|
||||
|
||||
cs = intel_ring_begin(rq, num_dwords);
|
||||
if (IS_ERR(cs))
|
||||
return PTR_ERR(cs);
|
||||
|
||||
memset64(cs, (u64)MI_NOOP << 32 | MI_NOOP, num_dwords / 2);
|
||||
intel_ring_advance(rq, cs + num_dwords);
|
||||
|
||||
GEM_BUG_ON(rq->ring->emit & (CACHELINE_BYTES - 1));
|
||||
return 0;
|
||||
}
|
131
drivers/gpu/drm/i915/gt/intel_ring.h
Normal file
@@ -0,0 +1,131 @@
|
||||
/*
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#ifndef INTEL_RING_H
|
||||
#define INTEL_RING_H
|
||||
|
||||
#include "i915_gem.h" /* GEM_BUG_ON */
|
||||
#include "i915_request.h"
|
||||
#include "intel_ring_types.h"
|
||||
|
||||
struct intel_engine_cs;
|
||||
|
||||
struct intel_ring *
|
||||
intel_engine_create_ring(struct intel_engine_cs *engine, int size);
|
||||
|
||||
u32 *intel_ring_begin(struct i915_request *rq, unsigned int num_dwords);
|
||||
int intel_ring_cacheline_align(struct i915_request *rq);
|
||||
|
||||
unsigned int intel_ring_update_space(struct intel_ring *ring);
|
||||
|
||||
int intel_ring_pin(struct intel_ring *ring);
|
||||
void intel_ring_unpin(struct intel_ring *ring);
|
||||
void intel_ring_reset(struct intel_ring *ring, u32 tail);
|
||||
|
||||
void intel_ring_free(struct kref *ref);
|
||||
|
||||
static inline struct intel_ring *intel_ring_get(struct intel_ring *ring)
|
||||
{
|
||||
kref_get(&ring->ref);
|
||||
return ring;
|
||||
}
|
||||
|
||||
static inline void intel_ring_put(struct intel_ring *ring)
|
||||
{
|
||||
kref_put(&ring->ref, intel_ring_free);
|
||||
}
|
||||
|
||||
static inline void intel_ring_advance(struct i915_request *rq, u32 *cs)
|
||||
{
|
||||
/* Dummy function.
|
||||
*
|
||||
* This serves as a placeholder in the code so that the reader
|
||||
* can compare against the preceding intel_ring_begin() and
|
||||
* check that the number of dwords emitted matches the space
|
||||
* reserved for the command packet (i.e. the value passed to
|
||||
* intel_ring_begin()).
|
||||
*/
|
||||
GEM_BUG_ON((rq->ring->vaddr + rq->ring->emit) != cs);
|
||||
}
|
||||
|
||||
static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
|
||||
{
|
||||
return pos & (ring->size - 1);
|
||||
}
|
||||
|
||||
static inline bool
|
||||
intel_ring_offset_valid(const struct intel_ring *ring,
|
||||
unsigned int pos)
|
||||
{
|
||||
if (pos & -ring->size) /* must be strictly within the ring */
|
||||
return false;
|
||||
|
||||
if (!IS_ALIGNED(pos, 8)) /* must be qword aligned */
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline u32 intel_ring_offset(const struct i915_request *rq, void *addr)
|
||||
{
|
||||
/* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
|
||||
u32 offset = addr - rq->ring->vaddr;
|
||||
GEM_BUG_ON(offset > rq->ring->size);
|
||||
return intel_ring_wrap(rq->ring, offset);
|
||||
}
|
||||
|
||||
static inline void
|
||||
assert_ring_tail_valid(const struct intel_ring *ring, unsigned int tail)
|
||||
{
|
||||
GEM_BUG_ON(!intel_ring_offset_valid(ring, tail));
|
||||
|
||||
/*
|
||||
* "Ring Buffer Use"
|
||||
* Gen2 BSpec "1. Programming Environment" / 1.4.4.6
|
||||
* Gen3 BSpec "1c Memory Interface Functions" / 2.3.4.5
|
||||
* Gen4+ BSpec "1c Memory Interface and Command Stream" / 5.3.4.5
|
||||
* "If the Ring Buffer Head Pointer and the Tail Pointer are on the
|
||||
* same cacheline, the Head Pointer must not be greater than the Tail
|
||||
* Pointer."
|
||||
*
|
||||
* We use ring->head as the last known location of the actual RING_HEAD,
|
||||
* it may have advanced but in the worst case it is equally the same
|
||||
* as ring->head and so we should never program RING_TAIL to advance
|
||||
* into the same cacheline as ring->head.
|
||||
*/
|
||||
#define cacheline(a) round_down(a, CACHELINE_BYTES)
|
||||
GEM_BUG_ON(cacheline(tail) == cacheline(ring->head) &&
|
||||
tail < ring->head);
|
||||
#undef cacheline
|
||||
}
|
||||
|
||||
static inline unsigned int
|
||||
intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
|
||||
{
|
||||
/* Whilst writes to the tail are strictly order, there is no
|
||||
* serialisation between readers and the writers. The tail may be
|
||||
* read by i915_request_retire() just as it is being updated
|
||||
* by execlists, as although the breadcrumb is complete, the context
|
||||
* switch hasn't been seen.
|
||||
*/
|
||||
assert_ring_tail_valid(ring, tail);
|
||||
ring->tail = tail;
|
||||
return tail;
|
||||
}
|
||||
|
||||
static inline unsigned int
|
||||
__intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
|
||||
{
|
||||
/*
|
||||
* "If the Ring Buffer Head Pointer and the Tail Pointer are on the
|
||||
* same cacheline, the Head Pointer must not be greater than the Tail
|
||||
* Pointer."
|
||||
*/
|
||||
GEM_BUG_ON(!is_power_of_2(size));
|
||||
return (head - tail - CACHELINE_BYTES) & (size - 1);
|
||||
}
|
||||
|
||||
#endif /* INTEL_RING_H */
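The free-space rule that __intel_ring_space() encodes above (a power-of-two ring, with RING_TAIL always kept at least one cacheline away from the last known RING_HEAD) can be checked in isolation. A standalone sketch of the same arithmetic; the sample head/tail/size values are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

#define CACHELINE_BYTES 64

/* Mirror of __intel_ring_space(): free bytes between tail and head,
 * keeping one cacheline of slack so TAIL never lands on HEAD's line. */
static unsigned int ring_space(unsigned int head, unsigned int tail,
			       unsigned int size)
{
	assert(size && (size & (size - 1)) == 0); /* power of two, as the GEM_BUG_ON insists */
	return (head - tail - CACHELINE_BYTES) & (size - 1);
}

int main(void)
{
	/* 4 KiB ring, consumer (head) at 256, producer (tail) at 1024:
	 * 4096 - (1024 - 256) - 64 = 3264 bytes usable. */
	printf("%u\n", ring_space(256, 1024, 4096));

	/* An empty ring (head == tail) reports size - 64, never the full size. */
	printf("%u\n", ring_space(0, 0, 4096));
	return 0;
}
```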
@ -40,6 +40,7 @@
|
||||
#include "intel_gt_irq.h"
|
||||
#include "intel_gt_pm_irq.h"
|
||||
#include "intel_reset.h"
|
||||
#include "intel_ring.h"
|
||||
#include "intel_workarounds.h"
|
||||
|
||||
/* Rough estimate of the typical request size, performing a flush,
|
||||
@ -47,16 +48,6 @@
|
||||
*/
|
||||
#define LEGACY_REQUEST_SIZE 200
|
||||
|
||||
unsigned int intel_ring_update_space(struct intel_ring *ring)
|
||||
{
|
||||
unsigned int space;
|
||||
|
||||
space = __intel_ring_space(ring->head, ring->emit, ring->size);
|
||||
|
||||
ring->space = space;
|
||||
return space;
|
||||
}
|
||||
|
||||
static int
|
||||
gen2_render_ring_flush(struct i915_request *rq, u32 mode)
|
||||
{
|
||||
@ -1186,162 +1177,6 @@ i915_emit_bb_start(struct i915_request *rq,
|
||||
return 0;
|
||||
}
|
||||
|
||||
int intel_ring_pin(struct intel_ring *ring)
|
||||
{
|
||||
struct i915_vma *vma = ring->vma;
|
||||
unsigned int flags;
|
||||
void *addr;
|
||||
int ret;
|
||||
|
||||
if (atomic_fetch_inc(&ring->pin_count))
|
||||
return 0;
|
||||
|
||||
flags = PIN_GLOBAL;
|
||||
|
||||
/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
|
||||
flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
|
||||
|
||||
if (vma->obj->stolen)
|
||||
flags |= PIN_MAPPABLE;
|
||||
else
|
||||
flags |= PIN_HIGH;
|
||||
|
||||
ret = i915_vma_pin(vma, 0, 0, flags);
|
||||
if (unlikely(ret))
|
||||
goto err_unpin;
|
||||
|
||||
if (i915_vma_is_map_and_fenceable(vma))
|
||||
addr = (void __force *)i915_vma_pin_iomap(vma);
|
||||
else
|
||||
addr = i915_gem_object_pin_map(vma->obj,
|
||||
i915_coherent_map_type(vma->vm->i915));
|
||||
if (IS_ERR(addr)) {
|
||||
ret = PTR_ERR(addr);
|
||||
goto err_ring;
|
||||
}
|
||||
|
||||
i915_vma_make_unshrinkable(vma);
|
||||
|
||||
GEM_BUG_ON(ring->vaddr);
|
||||
ring->vaddr = addr;
|
||||
|
||||
return 0;
|
||||
|
||||
err_ring:
|
||||
i915_vma_unpin(vma);
|
||||
err_unpin:
|
||||
atomic_dec(&ring->pin_count);
|
||||
return ret;
|
||||
}
|
||||
|
||||
void intel_ring_reset(struct intel_ring *ring, u32 tail)
|
||||
{
|
||||
tail = intel_ring_wrap(ring, tail);
|
||||
ring->tail = tail;
|
||||
ring->head = tail;
|
||||
ring->emit = tail;
|
||||
intel_ring_update_space(ring);
|
||||
}
|
||||
|
||||
void intel_ring_unpin(struct intel_ring *ring)
|
||||
{
|
||||
struct i915_vma *vma = ring->vma;
|
||||
|
||||
if (!atomic_dec_and_test(&ring->pin_count))
|
||||
return;
|
||||
|
||||
/* Discard any unused bytes beyond that submitted to hw. */
|
||||
intel_ring_reset(ring, ring->emit);
|
||||
|
||||
i915_vma_unset_ggtt_write(vma);
|
||||
if (i915_vma_is_map_and_fenceable(vma))
|
||||
i915_vma_unpin_iomap(vma);
|
||||
else
|
||||
i915_gem_object_unpin_map(vma->obj);
|
||||
|
||||
GEM_BUG_ON(!ring->vaddr);
|
||||
ring->vaddr = NULL;
|
||||
|
||||
i915_vma_unpin(vma);
|
||||
i915_vma_make_purgeable(vma);
|
||||
}
|
||||
|
||||
static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
|
||||
{
|
||||
struct i915_address_space *vm = &ggtt->vm;
|
||||
struct drm_i915_private *i915 = vm->i915;
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct i915_vma *vma;
|
||||
|
||||
obj = i915_gem_object_create_stolen(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
obj = i915_gem_object_create_internal(i915, size);
|
||||
if (IS_ERR(obj))
|
||||
return ERR_CAST(obj);
|
||||
|
||||
/*
|
||||
* Mark ring buffers as read-only from GPU side (so no stray overwrites)
|
||||
* if supported by the platform's GGTT.
|
||||
*/
|
||||
if (vm->has_read_only)
|
||||
i915_gem_object_set_readonly(obj);
|
||||
|
||||
vma = i915_vma_instance(obj, vm, NULL);
|
||||
if (IS_ERR(vma))
|
||||
goto err;
|
||||
|
||||
return vma;
|
||||
|
||||
err:
|
||||
i915_gem_object_put(obj);
|
||||
return vma;
|
||||
}
|
||||
|
||||
struct intel_ring *
|
||||
intel_engine_create_ring(struct intel_engine_cs *engine, int size)
|
||||
{
|
||||
struct drm_i915_private *i915 = engine->i915;
|
||||
struct intel_ring *ring;
|
||||
struct i915_vma *vma;
|
||||
|
||||
GEM_BUG_ON(!is_power_of_2(size));
|
||||
GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES);
|
||||
|
||||
ring = kzalloc(sizeof(*ring), GFP_KERNEL);
|
||||
if (!ring)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
kref_init(&ring->ref);
|
||||
|
||||
ring->size = size;
|
||||
/* Workaround an erratum on the i830 which causes a hang if
|
||||
* the TAIL pointer points to within the last 2 cachelines
|
||||
* of the buffer.
|
||||
*/
|
||||
ring->effective_size = size;
|
||||
if (IS_I830(i915) || IS_I845G(i915))
|
||||
ring->effective_size -= 2 * CACHELINE_BYTES;
|
||||
|
||||
intel_ring_update_space(ring);
|
||||
|
||||
vma = create_ring_vma(engine->gt->ggtt, size);
|
||||
if (IS_ERR(vma)) {
|
||||
kfree(ring);
|
||||
return ERR_CAST(vma);
|
||||
}
|
||||
ring->vma = vma;
|
||||
|
||||
return ring;
|
||||
}
|
||||
|
||||
void intel_ring_free(struct kref *ref)
|
||||
{
|
||||
struct intel_ring *ring = container_of(ref, typeof(*ring), ref);
|
||||
|
||||
i915_vma_put(ring->vma);
|
||||
kfree(ring);
|
||||
}
|
||||
|
||||
static void __ring_context_fini(struct intel_context *ce)
|
||||
{
|
||||
i915_vma_put(ce->state);
|
||||
@ -1836,148 +1671,6 @@ static int ring_request_alloc(struct i915_request *request)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static noinline int
|
||||
wait_for_space(struct intel_ring *ring,
|
||||
struct intel_timeline *tl,
|
||||
unsigned int bytes)
|
||||
{
|
||||
struct i915_request *target;
|
||||
long timeout;
|
||||
|
||||
if (intel_ring_update_space(ring) >= bytes)
|
||||
return 0;
|
||||
|
||||
GEM_BUG_ON(list_empty(&tl->requests));
|
||||
list_for_each_entry(target, &tl->requests, link) {
|
||||
if (target->ring != ring)
|
||||
continue;
|
||||
|
||||
/* Would completion of this request free enough space? */
|
||||
if (bytes <= __intel_ring_space(target->postfix,
|
||||
ring->emit, ring->size))
|
||||
break;
|
||||
}
|
||||
|
||||
if (GEM_WARN_ON(&target->link == &tl->requests))
|
||||
return -ENOSPC;
|
||||
|
||||
timeout = i915_request_wait(target,
|
||||
I915_WAIT_INTERRUPTIBLE,
|
||||
MAX_SCHEDULE_TIMEOUT);
|
||||
if (timeout < 0)
|
||||
return timeout;
|
||||
|
||||
i915_request_retire_upto(target);
|
||||
|
||||
intel_ring_update_space(ring);
|
||||
GEM_BUG_ON(ring->space < bytes);
|
||||
return 0;
|
||||
}
|
||||
|
||||
u32 *intel_ring_begin(struct i915_request *rq, unsigned int num_dwords)
|
||||
{
|
||||
struct intel_ring *ring = rq->ring;
|
||||
const unsigned int remain_usable = ring->effective_size - ring->emit;
|
||||
const unsigned int bytes = num_dwords * sizeof(u32);
|
||||
unsigned int need_wrap = 0;
|
||||
unsigned int total_bytes;
|
||||
u32 *cs;
|
||||
|
||||
/* Packets must be qword aligned. */
|
||||
GEM_BUG_ON(num_dwords & 1);
|
||||
|
||||
total_bytes = bytes + rq->reserved_space;
|
||||
GEM_BUG_ON(total_bytes > ring->effective_size);
|
||||
|
||||
if (unlikely(total_bytes > remain_usable)) {
|
||||
const int remain_actual = ring->size - ring->emit;
|
||||
|
||||
if (bytes > remain_usable) {
|
||||
/*
|
||||
* Not enough space for the basic request. So need to
|
||||
* flush out the remainder and then wait for
|
||||
* base + reserved.
|
||||
*/
|
||||
total_bytes += remain_actual;
|
||||
need_wrap = remain_actual | 1;
|
||||
} else {
|
||||
/*
|
||||
* The base request will fit but the reserved space
|
||||
* falls off the end. So we don't need an immediate
|
||||
* wrap and only need to effectively wait for the
|
||||
* reserved size from the start of ringbuffer.
|
||||
*/
|
||||
total_bytes = rq->reserved_space + remain_actual;
|
||||
}
|
||||
}
|
||||
|
||||
if (unlikely(total_bytes > ring->space)) {
|
||||
int ret;
|
||||
|
||||
/*
|
||||
* Space is reserved in the ringbuffer for finalising the
|
||||
* request, as that cannot be allowed to fail. During request
|
||||
* finalisation, reserved_space is set to 0 to stop the
|
||||
* overallocation and the assumption is that then we never need
|
||||
* to wait (which has the risk of failing with EINTR).
|
||||
*
|
||||
* See also i915_request_alloc() and i915_request_add().
|
||||
*/
|
||||
GEM_BUG_ON(!rq->reserved_space);
|
||||
|
||||
ret = wait_for_space(ring,
|
||||
i915_request_timeline(rq),
|
||||
total_bytes);
|
||||
if (unlikely(ret))
|
||||
return ERR_PTR(ret);
|
||||
}
|
||||
|
||||
if (unlikely(need_wrap)) {
|
||||
need_wrap &= ~1;
|
||||
GEM_BUG_ON(need_wrap > ring->space);
|
||||
GEM_BUG_ON(ring->emit + need_wrap > ring->size);
|
||||
GEM_BUG_ON(!IS_ALIGNED(need_wrap, sizeof(u64)));
|
||||
|
||||
/* Fill the tail with MI_NOOP */
|
||||
memset64(ring->vaddr + ring->emit, 0, need_wrap / sizeof(u64));
|
||||
ring->space -= need_wrap;
|
||||
ring->emit = 0;
|
||||
}
|
||||
|
||||
GEM_BUG_ON(ring->emit > ring->size - bytes);
|
||||
GEM_BUG_ON(ring->space < bytes);
|
||||
cs = ring->vaddr + ring->emit;
|
||||
GEM_DEBUG_EXEC(memset32(cs, POISON_INUSE, bytes / sizeof(*cs)));
|
||||
ring->emit += bytes;
|
||||
ring->space -= bytes;
|
||||
|
||||
return cs;
|
||||
}
|
||||
|
||||
/* Align the ring tail to a cacheline boundary */
|
||||
int intel_ring_cacheline_align(struct i915_request *rq)
|
||||
{
|
||||
int num_dwords;
|
||||
void *cs;
|
||||
|
||||
num_dwords = (rq->ring->emit & (CACHELINE_BYTES - 1)) / sizeof(u32);
|
||||
if (num_dwords == 0)
|
||||
return 0;
|
||||
|
||||
num_dwords = CACHELINE_DWORDS - num_dwords;
|
||||
GEM_BUG_ON(num_dwords & 1);
|
||||
|
||||
cs = intel_ring_begin(rq, num_dwords);
|
||||
if (IS_ERR(cs))
|
||||
return PTR_ERR(cs);
|
||||
|
||||
memset64(cs, (u64)MI_NOOP << 32 | MI_NOOP, num_dwords / 2);
|
||||
intel_ring_advance(rq, cs);
|
||||
|
||||
GEM_BUG_ON(rq->ring->emit & (CACHELINE_BYTES - 1));
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void gen6_bsd_submit_request(struct i915_request *request)
|
||||
{
|
||||
struct intel_uncore *uncore = request->engine->uncore;
51
drivers/gpu/drm/i915/gt/intel_ring_types.h
Normal file
@@ -0,0 +1,51 @@
/*
 * SPDX-License-Identifier: MIT
 *
 * Copyright © 2019 Intel Corporation
 */

#ifndef INTEL_RING_TYPES_H
#define INTEL_RING_TYPES_H

#include <linux/atomic.h>
#include <linux/kref.h>
#include <linux/types.h>

/*
 * Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
 * but keeps the logic simple. Indeed, the whole purpose of this macro is just
 * to give some inclination as to some of the magic values used in the various
 * workarounds!
 */
#define CACHELINE_BYTES 64
#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(u32))

struct i915_vma;

struct intel_ring {
	struct kref ref;
	struct i915_vma *vma;
	void *vaddr;

	/*
	 * As we have two types of rings, one global to the engine used
	 * by ringbuffer submission and those that are exclusive to a
	 * context used by execlists, we have to play safe and allow
	 * atomic updates to the pin_count. However, the actual pinning
	 * of the context is either done during initialisation for
	 * ringbuffer submission or serialised as part of the context
	 * pinning for execlists, and so we do not need a mutex ourselves
	 * to serialise intel_ring_pin/intel_ring_unpin.
	 */
	atomic_t pin_count;

	u32 head;
	u32 tail;
	u32 emit;

	u32 space;
	u32 size;
	u32 effective_size;
};

#endif /* INTEL_RING_TYPES_H */
1872
drivers/gpu/drm/i915/gt/intel_rps.c
Normal file
File diff suppressed because it is too large
38
drivers/gpu/drm/i915/gt/intel_rps.h
Normal file
@@ -0,0 +1,38 @@
/*
 * SPDX-License-Identifier: MIT
 *
 * Copyright © 2019 Intel Corporation
 */

#ifndef INTEL_RPS_H
#define INTEL_RPS_H

#include "intel_rps_types.h"

struct i915_request;

void intel_rps_init_early(struct intel_rps *rps);
void intel_rps_init(struct intel_rps *rps);

void intel_rps_driver_register(struct intel_rps *rps);
void intel_rps_driver_unregister(struct intel_rps *rps);

void intel_rps_enable(struct intel_rps *rps);
void intel_rps_disable(struct intel_rps *rps);

void intel_rps_park(struct intel_rps *rps);
void intel_rps_unpark(struct intel_rps *rps);
void intel_rps_boost(struct i915_request *rq);

int intel_rps_set(struct intel_rps *rps, u8 val);
void intel_rps_mark_interactive(struct intel_rps *rps, bool interactive);

int intel_gpu_freq(struct intel_rps *rps, int val);
int intel_freq_opcode(struct intel_rps *rps, int val);
u32 intel_get_cagf(struct intel_rps *rps, u32 rpstat1);

void gen5_rps_irq_handler(struct intel_rps *rps);
void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir);
void gen11_rps_irq_handler(struct intel_rps *rps, u32 pm_iir);

#endif /* INTEL_RPS_H */
93
drivers/gpu/drm/i915/gt/intel_rps_types.h
Normal file
@@ -0,0 +1,93 @@
|
||||
/*
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Copyright © 2019 Intel Corporation
|
||||
*/
|
||||
|
||||
#ifndef INTEL_RPS_TYPES_H
|
||||
#define INTEL_RPS_TYPES_H
|
||||
|
||||
#include <linux/atomic.h>
|
||||
#include <linux/ktime.h>
|
||||
#include <linux/mutex.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/workqueue.h>
|
||||
|
||||
struct intel_ips {
|
||||
u64 last_count1;
|
||||
unsigned long last_time1;
|
||||
unsigned long chipset_power;
|
||||
u64 last_count2;
|
||||
u64 last_time2;
|
||||
unsigned long gfx_power;
|
||||
u8 corr;
|
||||
|
||||
int c, m;
|
||||
};
|
||||
|
||||
struct intel_rps_ei {
|
||||
ktime_t ktime;
|
||||
u32 render_c0;
|
||||
u32 media_c0;
|
||||
};
|
||||
|
||||
struct intel_rps {
|
||||
struct mutex lock; /* protects enabling and the worker */
|
||||
|
||||
/*
|
||||
* work, interrupts_enabled and pm_iir are protected by
|
||||
* dev_priv->irq_lock
|
||||
*/
|
||||
struct work_struct work;
|
||||
bool enabled;
|
||||
bool active;
|
||||
u32 pm_iir;
|
||||
|
||||
/* PM interrupt bits that should never be masked */
|
||||
u32 pm_intrmsk_mbz;
|
||||
u32 pm_events;
|
||||
|
||||
/* Frequencies are stored in potentially platform dependent multiples.
|
||||
* In other words, *_freq needs to be multiplied by X to be interesting.
|
||||
* Soft limits are those which are used for the dynamic reclocking done
|
||||
* by the driver (raise frequencies under heavy loads, and lower for
|
||||
* lighter loads). Hard limits are those imposed by the hardware.
|
||||
*
|
||||
* A distinction is made for overclocking, which is never enabled by
|
||||
* default, and is considered to be above the hard limit if it's
|
||||
* possible at all.
|
||||
*/
|
||||
u8 cur_freq; /* Current frequency (cached, may not == HW) */
|
||||
u8 last_freq; /* Last SWREQ frequency */
|
||||
u8 min_freq_softlimit; /* Minimum frequency permitted by the driver */
|
||||
u8 max_freq_softlimit; /* Max frequency permitted by the driver */
|
||||
u8 max_freq; /* Maximum frequency, RP0 if not overclocking */
|
||||
u8 min_freq; /* AKA RPn. Minimum frequency */
|
||||
u8 boost_freq; /* Frequency to request when wait boosting */
|
||||
u8 idle_freq; /* Frequency to request when we are idle */
|
||||
u8 efficient_freq; /* AKA RPe. Pre-determined balanced frequency */
|
||||
u8 rp1_freq; /* "less than" RP0 power/freqency */
|
||||
u8 rp0_freq; /* Non-overclocked max frequency. */
|
||||
u16 gpll_ref_freq; /* vlv/chv GPLL reference frequency */
|
||||
|
||||
int last_adj;
|
||||
|
||||
struct {
|
||||
struct mutex mutex;
|
||||
|
||||
enum { LOW_POWER, BETWEEN, HIGH_POWER } mode;
|
||||
unsigned int interactive;
|
||||
|
||||
u8 up_threshold; /* Current %busy required to uplock */
|
||||
u8 down_threshold; /* Current %busy required to downclock */
|
||||
} power;
|
||||
|
||||
atomic_t num_waiters;
|
||||
atomic_t boosts;
|
||||
|
||||
/* manual wa residency calculations */
|
||||
struct intel_rps_ei ei;
|
||||
struct intel_ips ips;
|
||||
};
|
||||
|
||||
#endif /* INTEL_RPS_TYPES_H */
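struct intel_rps above stores its frequencies as opaque, platform-dependent hardware units rather than MHz; intel_gpu_freq() and intel_freq_opcode() do the conversion in the driver. A toy model of that split, assuming a flat 50 MHz-per-unit encoding purely for illustration (the real multiplier varies by platform, e.g. the GEN9_FREQ_SCALER factor used in the ring-frequency selftest further down):

```c
#include <stdio.h>

/* Hypothetical 50 MHz-per-unit encoding; the driver's multiplier is
 * platform dependent and this value is an assumption for the sketch. */
#define FREQ_UNIT_MHZ 50

struct toy_rps {
	unsigned char min_freq;	/* AKA RPn, minimum frequency */
	unsigned char rp1_freq;	/* "less than" RP0 power/frequency */
	unsigned char rp0_freq;	/* non-overclocked maximum */
};

static unsigned int toy_gpu_freq(unsigned char units)
{
	return units * FREQ_UNIT_MHZ;	/* encoded units -> MHz */
}

int main(void)
{
	struct toy_rps rps = { .min_freq = 7, .rp1_freq = 12, .rp0_freq = 23 };

	printf("RPn %u MHz, RP1 %u MHz, RP0 %u MHz\n",
	       toy_gpu_freq(rps.min_freq),
	       toy_gpu_freq(rps.rp1_freq),
	       toy_gpu_freq(rps.rp0_freq));
	return 0;
}
```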
|
@ -4,13 +4,13 @@
|
||||
* Copyright © 2016-2018 Intel Corporation
|
||||
*/
|
||||
|
||||
#include "gt/intel_gt_types.h"
|
||||
|
||||
#include "i915_drv.h"
|
||||
|
||||
#include "i915_active.h"
|
||||
#include "i915_syncmap.h"
|
||||
#include "gt/intel_timeline.h"
|
||||
#include "intel_gt.h"
|
||||
#include "intel_ring.h"
|
||||
#include "intel_timeline.h"
|
||||
|
||||
#define ptr_set_bit(ptr, bit) ((typeof(ptr))((unsigned long)(ptr) | BIT(bit)))
|
||||
#define ptr_test_bit(ptr, bit) ((unsigned long)(ptr) & BIT(bit))
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include "i915_drv.h"
|
||||
#include "intel_context.h"
|
||||
#include "intel_gt.h"
|
||||
#include "intel_ring.h"
|
||||
#include "intel_workarounds.h"
|
||||
|
||||
/**
|
||||
@ -1215,6 +1216,26 @@ static void icl_whitelist_build(struct intel_engine_cs *engine)
|
||||
|
||||
static void tgl_whitelist_build(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct i915_wa_list *w = &engine->whitelist;
|
||||
|
||||
switch (engine->class) {
|
||||
case RENDER_CLASS:
|
||||
/*
|
||||
* WaAllowPMDepthAndInvocationCountAccessFromUMD:tgl
|
||||
*
|
||||
* This covers 4 registers which are next to one another :
|
||||
* - PS_INVOCATION_COUNT
|
||||
* - PS_INVOCATION_COUNT_UDW
|
||||
* - PS_DEPTH_COUNT
|
||||
* - PS_DEPTH_COUNT_UDW
|
||||
*/
|
||||
whitelist_reg_ext(w, PS_INVOCATION_COUNT,
|
||||
RING_FORCE_TO_NONPRIV_ACCESS_RD |
|
||||
RING_FORCE_TO_NONPRIV_RANGE_4);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
void intel_engine_init_whitelist(struct intel_engine_cs *engine)
|
||||
|
@ -23,6 +23,7 @@
|
||||
*/
|
||||
|
||||
#include "gem/i915_gem_context.h"
|
||||
#include "gt/intel_ring.h"
|
||||
|
||||
#include "i915_drv.h"
|
||||
#include "intel_context.h"
|
||||
|
@ -103,9 +103,6 @@ static int __live_context_size(struct intel_engine_cs *engine,
|
||||
*
|
||||
* TLDR; this overlaps with the execlists redzone.
|
||||
*/
|
||||
if (HAS_EXECLISTS(engine->i915))
|
||||
vaddr += LRC_HEADER_PAGES * PAGE_SIZE;
|
||||
|
||||
vaddr += engine->context_size - I915_GTT_PAGE_SIZE;
|
||||
memset(vaddr, POISON_INUSE, I915_GTT_PAGE_SIZE);
|
||||
|
||||
|
350
drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
Normal file
@@ -0,0 +1,350 @@
|
||||
/*
|
||||
* SPDX-License-Identifier: MIT
|
||||
*
|
||||
* Copyright © 2018 Intel Corporation
|
||||
*/
|
||||
|
||||
#include <linux/sort.h>
|
||||
|
||||
#include "i915_drv.h"
|
||||
|
||||
#include "intel_gt_requests.h"
|
||||
#include "i915_selftest.h"
|
||||
|
||||
struct pulse {
|
||||
struct i915_active active;
|
||||
struct kref kref;
|
||||
};
|
||||
|
||||
static int pulse_active(struct i915_active *active)
|
||||
{
|
||||
kref_get(&container_of(active, struct pulse, active)->kref);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void pulse_free(struct kref *kref)
|
||||
{
|
||||
kfree(container_of(kref, struct pulse, kref));
|
||||
}
|
||||
|
||||
static void pulse_put(struct pulse *p)
|
||||
{
|
||||
kref_put(&p->kref, pulse_free);
|
||||
}
|
||||
|
||||
static void pulse_retire(struct i915_active *active)
|
||||
{
|
||||
pulse_put(container_of(active, struct pulse, active));
|
||||
}
|
||||
|
||||
static struct pulse *pulse_create(void)
|
||||
{
|
||||
struct pulse *p;
|
||||
|
||||
p = kmalloc(sizeof(*p), GFP_KERNEL);
|
||||
if (!p)
|
||||
return p;
|
||||
|
||||
kref_init(&p->kref);
|
||||
i915_active_init(&p->active, pulse_active, pulse_retire);
|
||||
|
||||
return p;
|
||||
}
|
||||
|
||||
static void pulse_unlock_wait(struct pulse *p)
|
||||
{
|
||||
mutex_lock(&p->active.mutex);
|
||||
mutex_unlock(&p->active.mutex);
|
||||
flush_work(&p->active.work);
|
||||
}
|
||||
|
||||
static int __live_idle_pulse(struct intel_engine_cs *engine,
|
||||
int (*fn)(struct intel_engine_cs *cs))
|
||||
{
|
||||
struct pulse *p;
|
||||
int err;
|
||||
|
||||
GEM_BUG_ON(!intel_engine_pm_is_awake(engine));
|
||||
|
||||
p = pulse_create();
|
||||
if (!p)
|
||||
return -ENOMEM;
|
||||
|
||||
err = i915_active_acquire(&p->active);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = i915_active_acquire_preallocate_barrier(&p->active, engine);
|
||||
if (err) {
|
||||
i915_active_release(&p->active);
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_active_acquire_barrier(&p->active);
|
||||
i915_active_release(&p->active);
|
||||
|
||||
GEM_BUG_ON(i915_active_is_idle(&p->active));
|
||||
GEM_BUG_ON(llist_empty(&engine->barrier_tasks));
|
||||
|
||||
err = fn(engine);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
|
||||
|
||||
if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) {
|
||||
err = -ETIME;
|
||||
goto out;
|
||||
}
|
||||
|
||||
GEM_BUG_ON(READ_ONCE(engine->serial) != engine->wakeref_serial);
|
||||
|
||||
pulse_unlock_wait(p); /* synchronize with the retirement callback */
|
||||
|
||||
if (!i915_active_is_idle(&p->active)) {
|
||||
struct drm_printer m = drm_err_printer("pulse");
|
||||
|
||||
pr_err("%s: heartbeat pulse did not flush idle tasks\n",
|
||||
engine->name);
|
||||
i915_active_print(&p->active, &m);
|
||||
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
pulse_put(p);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_idle_flush(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = 0;
|
||||
|
||||
/* Check that we can flush the idle barriers */
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
intel_engine_pm_get(engine);
|
||||
err = __live_idle_pulse(engine, intel_engine_flush_barriers);
|
||||
intel_engine_pm_put(engine);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_idle_pulse(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = 0;
|
||||
|
||||
/* Check that heartbeat pulses flush the idle barriers */
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
intel_engine_pm_get(engine);
|
||||
err = __live_idle_pulse(engine, intel_engine_pulse);
|
||||
intel_engine_pm_put(engine);
|
||||
if (err && err != -ENODEV)
|
||||
break;
|
||||
|
||||
err = 0;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static int cmp_u32(const void *_a, const void *_b)
|
||||
{
|
||||
const u32 *a = _a, *b = _b;
|
||||
|
||||
return *a - *b;
|
||||
}
|
||||
|
||||
static int __live_heartbeat_fast(struct intel_engine_cs *engine)
|
||||
{
|
||||
struct intel_context *ce;
|
||||
struct i915_request *rq;
|
||||
ktime_t t0, t1;
|
||||
u32 times[5];
|
||||
int err;
|
||||
int i;
|
||||
|
||||
ce = intel_context_create(engine->kernel_context->gem_context,
|
||||
engine);
|
||||
if (IS_ERR(ce))
|
||||
return PTR_ERR(ce);
|
||||
|
||||
intel_engine_pm_get(engine);
|
||||
|
||||
err = intel_engine_set_heartbeat(engine, 1);
|
||||
if (err)
|
||||
goto err_pm;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(times); i++) {
|
||||
/* Manufacture a tick */
|
||||
do {
|
||||
while (READ_ONCE(engine->heartbeat.systole))
|
||||
flush_delayed_work(&engine->heartbeat.work);
|
||||
|
||||
engine->serial++; /* quick, pretend we are not idle! */
|
||||
flush_delayed_work(&engine->heartbeat.work);
|
||||
if (!delayed_work_pending(&engine->heartbeat.work)) {
|
||||
pr_err("%s: heartbeat did not start\n",
|
||||
engine->name);
|
||||
err = -EINVAL;
|
||||
goto err_pm;
|
||||
}
|
||||
|
||||
rcu_read_lock();
|
||||
rq = READ_ONCE(engine->heartbeat.systole);
|
||||
if (rq)
|
||||
rq = i915_request_get_rcu(rq);
|
||||
rcu_read_unlock();
|
||||
} while (!rq);
|
||||
|
||||
t0 = ktime_get();
|
||||
while (rq == READ_ONCE(engine->heartbeat.systole))
|
||||
yield(); /* work is on the local cpu! */
|
||||
t1 = ktime_get();
|
||||
|
||||
i915_request_put(rq);
|
||||
times[i] = ktime_us_delta(t1, t0);
|
||||
}
|
||||
|
||||
sort(times, ARRAY_SIZE(times), sizeof(times[0]), cmp_u32, NULL);
|
||||
|
||||
pr_info("%s: Heartbeat delay: %uus [%u, %u]\n",
|
||||
engine->name,
|
||||
times[ARRAY_SIZE(times) / 2],
|
||||
times[0],
|
||||
times[ARRAY_SIZE(times) - 1]);
|
||||
|
||||
/* Min work delay is 2 * 2 (worst), +1 for scheduling, +1 for slack */
|
||||
if (times[ARRAY_SIZE(times) / 2] > jiffies_to_usecs(6)) {
|
||||
pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n",
|
||||
engine->name,
|
||||
times[ARRAY_SIZE(times) / 2],
|
||||
jiffies_to_usecs(6));
|
||||
err = -EINVAL;
|
||||
}
|
||||
|
||||
intel_engine_set_heartbeat(engine, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
|
||||
err_pm:
|
||||
intel_engine_pm_put(engine);
|
||||
intel_context_put(ce);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_heartbeat_fast(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = 0;
|
||||
|
||||
/* Check that the heartbeat ticks at the desired rate. */
|
||||
if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL)
|
||||
return 0;
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
err = __live_heartbeat_fast(engine);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __live_heartbeat_off(struct intel_engine_cs *engine)
|
||||
{
|
||||
int err;
|
||||
|
||||
intel_engine_pm_get(engine);
|
||||
|
||||
engine->serial++;
|
||||
flush_delayed_work(&engine->heartbeat.work);
|
||||
if (!delayed_work_pending(&engine->heartbeat.work)) {
|
||||
pr_err("%s: heartbeat not running\n",
|
||||
engine->name);
|
||||
err = -EINVAL;
|
||||
goto err_pm;
|
||||
}
|
||||
|
||||
err = intel_engine_set_heartbeat(engine, 0);
|
||||
if (err)
|
||||
goto err_pm;
|
||||
|
||||
engine->serial++;
|
||||
flush_delayed_work(&engine->heartbeat.work);
|
||||
if (delayed_work_pending(&engine->heartbeat.work)) {
|
||||
pr_err("%s: heartbeat still running\n",
|
||||
engine->name);
|
||||
err = -EINVAL;
|
||||
goto err_beat;
|
||||
}
|
||||
|
||||
if (READ_ONCE(engine->heartbeat.systole)) {
|
||||
pr_err("%s: heartbeat still allocated\n",
|
||||
engine->name);
|
||||
err = -EINVAL;
|
||||
goto err_beat;
|
||||
}
|
||||
|
||||
err_beat:
|
||||
intel_engine_set_heartbeat(engine, CONFIG_DRM_I915_HEARTBEAT_INTERVAL);
|
||||
err_pm:
|
||||
intel_engine_pm_put(engine);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_heartbeat_off(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = 0;
|
||||
|
||||
/* Check that we can turn off heartbeat and not interrupt VIP */
|
||||
if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL)
|
||||
return 0;
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
if (!intel_engine_has_preemption(engine))
|
||||
continue;
|
||||
|
||||
err = __live_heartbeat_off(engine);
|
||||
if (err)
|
||||
break;
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
int intel_heartbeat_live_selftests(struct drm_i915_private *i915)
|
||||
{
|
||||
static const struct i915_subtest tests[] = {
|
||||
SUBTEST(live_idle_flush),
|
||||
SUBTEST(live_idle_pulse),
|
||||
SUBTEST(live_heartbeat_fast),
|
||||
SUBTEST(live_heartbeat_off),
|
||||
};
|
||||
int saved_hangcheck;
|
||||
int err;
|
||||
|
||||
if (intel_gt_is_wedged(&i915->gt))
|
||||
return 0;
|
||||
|
||||
saved_hangcheck = i915_modparams.enable_hangcheck;
|
||||
i915_modparams.enable_hangcheck = INT_MAX;
|
||||
|
||||
err = intel_gt_live_subtests(tests, &i915->gt);
|
||||
|
||||
i915_modparams.enable_hangcheck = saved_hangcheck;
|
||||
return err;
|
||||
}
|
@ -826,6 +826,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
|
||||
get_task_struct(tsk);
|
||||
}
|
||||
|
||||
yield(); /* start all threads before we begin */
|
||||
|
||||
intel_engine_pm_get(engine);
|
||||
set_bit(I915_RESET_ENGINE + id, >->reset.flags);
|
||||
do {
|
||||
@ -1016,7 +1018,7 @@ static int igt_reset_wait(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct i915_gpu_error *global = >->i915->gpu_error;
|
||||
struct intel_engine_cs *engine = gt->i915->engine[RCS0];
|
||||
struct intel_engine_cs *engine = gt->engine[RCS0];
|
||||
struct i915_request *rq;
|
||||
unsigned int reset_count;
|
||||
struct hang h;
|
||||
@ -1143,14 +1145,18 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
|
||||
int (*fn)(void *),
|
||||
unsigned int flags)
|
||||
{
|
||||
struct intel_engine_cs *engine = gt->i915->engine[RCS0];
|
||||
struct intel_engine_cs *engine = gt->engine[RCS0];
|
||||
struct drm_i915_gem_object *obj;
|
||||
struct task_struct *tsk = NULL;
|
||||
struct i915_request *rq;
|
||||
struct evict_vma arg;
|
||||
struct hang h;
|
||||
unsigned int pin_flags;
|
||||
int err;
|
||||
|
||||
if (!gt->ggtt->num_fences && flags & EXEC_OBJECT_NEEDS_FENCE)
|
||||
return 0;
|
||||
|
||||
if (!engine || !intel_engine_can_store_dword(engine))
|
||||
return 0;
|
||||
|
||||
@ -1186,10 +1192,12 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
|
||||
goto out_obj;
|
||||
}
|
||||
|
||||
err = i915_vma_pin(arg.vma, 0, 0,
|
||||
i915_vma_is_ggtt(arg.vma) ?
|
||||
PIN_GLOBAL | PIN_MAPPABLE :
|
||||
PIN_USER);
|
||||
pin_flags = i915_vma_is_ggtt(arg.vma) ? PIN_GLOBAL : PIN_USER;
|
||||
|
||||
if (flags & EXEC_OBJECT_NEEDS_FENCE)
|
||||
pin_flags |= PIN_MAPPABLE;
|
||||
|
||||
err = i915_vma_pin(arg.vma, 0, 0, pin_flags);
|
||||
if (err) {
|
||||
i915_request_add(rq);
|
||||
goto out_obj;
|
||||
@ -1493,7 +1501,7 @@ static int igt_handle_error(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct i915_gpu_error *global = >->i915->gpu_error;
|
||||
struct intel_engine_cs *engine = gt->i915->engine[RCS0];
|
||||
struct intel_engine_cs *engine = gt->engine[RCS0];
|
||||
struct hang h;
|
||||
struct i915_request *rq;
|
||||
struct i915_gpu_state *error;
|
||||
@ -1563,7 +1571,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
|
||||
GEM_TRACE("i915_reset_engine(%s:%s) under %s\n",
|
||||
engine->name, mode, p->name);
|
||||
|
||||
tasklet_disable_nosync(t);
|
||||
tasklet_disable(t);
|
||||
p->critical_section_begin();
|
||||
|
||||
err = intel_engine_reset(engine, NULL);
|
||||
@ -1686,7 +1694,6 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
|
||||
};
|
||||
struct intel_gt *gt = &i915->gt;
|
||||
intel_wakeref_t wakeref;
|
||||
bool saved_hangcheck;
|
||||
int err;
|
||||
|
||||
if (!intel_has_gpu_reset(gt))
|
||||
@ -1696,12 +1703,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
|
||||
return -EIO; /* we're long past hope of a successful reset */
|
||||
|
||||
wakeref = intel_runtime_pm_get(gt->uncore->rpm);
|
||||
saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
|
||||
drain_delayed_work(>->hangcheck.work); /* flush param */
|
||||
|
||||
err = intel_gt_live_subtests(tests, gt);
|
||||
|
||||
i915_modparams.enable_hangcheck = saved_hangcheck;
|
||||
intel_runtime_pm_put(gt->uncore->rpm, wakeref);
|
||||
|
||||
return err;
|
||||
|
@ -6,6 +6,7 @@
|
||||
|
||||
#include "intel_pm.h" /* intel_gpu_freq() */
|
||||
#include "selftest_llc.h"
|
||||
#include "intel_rps.h"
|
||||
|
||||
static int gen6_verify_ring_freq(struct intel_llc *llc)
|
||||
{
|
||||
@ -25,6 +26,8 @@ static int gen6_verify_ring_freq(struct intel_llc *llc)
|
||||
for (gpu_freq = consts.min_gpu_freq;
|
||||
gpu_freq <= consts.max_gpu_freq;
|
||||
gpu_freq++) {
|
||||
struct intel_rps *rps = &llc_to_gt(llc)->rps;
|
||||
|
||||
unsigned int ia_freq, ring_freq, found;
|
||||
u32 val;
|
||||
|
||||
@ -44,7 +47,7 @@ static int gen6_verify_ring_freq(struct intel_llc *llc)
|
||||
if (found != ia_freq) {
|
||||
pr_err("Min freq table(%d/[%d, %d]):%dMHz did not match expected CPU freq, found %d, expected %d\n",
|
||||
gpu_freq, consts.min_gpu_freq, consts.max_gpu_freq,
|
||||
intel_gpu_freq(i915, gpu_freq * (INTEL_GEN(i915) >= 9 ? GEN9_FREQ_SCALER : 1)),
|
||||
intel_gpu_freq(rps, gpu_freq * (INTEL_GEN(i915) >= 9 ? GEN9_FREQ_SCALER : 1)),
|
||||
found, ia_freq);
|
||||
err = -EINVAL;
|
||||
break;
|
||||
@ -54,7 +57,7 @@ static int gen6_verify_ring_freq(struct intel_llc *llc)
|
||||
if (found != ring_freq) {
|
||||
pr_err("Min freq table(%d/[%d, %d]):%dMHz did not match expected ring freq, found %d, expected %d\n",
|
||||
gpu_freq, consts.min_gpu_freq, consts.max_gpu_freq,
|
||||
intel_gpu_freq(i915, gpu_freq * (INTEL_GEN(i915) >= 9 ? GEN9_FREQ_SCALER : 1)),
|
||||
intel_gpu_freq(rps, gpu_freq * (INTEL_GEN(i915) >= 9 ? GEN9_FREQ_SCALER : 1)),
|
||||
found, ring_freq);
|
||||
err = -EINVAL;
|
||||
break;
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include <linux/prime_numbers.h>
|
||||
|
||||
#include "gem/i915_gem_pm.h"
|
||||
#include "gt/intel_engine_heartbeat.h"
|
||||
#include "gt/intel_reset.h"
|
||||
|
||||
#include "i915_selftest.h"
|
||||
@ -168,12 +169,7 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
|
||||
}
|
||||
GEM_BUG_ON(!ce[1]->ring->size);
|
||||
intel_ring_reset(ce[1]->ring, ce[1]->ring->size / 2);
|
||||
|
||||
local_irq_disable(); /* appease lockdep */
|
||||
__context_pin_acquire(ce[1]);
|
||||
__execlists_update_reg_state(ce[1], engine);
|
||||
__context_pin_release(ce[1]);
|
||||
local_irq_enable();
|
||||
|
||||
rq[0] = igt_spinner_create_request(&spin, ce[0], MI_ARB_CHECK);
|
||||
if (IS_ERR(rq[0])) {
|
||||
@ -444,6 +440,8 @@ static int live_timeslice_preempt(void *arg)
|
||||
* need to preempt the current task and replace it with another
|
||||
* ready task.
|
||||
*/
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
|
||||
return 0;
|
||||
|
||||
obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
|
||||
if (IS_ERR(obj))
|
||||
@ -518,6 +516,11 @@ static void wait_for_submit(struct intel_engine_cs *engine,
|
||||
} while (!i915_request_is_active(rq));
|
||||
}
|
||||
|
||||
static long timeslice_threshold(const struct intel_engine_cs *engine)
|
||||
{
|
||||
return 2 * msecs_to_jiffies_timeout(timeslice(engine)) + 1;
|
||||
}
|
||||
|
||||
static int live_timeslice_queue(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
@ -535,6 +538,8 @@ static int live_timeslice_queue(void *arg)
|
||||
* ELSP[1] is already occupied, so must rely on timeslicing to
|
||||
* eject ELSP[0] in favour of the queue.)
|
||||
*/
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_TIMESLICE_DURATION))
|
||||
return 0;
|
||||
|
||||
obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
|
||||
if (IS_ERR(obj))
|
||||
@ -612,8 +617,8 @@ static int live_timeslice_queue(void *arg)
|
||||
err = -EINVAL;
|
||||
}
|
||||
|
||||
/* Timeslice every jiffie, so within 2 we should signal */
|
||||
if (i915_request_wait(rq, 0, 3) < 0) {
|
||||
/* Timeslice every jiffy, so within 2 we should signal */
|
||||
if (i915_request_wait(rq, 0, timeslice_threshold(engine)) < 0) {
|
||||
struct drm_printer p =
|
||||
drm_info_printer(gt->i915->drm.dev);
|
||||
|
||||
@ -1165,6 +1170,325 @@ err_wedged:
|
||||
goto err_client_b;
|
||||
}
|
||||
|
||||
struct live_preempt_cancel {
|
||||
struct intel_engine_cs *engine;
|
||||
struct preempt_client a, b;
|
||||
};
|
||||
|
||||
static int __cancel_active0(struct live_preempt_cancel *arg)
|
||||
{
|
||||
struct i915_request *rq;
|
||||
struct igt_live_test t;
|
||||
int err;
|
||||
|
||||
/* Preempt cancel of ELSP0 */
|
||||
GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
|
||||
if (igt_live_test_begin(&t, arg->engine->i915,
|
||||
__func__, arg->engine->name))
|
||||
return -EIO;
|
||||
|
||||
clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
|
||||
rq = spinner_create_request(&arg->a.spin,
|
||||
arg->a.ctx, arg->engine,
|
||||
MI_ARB_CHECK);
|
||||
if (IS_ERR(rq))
|
||||
return PTR_ERR(rq);
|
||||
|
||||
i915_request_get(rq);
|
||||
i915_request_add(rq);
|
||||
if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_gem_context_set_banned(arg->a.ctx);
|
||||
err = intel_engine_pulse(arg->engine);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
if (i915_request_wait(rq, 0, HZ / 5) < 0) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq->fence.error != -EIO) {
|
||||
pr_err("Cancelled inflight0 request did not report -EIO\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
i915_request_put(rq);
|
||||
if (igt_live_test_end(&t))
|
||||
err = -EIO;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __cancel_active1(struct live_preempt_cancel *arg)
|
||||
{
|
||||
struct i915_request *rq[2] = {};
|
||||
struct igt_live_test t;
|
||||
int err;
|
||||
|
||||
/* Preempt cancel of ELSP1 */
|
||||
GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
|
||||
if (igt_live_test_begin(&t, arg->engine->i915,
|
||||
__func__, arg->engine->name))
|
||||
return -EIO;
|
||||
|
||||
clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
|
||||
rq[0] = spinner_create_request(&arg->a.spin,
|
||||
arg->a.ctx, arg->engine,
|
||||
MI_NOOP); /* no preemption */
|
||||
if (IS_ERR(rq[0]))
|
||||
return PTR_ERR(rq[0]);
|
||||
|
||||
i915_request_get(rq[0]);
|
||||
i915_request_add(rq[0]);
|
||||
if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
|
||||
rq[1] = spinner_create_request(&arg->b.spin,
|
||||
arg->b.ctx, arg->engine,
|
||||
MI_ARB_CHECK);
|
||||
if (IS_ERR(rq[1])) {
|
||||
err = PTR_ERR(rq[1]);
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_request_get(rq[1]);
|
||||
err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
|
||||
i915_request_add(rq[1]);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
i915_gem_context_set_banned(arg->b.ctx);
|
||||
err = intel_engine_pulse(arg->engine);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
igt_spinner_end(&arg->a.spin);
|
||||
if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq[0]->fence.error != 0) {
|
||||
pr_err("Normal inflight0 request did not complete\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq[1]->fence.error != -EIO) {
|
||||
pr_err("Cancelled inflight1 request did not report -EIO\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
i915_request_put(rq[1]);
|
||||
i915_request_put(rq[0]);
|
||||
if (igt_live_test_end(&t))
|
||||
err = -EIO;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __cancel_queued(struct live_preempt_cancel *arg)
|
||||
{
|
||||
struct i915_request *rq[3] = {};
|
||||
struct igt_live_test t;
|
||||
int err;
|
||||
|
||||
/* Full ELSP and one in the wings */
|
||||
GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
|
||||
if (igt_live_test_begin(&t, arg->engine->i915,
|
||||
__func__, arg->engine->name))
|
||||
return -EIO;
|
||||
|
||||
clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
|
||||
rq[0] = spinner_create_request(&arg->a.spin,
|
||||
arg->a.ctx, arg->engine,
|
||||
MI_ARB_CHECK);
|
||||
if (IS_ERR(rq[0]))
|
||||
return PTR_ERR(rq[0]);
|
||||
|
||||
i915_request_get(rq[0]);
|
||||
i915_request_add(rq[0]);
|
||||
if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
|
||||
rq[1] = igt_request_alloc(arg->b.ctx, arg->engine);
|
||||
if (IS_ERR(rq[1])) {
|
||||
err = PTR_ERR(rq[1]);
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_request_get(rq[1]);
|
||||
err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
|
||||
i915_request_add(rq[1]);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
rq[2] = spinner_create_request(&arg->b.spin,
|
||||
arg->a.ctx, arg->engine,
|
||||
MI_ARB_CHECK);
|
||||
if (IS_ERR(rq[2])) {
|
||||
err = PTR_ERR(rq[2]);
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_request_get(rq[2]);
|
||||
err = i915_request_await_dma_fence(rq[2], &rq[1]->fence);
|
||||
i915_request_add(rq[2]);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
i915_gem_context_set_banned(arg->a.ctx);
|
||||
err = intel_engine_pulse(arg->engine);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
if (i915_request_wait(rq[2], 0, HZ / 5) < 0) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq[0]->fence.error != -EIO) {
|
||||
pr_err("Cancelled inflight0 request did not report -EIO\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq[1]->fence.error != 0) {
|
||||
pr_err("Normal inflight1 request did not complete\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq[2]->fence.error != -EIO) {
|
||||
pr_err("Cancelled queued request did not report -EIO\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
i915_request_put(rq[2]);
|
||||
i915_request_put(rq[1]);
|
||||
i915_request_put(rq[0]);
|
||||
if (igt_live_test_end(&t))
|
||||
err = -EIO;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __cancel_hostile(struct live_preempt_cancel *arg)
|
||||
{
|
||||
struct i915_request *rq;
|
||||
int err;
|
||||
|
||||
/* Preempt cancel non-preemptible spinner in ELSP0 */
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT))
|
||||
return 0;
|
||||
|
||||
GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
|
||||
clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
|
||||
rq = spinner_create_request(&arg->a.spin,
|
||||
arg->a.ctx, arg->engine,
|
||||
MI_NOOP); /* preemption disabled */
|
||||
if (IS_ERR(rq))
|
||||
return PTR_ERR(rq);
|
||||
|
||||
i915_request_get(rq);
|
||||
i915_request_add(rq);
|
||||
if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
i915_gem_context_set_banned(arg->a.ctx);
|
||||
err = intel_engine_pulse(arg->engine); /* force reset */
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
if (i915_request_wait(rq, 0, HZ / 5) < 0) {
|
||||
err = -EIO;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (rq->fence.error != -EIO) {
|
||||
pr_err("Cancelled inflight0 request did not report -EIO\n");
|
||||
err = -EINVAL;
|
||||
goto out;
|
||||
}
|
||||
|
||||
out:
|
||||
i915_request_put(rq);
|
||||
if (igt_flush_test(arg->engine->i915))
|
||||
err = -EIO;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_preempt_cancel(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct live_preempt_cancel data;
|
||||
enum intel_engine_id id;
|
||||
int err = -ENOMEM;
|
||||
|
||||
/*
|
||||
* To cancel an inflight context, we need to first remove it from the
|
||||
* GPU. That sounds like preemption! Plus a little bit of bookkeeping.
|
||||
*/
|
||||
|
||||
if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
|
||||
return 0;
|
||||
|
||||
if (preempt_client_init(gt, &data.a))
|
||||
return -ENOMEM;
|
||||
if (preempt_client_init(gt, &data.b))
|
||||
goto err_client_a;
|
||||
|
||||
for_each_engine(data.engine, gt, id) {
|
||||
if (!intel_engine_has_preemption(data.engine))
|
||||
continue;
|
||||
|
||||
err = __cancel_active0(&data);
|
||||
if (err)
|
||||
goto err_wedged;
|
||||
|
||||
err = __cancel_active1(&data);
|
||||
if (err)
|
||||
goto err_wedged;
|
||||
|
||||
err = __cancel_queued(&data);
|
||||
if (err)
|
||||
goto err_wedged;
|
||||
|
||||
err = __cancel_hostile(&data);
|
||||
if (err)
|
||||
goto err_wedged;
|
||||
}
|
||||
|
||||
err = 0;
|
||||
err_client_b:
|
||||
preempt_client_fini(&data.b);
|
||||
err_client_a:
|
||||
preempt_client_fini(&data.a);
|
||||
return err;
|
||||
|
||||
err_wedged:
|
||||
GEM_TRACE_DUMP();
|
||||
igt_spinner_end(&data.b.spin);
|
||||
igt_spinner_end(&data.a.spin);
|
||||
intel_gt_set_wedged(gt);
|
||||
goto err_client_b;
|
||||
}
|
||||
|
||||
static int live_suppress_self_preempt(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
@ -1702,6 +2026,105 @@ err_spin_hi:
|
||||
return err;
|
||||
}
|
||||
|
||||
static int live_preempt_timeout(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct i915_gem_context *ctx_hi, *ctx_lo;
|
||||
struct igt_spinner spin_lo;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = -ENOMEM;
|
||||
|
||||
/*
|
||||
* Check that we force preemption to occur by cancelling the previous
|
||||
* context if it refuses to yield the GPU.
|
||||
*/
|
||||
if (!IS_ACTIVE(CONFIG_DRM_I915_PREEMPT_TIMEOUT))
|
||||
return 0;
|
||||
|
||||
if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
|
||||
return 0;
|
||||
|
||||
if (!intel_has_reset_engine(gt))
|
||||
return 0;
|
||||
|
||||
if (igt_spinner_init(&spin_lo, gt))
|
||||
return -ENOMEM;
|
||||
|
||||
ctx_hi = kernel_context(gt->i915);
|
||||
if (!ctx_hi)
|
||||
goto err_spin_lo;
|
||||
ctx_hi->sched.priority =
|
||||
I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
|
||||
|
||||
ctx_lo = kernel_context(gt->i915);
|
||||
if (!ctx_lo)
|
||||
goto err_ctx_hi;
|
||||
ctx_lo->sched.priority =
|
||||
I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
unsigned long saved_timeout;
|
||||
struct i915_request *rq;
|
||||
|
||||
if (!intel_engine_has_preemption(engine))
|
||||
continue;
|
||||
|
||||
rq = spinner_create_request(&spin_lo, ctx_lo, engine,
|
||||
MI_NOOP); /* preemption disabled */
|
||||
if (IS_ERR(rq)) {
|
||||
err = PTR_ERR(rq);
|
||||
goto err_ctx_lo;
|
||||
}
|
||||
|
||||
i915_request_add(rq);
|
||||
if (!igt_wait_for_spinner(&spin_lo, rq)) {
|
||||
intel_gt_set_wedged(gt);
|
||||
err = -EIO;
|
||||
goto err_ctx_lo;
|
||||
}
|
||||
|
||||
rq = igt_request_alloc(ctx_hi, engine);
|
||||
if (IS_ERR(rq)) {
|
||||
igt_spinner_end(&spin_lo);
|
||||
err = PTR_ERR(rq);
|
||||
goto err_ctx_lo;
|
||||
}
|
||||
|
||||
/* Flush the previous CS ack before changing timeouts */
|
||||
while (READ_ONCE(engine->execlists.pending[0]))
|
||||
cpu_relax();
|
||||
|
||||
saved_timeout = engine->props.preempt_timeout_ms;
|
||||
engine->props.preempt_timeout_ms = 1; /* in ms, -> 1 jiffie */
|
||||
|
||||
i915_request_get(rq);
|
||||
i915_request_add(rq);
|
||||
|
||||
intel_engine_flush_submission(engine);
|
||||
engine->props.preempt_timeout_ms = saved_timeout;
|
||||
|
||||
if (i915_request_wait(rq, 0, HZ / 10) < 0) {
|
||||
intel_gt_set_wedged(gt);
|
||||
i915_request_put(rq);
|
||||
err = -ETIME;
|
||||
goto err_ctx_lo;
|
||||
}
|
||||
|
||||
igt_spinner_end(&spin_lo);
|
||||
i915_request_put(rq);
|
||||
}
|
||||
|
||||
err = 0;
|
||||
err_ctx_lo:
|
||||
kernel_context_close(ctx_lo);
|
||||
err_ctx_hi:
|
||||
kernel_context_close(ctx_hi);
|
||||
err_spin_lo:
|
||||
igt_spinner_fini(&spin_lo);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int random_range(struct rnd_state *rnd, int min, int max)
|
||||
{
|
||||
return i915_prandom_u32_max_state(max - min, rnd) + min;
|
||||
@ -1829,6 +2252,8 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
|
||||
get_task_struct(tsk[id]);
|
||||
}
|
||||
|
||||
yield(); /* start all threads before we kthread_stop() */
|
||||
|
||||
count = 0;
|
||||
for_each_engine(engine, smoke->gt, id) {
|
||||
int status;
|
||||
@ -2599,10 +3024,12 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
|
||||
SUBTEST(live_preempt),
|
||||
SUBTEST(live_late_preempt),
|
||||
SUBTEST(live_nopreempt),
|
||||
SUBTEST(live_preempt_cancel),
|
||||
SUBTEST(live_suppress_self_preempt),
|
||||
SUBTEST(live_suppress_wait_preempt),
|
||||
SUBTEST(live_chain_preempt),
|
||||
SUBTEST(live_preempt_hang),
|
||||
SUBTEST(live_preempt_timeout),
|
||||
SUBTEST(live_preempt_smoke),
|
||||
SUBTEST(live_virtual_engine),
|
||||
SUBTEST(live_virtual_mask),
|
||||
@ -2749,6 +3176,100 @@ static int live_lrc_layout(void *arg)
|
||||
return err;
|
||||
}
|
||||
|
||||
static int find_offset(const u32 *lri, u32 offset)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; i < PAGE_SIZE / sizeof(u32); i++)
|
||||
if (lri[i] == offset)
|
||||
return i;
|
||||
|
||||
return -1;
|
||||
}
|
||||
|
||||
static int live_lrc_fixed(void *arg)
|
||||
{
|
||||
struct intel_gt *gt = arg;
|
||||
struct intel_engine_cs *engine;
|
||||
enum intel_engine_id id;
|
||||
int err = 0;
|
||||
|
||||
/*
|
||||
* Check the assumed register offsets match the actual locations in
|
||||
* the context image.
|
||||
*/
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
const struct {
|
||||
u32 reg;
|
||||
u32 offset;
|
||||
const char *name;
|
||||
} tbl[] = {
|
||||
{
|
||||
i915_mmio_reg_offset(RING_START(engine->mmio_base)),
|
||||
CTX_RING_BUFFER_START - 1,
|
||||
"RING_START"
|
||||
},
|
||||
{
|
||||
i915_mmio_reg_offset(RING_CTL(engine->mmio_base)),
|
||||
CTX_RING_BUFFER_CONTROL - 1,
|
||||
"RING_CTL"
|
||||
},
|
||||
{
|
||||
i915_mmio_reg_offset(RING_HEAD(engine->mmio_base)),
|
||||
CTX_RING_HEAD - 1,
|
||||
"RING_HEAD"
|
||||
},
|
||||
{
|
||||
i915_mmio_reg_offset(RING_TAIL(engine->mmio_base)),
|
||||
CTX_RING_TAIL - 1,
|
||||
"RING_TAIL"
|
||||
},
|
||||
{
|
||||
i915_mmio_reg_offset(RING_MI_MODE(engine->mmio_base)),
|
||||
lrc_ring_mi_mode(engine),
|
||||
"RING_MI_MODE"
|
||||
},
|
||||
{
|
||||
engine->mmio_base + 0x110,
|
||||
CTX_BB_STATE - 1,
|
||||
"BB_STATE"
|
||||
},
|
||||
{ },
|
||||
}, *t;
|
||||
u32 *hw;
|
||||
|
||||
if (!engine->default_state)
|
||||
continue;
|
||||
|
||||
hw = i915_gem_object_pin_map(engine->default_state,
|
||||
I915_MAP_WB);
|
||||
if (IS_ERR(hw)) {
|
||||
err = PTR_ERR(hw);
|
||||
break;
|
||||
}
|
||||
hw += LRC_STATE_PN * PAGE_SIZE / sizeof(*hw);
|
||||
|
||||
for (t = tbl; t->name; t++) {
|
||||
int dw = find_offset(hw, t->reg);
|
||||
|
||||
if (dw != t->offset) {
|
||||
pr_err("%s: Offset for %s [0x%x] mismatch, found %x, expected %x\n",
|
||||
engine->name,
|
||||
t->name,
|
||||
t->reg,
|
||||
dw,
|
||||
t->offset);
|
||||
err = -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
i915_gem_object_unpin_map(engine->default_state);
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static int __live_lrc_state(struct i915_gem_context *fixme,
|
||||
struct intel_engine_cs *engine,
|
||||
struct i915_vma *scratch)
|
||||
@ -3021,6 +3542,7 @@ int intel_lrc_live_selftests(struct drm_i915_private *i915)
|
||||
{
|
||||
static const struct i915_subtest tests[] = {
|
||||
SUBTEST(live_lrc_layout),
|
||||
SUBTEST(live_lrc_fixed),
|
||||
SUBTEST(live_lrc_state),
|
||||
SUBTEST(live_gpr_clear),
|
||||
};
|
||||
|
@ -126,7 +126,7 @@ static int igt_atomic_engine_reset(void *arg)
|
||||
goto out_unlock;
|
||||
|
||||
for_each_engine(engine, gt, id) {
|
||||
tasklet_disable_nosync(&engine->execlists.tasklet);
|
||||
tasklet_disable(&engine->execlists.tasklet);
|
||||
intel_engine_pm_get(engine);
|
||||
|
||||
for (p = igt_atomic_phases; p->name; p++) {
|
||||
|
Some files were not shown because too many files have changed in this diff