Commit 94411593 authored by Kenneth Graunke's avatar Kenneth Graunke Committed by Daniel Vetter

drm/i915: Make sample_c messages go faster on Haswell.

Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default.  Later platforms
remove this bit, and apparently always enable the optimization.

Improves performance in "Counter Strike: Global Offensive" by 18%
at default settings on Iris Pro.

This may break sampling of paletted formats (P8/A8P8/P8A8).  It's
unclear whether it affects sampling of paletted formats in general,
or just the sample_c message (which is never used).

While libva does have support for using paletted formats (primarily
for OSDs), that support appears to have been broken for at least a
year, so I couldn't observe a regression from this:

I tried to get libva-intel to use paletted formats, and observe a
regression...but the only thing I found that used it was mplayer's OSD
(on screen display).  Even without my patch, the colors were totally
wrong with that, and it's according to a few distro wikis, that's been
the case for over a year.

If libva's code for paletted formats /is/ broken, they could always
add code to disable this bit using the command validator when fixing
it.

Further investigation from Haihao shows that libva mplayer OSD seems
to work at least on his setup (still unclear what's wron with Ken's),
and that it's not affected by this patch. Quoting the discussion
between Haihao and Ken:

> > > If you use "-vo gl" or "-vo xv", the OSD is solid white text with a black
> > > border around it.  I presume that it's supposed to be white with vaapi as
> > > well, but I guess I'm not entirely sure.
> > >
> > > It's possible that the optimization doesn't affect the palette as long as
> > > you never use sample_c with the paletted textures.
> >
> > I verified the palette takes effect in the following way:
> >
> > 1. Only support P8A8 format in the driver
> >
> > 2. ran the above command and I saw white OSD text
> >
> > 3. Only support P4A4 format in the driver and don't use
> > 3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
> > so the palette keeps unchanged.
> >
> > 4. ran the above command and I saw black OSD text.
> >
> > 5. Load the right value to the texture palette and ran the above command
> > again, I saw white OSD text.
> >
> > Hence I think sample_c with the paletted textures is used in the driver.
>
> That sounds like the palette is actually working, then.  Great :)
>
> I doubt that libva would use sample_c - sampling with a shadow comparison?
> It looks like it just uses sample and sample+killpix.

You are right, libva driver doesn't use sample_c message.

> I'm pretty sure the sample_c optimization just uses the palette memory as
> storage for some stuff, so it's quite possible it just works if you're
> only using sample and sample+killpix.

Thanks for the explanation, it makes sense to me.
Signed-off-by: default avatarKenneth Graunke <kenneth@whitecape.org>
[danvet: Add wa name from Ville's review to the comment and copypaste
the explanation why we don't care about libva (already broken) from
Ken. Also add conclusion from libva devs that&why this is all fine.]
Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
Cc: "Xiang, Haihao" <haihao.xiang@intel.com>
Cc: libva@lists.freedesktop.org
Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
parent 0e2cfc00
...@@ -6152,6 +6152,7 @@ enum punit_power_well { ...@@ -6152,6 +6152,7 @@ enum punit_power_well {
#define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6) #define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6)
#define HALF_SLICE_CHICKEN3 0xe184 #define HALF_SLICE_CHICKEN3 0xe184
#define HSW_SAMPLE_C_PERFORMANCE (1<<9)
#define GEN8_CENTROID_PIXEL_OPT_DIS (1<<8) #define GEN8_CENTROID_PIXEL_OPT_DIS (1<<8)
#define GEN8_SAMPLER_POWER_BYPASS_DIS (1<<1) #define GEN8_SAMPLER_POWER_BYPASS_DIS (1<<1)
......
...@@ -5972,6 +5972,10 @@ static void haswell_init_clock_gating(struct drm_device *dev) ...@@ -5972,6 +5972,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
I915_WRITE(GEN7_GT_MODE, I915_WRITE(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4); GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
/* WaSampleCChickenBitEnable:hsw */
I915_WRITE(HALF_SLICE_CHICKEN3,
_MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE));
/* WaSwitchSolVfFArbitrationPriority:hsw */ /* WaSwitchSolVfFArbitrationPriority:hsw */
I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL); I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment