[X11/G2D] xf86-video-armsoc with hw-accelerated EXA

by LiquidAcid » Sun May 08, 2016 8:49 am

Posting this here in the U3 section since it's X2/U2/U3 specific and not applicable to AMLogic hardware. Also, the X2 is EOL, so I assume the respective section of the board isn't frequented that often.

I've spent a bit of time and leveraged my work on libdrm's Exynos subsystem to write a hardware-accelerated EXA backend for xf86-video-armsoc.

The repository (g2d branch, which is the default):
https://github.com/tobiasjakobi/xf86-video-armsoc


Hardware acceleration currently only affects pixmaps with a GEM backing buffer. Also, the acceleration only covers the simplest forms of solid fill and copy operations. More on that in a second.

The backend makes heavy use of the new command stream submission API. A recent version of libdrm from my repo is needed.

libdrm repository (exynos branch, which is the default):
https://github.com/tobiasjakobi/libdrm


Kernel repository (the odroid-4.5.y branch is needed):
https://github.com/tobiasjakobi/linux-odroid-public/tree/odroid-4.5.y


I did a quick test with the r5p0 X11 blob and glmark2-es2 in windowed mode (default window size).

glmark2-es2 run (display resolution = 1280x1024):
DISPLAY=:0.0 LD_LIBRARY_PATH=$HOME/local/lib/mali-r5p0-x11/ glmark2-es2
=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     ARM
    GL_RENDERER:   Mali-400 MP
    GL_VERSION:    OpenGL ES 2.0
=======================================================
[build] use-vbo=false: FPS: 232 FrameTime: 4.310 ms
[build] use-vbo=true: FPS: 278 FrameTime: 3.597 ms
[texture] texture-filter=nearest: FPS: 321 FrameTime: 3.115 ms
[texture] texture-filter=linear: FPS: 311 FrameTime: 3.215 ms
[texture] texture-filter=mipmap: FPS: 337 FrameTime: 2.967 ms
[shading] shading=gouraud: FPS: 203 FrameTime: 4.926 ms
[shading] shading=blinn-phong-inf: FPS: 218 FrameTime: 4.587 ms
[shading] shading=phong: FPS: 179 FrameTime: 5.587 ms
[shading] shading=cel: FPS: 172 FrameTime: 5.814 ms
[bump] bump-render=high-poly: FPS: 100 FrameTime: 10.000 ms
[bump] bump-render=normals: FPS: 373 FrameTime: 2.681 ms
[bump] bump-render=height: FPS: 348 FrameTime: 2.874 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 244 FrameTime: 4.098 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 131 FrameTime: 7.634 ms
[pulsar] light=false:quads=5:texture=false: FPS: 391 FrameTime: 2.558 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 67 FrameTime: 14.925 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 186 FrameTime: 5.376 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 60 FrameTime: 16.667 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 65 FrameTime: 15.385 ms
[ideas] speed=duration: FPS: 194 FrameTime: 5.155 ms
[jellyfish] <default>: FPS: 213 FrameTime: 4.695 ms
Error: SceneTerrain requires Vertex Texture Fetch support, but GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS is 0
[terrain] <default>: Unsupported
[shadow] <default>: FPS: 145 FrameTime: 6.897 ms
[refract] <default>: FPS: 22 FrameTime: 45.455 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 374 FrameTime: 2.674 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 310 FrameTime: 3.226 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 346 FrameTime: 2.890 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 353 FrameTime: 2.833 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 253 FrameTime: 3.953 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 305 FrameTime: 3.279 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 306 FrameTime: 3.268 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 306 FrameTime: 3.268 ms
=======================================================
                                  glmark2 Score: 231
=======================================================
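
As a sanity check on the log above: the final score is consistent with the arithmetic mean of the per-scene FPS values, and each FrameTime with 1000/FPS. The Python sketch below just redoes that arithmetic on the numbers copied from the run. That glmark2 computes its score exactly this way is an assumption based on the numbers matching, and the variable names are made up for illustration.

```python
# FPS values copied from the glmark2-es2 log above, in order.
# The unsupported [terrain] scene has no FPS value and is left out.
fps = [
    232, 278, 321, 311, 337, 203, 218, 179,
    172, 100, 373, 348, 244, 131, 391, 67,
    186, 59, 60, 65, 194, 213, 145, 22,
    374, 310, 346, 353, 253, 305, 306, 306,
]

# Assumption: the score is the plain mean over all scenes that ran.
mean_fps = sum(fps) / len(fps)

# Frame time in milliseconds for the first scene ([build] use-vbo=false).
frame_time_ms = 1000.0 / fps[0]

print(f"scenes: {len(fps)}, mean FPS: {mean_fps:.1f}")
print(f"frame time at {fps[0]} FPS: {frame_time_ms:.3f} ms")
```

The mean over the 32 scenes comes out at about 231.3, which lines up with the reported score of 231.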


TODOs:
- complex solid fill operations (should just need translation from X11 ALU to G2D ROP4)
- complex copy operations (same thing)
- composite operations (this should especially help font rendering)
- acceleration for non-GEM pixmaps (tricky -- either via userptr, which could make things slower because of cache invalidation, or by migrating non-GEM to GEM pixmaps)
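
For the first two TODO items, the translation could look roughly like the Python sketch below (illustration only, the driver itself is C). An X11 ALU code (GXclear = 0x0 up to GXset = 0xf) is already a four-entry truth table over (src, dst), so it can be expanded into an 8-bit ROP3 code by evaluating it against the usual ROP3 operand masks. The masks 0xCC (source), 0xAA (destination) and 0xF0 (third operand) follow the common ROP3 convention and should correspond to the G2D_ROP3_* constants in libdrm's Exynos header, but treat that as an assumption to check against the header. How two ROP3 codes are packed into the ROP4 register, and whether the fill color is fed in as the third operand, is not covered here. alu_to_rop3 is a made-up helper name.

```python
# X11 raster operation codes from X.h (only the ones used below).
GXclear, GXand, GXcopy, GXnoop = 0x0, 0x1, 0x3, 0x5
GXxor, GXor, GXinvert, GXset = 0x6, 0x7, 0xA, 0xF

# Conventional ROP3 operand masks (assumed to match G2D_ROP3_*).
ROP3_DST = 0xAA
ROP3_SRC = 0xCC
ROP3_3RD = 0xF0


def alu_to_rop3(alu, operand=ROP3_SRC):
    """Expand a 4-bit X11 ALU code into an 8-bit ROP3 code.

    'operand' selects which ROP3 input plays the role of the X11
    source: ROP3_SRC for copies, ROP3_3RD for solid fills.
    """
    rop = 0
    for bit in range(8):
        s = (operand >> bit) & 1
        d = (ROP3_DST >> bit) & 1
        # X11 ALU bit layout for (s, d):
        # bit 0 = (1,1), bit 1 = (1,0), bit 2 = (0,1), bit 3 = (0,0)
        idx = (1 - s) * 2 + (1 - d)
        if (alu >> idx) & 1:
            rop |= 1 << bit
    return rop


for name, alu in (("GXcopy", GXcopy), ("GXxor", GXxor), ("GXand", GXand)):
    print(f"{name}: copy rop3 = {alu_to_rop3(alu):#04x}, "
          f"solid rop3 = {alu_to_rop3(alu, ROP3_3RD):#04x}")
```

With the source mask, GXcopy expands to 0xCC (plain copy) and GXxor to 0x66. With the third-operand mask, GXcopy becomes 0xF0 (plain fill).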

by memeka » Sun May 08, 2016 9:31 am

This is awesome work.
Did you ever have a look at Samsung's Exynos EXA implementation? It also uses G2D, but it's an old implementation with the old calls (3.x kernel series).
(It's in the Tizen repository.)

by LiquidAcid » Fri May 20, 2016 11:12 pm

I've started to extend accelerated solid ops to non-GEM pixmaps by using the userptr API. Works quite OK and naturally exposed some bugs in the kernel which I've already fixed. Chromium 41.y (yeah, a bit old) was used to stress-test the DDX since it does a lot of pixmap allocations/deallocations.

So for solid ops this seems to increase the share of accelerated operations to over 50%:
(II) PERF: EXA solid: accel = 9453, nonaccel = 7872


Probably a lot of these 7872 ops are for 1-bit pixel formats (the engine can't handle these).
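
For reference, the exact share can be worked out from the PERF line above. A small Python sketch follows. The regular expression only assumes the format of that one log line, and accel_share is a hypothetical helper name.

```python
import re


def accel_share(line):
    """Return accelerated / total ops for a 'PERF: EXA ...' log line."""
    m = re.search(r"accel = (\d+), nonaccel = (\d+)", line)
    if m is None:
        raise ValueError("not a PERF line")
    accel, nonaccel = int(m.group(1)), int(m.group(2))
    return accel / (accel + nonaccel)


line = "(II) PERF: EXA solid: accel = 9453, nonaccel = 7872"
print(f"{accel_share(line):.1%}")
```

That is 9453 out of 17325 solid ops, or roughly 54.6%.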

by tnt » Fri Jun 24, 2016 7:29 am

Looks like awesome work.

What are the chances of running this on an XU4?

AFAIU the armsoc driver would be the same and the G2D core is also the same, but since it depends on your newer kernel that might be an issue. Can your 4.5 kernel run on an XU4? With what limitations? And if not, is the DRM code for G2D separable enough to backport to 3.10?

I actually just spent the last few hours coding a basic version of EXA/G2D accel on the XU4 and found this post and your code when looking for G2D examples :p I got something somewhat working... but it only works for the first app somehow; after that it becomes corrupted.

by LiquidAcid » Fri Jun 24, 2016 5:36 pm

I don't own an XU4, so I can't answer your questions. The best approach is to check for yourself.

by LiquidAcid » Thu Aug 10, 2017 11:09 pm

Pushed some small changes so that the DDX properly compiles against xorg-server-1.19.3.