Silicon & Lithium: assembly

In my last post I talked about how to get FFmpeg to compile under Windows with the Intel compiler. This has advantages in that the Intel compiler supports compilation of AT&T style inline assembly. This means that its possible to use the hand tuned optimised code found in FFmpeg while natively compiling for Windows (something that is otherwise not possible). Unfortunately the assembly support in the Windows version of the compiler is not complete and so the inline asm wont compile without some changes.

The previous post on this subject had a patch that allowed for compilation with inline asm. However since then I have cleaned up the patch and fixed a few things that I wasnt entirely happy about. In fact after talking to Michael Niedermayer and others on the FFmpeg mailing list I ended up writing an entirely new patch. This patch is currently still pending review but for those who are interested in getting this working now they can grab the patch from the end of this post.

The main changes in this version is the way the inline asm was changed from using direct symbol references to something defined in an asm-interface. From my previous post I mentioned that Intel compiler does not support direct symbol references in code so previously I had changed all of these to asm-interfaces. However the FFmpeg developers didn't want to change any existing code and there was some concerns over how moving variables from direct symbols into the interface may affect Position Independent Code compilation. So based on a suggestion from Michael the patch was changed to use named constraints. For those familiar with named constraints you'll know that to use these all you have to do is replace a direct symbol reference with a named constraint.

Example of existing direct symbol reference:

"movq "MANGLE(ff_pb_80)", %%mm0     \n\t"
"lea         (%3, %3, 2), %1        \n\t"

Since FFmpeg already had a macro for name mangling the direct symbol references it was rather simple to just change this macro to generate a named constraint instead. In order to do this the definition of MANGLE was changed so that with Intel compiler on Windows it generates a named constraint.

// Determine if compiler supports direct symbol references in inline asm
#if defined(__INTEL_COMPILER) && defined(_MSC_VER)
#   define HAVE_DIRECT_SYMBOL_REF 0
#else
#   define HAVE_DIRECT_SYMBOL_REF 1
#endif

#if HAVE_DIRECT_SYMBOL_REF
    //The standard version of MANGLE for direct symbol references
#   define MANGLE(a) EXTERN_PREFIX LOCAL_MANGLE(a)
#else
    //A version of mangle that instead generates named constraints
#   define MANGLE(a) "%["#a"]"
#endif

Of course using a named constraint by itself wont work as the constraint still needs to be added to the asm-interface. Since this additional interface is only required for Intel on Windows it is not desirable to have it all the time. So in keeping with not changing any existing code the asm-interfaces were added using a new macro that simply does nothing for all other build chains. Using this the above inline asm becomes:

__asm__ volatile (
   "movq "MANGLE(ff_pb_80)", %%mm0     \n\t"
   "lea         (%3, %3, 2), %1        \n\t"
   put_signed_pixels_clamped_mmx_half(0)
   "lea         (%0, %3, 4), %0        \n\t"
   put_signed_pixels_clamped_mmx_half(64)
   : "+&r"(pixels), "=&r"(line_skip3)
   : "r"(block), "r"(line_skip)
     NAMED_CONSTRAINTS_ADD(ff_pb_80)
   : "memory"
);

The resulting changes are rather minimal and when compiling using any previously supported build chains there is no apparent difference in the code before and after this change. The macro NAMED_CONSTRAINTS is used to add each of the names of any directly accessed symbols. This macro just needs the name and can take a comma separated list of up to 10 values.

However adding the macro NAMED_CONSTRAINTS required slightly more work than I would have liked but it at least worked as required. The difficulty was due to a bug in both the Intel and Microsoft compilers where variadic arguments where not properly expanded (see my post on the subject http://siliconandlithium.blogspot.com/2014/01/macro-variadic-argument-expansion-on.html). Using the working variadic for-each from my previous post the implementation of NAMED_CONSTRAINTS is:

#if HAVE_DIRECT_SYMBOL_REF
#   define NAMED_CONSTRAINTS_ADD(...)
#   define NAMED_CONSTRAINTS(...)
#else
#   define NAME_CONSTRAINT(x) [x] "m"(x)
    // Parameters are a list of each symbol reference required
#   define NAMED_CONSTRAINTS_ADD(...) , FOR_EACH_VA(NAME_CONSTRAINT,__VA_ARGS__)
    // Same but without comma for when there are no previously defined constraints
#   define NAMED_CONSTRAINTS(...) FOR_EACH_VA(NAME_CONSTRAINT,__VA_ARGS__)
#endif

Putting this all together and for the most part all existing inline asm will compile without any problems. However you'll notice I said 'most'. The Intel compiler is extremely fussy about the use of inline AT&T assembly. This is most likely because this is considered a rarely used feature and so does not see much in the way of support. Unfortunately this means that using inline asm is much harder than it needs to be. So even after the missing support for direct symbol references the compiler still has many issues. The main one is that the compiler will sporadically decide it doesn't like some particular assembly line and generate an error. The same assembly line will be working fine until you change a compilation option (or in some cases just move the assembly block somewhere else) then all of a sudden it will generate an error. Ideally it would be nice if Intel cleaned up some these inconsistencies but in the mean time the patch had to work around them.

So the patch does change a couple of things. In fact there are 2 instances (in x86/motion_est.c and vf_fspp.c) where a direct symbol reference had to be removed and replaced with an asm-interface. This may be seen as changing the original code but in both instances the variable in question already existed in an asm-interface so there additional inclusion should have absolutely zero affect. In fact just to be sure I checked the generated code from gcc before and after the patch and ensured that every line was identical.

The patch passes all FATE tests and compiles on Intel (under normal release compilation options) and has zero impact on any other build chain. For those interested the full patch can be downloaded below.

Update 2: New and improved patches have been created and these are now available directly in FFmpeg master (no patching required). See my post for more information (Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 3)).
Update: A new patch is available that adds support for an extra file. This file is only used when FAST_CMOV is enabled so was missed previously. The new patch should be grabbed from here:
https://github.com/ShiftMediaProject/FFmpeg/commit/26acab2672f27b923510ec35cbbade69a7776c9d.diff

Download patch file:
https://github.com/ShiftMediaProject/FFmpeg/commit/6eea177a4761629df5d6f02359fccc4efb116aed.diff

Note: Unlike my last patch this one is done directly against the current master (at time of writing).

Building FFmpeg for Windows is generally done using MinGW compiler in either a MSYS or Cygwin environment. But the resulting libraries don't always play all that well when you try and use them in a native Windows program (i.e. something built with the Microsoft compiler -msvc). For those who work on Windows it is often slightly more convenient to be able to use a native build chain when creating FFmpeg binaries.

Unfortunately FFmpeg (much like most opensource projects) uses C99 code which will not build using the msvc compiler found in Visual Studio. This is because Microsoft in all there wisdom don't support anything past C89. With the standardisation of C++11 newer versions of msvc are slowly starting to support C99 features as the C++11 specification mandates that they must be supported. So supporting C++11 generally also means supporting C99. However this support started to feature in Visual Studio 2013 and it is still far from complete and can not compile many C99 based projects (at time of writing).

Things are made worse as FFmpeg doesnt just use C99 but also has large amounts of in-line assembly. This assembly is written in the AT&T assembly syntax (or more specifically GAS - GNU Assembler) as opposed to Intel assembly syntax. Since msvc only supports Intel syntax then even with C99 support it will never be possible to compile FFmpeg with in-line assembly natively. Of-course FFmpeg has a nice config option that can be used to disable all in-line asm which would mean that it would compile on a C99 compliant compiler without needing AT&T assembly support. But then we don't get all the nicely hand tuned assembly optimisations that were written specifically to maximise performance.

Luckily there is the Intel compiler to the rescue. The Intel compiler for Windows will generate a native windows library that can be easily linked and debugged like any other msvc library. It also has the added benefit of supporting AT&T assembly compilation on Windows.

Unfortunately Intels AT&T support is not fully featured on Windows. On Linux the Intel compiler just uses GAS so it supports everything GAS does. But on Windows Intel had to implement the support themselves and as a result it is not as feature complete as GAS. The main issue is that the Intel compiler on Windows does not support 'Direct symbol references' which are used in various places within FFmpeg. In order to compile on Windows these lines need to be updated in order to remove the symbol references.

An example of the problem and the fix can be found in libavcodec/x86/dsputil_mmx.c:

--- a/libavcodec/x86/dsputil_mmx.c
+++ b/libavcodec/x86/dsputil_mmx.c
@@ -115,13 +115,13 @@ void ff_put_signed_pixels_clamped_mmx(const int16_t *block, uint8_t *pixels,
     x86_reg line_skip3;
 
     __asm__ volatile (
-        "movq "MANGLE(ff_pb_80)", %%mm0     \n\t"
+        "movq %4, %%mm0                     \n\t"
         "lea         (%3, %3, 2), %1        \n\t"
         put_signed_pixels_clamped_mmx_half(0)
         "lea         (%0, %3, 4), %0        \n\t"
         put_signed_pixels_clamped_mmx_half(64)
         : "+&r"(pixels), "=&r"(line_skip3)
-        : "r"(block), "r"(line_skip)
+        : "r"(block), "r"(line_skip), "m"(ff_pb_80)
         : "memory");
 }

Here the symbol reference is replaced with an asm-interface.

There are many occurrences of direct symbol references in FFmpeg (to many to list here) so I will instead direct people to check out the attached patch.

The attached patch includes updates to the in-line asm that allow most of it to compile under the Intel compiler. Unfortunately there are 2 places where the asm uses slightly more complicated features than just symbol references so these cant be easily converted. These instances are in libavcodec/x86/mlpdsp.c where the use of labels (combined with a jump list) is not supported by Intel on Windows and is not easily modified. The second instance is in the BRANCHLESS_GET_CABAC macro in libavcodec/x86/cabac.h. This contains slightly more complex use of symbol references that if just swapped out like previous occurrences still doesn't compile without some additional work.

However despite a couple of problem functions the attached patch will allow for FFmpeg compilation on Windows with the Intel compiler and in-line asm optimisations enabled.

Update 2: New and improved patches have been created and these are now available directly in FFmpeg master (no patching required). See my post for more information (Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 3)).
Update: A new patch is available in my newer post (Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 2)).

Download patch file:
https://github.com/ShiftMediaProject/FFmpeg/commit/a810d0822fd9f894a3750bdb0cbea3014229ae8b.diff

Lastly for those who don't want to apply the modifications themselves and would like a nicer build method than using MSYS/Cygwin then you can check out my git repository which has working Visual Studio build projects. These projects are in the 'SMP' directory and of course require a valid install of the Intel Windows compiler.
https://github.com/ShiftMediaProject/FFmpeg

Silicon & Lithium

Sunday, 5 January 2014

Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 2).

Sunday, 22 December 2013

Building FFmpeg on Windows with in-line asm and the Intel compiler.

Labels

About Me