Sunday 5 January 2014

Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 2).

In my last post I talked about how to get FFmpeg to compile under Windows with the Intel compiler. This has advantages in that the Intel compiler supports compilation of AT&T style inline assembly. This means that its possible to use the hand tuned optimised code found in FFmpeg while natively compiling for Windows (something that is otherwise not possible). Unfortunately the assembly support in the Windows version of the compiler is not complete and so the inline asm wont compile without some changes.

The previous post on this subject had a patch that allowed for compilation with inline asm. However since then I have cleaned up the patch and fixed a few things that I wasnt entirely happy about. In fact after talking to Michael Niedermayer and others on the FFmpeg mailing list I ended up writing an entirely new patch. This patch is currently still pending review but for those who are interested in getting this working now they can grab the patch from the end of this post.

The main changes in this version is the way the inline asm was changed from using direct symbol references to something defined in an asm-interface. From my previous post I mentioned that Intel compiler does not support direct symbol references in code so previously I had changed all of these to asm-interfaces. However the FFmpeg developers didn't want to change any existing code and there was some concerns over how moving variables from direct symbols into the interface may affect Position Independent Code compilation. So based on a suggestion from Michael the patch was changed to use named constraints. For those familiar with named constraints you'll know that to use these all you have to do is replace a direct symbol reference with a named constraint.

Example of existing direct symbol reference:
"movq "MANGLE(ff_pb_80)", %%mm0     \n\t"
"lea         (%3, %3, 2), %1        \n\t"

Since FFmpeg already had a macro for name mangling the direct symbol references it was rather simple to just change this macro to generate a named constraint instead. In order to do this the definition of MANGLE was changed so that with Intel compiler on Windows it generates a named constraint.

// Determine if compiler supports direct symbol references in inline asm
#if defined(__INTEL_COMPILER) && defined(_MSC_VER)
#   define HAVE_DIRECT_SYMBOL_REF 0
#else
#   define HAVE_DIRECT_SYMBOL_REF 1
#endif

#if HAVE_DIRECT_SYMBOL_REF
    //The standard version of MANGLE for direct symbol references
#   define MANGLE(a) EXTERN_PREFIX LOCAL_MANGLE(a)
#else
    //A version of mangle that instead generates named constraints
#   define MANGLE(a) "%["#a"]"
#endif

Of course using a named constraint by itself wont work as the constraint still needs to be added to the asm-interface. Since this additional interface is only required for Intel on Windows it is not desirable to have it all the time. So in keeping with not changing any existing code the asm-interfaces were added using a new macro that simply does nothing for all other build chains. Using this the above inline asm becomes:

__asm__ volatile (
   "movq "MANGLE(ff_pb_80)", %%mm0     \n\t"
   "lea         (%3, %3, 2), %1        \n\t"
   put_signed_pixels_clamped_mmx_half(0)
   "lea         (%0, %3, 4), %0        \n\t"
   put_signed_pixels_clamped_mmx_half(64)
   : "+&r"(pixels), "=&r"(line_skip3)
   : "r"(block), "r"(line_skip)
     NAMED_CONSTRAINTS_ADD(ff_pb_80)
   : "memory"
);

The resulting changes are rather minimal and when compiling using any previously supported build chains there is no apparent difference in the code before and after this change. The macro NAMED_CONSTRAINTS is used to add each of the names of any directly accessed symbols. This macro just needs the name and can take a comma separated list of up to 10 values.

However adding the macro NAMED_CONSTRAINTS required slightly more work than I would have liked but it at least worked as required. The difficulty was due to a bug in both the Intel and Microsoft compilers where variadic arguments where not properly expanded (see my post on the subject http://siliconandlithium.blogspot.com/2014/01/macro-variadic-argument-expansion-on.html). Using the working variadic for-each from my previous post the implementation of NAMED_CONSTRAINTS is:

#if HAVE_DIRECT_SYMBOL_REF
#   define NAMED_CONSTRAINTS_ADD(...)
#   define NAMED_CONSTRAINTS(...)
#else
#   define NAME_CONSTRAINT(x) [x] "m"(x)
    // Parameters are a list of each symbol reference required
#   define NAMED_CONSTRAINTS_ADD(...) , FOR_EACH_VA(NAME_CONSTRAINT,__VA_ARGS__)
    // Same but without comma for when there are no previously defined constraints
#   define NAMED_CONSTRAINTS(...) FOR_EACH_VA(NAME_CONSTRAINT,__VA_ARGS__)
#endif

Putting this all together and for the most part all existing inline asm will compile without any problems. However you'll notice I said 'most'. The Intel compiler is extremely fussy about the use of inline AT&T assembly. This is most likely because this is considered a rarely used feature and so does not see much in the way of support. Unfortunately this means that using inline asm is much harder than it needs to be. So even after the missing support for direct symbol references the compiler still has many issues. The main one is that the compiler will sporadically decide it doesn't like some particular assembly line and generate an error. The same assembly line will be working fine until you change a compilation option (or in some cases just move the assembly block somewhere else) then all of a sudden it will generate an error. Ideally it would be nice if Intel cleaned up some these inconsistencies but in the mean time the patch had to work around them.

So the patch does change a couple of things. In fact there are 2 instances (in x86/motion_est.c and vf_fspp.c) where a direct symbol reference had to be removed and replaced with an asm-interface. This may be seen as changing the original code but in both instances the variable in question already existed in an asm-interface so there additional inclusion should have absolutely zero affect. In fact just to be sure I checked the generated code from gcc before and after the patch and ensured that every line was identical.

The patch passes all FATE tests and compiles on Intel (under normal release compilation options) and has zero impact on any other build chain. For those interested the full patch can be downloaded below.

Update 2: New and improved patches have been created and these are now available directly in FFmpeg master (no patching required). See my post for more information (Building FFmpeg on Windows with in-line asm and the Intel compiler (Part 3)).
Update: A new patch is available that adds support for an extra file. This file is only used when FAST_CMOV is enabled so was missed previously. The new patch should be grabbed from here:
https://github.com/ShiftMediaProject/FFmpeg/commit/26acab2672f27b923510ec35cbbade69a7776c9d.diff

Download patch file:
https://github.com/ShiftMediaProject/FFmpeg/commit/6eea177a4761629df5d6f02359fccc4efb116aed.diff

Note: Unlike my last patch this one is done directly against the current master (at time of writing).

No comments:

Post a Comment