-------------------------------------------------------------------------------
                       Matrox Imaging Library (7.0)
                           mmx.txt Readme File
                              June 29, 2001
   Copyright © 2000 by Matrox Electronic Systems Ltd. All rights reserved.
-------------------------------------------------------------------------------

The following file contains a list of all the functions that have been
optimized with MMX code. A supplementary section also suggests the data
alignment to obtain the best performance with MMX when a buffer is created
with the MbufCreate2d()/MbufCreateColor() function. Another section indicates
how MIL can enable/disable the use of MMX optimization.

Contents

1. Image processing commands.
2. Buffer management commands.
3. Measurements commands.
4. Pattern matching commands.
5. Blob analysis commands.
6. Graphics command.
7. Data alignment.

-------------------------------------------------------------------------------
Symbols used in the file
-------------------------------------------------------------------------------

Buffers:

    Dst : Destination
    Src : Source
    Cnd : Condition

Data type

    UChar : unsigned char
    Char  : signed char
    UShort: unsigned short
    Short : signed short
    ULong : unsigned long
    Long  : signed long
    Float : float
    Bin   : binary

The buffer bit and sign refers to all buffers except floating point.

*******************************************************************************
1. Image processing commands.
*******************************************************************************

1.1 MimArith().

1.1.1 Optimized versions:

    Dst     Src1    Src2
    ------  ------  ------
    UChar   UChar   UChar
    Char    Char    Char
    UChar   UShort  UChar
    Char    Short   Char
    UChar   UChar   UShort
    Char    Char    Short
    UShort  UChar   UChar
    Short   Char    Char
    UShort  UShort  UChar
    Short   Short   Char
    UShort  UChar   UShort
    Short   Char    Short
    UShort  UShort  UShort
    Short   Short   Short
    ULong   ULong   ULong
    Long    Long    Long

1.1.2 Notes.

    - Saturation is not optimized for 32-bit src buffers.
    - M_MULT / M_MULT_CONST is not optimized for 32-bit src buffers.
    - Saturation is not optimized for unsigned mixes with an 8-bit
      destination.

1.1.3 Options optimized.

1.1.3.1 Operations where saturation is optimized.

    M_ADD_CONST
    M_ADD
    M_SUB_CONST
    M_SUB
    M_CONST_SUB

1.1.3.2 Operations without saturation (optimized).

    M_AND     M_AND_CONST
    M_OR      M_OR_CONST
    M_XOR     M_XOR_CONST
    M_NAND    M_NAND_CONST
    M_NOR     M_NOR_CONST
    M_XNOR    M_XNOR_CONST
    M_MAX     M_MAX_CONST
    M_MIN     M_MIN_CONST
    M_MULT    M_MULT_CONST
    M_ABS
    M_SUB_ABS
    M_NOT
    M_NEG

1.1.3.3 Operations not optimized.

    M_PASS
    M_CONST_PASS
    M_DIV
    M_DIV+M_FIXED_POINT
    M_DIV_CONST
    M_DIV_CONST+M_FIXED_POINT
    M_CONST_DIV
    M_CONST_DIV+M_FIXED_POINT

1.1.3.4 Particular cases optimized.

    M_MULT+M_SATURATION         8,8->8         SIGNED
    M_MULT_CONST+M_SATURATION   8,Constant->8  SIGNED

1.1.3.5 Mixed sign cases optimized.

    - M_ABS: All mixes are optimized (size & sign).
    - Logical (M_AND, M_NOR, etc.) operations on mixed types are optimized
      only when the size of all buffers is the same.
    - M_NEG operation is optimized on mixed types only when the size of all
      buffers is the same.

1.2 MimArithMultiple().

1.2.1 M_OFFSET_GAIN.

    Optimized versions:

    Buffers UChar
    Buffers UChar  + M_SATURATION
    Buffers Char
    Buffers Char   + M_SATURATION
    Buffers Short
    Buffers Short  + M_SATURATION
    Buffers UShort + M_SATURATION
    Buffers UShort

    Dst     Src1    Src2    Src3
    ------  ------  ------  ------
    UChar   UChar   Short   Short   + M_SATURATION
    UChar   UChar   Char    UChar   + M_SATURATION
    UChar   UChar   Char    Char    + M_SATURATION

    NOTE: For in place operation, the SizeX * data size in byte must be
          multiple of 8.

1.2.2 M_WEIGHTED_AVERAGE.

    Optimized versions:

    Buffers UChar
    Buffers UChar  + M_SATURATION
    Buffers Char
    Buffers Char   + M_SATURATION
    Buffers Short
    Buffers Short  + M_SATURATION
    Buffers UShort
    Buffers UShort + M_SATURATION

1.2.3 M_MULTIPLY_ACCUMULATE_1.
    Optimized versions:

    Buffers UChar
    Buffers Char
    Buffers Char   + M_SATURATION
    Buffers Short
    Buffers Short  + M_SATURATION
    Buffers UChar  + M_SATURATION (*)
    Buffers UShort + M_SATURATION (*)
    Buffers UShort                (*)

    (*) For these versions to be optimized by MMX, the values in the source
        buffer cannot exceed the maximum value of a corresponding signed
        buffer (e.g., 127 for an 8-bit buffer).

    NOTE: For in-place operation, SizeX * data size in bytes must be a
          multiple of 8.

1.2.4 M_MULTIPLY_ACCUMULATE_2.

    Optimized versions:

    Buffers UChar
    Buffers Char
    Buffers Char   + M_SATURATION
    Buffers Short
    Buffers Short  + M_SATURATION
    Buffers UChar  + M_SATURATION (*)
    Buffers UShort + M_SATURATION (*)
    Buffers UShort                (*)

    (*) For these versions to be optimized by MMX, the values in the source
        buffer cannot exceed the maximum value of a corresponding signed
        buffer (e.g., 127 for an 8-bit buffer).

    NOTE: For in-place operation, SizeX * data size in bytes must be a
          multiple of 8.

1.3 MimBinarize().

    Only the following versions of the function are optimized with MMX:

    Dst     Src
    ------  ------
    UChar   UChar
    UChar   Char
    Char    UChar
    Char    Char
    UShort  UShort
    UShort  Short
    Short   UShort
    Short   Short
    ULong   ULong
    ULong   Long
    Long    ULong
    Long    Long
    UShort  UChar
    UShort  Char
    Short   UChar
    Short   Char
    UChar   UShort
    UChar   Short
    Char    UShort
    Char    Short
    Binary  UChar
    Binary  Char
    Binary  UShort
    Binary  Short

1.4 MimClip().

    All integer versions for which source and destination are of the same
    size are optimized with MMX.

    Dst     Src
    ------  ------
    UChar   UChar
    Char    UChar
    UChar   Char
    Char    Char
    UShort  UShort
    Short   UShort
    UShort  Short
    Short   Short
    ULong   ULong
    Long    ULong
    ULong   Long
    Long    Long

1.5 MimClose().

    M_GRAYSCALE operation: All the buffer depths and sign combinations are
                           optimized in MMX.
    M_BINARY operation:    Not optimized in MMX.

1.6 MimConnectMap().

    Not optimized with MMX.

1.7 MimConvert().
    Only the following versions of the function are optimized with MMX:

    Dst     Src
    ------  ------
    UChar   UChar
    UChar   Char
    Char    UChar
    Char    Char

    Src               Dst               Restriction
    ------            ------            -----------
    M_RGB24+M_PLANAR  HLS (8-bit)       DstSizeX > 7
    M_RGB24+M_PLANAR  H (8-bit)         DstSizeX > 7
    M_RGB24+M_PLANAR  L (8-bit)         DstSizeX > 7
    HLS               M_RGB24+M_PLANAR  DstSizeX > 7

    For other conversions, see MbufCopy() and MbufBayer().

1.8 MimConvolve().

1.8.1 Custom kernels.

    Only the following versions of the function are optimized with MMX:

    Dst     Src     Kernel
    ------  ------  ------
    Char    Char    Char    (*)    (128, -128, 128, -128)
    Char    Char    UChar   (*)    (256, , 256, )
    Char    UChar   Char    (*)    (128, -128, 128, -128)
    Char    UChar   UChar   (*)    (256, , 128, )
    UChar   Char    Char    (*)    (128, -128, 128, -128)
    UChar   Char    UChar   (*)    (256, , 256, )
    UChar   UChar   Char    (*)    (128, -128, 128, -128)
    UChar   UChar   UChar   (*)    (256, , 128, )
    Short   Char    Char    (**)   (128, -128, 128, -128)
    Short   Char    UChar   (**)   (256, , 256, )
    Short   UChar   Char    (**)   (128, -128, 128, -128)
    Short   UChar   UChar   (**)   (256, , 128, )
    UShort  Char    Char    (**)   (128, -128, 128, -128)
    UShort  Char    UChar   (**)   (256, , 256, )
    UShort  UChar   Char    (**)   (128, -128, 128, -128)
    UShort  UChar   UChar   (**)   (256, , 128, )
    Short   Char    Short   (**)   (128, -128, 128, -128)
    Short   Char    UShort  (**)   (256, , 256, )
    Short   UChar   Short   (**)   (128, -128, 128, -128)
    Short   UChar   UShort  (**)   (256, , 128, )
    UShort  Char    Short   (**)   (128, -128, 128, -128)
    UShort  Char    UShort  (**)   (256, , 256, )
    UShort  UChar   Short   (**)   (128, -128, 128, -128)
    UShort  UChar   UShort  (**)   (256, , 128, )
    Short   Short   Char    (***)  (32768, -32768, 32768, -32768)
    Short   Short   UChar   (***)  (65536, , 65536, )
    Short   UShort  Char    (***)  (32768, -32768, 32768, -32768)
    Short   UShort  UChar   (***)  (65536, , 32768, )
    UShort  Short   Char    (***)  (32768, -32768, 32768, -32768)
    UShort  Short   UChar   (***)  (65536, , 65536, )
    UShort  UShort  Char    (***)  (32768, -32768, 32768, -32768)
    UShort  UShort  UChar   (***)  (65536, , 32768, )
    Short   Short   Short   (***)  (32768, -32768, 32768, -32768)
    Short   Short   UShort  (***)  (65536, , 65536, )
    Short   UShort  Short   (***)  (32768, -32768, 32768, -32768)
    Short   UShort  UShort  (***)  (65536, , 32768, )
    UShort  Short   Short   (***)  (32768, -32768, 32768, -32768)
    UShort  Short   UShort  (***)  (65536, , 65536, )
    UShort  UShort  Short   (***)  (32768, -32768, 32768, -32768)
    UShort  UShort  UShort  (***)  (65536, , 32768, )

    (*)   For these versions, the sum of the kernel values is verified to be
          less than or equal to (greater than or equal to, for negative
          values) the values specified in parentheses. The first value is
          the sum of the positive values in the kernel, the second is the
          sum of the negative values in the kernel, the third is the sum of
          the positive values divided by the normalization factor, and the
          fourth is the sum of the negative values divided by the
          normalization factor. If these conditions are respected, the MMX
          version with a 16-bit accumulator is called. If these conditions
          are not respected and the number of elements in the kernel is
          smaller than 32025, the MMX function with a 32-bit accumulator is
          called. If the number of elements in the kernel is greater than or
          equal to 32025, the non-MMX version is called.
          The internal accumulator contains the sum of the products of
          kernel elements by image values before normalization.

    (**)  Note that for these versions, the internal accumulator is 16-bit
          unsigned when the source AND the kernel are unsigned. It is 16-bit
          signed in all other cases. To ensure that an overflow does not
          occur, the sum of the kernel values is verified to be less than or
          equal to (greater than or equal to, for negative values) the
          values specified in parentheses. The first value is the sum of the
          positive values in the kernel, the second is the sum of the
          negative values in the kernel, the third is the sum of the
          positive values divided by the normalization factor, and the
          fourth is the sum of the negative values divided by the
          normalization factor. If these conditions are not respected, the
          non-MMX function is used.
          The internal accumulator contains the sum of the products of
          kernel elements by image values before normalization.

    (***) For these versions to be optimized by MMX, the values in UNSIGNED
          source and UNSIGNED kernel buffers should not exceed the maximum
          value of a corresponding signed buffer (e.g., 127 for an 8-bit
          buffer). For these versions, the accumulator is 32-bit signed. The
          kernel sum is verified in the same way as for the previous
          versions. Note that for a 5x5 convolution, the MMX version of the
          operation is called only on a Pentium processor or when saturation
          is needed. On a Pentium Pro and Pentium II, the C++ version is
          faster when saturation is not needed. If these conditions are not
          met, no MMX optimization will be used.

1.8.2 Predefined MIL kernels.

    Optimized with MMX in the following cases:

    All platforms:
        8-bit source buffers with 8- or 16-bit destinations.

    Pentium Pro (Pentium II) and subsequent versions only:
        8-bit and 16-bit source buffers with 8- or 16-bit destinations.

1.9 MimCountDifference().

    General guideline: source and destination buffers must be of the same
    sign.

    Src1    Src2
    ------  ------
    UChar   UChar
    Char    Char
    UChar   UShort
    Char    Short
    UShort  UChar
    Short   Char
    UShort  UShort
    Short   Short
    ULong   ULong

1.10 MimDilate().

    M_GRAYSCALE operation: All the buffer depths and sign combinations are
                           optimized with MMX.
    M_BINARY operation:    Not optimized in MMX.

1.11 MimDistance().

    CHAMFER_3_4: 8-bit and 16-bit signed and unsigned versions are optimized
                 with MMX.
    CHESSBOARD:  8-bit signed and unsigned versions are optimized with MMX.
    CITY_BLOCK:  No version is optimized with MMX.

1.12 MimEdgeDetect().
    Only the following versions of the function are optimized with MMX:

    Angle   Gradient  Source
    ------  --------  ------
    UChar   UChar     UChar
    UChar   UChar     Char
    Char    UChar     UChar
    Char    UChar     Char
    UShort  UChar     UChar
    UShort  UChar     Char
    Short   UChar     UChar
    Short   UChar     Char
    UChar   Char      UChar
    UChar   Char      Char
    UShort  Char      UChar
    UShort  Char      Char
    Char    Char      UChar
    Char    Char      Char
    Short   Char      UChar
    Short   Char      Char
    UChar   UShort    UChar
    UChar   UShort    Char
    UShort  UShort    UChar
    UShort  UShort    Char
    Char    UShort    UChar
    Char    UShort    Char
    Short   UShort    UChar
    Short   UShort    Char
    UChar   Short     UChar
    UChar   Short     Char
    UShort  Short     UChar
    UShort  Short     Char
    Char    Short     UChar
    Char    Short     Char
    Short   Short     UChar
    Short   Short     Char

1.13 MimErode().

    M_GRAYSCALE operation: All the buffer depths and sign combinations are
                           optimized in MMX.
    M_BINARY operation:    Not optimized in MMX.

1.14 MimFindExtreme().

    All 8-bit and 16-bit integer buffer combinations are optimized with MMX.

1.15 MimFlip().

    M_FLIP_VERTICAL   : Optimized (8 and 16-bit versions)
    M_FLIP_HORIZONTAL : Optimized (8 and 16-bit versions)

1.16 MimHistogram().

    Not optimized with MMX.

1.17 MimHistogramEqualize().

    Not optimized with MMX.

1.18 MimLabel().

    Not optimized with MMX.

1.19 MimLocateEvent().

    Only the following versions of the function are optimized with MMX.

    Src
    ------
    UChar
    Char
    UShort
    Short

1.20 MimLutMap().

    Not optimized with MMX.

1.21 MimMorphic().

    M_ERODE, M_DILATE, M_THIN, M_THICK, M_HIT_OR_MISS, M_MATCH:

    M_GRAYSCALE operation: All the integer buffer depths and sign
                           combinations are optimized with MMX.
                           EXCEPTION: M_ERODE with a 32-bit + M_UNSIGNED
                           source buffer is not optimized.
    M_BINARY operation:    Not optimized with MMX.

1.22 MimOpen().

    M_GRAYSCALE operation: All the integer buffer depths and sign
                           combinations are optimized with MMX.
    M_BINARY operation:    Not optimized with MMX.

1.23 MimPolarTransform().

    M_RECTANGULAR_TO_POLAR: Not optimized with MMX.
    M_POLAR_TO_RECTANGULAR: Not optimized with MMX.

1.24 MimProject().
    Only the following versions of the function are optimized with MMX for
    M_0_DEGREE and M_90_DEGREE:

    Src
    ------
    UChar
    Char
    UShort
    Short

1.25 MimRank().

    All the integer buffer depths and sign combinations are optimized with
    MMX. The 3x3 median structuring element has been further optimized.

1.26 MimResize().

    Only the following versions of the function are optimized with MMX
    (with M_NEAREST_NEIGHBOR interpolation only):

    Dst     Src
    ------  ------
    Char    Char
    UChar   UChar
    Char    UChar
    Char    UChar
    UShort  UShort
    UShort  Short
    Short   UShort
    Short   Short

    Special Notes:

    To Zoom in X with 8-bit buffers:
        1. The width (SizeX) of the source buffer must be a multiple of 4.
        2. The zooming factor must be 2 or 4.

    To Zoom in X with 16-bit buffers:
        1. The width (SizeX) of the source buffer must be a multiple of 2.
        2. The width (SizeX) of the destination buffer must be a multiple
           of 8.
        3. The zooming factor must be 2 or 4.

    To Zoom out in X with 8-bit buffers:
        1. The width (SizeX) of the source buffer must be a multiple of 8.
        2. The zooming factor must be 1/2 or 1/4.

    Zooming out in X with 16-bit buffers is not optimized.

    To Zoom in Y, the factor must be an integer value.
    To Zoom out in Y, the factor must be (1/integer value).

    New version: Resize M_AVERAGE, 1/2 x 1/2 with unsigned char buffers
    only. No restrictions on the size of the buffers.

1.27 MimRotate().

    Not optimized with MMX, although greatly accelerated by the addition of
    fixed point in calculations.

1.28 MimShift().

    Only the following versions of the function are optimized with MMX:

    Dst     Src
    ------  ------
    Char    Char
    UChar   UChar
    UChar   UShort
    Char    Short
    UShort  UChar
    Short   Char
    UShort  UShort
    Short   Short

1.29 MimThick().

    M_GRAYSCALE operation: All the buffer depth and sign combinations are
                           optimized with MMX.
    M_BINARY operation:    Not optimized with MMX.

1.30 MimThin().

    M_GRAYSCALE operation: All the buffer depth and sign combinations are
                           optimized with MMX.
    M_BINARY operation:    Not optimized with MMX.

1.31 MimTransform().
    M_FFT:    This function supports only fixed-point versions of the
              transform. For the forward transform, the sources must be
              8-bit signed or unsigned and the destination must be long.
              The M_NORMALIZE flag must be set to avoid internal overflows
              in the fixed-point computations. For the reverse transform,
              the restrictions are the same but the source and destination
              are swapped.
    M_DCT8x8: Not optimized with MMX.

1.32 MimTranslate().

    Translations by fractional factors are optimized with MMX in the same
    way as MimConvolve(), where 'Src' and 'Dst' are both the destination
    type of the MimTranslate operation and 'Kernel' is always Char.

1.33 MimWarp().

    Not optimized with MMX.

1.34 MimZoneOfInfluence().

    Not optimized with MMX.

*******************************************************************************
2. Buffer management commands.
*******************************************************************************

2.1 MbufCopy().

    Only the following versions of the function are optimized with MMX:

    Src               Dst                                                  Restriction
    ------            ------                                               -----------
    M_MONO8           M_YUV16_YUYV                                         DstSizeX multiple of 2
    M_MONO8           M_YUV16_UYVY                                         DstSizeX multiple of 2
    M_MONO8           M_YUV16+M_PACKED                                     DstSizeX multiple of 2
    M_RGB24+M_PLANAR  M_YUV24+M_PLANAR                                     DstSizeX > 4
    M_RGB24+M_PLANAR  M_YUV16_YUYV                                         DstSizeX multiple of 8
    M_RGB24+M_PLANAR  M_YUV16_UYVY                                         DstSizeX multiple of 8
    M_RGB24+M_PLANAR  M_YUV16                                              DstSizeX multiple of 8
    M_RGB24+M_PLANAR  M_YUV12+M_PLANAR                                     DstSizeX multiple of 8 and DstSizeY multiple of 2
    M_RGB24+M_PLANAR  M_YUV9+M_PLANAR                                      DstSizeX multiple of 16 and DstSizeY multiple of 4
    M_BGR24+M_PACKED  M_YUV16_YUYV                                         DstSizeX multiple of 8
    M_BGR24+M_PACKED  M_YUV16+M_PACKED                                     DstSizeX multiple of 8
    M_YUV16_YUYV      M_MONO8                                              DstSizeX multiple of 2
    M_YUV16_UYVY      M_MONO8                                              DstSizeX multiple of 2
    M_YUV16+M_PACKED  M_MONO8                                              DstSizeX multiple of 2
    M_YUV24+M_PLANAR  M_RGB24+M_PLANAR                                     SrcSizeX > 4
    M_YUV16+M_PLANAR  M_RGB24+M_PLANAR                                     SrcSizeX multiple of 8
    M_YUV16_YUYV      M_BGR24+M_PACKED                                     SrcSizeX multiple of 8
    M_YUV16_UYVY      M_BGR24+M_PACKED                                     SrcSizeX multiple of 8
    M_YUV16+M_PACKED  M_BGR24+M_PACKED                                     SrcSizeX multiple of 8
    M_MONO8           M_COMPRESS+M_MONO8+M_JPEG_LOSSLESS                   DstSizeX multiple of 4
    M_RGB24+M_PLANAR  M_COMPRESS+M_RGB24+M_JPEG_LOSSLESS+M_PLANAR          DstSizeX multiple of 4
    M_MONO8           M_COMPRESS+M_MONO8+M_JPEG_LOSSLESS_INTERLACED        DstSizeX multiple of 4
    M_MONO8           M_COMPRESS+M_JPEG_LOSSY+M_MONO8                      DstSizeY multiple of 8
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_RGB24+M_PLANAR             DstSizeY multiple of 8
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV24+M_PLANAR             DstSizeY multiple of 8
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PLANAR             DstSizeY multiple of 8
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV12+M_PLANAR             DstSizeY multiple of 16
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV9+M_PLANAR              DstSizeY multiple of 32
    M_BGR24+M_PACKED  M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_BGR32+M_PACKED  M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_YUV24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV24+M_PLANAR             DstSizeY multiple of 8
    M_YUV16+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PLANAR             DstSizeY multiple of 8
    M_YUV16           M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_YUV12+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY+M_YUV12+M_PLANAR             DstSizeY multiple of 16
    M_YUV9+M_PLANAR   M_COMPRESS+M_JPEG_LOSSY+M_YUV9+M_PLANAR              DstSizeY multiple of 32
    M_YUV16_YUYV      M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_YUV16_UYVY      M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             DstSizeX multiple of 16 and DstSizeY multiple of 8
    M_MONO8           M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_MONO            DstSizeY multiple of 16
    M_RGB24+M_PLANAR  M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16
    M_BGR24+M_PACKED  M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16
    M_BGR32+M_PACKED  M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16
    M_YUV16           M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16
    M_YUV16_YUYV      M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16
    M_YUV16_UYVY      M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  DstSizeX multiple of 16 and DstSizeY multiple of 16

    Src                                                  Dst               Restriction
    ------                                               ------            -----------
    M_COMPRESS+M_JPEG_LOSSY+M_MONO8                      M_MONO8           SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_RGB24+M_PLANAR             M_RGB24+M_PLANAR  SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV24+M_PLANAR             M_RGB24+M_PLANAR  SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16                      M_RGB24+M_PLANAR  SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV12+M_PLANAR             M_RGB24+M_PLANAR  SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY+M_YUV9+M_PLANAR              M_RGB24+M_PLANAR  SrcSizeY multiple of 32
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             M_BGR24+M_PACKED  SrcSizeX multiple of 16 and SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             M_BGR32+M_PACKED  SrcSizeX multiple of 16 and SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV24+M_PLANAR             M_YUV24+M_PLANAR  SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16                      M_YUV16+M_PLANAR  SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             M_YUV16+M_PACKED  SrcSizeX multiple of 16 and SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV12+M_PLANAR             M_YUV12+M_PLANAR  SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY+M_YUV9+M_PLANAR              M_YUV9+M_PLANAR   SrcSizeY multiple of 32
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             M_YUV16_YUYV      SrcSizeX multiple of 16 and SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY+M_YUV16+M_PACKED             M_YUV16_UYVY      SrcSizeX multiple of 16 and SrcSizeY multiple of 8
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_MONO8           M_MONO8           SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  M_RGB24           SrcSizeX multiple of 16 and SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  M_BGR24           SrcSizeX multiple of 16 and SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  M_BGR32           SrcSizeX multiple of 16 and SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16           M_YUV16+M_PLANAR  SrcSizeX multiple of 16 and SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  M_YUV16_YUYV      SrcSizeX multiple of 16 and SrcSizeY multiple of 16
    M_COMPRESS+M_JPEG_LOSSY_INTERLACED+M_YUV16+M_PACKED  M_YUV16_UYVY      SrcSizeX multiple of 16 and SrcSizeY multiple of 16

2.2 MbufCopyCond().

    Only the following versions of the function are optimized with MMX:

    Dst     Src     Cnd
    ------  ------  ------
    UChar   UChar   UChar
    UChar   UChar   Char
    UChar   Char    UChar
    UChar   Char    Char
    Char    UChar   UChar
    Char    UChar   Char
    Char    Char    UChar
    Char    Char    Char
    UShort  UShort  UShort
    UShort  UShort  Short
    UShort  Short   UShort
    UShort  Short   Short
    Short   UShort  UShort
    Short   UShort  Short
    Short   Short   UShort
    Short   Short   Short

2.3 MbufCopyMask().

    Only the following versions of the function are optimized with MMX:

    Dst     Src
    ------  ------
    UChar   UChar
    UChar   Char
    Char    UChar
    Char    Char
    UShort  UShort
    UShort  Short
    Short   UShort
    Short   Short
    ULong   ULong
    ULong   Long
    Long    ULong
    Long    Long
    UChar   UShort
    UChar   Short
    Char    UShort
    Char    Short
    UShort  UChar
    UShort  Char
    Short   UChar
    Short   Char

2.4 MbufBayer().
    Only the following versions of the function are optimized with MMX:

    Dst     Src
    ------  ------
    UChar   UChar
    UShort  UShort

    Src               Dst               Restriction
    ------            ------            -----------
    M_MONO8 (Bayer)   M_MONO8           DstSizeX > 10, DstSizeY > 3, SrcSizeX > 10, SrcSizeY > 3
    M_MONO8 (Bayer)   M_RGB24+M_PLANAR  DstSizeX > 10, DstSizeY > 3, SrcSizeX > 10, SrcSizeY > 3
    M_MONO8 (Bayer)   M_BGR32+M_PACKED  DstSizeX > 10, DstSizeY > 3, SrcSizeX > 10, SrcSizeY > 3
    M_MONO16 (Bayer)  M_RGB48+M_PLANAR  DstSizeX > 6, DstSizeY > 3, SrcSizeX > 6, SrcSizeY > 3
    M_MONO8 (Bayer)   M_YUV16_YUYV      See table below.

    If the Src is M_MONO8 (Bayer) and the Dst is M_YUV16_YUYV, the
    restrictions depend on the SizeX and the AncestorOffsetX as follows:

    AncestorOffsetX  SizeX  Restriction
    ---------------  -----  -----------
    Odd              Even   DstSizeX > 10, DstSizeY > 3, SrcSizeX > 10, SrcSizeY > 3
    Odd              Odd    DstSizeX > 11, DstSizeY > 3, SrcSizeX > 11, SrcSizeY > 3
    Even             Even   DstSizeX > 12, DstSizeY > 3, SrcSizeX > 12, SrcSizeY > 3
    Even             Odd    DstSizeX > 11, DstSizeY > 3, SrcSizeX > 11, SrcSizeY > 3

    Finally, the white balance coefficients must respect the following
    restrictions for the MMX-optimized version of the function to be used:

    Dst               Coefficient #0  Coefficient #1  Coefficient #2
    ------            --------------  --------------  --------------
    M_MONO8           < 64            Don't care      Don't care
    M_RGB24+M_PLANAR  < 64            < 64            < 64
    M_BGR32+M_PACKED  < 64            < 64            < 64
    M_RGB48+M_PLANAR  < 64            < 64            < 64
    M_YUV16_YUYV      < 64            Don't care      Don't care

    For other conversions, see MbufCopy() and MimConvert().

*******************************************************************************
3. Measurements commands.
*******************************************************************************

3.1 MmeasAllocContext().
    Not optimized with MMX.

3.2 MmeasAllocMarker().
    Not optimized with MMX.

3.3 MmeasAllocResult().
    Not optimized with MMX.

3.4 MmeasCalculate().
    Not optimized with MMX.

3.5 MmeasControl().
    Not optimized with MMX.

3.6 MmeasFindMarker().
    Optimized with MMX.

3.7 MmeasFree().
    Not optimized with MMX.

3.8 MmeasGetResult().
    Not optimized with MMX.

3.9 MmeasInquire().
    Not optimized with MMX.

3.10 MmeasRestoreMarker().
    Not optimized with MMX.

3.11 MmeasSaveMarker().
    Not optimized with MMX.

3.12 MmeasSetMarker().
    Not optimized with MMX.

*******************************************************************************
4. Pattern matching commands.
*******************************************************************************

4.1 MpatAllocAutoModel().
    Optimized with MMX.

4.2 MpatAllocModel().
    Not optimized with MMX.

4.3 MpatAllocResult().
    Not optimized with MMX.

4.4 MpatAllocRotatedModel().
    Not optimized with MMX.

4.5 MpatCopy().
    Not optimized with MMX.

4.6 MpatFindModel().
    Optimized with MMX.

4.7 MpatFindMultipleModel().
    Optimized with MMX.

4.8 MpatFindOrientation().
    Optimized with MMX.

4.9 MpatFree().
    Not optimized with MMX.

4.10 MpatGetNumber().
    Not optimized with MMX.

4.11 MpatGetResult().
    Not optimized with MMX.

4.12 MpatInquire().
    Not optimized with MMX.

4.13 MpatPreprocModel().
    Not optimized with MMX.

4.14 MpatRead().
    - Genesis model support has been added.
    - Not optimized with MMX.

4.15 MpatRestore().
    - Genesis model support has been added.
    - Not optimized with MMX.

4.16 MpatSave().
    Not optimized with MMX.

4.17 MpatSetAcceptance().
    Not optimized with MMX.

4.18 MpatSetAccuracy().
    Not optimized with MMX.

4.19 MpatSetAngle().
    Not optimized with MMX.

4.20 MpatSetCenter().
    Not optimized with MMX.

4.21 MpatSetCertainty().
    Not optimized with MMX.

4.22 MpatSetDontCare().
    Not optimized with MMX.

4.23 MpatSetNumber().
    Not optimized with MMX.

4.24 MpatSetPosition().
    Not optimized with MMX.

4.25 MpatSetSearchParameter().
    Not optimized with MMX.

4.26 MpatSetSpeed().
    Not optimized with MMX.

4.27 MpatWrite().
    Not optimized with MMX.

*******************************************************************************
5. Blob analysis commands.
*******************************************************************************

All the blob analysis commands that perform calculation have been optimized
with MMX.

5.1 MblobAllocFeatureList().
    Not optimized with MMX.

5.2 MblobAllocResult().
    Not optimized with MMX.

5.3 MblobCalculate().
    Only the following versions of the function are optimized with MMX:

    Src     Foreground
    ------  ------------
    UChar   M_ZERO
    Char    M_ZERO
    UShort  M_ZERO
    Short   M_ZERO

5.4 MblobControl().
    Not optimized with MMX.

5.5 MblobFill().
    Not optimized with MMX.

5.6 MblobFree().
    Not optimized with MMX.

5.7 MblobGetLabel().
    Not optimized with MMX.

5.8 MblobGetNumber().
    Not optimized with MMX.

5.9 MblobGetResult().
    Not optimized with MMX.

5.10 MblobGetResultSingle().
    Not optimized with MMX.

5.11 MblobGetRuns().
    Not optimized with MMX.

5.12 MblobInquire().
    Not optimized with MMX.

5.13 MblobLabel().
    Not optimized with MMX.

5.14 MblobReconstruct().
    Optimized with MMX.

5.15 MblobSelect().
    Not optimized with MMX.

5.16 MblobSelectFeature().
    Not optimized with MMX.

5.17 MblobSelectFeret().
    Not optimized with MMX.

5.18 MblobSelectMoment().
    Not optimized with MMX.

*******************************************************************************
6. Graphics command.
*******************************************************************************

6.1 MgraAlloc().
    Not optimized with MMX.

6.2 MgraArc().
    Not optimized with MMX.

6.3 MgraArcFill().
    Not optimized with MMX.

6.4 MgraBackColor().
    Not optimized with MMX.

6.5 MgraClear().
    Not optimized with MMX.

6.6 MgraColor().
    Not optimized with MMX.

6.7 MgraControl().
    Not optimized with MMX.

6.8 MgraDot().
    Not optimized with MMX.

6.9 MgraFill().
    Not optimized with MMX.

6.10 MgraFont().
    Not optimized with MMX.

6.11 MgraFontScale().
    Not optimized with MMX.

6.12 MgraFree().
    Not optimized with MMX.

6.13 MgraInquire().
    Not optimized with MMX.

6.14 MgraLine().
    Not optimized with MMX.

6.15 MgraRect().
    Not optimized with MMX.

6.16 MgraRectFill().
    Not optimized with MMX.

6.17 MgraText().
    Not optimized with MMX.

*******************************************************************************
7. Data alignment.
*******************************************************************************

When a MIL buffer is created using MbufCreate2d()/MbufCreateColor(), its
image row data (scanline) should be aligned on 32-byte boundaries to give
the best performance in conjunction with the MMX-enabled functions. When it
is not possible to align on 32-byte boundaries, the buffer should at least
be aligned on quadword (64-bit) or doubleword (32-bit) boundaries. Note that
when the MbufAlloc2d()/MbufAllocColor() function is used instead, the user
does not have to worry about data alignment, since in that case MIL
automatically allocates the buffer with the proper alignment.

Moreover, 32 extra bytes should be readable before the beginning and after
the end of the buffer so that the MMX-enabled algorithms can perform
pre-fetching. Performance could decrease dramatically if those extra bytes
are not available. When they are available, the M_MMX_ENABLED define must be
added to the attribute parameter at buffer creation time (MbufCreate2d()/
MbufCreateColor()) so that the MMX-enabled algorithms know that pre-fetching
can be performed on the buffer.

It is also possible to set this flag after buffer creation time with the
MbufControl(M_FORMAT) command. In this case, the call should look like this:

    MbufControl(MilImage, M_FORMAT,
                M_MMX_ENABLED|MbufInquire(MilImage, M_FORMAT, NULL));

(Note that this control is usually reserved for internal use only and thus
does not appear in the official documentation.)
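
The alignment and padding rules of section 7 can be illustrated with a small
C sketch for user-allocated memory. This is only an illustration under stated
assumptions, not MIL code: the AlignedImage type and the round_up() and
aligned_image_alloc() helpers are hypothetical names introduced here, and no
MIL function is called. The resulting data pointer and pitch are what one
would presumably hand to MbufCreate2d()/MbufCreateColor(); whether MIL
expects the pitch in pixels or in bytes should be checked in the MIL
documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ROW_ALIGN    32u  /* row alignment suggested in section 7 */
#define PREFETCH_PAD 32u  /* readable bytes wanted before and after the data */

/* Hypothetical helper type, not part of MIL. */
typedef struct {
    void          *raw;         /* block returned by calloc(); pass to free() */
    unsigned char *data;        /* first pixel of row 0, 32-byte aligned */
    size_t         pitch_bytes; /* bytes between row starts, multiple of 32 */
} AlignedImage;

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Allocate size_y rows of size_x pixels so that every row starts on a
 * 32-byte boundary and at least 32 readable bytes exist before the first
 * row and after the last one. Returns 0 on success, -1 on allocation
 * failure. Size overflow checks are omitted for brevity. */
static int aligned_image_alloc(AlignedImage *img, size_t size_x,
                               size_t size_y, size_t bytes_per_pixel)
{
    size_t pitch = round_up(size_x * bytes_per_pixel, ROW_ALIGN);
    /* leading pad + image rows + trailing pad + slack for aligning row 0 */
    size_t total = PREFETCH_PAD + pitch * size_y + PREFETCH_PAD
                   + (ROW_ALIGN - 1);
    unsigned char *raw = calloc(total, 1);
    uintptr_t first;

    if (raw == NULL)
        return -1;

    /* Skip the leading pad, then move up to the next 32-byte boundary. */
    first = ((uintptr_t)raw + PREFETCH_PAD + (ROW_ALIGN - 1))
            & ~(uintptr_t)(ROW_ALIGN - 1);

    img->raw = raw;
    img->data = (unsigned char *)first;
    img->pitch_bytes = pitch;
    return 0;
}
```

Because the pitch is a multiple of 32 and row 0 is 32-byte aligned, every
following row is aligned as well. The block must be released with
free(img.raw), not free(img.data).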