-------------------------------------------------------------------------------
                Matrox Imaging Library (7.0) SSE2.txt Readme File
                                  August 21, 2001
    Copyright © 2001 by Matrox Electronic Systems Ltd. All rights reserved.
-------------------------------------------------------------------------------


The following file lists all the functions that have been optimized with SSE2 
code. A supplementary section describes the data alignment required to obtain 
the best performance with SSE2 when a buffer is created with the 
MbufCreate2d()/MbufCreateColor() functions. Another section indicates how to 
enable or disable the use of SSE2 optimization by MIL. 


Contents

1. Image processing commands.
2. Buffer management commands.
3. Measurements commands.
4. Pattern matching commands.
5. Blob analysis commands.
6. Graphics commands.
7. Data alignment.
8. Known difference with FPU equivalent instruction.


-------------------------------------------------------------------------------
Symbols used in the file
-------------------------------------------------------------------------------

Buffers:    Dst   : Destination
            Src   : Source
            Cnd   : Condition

Data types: UChar : unsigned char
            Char  : signed char
            UShort: unsigned short
            Short : signed short
            ULong : unsigned long
            Long  : signed long
            Float : float
            Bin   : binary

"All buffer bit and sign" means all buffer types except floating-point buffers.


*******************************************************************************
1. Image processing commands.
*******************************************************************************

1.1   MimConvolve ().	

      1.1.1 Optimized versions:

         Dst     Src     Kernel       Limits (*)
         ------  ------  ------       --------------------------
         Char    Char    Char         (*) (128, -128, 128, -128)
         Char    Char    UChar        (*) (256,     , 256,     )
         Char    UChar   Char         (*) (128, -128, 128, -128)
         Char    UChar   UChar        (*) (256,     , 128,     )
         UChar   Char    Char         (*) (128, -128, 128, -128)
         UChar   Char    UChar        (*) (256,     , 256,     )
         UChar   UChar   Char         (*) (128, -128, 128, -128)
         UChar   UChar   UChar        (*) (256,     , 128,     )

         (*)  For these versions, the sums of the kernel values are checked
              against the limits given in parentheses: each sum of positive
              values must be less than or equal to its limit, and each sum of
              negative values must be greater than or equal to its limit. The
              first value is the limit for the sum of the positive kernel
              values, the second for the sum of the negative kernel values,
              the third for the sum of the positive values divided by the
              normalization factor, and the fourth for the sum of the negative
              values divided by the normalization factor. A blank entry means
              that no limit applies (a UChar kernel has no negative values).
              If all of these conditions are respected, the MMX version with a
              16-bit accumulator is called. If they are not respected and the
              kernel contains fewer than 32025 elements, the MMX version with
              a 32-bit accumulator is called. If the kernel contains 32025 or
              more elements, the non-MMX version is called.

              The internal accumulator holds the sum of the products of the
              kernel elements and the image values before normalization.
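              The selection rules in note (*) can be sketched in C as follows.
              This only illustrates the documented conditions and is not MIL's
              internal code: ConvPath, KernelLimits and select_conv_path() are
              hypothetical names, the normalization factor is assumed to be
              positive, the divisions are done in floating point, and the
              32025-element rule is read as applying whatever the kernel sums are.

```c
#include <stddef.h>

/* Path selected according to the rules of note (*) in section 1.1.1. */
typedef enum { PATH_ACC16, PATH_ACC32, PATH_NON_MMX } ConvPath;

/* One row of the (*) column; has_neg is 0 when the 2nd/4th entries are blank. */
typedef struct {
    long   pos_sum_max;   /* 1st value: limit on the sum of positive values */
    long   neg_sum_min;   /* 2nd value: limit on the sum of negative values */
    double pos_norm_max;  /* 3rd value: positive sum / normalization factor */
    double neg_norm_min;  /* 4th value: negative sum / normalization factor */
    int    has_neg;
} KernelLimits;

/* Hypothetical selector (not a MIL function). Assumes normalization > 0. */
static ConvPath select_conv_path(const long *kernel, size_t count,
                                 long normalization, const KernelLimits *lim)
{
    long pos = 0, neg = 0;
    size_t i;

    /* Read literally, the element-count rule applies regardless of the sums. */
    if (count >= 32025)
        return PATH_NON_MMX;

    for (i = 0; i < count; ++i) {
        if (kernel[i] > 0)
            pos += kernel[i];
        else
            neg += kernel[i];
    }

    if (pos <= lim->pos_sum_max &&
        (double)pos / normalization <= lim->pos_norm_max &&
        (!lim->has_neg ||
         (neg >= lim->neg_sum_min &&
          (double)neg / normalization >= lim->neg_norm_min)))
        return PATH_ACC16;   /* sums respect the 16-bit accumulator limits */

    return PATH_ACC32;       /* fewer than 32025 elements: 32-bit accumulator */
}
```

              For the UChar/UChar/UChar row, for example, the limits would be
              written { 256, 0, 128.0, 0.0, 0 }, since the negative entries are
              blank.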
   
      1.1.2 Additional restriction:

         The Src and Dst buffer pitches (in bytes) must be multiples of 16.



*******************************************************************************
2. Buffer management commands.
*******************************************************************************

*******************************************************************************
3. Measurements commands.
*******************************************************************************

*******************************************************************************
4. Pattern matching commands.
*******************************************************************************

*******************************************************************************
5. Blob analysis commands.
*******************************************************************************

*******************************************************************************
6. Graphics commands.
*******************************************************************************

*******************************************************************************
7. Data alignment.
*******************************************************************************

   When a MIL buffer is created using MbufCreate2d()/MbufCreateColor(), its 
   image rows (scanlines) should be aligned on 32-byte boundaries to give the 
   best performance with the SSE2-enabled functions. When 32-byte alignment is 
   not possible, the scanlines should at least be aligned on xmmword (128-bit) 
   or doubleword (32-bit) boundaries. Buffers allocated with 
   MbufAlloc2d()/MbufAllocColor() do not require this care, since MIL 
   automatically allocates them with the proper alignment.
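
   Whether a user buffer meets these recommendations can be checked as in the 
   following C sketch. Every scanline starts at base + y * pitch, so a scanline 
   alignment of N bytes requires both the base address and the pitch to be 
   multiples of N. The helpers round_up_pitch() and scanline_alignment() are 
   hypothetical and not part of MIL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of MIL): round a row size in bytes up to the
   next multiple of `alignment`, which must be a power of two. */
static size_t round_up_pitch(size_t row_bytes, size_t alignment)
{
    return (row_bytes + alignment - 1) & ~(alignment - 1);
}

/* Returns the largest of 32, 16 or 4 bytes on which every scanline starting
   at `base` with the given pitch is aligned, or 0 if none of them applies. */
static size_t scanline_alignment(const void *base, size_t pitch_bytes)
{
    static const size_t candidates[] = { 32, 16, 4 };
    uintptr_t addr = (uintptr_t)base;
    size_t i;

    for (i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        size_t a = candidates[i];
        /* Row y starts at base + y * pitch: both terms must be multiples of a. */
        if (addr % a == 0 && pitch_bytes % a == 0)
            return a;
    }
    return 0;
}
```

   For example, a 641-byte row would use round_up_pitch(641, 32), a pitch of 
   672 bytes, to keep every scanline on a 32-byte boundary.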

   In addition, 32 extra bytes should be readable before the beginning and 
   after the end of the buffer so that the SSE2-enabled algorithms can perform 
   prefetching. Performance can decrease dramatically if these extra bytes are 
   not available. When they are available, M_SSE2_ENABLED must be added to the 
   attribute parameter at buffer creation time (MbufCreate2d()/MbufCreateColor()) 
   so that the SSE2-enabled algorithms know that prefetching can be performed 
   on the buffer. The flag can also be set after the buffer is created with the 
   MbufControl(...M_FORMAT...) command, using the following syntax:

   MbufControl(MilImage,
               M_FORMAT,
               M_SSE2_ENABLED|MbufInquire(MilImage, M_FORMAT, NULL));

   (Note that this control is usually reserved for internal use only and thus 
   does not appear in the official documentation.)
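
   As an illustration of these requirements, the following C sketch allocates 
   user memory whose scanlines start on 32-byte boundaries and which keeps at 
   least 32 readable bytes before the first scanline and after the last one. 
   The UserImage type and the user_image_alloc()/user_image_free() helpers are 
   hypothetical and not part of MIL. The resulting data pointer and pitch are 
   what would be handed to MbufCreate2d() together with M_SSE2_ENABLED; the 
   exact MbufCreate2d() parameter list is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ROW_ALIGN   32u  /* recommended scanline alignment, in bytes */
#define GUARD_BYTES 32u  /* readable bytes needed before and after the image */

/* Hypothetical descriptor for a user-allocated image (not a MIL type). */
typedef struct {
    void          *raw;    /* block returned by calloc(); pass to free() */
    unsigned char *data;   /* first byte of the first scanline */
    size_t         pitch;  /* bytes from one scanline to the next */
} UserImage;

/* Allocates size_y scanlines of row_bytes bytes each. The pitch is rounded
   up to a multiple of ROW_ALIGN, the first scanline starts on a ROW_ALIGN
   boundary, and at least GUARD_BYTES readable bytes precede the first
   scanline and follow the last one. Returns 0 on success, -1 on failure. */
static int user_image_alloc(UserImage *img, size_t row_bytes, size_t size_y)
{
    size_t pitch = (row_bytes + ROW_ALIGN - 1) & ~(size_t)(ROW_ALIGN - 1);
    /* Leading guard + image + trailing guard + slack for aligning the start. */
    size_t total = GUARD_BYTES + pitch * size_y + GUARD_BYTES + (ROW_ALIGN - 1);
    unsigned char *raw = calloc(1, total);
    uintptr_t start;

    if (raw == NULL)
        return -1;
    /* Skip the leading guard, then round up to the next ROW_ALIGN boundary. */
    start = ((uintptr_t)(raw + GUARD_BYTES) + (ROW_ALIGN - 1))
            & ~(uintptr_t)(ROW_ALIGN - 1);
    img->raw = raw;
    img->data = (unsigned char *)start;
    img->pitch = pitch;
    return 0;
}

static void user_image_free(UserImage *img)
{
    free(img->raw);
    img->raw = NULL;
    img->data = NULL;
}
```

   The extra ROW_ALIGN - 1 bytes cover the worst case where calloc() returns 
   an address just past a 32-byte boundary, so the trailing guard area still 
   fits inside the allocation.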


*******************************************************************************
8. Known difference with FPU equivalent instruction.
*******************************************************************************

   8.1 We have noticed some differences in the float-to-integer conversion 
       instructions. They arise because the MSDEV conversion function first 
       converts the value to an __int64 and then copies it into the long, 
       whereas the SSE2 instructions convert directly to 32 bits. Both methods 
       give exactly the same result for values within the range of a long, but 
       not for values that overflow it.
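
       The difference can be reproduced portably with the following C sketch, 
       which emulates both conversion paths instead of calling the MSDEV helper 
       or the SSE2 instructions themselves. The function names are 
       hypothetical; both paths are assumed to truncate toward zero, and the 
       input is assumed to fit in a 64-bit integer. For 3.0e9, which overflows 
       a long, the first path yields -1294967296 (the low 32 bits of 
       3000000000) while the second yields 0x80000000 (-2147483648).

```c
#include <stdint.h>

/* Emulates the MSDEV path described in 8.1: convert to a 64-bit integer
   first, then keep only the low 32 bits when storing into a 32-bit long.
   Assumes x fits in an int64_t (converting a larger value is undefined in C). */
static int32_t convert_via_int64(double x)
{
    int64_t wide = (int64_t)x;          /* truncates toward zero */
    uint32_t low = (uint32_t)wide;      /* reduction modulo 2^32, well defined */
    /* Reinterpret the low 32 bits as a signed two's-complement value. */
    return low > INT32_MAX ? (int32_t)((int64_t)low - 4294967296LL)
                           : (int32_t)low;
}

/* Emulates a direct 32-bit SSE2 conversion (such as CVTTSS2SI/CVTTSD2SI):
   when the truncated result does not fit in 32 bits, or x is NaN, the
   instruction returns the "integer indefinite" value 0x80000000. */
static int32_t convert_direct_int32(double x)
{
    if (!(x > -2147483649.0 && x < 2147483648.0))
        return INT32_MIN;
    return (int32_t)x;
}
```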