The performance of S4 is directly dependent on the performance of the BLAS library that it is linked against. Without a BLAS library, the default RNP routines are used, which are suboptimal. It is recommended that either vendor specific routines (Intel MKL or AMD AMCL) are used, or tuned libraries such as ATLAS or OpenBLAS are used. Vendor specific routines have incompatible calling/naming conventions than the standard Fortran interface, so some porting effort is required. Note that OpenBLAS should be compiled without threading if threading is enabled within S4; otherwise results may not be correct. Installing these packages from official distribution repositories is almost always a bad idea, since the code must be tuned to the particular machine it will run on.
If a precompiled LAPACK library is used, ensure ZGEEV is thread safe for large matrices. Specifically, bug0061 in the LAPACK errata means that the maximum stack space reserved by the Fortran compiler used to build LAPACK must be sufficient to hold certain temporaries on the stack instead of in static (shared) memory. The best way to avoid this with GFortran is to use the -frecursive flag. A simple test of thread safety is provided in the testing/Lapackthreadtest/ directory.
If no LAPACK library is available, the default RNP routines are used, which are suboptimal due to lack of blocked routines. However, they are definitely thread-safe.
Although not required, it is recommended when polarization decomposition settings are enabled, since without it, a conjugate gradient solver is used instead. The conjugate gradient solver does not always converge for fine discretizations.
This library is optional; the Kiss FFT library is used instead when FFTW is not present. When available the version must be 3.x. This does not have a great effect on simulation speed since the FFT is required only a handful of times, and tends to be rather small in problem size.
The code is developed in Windows under MinGW+MSYS and runs primarily in UNIX-like environments. For other platforms, the following list details the main problem areas.
In main.c near the top dealing with threads (this is not a problem if Pthreads are disabled). In pattern/predicates.c, the settings may need modification to enable robust computations. In numalloc.c, a more efficient aligned allocator may be supplied instead of the default.
S4 is built as a library on top of the Lua programming language, or as a Python extension.
S4 has 3 main layers:
The code is mostly C-styled C++. The need to use C++ is mainly for the complex number type. There should never be non-trivial objects (only Plain Old Data structs), and certainly no inheritance, polymorphism, or templates (except in RNP).
Indentation should be 4 spaces per tab. Always use tabs instead of spaces. The actual tab width should never matter for readability, meaning tabs can only exist contiguously starting from the left-most column, and comment blocks should never sit at the end of lines of code. Lines should be kept to around 72 or 80 characters in length if possible, especially for comment blocks.
Functions generally are in Lapack-style, where there are a large number of well defined inputs, and a number of outputs returned by pointers. Functions return an integer code, with negative values corresponding to invalid parameters. When appropriate, workspaces can be passed in to reduce the number of dynamic allocations. Also, when convenient, workspace querying should be supported.
The code should compile cleanly with all warnings enabled. The only exemptions are external libraries like Kiss FFT or the geometric predicates sources.