SciPy Roadmap
=============

Most of this roadmap is intended to provide a high-level view on what is
most needed per SciPy submodule in terms of new functionality, bug fixes, etc.
Part of those are must-haves for the ``1.0`` version of Scipy.
Furthermore it contains ideas for major new features - those are marked as such,
and are not needed for SciPy to become 1.0.  Things not mentioned in this roadmap are
not necessarily unimportant or out of scope, however we (the SciPy developers)
want to provide to our users and contributors a clear picture of where SciPy is
going and where help is needed most urgently.

When a module is in a 1.0-ready state, it means that it has the functionality
we consider essential and has an API and code quality (including documentation
and tests) that's of high enough quality.


General
-------
This roadmap will be evolving together with SciPy.  Updates can be submitted as
pull requests.  For large or disruptive changes you may want to discuss
those first on the scipy-dev mailing list.


API changes
```````````
In general, we want to take advantage of the major version change to fix some
known warts in the API.  The change from 0.x.x to 1.x.x is the chance to fix
those API issues that we all know are ugly warts.  Example: unify the
convention for specifying tolerances (including absolute, relative, argument
and function value tolerances) of the optimization functions.  More API issues
will be noted in the module sections below.  However, there should be
*clear value* in making a breaking change.  The 1.0 version label is not a
license to just break things - see it as a normal release with a somewhat
more aggressive/extensive set of cleanups.

It should be made more clear what is public and what is private in SciPy.
Everything private should be underscored as much as possible.  Now this is done
consistently when we add new code, but for 1.0 it should also be done for
existing code.


Test coverage
`````````````
Test coverage of code added in the last few years is quite good, and we aim for
a high coverage for all new code that is added.  However, there is still a
significant amount of old code for which coverage is poor.  Bringing that up to
the current standard is probably not realistic, but we should plug the biggest
holes.  Additionally the coverage should be tracked over time and we should
ensure it only goes up.

Besides coverage there is also the issue of correctness - older code may have a
few tests that provide decent statement coverage, but that doesn't necessarily
say much about whether the code does what it says on the box.  Therefore code
review of some parts of the code (``stats`` and ``signal`` in particular) is
necessary.


Documentation
`````````````
The documentation is in decent shape.  Expanding of current docstrings and
putting them in the standard NumPy format should continue, so the number of
reST errors and glitches in the html docs decreases.  Most modules also have a
tutorial in the reference guide that is a good introduction, however there are
a few missing or incomplete tutorials - this should be fixed.


Other
`````
Scipy 1.0 will likely contain more backwards-incompatible changes than a minor
release.  Therefore we will have a longer-lived maintenance branch of the last
0.X release.

Regarding Cython code:

  - It's not clear how much functionality can be Cythonized without making the
    .so files too large.  This needs measuring.
  - Cython's old syntax for using NumPy arrays should be removed and replaced
    with Cython memoryviews.
  - New feature idea: more of the currently wrapped libraries should export
    Cython-importable versions that can be used without linking.

Regarding build environments:

  - NumPy and SciPy should both build from source on Windows with a MinGW-w64
    toolchain and be compatible with Python
    installations compiled with either the same MinGW or with MSVC.
  - Bento development has stopped, so will remain having an experimental,
    use-at-your-own-risk status.  Only the people that use it will be
    responsible for keeping the Bento build updated.

A more complete continuous integration setup is needed; at the moment we often
find out right before a release that there are issues on some less-often used
platform or Python version.  At least needed are Windows (MSVC and MingwPy),
Linux and OS X builds, coverage of the lowest and highest Python and NumPy
versions that are supported.

Modules
-------

cluster
```````
This module is in good shape.


constants
`````````
This module is basically done, low-maintenance and without open issues.


fftpack
```````
Needed:

  - solve issues with single precision: large errors, disabled for difficult sizes
  - fix caching bug
  - Bluestein algorithm (or chirp Z-transform)
  - deprecate fftpack.convolve as public function (was not meant to be public)

There's a large overlap with ``numpy.fft``.  This duplication has to change
(both are too widely used to deprecate one); in the documentation we should
make clear that ``scipy.fftpack`` is preferred over ``numpy.fft``.
If there are differences in signature or functionality, the best version
should be picked case by case (example: numpy's ``rfft`` is preferred, see
gh-2487).


integrate
`````````
Needed for ODE solvers:

  - Documentation is pretty bad, needs fixing
  - A promising new ODE solver interface is in progress: gh-6326.
    This needs to be finished and merged.  After that, older API can
    possibly be deprecated.

The numerical integration functions are in good shape.  Support for integrating
complex-valued functions and integrating multiple intervals (see gh-3325) could
be added, but is not required for SciPy 1.0.


interpolate
```````````
Needed:

  - Both fitpack and fitpack2 interfaces will be kept.
  - splmake is deprecated; is different spline representation, we need exactly one
  - interp1d/interp2d are somewhat ugly but widely used, so we keep them.


Ideas for new features:

  - Spline fitting routines with better user control.
  - Integration and differentiation and arithmetic routines for splines
  - Transparent tensor-product splines.
  - NURBS support.
  - Mesh refinement and coarsening of B-splines and corresponding tensor products.

io
``
wavfile;

    - PCM float will be supported, for anything else use ``audiolab`` or other
      specialized libraries.
    - Raise errors instead of warnings if data not understood.

Other sub-modules (matlab, netcdf, idl, harwell-boeing, arff, matrix market)
are in good shape.


linalg
``````
Needed:

  - Remove functions that are duplicate with ``numpy.linalg``
  - ``get_lapack_funcs`` should always use ``flapack``
  - Wrap more LAPACK functions
  - One too many funcs for LU decomposition, remove one

Ideas for new features:

  - Add type-generic wrappers in the Cython BLAS and LAPACK
  - Make many of the linear algebra routines into gufuncs


misc
````
``scipy.misc`` will be removed as a public module.  The functions in it can be
moved to other modules:

  - ``pilutil``, images : ``ndimage``
  - ``comb``, ``factorial``, ``logsumexp``, ``pade`` : ``special``
  - ``doccer`` : move to ``scipy._lib``
  - ``info``, ``who`` : these are NumPy functions
  - ``derivative``, ``central_diff_weight`` : remove, replace with more extensive
    functionality for numerical differentiation - likely in a new module
    ``scipy.diff`` (see below)


ndimage
```````
Underlying ``ndimage`` is a powerful interpolation engine.  Unfortunately, it was
never decided whether to use a pixel model (``(1, 1)`` elements with centers
``(0.5, 0.5)``) or a data point model (values at points on a grid).  Over time,
it seems that the data point model is better defined and easier to implement.
We therefore propose to move to this data representation for 1.0, and to vet
all interpolation code to ensure that boundary values, transformations, etc.
are correctly computed.  Addressing this issue will close several issues,
including #1323, #1903, #2045 and #2640.

The morphology interface needs to be standardized:

  - binary dilation/erosion/opening/closing take a "structure" argument,
    whereas their grey equivalent take size (has to be a tuple, not a scalar),
    footprint, or structure.
  - a scalar should be acceptable for size, equivalent to providing that same
    value for each axis.
  - for binary dilation/erosion/opening/closing, the structuring element is
    optional, whereas it's mandatory for grey.  Grey morphology operations
    should get the same default.
  - other filters should also take that default value where possible.


odr
```
Rename the module to ``regression`` or ``fitting``, include
``optimize.curve_fit``. This module will then provide a home for other fitting
functionality - what exactly needs to be worked out in more detail, a
discussion can be found at https://github.com/scipy/scipy/pull/448.


optimize
````````
Overall this module is in reasonably good shape, however it is missing a few
more good global optimizers as well as large-scale optimizers.  These should be
added.  Other things that are needed:

  - deprecate the ``fmin_*`` functions in the documentation, ``minimize`` is
    preferred.
  - clearly define what's out of scope for this module.


signal
``````
*Convolution and correlation*: (Relevant functions are convolve, correlate,
fftconvolve, convolve2d, correlate2d, and sepfir2d.) Eliminate the overlap with
`ndimage` (and elsewhere).  From ``numpy``, ``scipy.signal`` and ``scipy.ndimage``
(and anywhere else we find them), pick the "best of class" for 1-D, 2-D and n-d
convolution and correlation, put the implementation somewhere, and use that
consistently throughout SciPy.

*B-splines*: (Relevant functions are bspline, cubic, quadratic, gauss_spline,
cspline1d, qspline1d, cspline2d, qspline2d, cspline1d_eval, and spline_filter.)
Move the good stuff to `interpolate` (with appropriate API changes to match how
things are done in `interpolate`), and eliminate any duplication.

*Filter design*: merge `firwin` and `firwin2` so `firwin2` can be removed.

*Continuous-Time Linear Systems*: remove `lsim2`, `impulse2`, `step2`.  Make
`lsim`, `impulse` and `step` "just work" for any input system.  Improve
performance of ltisys (less internal transformations between different
representations). Fill gaps in lti system conversion functions.

*Second Order Sections*: Make SOS filtering equally capable as existing
methods. This includes ltisys objects, an `lfiltic` equivalent, and numerically
stable conversions to and from other filter representations. SOS filters could
be considered as the default filtering method for ltisys objects, for their
numerical stability.

*Wavelets*: what's there now doesn't make much sense.  Continous wavelets
only at the moment - decide whether to completely rewrite or remove them.
Discrete wavelet transforms are out of scope (PyWavelets does a good job
for those).


sparse
``````
The sparse matrix formats are getting feature-complete but are slow ...
reimplement parts in Cython?

    - Small matrices are slower than PySparse, needs fixing

There are a lot of formats.  These should be kept, but
improvements/optimizations should go into CSR/CSC, which are the preferred
formats.  LIL may be the exception, it's inherently inefficient.  It could be
dropped if DOK is extended to support all the operations LIL currently
provides.  Alternatives are being worked on, see  https://github.com/ev-br/sparr
and https://github.com/perimosocordiae/sparray.

Ideas for new features:

    - Sparse arrays now act like np.matrix.  We want sparse *arrays*.


sparse.csgraph
``````````````
This module is in good shape.


sparse.linalg
`````````````
Arpack is in good shape.

isolve:

    - callback keyword is inconsistent
    - tol keyword is broken, should be relative tol
    - Fortran code not re-entrant (but we don't solve, maybe re-use from
      PyKrilov)

dsolve:

    - add sparse Cholesky or incomplete Cholesky
    - look at CHOLMOD


Ideas for new features:

    - Wrappers for PROPACK for faster sparse SVD computation.

spatial
```````
QHull wrappers are in good shape.

Needed:

    - ``KDTree`` will be removed, and ``cKDTree`` will be renamed to ``KDTree``
      in a backwards-compatible way.
    - ``distance_wrap.c`` needs to be cleaned up (maybe rewrite in Cython).


special
```````
Though there are still a lot of functions that need improvements in precision,
probably the only show-stoppers are hypergeometric functions, parabolic cylinder
functions, and spheroidal wave functions. Three possible ways to handle this:

    1. Get good double-precision implementations. This is doable for parabolic
       cylinder functions (in progress). I think it's possible for hypergeometric
       functions, though maybe not in time. For spheroidal wavefunctions this is
       not possible with current theory.

    2. Port Boost's arbitrary precision library and use it under the hood to get
       double precision accuracy. This might be necessary as a stopgap measure
       for hypergeometric functions; the idea of using arbitrary precision has
       been suggested before by @nmayorov and in gh-5349. Likely necessary for
       spheroidal wave functions, this could be reused:
       https://github.com/radelman/scattering.

    3. Add clear warnings to the documentation about the limits of the existing
       implementations.


stats
`````

``stats.distributions`` is in good shape.

``gaussian_kde`` is in good shape but limited. It should not be expanded
probably, this fits better in Statsmodels (which already has a lot more KDE
functionality).

``stats.mstats`` is a useful module for worked with data with missing values.
One problem it has though is that in many cases the functions have diverged
from their counterparts in ``scipy.stats``.  The ``mstats`` functions should be
updated so that the two sets of functions are consistent.


weave
`````
This module is deprecated and will be removed for Scipy 1.0.  It has been
packaged as a separate package "weave", which users that still rely on the
functionality provided by ``scipy.weave`` should use.

Also note that this is the only module that was not ported to Python 3.


New modules under discussion
----------------------------

diff
````
Currently Scipy doesn't provide much support for numerical differentiation.
A new ``scipy.diff`` module for that is discussed in
https://github.com/scipy/scipy/issues/2035.  There's also a fairly detailed
GSoC proposal to build on, see `here <https://github.com/scipy/scipy/wiki/Proposal:-add-finite-difference-numerical-derivatives-as-scipy.diff>`_.

There is also ``approx_derivative`` in ``optimize``, which is still private
but could form a solid basis for this module.

transforms
``````````
This module was discussed previously, mainly to provide a home for
discrete wavelet transform functionality.  Other transforms could fit as well,
for example there's a PR for a Hankel transform .
*Note: this is on the back burner, because the plans to integrate PyWavelets
DWT code has been put on hold.*
