Development Guidelines
Repository usage
Public open-source development happens at https://github.com/geomdata/gda-public
To get started:
git clone -b dev git@github.com:geomdata/gda-public.git
Ongoing cleanup work should happen in the dev branch. Specific feature additions can happen in their own short-lived branches. Whenever possible, do not commit directly to the master branch; instead, commit changes to dev, then open a merge request from dev to master. This way, dev is always ahead of master.
The contents of master should always pass all the unit tests. Merges from dev to master should not be accepted otherwise.
Versions of master will be tagged VERSION_X.Y.Z according to milestones.
See https://github.com/geomdata/gda-public/tags
Profiling
When writing new code, try to think about what a good object model would look like, from an end-user’s perspective. Be Pythonic and avoid surprises.
Try writing the code first in pure Python to establish an object model and input/output format. Once it works, use a profiler (such as Python's built-in cProfile, or an interactive one like https://github.com/what-studio/profiling) to determine where it is slow. Another simple way to profile is to use %prun or %timeit in IPython or Jupyter. Then move the slow parts of your code to a compiled Cython .pyx file.
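As a minimal sketch of the cProfile route, assuming a deliberately slow pure-Python routine: pairwise_distances and profile_call below are hypothetical names for illustration, not part of gda-public. The report ranks functions by cumulative time, which is the information used to decide what is worth moving to Cython.

```python
import cProfile
import io
import pstats


def pairwise_distances(points):
    """Pairwise Euclidean distances in pure Python (deliberately slow)."""
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            out[i][j] = (dx * dx + dy * dy) ** 0.5
    return out


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    # Sort by cumulative time and keep only the `top` most expensive rows.
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    pts = [(float(i), float(i % 7)) for i in range(200)]
    _, report = profile_call(pairwise_distances, pts)
    print(report)
```

In IPython or Jupyter, %prun pairwise_distances(pts) gives a similar report interactively.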
Style
Try to follow PEP 8 (https://www.python.org/dev/peps/pep-0008/). Specifically, keep lines shorter than 80 characters, indent with 4 spaces (not tabs), and avoid superfluous whitespace. Longer lines are acceptable in Cython (.pyx) code when they are due to type declarations.
Much of our data consists of arrays (coordinates, point clouds, matrices, etc.). We use pandas DataFrames (or Series) as a presentation layer, because they have convenient labels and indexing. We use the underlying NumPy array as the data layer, because it is fast, especially via Cython. That is, the user should work with DF = DataFrame(...), while performance-sensitive code should access DF.index.values for the index set and DF.values for the actual content.
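A minimal sketch of this split, assuming a small 2-D point cloud: the DataFrame carries the labels, while the computation runs on the NumPy arrays from DF.values and DF.index.values. The function names farthest_from_origin and centroid are hypothetical.

```python
import numpy as np
import pandas as pd


def farthest_from_origin(df):
    """Return the index label of the point farthest from the origin."""
    coords = df.values          # data layer: 2-D ndarray of coordinates
    labels = df.index.values    # data layer: 1-D ndarray of index labels
    sq_norms = (coords ** 2).sum(axis=1)
    return labels[np.argmax(sq_norms)]


def centroid(df):
    """Return the centroid as a Series labelled by the DataFrame's columns."""
    # Compute on the raw array, then wrap the result back in pandas so
    # the user sees labels again (presentation layer).
    return pd.Series(df.values.mean(axis=0), index=df.columns)


points = pd.DataFrame(
    [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]],
    index=["a", "b", "c", "d"],
    columns=["x", "y"],
)
print(farthest_from_origin(points))  # prints c
print(centroid(points))
```

Newer pandas versions also provide DF.to_numpy(), which returns the same array as DF.values for homogeneous numeric data like this.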
Documentation
Comments describing functions, classes, and modules should be in docstrings. Use #-comments only for short clarifying notes to future developers. If your code needs #-comments to be understood, it is probably too messy.
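For example, a docstring documents what a function does for users (and for Sphinx), while a #-comment is kept for a short note about a non-obvious detail. This sketch assumes a NumPy-style docstring layout, which these guidelines do not prescribe; diameter is a hypothetical function.

```python
"""Utilities for point clouds (an illustrative module docstring)."""

import math


def diameter(points):
    """Return the largest pairwise Euclidean distance in a point cloud.

    Parameters
    ----------
    points : list of tuple of float
        The coordinates of the points, one tuple per point.

    Returns
    -------
    float
        The diameter, or 0.0 if there are fewer than two points.
    """
    best = 0.0
    for i, p in enumerate(points):
        # Start at i + 1 so each unordered pair is checked only once.
        for q in points[i + 1:]:
            best = max(best, math.dist(p, q))
    return best
```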
Make sure the documentation builds correctly with Sphinx:
(gda_env) bash$ cd /path/to/gda-public/
(gda_env) bash$ python setup.py build_doc_html
(gda_env) bash$ python setup.py build_doc_latex
Then open file:///path/to/gda-public/docs/index.html in a browser.
Testing
Write tests before, during, and after writing code. To run the test suite:
(gda_env) bash$ cd /path/to/gda-public/
(gda_env) bash$ py.test
Simple function tests can live in docstrings and be run with doctest or (preferably) py.test. More complicated tests should be written under the tests/ directory.
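A minimal sketch of both styles, assuming a hypothetical mean_point function: the >>> example in its docstring is a doctest, and the test_ functions are what a file such as tests/test_mean_point.py (also a hypothetical name) might contain. In the real layout the tests would import the function from the package rather than define it alongside.

```python
def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of points.

    >>> mean_point([(0.0, 0.0), (2.0, 4.0)])
    (1.0, 2.0)
    """
    if not points:
        raise ValueError("mean_point needs at least one point")
    n = len(points)
    # zip(*points) groups all x-values together, then all y-values, etc.
    return tuple(sum(coord) / n for coord in zip(*points))


# Tests like these would normally live under tests/, where py.test
# collects functions whose names start with "test".
def test_mean_point_single():
    assert mean_point([(3.0, -1.0)]) == (3.0, -1.0)


def test_mean_point_empty_raises():
    try:
        mean_point([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty list")
```

With pytest installed, the second test is usually written with "with pytest.raises(ValueError):", and running py.test --doctest-modules also collects the >>> example in the docstring.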
Try to limit dependencies to the libraries included in the Anaconda distribution.
If newer versions of Python or NumPy break this code, debug by creating a conda environment (or a Python venv) with different versions and comparing the behavior.
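When comparing environments, it can help to print the exact versions installed in each one. This is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+); report_versions is a hypothetical helper, and its default package list is an assumption based on the libraries mentioned in these guidelines.

```python
import platform
from importlib import metadata


def report_versions(packages=("numpy", "pandas", "Cython")):
    """Return a dict of the Python version and selected package versions.

    Packages that are not installed are reported as None.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this in each environment and diff the outputs.
    for name, version in report_versions().items():
        print(f"{name:>8}: {version}")
```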
Docker Build Environment
To build a new Docker environment, get Docker running on your machine and run:
bash$ cd /path/to/gda-public
bash$ docker build -t <image-name> -f .dockerfile .
The official image is at https://hub.docker.com/r/geomdata/builder/
This image is used for the automated unit tests configured in .gitlab-ci.yml.