KBI0026: Passing configurations#
- authors:
Adina Wagner <adina.wagner@t-online.de>
- discussion:
https://github.com/psychoinformatics-de/knowledge-base/pull/98
- keywords:
configurations, shared repository clone
- software-versions:
datalad_0.19.1, datalad-next_1.0.0b3, git_2.39.2
There are a variety of ways to configure datasets or DataLad operations. This KBI documents relevant resources, and highlights a few special cases and relevant recent developments about them.
Existing resources#
A good introduction to the topic of configurations can be found in the DataLad Handbook.
An overview of Git-specific configurations can be found by running
git help --config
, or in the Git documentation.An overview of DataLad-specific configurations is in the technical DataLad documentation.
Specific documentation and examples for the
datalad configuration
command are in its manpage. This command can be used to set, unset, or query configurations, and complements the capabilities ofgit config
with, among other things, additional scopes or recursive operations.
Recent developments#
In May 2023, a number of improvements to configuration handling were made in the DataLad extension datalad-next
.
They fix edge cases in the core datalad
Python package.
The following paragraphs illustrate these edge cases for transparency, and a work-around requiring the use of environment variables.
At the time of writing though, an installation of datalad-next and the configuration datalad.extensions.load next
, as detailed in the project’s documentation, would provide the necessary fixes to your DataLad installation to make the code function as expected.
Edge-cases in the core datalad package#
On a system with only the core datalad
library installed, the -c/--configuration
flag of the datalad
main command displays a number of shortcomings for specific applications.
One of them is that configurations provided this way do not make it to the target process in every situation.
Here are two examples for configurations that do not get passed to the necessary subprocess:
1) Configuring a different committer name:
One could attempt to override the user name (user.name
Git config) of a specific save
operation via
$ datalad -c user.name=someoneelse save
2) Passing initialization configurations at cloning:
Likewise, one could attempt to initialize a dataset as “shared” in a datalad clone
call:
datalad -c core.sharedRepository=0600 \
clone <source> <destination>``)
However, both cases would not yield the desired result.
Datalad or its sub-processes would continue to use the user name configured in the applicable .config
file, and the core.sharedRepository=0600
configuration would not be passed to the underlying internal git init
process.
Both code snippets would work as expected, however, if datalad-next is installed and loaded.
Workaround: Environment variables#
A workaround without datlad-next
requires the use of Git configurations in environment variables.
For example, in order to clone a dataset and initialize it as a shared repository without datalad-next, the following environment variables need to be set (line breaks added for readability):
$ GIT_CONFIG_COUNT=1 \
GIT_CONFIG_KEY_0=core.sharedRepository \
GIT_CONFIG_VALUE_0=0600 \
datalad clone <source> <destination>