KBI0026: Passing configurations
Adina Wagner <firstname.lastname@example.org>
configurations, shared repository clone
datalad_0.19.1, datalad-next_1.0.0b3, git_2.39.2
There are a variety of ways to configure datasets or DataLad operations. This KBI points to relevant resources and highlights a few special cases, along with recent developments that address them.
A good introduction to the topic of configurations can be found in the DataLad Handbook.
An overview of Git-specific configurations can be found by running git help --config, or in the Git documentation.
An overview of DataLad-specific configurations is in the technical DataLad documentation.
Specific documentation and examples for the datalad configuration command are in its manpage. This command can be used to set, unset, or query configurations, and complements the capabilities of git config with, among other things, additional scopes and recursive operation.
In May 2023, a number of improvements to configuration handling were made in the DataLad extension datalad-next. They fix edge cases in the core datalad Python package.
The following paragraphs illustrate these edge cases for transparency, and describe a workaround that relies on environment variables.
At the time of writing, however, installing datalad-next and setting the configuration datalad.extensions.load to next, as detailed in the project’s documentation, provides the necessary fixes to your DataLad installation so that the code functions as expected.
Edge cases in the core datalad package
On a system with only the core datalad library installed, the -c/--configuration flag of the datalad main command has a number of shortcomings for specific applications.
One of them is that configurations provided this way do not reach the target process in every situation.
Here are two examples of configurations that do not get passed to the necessary subprocess:
1) Configuring a different committer name:
One could attempt to override the user name (the user.name Git config) of a specific save operation via
$ datalad -c user.name=someoneelse save
2) Passing initialization configurations at cloning:
Likewise, one could attempt to initialize a dataset as “shared” in a datalad clone call:
$ datalad -c core.sharedRepository=0600 \
    clone <source> <destination>
However, neither case would yield the desired result.
DataLad or its subprocesses would continue to use the user name configured in the applicable Git configuration file, and the core.sharedRepository=0600 configuration would not be passed to the underlying internal git init process.
Both code snippets would work as expected, however, if datalad-next is installed and loaded.
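To check whether an override actually reached Git, one option is to inspect the resulting commit. The following is a minimal sketch using plain git in a scratch directory; it shows what a successful override looks like at the Git level. The repository path, email address, and commit message are illustrative assumptions and not part of the original examples.

```shell
# Scratch repository so that no real dataset is touched (illustrative path).
repo="$(mktemp -d)/demo"
git init -q "$repo"

# A successful override at the Git level: -c passed to git itself
# applies to this single command only.
git -C "$repo" -c user.name=someoneelse -c user.email=someone@example.com \
    commit -q --allow-empty -m "test commit"

# Show which author name was recorded in the most recent commit.
git -C "$repo" log -1 --format='%an'
```

Running the same git log call inside a dataset after a datalad save would reveal whether the name given via datalad -c was used, or the one from the regular Git configuration.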
Workaround: Environment variables
A workaround without datalad-next requires the use of Git configurations in environment variables.
For example, in order to clone a dataset and initialize it as a shared repository without datalad-next, the following environment variables need to be set (line breaks added for readability):
$ GIT_CONFIG_COUNT=1 \
  GIT_CONFIG_KEY_0=core.sharedRepository \
  GIT_CONFIG_VALUE_0=0600 \
  datalad clone <source> <destination>