Exporting the complete environment to GNU Parallel
I find it somewhat annoying that I cannot use aliases in GNU Parallel:
alias gi="grep -i"
parallel gi bar ::: foo
/bin/bash: gi: command not found
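The failure comes from two properties of bash that can be seen without GNU Parallel at all. First, aliases are not inherited by child processes. Second, a non-interactive shell does not expand aliases unless `shopt -s expand_aliases` is set. A minimal sketch, assuming only that `bash` is on the PATH, with a plain `bash -c` standing in for the shell GNU Parallel spawns:

```shell
#!/usr/bin/env bash
# The alias exists in this shell...
alias gi="grep -i"

# ...but a freshly spawned, non-interactive bash (the kind GNU Parallel
# starts for each job) never sees it:
bash -c 'type gi' >/dev/null 2>&1 || echo "child: gi not found"

# Passing the definition along works only if the child enables alias
# expansion AND defines the alias on an earlier line than the one using it,
# because bash reads a complete line before executing any of it.
bash -c "shopt -s expand_aliases
$(alias gi)
echo FOO | gi foo"
```

The line-by-line rule is also why a newline, not just a `;`, is needed between defining an alias and using it.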
I am somewhat resigned to the fact that it is so. But while reading Accessing Associative Arrays in GNU Parallel, I started to wonder: does it really have to be this way?
Is it possible to create a bash function that collects the entire environment into a function, exports that function, and calls GNU Parallel, which then imports the environment into the spawned shell by running that function?
So I'm not talking about a specialized solution for the gi alias, but about a bash function that takes all aliases, functions and variables (without my having to name them explicitly) and packages them into a function that GNU Parallel can activate.
Something similar to:
env_parallel() {
# [... gather all environment/all aliases/all functions into parallel_environment() ...]
foreach alias in all aliases {
append alias definition to definition of parallel_environment()
}
foreach variable in all variables (including assoc arrays) {
append variable definition to definition of parallel_environment()
# Code somewhat similar to /questions/2154825/accessing-associative-arrays-in-gnu-parallel
}
foreach function in all functions {
append function definition to definition of parallel_environment()
}
# make parallel_environment visible to GNU Parallel
export -f parallel_environment
# Running parallel_environment will now create an environment with
# all variables/all aliases/all functions set in current state
# (with the exception of the function parallel_environment of course)
# Inside GNU parallel:
# if set parallel_environment(): prepend it to the command to run
`which parallel` "$@"
}
# Set an example alias
alias fb="echo fubar"
# Set an example variable
BAZ=quux
# Make an example function
myfunc() {
echo $BAZ
}
# This will record the current environment including the 3 examples
# put it into parallel_environment
# run parallel_environment (to set the environment)
# use the 3 examples
env_parallel parallel_environment\; fb bar {}\; myfunc ::: foo
# It should give the same output as running:
fb bar foo; myfunc
# Outputs:
# fubar bar foo
# quux
Progress: This looks close to what I want to achieve:
env_parallel() {
export parallel_environment='() {
'"$(echo "shopt -s expand_aliases"; alias;typeset -p | grep -vFf <(readonly);typeset -f)"'
}'
`which parallel` "$@"
}
VAR=foo
myfunc() {
echo $VAR $1
}
alias myf=myfunc
env_parallel parallel_environment';
' myfunc ::: bar # Works (but gives errors)
env_parallel parallel_environment';
' myf ::: bar # Works, but requires the \n after ;
So now I am down to one question: how do I weed out all variables that cannot be assigned a value (for example, BASH_ARGC)? How can I list them?
GNU Parallel 20140822 implements this. To activate it, you need to run this once (for example, in your .bashrc):
env_parallel() {
export parallel_bash_environment='() {
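'"$(echo "shopt -s expand_aliases 2>/dev/null"; alias; typeset -p | grep -vFf <(readonly) | grep -Ev '^declare -[^ ]* (GROUPS|FUNCNAME|DIRSTACK|_|PIPESTATUS|USERNAME|BASH_[A-Z_]*)(=|$)'; typeset -f)"'
}'
# Special variables are excluded by name with an anchored regex; matching
# them as fixed substrings (grep -F) would also drop every line that merely
# contains "_" or "BASH_", including lines of multi-line values.
# Run as: env_parallel ...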
`which parallel` "$@"
unset parallel_bash_environment
}
And call GNU Parallel like:
env_parallel ...
This should put to rest the myth that it is impossible to export aliases: all you need is a little Behändigkeit (dexterity). Thanks a lot to @rici for the inspiration.
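The filter above removes readonly variables by matching whole `declare` lines. If you want the readonly names instead (for instance to build your own exclusion list), one possible sketch is to parse `readonly -p`, whose output in bash's default (non-POSIX) mode has the same `declare -r NAME=...` shape as `typeset -p`. `readonly_names_` is a hypothetical helper name. Special variables such as BASH_ARGC are not readonly, so they still need the hand-maintained list.

```shell
#!/usr/bin/env bash
# Print the names of all readonly variables, one per line.
# `readonly -p` emits lines like:  declare -ir UID="1000"
# Field 3 is NAME=VALUE (or just NAME when no value is set).
# Caveat: a readonly value containing newlines could produce bogus lines.
readonly_names_() {
  readonly -p | awk '$1 == "declare" {
    o = index($3, "=")
    print o ? substr($3, 1, o - 1) : $3
  }'
}

readonly_names_
```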
In principle, this should be possible. But as usual, there are a lot of details.
First, in bash it is entirely possible for a name to be a function, a variable (scalar or array), and an alias all at the same time. In addition, the function and the variable can be exported independently.
Thus, there would be some ambiguity in env_parallel foo ... if foo has more than one definition. Perhaps the best solution would be to detect the situation and report an error, and to allow a syntax like:
env_parallel -a foo -f bar
to be more specific if necessary.
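The proposed `-a`/`-f` syntax could be parsed with the `getopts` builtin. This is a hypothetical sketch: `parse_env_opts_`, the `-v` option for variables, and the arrays it fills are illustrative names, not part of GNU Parallel.

```shell
#!/usr/bin/env bash
# Hypothetical parser: -a NAME selects an alias, -f NAME a function,
# -v NAME a variable. Options may repeat; parsing stops at "--" or at the
# first non-option word, and the remaining words (the command for GNU
# Parallel) are kept in rest_.
parse_env_opts_() {
  local OPTIND=1 opt
  aliases_=() functions_=() variables_=() rest_=()
  while getopts 'a:f:v:' opt; do
    case $opt in
      a) aliases_+=("$OPTARG") ;;
      f) functions_+=("$OPTARG") ;;
      v) variables_+=("$OPTARG") ;;
      *) return 2 ;;  # getopts has already printed an error message
    esac
  done
  shift $((OPTIND - 1))
  rest_=("$@")
}

parse_env_opts_ -a foo -f bar -- echo {} ::: x
declare -p aliases_ functions_ rest_
```

Making `OPTIND` local resets `getopts` on every call, so the function can be used repeatedly in the same shell.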
A simpler option is to just export all the definitions and report the ambiguity, which is what I do below.
So the basic logic for the importer used in env_parallel could be something like this, leaving out a lot of error checking and other subtleties:
# Helper functions for clarity. In practice, since they are all short,
# I'd probably in-line all of these by hand to reduce name pollution.
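get_alias_() { alias "$1" 2>/dev/null; }
get_func_()  { declare -f "$1" 2>/dev/null; }
# Use declare -p rather than [[ -v ]], which is false for arrays that have
# no element 0 (e.g. most associative arrays). The sed turns "declare --"
# or "declare -a" into "declare -g" or "declare -ga" so that the variable
# stays global when the definition is replayed inside a function.
get_var_()   { local d_; d_=$(declare -p "$1" 2>/dev/null) && sed '1s/--\?/-g/' <<<"$d_"; }

# make_importer VARNAME NAME...: store the definitions of every NAME in the
# exported variable VARNAME, wrapped in a "() { ... }" function body.
make_importer() {
  local name_
  export "$1"='() {
'"$(for name_ in "${@:2}"; do
      # Each get_*_ helper prints its definition (captured by $(...))
      # and succeeds only if NAME has that kind of definition.
      local got_=()
      get_alias_ "$name_" && got_+=(alias)
      get_func_ "$name_" && got_+=(function)
      get_var_ "$name_" && got_+=(variable)
      if (( ${#got_[@]} == 0 )); then
        echo "Not found: $name_" >&2
      elif (( ${#got_[@]} > 1 )); then
        printf >&2 "Ambiguous: %s is%s\n" \
          "$name_" "$(printf " %s" "${got_[@]}")"
      fi
    done)"'
}'
}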
In practice, there is not much point in defining the importer function in the local environment if its sole purpose is to be passed to the remote shell; it would be enough to print the export command. And while it is convenient to package the imports into a function, as in Accessing Associative Arrays in GNU Parallel, it is not strictly necessary. Doing so does make it easier to pass the definitions through utilities like GNU parallel, xargs or find, which is what I usually use this hack for. But depending on how you intend to use the definitions, you might be able to simplify this by just prepending the list of definitions to the given command. (If you do that, you won't need to fiddle with the global flag in the sed call in get_var_.)
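The simpler "prepend the definitions" variant might look like the following sketch. `print_defs_` and `run_with_defs_` are hypothetical names, and a plain `bash -c` stands in for the shell GNU Parallel would spawn:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print re-evaluable definitions (alias, function,
# variable) for each given name; kinds a name doesn't have are skipped.
print_defs_() {
  local name_
  echo "shopt -s expand_aliases"
  for name_ in "$@"; do
    alias "$name_" 2>/dev/null
    declare -f "$name_" 2>/dev/null
    declare -p "$name_" 2>/dev/null
  done
}

# Hypothetical runner: prepend the definitions to a command string and run
# it in a fresh, non-interactive bash. The newline before the command
# matters: aliases only apply to lines read after they are defined.
run_with_defs_() {
  local cmd_=$1; shift
  bash -c "$(print_defs_ "$@")
$cmd_"
}

BAZ=quux
myfunc() { echo "$BAZ $1"; }
alias fb="echo fubar"

run_with_defs_ 'fb bar foo; myfunc x' fb myfunc BAZ
```

Because the definitions run at the top level of the child shell, the `declare -p` output needs no `-g` here.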
Finding what is in the environment
In case it's useful, here's how to get a list of all aliases, functions and variables:
Functions
declare -F | cut -d' ' -f3
Aliases (Note 1)
alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}'
Variables (Note 1)
declare -p | awk '$1=="declare"{o=(index($3, "="));print o?substr($3,1,o-1):$3}'
In an awk program, you can check the type of a variable by looking at $2, which will usually be --, but can be -A for an associative array, -a for an array with integer keys, -i for an integer, -x for an exported variable, and -r for a readonly one. (Several flags can be combined; -aix is an "exported" integer array, although bash cannot actually export arrays.)
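As an illustration of checking $2, here is a sketch that labels each variable by its flags. `classify_vars_` and the example variables are hypothetical; the caveat from Note 1 about multi-line values applies, and flags not mentioned above (such as -n for namerefs) are simply ignored.

```shell
#!/usr/bin/env bash
# Example variables of different kinds (illustrative only).
declare -A assoc=([k]=v)
declare -a indexed=(1 2)
declare -i count=3
export EXPORTED_VAR=1

# Print "NAME: kind [exported] [readonly]" for each line of declare -p
# that starts with "declare". Field $2 holds the flags, $3 NAME=VALUE.
classify_vars_() {
  declare -p | awk '
    $1 == "declare" {
      o = index($3, "=")
      name = o ? substr($3, 1, o - 1) : $3
      kind = "scalar"
      if ($2 ~ /A/)      kind = "associative array"
      else if ($2 ~ /a/) kind = "indexed array"
      else if ($2 ~ /i/) kind = "integer"
      if ($2 ~ /x/) kind = kind " exported"
      if ($2 ~ /r/) kind = kind " readonly"
      print name ": " kind
    }'
}

classify_vars_
```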
Note 1
The commands alias and declare -p produce "reusable" output that can be evaluated or piped into another bash, so the values are quoted. Unfortunately, the quoting is only good enough for eval; it is not good enough to avoid confusion when the output is parsed line by line. For example, you can define:
x='
declare -a FAKE
'
In this case, the output of declare -p will include:
declare -- x="
declare -a FAKE
"
Hence the lists of aliases and variables should be treated as lists of "possible names" (the awk variable listing above would report FAKE, for instance): every real name will be included, but not everything included is necessarily a name. In practice this means ignoring errors:
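# _aliases: candidate names from the alias listing above
mapfile -t _aliases < <(alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}')
for a in "${_aliases[@]}"; do
  # Skip candidates that turn out not to be real alias names
  if defn=$(alias "$a" 2>/dev/null); then
    printf '%s\n' "$defn"   # do something with $defn
  fi
done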