Exporting the complete environment to GNU Parallel

I find it somewhat annoying that I cannot use aliases in GNU Parallel:

alias gi="grep -i"
parallel gi bar ::: foo
/bin/bash: gi: command not found

      

I had more or less resigned myself to this. But after reading Accessing Associative Arrays in GNU Parallel I started to wonder: does it really have to be this way?

Is it possible to create a bash function that collects the entire environment into a function, exports that function, and calls GNU Parallel, which then imports the environment in the spawned shell using this function?

So I'm not talking about a specialized solution for the gi alias, but a bash function that picks up all aliases / functions / variables (without naming them explicitly) and packages them into a function that can be activated by GNU Parallel.

Something similar to:

env_parallel() {
  # [... gather all environment/all aliases/all functions into parallel_environment() ...]
  foreach alias in all aliases {
     append alias definition to definition of parallel_environment()
  }
  foreach variable in all variables (including assoc arrays) {
     append variable definition to definition of parallel_environment()
     # Code somewhat similar to /questions/2154825/accessing-associative-arrays-in-gnu-parallel
  }
  foreach function in all functions {
     append function definition to definition of parallel_environment()
  }

  # make parallel_environment visible to GNU Parallel
  export -f parallel_environment

  # Running parallel_environment will now create an environment with
  # all variables/all aliases/all functions set in current state 
  # (with the exception of the function parallel_environment of course)

  # Inside GNU parallel:
  #    if set parallel_environment(): prepend it to the command to run
  `which parallel` "$@"
}

# Set an example alias
alias fb="echo fubar"
# Set an example variable
BAZ=quux
# Make an example function
myfunc() {
  echo $BAZ
}

# This will record the current environment including the 3 examples
# put it into parallel_environment
# run parallel_environment (to set the environment)
# use the 3 examples
env_parallel parallel_environment\; fb bar {}\; myfunc ::: foo

# It should give the same output as running:
fb bar foo; myfunc
# Outputs:
#   fubar bar foo
#   quux

      

Progress: this looks close to what I want:

env_parallel() {
  export parallel_environment='() {
    '"$(echo "shopt -s expand_aliases"; alias;typeset -p | grep -vFf <(readonly);typeset -f)"'
  }'
  `which parallel` "$@"
}

VAR=foo
myfunc() {
  echo $VAR $1
}
alias myf=myfunc
env_parallel parallel_environment';
' myfunc ::: bar # Works (but gives errors)
env_parallel parallel_environment';
' myf ::: bar # Works, but requires the \n after ;
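
The '() {' variable trick above depends on how older bash versions imported functions from the environment; current bash releases no longer treat such variables as functions. A minimal sketch of the same idea that should work on a current bash passes the snapshot as a plain string and evals it in the child. The names snapshot_env and ENV_SNAPSHOT are hypothetical, bash -c stands in for the shell that GNU Parallel would spawn, and only one named variable is captured in order to sidestep the readonly problem:

```shell
#!/usr/bin/env bash
VAR=foo
myfunc() { echo "$VAR $1"; }
alias myf=myfunc

# Hypothetical helper: print aliases, one variable and all functions as shell source.
snapshot_env() {
  echo 'shopt -s expand_aliases'
  alias
  declare -p VAR
  declare -f
}

# Export the snapshot as an ordinary string variable (not as a function).
export ENV_SNAPSHOT
ENV_SNAPSHOT=$(snapshot_env)

# A function can be called on the same line as the eval.
result_func=$(bash -c 'eval "$ENV_SNAPSHOT"; myfunc bar')

# An alias needs a newline first: bash expands aliases when it reads a line,
# so the alias must already be defined before the line using it is parsed.
result_alias=$(bash -c 'eval "$ENV_SNAPSHOT"
myf bar')

echo "$result_func"
echo "$result_alias"
```

This also shows why the alias case needs the newline after the semicolon while the function case does not.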

      

So now I am down to one remaining problem: weeding out all variables that cannot be assigned a value (for example, BASH_ARGC).

How can I list them?

+3




3 answers


GNU Parallel 20140822 implements this. To activate it, define the following function once (for example in .bashrc):

env_parallel() {
    export parallel_bash_environment='() {
       '"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
       }'
     # Run as: env_parallel ...
     `which parallel` "$@"
     unset parallel_bash_environment
}
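
The filter in that function drops every declaration that matches a readonly entry or one of a few special names. As a rough, name-based illustration of the same idea (not the code GNU Parallel uses), the sketch below lists variables whose declare -p flags do not include r and skips the special names taken from the filter above. The helper name assignable_vars is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of variables that can be re-assigned.
assignable_vars() {
  local name_ decl_ flags_
  for name_ in $(compgen -v); do
    # Skip anything declare -p cannot describe.
    decl_=$(declare -p "$name_" 2>/dev/null) || continue
    # Extract the attribute flags, e.g. "r", "ir", "x" or "-" for none.
    flags_=${decl_#declare -}
    flags_=${flags_%% *}
    case $flags_ in *r*) continue ;; esac
    # Special names from the filter above, plus this helper's own locals.
    case $name_ in
      BASH_*|GROUPS|FUNCNAME|DIRSTACK|_|PIPESTATUS|USERNAME) continue ;;
      name_|decl_|flags_) continue ;;
    esac
    echo "$name_"
  done
}

readonly RO_DEMO=1
RW_DEMO=2
assignable=$(assignable_vars)
echo "$assignable" | grep -x -e RO_DEMO -e RW_DEMO
```

Only RW_DEMO should survive of the two demo variables; RO_DEMO is readonly and is skipped.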

      

And call GNU Parallel like:



env_parallel ...

      

This should put an end to the myth that it is not possible to export aliases: all you need is a little Behändigkeit (Thanks a lot @rici for the inspiration).

+1




In principle, this should be possible. But as usual, there are a lot of details.

First, in bash it is entirely possible for a name to be a function, a variable (scalar or array), and an alias all at the same time. In addition, a function and a variable of the same name can be exported independently.

Thus, with env_parallel foo ... there would be some ambiguity if foo has more than one definition. Perhaps the best solution would be to detect the situation and report an error, using syntax like:

env_parallel -a foo -f bar

      

to be more specific if necessary.

An easier option is to just export the ambiguity, which I do below.

So the basic logic for the importer used in env_parallel could be something like this, leaving out a lot of error checking and other subtleties:

# Helper functions for clarity. In practice, since they are all short,
# I'd probably in-line all of these by hand to reduce name pollution.
get_alias_() { alias "$1" 2>/dev/null; }
get_func_()  { declare -f "$1" 2>/dev/null; }
get_var_()   { [[ -v "$1" ]] && declare -p "$1" | sed '1s/--\?/-g/'; }

make_importer() {
  local name_
  export $1='() {
    '"$(for name_ in "${@:2}"; do
          local got_=()
          get_alias_ "$name_" && got_+=(alias)
          get_func_  "$name_" && got_+=(function)
          get_var_   "$name_" && got_+=(variable)
          if [[ -z $got_ ]]; then
            echo "Not found: $name_" >>/dev/stderr
          elif (( ${#got_[@]} > 1 )); then
            printf >>/dev/stderr \
                   "Ambiguous: %s is%s\n" \
                   $name_ "$(printf " %s" "${got_[@]}")"
          fi
        done)"'
  }'
}

      

In practice, there is no real point in defining the function in the local environment if its sole purpose is to be passed to the remote shell. It would be enough to print the export command. And while it is convenient to package the import into a function, as in Accessing Associative Arrays in GNU Parallel, it is not strictly necessary. It does make it easier to pass the definitions through utilities like GNU Parallel, xargs or find, which is what I usually use this hack for. But depending on how you intend to use the definitions, you could simplify the above by simply prepending the list of definitions to the given command. (If you do that, you won't have to mess with the global flag via sed in get_var_.)
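
A minimal sketch of that simpler variant, assuming the definitions are evaluated at the top level of the child shell (which is why no -g flag is needed). The helper name definitions_for is hypothetical, and bash -c stands in for whatever shell the utility spawns:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the definitions of the named aliases,
# functions and variables as shell source; unknown kinds are silently skipped.
definitions_for() {
  local name_
  for name_ in "$@"; do
    alias "$name_" 2>/dev/null
    declare -f "$name_" 2>/dev/null
    declare -p "$name_" 2>/dev/null
  done
  return 0
}

greeting=hello
greet() { echo "$greeting, $1"; }

# Prepend the definitions to the command; the child evaluates them first.
prefix=$(definitions_for greeting greet)
output=$(bash -c "$prefix
greet world")
echo "$output"
```

Neither greeting nor greet is exported here; the child sees them only because their definitions travel as part of the command text.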

Finding what is in the environment

In case it's useful, here's how to get a list of all aliases, functions and variables:

Functions

declare -F | cut -d' ' -f3

      



Aliases (Note 1)

alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}'

      

Variables (Note 1)

declare -p | awk '$1=="declare"{o=(index($3, "="));print o?substr($3,1,o-1):$3}'

      

In the awk program, you can check the type of a variable by looking at $2, which will usually be --, but can be -A for an associative array, -a for an array with integer keys, -i for an integer, -x for an exported variable, and -r for a readonly one. (More than one flag may apply; -aix is an "exported" (not implemented) integer array.)
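
As a small illustration of reading the type from $2, the sketch below declares a few demo variables (the *_demo names are made up for this example) and prints each name next to its flags:

```shell
#!/usr/bin/env bash
declare -A assoc_demo=([k]=v)
declare -a array_demo=(a b)
declare -i int_demo=7
scalar_demo=text

# Field 2 of each declare -p line holds the flags; field 3 starts with the name.
types=$(declare -p assoc_demo array_demo int_demo scalar_demo |
  awk '$1=="declare"{o=index($3,"="); print (o?substr($3,1,o-1):$3), $2}')
echo "$types"
```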

Note 1

The commands alias and declare -p produce "reusable" output that can be eval'ed or piped into another bash, so the values are quoted. Unfortunately, the quoting is only just good enough for eval; it is not good enough to avoid confusion when parsing the output. You could define, for example:

x='
declare -a FAKE
'

      

In this case, the output of declare -p will include:

declare -- x="
declare -a FAKE
"

      

Hence, the lists of aliases and variables should be treated as "possible names": every real name will be included, but not everything included is necessarily a name. In practice this means ignoring errors:

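for a in "${_aliases[@]}"; do
  if
     # alias fails for anything that is not really an alias name
     defn=$(alias "$a" 2>/dev/null)
  then
     # do something with $defn; printing it is only a placeholder
     printf '%s\n' "$defn"
  fi
done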

      

+1




As is often the case, the solution is to use a function, not an alias. You must first export the function (since parallel and bash are both developed by GNU, parallel knows how to deal with functions exported by bash).

gi () {
    grep -i "$@"
}
export -f gi
parallel gi bar ::: foo
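
To check that the exported function really reaches a child shell without involving GNU Parallel, a plain bash -c can stand in for the shell that parallel would spawn (a minimal sketch; the sample input is made up):

```shell
#!/usr/bin/env bash
gi() {
    grep -i "$@"
}
export -f gi

# The child bash inherits gi through the environment and can call it directly.
matched=$(printf 'FOO BAR\nbaz\n' | bash -c 'gi bar')
echo "$matched"
```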

      

0

