How can I use expand in Snakemake when some particular combinations of wildcards are not needed?

Suppose I have the following files on which I want to apply some processing automatically with snakemake:

test_input_C_1.txt
test_input_B_2.txt
test_input_A_2.txt
test_input_A_1.txt

The following Snakefile uses expand to list all the potential final output files:

rule all:
    input: expand("test_output_{text}_{num}.txt", text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """

      

Running the Snakefile above results in the following error:

MissingInputException in line 4 of /tmp/Snakefile:
Missing input files for rule make_output:
test_input_B_1.txt

The cause of this error is that expand uses itertools.product under the hood to generate wildcard combinations, and some of those combinations correspond to files that do not exist.
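The mismatch can be reproduced in plain Python, without Snakemake, by generating the same Cartesian product of wildcard values and comparing it with the four input files listed above. This is only a sketch of the combinatorics; it does not call Snakemake itself:

```python
from itertools import product

# The four input files from the question
existing = {"test_input_C_1.txt", "test_input_B_2.txt",
            "test_input_A_2.txt", "test_input_A_1.txt"}

# Every (text, num) combination, as a Cartesian product like the one expand uses
all_combos = list(product(["A", "B", "C"], [1, 2]))

# Combinations whose input file does not exist
missing = [f"test_input_{t}_{n}.txt" for t, n in all_combos
           if f"test_input_{t}_{n}.txt" not in existing]

print(len(all_combos), missing)
# → 6 ['test_input_B_1.txt', 'test_input_C_2.txt']
```

Six combinations are requested but only four inputs exist; the error message above happens to mention only test_input_B_1.txt.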

How can I filter out the unwanted wildcard combinations?


1 answer


The expand function takes an optional second positional (non-keyword) argument that replaces the default function used to combine wildcard values.
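To illustrate what that combinator argument does, here is a hypothetical, simplified stand-in for expand called mini_expand. It is not Snakemake's code; it is a sketch assuming that expand pairs each wildcard name with each of its values and hands those lists of (name, value) pairs to the combinator, which is consistent with the blacklist in the code below comparing (name, value) pairs. Passing zip instead of the default product pairs the values position by position:

```python
from itertools import product


def mini_expand(pattern, combinator=product, **wildcards):
    """Hypothetical, simplified stand-in for Snakemake's expand (illustration only)."""
    # One list of (name, value) pairs per wildcard, e.g. [("text", "A"), ("text", "B")]
    pair_lists = [[(name, value) for value in values]
                  for name, values in wildcards.items()]
    # The combinator decides which pairs are grouped together into one filename
    return [pattern.format(**dict(comb)) for comb in combinator(*pair_lists)]


print(mini_expand("{text}_{num}", text=["A", "B"], num=[1, 2]))
# → ['A_1', 'A_2', 'B_1', 'B_2']
print(mini_expand("{text}_{num}", zip, text=["A", "B"], num=[1, 2]))
# → ['A_1', 'B_2']
```

With the real function the call shape is the same, e.g. expand(pattern, zip, text=[...], num=[...]); any callable that accepts the pair lists and yields tuples of pairs can be passed in that position.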

You can create a filtered version of itertools.product by wrapping it in a higher-order generator that checks that each resulting wildcard combination is not in a predefined blacklist:

from itertools import product

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # Use frozenset instead of tuple
            # in order to accommodate
            # unpredictable wildcard order
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

# "B_1" and "C_2" are undesired
forbidden = {
    frozenset({("text", "B"), ("num", 1)}),
    frozenset({("text", "C"), ("num", 2)})}

filtered_product = filter_combinator(product, forbidden)

rule all:
    input:
        # Override default combination generator
        expand("test_output_{text}_{num}.txt", filtered_product, text=["A", "B", "C"], num=[1, 2])

rule make_output:
    input: "test_input_{text}_{num}.txt"
    output: "test_output_{text}_{num}.txt"
    shell:
        """
        md5sum {input} > {output}
        """
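The filtering can also be checked outside Snakemake. The sketch below copies filter_combinator and forbidden from the Snakefile above and feeds the filtered product the same kind of (name, value) pair lists that, as assumed here, expand builds from text=[...] and num=[...]; building those pair lists by hand is only for illustration:

```python
from itertools import product


def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # frozenset ignores the order of the (name, value) pairs
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator


forbidden = {
    frozenset({("text", "B"), ("num", 1)}),
    frozenset({("text", "C"), ("num", 2)})}

filtered_product = filter_combinator(product, forbidden)

# Pair lists built by hand to mimic text=["A", "B", "C"], num=[1, 2]
text_pairs = [("text", t) for t in ["A", "B", "C"]]
num_pairs = [("num", n) for n in [1, 2]]

names = ["test_output_{text}_{num}.txt".format(**dict(comb))
         for comb in filtered_product(text_pairs, num_pairs)]
print(names)
# → ['test_output_A_1.txt', 'test_output_A_2.txt', 'test_output_B_2.txt', 'test_output_C_1.txt']

# Pair order does not matter, so this is still recognised as forbidden
print(frozenset([("num", 1), ("text", "B")]) in forbidden)  # → True
```

The four remaining names correspond exactly to the four existing input files.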

      


The missing wildcard combinations can also be read from a configuration file.

Here's an example in JSON format:

{
    "missing" :
    [
        {
            "text" : "B",
            "num" : 1
        },
        {
            "text" : "C",
            "num" : 2
        }
    ]
}

In the Snakefile, the forbidden set can then be built from the config like this:

forbidden = {frozenset(wc_comb.items()) for wc_comb in config["missing"]}
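A quick standalone check that the config-driven set matches the hand-written one. Here json.loads on an inline string stands in for the config dictionary that Snakemake would fill from a config file; that substitution is an assumption made only so the snippet runs on its own:

```python
import json

# Stand-in for Snakemake's config object, using the JSON example above
config = json.loads("""
{
    "missing": [
        {"text": "B", "num": 1},
        {"text": "C", "num": 2}
    ]
}
""")

# Each entry becomes a frozenset of (wildcard, value) pairs
forbidden = {frozenset(wc_comb.items()) for wc_comb in config["missing"]}

hand_written = {
    frozenset({("text", "B"), ("num", 1)}),
    frozenset({("text", "C"), ("num", 2)})}

print(forbidden == hand_written)  # → True

# Types must match too: the string "1" is not equal to the integer 1
print(frozenset({("text", "B"), ("num", "1")}) in forbidden)  # → False
```

Because the blacklist is compared by value, the types in the config must match the values passed to expand: a JSON 1 is an integer and matches num=[1, 2], whereas "1" in the config would not filter anything.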

      
