Implementing "How to use expand in snakemake when some particular combinations of wildcards are not desired?"

I tried to implement the approach from "How to use expand in snakemake when some particular combinations of wildcards are not desired?"

The goal is to process only the crossed combinations between SUPERGROUPS:

from itertools import product

DOMAINS=["Metallophos"]
SUPERGROUPS=["2supergroups","5supergroups"]
SUPERGROUPS_INVERSED=["5supergroups","2supergroups"]
CUTOFFS=["0"]

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            # Use frozenset instead of tuple
            # in order to accommodate
            # unpredictable wildcard order
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

# "2supergroups/2supergroups" and "5supergroups/5supergroups" are undesired
forbidden = {
    frozenset({("supergroup", "2supergroups"), ("supergroup_other", "2supergroups")}),
    frozenset({("supergroup", "5supergroups"), ("supergroup_other", "5supergroups")})}

filtered_product = filter_combinator(product, forbidden)

rule target :
    input:
        expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS)

rule tree_measures:
    input:
        tree="results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups.for.notung",
        list="results/{domain}/{supergroup}/hmmer_search_bbh_1/bbhlist.txt.{domain}.fa.OGs.tbl.txt.0.list.txt.nh.OGs.txt",
        mapping1="results/{domain}/{supergroup_other}/{supergroup}/OGSmapping.txt.list",
        categories="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.categories",
        mapping2="results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list",
        supergroups="results/{domain}/{supergroup}/hmmer_search_2/{domain}.fa.OGs.tbl.txt.{cutoff}.supergroups.csv"
    output:
        "results/{domain}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{cutoff}.statistics"
    shell:
        "~/tools/Python-2.7.11/python scripts/tree_measures.py {input.tree} {input.list} {input.mapping1} {input.categories} {input.mapping2} {input.supergroups} {wildcards.cutoff} results/{wildcards.domain}/{wildcards.supergroup}/{wildcards.supergroup_other}/"

      

But I still get the error:

Missing input files for rule tree_measures:
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.list
results/Metallophos/5supergroups/5supergroups/OGSmapping.txt.categories

      

What am I missing?



1 answer


It seems you need to do the expand in two steps, as below:

rule target :
    input:
        expand(expand("results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics", filtered_product, supergroup=SUPERGROUPS, supergroup_other = SUPERGROUPS_INVERSED), cutoff=CUTOFFS, domain = DOMAINS)

      

The inner expand uses the filtered_product trick, while the outer one uses the default combinator.
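To see why the filter only takes effect on the inner expand, here is a minimal plain-Python sketch. It assumes that expand hands the combinator one list of (wildcard, value) pairs per wildcard, which is what the frozenset entries in forbidden suggest; as_pairs is a hypothetical helper standing in for that step and is not part of Snakemake.

```python
from itertools import product

def filter_combinator(combinator, blacklist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            if frozenset(wc_comb) not in blacklist:
                yield wc_comb
    return filtered_combinator

forbidden = {
    frozenset({("supergroup", "2supergroups"), ("supergroup_other", "2supergroups")}),
    frozenset({("supergroup", "5supergroups"), ("supergroup_other", "5supergroups")}),
}
filtered_product = filter_combinator(product, forbidden)

def as_pairs(**wildcards):
    # Hypothetical helper (assumption): one list of (name, value) pairs per wildcard
    return [[(name, value) for value in values] for name, values in wildcards.items()]

groups = ["2supergroups", "5supergroups"]

# Two-step style: only the two supergroup wildcards reach the filter
two_step = list(filtered_product(*as_pairs(supergroup=groups, supergroup_other=groups)))

# One-step style: cutoff and domain are part of every combination as well
one_step = list(filtered_product(*as_pairs(
    supergroup=groups, supergroup_other=groups,
    cutoff=["0"], domain=["Metallophos"])))

print(len(two_step))  # 2: the two same-supergroup pairs were dropped
print(len(one_step))  # 4: nothing was dropped
```

With all four wildcards in a single call, each combination also contains the cutoff and domain pairs, so its frozenset never equals a two-pair blacklist entry and nothing is filtered out.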

Another approach is to use itertools.permutations for the inner list:



from itertools import permutations

DOMAINS=["Metallophos"]
SUPERGROUPS=["2supergroups","5supergroups"]
CUTOFFS=["0"]

rule target :
    input:
        expand(
            ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2)
                for (sgrp1, sgrp2) in permutations(SUPERGROUPS)],
            cutoff=CUTOFFS, domain = DOMAINS)
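As a quick check outside Snakemake, the inner list can be built in plain Python. This sketch passes an explicit length of 2 to permutations, which differs slightly from the rule above: with exactly two supergroups the result is the same, but the explicit length keeps the tuple unpacking valid if more supergroups are added later. The names pattern and partial are only for illustration.

```python
from itertools import permutations

SUPERGROUPS = ["2supergroups", "5supergroups"]
pattern = ("results/{{domain}}/{supergroup}/{supergroup_other}/"
           "OGSmapping.txt.list.{{cutoff}}.statistics")

# str.format fills the supergroup fields and turns {{domain}} and {{cutoff}}
# into {domain} and {cutoff}, leaving them for the outer expand
partial = [
    pattern.format(supergroup=sgrp1, supergroup_other=sgrp2)
    for sgrp1, sgrp2 in permutations(SUPERGROUPS, 2)
]

for path in partial:
    print(path)
# results/{domain}/2supergroups/5supergroups/OGSmapping.txt.list.{cutoff}.statistics
# results/{domain}/5supergroups/2supergroups/OGSmapping.txt.list.{cutoff}.statistics
```

Because permutations never pairs an element with itself, the same-supergroup combinations cannot appear and no blacklist is needed.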

      

Another possibility is to use zip:

rule target :
    input:
        expand(
            ["results/{{domain}}/{supergroup}/{supergroup_other}/OGSmapping.txt.list.{{cutoff}}.statistics".format(supergroup=sgrp1, supergroup_other=sgrp2)
                for (sgrp1, sgrp2) in zip(SUPERGROUPS, SUPERGROUPS_INVERSED)],
            cutoff=CUTOFFS, domain = DOMAINS)
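Note that zip pairs the two lists by position, so it only gives crossed combinations when the lists are aligned that way. The sketch below shows the pairs for the two lists above and, as a purely hypothetical case, what would happen with a three-element list paired against its reverse.

```python
SUPERGROUPS = ["2supergroups", "5supergroups"]
SUPERGROUPS_INVERSED = ["5supergroups", "2supergroups"]

# Element i of the first list is paired with element i of the second
pairs = list(zip(SUPERGROUPS, SUPERGROUPS_INVERSED))
print(pairs)
# [('2supergroups', '5supergroups'), ('5supergroups', '2supergroups')]

# Hypothetical three-element case: the middle element meets itself
three = ["2supergroups", "3supergroups", "5supergroups"]
pairs_three = list(zip(three, reversed(three)))
self_pairs = [(a, b) for a, b in pairs_three if a == b]
print(self_pairs)
# [('3supergroups', '3supergroups')]
```

For two supergroups the zip version is the shortest, but with more than two the permutations or blacklist approach is likely the safer choice.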

      
