CNTK Transfer Learning with LSTM: Add a pretrained network to another network

I have a pretrained Seq-to-Seq tagger network, which in its simplest form looks like this:

Network_1 = Sequential([
    Embedding(emb_dim),
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])

I would like to use the output of this as the first layers of another network. Basically, I would like to concatenate the embeddings from network_1 (pretrained and frozen) with the embedding layer in network_2, like this:

Network_2 = Sequential([
    Concat_embeddings(Embedding(emb_dim), Network_1_embed()),
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])


def Network_1_embed():
    loaded_model = load_model(path_to_network_1_saved_model)
    cloned_model = loaded_model.clone(CloneMethod.freeze)
    return cloned_model

def Concat_embeddings(emb1, emb2):
    X = Placeholder()
    return splice(emb1(X), emb2(X))

This gives me the following error: ValueError: Times: The 1 leading dimensions of the right operand with shape '[50360]' do not match the left operand's trailing dimensions with shape '[293]'

For reference, we get [293] since emb_dim = 256 and num_network_1_labels = 37, and [50360] is the vocabulary size of the network_2 input. Network_1 was trained with the same vocabulary mapping, so it can take the same input and output a 37-dimensional vector for each token. How can I do this? Thanks!



1 answer


I think your problem is that you are using the whole Network_1 as the embedding, not just its embedding layer. One way would be to define embed separately and train it through Network_1:

embed = Embedding(emb_dim)
Network_1 = Sequential([
    embed,
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])

Then train Network_1, but save embed:

embed.save(EMBED_PATH)

Explanation: Because Network_1 just calls embed, the two share parameters, so training Network_1 will train embed's parameters. Saving embed then gives you the embedding layer as trained by Network_1. It is as simple as that.



Then, to train the second model (in a second script), load embed from disk and just use it:

Network_1_embed = load_model(EMBED_PATH)
Network_2 = Sequential([
    (Embedding(emb_dim), Network_1_embed),
    splice,
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])

Note the use of a function tuple as the first element passed to Sequential(). The tuple means: apply both functions to the same input, which generates two outputs; these then become the inputs to the next function, splice.
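To make the mechanics concrete, here is a minimal self-contained sketch of the same tuple-plus-splice pattern (the Dense layers and dimensions are made up for illustration):

import cntk as C
from cntk.layers import Dense, Sequential
from cntk.ops import splice

f = Dense(3)
g = Dense(5)
model = Sequential([
    (f, g),   # apply f and g to the same input, yielding two outputs
    splice    # concatenate the two outputs along the last axis
])

x = C.input_variable(4)
z = model(x)
print(z.shape)  # expected: (8,), i.e. 3 + 5 concatenated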

To keep embed constant (frozen), clone it with CloneMethod.freeze as you did in your example.
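That would look something like this (a sketch; EMBED_PATH is the path used above):

import cntk as C

frozen_embed = C.load_model(EMBED_PATH).clone(C.CloneMethod.freeze)
# use frozen_embed in place of Network_1_embed in Network_2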

(I am not in front of a computer with the latest CNTK and cannot verify this, so it is possible that I made a mistake.)
