Situation
When a network is fed multiple types of input, e.g. both a sequence and an image, each input is first processed in its own flow. The two flows run separately until they reach a joint point, and the operation that merges them at that point is called concatenation. Concatenation somewhere in the middle of the network is needed because the network ends in a single output layer, so the flows have to be combined before reaching it.
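To make the idea concrete, here is a minimal sketch in plain Python with no deep learning framework. The function names (sequence_branch, image_branch, concatenate, dense), the hand-picked features, and the weights are all hypothetical and chosen only for illustration. A real model would use learned layers in each flow, but the shape of the computation is the same: two separate flows, one concatenation, one output.

```python
# Minimal sketch of a two-input network in plain Python.
# All names, feature choices, and weights below are hypothetical.

def sequence_branch(seq):
    """First flow: summarize a sequence as [mean, max]."""
    return [sum(seq) / len(seq), max(seq)]

def image_branch(image):
    """Second flow: summarize a 2D image (list of rows) as [mean pixel, max pixel]."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def concatenate(a, b):
    """Joint point: place the two feature vectors end to end."""
    return a + b

def dense(features, weights, bias):
    """Single output unit applied to the merged feature vector."""
    return sum(f * w for f, w in zip(features, weights)) + bias

seq = [1.0, 2.0, 3.0]
image = [[0.0, 1.0], [1.0, 0.0]]

# Each input goes through its own flow, then the flows are merged.
merged = concatenate(sequence_branch(seq), image_branch(image))

# The single output layer sees features from both inputs at once.
output = dense(merged, weights=[0.5, 0.5, 1.0, 1.0], bias=0.0)

print(merged)
print(output)
```

Because the merged vector has the features of both flows side by side, the output layer needs one weight per merged feature, which is why the weight list here has four entries for two flows of two features each.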