
Multi-task learning

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, compared to training the models separately. Early versions of MTL were called 'hints'. Multitask learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better.

Regularizer — With the separable kernel, it can be shown (below) that

$$\|f\|_{\mathcal{H}}^{2}=\sum_{s,t=1}^{T}A_{t,s}^{\dagger}\,\langle f_{s},f_{t}\rangle_{\mathcal{H}_{k}},$$

where $A_{t,s}^{\dagger}$ is the $(t,s)$ element of the pseudoinverse of $A$, $\mathcal{H}_{k}$ is the RKHS based on the scalar kernel $k$, and $f_{t}(x)=\sum_{i=1}^{n}k(x,x_{i})\,A_{t}^{\top}c_{i}$. This formulation shows that $A_{t,s}^{\dagger}$ controls the weight of the penalty associated with $\langle f_{s},f_{t}\rangle_{\mathcal{H}_{k}}$. (Note that $\langle f_{s},f_{t}\rangle_{\mathcal{H}_{k}}$ arises from $\|f_{t}\|_{\mathcal{H}_{k}}^{2}=\langle f_{t},f_{t}\rangle_{\mathcal{H}_{k}}$.) The derivation expands the norm via the reproducing kernel $\gamma$, then uses the pseudoinverse identity $AA^{\dagger}A=A$ to regroup the sum by tasks:

$$\begin{aligned}
\|f\|_{\mathcal{H}}^{2}&=\left\langle \sum_{i=1}^{n}\gamma((x_{i},t_{i}),\cdot)\,c_{i}^{t_{i}},\;\sum_{j=1}^{n}\gamma((x_{j},t_{j}),\cdot)\,c_{j}^{t_{j}}\right\rangle_{\mathcal{H}}\\
&=\sum_{i,j=1}^{n}c_{i}^{t_{i}}c_{j}^{t_{j}}\,\gamma((x_{i},t_{i}),(x_{j},t_{j}))\\
&=\sum_{i,j=1}^{n}\sum_{s,t=1}^{T}c_{i}^{t}c_{j}^{s}\,k(x_{i},x_{j})\,A_{s,t}\\
&=\sum_{i,j=1}^{n}k(x_{i},x_{j})\,\langle c_{i},Ac_{j}\rangle_{\mathbb{R}^{T}}\\
&=\sum_{i,j=1}^{n}k(x_{i},x_{j})\,\langle c_{i},AA^{\dagger}Ac_{j}\rangle_{\mathbb{R}^{T}}\\
&=\sum_{i,j=1}^{n}k(x_{i},x_{j})\,\langle Ac_{i},A^{\dagger}Ac_{j}\rangle_{\mathbb{R}^{T}}\\
&=\sum_{i,j=1}^{n}\sum_{s,t=1}^{T}(Ac_{i})^{t}(Ac_{j})^{s}\,k(x_{i},x_{j})\,A_{s,t}^{\dagger}\\
&=\sum_{s,t=1}^{T}A_{s,t}^{\dagger}\left\langle \sum_{i=1}^{n}k(x_{i},\cdot)(Ac_{i})^{t},\;\sum_{j=1}^{n}k(x_{j},\cdot)(Ac_{j})^{s}\right\rangle_{\mathcal{H}_{k}}\\
&=\sum_{s,t=1}^{T}A_{s,t}^{\dagger}\,\langle f_{t},f_{s}\rangle_{\mathcal{H}_{k}}
\end{aligned}$$
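The separable construction is concrete enough to implement directly. Below is a minimal sketch of multi-task kernel ridge regression with a separable kernel $\Gamma((x,t),(x',s))=k(x,x')\,A_{t,s}$; the RBF kernel, the toy two-task data, the coupling matrix `A`, and the regularization weight `lam` are illustrative assumptions, not values fixed by the text.

```python
# Minimal sketch: multi-task kernel ridge regression with a separable
# kernel Gamma((x, t), (x', s)) = k(x, x') * A[t, s]. Data, bandwidth,
# coupling matrix A, and regularization weight are assumed for illustration.
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Scalar kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, T = 40, 2                                   # n shared inputs, T tasks
X = rng.uniform(-3, 3, size=(n, 1))
# Two related tasks: a sine curve and a noisy, scaled copy of it.
Y = np.column_stack([np.sin(X[:, 0]),
                     0.8 * np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)])

A = np.array([[1.0, 0.6],                      # task-coupling matrix: the
              [0.6, 1.0]])                     # off-diagonals share information
K = rbf_kernel(X, X)                           # scalar-kernel Gram matrix
lam = 1e-2

# Representer theorem: f(x) = sum_i k(x, x_i) A c_i, so the coefficients
# solve the block-linear system (K kron A + lam * I) vec(C) = vec(Y),
# where vec() stacks the per-point task vectors (row-major here).
c = np.linalg.solve(np.kron(K, A) + lam * np.eye(n * T), Y.reshape(-1))
C = c.reshape(n, T)                            # row i is c_i

def predict(X_new):
    """f_t(x) = sum_i k(x, x_i) (A c_i)_t, for every task t at once."""
    return rbf_kernel(X_new, X) @ (C @ A.T)

print(np.abs(predict(X) - Y).mean())           # mean absolute training residual
```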
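Continuing the same sketch, the regularizer identity derived above can be checked numerically: the RKHS norm computed from the Gram matrix of $\Gamma$ agrees with the pseudoinverse-weighted sum of the pairwise inner products $\langle f_{t},f_{s}\rangle_{\mathcal{H}_{k}}$.

```python
# Numerical check of ||f||_H^2 = sum_{s,t} A_pinv[s, t] * <f_t, f_s>_{H_k},
# reusing K, A, c, C, n, T from the sketch above.
Apinv = np.linalg.pinv(A)
B = C @ A.T                          # column t holds f_t's coefficients (A c_i)^t
inner = B.T @ K @ B                  # inner[t, s] = <f_t, f_s>_{H_k}
lhs = c @ (np.kron(K, A) @ c)        # ||f||_H^2 = vec(C)^T (K kron A) vec(C)
rhs = (Apinv * inner).sum()          # pseudoinverse-weighted sum of inner products
print(np.isclose(lhs, rhs))          # True
```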
Output metric — An alternative output metric on $\mathcal{Y}^{T}$ can be induced by the inner product $\langle y_{1},y_{2}\rangle_{\Theta}=\langle y_{1},\Theta y_{2}\rangle_{\mathbb{R}^{T}}$. With the squared loss there is an equivalence between the separable kernel $k(\cdot,\cdot)\,I_{T}$ under the alternative metric and the separable kernel $k(\cdot,\cdot)\,\Theta$ under the canonical metric.

Output mapping — Outputs can be mapped as $L:\mathcal{Y}^{T}\rightarrow\tilde{\mathcal{Y}}$ into a higher-dimensional space in order to encode complex structures such as trees, graphs and strings. For linear maps $L$, with an appropriate choice of separable kernel, it can be shown that $A=L^{\top}L$.
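The output-metric equivalence can also be illustrated numerically in the same toy setup. The sketch below (reusing `K`, `Y`, `lam`, `n`, `T` from above, and assuming the Gram matrix of $k$ is invertible) compares the predictor learned with kernel $k(\cdot,\cdot)\,I_{T}$ under the $\Theta$-metric against the one learned with kernel $k(\cdot,\cdot)\,\Theta$ under the canonical metric; `Theta` is an arbitrary symmetric positive-definite matrix chosen for illustration, not a value from the text.

```python
# With the squared loss, k(.,.) I_T under the Theta-metric and
# k(.,.) Theta under the canonical metric yield the same predictor.
Theta = np.array([[2.0, 0.5],        # illustrative symmetric
                  [0.5, 1.0]])       # positive-definite output metric
y_vec = Y.reshape(-1)

# Kernel k(.,.) I_T with loss <e_i, Theta e_i>: stationarity (for an
# invertible K) reduces to (K kron Theta + lam I) vec(C1) = (I kron Theta) vec(Y).
C1 = np.linalg.solve(np.kron(K, Theta) + lam * np.eye(n * T),
                     np.kron(np.eye(n), Theta) @ y_vec).reshape(n, T)

# Kernel k(.,.) Theta under the canonical metric:
# (K kron Theta + lam I) d = vec(Y), predictor coefficients are Theta d_i.
d = np.linalg.solve(np.kron(K, Theta) + lam * np.eye(n * T), y_vec)
C2 = d.reshape(n, T) @ Theta.T

print(np.allclose(C1, C2))           # True: identical predictors
```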

[ "Unsupervised learning", "Machine learning", "Artificial intelligence", "Pattern recognition", "task", "Sample exclusion dimension", "Inductive bias", "Inductive transfer", "Contrast set learning" ]