Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis

Bibliographic Details
Main Authors: Jun-Min Lee, Tae-Bin Ha
Format: Article
Language: English
Published: Linköping University Electronic Press, 2023-10-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/4855
author Jun-Min Lee
Tae-Bin Ha
author_facet Jun-Min Lee
Tae-Bin Ha
author_sort Jun-Min Lee
collection DOAJ
description A Generative Adversarial Network (GAN) is a model for data synthesis that creates plausible data through competition between a generator and a discriminator. Although GAN applications to image synthesis have been studied extensively, GANs have inherent limitations in natural language generation. Because natural language is composed of discrete tokens, the generator has difficulty updating its parameters through gradient backpropagation; therefore, most text-GAN studies generate sentences starting from a random token based on a reward system. As a result, the generators in previous studies are pre-trained in an autoregressive way before adversarial training, causing data memorization in which synthesized sentences reproduce the training data. In this paper, we synthesize sentences using a framework similar to the original GAN. More specifically, we propose Text Embedding Space Generative Adversarial Networks (TESGAN), which generate continuous text embedding spaces instead of discrete tokens to solve the gradient backpropagation problem. Furthermore, TESGAN conducts unsupervised learning that does not directly refer to the text of the training data, overcoming the data memorization issue. By adopting this novel method, TESGAN can synthesize new sentences, showing the potential of unsupervised learning for text synthesis. We expect to see extended research combining Large Language Models with this new perspective of viewing text as a continuous space.
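The gradient problem the abstract refers to can be illustrated numerically. The toy sketch below (not the paper's actual model; the embedding table, discriminator, and shapes are invented for illustration) compares two paths from generator parameters to a discriminator score: a discrete path that selects a token by argmax, and a continuous path that emits a softmax-weighted point in embedding space. Finite differences show the discrete path yields zero gradient almost everywhere, while the continuous path stays differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary of 5 token embeddings (4-dim) and a fixed linear
# "discriminator". All names and shapes are illustrative only.
emb_table = rng.normal(size=(5, 4))
disc_w = rng.normal(size=4)

def disc_score(e):
    """Scalar discriminator score for one embedding vector."""
    return float(e @ disc_w)

def loss_discrete(theta):
    # Discrete path: pick a token by argmax, look up its embedding.
    # argmax is piecewise constant, so gradients vanish.
    tok = int(np.argmax(theta))
    return disc_score(emb_table[tok])

def loss_continuous(theta):
    # Continuous path: softmax-weighted mixture of embeddings gives a
    # point in embedding space, keeping the chain differentiable.
    p = np.exp(theta - theta.max())
    p /= p.sum()
    return disc_score(p @ emb_table)

def finite_diff_grad(f, theta, eps=1e-5):
    """Central-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g

theta = rng.normal(size=5)          # toy generator parameters
g_disc = finite_diff_grad(loss_discrete, theta)
g_cont = finite_diff_grad(loss_continuous, theta)
print("grad via argmax   :", g_disc)   # zero almost everywhere
print("grad via embedding:", g_cont)   # informative, nonzero
```

This is the intuition behind generating embedding spaces rather than token sequences: the discriminator's feedback can flow back to the generator without passing through a non-differentiable sampling step.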
format Article
id doaj-art-f34039ae81fe4ff284ee70612d56b47c
institution Kabale University
issn 2000-1533
language English
publishDate 2023-10-01
publisher Linköping University Electronic Press
record_format Article
series Northern European Journal of Language Technology
spelling doaj-art-f34039ae81fe4ff284ee70612d56b47c | 2025-01-22T15:25:14Z | eng | Linköping University Electronic Press | Northern European Journal of Language Technology | 2000-1533 | 2023-10-01 | 9 | 1 | 10.3384/nejlt.2000-1533.2023.4855 | Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis | Jun-Min Lee; Tae-Bin Ha | Korea Advanced Institute of Science and Technology | https://nejlt.ep.liu.se/article/view/4855
spellingShingle Jun-Min Lee
Tae-Bin Ha
Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
Northern European Journal of Language Technology
title Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
title_full Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
title_fullStr Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
title_full_unstemmed Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
title_short Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis
title_sort unsupervised text embedding space generation using generative adversarial networks for text synthesis
url https://nejlt.ep.liu.se/article/view/4855
work_keys_str_mv AT junminlee unsupervisedtextembeddingspacegenerationusinggenerativeadversarialnetworksfortextsynthesis
AT taebinha unsupervisedtextembeddingspacegenerationusinggenerativeadversarialnetworksfortextsynthesis