Translator

The place to discuss your conlangs and conlanging.

Re: Translator

Postby linguoboy » Fri 27 Apr 2012 2:44 am

benny335 wrote: now what

How many words are in your corpus?
english*deutsch*nederlands*català*castellano*gaelainn*cymraeg*français*svenska*韓國말*漢語
linguoboy
 
Posts: 1029
Joined: Sun 19 Apr 2009 9:02 am

Re: Translator

Postby benny335 » Wed 02 May 2012 8:56 pm

About 100. There will be more, but the process of making a new word is a somewhat long one. I don't need to worry about negative words, because if a word has a specific tone it becomes a negative word.
benny335
 
Posts: 76
Joined: Sat 28 May 2011 2:46 pm

Re: Translator

Postby linguoboy » Wed 02 May 2012 10:39 pm

benny335 wrote: About 100. There will be more, but the process of making a new word is a somewhat long one. I don't need to worry about negative words, because if a word has a specific tone it becomes a negative word.

No, your corpus, not your lexicon. A lexicon is just a list of words with definitions. A text corpus is a body of texts in which those words are used in context. EU and UN documents, for instance, are a huge source of texts for machine translation, since these agencies produce thousands of documents translated into various languages.

The designer of Google Translate, Franz Josef Och, says that it takes a bilingual corpus of about a million words and monolingual corpora of a billion words each to make a good base for each language pair you want to translate between. Better get writing!
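To make the lexicon/corpus distinction concrete, here is a minimal Python sketch. The conlang words ("mira", "tolu", "ka") and their glosses are purely hypothetical placeholders, and the word counter is a naive regex tokenizer, not what any real MT system uses. The point is only that a lexicon is counted in entries, while a corpus is counted in running word tokens, which is the number the million/billion figures above refer to.

```python
import re
from collections import Counter

# Lexicon: headwords mapped to definitions (hypothetical conlang entries).
lexicon = {
    "mira": "water",
    "tolu": "to drink",
    "ka": "I",
}

# Corpus: running texts in which those words are actually used.
corpus = [
    "Ka tolu mira.",
    "Mira ka tolu.",
]

def tokenize(text):
    """Split a text into lowercase word tokens (naive regex tokenizer)."""
    return re.findall(r"\w+", text.lower())

def corpus_size(texts):
    """Total number of word tokens across all texts in the corpus."""
    return sum(len(tokenize(t)) for t in texts)

def word_frequencies(texts):
    """How often each word form occurs across the corpus."""
    return Counter(tok for t in texts for tok in tokenize(t))

# Rough bilingual-corpus size cited above (about a million words).
BILINGUAL_TARGET = 1_000_000

print(len(lexicon), "lexicon entries")
print(corpus_size(corpus), "corpus tokens")
print(f"{corpus_size(corpus) / BILINGUAL_TARGET:.6%} of the bilingual target")
```

A 100-word lexicon can still produce a large corpus, since every use of a word in a text counts again; the corpus is what grows when you write texts, not when you coin words.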

Re: Translator

Postby benny335 » Thu 03 May 2012 8:16 pm

linguoboy wrote:
benny335 wrote: About 100. There will be more, but the process of making a new word is a somewhat long one. I don't need to worry about negative words, because if a word has a specific tone it becomes a negative word.

No, your corpus, not your lexicon. A lexicon is just a list of words with definitions. A text corpus is a body of texts in which those words are used in context. EU and UN documents, for instance, are a huge source of texts for machine translation, since these agencies produce thousands of documents translated into various languages.

The designer of Google Translate, Franz Josef Och, says that it takes a bilingual corpus of about a million words and monolingual corpora of a billion words each to make a good base for each language pair you want to translate between. Better get writing!


Oh my!
So basically I could find a UN document or some other long document and translate it into my language, using colloquialisms, context and all that, right? If so, would that be the base of it?
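Roughly, yes: translating an existing document gives you a parallel (bilingual) text. The Python sketch below shows one simplified way such a text could be turned into sentence pairs and crude word co-occurrence counts. It assumes your translation keeps exactly one sentence per source sentence, and it uses a naive punctuation-based splitter; the conlang sentences are hypothetical. Real statistical systems use far more sophisticated sentence and word alignment, so treat this only as an illustration of why the sentences need to line up.

```python
import re
from collections import Counter
from itertools import product

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def align(source, translation):
    """Pair sentence i of the source with sentence i of the translation.

    Assumes a one-to-one sentence correspondence; raises if the counts differ.
    """
    src = split_sentences(source)
    tgt = split_sentences(translation)
    if len(src) != len(tgt):
        raise ValueError(f"sentence counts differ: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))

def cooccurrences(pairs):
    """Count how many sentence pairs each (source word, target word) shares."""
    counts = Counter()
    for s, t in pairs:
        s_words = set(re.findall(r"\w+", s.lower()))
        t_words = set(re.findall(r"\w+", t.lower()))
        counts.update(product(s_words, t_words))
    return counts

# Hypothetical example: an English text and its conlang translation.
english = "I drink water. Water is cold."
conlang = "Ka tolu mira. Mira sefa."

pairs = align(english, conlang)
counts = cooccurrences(pairs)
print(pairs)
print(counts[("water", "mira")])
```

Word pairs that keep turning up together across many sentence pairs (here "water" and "mira") are candidates for translations of each other, which is why the corpus needs to be large before such counts become reliable.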
