All times are UTC [ DST ]






Re: Translator
PostPosted: Fri 27 Apr 2012 2:44 am 

Joined:Sun 19 Apr 2009 9:02 am
Posts:1010
benny335 wrote:
now what

How many words is your corpus?

_________________
english*deutsch*nederlands*català*castellano*gaelainn*cymraeg*français*svenska*韓國말*漢語



Re: Translator
PostPosted: Wed 02 May 2012 8:56 pm 

Joined:Sat 28 May 2011 2:46 pm
Posts:76
About 100. There will be more, but the process of making a new word is a somewhat long one, and I don't need to worry about negative words because if a word has a specific tone it becomes a negative word.



Re: Translator
PostPosted: Wed 02 May 2012 10:39 pm 

Joined:Sun 19 Apr 2009 9:02 am
Posts:1010
benny335 wrote:
About 100. There will be more, but the process of making a new word is a somewhat long one, and I don't need to worry about negative words because if a word has a specific tone it becomes a negative word.

No, your corpus, not your lexicon. A lexicon is just a list of words with definitions. A text corpus is a body of texts where these words are used in context. A huge source of texts for machine translation, for instance, is EU and UN documents, since these agencies produce thousands of documents translated into various languages.

The designer of Google Translate, Franz Josef Och, says that it takes a bilingual corpus of about a million words and monolingual corpora of a billion words each to make a good base for each language pair you want to translate between. Better get writing!
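
To make the lexicon/corpus distinction concrete, here is a minimal Python sketch. The words, sentences, and names (`lexicon`, `parallel_corpus`, `count_words`, `BILINGUAL_TARGET`) are all invented for illustration and are not taken from any real conlang or tool. It only counts whitespace-separated words, which is a rough approximation of how corpus size is usually measured.

```python
# Rough bilingual-corpus size quoted above (about a million words).
BILINGUAL_TARGET = 1_000_000


def count_words(texts):
    """Count whitespace-separated words across an iterable of strings."""
    return sum(len(text.split()) for text in texts)


# A lexicon: a list of words with definitions (hypothetical entries).
lexicon = {"miru": "water", "tako": "drink", "sen": "I"}

# A bilingual corpus: the same words used in context, paired with
# their translations (hypothetical sentences).
parallel_corpus = [
    ("I drink water", "sen tako miru"),
    ("I drink", "sen tako"),
]

lexicon_size = len(lexicon)
corpus_words = count_words(target for _source, target in parallel_corpus)
progress = corpus_words / BILINGUAL_TARGET

print(f"lexicon entries: {lexicon_size}")
print(f"corpus words:    {corpus_words}")
print(f"share of target: {progress:.6%}")
```

The point of the sketch is that the two numbers measure different things: the lexicon count grows when you coin a word, while the corpus count grows only when you write or translate running text.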




Re: Translator
PostPosted: Thu 03 May 2012 8:16 pm 

Joined:Sat 28 May 2011 2:46 pm
Posts:76
linguoboy wrote:
benny335 wrote:
About 100. There will be more, but the process of making a new word is a somewhat long one, and I don't need to worry about negative words because if a word has a specific tone it becomes a negative word.

No, your corpus, not your lexicon. A lexicon is just a list of words with definitions. A text corpus is a body of texts where these words are used in context. A huge source of texts for machine translation, for instance, is EU and UN documents, since these agencies produce thousands of documents translated into various languages.

The designer of Google Translate, Franz Josef Och, says that it takes a bilingual corpus of about a million words and monolingual corpora of a billion words each to make a good base for each language pair you want to translate between. Better get writing!


Oh my!
So basically I could find a U.N. document or some other long document and translate it into my language, using colloquialisms, context, and all that. Right? If so, would that be the base of it?

