Word of the day – corpus

A text corpus (pl. corpora) is a large and structured set of texts, usually stored, processed and analysed electronically. Corpora are used for statistical analysis, for checking occurrences and for validating linguistic rules. Dictionary makers also use them to work out definitions of words. The word corpus comes from the Latin for body.

According to an article in the New York Times on this topic that I found today, the verb migrate is used much more frequently with the direction south than with north. Pink things tend to be fluffy, while green things are more likely to be fuzzy. We tend to chide ourselves but we are more likely to lambaste others. The word fake is most commonly associated with smiles, tans, IDs, passports, fur and boobs.

The article contains many other interesting examples, all taken from the Oxford English Corpus (OEC), a 1.8-billion-word database of written and spoken English.
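Associations like these come from counting which words turn up next to each other. Here is a minimal sketch of that idea in Python. The four sentences are ones I made up for illustration (they are not from the OEC), and the function name `next_word_counts` is my own; a real corpus tool would do far more, such as lemmatising words and testing whether a pairing is statistically significant.

```python
from collections import Counter
import re

# Toy sentences invented for illustration; they are NOT taken from the OEC.
TOY_CORPUS = [
    "Birds migrate south in the autumn.",
    "Some geese migrate north in spring.",
    "The swallows migrate south every year.",
    "Many whales migrate south to warmer waters.",
]

def next_word_counts(sentences, target):
    """Count the words that immediately follow `target` in the sentences."""
    counts = Counter()
    for sentence in sentences:
        # Lower-case the sentence and split it into simple word tokens.
        words = re.findall(r"[a-z']+", sentence.lower())
        # Walk through each pair of neighbouring words.
        for current, following in zip(words, words[1:]):
            if current == target:
                counts[following] += 1
    return counts

counts = next_word_counts(TOY_CORPUS, "migrate")
print(counts.most_common())  # [('south', 3), ('north', 1)]
```

With a corpus of a billion words or more, the same kind of count is what shows that migrate goes with south much more often than with north.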

I found another corpus of English today that’s accessible online: the British National Corpus. It’s smaller than the OEC, at only 100 million words, and covers mainly British English.

Do you know of similar corpora for other languages?

Duxes and testamurs

Two words I came across recently that left me puzzled were dux and testamur. After some investigation, I discovered that dux is the title given to the top academic student in a graduating class of a school, and that it’s used in Scotland, Australia, New Zealand and Iceland. I understand that the US equivalent is valedictorian. I don’t know if there’s an equivalent in England or Wales.

Dux is the Latin word for leader, from the verb ducere, to lead, and is also the root of the English word duke, the French duc, the Italian duce, and the Venetian doge.

A testamur or testimonium is a certificate issued by a university to signify that a student has satisfied the requirements of a particular course and has graduated, according to this site. It’s used mainly in Australia. Elsewhere I believe such documents are usually called diplomas.

Testamur comes from the Latin Ita testamur, meaning “We testify/certify” – the words used to begin such certificates, according to Wikipedia.

Aonbheannaigh / Uncyrn

Ar an mbus go Gleann Cholm Cille, bhuail mé le buachaill as Baile Átha Cliath agus rinne muid comhrá faoi gach cineál seafóid. Dúirt sé liom go raibh sé ag caint le turasóirí o Meiriceá uair amháin agus dúirt sé leo go bhfuil aonbheannaigh agus leipreachain i nDún na nGall. Chreid siad go bhfuil aonbheannaigh ann, ach ní thug siad isteach go bhfuil leipreachain ann, go dúirt sé leo “Is mise ceathrú leipreachán”!

Ar y bws i Gleann Cholm Cille, mi gwrddais â llanc o Ddulyn a sgwrison ni am llawer o bethau lol. Mi ddywedodd e oedd e’n sgwrsio gyda twristiaid o’r Unol Daliethiau unwaith a dywedodd e bod uncyrn a leprechaun yn Donegal. Mi goelion nhw mewn yr uncyrn, ond na goelion nhw bod leprechaun yn bodoli, hyd y ddywedodd e, “chwarter leprechaun ydw i”!

On the bus to Glencolmcille I met a lad from Dublin and we chatted about all sorts of nonsense. He told me that he was talking to some American tourists one time and told them that there are unicorns and leprechauns in Donegal. They believed in the unicorns, but not in the leprechauns, until he mentioned that he himself was a quarter leprechaun!

Word of the day – Bowser

Today’s word, bowser, has been mentioned a lot on the radio and on TV here recently. In the UK a bowser is a mobile water tank used to supply fresh water in emergency situations, such as the recent/current floods, where normal supplies have broken down or are insufficient.

You can see some examples of water bowsers here.

Bowsers got their name from Sylvanus Bowser, an American inventor of an early petrol pump who founded the company S.F. Bowser, Inc., a pioneer in the production of fuel handling and oil purification equipment. Bowser is used as a trade name for petrol pumps in Australia and Canada, and the word’s meaning has expanded to cover other kinds of pumps, and also water tanks and fuel tanks.

Source: http://en.wikipedia.org/wiki/Bowser

Oideas Gael

Tháinig mé ar ais go Brighton oíche Shathairn i ndiaidh turas an-fhada ar bhusanna, eitleán agus traein. Bhí an Scoil Shamhraidh in Oideas Gael ar fheabhas ar fad. Chas mé le go leor daoine an deas agus tréitheach, lena n-áirítear an tUachtarán na hÉirinn, cé bhí sa rang ceana agus mise, d’fhoghlaim mé níos mó Gaeilge agus amhráin Gaelach, agus chuala mé ceol agus filíocht den chéad scoth. Beidh mé ag dul ar ais go Gleann Cholm Cille ag an am chéanna don bhliain seo chugainn gan amhras ar bith.

Mi ddes i ‘nôl i Brighton nos Sadwrn ar ôl taith hir iawn ar bysiau, awyren a thrên. Oedd yr ysgol haf yn Oideas Gael yn wych. Mi gwrddais â llawer o bobl dymunol a dawnus iawn, yn cynnwys yr arlywydd Iwerddon, pwy oedd yn yr un dosbarth â fi, mi ddysgais mwy o Wyddeleg a chaneuon, a mi wrandais gerddoriaeth a barddoniaeth ardderchog. Bydda i’n mynd yn ôl y blwyddyn nesa yn ddi-os.

I got back to Brighton on Saturday night after a long, long journey by bus, plane and train. The summer school at Oideas Gael was fantastic. I met many interesting and talented people, including the president of Ireland, who was in the same class as me. I learnt more Irish and some Irish songs, and I heard some excellent music and poetry. I’ll definitely be back there next year.

I’ll try and write a bit more about my adventures soon.

Urban Irish

According to some of the people I met in Ireland last week, Irish might become a mainly urban language in the future. At the moment the majority of regular Irish speakers live in remote, rural areas, the Gaeltachtaí. These areas are suffering from depopulation because there are few opportunities for young people, who tend to move elsewhere to study and work. Some return, but many don’t. In some of the rural Gaeltachtaí the language remains strong; in others, however, the number of people using Irish as their main language is shrinking.

Not all Gaeltachtaí are in rural areas though – in West Belfast there is a thriving and growing community of Irish speakers, which was established in the late 1960s by six Irish-speaking families. In 1970 the first Irish medium primary school in Northern Ireland, Bunscoil Phobal Feirste, opened its doors, and the first Irish medium nursery school, Naíscoil, was set up in 1978. Since then numerous Irish medium nursery and primary schools have opened, and there are three secondary schools as well. There is also a daily Irish language newspaper – Lá Nua – and an Irish language community radio station – Raidió Fáilte. One of the people I met in Glencolmcille works for this radio station and he did a number of short interviews with people attending the summer school, including myself.

According to Wikipedia, the varieties of Irish native to Northern Ireland became extinct as spoken languages when the last native speaker of Rathlin Irish died in 1985. However, over 10% of the population now has some knowledge of Irish – mainly the Donegal dialect of Ulster Irish. The Irish speakers in Belfast and Northern Ireland in general seem determined to keep the language alive there whatever obstacles are put in their way, and there is no shortage of obstacles.

Cathlab Multilingual Phrasebook

I received an email today from a nurse who works in a hospital in Melbourne, Australia and who is compiling a collection of multilingual phrases to assist communication with non-English-speaking patients while an interpreter is being sought. He is looking for more translations and sound files. Can you help? His contact details are on the site.

In other news, tomorrow I’m off to Ireland to take part in the Irish Language & Culture Summer School at Oideas Gael in Glencolmcille. I’ll be away for a week and won’t be blogging during that time.

Amárach beidh mé ag dul go hÉirinn chun páirt a ghlachadh san Scoil Shamhraidh i dTeanga & Cultúr in Oideas Gael i nGleann Cholm Cille. Beidh mé as baile ar feadh seachtaine agus ní bheidh mé ag scríobh ar mo bhlog i gcaitheamh an t-am seo.

Éire / Iwerddon

Beidh mé dhul go hÉirinn amárach chun páirt a ghlacadh san Scoil Shamhraidh i dTeanga & Cultúr in Oideas Gael i nGleann Cholm Cille. Beidh mé ansin ar feadh seachtaine. Anuraidh agus arú anuraidh, rinne mé cúrsaí i dteanga amháin, ach i mbliana, chinn mé cúrsa i dteanga agus cultúr a dhéanamh mar athrú. Beidh ranganna Gaeilge ann ar maidin, agus i ndiaidh lón beidh gníomhaíochtaí cultúrtha ann mar feadóg stáin a sheinm, damhsa, amhránaíocht ar an Sean-nós, srl.

Photo of Glencolmcille

Bydda i’n mynd i Iwerddon yfory i gymryd rhan yn yr ysgol haf iaith a diwylliant yn Oideas Gael yn Glencolmcille. Bydda i’n yno am wythnos. Y llynedd ac y blwyddyn cyn hynny, nes i cwrs mewn iaith un unig, ond eleni dw i wedi penderfynu gwneud cwrs mewn iaith a diwylliant am newid. Bydd dosbarthiadau Gwyddeleg yn y bore, ac ar ôl cinio bydd gweithgareddau diwylliant fel canu’r chwiban, dawnsio, canu, ayyb.

Bidh mi a’ dol gu Eirinn a-màireach a’ gabhail pàirt aig an Sgoil Shamhraidh Cànain is Cultar ann an Oideas Gael ann an Gleann Cholm Cille. Bidh mi ann an sin airson aon t-seachdain a-mhàin. An-uiridh agus an bliadhna roimh sin, rinn mi cùrsa cànain a-mhain, ach am bliadhna seo shocraich mi cùrsa cànain is cultar a’ dhèanamh airson atharrachadh. Bidh clasaichean ann sa mhadainn, agus an dèidh lòn bidh gnìomhan cultair mar feadag a chluich, dannsa, seinn, etc.

Rivers of white and run arounds

Continuing yesterday’s theme of typography, here are a few more interesting typographic terms I came across today:

River of white
– a column of white space that occurs when the word spaces in several successive lines of type happen to line up one below another, as mentioned by P Terry Hunt in the comments on yesterday’s post.

Run around
– this is when you fit the text around a picture or other design element.

Pagination
– this means either arranging the type and other elements so that they will be output in page format, or numbering the pages.

This is a term I heard frequently when I worked in the design department as a lone web developer surrounded by graphic designers. Since then the internet side of the company has expanded considerably.

Gutter
– the white space between columns on a page.

Widow
– either a single short line at the top of the page or column which is the end of a sentence or a paragraph, or a single word or syllable standing as the last line of a paragraph.

Source: http://www.typography-1st.com