
AHA

October 6, 2013
By cosh in Uncategorized

I still have a blog :). Let’s create something useful!

By cosh in misc

Well, that’s a really nice evening…

1. First time we fired up our chimney :)


2. My chili plants have started to shoot out of the Jiffy


By cosh in sones

Last week we gave the audience of the MSDN/TechNet cinema a chance to get in touch with the sones GraphDB. Our demo showed the German corpus, based on one million sentences, 812K words and 118K sources. In my last post I showed a short capture of the VisualGraph running on the Microsoft Surface table. In contrast, this one is about the type scheme of the GraphDB in comparison to the MySQL model.

MySQL model [Extracted from LCC documentation]

word

The most important table of the database schema is the word list, called words.

sentence

The actual corpus is a collection of sentences stored in the table with the name sentences.

source and inv_so

Sometimes it is interesting to know where a particularly peculiar example was drawn from. For research purposes it is also important to know that it is not an artificial example. Therefore the table sources stores the websites or other sources from which a given sentence was obtained, and the table inv_so makes it possible to look this information up conveniently.


co_n and co_s

As mentioned in the introduction, information about which words co-occur with each other is very useful. The two tables co_n and co_s store this information. co_n stores which words co-occurred directly next to each other (bigrams). This mostly expresses typical uses of words with each other. co_s, on the other hand, stores which words co-occurred anywhere within a sentence. This typically expresses related or associated words.
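To make the difference between the two tables a bit more tangible, here is a minimal Python sketch. It only counts raw pairs from whitespace-split sentences; the real tables store a significance value (sig) whose computation is not described here, and the function name and the sample sentences are my own assumptions.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sentences):
    """Count neighbour pairs (like co_n) and sentence pairs (like co_s)."""
    neighbour_counts = Counter()  # words directly next to each other (bigrams)
    sentence_counts = Counter()   # words anywhere within the same sentence
    for sentence in sentences:
        words = sentence.split()
        # Bigrams: every word together with its direct right neighbour.
        for left, right in zip(words, words[1:]):
            neighbour_counts[(left, right)] += 1
        # Unordered pairs of distinct words, stored in alphabetical order.
        for a, b in combinations(sorted(set(words)), 2):
            sentence_counts[(a, b)] += 1
    return neighbour_counts, sentence_counts


# Hypothetical sample data, not taken from the corpus.
sample = ["my laptop battery died", "the laptop battery is new"]
co_n, co_s = count_cooccurrences(sample)
```

Note that the neighbour pairs keep their order (left word, right word), while the sentence pairs are unordered.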

inv_w

In order to efficiently find out in which sentences a given word occurred, the table inv_w has to be accessed. It stores relations between word numbers and sentence numbers.

GraphDB type scheme

TextElement

The TextElement is the generalization of all further types. It consists of one attribute named “Content”. So the “Content” value of the word Microsoft is “Microsoft” :).

Source

A Source might be a plain text or a website. It has one attribute: a list of Sentences.

Sentence

A Sentence is part of a Source and contains Words, so it has two attributes. WordsInSentence is a weighted list of Words (the weight is of type Integer and represents the position within the sentence). IsInSource is a BackwardEdge attribute that points to the Sources which contain the Sentence.


Word

A Word is part of a Sentence. The IsInSentence attribute represents this relation: it is a BackwardEdge to the Sentences that contain the Word. Furthermore, there are the neighbour relations LeftNeighbour and RightNeighbour, which are realized as weighted lists that point to other Words (the weight is the significance of the relation between the words). Cooccurrences are modelled analogously to the neighbour relations.
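For readers who think in code, the type scheme can be sketched roughly with Python dataclasses. This is only an illustration of the shape of the types, not how the GraphDB defines or stores them. The snake_case field names, the dict-based weighted lists and the add_sentence helper are assumptions of mine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TextElement:
    content: str  # the "Content" attribute shared by all types


@dataclass
class Source(TextElement):
    sentences: List["Sentence"] = field(default_factory=list)


@dataclass
class Sentence(TextElement):
    # WordsInSentence: (position, word) pairs; the position is the weight.
    words_in_sentence: List[Tuple[int, "Word"]] = field(default_factory=list)
    # IsInSource: backward edge, left out of repr/eq to avoid cycles.
    is_in_source: List[Source] = field(
        default_factory=list, repr=False, compare=False)


@dataclass
class Word(TextElement):
    # IsInSentence: backward edge to the sentences containing this word.
    is_in_sentence: List[Sentence] = field(
        default_factory=list, repr=False, compare=False)
    # Weighted lists: content of the other word -> significance.
    left_neighbour: Dict[str, float] = field(default_factory=dict)
    right_neighbour: Dict[str, float] = field(default_factory=dict)
    cooccurrences: Dict[str, float] = field(default_factory=dict)


def add_sentence(source, text, vocabulary):
    """Hypothetical helper: attach a sentence and maintain both edge directions."""
    sentence = Sentence(content=text)
    sentence.is_in_source.append(source)
    source.sentences.append(sentence)
    for position, token in enumerate(text.split()):
        word = vocabulary.setdefault(token, Word(content=token))
        sentence.words_in_sentence.append((position, word))
        if not any(s is sentence for s in word.is_in_sentence):
            word.is_in_sentence.append(sentence)
    return sentence


source = Source(content="example.org")
vocabulary = {}
sentence = add_sentence(source, "Microsoft builds Surface", vocabulary)
```

In this sketch the backward edges have to be maintained by hand in add_sentence. In the type scheme above they are declared as BackwardEdge attributes instead.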

Queries

The following queries are intended to find the top 10 co-occurrences of the word “Laptop”.

SQL:

select w.word as wort, k.sig as sig from co_s k, words w where k.w1_id =
(select w_id from words where word = 'Laptop') and k.w2_id = w.w_id
order by k.sig desc limit 10;
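If you want to try the SQL variant without the real corpus, here is a small sketch using Python's sqlite3 module with an in-memory database. Only the table and column names used in the query (words, co_s, w_id, word, w1_id, w2_id, sig) come from the post. The column types and the sample rows are made up, and the original schema runs on MySQL, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE co_s (w1_id INTEGER, w2_id INTEGER, sig REAL);
    -- Made-up sample rows, not taken from the corpus.
    INSERT INTO words VALUES (1, 'Laptop'), (2, 'Akku'), (3, 'Display'), (4, 'Haus');
    INSERT INTO co_s VALUES (1, 2, 42.0), (1, 3, 17.5), (4, 2, 3.0);
""")


def top_cooccurrences(conn, word, limit=10):
    """Return (word, sig) rows for the strongest sentence co-occurrences."""
    return conn.execute(
        """
        SELECT w.word, k.sig
        FROM co_s k JOIN words w ON k.w2_id = w.w_id
        WHERE k.w1_id = (SELECT w_id FROM words WHERE word = ?)
        ORDER BY k.sig DESC
        LIMIT ?
        """,
        (word, limit),
    ).fetchall()


result = top_cooccurrences(conn, "Laptop")
```

The search word is passed as a bound parameter instead of being pasted into the query string, which also sidesteps any quoting trouble.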

GQL:

from Word select TOP(Cooccurrences, 10) where Content = 'Laptop';
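Conceptually, the GQL query picks the heaviest entries from the weighted Cooccurrences list of one Word, with no join involved. A rough Python equivalent, assuming the weighted list is represented as a dict from word to significance (my assumption, and not a statement about how the GraphDB evaluates TOP internally):

```python
import heapq


def top(weighted_edges, n):
    """Return the n (target, weight) pairs with the highest weight."""
    return heapq.nlargest(n, weighted_edges.items(), key=lambda edge: edge[1])


# Made-up weights for the word "Laptop".
cooccurrences = {"Akku": 42.0, "Display": 17.5, "Tasche": 8.0}
top_two = top(cooccurrences, 2)
```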


By cosh in sones

Yesterday we finished our Surface demo for the upcoming CeBIT 2k10, at least for now. In my opinion it looks pretty dynamic and it is great fun to play with. We actually had just two days for the whole design. Thx to the guys from Microsoft and UID who helped us develop this cool piece of software.

Some more details of the GraphDB will follow soon.


Muxing *.mkv for PS3

August 25, 2009
By cosh in toolz

Wohoo, /me did it. Finally! I found the easiest way to convert any *.mkv file to a PS3-playable format. The solution is to use tsMuxeR and simply mux it to m2ts. The only thing you have to remember is to change the level of the video tracks to 4.1. That’s it.


GeekNight # 1

August 25, 2009
By cosh in sones

Last Thursday, some developers from sones, including me, went to the GeekNight in Berlin. The main topic was “OpenSocial” and there were some interesting talks about it. The second part covered the integration in [a-z]*VZ. All in all it was pretty interesting, especially the discussions with some of the Studi/Mein/SchülerVZ people after the official part. I am really looking forward to the next GeekNight.

Wheeeeeeere is cooooosh?


Frhed – Hex editor

August 23, 2009
By cosh in toolz

Do you know what’s really annoying? Right… Opening a 5 GB video file with a stupid hex editor. While searching for an appropriate editor I found Frhed (“Freier Hexeditor”). Among other features, it enables the user to load a file partially. Well, this helped me out and I was able to edit my video file in no time ;).
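The idea behind partial loading is simple: seek to an offset and read only a small window instead of pulling the whole file into memory. Here is a minimal Python sketch of that idea. It is not how Frhed implements it, and the function names and the temporary demo file are my own assumptions.

```python
import os
import tempfile


def read_window(path, offset, length):
    """Read only `length` bytes starting at `offset`, not the whole file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def hex_dump(data, base_offset=0, width=16):
    """Format bytes as hex-editor style lines: offset, then hex values."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{base_offset + start:08x}  {hex_part}")
    return "\n".join(lines)


# Demo on a small temporary file standing in for the huge video file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
    demo_path = tmp.name

window = read_window(demo_path, offset=16, length=4)
dump = hex_dump(window, base_offset=16)
os.remove(demo_path)
```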


Hello world!

August 18, 2009
By admin in misc

Wohoo, it works. Nice.