Report on extensive experiments here.
Europarl parallel corpus (11 languages, common part, release v3)
The common part was extracted using the English sentences to determine which sentences have a translation in all 11 languages. The extracted data has been checked and cleaned up.
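The extraction step can be pictured as a set intersection keyed on the English sentences. This is a minimal sketch, not the procedure actually used to build the release. It assumes that each non-English language arrives as a list of aligned (English, foreign) sentence pairs and that sentences match by exact string equality. The function name `extract_common_part` and the `"en"` key are hypothetical, and the checking and cleanup mentioned above are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: language code -> aligned (English, foreign) pairs.
BitextByLang = Dict[str, List[Tuple[str, str]]]


def extract_common_part(bitexts: BitextByLang) -> List[Dict[str, str]]:
    """Keep only English sentences that have a translation in every language.

    Each returned row maps a language code to its sentence; the English side
    is stored under the hypothetical key "en". Matching is exact string equality.
    """
    if not bitexts:
        return []

    # Per language: English sentence -> translation (first occurrence wins).
    tables: Dict[str, Dict[str, str]] = {}
    for lang, pairs in bitexts.items():
        table: Dict[str, str] = {}
        for en, foreign in pairs:
            table.setdefault(en, foreign)
        tables[lang] = table

    # English sentences that appear in every bitext form the common part.
    common = set.intersection(*(set(table) for table in tables.values()))

    # Emit rows in the order of the first bitext, skipping repeated sentences.
    rows: List[Dict[str, str]] = []
    seen = set()
    first_pairs = next(iter(bitexts.values()))
    for en, _ in first_pairs:
        if en in common and en not in seen:
            seen.add(en)
            row = {"en": en}
            row.update({lang: tables[lang][en] for lang in tables})
            rows.append(row)
    return rows
```

With the real data, `bitexts` would hold ten entries (one per non-English language), and each returned row would correspond to one line of the 11-way common part.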
Training data: 347,614 lines
Development set: 500 lines
Test set: 38,123 lines
References: one reference translation per line of the test set.
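Because the corpus is line-aligned (line i in one language is the translation of line i in every other), each per-language file in a split should have exactly the number of lines listed above. The sketch below checks this. It is an illustration under stated assumptions: the helper names `count_lines` and `find_mismatches` are hypothetical, and the release's actual file layout is not described here.

```python
from pathlib import Path
from typing import Dict, List

# Split sizes as stated above (lines per language file).
EXPECTED_LINES: Dict[str, int] = {"train": 347_614, "dev": 500, "test": 38_123}


def count_lines(path: Path) -> int:
    """Count lines in a file, including a final line with no trailing newline."""
    with path.open("rb") as f:  # binary mode: no decoding needed just to count
        return sum(1 for _ in f)


def find_mismatches(line_counts: Dict[str, int], split: str) -> List[str]:
    """Return the languages whose line count differs from the expected size.

    In a line-aligned parallel corpus all language files of a split must
    have the same number of lines, equal to the size listed for that split.
    """
    expected = EXPECTED_LINES[split]
    return sorted(lang for lang, n in line_counts.items() if n != expected)
```

For example, with hypothetical per-language files such as `test.en` and `test.de`, one could build `{lang: count_lines(Path(f"test.{lang}")) for lang in ("en", "de")}` and pass it to `find_mismatches(..., "test")`; an empty list means every file matches the stated size.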