Determining the language of a text: a complex case



Image source: AnnaElli



The Antiplagiat system works with texts in different languages. Most of the papers submitted for checking are written in Russian, English, or Kazakh, and the Antiplagiat index now contains documents in more than 50 languages.



Fifteen of them are fully supported at every stage of document processing, and we plan to expand this list substantially in the near future. Our tireless researchers are even learning to translate from fictional languages. The language of a text matters at several stages of document processing.



You need to know the language for the following operations:



  • splitting text into words;
  • detecting and correcting technical workarounds (tricks used to evade the check);
  • rejoining words hyphenated across line breaks;
  • processing apostrophes and other punctuation marks;
  • calculating text statistics;
  • searching for borrowed text.
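To make the dependency concrete, here is a minimal Python sketch of how a detected language code might steer two of the operations above: rejoining hyphenated words and splitting text into words. It is an illustration under stated assumptions, not the Antiplagiat implementation: the function names, the `APOSTROPHE_IS_WORD_CHAR` table, and its per-language values are all hypothetical.

```python
import re

# Hypothetical per-language setting (an assumption for illustration only):
# whether an apostrophe inside a word is treated as part of that word.
APOSTROPHE_IS_WORD_CHAR = {"en": True, "ru": False, "kk": False}


def merge_hyphenation(text: str) -> str:
    """Rejoin a word that was split by a hyphen at a line break."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def split_words(text: str, lang: str) -> list[str]:
    """Split text into lowercase words using language-dependent rules."""
    text = merge_hyphenation(text)
    if APOSTROPHE_IS_WORD_CHAR.get(lang, False):
        # Keep internal apostrophes, e.g. "don't" stays one token.
        pattern = r"\w+(?:'\w+)*"
    else:
        # Treat the apostrophe as a separator.
        pattern = r"\w+"
    return re.findall(pattern, text.lower())


if __name__ == "__main__":
    print(split_words("Don't stop", "en"))
    print(split_words("Don't stop", "ru"))
    print(split_words("lan-\nguage test", "en"))
```

The same text yields different tokens depending on the language code, which is why a wrong language guess would propagate into every later stage.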


, . , , «». . , – .





, NTextCat / CLD3 /CLD2. CLD2 :



  • (~200 /c);
  • ;
  • ( );
  • ; , , , ;
  • C# ;
  • ( 80).




, , .





, , . . , :



  • / (, , ) “” ;
  • ;
  • , , …;
  • — , , .


CLD2





, – . , , , . . , : , .



, CLD2, . CLD2, .





: ( ).



1: CLD2.



2: , 4.



3: CLD2.



4: .



: , , . , , , « ».



, ( 2)



, . 1-2 (CLD2 ). , . , , , , , - CLD2.



… ( 3)



3.0: , , , .



3.1:



:



  1. , ( , .. ), , .
  2. CLD2 .


3.2: , . , , , .

3.3: . CLD2 .





( 4)



, , , . . , , - :



  1. , ;
  2. , ;
  3. .


( ). -. . , , . , . : « «-27».» 4 : «», «"», «», «-27".».





, , , . . , «» , . , 1-2 , . . , . , , . — , , , , : .



, .





, , . . , , , .



- . , , , . — , . . ( 4 ) , , .



, , - . , .





. CLD2:





, . — . — , , .



, CLD2.



, , .





, .





, «» «» , «» — «» – , «Jim» — «him» – , «» , , . CLD2 , .



:





«» . , ( CLD2) , , .





CLD2 , . . . , — .





?



, , . : , , , , . ( !). , «»: , . , , . ( ). , , . , .





– . – . – . .



, . -, CLD2 , 5. -, , . . , .



, , ...



