The introduction of legislative regulation of the content of information resources has made the automatic detection and blocking of prohibited content a more acute problem. We propose an approach to this problem in which thematic analysis of websites is complemented by genre analysis. Genre analysis identifies the activity carried out through a website and therefore enables more accurate recognition and localization of illicit content. The decision on whether a website page contains prohibited content is based both on analysis of the page's text and on the results of thematic and genre analysis of the site as a whole. Software and Russian-language resources for detecting prohibited content related to the topic "Drug addiction and drugs" have been developed.
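To illustrate how a page-level decision can draw on both the page text and site-level thematic and genre evidence, here is a minimal Python sketch. It is not the authors' method. The topic lexicon (English toy terms standing in for the Russian-language resources), the genre labels and their risk weights, the thresholds, and the multiplicative combination rule are all hypothetical assumptions chosen for illustration. The site's genre is taken as given, as if produced by a separate genre classifier.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL topic lexicon for "Drug addiction and drugs": term -> weight.
# The real resources described above are Russian-language and not reproduced here.
TOPIC_LEXICON = {"heroin": 1.0, "cocaine": 1.0, "dose": 0.5, "addiction": 0.5, "buy": 0.3}

# HYPOTHETICAL genre -> risk weight, reflecting the activity the genre implies
# (selling is riskier than discussing, which is riskier than informing or treating).
GENRE_RISK = {"online_shop": 1.0, "forum": 0.6, "news": 0.2, "medical": 0.1}


def topic_score(text: str) -> float:
    """Weighted share of a text's tokens that belong to the topic lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(TOPIC_LEXICON.get(t, 0.0) for t in tokens) / len(tokens)


@dataclass
class Site:
    genre: str        # assumed output of a separate (hypothetical) genre classifier
    pages: list[str]  # texts of the site's pages


def site_topic_share(site: Site, page_threshold: float) -> float:
    """Thematic analysis of the whole site: fraction of pages that are on topic."""
    if not site.pages:
        return 0.0
    on_topic = sum(topic_score(p) >= page_threshold for p in site.pages)
    return on_topic / len(site.pages)


def is_prohibited(page: str, site: Site,
                  page_threshold: float = 0.1, site_threshold: float = 0.3) -> bool:
    """Flag a page only if its own text is on topic AND site-level evidence
    (genre risk x thematic share) is strong enough. Unknown genres get 0.5."""
    page_on_topic = topic_score(page) >= page_threshold
    site_evidence = GENRE_RISK.get(site.genre, 0.5) * site_topic_share(site, page_threshold)
    return page_on_topic and site_evidence >= site_threshold


shop = Site("online_shop", ["buy heroin here fast delivery", "cocaine dose price list", "contact us"])
clinic = Site("medical", ["heroin addiction treatment programme", "cocaine addiction symptoms", "contact us"])

print(is_prohibited("buy heroin here fast delivery", shop))         # True
print(is_prohibited("heroin addiction treatment programme", clinic))  # False
```

Both toy sites are equally on topic (two of three pages), so thematic analysis alone cannot separate them. In this sketch only the genre weight distinguishes the shop from the clinic, which mirrors the abstract's point that genre analysis helps identify the activity behind a site.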
| Field | Value |
|---|---|
| Number of pages | 8 |
| Journal | CEUR Workshop Proceedings |
| Publication status | Published - 1 Jan 2017 |
- Filtering prohibited content
- Thematic text analysis
- Website classification
- Website genre analysis