How to build and publish a photo bank for more than 100,000 paintings

Suppose you have more than 100,000 images that need to be sorted and conveniently published on the web for mass viewing. It can be anything: a gallery of all the art created by mankind (the task I worked on), a historical photo archive of the city of Moscow, stills from movies, a shared archive of holiday photos from a major travel agency, a website for stock illustrations and photos, or a large media outlet's image archive accumulated over many years, which needs to be put in order and given navigation and access for employees on the internal network.



I will describe how I recommend programming it.



Keywords and their inheritance



The modern approach, used by all photo banks and galleries, is to supply each illustration with a set of tags (keywords). I have extended this approach in two directions: (1) tags can be inherited (a user searching for berries finds a picture tagged "cherry"), and (2) tags can be attached not only to individual illustrations but also to directories.



The downside of tagging is that search works on keywords and ignores the plot of the picture. A dragon killed by a girl and a dragon that killed a girl are two different plots, but they yield the same word list: Dragon, Girl, Death, and Winner (if there was a battle). With keywords alone, a query for "Dead Dragon" cannot return a result set that excludes images of a dragon standing over a slain enemy.



Main tags are those the user sees in the alphabetical directory. Additional tags are available only when the user types their names into the search box. I consider the optimal number of main tags to be about 1/75 of the number of images, and of additional tags about 1/195.



In file names, mark the plural form of a tag (riders, mountains, etc.) as <tag name>!, that is, with a trailing exclamation mark. You will also need a dictionary of the other ways a tag can be written: plurals, feminine and masculine forms, synonyms, misspellings.



Keep the tag dictionary in four files: Marks.csv for main tags, Other.csv for additional tags, Wrong.csv for misspellings, synonyms, and plural tag names, and Artists.csv for authors. In Marks.csv and Other.csv, the tag identifier and its main name in Russian are followed by the list of parent tags, which is how inheritance is recorded.



Marks.csv



Arwen; ( );Person,Girl,Elf,LordOfTheRings
ThorinOakenshield; ;Person,Male,Beard,LordOfTheRings


These lines say that Arwen is a person, a girl, an elf, and a character from "The Lord of the Rings", and that Thorin Oakenshield is a person, a man, has a beard, and is a character from "The Lord of the Rings". So when the user searches for "The Lord of the Rings", all images of Arwen and Thorin are found. A search for "beard" returns Thorin among others. A search for "Thorin" also finds him, because this shortened spelling is listed in Wrong.csv.
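The inheritance lookup can be sketched roughly as follows. This is a minimal illustration in Python, not the author's Delphi or PHP code: the display names in the sample rows are placeholders, the `ALIASES` dictionary stands in for Wrong.csv (whose exact layout is not given here), and all function names are hypothetical.

```python
import csv
import io

# Two rows in the Marks.csv layout described above: id; display name; parents.
# The display names are placeholders (assumption), not the author's data.
MARKS_CSV = """\
Arwen;Arwen;Person,Girl,Elf,LordOfTheRings
ThorinOakenshield;Thorin Oakenshield;Person,Male,Beard,LordOfTheRings
"""

# Hypothetical stand-in for Wrong.csv: alternative spelling -> tag identifier.
ALIASES = {"thorin": "ThorinOakenshield"}


def load_parents(text):
    """Map each tag identifier to the set of its direct parent tags."""
    parents = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        tag_id = row[0].strip()
        parents[tag_id] = {p.strip() for p in row[2].split(",") if p.strip()}
    return parents


def ancestors(tag, parents):
    """Return the tag plus everything it inherits, directly or indirectly."""
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the dictionary
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def matches(query, image_tags, parents):
    """True if any tag on the image is, or inherits from, the queried tag."""
    wanted = ALIASES.get(query.lower(), query)
    return any(wanted in ancestors(t, parents) for t in image_tags)


PARENTS = load_parents(MARKS_CSV)
```

With this data, `matches("Beard", ["ThorinOakenshield"], PARENTS)` is true and `matches("Beard", ["Arwen"], PARENTS)` is false. The walk is transitive, so a parent tag may have parents of its own.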



Folder structure



If we run a query like "show girls" or "show the sun" over 100,000 images, the result set will be too large. This does not happen if the images are split into folders. For example, the root directory contains a Dragons folder, inside it is a Yellow folder, inside that is a Girls folder (images with girls), and it holds 200 images across all its subfolders. In this case the search results show the folder that contains these 200 images rather than the images themselves. This is more convenient for the user too.
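One way to collapse individual hits into folders is sketched below in Python. It assumes that a folder name consists of one or more tag identifiers joined by "-" and that the caller has already selected the matching image paths. The function name is hypothetical, and the real engine may work differently.

```python
from pathlib import PurePosixPath


def collapse_to_folders(tag, image_paths):
    """Replace each hit by the shallowest ancestor folder named after the tag.

    Assumption: folder names hold one or more tags joined by "-".
    Images with no such ancestor are returned as individual hits.
    """
    results = []
    for path in image_paths:
        parts = PurePosixPath(path).parts
        hit = path
        # Walk the folders from the root down, ignoring the file name itself.
        for depth, name in enumerate(parts[:-1], start=1):
            if tag in name.split("-"):
                hit = "/".join(parts[:depth])
                break
        if hit not in results:
            results.append(hit)  # keep first-seen order, drop duplicates
    return results
```

For example, two hits under Dragons/Yellow/Girls collapse into the single result "Dragons/Yellow/Girls", while an image tagged only through its file name stays a separate result.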



Here, however, closely correlated tags become a problem. Kings usually wear crowns in images, but not always. Say there is a Kings folder with 3000 images, 2500 of which show crowns. For the crown tag, the simple approach of showing the folder does not work.



I think the optimal number of folders is about 1/28 of the number of images.
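To get a feel for these rules of thumb, the short Python sketch below applies the three ratios from the text (1/75, 1/195, 1/28) to a collection size. The function name is made up for illustration, and the ratios are the author's estimates, not hard limits.

```python
def suggested_counts(images):
    """Apply the rules of thumb from the text: 1/75, 1/195 and 1/28."""
    return {
        "main_tags": round(images / 75),
        "additional_tags": round(images / 195),
        "folders": round(images / 28),
    }


# For the 111,000 images mentioned later in the article:
# main_tags = 1480, additional_tags = 569, folders = 3964
print(suggested_counts(111_000))
```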



If a file is already in the Dragons/Yellow/Girls folder, you do not need to repeat these tags in the file name. Add to the file name only the tag identifiers that do not already follow from where the file is stored.
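The rule can be sketched in Python as follows. The exact file-naming scheme is not spelled out here, so this sketch assumes that tag identifiers in folder and file names are joined by "-" and that a trailing "!" marks a plural. All function names are hypothetical.

```python
from pathlib import PurePosixPath


def split_tags(name):
    """Split a folder or file name into tag identifiers.

    Assumption: tags are joined by "-" and a trailing "!" marks a plural.
    """
    return {part.rstrip("!") for part in name.split("-") if part}


def folder_tags(folder):
    """Tags implied by every folder on the way from the gallery root."""
    tags = set()
    for name in PurePosixPath(folder).parts:
        tags |= split_tags(name)
    return tags


def effective_tags(image_path):
    """All tags of an image: folder tags plus those in its own file name."""
    path = PurePosixPath(image_path)
    return folder_tags(path.parent) | split_tags(path.stem)


def tags_for_file_name(folder, wanted):
    """Tags that still have to be written into the file name."""
    return sorted(set(wanted) - folder_tags(folder))
```

For a file stored in Dragons/Yellow/Girls that should carry the tags Girls, Sword, and Dragons, `tags_for_file_name` returns only `["Sword"]`, because the other two already follow from the storage location.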







Multilingual support, icons, texts, virtual subfolders



Each folder contains a file _.jpg, 200 pixels wide by 280 high. It serves as the folder's icon, with the text drawn over it, both when the user is in the parent folder and when the folder appears in search results. Keyword icons have the same resolution.



Many folders also contain a file _.txt made up of lines like these:



Artefact\_.txt (fragment)



=Mielofon
=Mjolnir
=Palantir
 =ThanosGlove
=Glass-Potion
by-DavisonCarvalho=*
TheWitcher/Wolf-Head-Logo| 
DisneyPrincess/Moana/HeartOfTeFiti|  
SuperHeroes/Hellraiser/HellraiserBox| 
-m|Artefact


Here we see the types of records:



  1. Flasks=Glass-Potion: an alias for a subfolder. In the illustration above, no alias was recorded for the Japan folder, so its name is not translated into Russian when the folder is viewed. The two tags Glass and Potion are translated together as a single word.
  2. by-DavisonCarvalho=*: no alias is required.
  3. SuperHeroes/Hellraiser/HellraiserBox|Lemarchand's Box: a virtual subfolder. A subfolder from another directory is also shown here under the given name.
  4. -m|Artefact: the folder represents the Artefact tag. If a text is attached to this tag, it is shown below the illustrations.
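The four record types listed above could be told apart with a small parser like the Python sketch below. It is an illustration under assumptions: the function name and the shape of the returned dictionary are invented here, and the real engine reads these files in PHP.

```python
def parse_folder_notes(text):
    """Sort the lines of a _.txt file into the four record types."""
    notes = {"aliases": {}, "no_alias": [], "virtual": {}, "tag": None}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-m|"):
            # Type 4: the folder itself represents a tag.
            notes["tag"] = line[3:].strip()
        elif "|" in line:
            # Type 3: virtual subfolder, "path|display name".
            target, _, shown_as = line.partition("|")
            notes["virtual"][target.strip()] = shown_as.strip()
        elif "=" in line:
            left, _, right = line.partition("=")
            if right.strip() == "*":
                # Type 2: the subfolder needs no alias.
                notes["no_alias"].append(left.strip())
            else:
                # Type 1: "alias=Subfolder".
                notes["aliases"][right.strip()] = left.strip()
    return notes


SAMPLE = """\
Flasks=Glass-Potion
by-DavisonCarvalho=*
SuperHeroes/Hellraiser/HellraiserBox|Lemarchand's Box
-m|Artefact
"""
NOTES = parse_folder_notes(SAMPLE)
```

Here `NOTES["aliases"]` maps the subfolder name "Glass-Potion" to its display name "Flasks", and `NOTES["tag"]` is "Artefact".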


Size on disk



The 111,000 images currently take up 65 GB of disk space, even though in many cases they have to be converted to the heavier PNG format:



  • ( ), paint-.
  • - , .
  • .webp, .png, (, , ).
  • .png, .jpg, .gif. .




index.php - launched without parameters, it shows the gallery root folder, the alphabet, and the search box. Clicking a subfolder in the root folder opens it. Clicking a letter of the alphabet opens the list of main tags that start with that letter. Entering text into the search box opens the tag identified by that text.



i.php - a viewer for a single selected image. It lets you navigate to any of the tags that this image matches.

img - root folder of web gallery

m - folder with generated thumbnails of all images. Thumbnails are 200 pixels high, with the width proportional to the image. The structure of the m folder mirrors that of the img folder. The m folder is generated programmatically before each new version of the gallery is uploaded.

Tags - contains, for each keyword, a file with the results of searching for it across the directories.

Marks - contains several types of files:



  1. For each keyword, its thumbnail file.
  2. For most keywords, a file with a textual description, a thematic story, or an anecdote.
  3. For some keywords, one or more thematic stories as HTML text.
  4. Files named <letter code>.txt: alphabetically sorted lists of keywords for each letter of the Russian alphabet.
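The thumbnail rule for the m folder (height 200, proportional width, same folder structure as img) comes down to the arithmetic sketched below in Python. The function names are hypothetical, and keeping the original file name for the thumbnail is an assumption. The actual resizing is done by the author's Delphi program.

```python
from pathlib import PurePosixPath

THUMB_HEIGHT = 200  # thumbnail height stated in the text


def thumb_size(width, height):
    """Scale an image to a height of 200 px, keeping its proportions."""
    return max(1, round(width * THUMB_HEIGHT / height)), THUMB_HEIGHT


def thumb_path(image_path, src_root="img", dst_root="m"):
    """Mirror a path under img into the same place under m.

    Assumption: the thumbnail keeps the original file name.
    """
    relative = PurePosixPath(image_path).relative_to(src_root)
    return str(PurePosixPath(dst_root) / relative)
```

A 1000 x 2000 portrait becomes 100 x 200, and `thumb_path("img/Dragons/Yellow/a.jpg")` gives "m/Dragons/Yellow/a.jpg".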


The procedure for uploading a new version of the gallery to the site



A specially written program (using Delphi and the Graphics32 library) does the following:



  1. — , ( .. ), ( Wrong.csv), _.txt, , .
  2. . , : .jpg .png ..
  3. . . — , .
  4. .


Then, both the gallery folder and these materials are uploaded to the server.



The web gallery engine does not use a DBMS.



Hosting



I use Avahost hosting: 100 GB of disk space costs 500 rubles per month. With a 65 GB collection, plus thumbnails and so on, on a 100 GB plan, an update is never seamless. There is not enough space to upload a complete new version first and then switch over to it, so several hours of site downtime are inevitable. I currently update once a month.



Files are sent to the hosting as archives. The cPanel system, now used by practically all hosting providers, can unpack only zip archives. It is advisable to keep each file under 2.5 GB. Otherwise, when an upload through the cPanel web interface finishes, the progress bar (initially blue) may turn red instead of green. I never worked out what the difference is (the file seems to upload normally even then), but in that case I upload it again. As a result, some folders have to be split into several separate zip archives.
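Splitting a folder into several archives can be automated. The Python sketch below is one possible approach, not the author's tool: it groups files greedily by their size on disk and writes each group to its own zip. The function names and the archive naming scheme are made up for illustration. The limit is checked against raw file sizes, so each archive ends up slightly larger because of zip headers.

```python
import zipfile
from pathlib import Path

LIMIT = int(2.5 * 1024**3)  # 2.5 GB, the size suggested in the text


def plan_batches(sizes, limit=LIMIT):
    """Greedily group (name, size) pairs so no group exceeds the limit.

    A single file larger than the limit still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for name, size in sizes:
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches


def zip_folder(folder, out_prefix, limit=LIMIT):
    """Write the folder into out_prefix-1.zip, out_prefix-2.zip, ..."""
    root = Path(folder)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    sizes = [(p, p.stat().st_size) for p in files]
    written = []
    for number, batch in enumerate(plan_batches(sizes, limit), start=1):
        target = f"{out_prefix}-{number}.zip"
        # ZIP_STORED: images are already compressed, so skip recompression.
        with zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as archive:
            for path in batch:
                # Keep the folder name itself inside the archive.
                archive.write(str(path), str(path.relative_to(root.parent)))
        written.append(target)
    return written
```

Write the archives outside the folder being packed, for example `zip_folder("img/Dragons", "upload/Dragons")`, so that they are not picked up on a later run.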



Earlier I tried hosting at home and bought a used netbook on Avito for 2000 rubles. I set it up and everything worked. A couple of days later it stopped working. Rebooting did not help. Then it worked again, then it stopped again. I replaced the netbook (with another, more powerful one, also from Avito, for 3000 rubles) and switched to different software, with the same result. I went through three providers (Seven Sky > Akado > MGTS), with the same result. In short, the providers' equipment apparently cuts off home hosting, and the providers themselves do not know about it. Or there is some other reason. Go to a hosting company and do not host at home. Indie hosting sucks. Even a primitive relay for network games is better written in PHP and put on hosting than kept at home or in the office, waiting for something to break for no apparent reason.



A practical tip (about hosting)



Besides the technical characteristics (of which only one really matters, the number of gigabytes; everything else is figures that each provider measures on its own scale, and I concluded that Avahost's are better), there is a parameter called abuse resistance. An "abuse" is a complaint. The reason for a complaint can appear out of the blue, for example from the studio of Artemy Lebedev. So a decent hosting provider has this parameter: resistance to complaints. (Not to be confused with special hosting where you can host anything at all, even a phishing page imitating Sberbank that invites you to log in to your personal account. Those are separate outfits, and I know nothing about them.)



Monetization



Let's say you are a major media outlet and you decide to make a significant part of your photos (accumulated over decades) public, for example using the technology described above. How can you make money from this (apart from branding through watermarks on the photos, and selling them)? If you are a media outlet you already know, so I will explain for everyone else.



Most monetization schemes bring in about 10 kopecks per average site visitor per day (counting both those who visited once and those who came several times a day). YAN (Yandex Advertising Network) pays the site owner about the same. To earn more, you have to lure people into religious sects or sell miraculous talismans, which I do not do. Aggregators of such ads are easy to find on the Internet, and they pay per result (a person bought a Kirby vacuum cleaner or joined a sect). What is galling is that although I do not do this, Yandex every now and then pushes something of the kind through my site. So people are still sometimes sold junk at a high price (through Yandex), but I get 6 to 10 times less for it.



Many people I know have an ad blocker or something similar enabled by default, so Yandex ads are not shown. They did not install it themselves. I do not know why this is.



Yandex allows you to withdraw the amount upon reaching 3000 rubles.
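The figures quoted in this article (about 10 kopecks per visitor per day, a 3000-ruble withdrawal threshold, 500 rubles per month for hosting) allow a quick back-of-the-envelope estimate. The Python sketch below is only that arithmetic. The function names are illustrative, a 30-day month is assumed, and real income will vary.

```python
KOPECKS_PER_VISITOR_DAY = 10              # "10 kopecks" from the text
WITHDRAWAL_THRESHOLD_KOPECKS = 3000 * 100  # Yandex payout threshold
HOSTING_PER_MONTH_KOPECKS = 500 * 100      # hosting price quoted earlier


def ceil_div(a, b):
    """Integer division rounded up, avoiding floating-point error."""
    return -(-a // b)


def days_to_withdraw(visitors_per_day):
    """Days of traffic needed before the first withdrawal is possible."""
    daily = visitors_per_day * KOPECKS_PER_VISITOR_DAY
    return ceil_div(WITHDRAWAL_THRESHOLD_KOPECKS, daily)


def break_even_visitors(days_in_month=30):
    """Daily visitors needed for ad income to cover the hosting bill."""
    per_visitor_month = KOPECKS_PER_VISITOR_DAY * days_in_month
    return ceil_div(HOSTING_PER_MONTH_KOPECKS, per_visitor_month)
```

At 1000 visitors a day the threshold is reached in 30 days, and about 167 visitors a day are enough to cover the hosting.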



The site owner can also register at miralinks.ru and publish articles. An article and the links in it must stay published permanently, so make sure that hosting them is not too toxic for your site. It is acceptable for new articles to push older ones onto later pages of the archive.



You can also sell banner placements and anything else that fits the theme of the resource.



Where can I see this technology in action (what project am I doing)?



I am making a site corchaosis.ru - a kind of wiki analogue for graphics.



Why it has not yet been possible to promote it (in my opinion):



- People only need a means of achieving something.



Even when people go to an art gallery to look at paintings, what they care about is a tangible achievement: "I visited the Tretyakov Gallery." "I saw Swan Lake."



If a web resource does not bring people closer to tangible achievements, they do not visit it.

People themselves may think otherwise, that they simply like paintings. It does not matter. If we build something for people, we must be more "complex" than they are: understand more and be aware of more. If a fox eats chickens and mice, the fox must be more advanced than the chickens. A fox's results cannot be achieved from a chicken's level of understanding.



- People need interactivity.



WEB 1.0 is dead.



If you can't offer interactivity, then nobody needs you.



Nobody comes just to look. Again, it is about getting results. A cowboy does not go into the jungle as a tourist. He goes there to start his own ranch. Until the site has tools for building your own ranch (a portfolio and so on), cowboys are not interested in the jungle.



Where to get a ready-made engine



In principle, I have described everything you need to build it yourself. You can also write to me.



The local exe file is written in Delphi + Graphics32, the server side is two .php files.


