Auto Scraper v0.6

RetroPie has a new website and forum. Please visit https://retropie.org.uk/ for the new site. The new forum is located at https://retropie.org.uk/forum/. This forum is left here as a read-only archive.

This topic contains 117 replies, has 20 voices, and was last updated by rafaelr 1 year, 2 months ago.

Viewing 35 posts - 36 through 70 (of 118 total)
    #82131
    ceuse
    Participant

    Hey mate, I saw you fixed the error with backslashes & image subfolders, although I kinda liked the way it organised the images into subfolders. It would be great if you put a switch in (-seperate_folders or something).

    Also, here is another problem GBC ROM with its hashes:

    MD5 Checksum: 1F1FB3CF8783F880BC796D667BE60231
    SHA-1 Checksum: DD6E952B730C4BD85F8734156D43A2616B68C053
    SHA-256 Checksum: 4B9EDBB8BFA01AF9FE525E3B645D396493AF7E5ABAFFD1245D11DE698A23257F
    SHA-512 Checksum: 68E26E68D3B76B44579C50CE78C80948C474C995C36170A5FF9E6742A428DE2FF9A3009D64CEFC6A06050FC384FAEC8D1174960439CBA599692A0802C37E7BDF
    Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility

    It scrapes this image/game: https://thegamesdb.net/game/238/
    but it should actually scrape: https://thegamesdb.net/game/20734/

    Hope this info helps (since it returns something other than "not found").

    #82154
    sselph
    Participant

    I hopefully fixed the GBC entries in the DB. I also added zip support. Right now it is a little dumb in that it searches the files in the zip for the first one with a valid extension and attempts to hash it. It doesn’t look to see if the extension is one that should be zipped (currently only MD).
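
    A minimal sketch of the zip handling described above, written in Python for brevity (the scraper itself is written in Go). The extension list, the function name, and the choice of SHA-1 are assumptions for illustration, not the scraper's actual code.

```python
import hashlib
import io
import os
import zipfile

# Hypothetical extension list; the scraper's real list is not shown in this thread.
ROM_EXTENSIONS = {".md", ".gbc", ".sms", ".pce"}


def sha1_of_first_rom(zip_source):
    """Return (member_name, sha1_hex) for the first archive member whose
    extension is in ROM_EXTENSIONS, or None if there is no such member."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            extension = os.path.splitext(name)[1].lower()
            if extension in ROM_EXTENSIONS:
                return name, hashlib.sha1(archive.read(name)).hexdigest()
    return None


# Usage: build a small archive in memory and hash the first ROM inside it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", b"not a rom")
    archive.writestr("game.md", b"rom bytes")
buffer.seek(0)
print(sha1_of_first_rom(buffer))
```

    Because only the first matching member is hashed, an archive holding several ROMs would be identified by whichever one happens to be listed first.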

    ceuse, I’ll take a look at separating the images into subfolders, and will make sure the script generates the folder structure for you. It originally did this because of a bug in the parsing of paths on Windows, so when I fixed the path issues this side effect went away.

    Auto-scraper: https://github.com/sselph/scraper

    #82252
    sselph
    Participant

    While theGamesDB was up today, I scraped all the games my script can match and stood up a service that serves images and XML to mimic the API. Now if the DB is down you can use -use_cache to query my service. To save bandwidth I only downloaded the thumbnail-sized images, but it is better than nothing.

    Auto-scraper: https://github.com/sselph/scraper

    #82268
    ceuse
    Participant

    Hey mate, I tried the newest beta (Sega Master System support).

    Sadly nothing was scraped (amd64 release used).

    Example game:
    https://thegamesdb.net/game/2679/
    Hashes:

    MD5 Checksum: 0713F2E55A1EEA0D9E2FB7044740261B
    SHA-1 Checksum: 2A9090ED365E7425CA7A59F87B942C16B376F0A3
    SHA-256 Checksum: B9E65FF66D3F82006E9A3D0DDF98DC2525960903D7F8F557CE81185F28BCD9E2
    SHA-512 Checksum: E7F178C80224F87066F756B4C7777F69433AF1B9EDDA86BF9C6A4BC676758906523E382896CB8F9107A0A99E1D63AE4C3336FB3BDB5B16BBB097C96AFD1FA6D1
    Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility

    Hope my posts help you out. I don’t mean to annoy you, just trying to help 🙂

    #82272
    sselph
    Participant

    Haha oops. I forgot to actually push the new CSV file. It should work now. If you’ve run the script in the past 30 minutes, just run rm /tmp/hash.csv to clear the cached copy of the hashes.
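
    The advice above implies that the downloaded hash list is reused while it is recent enough. A minimal sketch of such a freshness check, assuming the file's modification time decides it and that the lifetime is the 30 minutes mentioned in the post; the function name is hypothetical and the scraper's real logic may differ.

```python
import os
import time

CACHE_PATH = "/tmp/hash.csv"   # path mentioned in the post
MAX_AGE_SECONDS = 30 * 60      # assumed from the "past 30m" remark


def cache_is_fresh(path=CACHE_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the file exists and was modified less than max_age seconds ago."""
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is never fresh.
        return False
    if now is None:
        now = time.time()
    return (now - modified) < max_age
```

    Deleting the file, as suggested, makes the check fail immediately, so the next run would fetch a new copy.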

    The feedback definitely helps.

    Auto-scraper: https://github.com/sselph/scraper

    #82273
    ceuse
    Participant

    Works great now :-).
    By the way, did you change anything in the GBC hash.csv yet? And how about that subfolder switch now? 😉
    You’re awesome, mate. Slowly but surely my collection is getting its images 🙂

    #82285
    sselph
    Participant

    Yeah the GBC issues should be much better. I think there was ~60% coverage in theGamesDB. I also just added in your new flag -nested_img_dir to tell it to create the nested sub directories under images and it should create them all for you.

    Auto-scraper: https://github.com/sselph/scraper

    #82289
    ceuse
    Participant

    I love you *blush*. I just saw the latest commits and can’t wait for the next release 🙂

    #82317
    ceuse
    Participant

    Tried the new version 🙂 32X and GG worked great. TurboGrafx-16 seems to have an awfully low find rate.

    Another example:

    https://thegamesdb.net/game/23253/

    Hashes from my ROM file:

    
    MD5 Checksum: 628F051FD0F90EB422664A1AAA53670B
    SHA-1 Checksum: 37FD9288404B749739DB1CA228EEE220932040FB
    SHA-256 Checksum: 9B074EEBF105E34F18735D54A5EC345AC3418E819C7833C7B0FAB8F1C23B4078
    SHA-512 Checksum: DCBD7249EEA90532FD9F95CC7E7EEF299A9F74C5DB8FB5D1850C62408986C19612C2F3983493B0262C84075C19CFBDF0D16811BFEC5219FA43BCADB703805BBE
    Generated by MD5 & SHA Checksum Utility @ http://raylin.wordpress.com/downloads/md5-sha-1-checksum-utility

    Hope it helps. Keep up the good work (I love the nested flag 🙂).

    Edit:
    I think I broke my installation 🙁 The complete UI is broken; I can’t even get into the menu anymore, everything is blank. I think I will reinstall and copy everything over again. It all started with white ROM pictures and got worse from there. Has anyone had this problem before?

    • This reply was modified 2 years, 5 months ago by ceuse.
    #82327
    sselph
    Participant

    Yeah, TurboGrafx-16 had a low hit rate. Of the hashes I did have, ~50% matched, and it appears the hashes from No-Intro aren’t complete. The hash you have doesn’t match the hash I have for that game. If you want, you can send me all the SHA-1 hashes along with the file names for your collection and I can update my DB. Running shopt -s globstar && shasum **/*.pce on a Linux machine will do the trick easily enough, or if the tool you are using can produce similar output, that’d work. You can send it to me in a PM or email.
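
    For anyone without a Linux shell, a rough Python equivalent of the shasum command above. It is only a sketch: the function name is hypothetical, and the output imitates shasum's "hash, two spaces, path" layout.

```python
import hashlib
from pathlib import Path


def sha1_listing(root, pattern="*.pce"):
    """Yield 'sha1hex  relative/path' lines for every matching file under root,
    searching subdirectories recursively."""
    root = Path(root)
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            yield f"{digest}  {path.relative_to(root).as_posix()}"


# Usage: print the listing for the current directory.
for line in sha1_listing("."):
    print(line)
```

    Each file is read into memory in one go, which is fine for cartridge-sized ROMs but would want chunked reads for large disc images.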

    Auto-scraper: https://github.com/sselph/scraper

    #82339
    ceuse
    Participant

    A binary update fixed my installation 🙂 I’m slowly getting the hang of Linux.

    I also sent you the hash file via PM.

    Edit: OK, as soon as I re-add the TurboGrafx games & gamelist, the whole of EmulationStation gets messed up.

    After I removed them again it’s still messed up (I notice it from a lot of [][][][] in the "sort by" options in the gamelist, and quite a few white pictures in the gamelist. Strange stuff). I’m going to update the binaries again and stay away from TurboGrafx games for a while.

    • This reply was modified 2 years, 5 months ago by ceuse.
    #82347
    sselph
    Participant

    Odd. Is this only with the gamelist.xml there? If so, maybe there is something about certain file names or the downloaded data that I’m not escaping correctly and/or ES isn’t properly handling. I’ll try to reconstruct your gamelist.xml from the information you sent and see if I spot something that might cause that. I know ES doesn’t like Unicode, so I may double-check and make sure to remove anything that can’t be encoded in ASCII.
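
    One way the ASCII clean-up mentioned above could look, sketched in Python. The function name is hypothetical, and the normalisation step (which keeps the base letter of accented characters instead of dropping it) is an addition for illustration, not something the post describes.

```python
import unicodedata


def to_ascii(text):
    """Return text with every character that cannot be encoded as ASCII removed.

    Accented letters are decomposed first, so 'é' becomes 'e' rather than vanishing.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Usage: clean a game title before writing it to gamelist.xml.
print(to_ascii("Pokémon"))
```

    Characters with no ASCII counterpart, such as Japanese titles, are removed entirely, so a title written only in such characters would come out empty.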

    Auto-scraper: https://github.com/sselph/scraper

    #82348
    ceuse
    Participant

    Unicode could be the issue. I probably need to check the gamelist.xml files I edited manually, in case they got encoded wrong. Is UTF-8 without BOM the right encoding, or is ANSI better?

    Edit: The first Google result said ANSI would be better. That was probably my problem, since I also edited the gamelist.xml manually. Well, first EmulationStation needs to finish compiling ^^

    Edit 2: It also seems to be a memory issue that appeared when I added another system to the GUI. According to Google, I have to reallocate memory.

    Edit 3: Yep, it was the number of systems. 12 work fine; when I added 2 more, it broke.

    • This reply was modified 2 years, 5 months ago by ceuse.
    #82493
    ceuse
    Participant

    Sooo, how about Atari 2600 and GBA support? 🙂

    #82494
    sselph
    Participant

    I thought I had GBA support already, but I never really tested it, so maybe there are issues. I can look into Atari support. I was looking at what it would take to add MAME, but with something like 28000 games it might be easier to add Atari.

    Auto-scraper: https://github.com/sselph/scraper

    #82525
    ceuse
    Participant

    Me neither, I didn’t know GBA worked as well, since it wasn’t described anywhere 🙂 It works quite well 🙂 And I ask about Atari because I’ve got a lot of ROMs for that system 😉

    #82644
    sselph
    Participant

    I added some Atari 2600 hashes to the mix. The coverage wasn’t as good as I would’ve liked. I’m in the process of refactoring so I can pull in other data sources which will hopefully fill in some of the gaps with some basic data. Sorry it took a little longer than normal, don’t have as much time these days.

    Auto-scraper: https://github.com/sselph/scraper

    #82744
    sselph
    Participant

    Added OpenVGDB as an alternative datasource. It should hopefully fill in some of the gaps in theGamesDB’s DB. It will use this DB when it can’t find data in my original hash.csv lookup. I’ll get back to trying to add MAME now.

    Auto-scraper: https://github.com/sselph/scraper

    #82757
    ceuse
    Participant

    Thanks for the Atari code 🙂 OpenVGDB seems rather bad though. I checked with Atari 2600 and it had no images and mostly wrong data. I don’t know whether it is just Atari 2600 that is bad there or OpenVGDB in general.

    #82762
    sselph
    Participant

    That is unfortunate, because it looked like it could be a good alternative. The stuff I looked at was okay, but I didn’t look at Atari. I wanted to start getting basic information even if it was not as complete, but the last thing I want is something being incorrectly identified. I can filter out results from OpenVGDB that don’t have a description and images, which might help. I can also switch the -use_ovgdb flag to default to false, but in the meantime you can pass -use_ovgdb=false to disable it.

    Auto-scraper: https://github.com/sselph/scraper

    #82960
    ceuse
    Participant

    Sooooo… how about even more systems? 🙂 Atari 5200 & 7800? Or whatever else is missing 🙂

    #83207
    ceuse
    Participant

    OK, I tried the scraper again today and found some strange errors while using it:

    Any idea what this is?

    Edit: OK, I think Atari 2600 is just generally broken: a lot of double scrapes, completely wrong scrapes, and I even got a PS4 picture scraped. Strange stuff.

    • This reply was modified 2 years, 5 months ago by ceuse.
    #84310
    imsuperduckie
    Participant

    Tried out both version 54 and 53 and am getting the following errors. Can someone help?

    2014/12/30 13:47:15 ERR: error processing 2020 Super Baseball (U).smc: image: unknown format

    #84317
    sselph
    Participant

    This error comes from Go trying to detect the format of one of the downloaded images and failing to match it to JPEG or PNG. If the error happens consistently, there may be something odd with the image or thumbnail on theGamesDB; you could try -use_cache to use my copy of the images. Maybe Google’s caching service will have fixed the issue.
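
    To illustrate the kind of check that produces this error: image decoders typically recognise a format from the first few bytes of the data. A small Python sketch of that idea, using the standard PNG and JPEG file signatures; the function name is hypothetical and this is not the scraper's code.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # standard 8-byte PNG signature
JPEG_SIGNATURE = b"\xff\xd8"           # JPEG start-of-image marker


def sniff_image_format(data):
    """Return 'png' or 'jpeg' based on the leading bytes, else raise ValueError."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    raise ValueError("image: unknown format")


# Usage: an HTML error page saved in place of an image fails the check.
try:
    sniff_image_format(b"<html>503 Service Unavailable</html>")
except ValueError as error:
    print(error)
```

    A download that returned an error page, a truncated file, or some other image type such as GIF would all end up in the "unknown format" branch.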

    Auto-scraper: https://github.com/sselph/scraper

    #84323
    techstep
    Participant

    It doesn’t scrape Atari ST. The hash updates and then it just stops.

    #84324
    techstep
    Participant

    Atari2600 scraped perfectly for me. 375 games all with pictures and descriptions.

    #84340
    nolageek
    Participant

    Running it now, can’t wait! 🙂

    Question: do we have to have the scraper file in each of the ROM directories, or can we just keep it in /roms/ and then run ‘scraper snes’ to process the snes directory?

    Edit: I see this is already an option – I should have run ‘./scraper -?’ before posting! 🙂

    • This reply was modified 2 years, 3 months ago by nolageek.
    #84342
    nolageek
    Participant

    SNES seemed to work really well; Atari 2600, not so much. Probably 80%-90% of the .bin files failed to process, and almost all had “hash not found” errors. NES is also having a few errors with hashes not being found.

    Do these mean there’s an issue with the ROM file?

    #84353
    sselph
    Participant

    I originally designed this to build the gamelist.xml on my desktop and then copy everything over to the Pi, so I added options to give the directories on the local and remote systems. The default, to make things easy, was to copy it to the ROM dir. It would be possible to add an option to detect whether it is running on a RetroPie installation and be smarter about directories.

    “Hash not found” means that it hashed the ROM file and didn’t find a match in the list I compiled. Either the hash wasn’t part of the No-Intro set, which only has known-good ROM hashes, or I didn’t find the game on theGamesDB. With Atari the coverage is not great; I only found ~50% of the No-Intro ROMs in theGamesDB.

    Auto-scraper: https://github.com/sselph/scraper

    #84367
    nolageek
    Participant

    This script has been a game changer for me (almost literally!), thanks so much!

    One request: could there be some way to include our own database file? I have quite a few homebrew games that I’ve downloaded, and since I have to add them by hand, I’d like them not to be overwritten if I have to run this again.

    #84368
    sselph
    Participant

    I’m glad it helped. For the DB the simple solution might be to just provide the base xml file that would be appended with missing information. The other option is to have you provide a leveldb or sqlite type db with the hash and information but that might be a lot of overhead.

    Auto-scraper: https://github.com/sselph/scraper

    #84436
    proxycell
    Participant

    If I combined -use_cache, -use_gdb and -use_ovgdb together, what would the app’s process be?

    And would using -skip_check still use GDB, just without checking first?

    #84438
    sselph
    Participant

    -use_cache affects -use_gdb in that it doesn’t actually download data from the GDB but uses my cached version of the GDB data. -use_cache with -use_gdb=false doesn’t change anything. -use_ovgdb and -use_gdb together will check the GDB first, then fall back to OpenVGDB if there isn’t a match in my hashes. So all three together would use my cached version of the GDB if I have a matching hash in my DB, then fall back to OpenVGDB if there isn’t a match.
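
    The flag interaction described above can be sketched as an ordered list of sources to try. This is only an illustration in Python: the function names, the source labels, and the lookup callables are hypothetical and do not come from the scraper.

```python
def plan_sources(use_gdb, use_cache, use_ovgdb):
    """Return the data sources to try, in order, for the given flag values."""
    sources = []
    if use_gdb:
        # -use_cache only changes where the GDB data is fetched from.
        sources.append("gdb-cache" if use_cache else "gdb")
    if use_ovgdb:
        # OpenVGDB is the fallback when the GDB lookup finds no match.
        sources.append("ovgdb")
    return sources


def lookup(rom_hash, sources, backends):
    """Try each source in order and return (source, result) for the first hit."""
    for name in sources:
        result = backends[name](rom_hash)
        if result is not None:
            return name, result
    return None


# Usage with stand-in backends: the cached GDB has no match, OpenVGDB does.
backends = {
    "gdb-cache": lambda rom_hash: None,
    "ovgdb": lambda rom_hash: {"name": "Example Game"},
}
print(lookup("abc123", plan_sources(True, True, True), backends))
```

    With -use_gdb=false the -use_cache value never enters the plan, which matches the remark that the combination doesn’t change anything.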

    -skip_check
    I just noticed some issues with the GDB check, but how it is supposed to work is this: if you are using the GDB and not using the cache, I try to determine whether the GDB is up first by trying to fetch the game with id=1. I do this so I can give a nice-looking error up front. The -skip_check flag is there in case there are issues with fetching this game but the user knows for sure that the GDB is otherwise working. Right now the GDB check runs even if you aren’t using the GDB; I’ll fix that.
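
    A sketch of the intended behaviour of the check, in Python. The function names and the fetch_game callable are hypothetical stand-ins; only the conditions and the id=1 probe come from the description above.

```python
def should_check_gdb(use_gdb, use_cache, skip_check):
    """The probe is only useful when the live GDB is actually going to be queried."""
    return use_gdb and not use_cache and not skip_check


def ensure_gdb_available(fetch_game, use_gdb, use_cache, skip_check):
    """Probe the GDB by fetching game id 1 and raise a readable error if that fails.

    fetch_game is a stand-in for whatever function performs the API request.
    """
    if not should_check_gdb(use_gdb, use_cache, skip_check):
        return
    try:
        fetch_game(1)
    except Exception as error:
        raise RuntimeError(
            "theGamesDB appears to be down; try -use_cache or -skip_check"
        ) from error
```

    Folding the -use_gdb test into should_check_gdb is what avoids the case mentioned at the end of the post, where the probe runs even though the GDB is not used.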

    Auto-scraper: https://github.com/sselph/scraper

    #84655
    imsuperduckie
    Participant

    Thanks, -use_cache works perfectly.

    Any chance you have the scraper built for MAME yet? If not, what is the preferred method for MAME on the Pi? I’ve tried the built-in scraper and, like everyone else mentioned, it’s pretty much crap. Not to mention, I only got descriptions and a blank/black image for the thumbnail.

    Any advice is appreciated. Thanks!

    #84657
    sselph
    Participant

    I’ve been working on MAME but it has been slow going for a few reasons. We had our first child a few months ago which reduced the amount of free time, I’m not familiar with MAME, the process and datasources are different from what I have now, and there are just so many titles. I plan to add a mame/fba mode that switches from hash matching to name matching and uses a different source of data for the images.

    Floob has a video on getting MAME/FBA gamelists built:

    Auto-scraper: https://github.com/sselph/scraper
