Adlice forum

Software feedback => MRF => Topic started by: jumpingwhale on August 24, 2017, 04:47:57 AM

Title: MRF non-ascii charset support
Post by: jumpingwhale on August 24, 2017, 04:47:57 AM
Hello tigzy. I have a bunch of samples with non-ASCII characters in their file names.
Uploading a large number of samples through the WebUI at once is a bit difficult, so I wrote a small Python script that uses the API.

During upload, I found that MRF gives no response when handling samples with non-ASCII file names.
If I use the WebUI, the upload succeeds but the file names are corrupted.

Is there any solution to this problem?
Title: Re: MRF non-ascii charset support
Post by: Tigzy on August 24, 2017, 01:55:50 PM
Hey :)
Can you give me an example of a non-ASCII file name? (picture preferred, I don't know if the forum supports it)
Also, can you show me the upload script? I'll check that there's no error.
Title: Re: MRF non-ascii charset support
Post by: jumpingwhale on August 28, 2017, 09:38:27 AM
First of all, thanks for your fast feedback.

This is an example of a non-ASCII file name, `DB구입문의 연락처.doc`, and its VirusTotal report:
https://www.virustotal.com/ko/file/106f6241cc72c38b53ba33ac0fc484695cd676594847f8dee9962e0aa56cacc0/analysis/1502953541/

This is the upload result using a web browser.

(https://yourocean.iptime.org/photo/webapi/thumb.php?api=SYNO.PhotoStation.Thumb&method=get&version=1&size=large&public_share_id=RX1kIIw0&id=photo_2f_75706c6f61645f62795f7765622e706e67&rotate_version=0&mtime=1503635488)

I used this Python script to upload the files. I know it is not well coded, but the upload itself works fine (except for vtsubmit, cksubmit and tags...).

Code: [Select]
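import urllib.parse

import requests

URL = 'http://localhost/mrf/api.php'  # placeholder MRF API endpoint, adjust to your install
TOKEN = 'MyToken'


def upload(filetoupload, vtsubmit=False, tags=None):
    # URL setup: token and action go in the query string
    _get_param = (('token', TOKEN),
                  ('action', 'uploadfiles'))
    _url = '?'.join([URL, urllib.parse.urlencode(_get_param)])

    # POST params setup
    _metadata = {'index': 0,
                 'vtsubmit': vtsubmit,
                 'cksubmit': not vtsubmit,
                 'tags': tags}
    _post_param = {'files_data': (_metadata, )}

    # Open the sample in binary mode and close it once the request is done
    with open(filetoupload, 'rb') as _fh:
        _res = requests.post(_url, files={'upload_file': _fh}, data=_post_param)

    # Compare with ==; `is` tests identity, not equality
    if _res.status_code == 200:
        return _res.content
    return None


filelist = []  # placeholder: paths of the samples to upload
for file in filelist:
    upload(file, vtsubmit=True, tags='Malware, doc')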


I had difficulty coding this, and `tags` is still not working. As for `vtsubmit` and `cksubmit`, I don't even know what they mean... It was my first time uploading files through a file handle (or file object?) without reading the binary content myself. Would you please suggest sample code for people like me? It would be a great help.

I found your sample script in your last post. Why don't you add it to the API documentation page?
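
One possible reason `tags` is ignored, offered as a guess rather than a confirmed diagnosis: when `files=` is used, requests appears to convert non-string form values with `str()`, so a dict placed in `files_data` would be sent as its Python repr rather than as JSON. Whether MRF expects JSON in `files_data` is an assumption here; the sketch below only shows how the two serializations differ (`is_valid_json` is a helper made up for the illustration).

```python
import json

# Same metadata as in the upload script above
metadata = {'index': 0, 'vtsubmit': True, 'cksubmit': False, 'tags': 'Malware, doc'}


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


as_repr = str(metadata)            # Python repr: single quotes, True/False
as_json = json.dumps([metadata])   # JSON array holding one object

print(as_repr, is_valid_json(as_repr))
print(as_json, is_valid_json(as_json))
```

If the server does expect JSON, passing `data={'files_data': json.dumps([metadata])}` to `requests.post` would be the variant to try.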
Title: Re: MRF non-ascii charset support
Post by: Tigzy on August 30, 2017, 02:34:39 PM
Hey,
I'll add the upload script example to the documentation, that's indeed a good idea.

The character issue is due to the MySQL database storing the data as ASCII.
We'll see whether adding UTF-8 encoding/decoding solves the issue; it's been added to the backlog (todo list for the next version).

Is it working better (tags) with my upload script?
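
The effect can be reproduced outside MRF. The sketch below is not MRF's code; it only shows, using the file name from earlier in the thread, that the name cannot be represented in ASCII, that it round-trips through UTF-8, and that its UTF-8 bytes turn into garbage when read with the wrong charset (latin-1 is picked here purely for illustration, and `round_trips` is a made-up helper).

```python
name = 'DB구입문의 연락처.doc'


def round_trips(text, charset):
    """Return True if text survives an encode/decode cycle in the given charset."""
    try:
        return text.encode(charset).decode(charset) == text
    except UnicodeError:
        return False


# UTF-8 bytes decoded with a single-byte charset produce garbled text
garbled = name.encode('utf-8').decode('latin-1')

print(round_trips(name, 'ascii'))   # ASCII cannot hold the Korean characters
print(round_trips(name, 'utf-8'))   # UTF-8 preserves the name
print(garbled != name)              # the mis-decoded name no longer matches
```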
Title: Re: MRF non-ascii charset support
Post by: jumpingwhale on August 31, 2017, 03:31:42 AM
Surely better, it works fine. Now I'm dealing with the 'getfiles' API.
What I want to do is add a tag to certain samples. To achieve this, I follow the steps below.

In step 1, the API returns only one page of samples, with the page size configured in the config.php file (as you mentioned on the API page, 'by default').

How can I 'getfiles' more than one page? My 'getfiles_by_tags' script is shown below.

Code: [Select]
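    # Needs `import urllib.parse` and `import requests` at the top of the module
    def getfiles_by_tags(self, tags):
        # setup url
        _get_param = (('token', self.token), ('action', 'getfiles'), ('tags', tags))
        _url = '?'.join([self.url, urllib.parse.urlencode(_get_param)])

        _res = requests.get(_url)

        # Compare with ==; `is` tests identity, not equality
        if _res.status_code == 200:
            _result = _res.json()
            for file in _result['files']:
                yield file['md5']
        # On any other status the generator simply yields nothing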
Title: Re: MRF non-ascii charset support
Post by: Tigzy on September 01, 2017, 04:11:15 PM
Unfortunately, the current getfiles API doesn't allow retrieving all the samples at once, for performance reasons.
Imagine you have 1,000,000 samples: it would be really dangerous to return everything.

What I would do here is call getfiles in a loop with an incrementing page number, like this:

Code: [Select]
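import requests

thepage = 0
while True:
    # pass the current page number as a query parameter
    params = {'action': 'getfiles', 'token': 'bd5ca2c860ff601186cc50a42b213bbe',
              'page': thepage, 'tags': 'test'}
    results = requests.get('http://localhost/mrf/api.php', params=params).json().get('files')
    if not results:
        break
    # do something with results
    thepage = thepage + 1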
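
Applied to the `getfiles_by_tags` generator from earlier in the thread, the same page loop could look roughly like the sketch below. It uses only the standard library, and it assumes that pages start at 0 and that the response carries a `files` list, as the snippets in this thread suggest. The names `fetch_page`, `iter_md5` and `max_pages` are made up for the example and are not part of MRF.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(url, token, tags, page):
    """Fetch one page of getfiles results and return its list of files."""
    query = urllib.parse.urlencode(
        {'token': token, 'action': 'getfiles', 'tags': tags, 'page': page})
    with urllib.request.urlopen(url + '?' + query, timeout=30) as res:
        # Assumption: the JSON response has a 'files' list, empty past the last page
        return json.load(res).get('files') or []


def iter_md5(fetch, max_pages=1000):
    """Yield the md5 of every file, page by page, until an empty page is seen."""
    for page in range(max_pages):  # hard cap so the loop always terminates
        files = fetch(page)
        if not files:
            return
        for entry in files:
            yield entry['md5']


# Example wiring (placeholder URL and token):
# fetch = lambda page: fetch_page('http://localhost/mrf/api.php', 'MyToken', 'test', page)
# for md5 in iter_md5(fetch):
#     print(md5)
```

Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so the loop can be tried against canned data before pointing it at a real server.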
Title: Re: MRF non-ascii charset support
Post by: jumpingwhale on September 04, 2017, 04:00:55 AM
There was a `page` parameter in the `getfiles` API all along... Thanks a lot, I was a dang illiterate person.
Title: Re: MRF non-ascii charset support
Post by: Tigzy on September 05, 2017, 08:56:19 PM
No problem, anytime :)