I have tried to detect several types of formats, and currently the
Microsoft Office ones are
the only ones that cannot be detected accurately.
If Tika's detect(File file) method is used, MS Office files are detected as
follows, which I guess is the expected result:
doc - "application/msword"
But if Tika's detect(InputStream is) method is used, the picture is not the
same. The results are:
doc - "application/x-tika-msoffice"
docx - "application/x-tika-ooxml"
The test files were created with MS Office 2007.
I couldn't find out why I get different results for the same files.
Please let me know if I am doing something wrong or if there is a good
reason for this behaviour.
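
My only guess so far is that a bare InputStream carries no file name, so only the leading bytes of the content are available for detection. The sketch below illustrates that assumption and nothing more: it does not use Tika, and the class name MagicSniffer, the method names and the returned labels are all made up for illustration. It only shows that the first bytes of a .doc identify an OLE2 container and the first bytes of a .docx identify a ZIP container, which would match the generic types I get from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, NOT part of Tika: classifies a stream by its leading bytes only.
public class MagicSniffer {
    // Signature at the start of an OLE2 compound file (legacy .doc/.xls/.ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Signature of a ZIP local file header (.docx/.xlsx/.pptx are ZIP packages).
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    // True if the first len bytes of data begin with the given signature.
    public static boolean startsWith(byte[] data, int len, byte[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads up to 8 bytes and reports only the container type they reveal.
    public static String sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before 8 bytes were read
            }
            n += r;
        }
        if (startsWith(head, n, OLE2)) {
            return "generic OLE2 container";
        }
        if (startsWith(head, n, ZIP)) {
            return "generic ZIP container";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the first bytes of a .docx file; no real file is read here.
        byte[] fakeDocx = {'P', 'K', 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(sniff(new ByteArrayInputStream(fakeDocx)));
    }
}
```

If this guess is right, the leading bytes alone cannot tell a Word document from an Excel or PowerPoint one, and the File variant gets the specific type from somewhere else, possibly the file name. I have not verified that this is what Tika actually does.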