I'd like to ask for suggestions regarding suitable datastracture for
storing textual metadata along with a plain text string.
The input format I have is simply text with custom tags like <tag_name
tag_value>; I'd prefer to have this human readable format the original
For the application, the tags are extracted and stored in a custom
datastructure along with the plain text.The queries should either
return the tagset for a given text position (index) or reversely the
text indices for a given tag-value combination..
(I posted some more detailed remarks earlier, as I was beginning with this topic
Meanwhile I managed to get working code using python native
datastructures (nested defaultdicts and sets); however after some time
I am now planning to adapt this code for a web-based program and am
considering alternative datastructures, for simplicity I started with
sqlite (which should actually suffice for the data volume and traffic
I put together some code, which works as expected, but I suspect
somehow, that there must be better ways of doing it.
Two things I am not quite clear about are using the placeholders for
the data identifiers and "chaining" the SELECT parameters.
I Couldn't find a way to use "?" placeholder for table or column
names, hence I ended up using string interpolation for them and
placeholders for the data values, like.
curs.execute('SELECT * FROM "%s" WHERE "%s"==?' % (text_name,
is there a better way or is it not supposed to supply these
For getting the matching text indices given the tags, tag_values
combination I ended up with a clumsy query:
combined_query_list = ['SELECT "%s" FROM "%s" WHERE "%s"==?' %
(index_col, text_name, tag) for tag in tags]
sql_query = " INTERSECT ".join(combined_query_list)
which produces e.g.:
SELECT "ind" FROM "n" WHERE "KC"==? INTERSECT SELECT "ind" FROM "n"
select_begin = 'SELECT "%s" FROM "%s" WHERE ' % (index_col, text_name)
where_subquery = " AND ".join('"%s"==?' % (tag,) for tag in tags)
sql_query = select_begin + where_subquery
with the resulting query string like:
SELECT "ind" FROM "n" WHERE "KC"==? AND "VN"==? ('12', '1')
(BTW, are these queries equivalent, as the outputs suggest, or are
there some distinctions to be aware of?)
Anyway, I can't really believe, this would be the expected way ...
(I only marginally looked into sqlalchemy, which might simplify this a
bit, is this true? - Currently I only use the standard lib, depending
on the available server setup (python 2.6)).
Thanks in advance for any suggestions or pointers on both the sql
usage as well as the general datatype question.