Commit 37c1b6db authored by Ivan Tyagov

During ingestion the relation between Data Set and Data Stream is set on Data...

During ingestion, the relation between a Data Set and a Data Stream is set on the Data Ingestion Line. This seemed sufficient for small Data Sets (a few thousand Data Streams). With large Data Sets it is inefficient: getting the list of Data Streams for a Data Set sometimes takes tens of minutes.
Thus we needed a faster way to look up the Data Streams of a Data Set: an explicit relation over the set category, backed by proper MySQL indexing.
This commit is a trade-off between the cost of storage (i.e. a category) and the scalability and performance of the system.
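The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual catalog code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name data_stream, the column set_uid and the index name idx_data_stream_set_uid are hypothetical.

```python
import sqlite3


def build_catalog(stream_rows):
    """Create an in-memory table with one row per Data Stream.

    The schema is hypothetical; it only mirrors the idea of storing the
    uid of the related Data Set next to each Data Stream.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_stream ("
        "uid INTEGER PRIMARY KEY, set_uid INTEGER, size TEXT, version TEXT)"
    )
    # Index the relation column so a per-Data-Set lookup avoids a full scan.
    conn.execute(
        "CREATE INDEX idx_data_stream_set_uid ON data_stream (set_uid)"
    )
    conn.executemany(
        "INSERT INTO data_stream VALUES (?, ?, ?, ?)", stream_rows
    )
    return conn


def streams_of_set(conn, set_uid):
    """Return the uids of all Data Streams that point to the given Data Set."""
    cursor = conn.execute(
        "SELECT uid FROM data_stream WHERE set_uid = ? ORDER BY uid",
        (set_uid,),
    )
    return [row[0] for row in cursor]


def lookup_uses_index(conn, set_uid):
    """Check in the query plan whether the lookup goes through the index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT uid FROM data_stream WHERE set_uid = ?",
        (set_uid,),
    ).fetchall()
    # The human-readable plan detail is the last column of each plan row.
    return any("idx_data_stream_set_uid" in row[-1] for row in plan)
```

The storage cost mentioned in the commit message corresponds here to the extra set_uid column and its index; in exchange, the lookup no longer has to go through the Data Ingestion Lines.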
parent 49a4d496
@@ -4,7 +4,7 @@ VALUES
 <dtml-in prefix="loop" expr="_.range(_.len(uid))">
 (
 <dtml-sqlvar expr="uid[loop_item]" type="int">,
-<dtml-sqlvar expr="DataStream_getSetUid[loop_item]" type="int" optional>,
+<dtml-sqlvar expr="getSetUid[loop_item]" type="int" optional>,
 <dtml-sqlvar expr="getSize[loop_item]" type="string" optional>,
 <dtml-sqlvar expr="getVersion[loop_item]" type="string" optional>
 )
@@ -15,7 +15,7 @@
 <value> <string>uid\n
 getSize\n
 getVersion\n
-DataStream_getSetUid</string> </value>
+getSetUid</string> </value>
 </item>
 <item>
 <key> <string>cache_time_</string> </key>
@@ -117,6 +117,9 @@ data_ingestion.start()
 data_operation = operation_line.getResourceValue()
 data_stream = input_line.getAggregateDataStreamValue()
+# A Data Stream should point to its Data Set
+data_stream.setSetValue(data_set)
 # if not split (one single ingestion) validate the data stream
 if eof == reference_end_single:
   data_stream.validate()