- 20 Jun, 2014 1 commit
-
-
Vinzenz Feenstra authored
Signed-off-by: Vinzenz Feenstra <evilissimo@gmail.com>
-
- 19 Jun, 2014 1 commit
-
-
Kevin Modzelewski authored
Implementation is pretty straightforward for now:
  - find all names that get accessed from a nested function
  - if any, create a closure object at function entry
  - any time we set a name accessed from a nested function, update its value in the closure
  - when evaluating a functiondef that needs a closure, attach the created closure to the created function object.

Closures are currently passed as an extra argument before any python-level args, which I'm not convinced is the right strategy. It works out fine but it feels messy to say that functions can have different C-level calling conventions. It felt worse to include the closure as part of the python-level arg passing. Maybe it should be passed after all the other arguments?

Closures are currently just simple objects, on which we set and get Python-level attributes. The performance (which I haven't tested) relies on attribute access being made fast through the hidden-class inline caches.

There are a number of ways that this could be improved:
  - be smarter about when we create the closure object, or when we update it.
  - not create empty pass-through closures
  - give the closures a pre-defined shape, since we know at irgen-time what names can get set. could probably avoid the inline cache machinery and also have better code.
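The four steps above can be sketched in plain Python. This is only an illustration of the idea, not Pyston's irgen code: `Closure`, `make_function`, `outer_code` and `inner_code` are hypothetical names, and the closure is modelled as a simple object passed ahead of the Python-level arguments.

```python
class Closure:
    """Hypothetical stand-in for the closure: a simple object with plain attributes."""
    def __init__(self, parent=None):
        self.parent = parent  # would link to an enclosing closure, unused here


def make_function(code, closure):
    """Attach a closure to a function; it is passed before the Python-level args."""
    def call(*args):
        return code(closure, *args)
    return call


def inner_code(closure, y):
    # The nested function reads captured names via attribute access on the closure.
    return closure.x + closure.total + y


def outer_code(x):
    # 'x' and 'total' are accessed from the nested function,
    # so a closure object is created at function entry.
    closure = Closure()
    closure.x = x          # every store to a captured name updates the closure
    closure.total = 0
    inner = make_function(inner_code, closure)  # attach closure at functiondef time
    closure.total = 10     # a later store is still visible to the nested function
    return inner


add = outer_code(1)
result = add(5)
```

Because every store goes through the closure object, the nested function sees the value of `total` that was set after it was created.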
-
- 18 Jun, 2014 4 commits
-
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Enable the VTune JIT support in llvm, and add it as a jit listener. I think it's mostly confirming my suspicion that the slowdown is cache-related... it's not being very helpful with determining why (it's in some function that it can't analyze). I updated the memory allocator to have strong thread-affinity (i.e. a thread now generally gets back memory that it had previously freed), but that doesn't seem to have any effect. Going to punt on further investigations for now, pretty happy though that there's an overall speedup with the GRWL, even if there are still issues.
-
Kevin Modzelewski authored
Turns out a large amount of thread contention was coming from these shared counters -- disable some of them and add some thread-local caching
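A rough sketch of thread-local caching for a shared counter, written in Python for illustration. The class name `CachedCounter` and the `flush_every` threshold are assumptions, not taken from the commit; the point is only that each thread accumulates increments privately and touches the shared lock occasionally.

```python
import threading


class CachedCounter:
    """Hypothetical counter: threads batch increments locally before publishing."""

    def __init__(self, flush_every=64):
        self._lock = threading.Lock()
        self._total = 0
        self._local = threading.local()
        self._flush_every = flush_every

    def incr(self, n=1):
        pending = getattr(self._local, "pending", 0) + n
        if pending >= self._flush_every:
            # Only here do we contend on the shared lock.
            with self._lock:
                self._total += pending
            pending = 0
        self._local.pending = pending

    def flush(self):
        """Publish whatever the calling thread still holds locally."""
        pending = getattr(self._local, "pending", 0)
        if pending:
            with self._lock:
                self._total += pending
            self._local.pending = 0

    def value(self):
        with self._lock:
            return self._total


def worker(counter, n):
    for _ in range(n):
        counter.incr()
    counter.flush()


counter = CachedCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.value()
```

The trade-off is that `value()` can lag behind the true count until every thread has flushed.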
-
- 17 Jun, 2014 9 commits
-
-
Kevin Modzelewski authored
Insert full blocks back at the end of the free list to hopefully reduce the number of times we have to check them
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Now, threads will claim an entire block at a time, and put it in a thread-local cache. In the common case, they can allocate out of this block, and only need to take out a lock if they run out.
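As a minimal sketch of this scheme, assuming a fixed number of slots per block: `BlockAllocator`, `BLOCK_SIZE` and the `(block, slot)` tuples are hypothetical and stand in for real memory. A thread takes the lock only when its cached block is exhausted.

```python
import threading

BLOCK_SIZE = 8  # hypothetical number of slots per block


class BlockAllocator:
    def __init__(self):
        self._lock = threading.Lock()
        self._next_block = 0
        self._local = threading.local()
        self.lock_acquisitions = 0  # instrumentation for the example only

    def _claim_block(self):
        # Slow path: claim a whole block under the global lock.
        with self._lock:
            self.lock_acquisitions += 1
            block = self._next_block
            self._next_block += 1
        return block

    def alloc(self):
        free = getattr(self._local, "free", None)
        if not free:
            # Thread-local cache is missing or empty: refill it with a new block.
            block = self._claim_block()
            free = [(block, slot) for slot in reversed(range(BLOCK_SIZE))]
            self._local.free = free
        # Fast path: no lock needed, the block belongs to this thread.
        return free.pop()


allocator = BlockAllocator()
slots = [allocator.alloc() for _ in range(20)]
```

Twenty allocations with eight slots per block need three blocks, so the lock is taken three times rather than twenty.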
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Trying to add a generic PerThread class has involved a journey into the wonderful world of template programming, including C++11 variadic templates.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Make rules more general, and generate rules that can't be patterns
Made it easy to add a pyston_grwl target
-
Kevin Modzelewski authored
Add some basic locking to:
  - code generation (one lock for all of it)
  - garbage collection (spin lock for allocations, global serialization for collections)
  - lists (mutex per list object)
Can run the GRWL on some simple tests (microbenchmarks/thread_contention.py and thread_uncontended.py)
Performance is not great yet
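The "mutex per list object" item might look roughly like the following Python sketch. `LockedList` is a hypothetical wrapper for illustration; the commit itself adds the lock inside Pyston's runtime list type, which is not shown here.

```python
import threading


class LockedList:
    """Hypothetical list wrapper where each instance carries its own mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def append(self, item):
        # Only operations on this particular list serialize against each other.
        with self._lock:
            self._items.append(item)

    def __len__(self):
        with self._lock:
            return len(self._items)


def fill(target, n):
    for i in range(n):
        target.append(i)


shared = LockedList()
threads = [threading.Thread(target=fill, args=(shared, 500)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
shared_len = len(shared)
```

A per-object lock keeps threads that work on different lists from contending with each other, at the cost of one mutex per list.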
-
- 11 Jun, 2014 4 commits
-
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
And then use that register info rather than sending a signal to the thread; this lets the thread that called AllowThreads avoid receiving signals, e.g. during a syscall. I'm not sure if this is valid though; are we really guaranteed that the thread can't invalidate the saved state?
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Once the GRWL is added, will also be for GC safepoints.
-
- 10 Jun, 2014 4 commits
-
-
Kevin Modzelewski authored
We always want to crawl the entire stack, and it's possible to determine the extents of the stack, so just do a scan over the entire memory range. Also, change the way the interpreter keeps track of its roots; we don't really need to associate the roots with a specific interpreter frame. This should hopefully clear up the weirdness about libunwind trying to unwind through the pthreads assembly code, and potentially also make stack crawling faster.
-
Kevin Modzelewski authored
I think threading now "works", i.e. doesn't crash Pyston, though we don't release the GIL until the thread exits.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
- 09 Jun, 2014 6 commits
-
-
Kevin Modzelewski authored
Don't worry, a non-gil version is coming. Use a gil for now so we can start working on the other threading requirements, such as GC.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Well, the thread-creating should work, but nothing is threadsafe.
-
Kevin Modzelewski authored
Remove read from freed memory
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
Update chr to be compatible with CPython
-
- 08 Jun, 2014 1 commit
-
-
Krzysztof Klinikowski authored
-
- 07 Jun, 2014 10 commits
-
-
Kevin Modzelewski authored
Not all exposed to python code yet

This commit is pretty large because it contains two separate but interrelated changes:
  - Rewrite the function argument handling code (callCompiledFunction and resolveCLFunc) into a single callFunc that does its own rewriting, and support the new features.
    -- this required a change of data representations, so instead of having each function consist of variants with unrelated signatures, we can only have a single signature, but multiple type specializations of that signature
  - To do that, had to rewrite all of the stdlib functions that used signature-variation (e.g. range1 + range2 + range3) to be a single function that took default arguments, and then took action appropriately.
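To illustrate the second change, here is a sketch of collapsing one-, two- and three-argument variants into a single function with default arguments. It is written in Python for readability; the name `range_merged` is hypothetical and the commit's actual stdlib functions are not reproduced here.

```python
def range_merged(start, stop=None, step=1):
    """Single signature replacing separate 1-, 2- and 3-argument variants."""
    if stop is None:
        # One-argument form: the only argument is really the stop value.
        start, stop = 0, start
    if step == 0:
        raise ValueError("step must not be zero")
    out = []
    i = start
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        out.append(i)
        i += step
    return out


one_arg = range_merged(3)
two_args = range_merged(1, 4)
three_args = range_merged(10, 0, -3)
```

The function inspects which defaults were left untouched and then "takes action appropriately", which is what lets a single signature cover all three call shapes.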
-
Kevin Modzelewski authored
Can emit add/sub/etc instructions with 32-bit operands in addition to 8-bit now
Can get the RSP and RBP in rewriter1, for accessing scratch space
Change some debugging output
Fix a gc bug: if an object gets new'd, and takes a parameter that gets new'd, the sequence is
  1) object space gets allocated
  2) parameter space gets allocated
  3) parameter gets constructed
  4) object gets constructed
The bug is that the object construction is what initializes the GC header, so if step #3 causes a collection, it can see that the allocation from step #1 has an invalid header. As a workaround, always zero out the header in allocation, and skip blocks with zeroed headers.
The real solution is probably to have the GC manage the header itself rather than expecting the user to; this would mean that gc_alloc would take the allocation kind, put that into the header, and then return a pointer to the post-header data section of the allocation.
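The workaround can be modelled with a toy heap in Python. Everything here (`Heap`, `construct`, the header constants, dict-based blocks) is a hypothetical model for illustration and not Pyston's allocator; it only shows why zeroing the header at allocation lets a collection during step 3 skip the half-built object.

```python
HEADER_UNINITIALIZED = 0  # hypothetical: header zeroed by the allocator
KIND_OBJECT = 1           # hypothetical allocation kind written by a constructor


class Heap:
    def __init__(self):
        self.blocks = []
        self.scanned = []

    def alloc(self):
        # Workaround: always hand out blocks with a zeroed header.
        block = {"header": HEADER_UNINITIALIZED, "payload": None}
        self.blocks.append(block)
        return block

    def collect(self):
        # Skip blocks whose header was never initialized by a constructor.
        self.scanned = [b for b in self.blocks if b["header"] != HEADER_UNINITIALIZED]


def construct(block, payload):
    # Construction is what initializes the GC header.
    block["header"] = KIND_OBJECT
    block["payload"] = payload
    return block


heap = Heap()
obj = heap.alloc()         # 1) object space gets allocated
param = heap.alloc()       # 2) parameter space gets allocated
construct(param, "arg")    # 3) parameter gets constructed...
heap.collect()             # ...and a collection happens in the middle of it
seen_during_step3 = list(heap.scanned)
construct(obj, param)      # 4) object gets constructed
heap.collect()
seen_after = list(heap.scanned)
```

In the "real solution" described above, `alloc` would take the allocation kind and write the header itself, so no block would ever be visible with an uninitialized header.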
-
Kevin Modzelewski authored
Experimenting with vim+make integration
Changed installation instructions slightly
Added rlwrap support for repl
-
Kevin Modzelewski authored
-
Marius Wachtler authored
Currently many places in the codebase create AST_Jump objects but do not initialize the line and column numbers. Set them to -1, in order to not call SetCurrentDebugLocation with random locations.
-
Marius Wachtler authored
-
Kevin Modzelewski authored
Fix crash: register string iterator class
-
Kevin Modzelewski authored
Tried up to 210379 with the same segfaults.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-